Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)

Started by Andy Fanalmost 5 years ago39 messages
#1Andy Fan
zhihui.fan1213@gmail.com

Hi:

This patch is the first patch in UniqueKey patch series[1]/messages/by-id/CAKU4AWr1BmbQB4F7j22G+NS4dNuem6dKaUf+1BK8me61uBgqqg@mail.gmail.com, since I need to
revise
that series many times but the first one would be not that often, so I'd
like to
submit this one for review first so that I don't need to maintain it again
and again.

v1-0001-Introduce-notnullattrs-field-in-RelOptInfo-to-ind.patch

Introduce notnullattrs field in RelOptInfo to indicate which attr are not
null
in current query. The not null is judged by checking pg_attribute and
query's
restrictinfo. The info is only maintained at Base RelOptInfo and Partition's
RelOptInfo level so far.

Any thoughts?

[1]: /messages/by-id/CAKU4AWr1BmbQB4F7j22G+NS4dNuem6dKaUf+1BK8me61uBgqqg@mail.gmail.com
/messages/by-id/CAKU4AWr1BmbQB4F7j22G+NS4dNuem6dKaUf+1BK8me61uBgqqg@mail.gmail.com

--
Best Regards
Andy Fan (https://www.aliyun.com/)

#2Andy Fan
zhihui.fan1213@gmail.com
In reply to: Andy Fan (#1)
1 attachment(s)
Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)

On Wed, Feb 10, 2021 at 11:18 AM Andy Fan <zhihui.fan1213@gmail.com> wrote:

Hi:

This patch is the first patch in UniqueKey patch series[1], since I need
to revise
that series many times but the first one would be not that often, so I'd
like to
submit this one for review first so that I don't need to maintain it again
and again.

v1-0001-Introduce-notnullattrs-field-in-RelOptInfo-to-ind.patch

Introduce notnullattrs field in RelOptInfo to indicate which attr are not
null
in current query. The not null is judged by checking pg_attribute and
query's
restrictinfo. The info is only maintained at Base RelOptInfo and
Partition's
RelOptInfo level so far.

Any thoughts?

[1]
/messages/by-id/CAKU4AWr1BmbQB4F7j22G+NS4dNuem6dKaUf+1BK8me61uBgqqg@mail.gmail.com

--
Best Regards
Andy Fan (https://www.aliyun.com/)

Add the missed patch..

--
Best Regards
Andy Fan (https://www.aliyun.com/)

Attachments:

v1-0001-Introduce-notnullattrs-field-in-RelOptInfo-to-ind.patchapplication/octet-stream; name=v1-0001-Introduce-notnullattrs-field-in-RelOptInfo-to-ind.patchDownload
From 30fd21df7361edeaea2f6078ad5139a53b14340d Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E4=B8=80=E6=8C=83?= <yizhi.fzh@alibaba-inc.com>
Date: Wed, 10 Feb 2021 10:58:06 +0800
Subject: [PATCH v1] Introduce notnullattrs field in RelOptInfo to indicate
 which attr

are not null in current query. The not null is judged by checking
pg_attribute and query's restrictinfo. The info is only maintained at
Base RelOptInfo and Partition's RelOptInfo level so far.
---
 src/backend/optimizer/path/allpaths.c  | 31 ++++++++++++++++++++++++++
 src/backend/optimizer/plan/initsplan.c | 10 +++++++++
 src/backend/optimizer/util/plancat.c   | 10 +++++++++
 src/include/nodes/pathnodes.h          |  2 ++
 4 files changed, 53 insertions(+)

diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index cd3fdd259c..709d2f82ca 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -998,6 +998,7 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
 		RelOptInfo *childrel;
 		ListCell   *parentvars;
 		ListCell   *childvars;
+		int i = -1;
 
 		/* append_rel_list contains all append rels; ignore others */
 		if (appinfo->parent_relid != parentRTindex)
@@ -1054,6 +1055,36 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
 								   (Node *) rel->reltarget->exprs,
 								   1, &appinfo);
 
+		/* Copy notnullattrs. */
+		while ((i = bms_next_member(rel->notnullattrs, i)) > 0)
+		{
+			AttrNumber attno = i + FirstLowInvalidHeapAttributeNumber;
+			AttrNumber child_attno;
+			if (attno == 0)
+			{
+				/* Whole row is not null, so must be same for child */
+				childrel->notnullattrs = bms_add_member(childrel->notnullattrs,
+														attno - FirstLowInvalidHeapAttributeNumber);
+				break;
+			}
+			if (attno < 0 )
+				/* no need to translate system column */
+				child_attno = attno;
+			else
+			{
+				Node * node = list_nth(appinfo->translated_vars, attno - 1);
+				if (!IsA(node, Var))
+					/* This may happens at UNION case, like (SELECT a FROM t1 UNION SELECT a + 3
+					 * FROM t2) t and we know t.a is not null
+					 */
+					continue;
+				child_attno = castNode(Var, node)->varattno;
+			}
+
+			childrel->notnullattrs = bms_add_member(childrel->notnullattrs,
+													child_attno - FirstLowInvalidHeapAttributeNumber);
+		}
+
 		/*
 		 * We have to make child entries in the EquivalenceClass data
 		 * structures as well.  This is needed either if the parent
diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index 02f813cebd..d27167dc76 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -829,6 +829,16 @@ deconstruct_recurse(PlannerInfo *root, Node *jtnode, bool below_outer_join,
 		{
 			Node	   *qual = (Node *) lfirst(l);
 
+			/* Set the not null info now */
+			ListCell	*lc;
+			List		*non_nullable_vars = find_nonnullable_vars(qual);
+			foreach(lc, non_nullable_vars)
+			{
+				Var *var = lfirst_node(Var, lc);
+				RelOptInfo *rel = root->simple_rel_array[var->varno];
+				rel->notnullattrs = bms_add_member(rel->notnullattrs,
+												   var->varattno - FirstLowInvalidHeapAttributeNumber);
+			}
 			distribute_qual_to_rels(root, qual,
 									below_outer_join, JOIN_INNER,
 									root->qual_security_level,
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 177e6e336a..0c5a79f296 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -117,6 +117,7 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 	Relation	relation;
 	bool		hasindex;
 	List	   *indexinfos = NIL;
+	int			i;
 
 	/*
 	 * We need not lock the relation since it was already locked, either by
@@ -474,6 +475,15 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 	if (inhparent && relation->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
 		set_relation_partition_info(root, rel, relation);
 
+	Assert(rel->notnullattrs == NULL);
+	for(i = 0; i < relation->rd_att->natts; i++)
+	{
+		FormData_pg_attribute attr = relation->rd_att->attrs[i];
+		if (attr.attnotnull)
+			rel->notnullattrs = bms_add_member(rel->notnullattrs,
+											   attr.attnum - FirstLowInvalidHeapAttributeNumber);
+	}
+
 	table_close(relation, NoLock);
 
 	/*
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 0ec93e648c..2d81f6216d 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -710,6 +710,8 @@ typedef struct RelOptInfo
 	PlannerInfo *subroot;		/* if subquery */
 	List	   *subplan_params; /* if subquery */
 	int			rel_parallel_workers;	/* wanted number of parallel workers */
+	/* Not null attrs, start from -FirstLowInvalidHeapAttributeNumber */
+	Bitmapset		*notnullattrs;
 
 	/* Information about foreign tables and foreign joins */
 	Oid			serverid;		/* identifies server for the table or join */
-- 
2.21.0

#3Ashutosh Bapat
ashutosh.bapat.oss@gmail.com
In reply to: Andy Fan (#2)
Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)

Can this information be part of PathTarget structure and hence part of
RelOptInfo::reltarget, so that it can be extended to join, group and
other kinds of RelOptInfo in future? In fact, it might be easy to do
that in this patch itself.

On Wed, Feb 10, 2021 at 8:57 AM Andy Fan <zhihui.fan1213@gmail.com> wrote:

On Wed, Feb 10, 2021 at 11:18 AM Andy Fan <zhihui.fan1213@gmail.com> wrote:

Hi:

This patch is the first patch in UniqueKey patch series[1], since I need to revise
that series many times but the first one would be not that often, so I'd like to
submit this one for review first so that I don't need to maintain it again and again.

v1-0001-Introduce-notnullattrs-field-in-RelOptInfo-to-ind.patch

Introduce notnullattrs field in RelOptInfo to indicate which attr are not null
in current query. The not null is judged by checking pg_attribute and query's
restrictinfo. The info is only maintained at Base RelOptInfo and Partition's
RelOptInfo level so far.

Any thoughts?

[1] /messages/by-id/CAKU4AWr1BmbQB4F7j22G+NS4dNuem6dKaUf+1BK8me61uBgqqg@mail.gmail.com
--
Best Regards
Andy Fan (https://www.aliyun.com/)

Add the missed patch..

--
Best Regards
Andy Fan (https://www.aliyun.com/)

--
Best Wishes,
Ashutosh Bapat

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Ashutosh Bapat (#3)
Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)

Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> writes:

Can this information be part of PathTarget structure and hence part of
RelOptInfo::reltarget, so that it can be extended to join, group and
other kinds of RelOptInfo in future?

Why would that be better than keeping it in RelOptInfo?

regards, tom lane

#5David Rowley
dgrowleyml@gmail.com
In reply to: Andy Fan (#1)
Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)

On Wed, 10 Feb 2021 at 16:18, Andy Fan <zhihui.fan1213@gmail.com> wrote:

v1-0001-Introduce-notnullattrs-field-in-RelOptInfo-to-ind.patch

Introduce notnullattrs field in RelOptInfo to indicate which attr are not null
in current query. The not null is judged by checking pg_attribute and query's
restrictinfo. The info is only maintained at Base RelOptInfo and Partition's
RelOptInfo level so far.

Any thoughts?

I'm not that happy with what exactly the definition is of
RelOptInfo.notnullattrs.

The comment for the field says:
+ /* Not null attrs, start from -FirstLowInvalidHeapAttributeNumber */

So you could expect someone to assume that these are a Bitmapset of
attnums for all columns in the relation marked as NOT NULL. However,
that's not true since you use find_nonnullable_vars() to chase down
quals that filter out NULL values and you mark those too.

The reason I don't really like this is that it really depends where
you want to use RelOptInfo.notnullattrs. If someone wants to use it
to optimise something before the base quals are evaluated then they
might be unhappy that they found some NULLs.

I think you either need to explain in detail what the field means or
separate out the two meanings somehow.

David

#6Andy Fan
zhihui.fan1213@gmail.com
In reply to: David Rowley (#5)
Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)

Thank you all, friends!

On Fri, Feb 12, 2021 at 9:02 AM David Rowley <dgrowleyml@gmail.com> wrote:

On Wed, 10 Feb 2021 at 16:18, Andy Fan <zhihui.fan1213@gmail.com> wrote:

v1-0001-Introduce-notnullattrs-field-in-RelOptInfo-to-ind.patch

Introduce notnullattrs field in RelOptInfo to indicate which attr are

not null

in current query. The not null is judged by checking pg_attribute and

query's

restrictinfo. The info is only maintained at Base RelOptInfo and

Partition's

RelOptInfo level so far.

Any thoughts?

I'm not that happy with what exactly the definition is of
RelOptInfo.notnullattrs.

The comment for the field says:
+ /* Not null attrs, start from -FirstLowInvalidHeapAttributeNumber */

So you could expect someone to assume that these are a Bitmapset of
attnums for all columns in the relation marked as NOT NULL. However,
that's not true since you use find_nonnullable_vars() to chase down
quals that filter out NULL values and you mark those too.

The comment might be unclear, but the behavior is on purpose. I want
to find more cases which can make the attr NOT NULL, all of them are
useful for UniqueKey stuff.

The reason I don't really like this is that it really depends where
you want to use RelOptInfo.notnullattrs. If someone wants to use it
to optimise something before the base quals are evaluated then they
might be unhappy that they found some NULLs.

Do you mean the notnullattrs is not set correctly before the base quals are
evaluated? I think we have lots of data structures which are set just
after some
stage. but notnullattrs is special because it is set at more than 1
stage. However
I'm doubtful it is unacceptable, Some fields ever change their meaning at
different
stages like Var->varno. If a user has a misunderstanding on it, it
probably will find it
at the testing stage.

I think you either need to explain in detail what the field means or
separate out the two meanings somehow.

Agreed. Besides the not null comes from 2 places (metadata and quals), it
also
means it is based on the relation, rather than the RelTarget. for sample:
A is not
null, but SELECT return_null_udf(A) FROM t, return_null_udf is NULL.
I think
this is not well documented as well. How about just change the documents
as:

1. /* Not null attrs, start from -FirstLowInvalidHeapAttributeNumber, the
NOT NULL
* comes from pg_attribtes and quals at different planning stages.
*/

or

2. /* Not null attrs, start from -FirstLowInvalidHeapAttributeNumber, the
NOT NULL
* comes from pg_attribtes and quals at different planning stages. And
it just means
* the base attr rather than RelOptInfo->reltarget.
*/

I don't like to separate them into 2 fields because it may make the usage
harder a
bit as well.
--
Best Regards
Andy Fan (https://www.aliyun.com/)

#7Andy Fan
zhihui.fan1213@gmail.com
In reply to: Ashutosh Bapat (#3)
Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)

On Thu, Feb 11, 2021 at 9:09 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
wrote:

Can this information be part of PathTarget structure and hence part of
RelOptInfo::reltarget, so that it can be extended to join, group and
other kinds of RelOptInfo in future?

I think you want to expand this field in a more generic way. For example:
SELECT udf(a) FROM t WHERE a is not null; In current implementation, I
only
knows a is not null, nothing about if udf(a) is null or not. And we can't
present anything
for joinrel as well since it is just attno.

At the same time, looks we can't tell if UDF(A) is null even if the UDF
is strict and
A is not null?

In fact, it might be easy to do that in this patch itself.

Actually I can't think out the method:)

On Wed, Feb 10, 2021 at 8:57 AM Andy Fan <zhihui.fan1213@gmail.com> wrote:

On Wed, Feb 10, 2021 at 11:18 AM Andy Fan <zhihui.fan1213@gmail.com>

wrote:

Hi:

This patch is the first patch in UniqueKey patch series[1], since I

need to revise

that series many times but the first one would be not that often, so

I'd like to

submit this one for review first so that I don't need to maintain it

again and again.

v1-0001-Introduce-notnullattrs-field-in-RelOptInfo-to-ind.patch

Introduce notnullattrs field in RelOptInfo to indicate which attr are

not null

in current query. The not null is judged by checking pg_attribute and

query's

restrictinfo. The info is only maintained at Base RelOptInfo and

Partition's

RelOptInfo level so far.

Any thoughts?

[1]

/messages/by-id/CAKU4AWr1BmbQB4F7j22G+NS4dNuem6dKaUf+1BK8me61uBgqqg@mail.gmail.com

--
Best Regards
Andy Fan (https://www.aliyun.com/)

Add the missed patch..

--
Best Regards
Andy Fan (https://www.aliyun.com/)

--
Best Wishes,
Ashutosh Bapat

--
Best Regards
Andy Fan (https://www.aliyun.com/)

#8Ashutosh Bapat
ashutosh.bapat.oss@gmail.com
In reply to: Tom Lane (#4)
Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)

On Thu, Feb 11, 2021 at 8:22 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> writes:

Can this information be part of PathTarget structure and hence part of
RelOptInfo::reltarget, so that it can be extended to join, group and
other kinds of RelOptInfo in future?

Why would that be better than keeping it in RelOptInfo?

regards, tom lane

We have all the expressions relevant to a given relation (simple,
join, group whatever) in Pathtarget. We could remember notnullness of
attributes of a simple relation in RelOptInfo. But IMO non/nullness of
the TLEs of a relation is more useful that attributes and thus
associate those in the PathTarget which is used to produce TLEs. That
way we could use this infra in more general ways.

--
Best Wishes,
Ashutosh Bapat

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Ashutosh Bapat (#8)
Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)

Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> writes:

On Thu, Feb 11, 2021 at 8:22 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> writes:

Can this information be part of PathTarget structure and hence part of
RelOptInfo::reltarget, so that it can be extended to join, group and
other kinds of RelOptInfo in future?

Why would that be better than keeping it in RelOptInfo?

We have all the expressions relevant to a given relation (simple,
join, group whatever) in Pathtarget. We could remember notnullness of
attributes of a simple relation in RelOptInfo. But IMO non/nullness of
the TLEs of a relation is more useful that attributes and thus
associate those in the PathTarget which is used to produce TLEs. That
way we could use this infra in more general ways.

That argument seems nearly vacuous to me, because for pretty much any
expression that isn't a base-relation Var, the answer will have to be
"don't know". Meanwhile, there are clear costs to keeping such info
in PathTarget, namely having to copy it around. Another issue with
keeping it in PathTarget is that I'm not convinced it'll be readily
available where you need it: most places that would be interested in
making such proofs are only looking at expression trees.

Now there is one angle that *might* become easier if the info were in
PathTarget, namely that it could be simpler and more reliable to mark
nullable output columns of an outer join as being nullable (even if
they came from not-null base columns). However, as I've muttered
about elsewhere, I'd prefer to deal with that can of worms by altering
the representation of the Vars themselves. Again, if you're looking
at a WHERE clause, it's not real clear how you would find a relevant
PathTarget.

regards, tom lane

#10David Rowley
dgrowleyml@gmail.com
In reply to: Andy Fan (#6)
Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)

On Fri, 12 Feb 2021 at 15:18, Andy Fan <zhihui.fan1213@gmail.com> wrote:

On Fri, Feb 12, 2021 at 9:02 AM David Rowley <dgrowleyml@gmail.com> wrote:

The reason I don't really like this is that it really depends where
you want to use RelOptInfo.notnullattrs. If someone wants to use it
to optimise something before the base quals are evaluated then they
might be unhappy that they found some NULLs.

Do you mean the notnullattrs is not set correctly before the base quals are
evaluated? I think we have lots of data structures which are set just after some
stage. but notnullattrs is special because it is set at more than 1 stage. However
I'm doubtful it is unacceptable, Some fields ever change their meaning at different
stages like Var->varno. If a user has a misunderstanding on it, it probably will find it
at the testing stage.

You're maybe focusing too much on your use case for notnullattrs. It
only cares about NULLs in the result for each query level.

.... thinks of an example...

OK, let's say I decided that COUNT(*) is faster than COUNT(id) so
decided that I might like to write a patch which rewrite the query to
use COUNT(*) when it was certain that "id" could not contain NULLs.

The query is:

SELECT p.partid, p.partdesc,COUNT(s.saleid) FROM part p LEFT OUTER
JOIN sales s ON p.partid = s.partid GROUP BY p.partid;

sale.saleid is marked as NOT NULL in pg_attribute. As the writer of
the patch, I checked the comment for notnullattrs and it says "Not
null attrs, start from -FirstLowInvalidHeapAttributeNumber", so I
should be ok to assume since sales.saleid is marked in notnullattrs
that I can rewrite the query?!

The documentation about the RelOptInfo.notnullattrs needs to be clear
what exactly it means. I'm not saying your representation of how to
record NOT NULL in incorrect. I'm saying that you need to be clear
what exactly is being recorded in that field.

If you want it to mean "attribute marked here cannot output NULL
values at this query level", then you should say something along those
lines.

However, having said that, because this is a Bitmapset of
pg_attribute.attnums, it's only possible to record Vars from base
relations. It does not seem like you have any means to record
attributes that are normally NULLable, but cannot produce NULL values
due to a strict join qual.

e.g: SELECT t.nullable FROM t INNER JOIN j ON t.nullable = j.something;

I'd expect the RelOptInfo for t not to contain a bit for the
"nullable" column, but there's no way to record the fact that the join
RelOptInfo for {t,j} cannot produce a NULL for that column. It might
be quite useful to know that for the UniqueKeys patch.

I know there's another discussion here between Ashutosh and Tom about
PathTarget's and Vars. I had the Var idea too once myself [1]/messages/by-id/14678.1401639369@sss.pgh.pa.us but it
was quickly shot down. Tom's reasoning there in [1]/messages/by-id/14678.1401639369@sss.pgh.pa.us seems legit. I
guess we'd need some sort of planner version of Var and never confuse
it with the Parse version of Var. That sounds like quite a big
project which would have quite a lot of code churn. I'm not sure how
acceptable it would be to have Var represent both these things. It
gets complex when you do equal(var1, var2) and expect that to return
true when everything matches apart from the notnull field. We
currently have this issue with the "location" field and we even have a
special macro which just ignores those in equalfuncs.c. I imagine not
many people would like to expand that to other fields.

It would be good to agree on the correct representation for Vars that
cannot produce NULLs so that we don't shut the door on classes of
optimisation that require something other than what you need for your
case.

David

[1]: /messages/by-id/14678.1401639369@sss.pgh.pa.us

#11Andy Fan
zhihui.fan1213@gmail.com
In reply to: David Rowley (#10)
2 attachment(s)
Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)

On Tue, Feb 16, 2021 at 12:01 PM David Rowley <dgrowleyml@gmail.com> wrote:

On Fri, 12 Feb 2021 at 15:18, Andy Fan <zhihui.fan1213@gmail.com> wrote:

On Fri, Feb 12, 2021 at 9:02 AM David Rowley <dgrowleyml@gmail.com>

wrote:

The reason I don't really like this is that it really depends where
you want to use RelOptInfo.notnullattrs. If someone wants to use it
to optimise something before the base quals are evaluated then they
might be unhappy that they found some NULLs.

Do you mean the notnullattrs is not set correctly before the base quals

are

evaluated? I think we have lots of data structures which are set just

after some

stage. but notnullattrs is special because it is set at more than 1

stage. However

I'm doubtful it is unacceptable, Some fields ever change their meaning

at different

stages like Var->varno. If a user has a misunderstanding on it, it

probably will find it

at the testing stage.

You're maybe focusing too much on your use case for notnullattrs. It
only cares about NULLs in the result for each query level.

.... thinks of an example...

OK, let's say I decided that COUNT(*) is faster than COUNT(id) so
decided that I might like to write a patch which rewrite the query to
use COUNT(*) when it was certain that "id" could not contain NULLs.

The query is:

SELECT p.partid, p.partdesc,COUNT(s.saleid) FROM part p LEFT OUTER
JOIN sales s ON p.partid = s.partid GROUP BY p.partid;

sale.saleid is marked as NOT NULL in pg_attribute. As the writer of
the patch, I checked the comment for notnullattrs and it says "Not
null attrs, start from -FirstLowInvalidHeapAttributeNumber", so I
should be ok to assume since sales.saleid is marked in notnullattrs
that I can rewrite the query?!

The documentation about the RelOptInfo.notnullattrs needs to be clear
what exactly it means. I'm not saying your representation of how to
record NOT NULL in incorrect. I'm saying that you need to be clear
what exactly is being recorded in that field.

If you want it to mean "attribute marked here cannot output NULL
values at this query level", then you should say something along those
lines.

I think I get what you mean. You are saying notnullattrs is only correct
at the *current* stage, namely set_rel_size. It would not be true after
that, but the data is still there. That would cause some confusion. I
admit
that is something I didn't realize before. I checked other fields of
RelOptInfo,
looks no one filed works like this, so I am not really happy with this
design
now. I'm OK with saying more things along these lines. That can be done
we all understand each other well. Any better design is welcome as well.
I think the "Var represents null stuff" is good, until I see your comments
below.

However, having said that, because this is a Bitmapset of
pg_attribute.attnums, it's only possible to record Vars from base
relations. It does not seem like you have any means to record
attributes that are normally NULLable, but cannot produce NULL values
due to a strict join qual.

e.g: SELECT t.nullable FROM t INNER JOIN j ON t.nullable = j.something;

I'd expect the RelOptInfo for t not to contain a bit for the
"nullable" column, but there's no way to record the fact that the join
RelOptInfo for {t,j} cannot produce a NULL for that column. It might
be quite useful to know that for the UniqueKeys patch.

The current patch can detect t.nullable is not null correctly. That
is done by find_nonnullable_vars(qual) and deconstruct_recure stage.

I know there's another discussion here between Ashutosh and Tom about
PathTarget's and Vars. I had the Var idea too once myself [1] but it
was quickly shot down. Tom's reasoning there in [1] seems legit. I
guess we'd need some sort of planner version of Var and never confuse
it with the Parse version of Var. That sounds like quite a big
project which would have quite a lot of code churn. I'm not sure how
acceptable it would be to have Var represent both these things. It
gets complex when you do equal(var1, var2) and expect that to return
true when everything matches apart from the notnull field. We
currently have this issue with the "location" field and we even have a
special macro which just ignores those in equalfuncs.c. I imagine not
many people would like to expand that to other fields.

Thanks for sharing this.

It would be good to agree on the correct representation for Vars that
cannot produce NULLs so that we don't shut the door on classes of
optimisation that require something other than what you need for your
case.

Agreed. The simplest way is just adding some comments. If go a
step further, how about just reset the notnullattrs when it is nullable
later like outer join? I have added this logic in the attached patch.
(comment for the notnullattrs is still not touched). I think we only
need to handle this in build_join_rel stage. With the v2 commit 2,
notnullattrs might be unset too early, but if the value is there, then
it is correct.

--
Best Regards
Andy Fan (https://www.aliyun.com/)

Attachments:

v2-0001-Introduce-notnullattrs-field-in-RelOptInfo-to-ind.patchapplication/octet-stream; name=v2-0001-Introduce-notnullattrs-field-in-RelOptInfo-to-ind.patchDownload
From 30fd21df7361edeaea2f6078ad5139a53b14340d Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E4=B8=80=E6=8C=83?= <yizhi.fzh@alibaba-inc.com>
Date: Wed, 10 Feb 2021 10:58:06 +0800
Subject: [PATCH v2 1/2] Introduce notnullattrs field in RelOptInfo to indicate
 which attr

are not null in current query. The not null is judged by checking
pg_attribute and query's restrictinfo. The info is only maintained at
Base RelOptInfo and Partition's RelOptInfo level so far.
---
 src/backend/optimizer/path/allpaths.c  | 31 ++++++++++++++++++++++++++
 src/backend/optimizer/plan/initsplan.c | 10 +++++++++
 src/backend/optimizer/util/plancat.c   | 10 +++++++++
 src/include/nodes/pathnodes.h          |  2 ++
 4 files changed, 53 insertions(+)

diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index cd3fdd259c..709d2f82ca 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -998,6 +998,7 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
 		RelOptInfo *childrel;
 		ListCell   *parentvars;
 		ListCell   *childvars;
+		int i = -1;
 
 		/* append_rel_list contains all append rels; ignore others */
 		if (appinfo->parent_relid != parentRTindex)
@@ -1054,6 +1055,36 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
 								   (Node *) rel->reltarget->exprs,
 								   1, &appinfo);
 
+		/* Copy notnullattrs. */
+		while ((i = bms_next_member(rel->notnullattrs, i)) > 0)
+		{
+			AttrNumber attno = i + FirstLowInvalidHeapAttributeNumber;
+			AttrNumber child_attno;
+			if (attno == 0)
+			{
+				/* Whole row is not null, so must be same for child */
+				childrel->notnullattrs = bms_add_member(childrel->notnullattrs,
+														attno - FirstLowInvalidHeapAttributeNumber);
+				break;
+			}
+			if (attno < 0 )
+				/* no need to translate system column */
+				child_attno = attno;
+			else
+			{
+				Node * node = list_nth(appinfo->translated_vars, attno - 1);
+				if (!IsA(node, Var))
+					/* This may happens at UNION case, like (SELECT a FROM t1 UNION SELECT a + 3
+					 * FROM t2) t and we know t.a is not null
+					 */
+					continue;
+				child_attno = castNode(Var, node)->varattno;
+			}
+
+			childrel->notnullattrs = bms_add_member(childrel->notnullattrs,
+													child_attno - FirstLowInvalidHeapAttributeNumber);
+		}
+
 		/*
 		 * We have to make child entries in the EquivalenceClass data
 		 * structures as well.  This is needed either if the parent
diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index 02f813cebd..d27167dc76 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -829,6 +829,16 @@ deconstruct_recurse(PlannerInfo *root, Node *jtnode, bool below_outer_join,
 		{
 			Node	   *qual = (Node *) lfirst(l);
 
+			/* Set the not null info now */
+			ListCell	*lc;
+			List		*non_nullable_vars = find_nonnullable_vars(qual);
+			foreach(lc, non_nullable_vars)
+			{
+				Var *var = lfirst_node(Var, lc);
+				RelOptInfo *rel = root->simple_rel_array[var->varno];
+				rel->notnullattrs = bms_add_member(rel->notnullattrs,
+												   var->varattno - FirstLowInvalidHeapAttributeNumber);
+			}
 			distribute_qual_to_rels(root, qual,
 									below_outer_join, JOIN_INNER,
 									root->qual_security_level,
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 177e6e336a..0c5a79f296 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -117,6 +117,7 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 	Relation	relation;
 	bool		hasindex;
 	List	   *indexinfos = NIL;
+	int			i;
 
 	/*
 	 * We need not lock the relation since it was already locked, either by
@@ -474,6 +475,15 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 	if (inhparent && relation->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
 		set_relation_partition_info(root, rel, relation);
 
+	Assert(rel->notnullattrs == NULL);
+	for(i = 0; i < relation->rd_att->natts; i++)
+	{
+		FormData_pg_attribute attr = relation->rd_att->attrs[i];
+		if (attr.attnotnull)
+			rel->notnullattrs = bms_add_member(rel->notnullattrs,
+											   attr.attnum - FirstLowInvalidHeapAttributeNumber);
+	}
+
 	table_close(relation, NoLock);
 
 	/*
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 0ec93e648c..2d81f6216d 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -710,6 +710,8 @@ typedef struct RelOptInfo
 	PlannerInfo *subroot;		/* if subquery */
 	List	   *subplan_params; /* if subquery */
 	int			rel_parallel_workers;	/* wanted number of parallel workers */
+	/* Not null attrs, start from -FirstLowInvalidHeapAttributeNumber */
+	Bitmapset		*notnullattrs;
 
 	/* Information about foreign tables and foreign joins */
 	Oid			serverid;		/* identifies server for the table or join */
-- 
2.21.0

v2-0002-Reset-notnullattrs-for-outer-join.patchapplication/octet-stream; name=v2-0002-Reset-notnullattrs-for-outer-join.patchDownload
From 1c83cbe7589031440bcf38dff8e81199a507566a Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E4=B8=80=E6=8C=83?= <yizhi.fzh@alibaba-inc.com>
Date: Tue, 16 Feb 2021 21:56:08 +0800
Subject: [PATCH v2 2/2] Reset notnullattrs for outer join

---
 src/backend/optimizer/util/relnode.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 731ff708b9..c06d60a051 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -555,6 +555,19 @@ add_join_rel(PlannerInfo *root, RelOptInfo *joinrel)
 	}
 }
 
+/*
+ * unset_notnullattrs
+ */
+static void
+unset_notnullattrs(PlannerInfo *root, Relids relids)
+{
+	int i = -1;
+	while ((i = bms_next_member(relids, i)) >= 0)
+	{
+		root->simple_rel_array[i]->notnullattrs = NULL;
+	}
+}
+
 /*
  * build_join_rel
  *	  Returns relation entry corresponding to the union of two given rels,
@@ -671,6 +684,16 @@ build_join_rel(PlannerInfo *root,
 	joinrel->partexprs = NULL;
 	joinrel->nullable_partexprs = NULL;
 
+	if (sjinfo->jointype == JOIN_LEFT)
+	{
+		unset_notnullattrs(root, inner_rel->relids);
+	}
+	else if (sjinfo->jointype == JOIN_FULL)
+	{
+		unset_notnullattrs(root, inner_rel->relids);
+		unset_notnullattrs(root, outer_rel->relids);
+	}
+
 	/* Compute information relevant to the foreign relations. */
 	set_foreign_rel_properties(joinrel, outer_rel, inner_rel);
 
-- 
2.21.0

#12Andy Fan
zhihui.fan1213@gmail.com
In reply to: Andy Fan (#11)
Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)

On Tue, Feb 16, 2021 at 10:03 PM Andy Fan <zhihui.fan1213@gmail.com> wrote:

On Tue, Feb 16, 2021 at 12:01 PM David Rowley <dgrowleyml@gmail.com>
wrote:

On Fri, 12 Feb 2021 at 15:18, Andy Fan <zhihui.fan1213@gmail.com> wrote:

On Fri, Feb 12, 2021 at 9:02 AM David Rowley <dgrowleyml@gmail.com>

wrote:

The reason I don't really like this is that it really depends where
you want to use RelOptInfo.notnullattrs. If someone wants to use it
to optimise something before the base quals are evaluated then they
might be unhappy that they found some NULLs.

Do you mean the notnullattrs is not set correctly before the base quals

are

evaluated? I think we have lots of data structures which are set just

after some

stage. but notnullattrs is special because it is set at more than 1

stage. However

I'm doubtful it is unacceptable, Some fields ever change their meaning

at different

stages like Var->varno. If a user has a misunderstanding on it, it

probably will find it

at the testing stage.

You're maybe focusing too much on your use case for notnullattrs. It
only cares about NULLs in the result for each query level.

.... thinks of an example...

OK, let's say I decided that COUNT(*) is faster than COUNT(id) so
decided that I might like to write a patch which rewrite the query to
use COUNT(*) when it was certain that "id" could not contain NULLs.

The query is:

SELECT p.partid, p.partdesc,COUNT(s.saleid) FROM part p LEFT OUTER
JOIN sales s ON p.partid = s.partid GROUP BY p.partid;

sale.saleid is marked as NOT NULL in pg_attribute. As the writer of
the patch, I checked the comment for notnullattrs and it says "Not
null attrs, start from -FirstLowInvalidHeapAttributeNumber", so I
should be ok to assume since sales.saleid is marked in notnullattrs
that I can rewrite the query?!

The documentation about the RelOptInfo.notnullattrs needs to be clear
what exactly it means. I'm not saying your representation of how to
record NOT NULL in incorrect. I'm saying that you need to be clear
what exactly is being recorded in that field.

If you want it to mean "attribute marked here cannot output NULL
values at this query level", then you should say something along those
lines.

I think I get what you mean. You are saying notnullattrs is only correct
at the *current* stage, namely set_rel_size. It would not be true after
that, but the data is still there. That would cause some confusion. I
admit
that is something I didn't realize before. I checked other fields of
RelOptInfo,
looks no one filed works like this, so I am not really happy with this
design
now. I'm OK with saying more things along these lines. That can be done
we all understand each other well. Any better design is welcome as well.
I think the "Var represents null stuff" is good, until I see your
comments below.

However, having said that, because this is a Bitmapset of
pg_attribute.attnums, it's only possible to record Vars from base
relations. It does not seem like you have any means to record
attributes that are normally NULLable, but cannot produce NULL values
due to a strict join qual.

e.g: SELECT t.nullable FROM t INNER JOIN j ON t.nullable = j.something;

I'd expect the RelOptInfo for t not to contain a bit for the
"nullable" column, but there's no way to record the fact that the join
RelOptInfo for {t,j} cannot produce a NULL for that column. It might
be quite useful to know that for the UniqueKeys patch.

The current patch can detect t.nullable is not null correctly. That
is done by find_nonnullable_vars(qual) and deconstruct_recure stage.

I know there's another discussion here between Ashutosh and Tom about
PathTarget's and Vars. I had the Var idea too once myself [1] but it
was quickly shot down. Tom's reasoning there in [1] seems legit. I
guess we'd need some sort of planner version of Var and never confuse
it with the Parse version of Var. That sounds like quite a big
project which would have quite a lot of code churn. I'm not sure how
acceptable it would be to have Var represent both these things. It
gets complex when you do equal(var1, var2) and expect that to return
true when everything matches apart from the notnull field. We
currently have this issue with the "location" field and we even have a
special macro which just ignores those in equalfuncs.c. I imagine not
many people would like to expand that to other fields.

Thanks for sharing this.

It would be good to agree on the correct representation for Vars that
cannot produce NULLs so that we don't shut the door on classes of
optimisation that require something other than what you need for your
case.

Agreed. The simplest way is just adding some comments. If go a
step further, how about just reset the notnullattrs when it is nullable
later like outer join? I have added this logic in the attached patch.
(comment for the notnullattrs is still not touched). I think we only
need to handle this in build_join_rel stage.

..

With the v2 commit 2,
notnullattrs might be unset too early, but if the value is there, then
it is correct.

This looks bad as well. How about adding an extra field in RelOptInfo for
the
outer join case. For example:

@@ -710,8 +710,14 @@ typedef struct RelOptInfo
        PlannerInfo *subroot;           /* if subquery */
        List       *subplan_params; /* if subquery */
        int                     rel_parallel_workers;   /* wanted number of
parallel workers */
-       /* Not null attrs, start from -FirstLowInvalidHeapAttributeNumber */
+       /*
+        * Not null attrs, start from -FirstLowInvalidHeapAttributeNumber.
The nullness
+        * might be changed after outer join, So we need to consult with
leftouter_relids
+        * before using it.
+        */
        Bitmapset               *notnullattrs;
+       /* A list of Relids which will be a outer rel when join with this
relation. */
+       List    *leftouter_relids;

/* Information about foreign tables and foreign joins */
Oid serverid; /* identifies
server for the table or join */

leftout_relids should be able to be filled with root->join_info_list. If
we go with this
direction, not sure leftouter_relids should be a List or not since I even
can't think
out a query which can have more than one relids for a relation.

--
Best Regards
Andy Fan (https://www.aliyun.com/)

#13Andy Fan
zhihui.fan1213@gmail.com
In reply to: David Rowley (#10)
4 attachment(s)
Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)

On Tue, Feb 16, 2021 at 12:01 PM David Rowley <dgrowleyml@gmail.com> wrote:

On Fri, 12 Feb 2021 at 15:18, Andy Fan <zhihui.fan1213@gmail.com> wrote:

On Fri, Feb 12, 2021 at 9:02 AM David Rowley <dgrowleyml@gmail.com>

wrote:

The reason I don't really like this is that it really depends where
you want to use RelOptInfo.notnullattrs. If someone wants to use it
to optimise something before the base quals are evaluated then they
might be unhappy that they found some NULLs.

Do you mean the notnullattrs is not set correctly before the base quals

are

evaluated? I think we have lots of data structures which are set just

after some

stage. but notnullattrs is special because it is set at more than 1

stage. However

I'm doubtful it is unacceptable, Some fields ever change their meaning

at different

stages like Var->varno. If a user has a misunderstanding on it, it

probably will find it

at the testing stage.

You're maybe focusing too much on your use case for notnullattrs. It
only cares about NULLs in the result for each query level.

.... thinks of an example...

OK, let's say I decided that COUNT(*) is faster than COUNT(id) so
decided that I might like to write a patch which rewrite the query to
use COUNT(*) when it was certain that "id" could not contain NULLs.

The query is:

SELECT p.partid, p.partdesc,COUNT(s.saleid) FROM part p LEFT OUTER
JOIN sales s ON p.partid = s.partid GROUP BY p.partid;

sale.saleid is marked as NOT NULL in pg_attribute. As the writer of
the patch, I checked the comment for notnullattrs and it says "Not
null attrs, start from -FirstLowInvalidHeapAttributeNumber", so I
should be ok to assume since sales.saleid is marked in notnullattrs
that I can rewrite the query?!

The documentation about the RelOptInfo.notnullattrs needs to be clear
what exactly it means. I'm not saying your representation of how to
record NOT NULL in incorrect. I'm saying that you need to be clear
what exactly is being recorded in that field.

If you want it to mean "attribute marked here cannot output NULL
values at this query level", then you should say something along those
lines.

However, having said that, because this is a Bitmapset of
pg_attribute.attnums, it's only possible to record Vars from base
relations. It does not seem like you have any means to record
attributes that are normally NULLable, but cannot produce NULL values
due to a strict join qual.

e.g: SELECT t.nullable FROM t INNER JOIN j ON t.nullable = j.something;

I'd expect the RelOptInfo for t not to contain a bit for the
"nullable" column, but there's no way to record the fact that the join
RelOptInfo for {t,j} cannot produce a NULL for that column. It might
be quite useful to know that for the UniqueKeys patch.

I checked again and found I do miss the check on JoinExpr->quals. I have
fixed it in v3 patch. Thanks for the review!

In the attached v3, commit 1 is the real patch, and commit 2 is just add
some logs to help local testing. notnull.sql/notnull.out is the test case
for
this patch, both commit 2 and notnull.* are not intended to be committed
at last.

Besides the above fix in v3, I changed the comments alongs the notnullattrs
as below and added a true positive helper function is_var_nullable.

+       /*
+        * Not null attrs, the values are calculated by looking into
pg_attribute and quals
+        * However both cases are not reliable in some outer join cases. So
when
+        * we want to check if a Var is nullable, function is_var_nullable
is a good
+        * place to start with, which is true positive.
+        */
+       Bitmapset               *notnullattrs;

I also found separating the two meanings is unnecessary since both of them
are not reliable in the outer join case and we are just wanting which attr
is
nullable, no match how we know it. The below example shows why
not-null-by-qual
is not readable as well (at least with current implementation)

create table n1(a int, b int not null);
create table n2(a int, b int not null);
create table n3(a int, b int not null);

select * from n1 left join n2 on n1.a = n2.a full join n3 on n2.a = n3.a;

In this case, when we check (n1 left join n2 on n1.a = n2.a) , we know
n1.a is not nullable. However after full join with n3, it changed.

I know there's another discussion here between Ashutosh and Tom about
PathTarget's and Vars. I had the Var idea too once myself [1] but it
was quickly shot down. Tom's reasoning there in [1] seems legit. I
guess we'd need some sort of planner version of Var and never confuse
it with the Parse version of Var. That sounds like quite a big
project which would have quite a lot of code churn. I'm not sure how
acceptable it would be to have Var represent both these things. It
gets complex when you do equal(var1, var2) and expect that to return
true when everything matches apart from the notnull field. We
currently have this issue with the "location" field and we even have a
special macro which just ignores those in equalfuncs.c. I imagine not
many people would like to expand that to other fields.

It would be good to agree on the correct representation for Vars that
cannot produce NULLs so that we don't shut the door on classes of
optimisation that require something other than what you need for your
case.

David

[1]
/messages/by-id/14678.1401639369@sss.pgh.pa.us

--
Best Regards
Andy Fan (https://www.aliyun.com/)

Attachments:

v3-0001-Introduce-notnullattrs-field-in-RelOptInfo-to-ind.patchapplication/octet-stream; name=v3-0001-Introduce-notnullattrs-field-in-RelOptInfo-to-ind.patchDownload
From 0e0ab487745bd5f4896f7c3c111eb9ee8f2197d8 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E4=B8=80=E6=8C=83?= <yizhi.fzh@alibaba-inc.com>
Date: Wed, 10 Feb 2021 10:58:06 +0800
Subject: [PATCH v3 1/2] Introduce notnullattrs field in RelOptInfo to indicate
 which attr

are not null in current query. The not null is judged by checking
pg_attribute and query's restrictinfo. The info is only maintained at
Base RelOptInfo level.
---
 src/backend/optimizer/path/allpaths.c  | 31 ++++++++++++++++++++++++++
 src/backend/optimizer/plan/initsplan.c | 23 ++++++++++++++++++-
 src/backend/optimizer/util/plancat.c   | 10 +++++++++
 src/backend/optimizer/util/var.c       | 14 ++++++++++++
 src/include/nodes/pathnodes.h          |  7 ++++++
 src/include/optimizer/optimizer.h      |  2 ++
 6 files changed, 86 insertions(+), 1 deletion(-)

diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index cd3fdd259c..709d2f82ca 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -998,6 +998,7 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
 		RelOptInfo *childrel;
 		ListCell   *parentvars;
 		ListCell   *childvars;
+		int i = -1;
 
 		/* append_rel_list contains all append rels; ignore others */
 		if (appinfo->parent_relid != parentRTindex)
@@ -1054,6 +1055,36 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
 								   (Node *) rel->reltarget->exprs,
 								   1, &appinfo);
 
+		/* Copy notnullattrs. */
+		while ((i = bms_next_member(rel->notnullattrs, i)) > 0)
+		{
+			AttrNumber attno = i + FirstLowInvalidHeapAttributeNumber;
+			AttrNumber child_attno;
+			if (attno == 0)
+			{
+				/* Whole row is not null, so must be same for child */
+				childrel->notnullattrs = bms_add_member(childrel->notnullattrs,
+														attno - FirstLowInvalidHeapAttributeNumber);
+				break;
+			}
+			if (attno < 0 )
+				/* no need to translate system column */
+				child_attno = attno;
+			else
+			{
+				Node * node = list_nth(appinfo->translated_vars, attno - 1);
+				if (!IsA(node, Var))
+					/* This may happens at UNION case, like (SELECT a FROM t1 UNION SELECT a + 3
+					 * FROM t2) t and we know t.a is not null
+					 */
+					continue;
+				child_attno = castNode(Var, node)->varattno;
+			}
+
+			childrel->notnullattrs = bms_add_member(childrel->notnullattrs,
+													child_attno - FirstLowInvalidHeapAttributeNumber);
+		}
+
 		/*
 		 * We have to make child entries in the EquivalenceClass data
 		 * structures as well.  This is needed either if the parent
diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index 02f813cebd..a929c25d41 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -706,6 +706,25 @@ deconstruct_jointree(PlannerInfo *root)
 	return result;
 }
 
+
+/*
+ * extract_notnull_attrs
+ */
+static void
+extract_notnull_attrs(PlannerInfo *root, Node *qual)
+{
+	/* Set the not null info now */
+	ListCell	*lc;
+	List		*non_nullable_vars = find_nonnullable_vars(qual);
+	foreach(lc, non_nullable_vars)
+	{
+		Var *var = lfirst_node(Var, lc);
+		RelOptInfo *rel = root->simple_rel_array[var->varno];
+		rel->notnullattrs = bms_add_member(rel->notnullattrs,
+										   var->varattno - FirstLowInvalidHeapAttributeNumber);
+	}
+}
+
 /*
  * deconstruct_recurse
  *	  One recursion level of deconstruct_jointree processing.
@@ -828,13 +847,13 @@ deconstruct_recurse(PlannerInfo *root, Node *jtnode, bool below_outer_join,
 		foreach(l, (List *) f->quals)
 		{
 			Node	   *qual = (Node *) lfirst(l);
-
 			distribute_qual_to_rels(root, qual,
 									below_outer_join, JOIN_INNER,
 									root->qual_security_level,
 									*qualscope, NULL, NULL,
 									postponed_qual_list);
 		}
+		extract_notnull_attrs(root, f->quals);
 	}
 	else if (IsA(jtnode, JoinExpr))
 	{
@@ -948,6 +967,8 @@ deconstruct_recurse(PlannerInfo *root, Node *jtnode, bool below_outer_join,
 		root->nullable_baserels = bms_add_members(root->nullable_baserels,
 												  nullable_rels);
 
+		extract_notnull_attrs(root, j->quals);
+
 		/*
 		 * Try to process any quals postponed by children.  If they need
 		 * further postponement, add them to my output postponed_qual_list.
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 177e6e336a..0c5a79f296 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -117,6 +117,7 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 	Relation	relation;
 	bool		hasindex;
 	List	   *indexinfos = NIL;
+	int			i;
 
 	/*
 	 * We need not lock the relation since it was already locked, either by
@@ -474,6 +475,15 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 	if (inhparent && relation->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
 		set_relation_partition_info(root, rel, relation);
 
+	Assert(rel->notnullattrs == NULL);
+	for(i = 0; i < relation->rd_att->natts; i++)
+	{
+		FormData_pg_attribute attr = relation->rd_att->attrs[i];
+		if (attr.attnotnull)
+			rel->notnullattrs = bms_add_member(rel->notnullattrs,
+											   attr.attnum - FirstLowInvalidHeapAttributeNumber);
+	}
+
 	table_close(relation, NoLock);
 
 	/*
diff --git a/src/backend/optimizer/util/var.c b/src/backend/optimizer/util/var.c
index e307d6fbb0..f2551713cd 100644
--- a/src/backend/optimizer/util/var.c
+++ b/src/backend/optimizer/util/var.c
@@ -867,3 +867,17 @@ alias_relid_set(Query *query, Relids relids)
 	}
 	return result;
 }
+
+
+/*
+ * is_var_nullable
+ */
+bool
+is_var_nullable(PlannerInfo *root, Var *var)
+{
+	RelOptInfo *rel = root->simple_rel_array[var->varno];
+
+	return !bms_is_member(var->varattno - FirstLowInvalidHeapAttributeNumber,
+						  rel->notnullattrs) ||
+		bms_is_member(var->varno, root->nullable_baserels);
+}
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 0ec93e648c..e392eb1a7e 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -710,6 +710,13 @@ typedef struct RelOptInfo
 	PlannerInfo *subroot;		/* if subquery */
 	List	   *subplan_params; /* if subquery */
 	int			rel_parallel_workers;	/* wanted number of parallel workers */
+	/*
+	 * Not null attrs, the value are calculated by looking it pg_attribute and quals
+	 * However both cases are not reliable in some outer join case. So when
+	 * we want to check if a Var is nullable, function is_var_nullable is a good
+	 * place to start with, which is true positive.
+	 */
+	Bitmapset		*notnullattrs;
 
 	/* Information about foreign tables and foreign joins */
 	Oid			serverid;		/* identifies server for the table or join */
diff --git a/src/include/optimizer/optimizer.h b/src/include/optimizer/optimizer.h
index d587952b7d..a94b74947b 100644
--- a/src/include/optimizer/optimizer.h
+++ b/src/include/optimizer/optimizer.h
@@ -23,6 +23,7 @@
 #define OPTIMIZER_H
 
 #include "nodes/parsenodes.h"
+#include "nodes/bitmapset.h"
 
 /* Test if an expression node represents a SRF call.  Beware multiple eval! */
 #define IS_SRF_CALL(node) \
@@ -198,5 +199,6 @@ extern bool contain_vars_of_level(Node *node, int levelsup);
 extern int	locate_var_of_level(Node *node, int levelsup);
 extern List *pull_var_clause(Node *node, int flags);
 extern Node *flatten_join_alias_vars(Query *query, Node *node);
+extern bool is_var_nullable(PlannerInfo *root, Var *var);
 
 #endif							/* OPTIMIZER_H */
-- 
2.21.0

v3-0002-Some-hack-for-local-testing-only.patchapplication/octet-stream; name=v3-0002-Some-hack-for-local-testing-only.patchDownload
From 8252d9085e6b1087cfb7c86f079a8c8b9cdcf44d Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E4=B8=80=E6=8C=83?= <yizhi.fzh@alibaba-inc.com>
Date: Thu, 18 Feb 2021 20:36:39 +0800
Subject: [PATCH v3 2/2] Some hack for local testing only.

---
 src/backend/optimizer/path/allpaths.c | 40 +++++++++++++++++++++++++++
 1 file changed, 40 insertions(+)

diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 709d2f82ca..f01d1bc33f 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -176,6 +176,25 @@ make_one_rel(PlannerInfo *root, List *joinlist)
 		root->all_baserels = bms_add_member(root->all_baserels, brel->relid);
 	}
 
+	{
+		ListCell	*lc;
+		foreach(lc, root->processed_tlist)
+		{
+			TargetEntry *tle = lfirst_node(TargetEntry, lc);
+			if (IsA(tle->expr, Var))
+			{
+				Var *var = (Var *) tle->expr;
+				// (void)var;
+				elog(INFO, "rt=%d attno=%d nullable: %d",
+					 var->varno,
+					 var->varattno,
+					 is_var_nullable(root, var)
+					);
+			}
+		}
+	}
+
+
 	/* Mark base rels as to whether we care about fast-start plans */
 	set_base_rel_consider_startup(root);
 
@@ -1083,6 +1102,27 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
 
 			childrel->notnullattrs = bms_add_member(childrel->notnullattrs,
 													child_attno - FirstLowInvalidHeapAttributeNumber);
+
+			{
+				ListCell	*lc;
+				foreach(lc, childrel->reltarget->exprs)
+				{
+					if (IsA(lfirst(lc), Var))
+					{
+						Var *var = (Var *)lfirst(lc);
+
+						char *colname = strVal(
+							lfirst(list_nth_cell(root->simple_rte_array[var->varno]->eref->colnames, var->varattno - 1)));
+						elog(INFO, "parentRT: %d childRT: %d colname: %s varattno: %d, nullable: %d",
+							 rel->relid,
+							 childrel->relid,
+							 colname,
+							 var->varattno,
+							 is_var_nullable(root, var)
+							);
+					}
+				}
+			}
 		}
 
 		/*
-- 
2.21.0

notnull.sqlapplication/octet-stream; name=notnull.sqlDownload
notnull.outapplication/octet-stream; name=notnull.outDownload
#14Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Andy Fan (#13)
Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)

On Thu, Feb 18, 2021 at 08:58:13PM +0800, Andy Fan wrote:

Thanks for continuing work on this patch!

On Tue, Feb 16, 2021 at 12:01 PM David Rowley <dgrowleyml@gmail.com> wrote:

On Fri, 12 Feb 2021 at 15:18, Andy Fan <zhihui.fan1213@gmail.com> wrote:

On Fri, Feb 12, 2021 at 9:02 AM David Rowley <dgrowleyml@gmail.com>

wrote:

The reason I don't really like this is that it really depends where
you want to use RelOptInfo.notnullattrs. If someone wants to use it
to optimise something before the base quals are evaluated then they
might be unhappy that they found some NULLs.

Do you mean the notnullattrs is not set correctly before the base quals

are

evaluated? I think we have lots of data structures which are set just

after some

stage. but notnullattrs is special because it is set at more than 1

stage. However

I'm doubtful it is unacceptable, Some fields ever change their meaning

at different

stages like Var->varno. If a user has a misunderstanding on it, it

probably will find it

at the testing stage.

You're maybe focusing too much on your use case for notnullattrs. It
only cares about NULLs in the result for each query level.

.... thinks of an example...

OK, let's say I decided that COUNT(*) is faster than COUNT(id) so
decided that I might like to write a patch which rewrite the query to
use COUNT(*) when it was certain that "id" could not contain NULLs.

The query is:

SELECT p.partid, p.partdesc,COUNT(s.saleid) FROM part p LEFT OUTER
JOIN sales s ON p.partid = s.partid GROUP BY p.partid;

sale.saleid is marked as NOT NULL in pg_attribute. As the writer of
the patch, I checked the comment for notnullattrs and it says "Not
null attrs, start from -FirstLowInvalidHeapAttributeNumber", so I
should be ok to assume since sales.saleid is marked in notnullattrs
that I can rewrite the query?!

The documentation about the RelOptInfo.notnullattrs needs to be clear
what exactly it means. I'm not saying your representation of how to
record NOT NULL in incorrect. I'm saying that you need to be clear
what exactly is being recorded in that field.

If you want it to mean "attribute marked here cannot output NULL
values at this query level", then you should say something along those
lines.

However, having said that, because this is a Bitmapset of
pg_attribute.attnums, it's only possible to record Vars from base
relations. It does not seem like you have any means to record
attributes that are normally NULLable, but cannot produce NULL values
due to a strict join qual.

e.g: SELECT t.nullable FROM t INNER JOIN j ON t.nullable = j.something;

I'd expect the RelOptInfo for t not to contain a bit for the
"nullable" column, but there's no way to record the fact that the join
RelOptInfo for {t,j} cannot produce a NULL for that column. It might
be quite useful to know that for the UniqueKeys patch.

I checked again and found I do miss the check on JoinExpr->quals. I have
fixed it in v3 patch. Thanks for the review!

In the attached v3, commit 1 is the real patch, and commit 2 is just add
some logs to help local testing. notnull.sql/notnull.out is the test case
for
this patch, both commit 2 and notnull.* are not intended to be committed
at last.

Just to clarify, this version of notnullattrs here is the latest one,
and another one from "UniqueKey on Partitioned table" thread should be
disregarded?

Besides the above fix in v3, I changed the comments alongs the notnullattrs
as below and added a true positive helper function is_var_nullable.

With "true positive" you mean it will always correctly say if a Var is
nullable or not? I'm not sure about this, but couldn't be there still
some cases when a Var belongs to nullable_baserels, but still has some
constraints preventing it from being nullable (e.g. a silly example when
the not nullable column belong to the table, and the query does full
join of this table on itself using this column)?

Is this function necessary for the following patches? I've got an
impression that the discussion in this thread was mostly evolving about
correct description when notnullattrs could be used, not making it
bullet proof.

Bitmapset *notnullattrs;

It looks like RelOptInfo has its own out function _outRelOptInfo,
probably the notnullattrs should be also present there as BITMAPSET_FIELD?

As a side note, I've attached those two new threads to CF item [1]https://commitfest.postgresql.org/32/2433/,
hopefully it's correct.

[1]: https://commitfest.postgresql.org/32/2433/

#15Andy Fan
zhihui.fan1213@gmail.com
In reply to: Dmitry Dolgov (#14)
Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)

On Fri, Mar 5, 2021 at 12:00 AM Dmitry Dolgov <9erthalion6@gmail.com> wrote:

On Thu, Feb 18, 2021 at 08:58:13PM +0800, Andy Fan wrote:

Thanks for continuing work on this patch!

On Tue, Feb 16, 2021 at 12:01 PM David Rowley <dgrowleyml@gmail.com>

wrote:

On Fri, 12 Feb 2021 at 15:18, Andy Fan <zhihui.fan1213@gmail.com>

wrote:

On Fri, Feb 12, 2021 at 9:02 AM David Rowley <dgrowleyml@gmail.com>

wrote:

The reason I don't really like this is that it really depends where
you want to use RelOptInfo.notnullattrs. If someone wants to use it
to optimise something before the base quals are evaluated then they
might be unhappy that they found some NULLs.

Do you mean the notnullattrs is not set correctly before the base

quals

are

evaluated? I think we have lots of data structures which are set

just

after some

stage. but notnullattrs is special because it is set at more than 1

stage. However

I'm doubtful it is unacceptable, Some fields ever change their

meaning

at different

stages like Var->varno. If a user has a misunderstanding on it, it

probably will find it

at the testing stage.

You're maybe focusing too much on your use case for notnullattrs. It
only cares about NULLs in the result for each query level.

.... thinks of an example...

OK, let's say I decided that COUNT(*) is faster than COUNT(id) so
decided that I might like to write a patch which rewrite the query to
use COUNT(*) when it was certain that "id" could not contain NULLs.

The query is:

SELECT p.partid, p.partdesc,COUNT(s.saleid) FROM part p LEFT OUTER
JOIN sales s ON p.partid = s.partid GROUP BY p.partid;

sale.saleid is marked as NOT NULL in pg_attribute. As the writer of
the patch, I checked the comment for notnullattrs and it says "Not
null attrs, start from -FirstLowInvalidHeapAttributeNumber", so I
should be ok to assume since sales.saleid is marked in notnullattrs
that I can rewrite the query?!

The documentation about the RelOptInfo.notnullattrs needs to be clear
what exactly it means. I'm not saying your representation of how to
record NOT NULL in incorrect. I'm saying that you need to be clear
what exactly is being recorded in that field.

If you want it to mean "attribute marked here cannot output NULL
values at this query level", then you should say something along those
lines.

However, having said that, because this is a Bitmapset of
pg_attribute.attnums, it's only possible to record Vars from base
relations. It does not seem like you have any means to record
attributes that are normally NULLable, but cannot produce NULL values
due to a strict join qual.

e.g: SELECT t.nullable FROM t INNER JOIN j ON t.nullable = j.something;

I'd expect the RelOptInfo for t not to contain a bit for the
"nullable" column, but there's no way to record the fact that the join
RelOptInfo for {t,j} cannot produce a NULL for that column. It might
be quite useful to know that for the UniqueKeys patch.

I checked again and found I do miss the check on JoinExpr->quals. I have
fixed it in v3 patch. Thanks for the review!

In the attached v3, commit 1 is the real patch, and commit 2 is just add
some logs to help local testing. notnull.sql/notnull.out is the test

case

for
this patch, both commit 2 and notnull.* are not intended to be committed
at last.

Just to clarify, this version of notnullattrs here is the latest one,
and another one from "UniqueKey on Partitioned table" thread should be
disregarded?

Actually they are different sections for UniqueKey. Since I don't want to
mess
two topics in one place, I open another thread. The topic here is how to
represent
a not null attribute, which is a precondition for all UniqueKey stuff. The
thread
" UniqueKey on Partitioned table[1]/messages/by-id/CAKU4AWrU35c9g3cE15JmVwh6B2Hzf4hf7cZUkRsiktv7AKR3Ag@mail.gmail.com " is talking about how to maintain the
UniqueKey on a partitioned table only.

Besides the above fix in v3, I changed the comments alongs the

notnullattrs

as below and added a true positive helper function is_var_nullable.

With "true positive" you mean it will always correctly say if a Var is
nullable or not?

not null. If I say it is not null (return value is false), it is not null
for sure. If
it is nullable (true), it may be still not null for some stages. But I
don't want
to distinguish them too much, so I just say it is nullable.

I'm not sure about this, but couldn't be there still
some cases when a Var belongs to nullable_baserels, but still has some
constraints preventing it from being nullable (e.g. a silly example when
the not nullable column belong to the table, and the query does full
join of this table on itself using this column)?

Do you say something like "SELECT * FROM t1 left join t2 on t1.a = t2.a

WHERE
t2.b = 3; "? In this case, the outer join will be reduced to inner join
at
reduce_outer_join stage, which means t2 will not be shown in
nullable_baserels.

Is this function necessary for the following patches? I've got an
impression that the discussion in this thread was mostly evolving about
correct description when notnullattrs could be used, not making it
bullet proof.

Exactly, that is the blocker issue right now. I hope more authorities can
give
some suggestions to move on.

Bitmapset *notnullattrs;

It looks like RelOptInfo has its own out function _outRelOptInfo,
probably the notnullattrs should be also present there as BITMAPSET_FIELD?

Yes, it should be added.

As a side note, I've attached those two new threads to CF item [1],
hopefully it's correct.

[1]: https://commitfest.postgresql.org/32/2433/

Thanks for doing that. It is correct.

[1]: /messages/by-id/CAKU4AWrU35c9g3cE15JmVwh6B2Hzf4hf7cZUkRsiktv7AKR3Ag@mail.gmail.com
/messages/by-id/CAKU4AWrU35c9g3cE15JmVwh6B2Hzf4hf7cZUkRsiktv7AKR3Ag@mail.gmail.com

--
Best Regards
Andy Fan (https://www.aliyun.com/)

#16Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Andy Fan (#15)
Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)

On Fri, Mar 05, 2021 at 10:22:45AM +0800, Andy Fan wrote:

I checked again and found I do miss the check on JoinExpr->quals. I have
fixed it in v3 patch. Thanks for the review!

In the attached v3, commit 1 is the real patch, and commit 2 is just add
some logs to help local testing. notnull.sql/notnull.out is the test

case

for
this patch, both commit 2 and notnull.* are not intended to be committed
at last.

Just to clarify, this version of notnullattrs here is the latest one,
and another one from "UniqueKey on Partitioned table" thread should be
disregarded?

Actually they are different sections for UniqueKey. Since I don't want to
mess
two topics in one place, I open another thread. The topic here is how to
represent
a not null attribute, which is a precondition for all UniqueKey stuff. The
thread
" UniqueKey on Partitioned table[1] " is talking about how to maintain the
UniqueKey on a partitioned table only.

Sure, those two threads are addressing different topics. But [1] also
includes the patch for notnullattrs (I guess it's the same as one of the
older versions from this thread), so it would be good to specify which
one should be used to avoid any confusion.

I'm not sure about this, but couldn't be there still
some cases when a Var belongs to nullable_baserels, but still has some
constraints preventing it from being nullable (e.g. a silly example when
the not nullable column belong to the table, and the query does full
join of this table on itself using this column)?

Do you say something like "SELECT * FROM t1 left join t2 on t1.a = t2.a

WHERE
t2.b = 3; "? In this case, the outer join will be reduced to inner join
at
reduce_outer_join stage, which means t2 will not be shown in
nullable_baserels.

Nope, as I said it's a bit useless example of full self join t1 on
itself. In this case not null column "a" will be considered as nullable,
but following your description for is_var_nullable it's fine (although
couple of commentaries to this function are clearly necessary).

Is this function necessary for the following patches? I've got an
impression that the discussion in this thread was mostly evolving about
correct description when notnullattrs could be used, not making it
bullet proof.

Exactly, that is the blocker issue right now. I hope more authorities can
give
some suggestions to move on.

Hm...why essentially a documentation question is the blocker? Or if you
mean it's a question of the patch scope, are there any arguments for
extending it?

#17Andy Fan
zhihui.fan1213@gmail.com
In reply to: Dmitry Dolgov (#16)
Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)

On Fri, Mar 5, 2021 at 4:16 PM Dmitry Dolgov <9erthalion6@gmail.com> wrote:

On Fri, Mar 05, 2021 at 10:22:45AM +0800, Andy Fan wrote:

I checked again and found I do miss the check on JoinExpr->quals. I

have

fixed it in v3 patch. Thanks for the review!

In the attached v3, commit 1 is the real patch, and commit 2 is

just add

some logs to help local testing. notnull.sql/notnull.out is the test

case

for
this patch, both commit 2 and notnull.* are not intended to be

committed

at last.

Just to clarify, this version of notnullattrs here is the latest one,
and another one from "UniqueKey on Partitioned table" thread should be
disregarded?

Actually they are different sections for UniqueKey. Since I don't want

to

mess
two topics in one place, I open another thread. The topic here is how to
represent
a not null attribute, which is a precondition for all UniqueKey stuff.

The

thread
" UniqueKey on Partitioned table[1] " is talking about how to maintain

the

UniqueKey on a partitioned table only.

Sure, those two threads are addressing different topics. But [1] also
includes the patch for notnullattrs (I guess it's the same as one of the
older versions from this thread), so it would be good to specify which
one should be used to avoid any confusion.

I'm not sure about this, but couldn't be there still
some cases when a Var belongs to nullable_baserels, but still has some
constraints preventing it from being nullable (e.g. a silly example

when

the not nullable column belong to the table, and the query does full
join of this table on itself using this column)?

Do you say something like "SELECT * FROM t1 left join t2 on t1.a = t2.a

WHERE
t2.b = 3; "? In this case, the outer join will be reduced to inner join
at
reduce_outer_join stage, which means t2 will not be shown in
nullable_baserels.

Nope, as I said it's a bit useless example of full self join t1 on
itself. In this case not null column "a" will be considered as nullable,
but following your description for is_var_nullable it's fine (although
couple of commentaries to this function are clearly necessary).

Is this function necessary for the following patches? I've got an
impression that the discussion in this thread was mostly evolving about
correct description when notnullattrs could be used, not making it
bullet proof.

Exactly, that is the blocker issue right now. I hope more authorities can
give
some suggestions to move on.

Hm...why essentially a documentation question is the blocker? Or if you
mean it's a question of the patch scope, are there any arguments for
extending it?

I treat the below comment as the blocker issue:

It would be good to agree on the correct representation for Vars that
cannot produce NULLs so that we don't shut the door on classes of
optimisation that require something other than what you need for your
case.

David/Tom/Ashutosh, do you mind to share more insights to this?
I mean the target is the patch is in a committable state.

--
Best Regards
Andy Fan (https://www.aliyun.com/)

#18Andy Fan
zhihui.fan1213@gmail.com
In reply to: David Rowley (#10)
Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)

On Tue, Feb 16, 2021 at 12:01 PM David Rowley <dgrowleyml@gmail.com> wrote:

On Fri, 12 Feb 2021 at 15:18, Andy Fan <zhihui.fan1213@gmail.com> wrote:

On Fri, Feb 12, 2021 at 9:02 AM David Rowley <dgrowleyml@gmail.com>

wrote:

The reason I don't really like this is that it really depends where
you want to use RelOptInfo.notnullattrs. If someone wants to use it
to optimise something before the base quals are evaluated then they
might be unhappy that they found some NULLs.

Do you mean the notnullattrs is not set correctly before the base quals

are

evaluated? I think we have lots of data structures which are set just

after some

stage. but notnullattrs is special because it is set at more than 1

stage. However

I'm doubtful it is unacceptable, Some fields ever change their meaning

at different

stages like Var->varno. If a user has a misunderstanding on it, it

probably will find it

at the testing stage.

You're maybe focusing too much on your use case for notnullattrs. It
only cares about NULLs in the result for each query level.

.... thinks of an example...

OK, let's say I decided that COUNT(*) is faster than COUNT(id) so
decided that I might like to write a patch which rewrite the query to
use COUNT(*) when it was certain that "id" could not contain NULLs.

The query is:

SELECT p.partid, p.partdesc,COUNT(s.saleid) FROM part p LEFT OUTER
JOIN sales s ON p.partid = s.partid GROUP BY p.partid;

sale.saleid is marked as NOT NULL in pg_attribute. As the writer of
the patch, I checked the comment for notnullattrs and it says "Not
null attrs, start from -FirstLowInvalidHeapAttributeNumber", so I
should be ok to assume since sales.saleid is marked in notnullattrs
that I can rewrite the query?!

The documentation about the RelOptInfo.notnullattrs needs to be clear
what exactly it means. I'm not saying your representation of how to
record NOT NULL in incorrect. I'm saying that you need to be clear
what exactly is being recorded in that field.

If you want it to mean "attribute marked here cannot output NULL
values at this query level", then you should say something along those
lines.

However, having said that, because this is a Bitmapset of
pg_attribute.attnums, it's only possible to record Vars from base
relations. It does not seem like you have any means to record
attributes that are normally NULLable, but cannot produce NULL values
due to a strict join qual.

e.g: SELECT t.nullable FROM t INNER JOIN j ON t.nullable = j.something;

I'd expect the RelOptInfo for t not to contain a bit for the
"nullable" column, but there's no way to record the fact that the join
RelOptInfo for {t,j} cannot produce a NULL for that column. It might
be quite useful to know that for the UniqueKeys patch.

I read your comments again and find I miss your point before. So I'd
summarize
the my current understanding to make sure we are in the same place for
further
working.

I want to define a notnullattrs on RelOptInfo struct. The not nullable
may comes from catalog definition or quals on the given query. For example:

CREATE TABLE t(a INT NOT NULL, nullable INT);
SELECT * FROM t; ==> a is not null for sure by definition.
SELECT * FROM t WHERE nullable > 3; ==> nullable is not null as well by
qual.

However the thing becomes complex with the below 2 cases.

1. SELECT * FROM t INNER JOIN j on t.nullable = q.b;
We know t.b will not be null **finally**. But the current plan may something
like this:

QUERY PLAN
------------------------------------------
Merge Join
Merge Cond: (t.nullable = j.something)
-> Sort
Sort Key: t.nullable
-> Seq Scan on t
-> Sort
Sort Key: j.something
-> Seq Scan on j
(8 rows)

which means the Path "Seq Scan on t" still contains some null values. At
least,
we should not assume t.nullable is "not nullable" the base relation stage.

2. SELECT t.a FROM j LEFT JOIN t ON t.b = t.a;
Even the t.a is not null by definition, but it may have null **finally**
due to
the outer join.

My current patch doesn't handle the 2 cases well since t.nullable is marked
as
NOT NULL for both cases.

I know there's another discussion here between Ashutosh and Tom about
PathTarget's and Vars. I had the Var idea too once myself [1] but it
was quickly shot down. Tom's reasoning there in [1] seems legit. I
guess we'd need some sort of planner version of Var and never confuse
it with the Parse version of Var. That sounds like quite a big
project which would have quite a lot of code churn. I'm not sure how
acceptable it would be to have Var represent both these things. It
gets complex when you do equal(var1, var2) and expect that to return
true when everything matches apart from the notnull field. We
currently have this issue with the "location" field and we even have a
special macro which just ignores those in equalfuncs.c. I imagine not
many people would like to expand that to other fields.

It would be good to agree on the correct representation for Vars that
cannot produce NULLs so that we don't shut the door on classes of
optimisation that require something other than what you need for your
case.

Looks we have to maintain not null on the general RelOptInfo level rather
than Base
RelOptInfo. But I don't want to teach Var about the notnull so far. The
reasons are: 1).
We need to maintain the Planner version and Parser version due to the VIEW
case.
2). We have to ignore the extra part for equal(Var, Var) . 3). Var is
usually shared among
different RelOptInfo. which means we have to maintain different copies for
this purpose IIUC.

I assume we want to know if a Var is nullable with a function like.
is_var_notnullable(Var *var, Relids relids). If so, we can define the
data as below:

struct RelOptInfo {

Bitmapset** notnullattrs;
..
};

After this we can implement the function as:

bool
is_var_notnullable(Var* var, Relids relids)
{
RelOptInfo *rel = find_rel_by_relids(reldis);
return bms_is_member(var->varattno, rel->notnullattrs[var->varno]);
}

Probably we can make some hackers to reduce the notnullattrs's memory usage
overhead.

Any thoughts?

David

[1]
/messages/by-id/14678.1401639369@sss.pgh.pa.us

--
Best Regards
Andy Fan (https://www.aliyun.com/)

#19Andy Fan
zhihui.fan1213@gmail.com
In reply to: Andy Fan (#18)
Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)

I assume we want to know if a Var is nullable with a function like.
is_var_notnullable(Var *var, Relids relids). If so, we can define the
data as below:

struct RelOptInfo {

Bitmapset** notnullattrs;
..
};

After this we can implement the function as:

bool
is_var_notnullable(Var* var, Relids relids)
{
RelOptInfo *rel = find_rel_by_relids(reldis);
return bms_is_member(var->varattno, rel->notnullattrs[var->varno]);
}

Probably we can make some hackers to reduce the notnullattrs's memory usage
overhead.

To be more precise, to make the rel->notnullattrs shorter, we can do the
following methods:
1). Relids only has single element, we can always use a 1-len array rather
than rel->varno
elements. 2). For multi-elements relids, we use the max(varno) as the
length of rel->notnullattrs.
3). For some cases, the notnullattrs of a baserel is not changed in later
stages, we can just
reuse the same Bitmapset * in later stages.

--
Best Regards
Andy Fan (https://www.aliyun.com/)

#20Andy Fan
zhihui.fan1213@gmail.com
In reply to: Andy Fan (#18)
Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)

However the thing becomes complex with the below 2 cases.

1. SELECT * FROM t INNER JOIN j on t.nullable = q.b;
We know t.b will be not null **finally**. But the current plan may
something
like this:

QUERY PLAN
------------------------------------------
Merge Join
Merge Cond: (t.nullable = j.something)
-> Sort
Sort Key: t.nullable
-> Seq Scan on t
-> Sort
Sort Key: j.something
-> Seq Scan on j
(8 rows)

which means the Path "Seq Scan on t" still contains some null values. At
least,
we should not assume t.nullable is "not nullable" the base relation stage.

2. SELECT t.a FROM j LEFT JOIN t ON t.b = t.a;
Even the t.a is not null by definition, but it may have null **finally**
due to
the outer join.

The above 2 cases have been addressed by defining the notnullattrs on
every RelOptInfo, and maintaining them on every join. However, per offline
discussion with David, IIUC, there is a more case to think about.

CREATE TABLE t (a INT, b INT);
SELECT * FROM t WHERE a = 1 and b = 2;

We know b is not null after we evaluate the qual b = 2, but it may still
nullable when we just evaluate a = 1;

I prefer to not handle it by saying the semantics of notnullattrs is correct
after we evaluate all the quals on its RelOptInfo.

It would be good to agree on the correct representation for Vars that

cannot produce NULLs so that we don't shut the door on classes of
optimisation that require something other than what you need for your
case.

Looks we have to maintain not null on the general RelOptInfo level rather
than Base
RelOptInfo. But I don't want to teach Var about the notnull so far. The
reasons are: 1).
We need to maintain the Planner version and Parser version due to the VIEW
case.
2). We have to ignore the extra part for equal(Var, Var) . 3). Var is
usually shared among
different RelOptInfo. which means we have to maintain different copies for
this purpose IIUC.

I assume we want to know if a Var is nullable with a function like.
is_var_notnullable(Var *var, Relids relids). If so, we can define the
data as below:

struct RelOptInfo {

Bitmapset** notnullattrs;
..
};

After this we can implement the function as:

/*
* is_var_notnullable
* Check if the var is nullable for a given RelOptIno after
* all the quals on it have been evaluated.
*
* var is the var to check, relids is the ids of a RelOptInfo
* we will check on.
*/
bool
is_var_notnullable(Var* var, Relids relids)
{
RelOptInfo *rel = find_rel_by_relids(reldis);
return bms_is_member(var->varattno, rel->notnullattrs[var->varno]);
}

Do you think this is a reasonable solution?

bool

is_var_notnullable(Var* var, Relids relids)
{
RelOptInfo *rel = find_rel_by_relids(reldis);
return bms_is_member(var->varattno, rel->notnullattrs[var->varno]);
}

Probably we can make some hackers to reduce the notnullattrs's memory usage
overhead.

--

Best Regards
Andy Fan (https://www.aliyun.com/)

#21Andy Fan
zhihui.fan1213@gmail.com
In reply to: Andy Fan (#20)
1 attachment(s)
Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)

Hi:
I'd start to work on UniqueKey again, it would be great that we can target
it
to PG 15. The attached patch is just for the notnull_attrs. Since we can't
say
a column is nullable or not without saying in which resultset, so I think
attaching
it to RelOptInfo is unavoidable. Here is how my patch works.

@@ -686,6 +686,12 @@ typedef struct RelOptInfo
/* default result targetlist for Paths scanning this relation */
struct PathTarget *reltarget; /* list of Vars/Exprs, cost, width */

+ Bitmapset **notnull_attrs; /* The attno which is not null after evaluating
+  * all the quals on this relation, for baserel,
+  * the len would always 1. and for others the array
+  * index is relid from relids.
+  */
+

For baserel, it records the notnull attrs as a bitmapset and stores it to
RelOptInfo->notnull_attrs[0]. As for the joinrel, suppose the relids is
{1,3,
5}, then the notnull_attrs[1/3/5] will be used to store notnull_attrs
Bitmapset
for relation 1,3,5 separately. I don't handle this stuff for all kinds of
upper
relation and subquery so far since UniqueKey doesn't rely on it and looks
more stuff should be handled there.

The patch also included some debug messages in
set_baserel/joinrel_notnullattrs
and attached the test.sql for easier review. Any feedback is welcome and
hope
this implementation would not block UniqueKey stuff.

Attachments:

v1-0001-add-the-not-null-attrs-for-RelOptInfo.-Here-is-ho.patchapplication/octet-stream; name=v1-0001-add-the-not-null-attrs-for-RelOptInfo.-Here-is-ho.patchDownload
From e1b957e3eeada364089ea25bd86921e39c2fabb7 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E4=B8=80=E6=8C=83?= <yizhi.fzh@alibaba-inc.com>
Date: Sat, 3 Jul 2021 21:06:32 +0800
Subject: [PATCH v1] add the not null attrs for RelOptInfo. Here is how it
 works.

For baserel, it records the notnull attrs as a bitmapset and store it to
RelOptInfo->notnull_attrs[0].  As for the joinrel, suppose the relids is {1,3,
5}, then the notnull_attrs[1/3/5] will be used to store notnull_attrs bitmapset
for relation 1,3,5 separately. I don't handle this stuff for all kinds of upper
relation and subquery so far.
---
 notnulltest.sql                       | 24 ++++++++
 src/backend/nodes/bitmapset.c         | 14 +++++
 src/backend/optimizer/path/allpaths.c | 36 ++++++++++++
 src/backend/optimizer/util/plancat.c  |  9 +++
 src/backend/optimizer/util/relnode.c  | 80 +++++++++++++++++++++++++++
 src/include/nodes/bitmapset.h         |  1 +
 src/include/nodes/pathnodes.h         |  6 ++
 7 files changed, 170 insertions(+)
 create mode 100644 notnulltest.sql

diff --git a/notnulltest.sql b/notnulltest.sql
new file mode 100644
index 0000000000..2a36bd0c7f
--- /dev/null
+++ b/notnulltest.sql
@@ -0,0 +1,24 @@
+create table t1(a int, b int not null, c int, d int);
+create table t2(a int, b int not null, c int, d int);
+
+-- single rel
+select * from t1;
+select * from t1 where a > 1;
+select * from t2 where a > 1 or c > 1;
+
+-- partitioned relation.
+create table p (a int, b int, c int not null) partition by range(a);
+create table p_1 partition of p for values from (0) to (10000) partition by list(b);
+create table p_1_1(b int,  c int not null, a int);
+alter table p_1 attach partition p_1_1 for values in (1);
+
+
+select * from p;
+select * from p where a > 1;
+
+
+-- test join:
+select * from t1, t2 where t1.a = t2.c;
+select t1.a, t2.b, t2.c from t1 left join t2 on t1.a = t2.c;
+select * from t1 full join t2 on t1.a = t2.c;
+
diff --git a/src/backend/nodes/bitmapset.c b/src/backend/nodes/bitmapset.c
index 649478b0d4..fa71f36aaf 100644
--- a/src/backend/nodes/bitmapset.c
+++ b/src/backend/nodes/bitmapset.c
@@ -663,6 +663,20 @@ bms_num_members(const Bitmapset *a)
 	return result;
 }
 
+/*
+ * bms_max_member - the max member in this bitmap.
+ */
+int
+bms_max_member(const Bitmapset *a)
+{
+	int result;
+	if (a == NULL || bms_is_empty(a))
+		elog(ERROR, "Must be an non-empty bitmapset.");
+	result = (a->nwords - 1) * BITS_PER_BITMAPWORD;
+	result += bmw_leftmost_one_pos(a->words[a->nwords - 1]);
+	return result;
+}
+
 /*
  * bms_membership - does a set have zero, one, or multiple members?
  *
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 353454b183..0dcb9e2337 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -354,6 +354,40 @@ set_base_rel_pathlists(PlannerInfo *root)
 	}
 }
 
+/*
+ * set_baserel_notnull_attrs
+ *
+ *	Set baserel's notnullattrs based on baserestrictinfo
+ */
+static void
+set_baserel_notnull_attrs(RelOptInfo *rel)
+{
+	List *clauses = extract_actual_clauses(rel->baserestrictinfo, false);
+	ListCell	*lc;
+	foreach(lc, find_nonnullable_vars((Node *)clauses))
+	{
+		Var *var = (Var *) lfirst(lc);
+		if (var->varno != rel->relid)
+		{
+			/* Lateral Join */
+			continue;
+		}
+		Assert(var->varno == rel->relid);
+		rel->notnull_attrs[0] = bms_add_member(rel->notnull_attrs[0],
+											   var->varattno - FirstLowInvalidHeapAttributeNumber);
+	}
+
+	/* Debug Only, Will be removed at last. */
+	if (false)
+	{
+		elog(INFO, "FirstLowInvalidHeapAttributeNumber = %d, BaseRel(%d), notnull_attrs = %s",
+			 FirstLowInvalidHeapAttributeNumber,
+			 rel->relid,
+			 bmsToString(rel->notnull_attrs[0])
+			);
+	}
+}
+
 /*
  * set_rel_size
  *	  Set size estimates for a base relation
@@ -457,6 +491,8 @@ set_rel_size(PlannerInfo *root, RelOptInfo *rel,
 		}
 	}
 
+	set_baserel_notnull_attrs(rel);
+
 	/*
 	 * We insist that all non-dummy rels have a nonzero rowcount estimate.
 	 */
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index c5194fdbbf..7d3b40090e 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -117,6 +117,7 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 	Relation	relation;
 	bool		hasindex;
 	List	   *indexinfos = NIL;
+	int		i;
 
 	/*
 	 * We need not lock the relation since it was already locked, either by
@@ -471,6 +472,14 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 	if (inhparent && relation->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
 		set_relation_partition_info(root, rel, relation);
 
+	for (i = 0; i < relation->rd_att->natts; i++)
+	{
+		FormData_pg_attribute attr = relation->rd_att->attrs[i];
+		if (attr.attnotnull)
+			rel->notnull_attrs[0] = bms_add_member(rel->notnull_attrs[0],
+												   attr.attnum - FirstLowInvalidHeapAttributeNumber);
+	}
+
 	table_close(relation, NoLock);
 
 	/*
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index e105a4d5f1..5300652e84 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -16,6 +16,7 @@
 
 #include <limits.h>
 
+#include "access/sysattr.h"
 #include "miscadmin.h"
 #include "nodes/nodeFuncs.h"
 #include "optimizer/appendinfo.h"
@@ -258,6 +259,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
 	rel->all_partrels = NULL;
 	rel->partexprs = NULL;
 	rel->nullable_partexprs = NULL;
+	rel->notnull_attrs = palloc0(sizeof(Bitmapset *) * 1);
 
 	/*
 	 * Pass assorted information down the inheritance hierarchy.
@@ -556,6 +558,81 @@ add_join_rel(PlannerInfo *root, RelOptInfo *joinrel)
 	}
 }
 
+static void
+copy_notnull_attrs_to_joinrel(RelOptInfo *joinrel, RelOptInfo *rel)
+{
+	int relid;
+	if (bms_get_singleton_member(rel->relids, &relid))
+		joinrel->notnull_attrs[relid] = bms_copy(rel->notnull_attrs[0]);
+	else
+	{
+		relid = -1;
+		while ((relid = bms_next_member(rel->relids, relid)) >= 0)
+			joinrel->notnull_attrs[relid] = bms_copy(rel->notnull_attrs[relid]);
+	}
+}
+
+/*
+ *
+ */
+static void
+set_joinrel_notnull_attrs(RelOptInfo *joinrel,
+						  RelOptInfo *outer_rel,
+						  RelOptInfo *inner_rel,
+						  List *restrictlist,
+						  SpecialJoinInfo *sjinfo)
+{
+	if (sjinfo->jointype == JOIN_FULL)
+		/* Both sides are nullable. */
+		return;
+	/* If it is not FULL join, the outer side is not changed. */
+	copy_notnull_attrs_to_joinrel(joinrel, outer_rel);
+	switch(sjinfo->jointype)
+	{
+		case JOIN_ANTI:
+		case JOIN_SEMI:
+		case JOIN_INNER:
+			copy_notnull_attrs_to_joinrel(joinrel, inner_rel);
+			{
+				ListCell	*lc;
+				List *clauses = extract_actual_clauses(restrictlist, false);
+				foreach(lc, find_nonnullable_vars((Node *) clauses))
+				{
+					Var *var = lfirst_node(Var, lc);
+					if (!bms_is_member(var->varno, joinrel->relids))
+					{
+						/* lateral join */
+						continue;
+					}
+					joinrel->notnull_attrs[var->varno] = bms_add_member(
+						joinrel->notnull_attrs[var->varno],
+						var->varattno - FirstLowInvalidHeapAttributeNumber);
+				}
+			}
+			break;
+		case JOIN_LEFT:
+			break;
+		default:
+			elog(ERROR, "Unexpected join type %d", sjinfo->jointype);
+	}
+	/* Debug Only, will be removed at last. */
+	if (false)
+	{
+		int relid = -1;
+		int eLevel = INFO;
+		elog(eLevel, "Dump notnull for JoinRel(%s)", bmsToString(joinrel->relids));
+		while((relid = bms_next_member(joinrel->relids, relid)) >= 0)
+		{
+			Bitmapset *notnullattrs = joinrel->notnull_attrs[relid];
+			if (notnullattrs != NULL)
+				elog(eLevel, "FirstLowInvalidHeapAttributeNumber = %d, RELID = (%d), notnull_attrs: %s",
+					 FirstLowInvalidHeapAttributeNumber,
+					 relid,
+					 bmsToString(notnullattrs));
+		}
+	}
+
+}
 /*
  * build_join_rel
  *	  Returns relation entry corresponding to the union of two given rels,
@@ -672,6 +749,7 @@ build_join_rel(PlannerInfo *root,
 	joinrel->all_partrels = NULL;
 	joinrel->partexprs = NULL;
 	joinrel->nullable_partexprs = NULL;
+	joinrel->notnull_attrs = palloc0(sizeof(Bitmapset *) * (bms_max_member(joinrel->relids) + 1));
 
 	/* Compute information relevant to the foreign relations. */
 	set_foreign_rel_properties(joinrel, outer_rel, inner_rel);
@@ -763,6 +841,8 @@ build_join_rel(PlannerInfo *root,
 			lappend(root->join_rel_level[root->join_cur_level], joinrel);
 	}
 
+	set_joinrel_notnull_attrs(joinrel, outer_rel, inner_rel, restrictlist, sjinfo);
+
 	return joinrel;
 }
 
diff --git a/src/include/nodes/bitmapset.h b/src/include/nodes/bitmapset.h
index 1fd12de698..303873a546 100644
--- a/src/include/nodes/bitmapset.h
+++ b/src/include/nodes/bitmapset.h
@@ -94,6 +94,7 @@ extern bool bms_nonempty_difference(const Bitmapset *a, const Bitmapset *b);
 extern int	bms_singleton_member(const Bitmapset *a);
 extern bool bms_get_singleton_member(const Bitmapset *a, int *member);
 extern int	bms_num_members(const Bitmapset *a);
+extern int	bms_max_member(const Bitmapset *a);
 
 /* optimized tests when we don't need to know exact membership count: */
 extern BMS_Membership bms_membership(const Bitmapset *a);
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index b7b2817a5d..d6758f21e1 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -686,6 +686,12 @@ typedef struct RelOptInfo
 	/* default result targetlist for Paths scanning this relation */
 	struct PathTarget *reltarget;	/* list of Vars/Exprs, cost, width */
 
+	Bitmapset	**notnull_attrs; /* The attno which is not null after evalating
+								  * all the quals on this relation, for baserel,
+								  * the len would always 1. and for others the array
+								  * index is relid from relids.
+								  */
+
 	/* materialization information */
 	List	   *pathlist;		/* Path structures */
 	List	   *ppilist;		/* ParamPathInfos used in pathlist */
-- 
2.21.0

#22David Rowley
dgrowleyml@gmail.com
In reply to: Andy Fan (#21)
Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)

On Sun, 4 Jul 2021 at 02:08, Andy Fan <zhihui.fan1213@gmail.com> wrote:

I'd start to work on UniqueKey again, it would be great that we can target it
to PG 15. The attached patch is just for the notnull_attrs. Since we can't say
a column is nullable or not without saying in which resultset, so I think attaching
it to RelOptInfo is unavoidable. Here is how my patch works.

I'd also like to see this work progress for PG15. My current thoughts
are that Tom as mentioned another way to track nullability inside Var.
It would be a fairly big task to do that.

Tom, I'm wondering if you might get a chance to draw up a design for
what you've got in mind with this? I assume adding a new field in
Var, but I'm drawing a few blanks on how things might work for equal()
when one Var has the field set and another does not.

David

#23Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Rowley (#22)
Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)

David Rowley <dgrowleyml@gmail.com> writes:

Tom, I'm wondering if you might get a chance to draw up a design for
what you've got in mind with this? I assume adding a new field in
Var, but I'm drawing a few blanks on how things might work for equal()
when one Var has the field set and another does not.

As I said before, it hasn't progressed much past the handwaving stage,
but it does seem like it's time to get it done. I doubt I'll have any
cycles for it during the commitfest, but maybe I can devote a block of
time during August.

regards, tom lane

#24Andy Fan
zhihui.fan1213@gmail.com
In reply to: Tom Lane (#23)
Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)

On Tue, Jul 6, 2021 at 9:14 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

David Rowley <dgrowleyml@gmail.com> writes:

Tom, I'm wondering if you might get a chance to draw up a design for
what you've got in mind with this? I assume adding a new field in
Var, but I'm drawing a few blanks on how things might work for equal()
when one Var has the field set and another does not.

As I said before, it hasn't progressed much past the handwaving stage,
but it does seem like it's time to get it done. I doubt I'll have any
cycles for it during the commitfest, but maybe I can devote a block of
time during August.

regards, tom lane

Looking forward to watching this change closely, thank you both David and
Tom!
But I still don't understand what the faults my way have , do you mind
telling the
details?

--
Best Regards
Andy Fan (https://www.aliyun.com/)

#25David Rowley
dgrowleyml@gmail.com
In reply to: Andy Fan (#24)
Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)

On Wed, 7 Jul 2021 at 13:04, Andy Fan <zhihui.fan1213@gmail.com> wrote:

Looking forward to watching this change closely, thank you both David and Tom!
But I still don't understand what the faults my way have , do you mind telling the
details?

The problem is that we don't need 6 different ways to determine if a
Var can be NULL or not. You're proposing to add a method using
Bitmapsets and Tom has some proposing ideas around tracking
nullability in Vars. We don't need both.

It seems to me that having it in Var allows us to have a much finer
gradient about where exactly a Var can be NULL.

For example: SELECT nullablecol FROM tab WHERE nullablecol = <value>;

If the equality operator is strict then the nullablecol can be NULL in
the WHERE clause but not in the SELECT list. Tom's idea should allow
us to determine both of those things but your idea cannot tell them
apart, so, in theory at least, Tom's idea seems better to me.

David

#26Andy Fan
zhihui.fan1213@gmail.com
In reply to: David Rowley (#25)
Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)

For example: SELECT nullablecol FROM tab WHERE nullablecol = <value>;

If the equality operator is strict then the nullablecol can be NULL in
the WHERE clause but not in the SELECT list. Tom's idea should allow
us to determine both of those things but your idea cannot tell them
apart, so, in theory at least, Tom's idea seems better to me.

David

That's really something I can't do, thanks for the explanation.

--
Best Regards
Andy Fan (https://www.aliyun.com/)

#27Andy Fan
zhihui.fan1213@gmail.com
In reply to: David Rowley (#22)
1 attachment(s)
Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)

On Tue, Jul 6, 2021 at 5:34 PM David Rowley <dgrowleyml@gmail.com> wrote:

On Sun, 4 Jul 2021 at 02:08, Andy Fan <zhihui.fan1213@gmail.com> wrote:

I'd start to work on UniqueKey again, it would be great that we can target it
to PG 15. The attached patch is just for the notnull_attrs. Since we can't say
a column is nullable or not without saying in which resultset, So I think attaching
it to RelOptInfo is unavoidable. Here is how my patch works.

I'd also like to see this work progress for PG15.

Thank you David!

I am re-designing/implementing the UniqueKey, but it is better to have
a design review as soon as possible. This writing is for that. To make the
review easier, I also uploaded my in-completed patch (Correct, runable
with testcase).

Main changes are:
1. Use EC instead of expr, to cover more UniqueKey case.
2. Redesign the UniqueKey as below:

@@ -246,6 +246,7 @@ struct PlannerInfo
* subquery outputs */

List *eq_classes; /* list of active EquivalenceClasses */
+ List *unique_exprs; /* List of unique expr */

bool ec_merging_done; /* set true once ECs are canonical */

+typedef struct UniqueKey
+{
+ NodeTag type;
+ Bitmapset *unique_expr_indexes;
+ bool multi_nulls;
+} UniqueKey;
+

PlannerInfo.unique_exprs is a List of unique exprs. Unique Exprs is a set of
EquivalenceClass. for example:

CREATE TABLE T1(A INT NOT NULL, B INT NOT NULL, C INT, pk INT primary key);
CREATE UNIQUE INDEX ON t1(a, b);

SELECT DISTINCT * FROM T1 WHERE a = c;

Then we would have PlannerInfo.unique_exprs as below
[
[EC(a, c), EC(b)],
[EC(pk)]
]

RelOptInfo(t1) would have 2 UniqueKeys.
UniqueKey1 {unique_expr_indexes=bms{0}, multinull=false]
UniqueKey2 {unique_expr_indexes=bms{1}, multinull=false]

The design will benefit many table joins cases. For example, 10 tables
join. Each table has a primary key (a, b). Then we would have a UniqueKey like
this.

JoinRel{1,2,3,4} - {t1.a, t1.b, t2.a, t2.b, t3.a, t3.b t4.a t4.b}
JoinRel{1,2,3,4, 5} - {t1.a, t1.b, t2.a, t2.b, t3.a, t3.b t4.a t4.b t5.a t5.b}

This would be memory consuming and building such UniqueKey is CPU consuming as
well. With the new design, we can store it as

PlannerInfo.unique_exprs =
[
[t1.a, t1.b], -- EC is ignored in document.
[t2.a, t2.b],
[t3.a, t3.b],
[t4.a, t4.b],
[t5.a, t5.b],
[t6.a, t6.b],
[t7.a, t7.b],
[t8.a, t8.b],
[t9.a, t9.b],
[t10.a, t10.b],
]

JoinRel{1,2,3,4} - Bitmapset{0,1,2,3} -- one bitmapword.
JoinRel{1,2,3,4,5} - Bitmapset{0,1,2,3,4} -- one bitmapword.

3. Define a new SingleRow node and use it in joinrel as well.

+typedef struct SingleRow
+{
+ NodeTag type;
+ Index relid;
+} SingleRow;

SELECT * FROM t1, t2 WHERE t2.pk = 1;

PlannerInfo.unique_exprs
[
[t1.a, t1.b],
SingleRow{relid=2}
]

JoinRel{t1} - Bitmapset{0}
JoinRel{t2} - Bitmapset{1}
JoinRelt{1, 2} Bitmapset{0, 1} -- SingleRow will never be expanded to dedicated
exprs.

4. Cut the useless UniqueKey totally on the baserel stage based on
root->distinct_pathkey. If we want to use it anywhere else, I think this
design is OK as well. for example: group by UniqueKey.

5. Implemented the relation_is_distinct_for(root, rel, distinct_pathkey)
effectively. Here I used distinct_pathkey rather than
Query->distinctClause.

Since I implemented the EC in PlannerInfo.unique_exprs point to the
PathKey.pk_eqclass, so we can compare the address directly with '=', rather than
equal(a, b). (since qual would check the address as well, so even I use equal,
the performance is good as well). SingleRow is handled as well for this case.

You can check the more details in the attached patch. Any feedback is welcome.

--
Best Regards
Andy Fan (https://www.aliyun.com/)

Attachments:

v1-0001-design-uniquekey-v2.patchapplication/octet-stream; name=v1-0001-design-uniquekey-v2.patchDownload
From 7799b005726b542cffa6e9a19704cabf235b214e Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E4=B8=80=E6=8C=83?= <yizhi.fzh@alibaba-inc.com>
Date: Wed, 7 Jul 2021 16:02:01 +0800
Subject: [PATCH v1] design uniquekey v2

---
 src/backend/nodes/list.c                      |  16 +
 src/backend/optimizer/path/Makefile           |   3 +-
 src/backend/optimizer/path/allpaths.c         |   7 +-
 src/backend/optimizer/path/uniquekey.c        | 276 ++++++++++++++++++
 src/backend/optimizer/plan/planner.c          |   3 +
 src/backend/optimizer/util/plancat.c          |  17 ++
 src/include/nodes/nodes.h                     |   3 +-
 src/include/nodes/pathnodes.h                 |  16 +
 src/include/nodes/pg_list.h                   |   2 +
 src/include/optimizer/paths.h                 |   4 +
 src/include/optimizer/plancat.h               |   2 +
 src/test/regress/expected/join.out            |  11 +-
 src/test/regress/expected/select_distinct.out |  50 ++++
 src/test/regress/sql/select_distinct.sql      |  20 ++
 14 files changed, 420 insertions(+), 10 deletions(-)
 create mode 100644 src/backend/optimizer/path/uniquekey.c

diff --git a/src/backend/nodes/list.c b/src/backend/nodes/list.c
index 94fb236daf..3053ce38ea 100644
--- a/src/backend/nodes/list.c
+++ b/src/backend/nodes/list.c
@@ -702,6 +702,22 @@ list_member_oid(const List *list, Oid datum)
 	return false;
 }
 
+/*
+ * list_is_subset_ptr - is A a subset of B?
+ */
+bool
+list_is_subset_ptr(const List *a, const List *b)
+{
+	ListCell *lc;
+	foreach(lc, a)
+	{
+		if (!list_member_ptr(b, lfirst(lc)))
+			return false;
+	}
+	return true;
+}
+
+
 /*
  * Delete the n'th cell (counting from 0) in list.
  *
diff --git a/src/backend/optimizer/path/Makefile b/src/backend/optimizer/path/Makefile
index 1e199ff66f..63cc1505d9 100644
--- a/src/backend/optimizer/path/Makefile
+++ b/src/backend/optimizer/path/Makefile
@@ -21,6 +21,7 @@ OBJS = \
 	joinpath.o \
 	joinrels.o \
 	pathkeys.o \
-	tidpath.o
+	tidpath.o \
+	uniquekey.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 0dcb9e2337..f5d8a443f9 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -396,6 +396,9 @@ static void
 set_rel_size(PlannerInfo *root, RelOptInfo *rel,
 			 Index rti, RangeTblEntry *rte)
 {
+	/* Set the notnull before the UniqueKey populate */
+	set_baserel_notnull_attrs(rel);
+
 	if (rel->reloptkind == RELOPT_BASEREL &&
 		relation_excluded_by_constraints(root, rel, rte))
 	{
@@ -491,7 +494,7 @@ set_rel_size(PlannerInfo *root, RelOptInfo *rel,
 		}
 	}
 
-	set_baserel_notnull_attrs(rel);
+
 
 	/*
 	 * We insist that all non-dummy rels have a nonzero rowcount estimate.
@@ -616,6 +619,8 @@ set_plain_rel_size(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 	 */
 	check_index_predicates(root, rel);
 
+	populate_baserel_uniquekeys(root, rel);
+
 	/* Mark rel with estimated output rows, width, etc */
 	set_baserel_size_estimates(root, rel);
 }
diff --git a/src/backend/optimizer/path/uniquekey.c b/src/backend/optimizer/path/uniquekey.c
new file mode 100644
index 0000000000..6111e9fa00
--- /dev/null
+++ b/src/backend/optimizer/path/uniquekey.c
@@ -0,0 +1,276 @@
+/*-------------------------------------------------------------------------
+ *
+ * pathkeys.c
+ *	  Utilities for maintaining uniquekey.
+ *
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ *	  src/backend/optimizer/path/uniquekey.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/sysattr.h"
+#include "nodes/nodeFuncs.h"
+#include "nodes/pathnodes.h"
+#include "optimizer/paths.h"
+
+
+/*
+ * print_uniquekey
+ *	Used for easier reivew, should be removed before commit.
+ */
+static void
+print_uniquekey(PlannerInfo *root, RelOptInfo *rel)
+{
+	if (false)
+	{
+		ListCell	*lc;
+		elog(INFO, "Rel = %s", bmsToString(rel->relids));
+		foreach(lc, rel->uniquekeys)
+		{
+			UniqueKey *ukey = lfirst_node(UniqueKey, lc);
+			int i = -1;
+			elog(INFO, "UNIQUEKEY{indexes=%s, multinull=%d}",
+				 bmsToString(ukey->unique_expr_indexes),
+				 ukey->multi_nulls
+				);
+
+			while ((i = bms_next_member(ukey->unique_expr_indexes, i)) >= 0)
+			{
+				Node *node = (Node *) list_nth(root->unique_exprs, i);
+				if (IsA(node, SingleRow))
+					elog(INFO,
+						 "Expr(%d) SingleRow{relid = %d}",
+						 i, castNode(SingleRow, node)->relid);
+				else
+					elog(INFO,
+						 "EC(%d), %s", i, nodeToString(node)
+						);
+			}
+		}
+	}
+}
+
+static UniqueKey *
+make_uniquekey(Bitmapset *unique_expr_indexes, bool multi_null)
+{
+	UniqueKey *ukey = makeNode(UniqueKey);
+	ukey->unique_expr_indexes = unique_expr_indexes;
+	ukey->multi_nulls = multi_null;
+	return ukey;
+}
+
+
+static PathKey *
+find_matching_pathkey_expr(Expr *expr,  List *pathkeys, Relids relids)
+{
+	ListCell *lc;
+	foreach(lc, pathkeys)
+	{
+		PathKey * pathkey = lfirst_node(PathKey, lc);
+		if (find_ec_member_matching_expr(pathkey->pk_eclass, expr, relids))
+			return pathkey;
+	}
+	return NULL;
+}
+
+static bool
+add_uniquekey_for_uniqueindex(PlannerInfo *root, IndexOptInfo *unique_index,
+							  List *mergeable_const_peer, List *expr_opfamilies)
+{
+
+	List	*unique_ecs = NIL;
+	ListCell	*indexpr_item;
+	int	c = 0;
+	RelOptInfo *rel = unique_index->rel;
+	PathKey	*pathkey;
+	bool	multinull = false;
+
+	indexpr_item = list_head(unique_index->indexprs);
+	for (c = 0; c < unique_index->nkeycolumns; c++)
+	{
+		int attr = unique_index->indexkeys[c];
+		Expr *expr;
+		bool	matched_const = false;
+		ListCell	*lc1, *lc2;
+		if (attr > 0)
+		{
+			Var *var;
+			expr = list_nth_node(TargetEntry, unique_index->indextlist, c)->expr;
+			var = castNode(Var, expr);
+			Assert(IsA(expr, Var));
+			if (!bms_is_member(var->varattno - FirstLowInvalidHeapAttributeNumber,
+							  rel->notnull_attrs[0]))
+				multinull = true;
+		}
+		else if (attr == 0)
+		{
+			/* Expression index */
+			expr = lfirst(indexpr_item);
+			indexpr_item = lnext(unique_index->indexprs, indexpr_item);
+			/* We can't grantee an FuncExpr will not return NULLs */
+			multinull = true;
+		}
+		else /* attr < 0 */
+		{
+			/* Index on OID is possible, not handle it for now. */
+			return false;
+		}
+
+		/*
+		 * Check index_col = Const case with regarding to opfamily checking
+		 * If so, we can remove the index_col from the final UniqueKey->exprs.
+		 */
+		forboth(lc1, mergeable_const_peer, lc2, expr_opfamilies)
+		{
+			if (list_member_oid((List *) lfirst(lc2), unique_index->opfamily[c]) &&
+				match_index_to_operand((Node *) lfirst(lc1), c, unique_index))
+			{
+				matched_const = true;
+				break;
+			}
+		}
+
+		if (matched_const)
+			continue;
+
+		/* Check if this expr exist in distinct_pathkey. */
+		pathkey = find_matching_pathkey_expr(expr, root->distinct_pathkeys, rel->relids);
+		if (!pathkey)
+			return false;
+		unique_ecs = lappend(unique_ecs, pathkey->pk_eclass);
+	}
+
+	{
+		Bitmapset *unique_exprs_index = bms_make_singleton(list_length(root->unique_exprs));
+		if (unique_ecs == NIL)
+		{
+			SingleRow *singlerow = makeNode(SingleRow);
+			singlerow->relid = rel->relid;
+			rel->uniquekeys = list_make1(make_uniquekey(unique_exprs_index, false /* multi-null */));
+			root->unique_exprs = lappend(root->unique_exprs, singlerow);
+			return true;
+		}
+		else
+		{
+			UniqueKey *ukey = make_uniquekey(unique_exprs_index, multinull);
+			root->unique_exprs = lappend(root->unique_exprs, unique_ecs);
+			rel->uniquekeys = lappend(rel->uniquekeys, ukey);
+			return false;
+		}
+	}
+	return false;
+}
+
+void
+populate_baserel_uniquekeys(PlannerInfo *root, RelOptInfo *rel)
+{
+	ListCell	*lc;
+	List	*mergeable_const_peer = NIL, *expr_opfamilies = NIL;
+	foreach(lc, rel->baserestrictinfo)
+	{
+		RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+		if (rinfo->mergeopfamilies == NIL)
+			continue;
+
+		if (bms_is_empty(rinfo->left_relids))
+			mergeable_const_peer = lappend(mergeable_const_peer, get_rightop(rinfo->clause));
+		else if (bms_is_empty(rinfo->right_relids))
+			mergeable_const_peer = lappend(mergeable_const_peer, get_leftop(rinfo->clause));
+		else
+			continue;
+		expr_opfamilies = lappend(expr_opfamilies, rinfo->mergeopfamilies);
+	}
+
+	foreach(lc, rel->indexlist)
+	{
+		IndexOptInfo *index = (IndexOptInfo *)lfirst(lc);
+		if (!index->unique || !index->immediate ||
+			(index->indpred != NIL && !index->predOK))
+			continue;
+
+		if (add_uniquekey_for_uniqueindex(root, index,
+										  mergeable_const_peer,
+										  expr_opfamilies))
+			return;
+	}
+
+	print_uniquekey(root, rel);
+}
+
+
+static bool
+uniquekey_contains_in(PlannerInfo *root, UniqueKey *ukey, List *lecs, Relids relids)
+{
+	int i = -1;
+	while ((i = bms_next_member(ukey->unique_expr_indexes, i)) >= 0)
+	{
+		Node *expr = list_nth(root->unique_exprs, i);
+		if (IsA(expr, SingleRow))
+		{
+			/* Any column regarding SingleRow.relid is OK */
+			if (!bms_is_member(castNode(SingleRow, expr)->relid, relids))
+				return false;
+		}
+		else if (IsA(expr, List))
+		{
+			/* unique_expr is a List of EquivalenceClass * */
+			if (!list_is_subset_ptr((List*)expr, lecs))
+				return false;
+		}
+		else
+		{
+			/* Impossible to go here */
+			Assert(false);
+			return false;
+		}
+	}
+	return true;
+}
+
+
+bool
+relation_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *distinct_pathkey)
+{
+	ListCell	*lc;
+	List	*lecs = NIL;
+	Relids	relids = NULL;
+	foreach(lc, distinct_pathkey)
+	{
+		PathKey *pathkey = lfirst(lc);
+		lecs = lappend(lecs, pathkey->pk_eclass);
+		/*
+		 * Note that ec_relids doesn't include child member, but
+		 * distinct would not operate on childrel as well.
+		 */
+		relids = bms_union(relids, pathkey->pk_eclass->ec_relids);
+	}
+
+	foreach(lc, rel->uniquekeys)
+	{
+		UniqueKey *ukey = lfirst(lc);
+		if (ukey->multi_nulls)
+			continue;
+
+		if (uniquekey_contains_in(root, ukey, lecs, relids))
+			return true;
+	}
+
+	return false;
+
+}
+
+
+static UniqueKey*
+__attribute__ ((unused))
+build_composited_uniquekey(UniqueKey *ukey1, UniqueKey *ukey2, bool multi_null)
+{
+	Bitmapset *unique_expr = bms_union(ukey1->unique_expr_indexes, ukey2->unique_expr_indexes);
+	return make_uniquekey(unique_expr, multi_null);
+
+}
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 1868c4eff4..b9d4e6395e 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -4238,6 +4238,9 @@ create_distinct_paths(PlannerInfo *root,
 	Path	   *path;
 	ListCell   *lc;
 
+	if (relation_is_distinct_for(root, input_rel, root->distinct_pathkeys))
+		return input_rel;
+
 	/* For now, do all work in the (DISTINCT, NULL) upperrel */
 	distinct_rel = fetch_upper_rel(root, UPPERREL_DISTINCT, NULL);
 
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 7d3b40090e..4852d88c52 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -2079,6 +2079,23 @@ get_function_rows(PlannerInfo *root, Oid funcid, Node *node)
 	return result;
 }
 
+List *
+get_unique_indexes(RelOptInfo *rel)
+{
+	List	*res = NIL;
+	ListCell	*lc;
+	foreach(lc, rel->indexlist)
+	{
+		IndexOptInfo *index = (IndexOptInfo *) lfirst(lc);
+
+		if (index->unique && index->immediate &&
+			(index->indpred == NIL || index->predOK))
+			res = lappend(res, index);
+	}
+	return res;
+}
+
+
 /*
  * has_unique_index
  *
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index d9e417bcd7..7a76413ba1 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -280,7 +280,8 @@ typedef enum NodeTag
 	T_RollupData,
 	T_GroupingSetData,
 	T_StatisticExtInfo,
-
+	T_UniqueKey,
+	T_SingleRow,
 	/*
 	 * TAGS FOR MEMORY NODES (memnodes.h)
 	 */
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index d6758f21e1..2ee9e0885b 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -246,6 +246,7 @@ struct PlannerInfo
 									 * subquery outputs */
 
 	List	   *eq_classes;		/* list of active EquivalenceClasses */
+	List	   *unique_exprs;		/* List of unique expr */
 
 	bool		ec_merging_done;	/* set true once ECs are canonical */
 
@@ -691,6 +692,7 @@ typedef struct RelOptInfo
 								  * the len would always 1. and for others the array
 								  * index is relid from relids.
 								  */
+	List		*uniquekeys; /* A list of UniqueKey. */
 
 	/* materialization information */
 	List	   *pathlist;		/* Path structures */
@@ -1067,6 +1069,20 @@ typedef struct PathKey
 	bool		pk_nulls_first; /* do NULLs come before normal values? */
 } PathKey;
 
+
+typedef struct UnqiueKey
+{
+	NodeTag	type;
+	Bitmapset	*unique_expr_indexes;
+	bool	multi_nulls;
+} UniqueKey;
+
+typedef struct SingleRow
+{
+	NodeTag	type;
+	Index		relid;
+} SingleRow;
+
 /*
  * VolatileFunctionStatus -- allows nodes to cache their
  * contain_volatile_functions properties. VOLATILITY_UNKNOWN means not yet
diff --git a/src/include/nodes/pg_list.h b/src/include/nodes/pg_list.h
index 30f98c4595..bbe0209d7e 100644
--- a/src/include/nodes/pg_list.h
+++ b/src/include/nodes/pg_list.h
@@ -558,6 +558,8 @@ extern bool list_member_ptr(const List *list, const void *datum);
 extern bool list_member_int(const List *list, int datum);
 extern bool list_member_oid(const List *list, Oid datum);
 
+extern bool list_is_subset_ptr(const List *a, const List *b);
+
 extern pg_nodiscard List *list_delete(List *list, void *datum);
 extern pg_nodiscard List *list_delete_ptr(List *list, void *datum);
 extern pg_nodiscard List *list_delete_int(List *list, int datum);
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index f1d111063c..07a51d737a 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -255,4 +255,8 @@ extern PathKey *make_canonical_pathkey(PlannerInfo *root,
 extern void add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
 									List *live_childrels);
 
+extern void populate_baserel_uniquekeys(PlannerInfo *root,
+										RelOptInfo *baserel);
+extern bool relation_is_distinct_for(PlannerInfo *root, RelOptInfo *rel,
+									 List *distinct_pathkey);
 #endif							/* PATHS_H */
diff --git a/src/include/optimizer/plancat.h b/src/include/optimizer/plancat.h
index 8d1d6c1b42..2dd4f525e6 100644
--- a/src/include/optimizer/plancat.h
+++ b/src/include/optimizer/plancat.h
@@ -41,6 +41,8 @@ extern bool relation_excluded_by_constraints(PlannerInfo *root,
 
 extern List *build_physical_tlist(PlannerInfo *root, RelOptInfo *rel);
 
+extern List *get_unique_indexes(RelOptInfo *rel);
+extern List *get_exprs_from_index(IndexOptInfo *indinfo, bool contains_includekey);
 extern bool has_unique_index(RelOptInfo *rel, AttrNumber attno);
 
 extern Selectivity restriction_selectivity(PlannerInfo *root,
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index fec0325e73..0631e8076d 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -4604,18 +4604,15 @@ select d.* from d left join (select * from b group by b.id, b.c_id) s
 explain (costs off)
 select d.* from d left join (select distinct * from b) s
   on d.a = s.id;
-              QUERY PLAN              
---------------------------------------
+             QUERY PLAN             
+------------------------------------
  Merge Right Join
    Merge Cond: (b.id = d.a)
-   ->  Unique
-         ->  Sort
-               Sort Key: b.id, b.c_id
-               ->  Seq Scan on b
+   ->  Index Scan using b_pkey on b
    ->  Sort
          Sort Key: d.a
          ->  Seq Scan on d
-(9 rows)
+(6 rows)
 
 -- check join removal works when uniqueness of the join condition is enforced
 -- by a UNION
diff --git a/src/test/regress/expected/select_distinct.out b/src/test/regress/expected/select_distinct.out
index 11c6f50fbf..c5d32cec3a 100644
--- a/src/test/regress/expected/select_distinct.out
+++ b/src/test/regress/expected/select_distinct.out
@@ -306,3 +306,53 @@ SELECT null IS NOT DISTINCT FROM null as "yes";
  t
 (1 row)
 
+-- uniquekey test
+CREATE TABLE uktest(a int, b int, c int not null, d int, e int, f int, PRIMARY KEY(a, b));
+CREATE UNIQUE INDEX on uktest(c, d);
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM uktest;
+     QUERY PLAN     
+--------------------
+ Seq Scan on uktest
+(1 row)
+
+EXPLAIN (COSTS OFF) SELECT DISTINCT c FROM uktest;
+        QUERY PLAN        
+--------------------------
+ HashAggregate
+   Group Key: c
+   ->  Seq Scan on uktest
+(3 rows)
+
+EXPLAIN (COSTS OFF) SELECT DISTINCT c, d FROM uktest;
+        QUERY PLAN        
+--------------------------
+ HashAggregate
+   Group Key: c, d
+   ->  Seq Scan on uktest
+(3 rows)
+
+EXPLAIN (COSTS OFF) SELECT DISTINCT c, d FROM uktest WHERE d > 10;
+     QUERY PLAN     
+--------------------
+ Seq Scan on uktest
+   Filter: (d > 10)
+(2 rows)
+
+-- UniqueKey expressions reduce due to c = Const.
+EXPLAIN (COSTS OFF) SELECT DISTINCT b FROM uktest where a = 1;
+               QUERY PLAN               
+----------------------------------------
+ Bitmap Heap Scan on uktest
+   Recheck Cond: (a = 1)
+   ->  Bitmap Index Scan on uktest_pkey
+         Index Cond: (a = 1)
+(4 rows)
+
+-- Single row case.
+EXPLAIN (COSTS OFF) SELECT DISTINCT c FROM uktest where a = 1 and b = 1;
+               QUERY PLAN               
+----------------------------------------
+ Index Scan using uktest_pkey on uktest
+   Index Cond: ((a = 1) AND (b = 1))
+(2 rows)
+
diff --git a/src/test/regress/sql/select_distinct.sql b/src/test/regress/sql/select_distinct.sql
index 33102744eb..706bcf37df 100644
--- a/src/test/regress/sql/select_distinct.sql
+++ b/src/test/regress/sql/select_distinct.sql
@@ -135,3 +135,23 @@ SELECT 1 IS NOT DISTINCT FROM 2 as "no";
 SELECT 2 IS NOT DISTINCT FROM 2 as "yes";
 SELECT 2 IS NOT DISTINCT FROM null as "no";
 SELECT null IS NOT DISTINCT FROM null as "yes";
+
+-- uniquekey test
+CREATE TABLE uktest(a int, b int, c int not null, d int, e int, f int, PRIMARY KEY(a, b));
+CREATE UNIQUE INDEX on uktest(c, d);
+
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM uktest;
+
+EXPLAIN (COSTS OFF) SELECT DISTINCT c FROM uktest;
+
+EXPLAIN (COSTS OFF) SELECT DISTINCT c, d FROM uktest;
+
+EXPLAIN (COSTS OFF) SELECT DISTINCT c, d FROM uktest WHERE d > 10;
+
+-- UniqueKey expressions reduce due to c = Const.
+EXPLAIN (COSTS OFF) SELECT DISTINCT b FROM uktest where a = 1;
+
+-- Single row case.
+EXPLAIN (COSTS OFF) SELECT DISTINCT c FROM uktest where a = 1 and b = 1;
+
+
-- 
2.21.0

#28Ranier Vilela
ranier.vf@gmail.com
In reply to: Andy Fan (#27)
Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)

You can check the more details in the attached patch. Any feedback is

welcome.

I have tiny comments about your patch:

1. name of file is uniquekey.c?

+ * pathkeys.c
+ *  Utilities for maintaining uniquekey.

2. Variable "PathKey *pathkey" at function: add_uniquekey_for_uniqueindex,
can have scope reduced.

+ indexpr_item = list_head(unique_index->indexprs);
+ for (c = 0; c < unique_index->nkeycolumns; c++)
+ {
+ PathKey *pathkey;

3. Variable int c = 0, has a redundant initialization at function:
add_uniquekey_for_uniqueindex.

4. Has one word with misspelled?

"/* We can't *guarantee* an FuncExpr will not return NULLs */"

4. Variable int i = -1, has a redudant initialization at function:
uniquekey_contains_in

5. __attribute__ ((unused)) at function: build_composited_uniquekey, is
incompatible with msvc.

6. Postgres uses a newline after variables declarations.

regards,

Ranier Vilela

#29Andy Fan
zhihui.fan1213@gmail.com
In reply to: Andy Fan (#27)
Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)

4. Cut the useless UniqueKey totally on the baserel stage based on
root->distinct_pathkey. If we want to use it anywhere else, I think this
design is OK as well. for example: group by UniqueKey.

The intention of this is I want to cut off the useless UniqueKey ASAP. In the
previous patch, I say "if the unique_exprs not exists in root->distinct_paths,
then it is useless". However This looks only works for single rel. As for the
joinrel, we have to maintain the UniqueKey on mergeable join clause for the case
like below.

SELECT DISTINCT t1.pk FROM t1, t2 WHERE t1.a = t2.pk;
or
SELECT DISTINCT t1.pk FROM t1 left join t2 on t1.a = t2.pk;

In this case, t2.pk isn't shown in distinct_pathkey, but it is still useful at
the join stage and not useful after joining.

So how can we maintain the UniqueKey like t2.pk?
1). If t2.pk exists in root->eq_classes, keep it.
2). If t2.pk doesn't exist in RelOptInfo->reltarget after joining, discard it.

Step 1 is not so bad since we have RelOptInfo.eclass_indexes. However step 2
looks pretty boring since we have to check on every RelOptInfo and we may have
lots of RelOptInfo.

Any suggestions on this?

--
Best Regards
Andy Fan (https://www.aliyun.com/)

#30Andy Fan
zhihui.fan1213@gmail.com
In reply to: Andy Fan (#29)
Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)

On Tue, Jul 13, 2021 at 5:55 PM Andy Fan <zhihui.fan1213@gmail.com> wrote:

4. Cut the useless UniqueKey totally on the baserel stage based on
root->distinct_pathkey. If we want to use it anywhere else, I think this
design is OK as well. for example: group by UniqueKey.

The intention of this is I want to cut off the useless UniqueKey ASAP. In the
previous patch, I say "if the unique_exprs not exists in root->distinct_paths,
then it is useless". However This looks only works for single rel. As for the
joinrel, we have to maintain the UniqueKey on mergeable join clause for the case
like below.

SELECT DISTINCT t1.pk FROM t1, t2 WHERE t1.a = t2.pk;
or
SELECT DISTINCT t1.pk FROM t1 left join t2 on t1.a = t2.pk;

In this case, t2.pk isn't shown in distinct_pathkey, but it is still useful at
the join stage and not useful after joining.

So how can we maintain the UniqueKey like t2.pk?
1). If t2.pk exists in root->eq_classes, keep it.
2). If t2.pk doesn't exist in RelOptInfo->reltarget after joining, discard it.

Step 1 is not so bad since we have RelOptInfo.eclass_indexes. However step 2
looks pretty boring since we have to check on every RelOptInfo and we may have
lots of RelOptInfo.

Any suggestions on this?

Just a function like truncate_useless_pathkey would be OK. For that we need
to handle uniquekey_useful_for_merging and uniquekey_useful_for_distinct.

--
Best Regards
Andy Fan (https://www.aliyun.com/)

#31Andy Fan
zhihui.fan1213@gmail.com
In reply to: Andy Fan (#30)
6 attachment(s)
Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)

Hi:

I have finished the parts for baserel, joinrel, subquery, distinctrel. I think
the hardest ones have been verified. Here is the design overview.

1. Use EC instead of expr to cover more UniqueKey cases.
2. Redesign the UniqueKey as below:

@@ -246,6 +246,7 @@ struct PlannerInfo

List *eq_classes; /* list of active EquivalenceClasses */
+ List *unique_exprs; /* List of unique expr */

bool ec_merging_done; /* set true once ECs are canonical */

+typedef struct UniqueKey
+{
+ NodeTag type;
+ Bitmapset *unique_expr_indexes;
+ bool multi_nulls;
+} UniqueKey;
+

PlannerInfo.unique_exprs is a List of unique exprs. Unique Exprs is a set of
EquivalenceClass. for example:

CREATE TABLE T1(A INT NOT NULL, B INT NOT NULL, C INT, pk INT primary key);
CREATE UNIQUE INDEX ON t1(a, b);

SELECT DISTINCT * FROM T1 WHERE a = c;

Then we would have PlannerInfo.unique_exprs as below
[
[EC(a, c), EC(b)],
[EC(pk)]
]

RelOptInfo(t1) would have 2 UniqueKeys.
UniqueKey1 {unique_expr_indexes=bms{0}, multinull=false]
UniqueKey2 {unique_expr_indexes=bms{1}, multinull=false]

The design will benefit many table joins cases. For instance a 10- tables join,
each table has a primary key (a, b). Then we would have a UniqueKey like
this.

JoinRel{1,2,3,4} - {t1.a, t1.b, t2.a, t2.b, t3.a, t3.b t4.a t4.b}
JoinRel{1,2,3,4,5} - {t1.a, t1.b, t2.a, t2.b, t3.a, t3.b t4.a t4.b t5.a t5.b}

When more tables are joined, the list would be longer and longer, build the list
consumes both CPU cycles and memory.

With the method as above, we can store it as:

root->unique_exprs = /* All the UniqueKey is stored once */
[
[t1.a, t1.b], -- EC is ignored in document.
[t2.a, t2.b],
[t3.a, t3.b],
[t4.a, t4.b],
[t5.a, t5.b],
[t6.a, t6.b],
[t7.a, t7.b],
[t8.a, t8.b],
[t9.a, t9.b],
[t10.a, t10.b],
]

JoinRel{1,2,3,4} - Bitmapset{0,1,2,3} -- one bitmapword.
JoinRel{1,2,3,4,5} - Bitmapset{0,1,2,3,4} -- one bitmapword.

The member in the bitmap is the index of root->unique_exprs rather than
root->eq_classes because we need to store the SingleRow case in
root->unqiue_exprs as well.

3. Define a new SingleRow node and use it in joinrel as well.

+typedef struct SingleRow
+{
+ NodeTag type;
+ Index relid;
+} SingleRow;

SELECT * FROM t1, t2 WHERE t2.pk = 1;

root->unique_exprs
[
[t1.a, t1.b],
SingleRow{relid=2}
]

JoinRel{t1} - Bitmapset{0}
JoinRel{t2} - Bitmapset{1}

JoinRelt{1, 2} Bitmapset{0, 1} -- SingleRow will never be expanded to dedicated
exprs.

4. Interesting UniqueKey to remove the Useless UniqueKey as soon as possible.

The overall idea is similar with PathKey, I distinguish the UniqueKey for
distinct purpose and useful_for_merging purpose.

SELECT DISTINCT pk FROM t; -- OK, maintain it all the time, just like
root->query_pathkey.

SELECT DISTINCT t2.c FROM t1, t2 WHERE t1.d = t2.pk; -- T2's UniqueKey PK is
use before t1 join t2, but not useful after it.

Currently the known issue I didn't pass the "interesting UniqueKey" info to
subquery well [2]/messages/by-id/CAKU4AWo6-=9mg3UQ5UJhGCMw6wyTPyPGgV5oh6dFvwEN=+hb_w@mail.gmail.com, I'd like to talk more about this when we discuss about
UnqiueKey on subquery part.

5. relation_is_distinct_for

Now I design the function as

+ bool
+ relation_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List
  *distinct_pathkey)

It is "List *distinct_pathkey", rather than "List *exprs". The reason pathkey
has EC in it, and all the UniqueKey has EC as well. if so, the subset-like
checking is very effective. As for the distinct/group as no-op case, we have
pathkey all the time. The only drawback of it is some clauses are not-sortable,
in this case, the root->distinct_pathkey and root->group_pathkey is not
set. However it should be rare by practice, so I ignore this part for
now. After all, I can have a relation_is_disticnt_for version for Exprs. I just
not implemented it so far.

6. EC overhead in UnqiueKey & UNION case.

Until now I didn't create any new EC for the UniqueKey case, I just used the
existing ones. However I can't handle the case like

SELECT a, b FROM t1
UNION
SELECT x, y FROM t2;

In this case, there is no EC created with existing code. and I don't want to
create them for the UniqueKey case as well. so my plan is just not to handle
the case for UNION.

Since we need some big effort from the reviewer, I split the patch into many
smaller chunks.

Patch 1 / Patch 2: I just split some code which UniqueKey uses but not very
interrelated. Splitting them out to reduce the core patch size.
Patch 3: still the notnull stuff. This one doesn't play a big role overall,
even if the design is changed at last, we can just modify some small stuff. IMO,
I don't think it is a blocker issue to move on.
Patch 4: Support the UniqueKey for baserel.
Patch 5: Support the UniqueKey for joinrel.
Patch 6: Support the UniqueKey for some upper relation, like distinctrel,
groupby rel.

I'd suggest moving on like this:
1. Have an overall review to see if any outstanding issues the patch has.
2. If not, we can review and commit patch 1 & patch 2 to reduce the patch size.
3. Decide which method to take for not null stuff. IMO Tom's method
would be a bit
complex and the benefit is not very clear to me[1]/messages/by-id/CAApHDvo5b2pYX+dbPy+ysGBSarezRSfXthX32TZNFm0wPdfKGQ@mail.gmail.com. So the choices
include: a). move on UniqueKey stuff until Tom's method is ready. b). Move
on the UniqueKey with my notnull way, and changes to Tom's method when
necessary. I prefer method b).
4. Review & Commit the UniqueKey for BaseRel part.
5. Review & Commit the UniqueKey for JoinRel part.
6. Review & Commit the UniqueKey for SubQuery part *without* the Interesting
UniqueKey well handled.
7. Review & Commit the UniqueKey for SubQuery part *with* the Interesting
UniqueKey well handled.
8. Discuss about the UniqueKey on partitioned rel case and develop/review/commit
it.
9. Apply UniqueKey stuff on more user cases rather than DISTINCT.

What do you think about this?

[1]: /messages/by-id/CAApHDvo5b2pYX+dbPy+ysGBSarezRSfXthX32TZNFm0wPdfKGQ@mail.gmail.com
[2]: /messages/by-id/CAKU4AWo6-=9mg3UQ5UJhGCMw6wyTPyPGgV5oh6dFvwEN=+hb_w@mail.gmail.com

Thanks

Attachments:

v3-0002-Just-some-utils-functions.patchapplication/octet-stream; name=v3-0002-Just-some-utils-functions.patchDownload
From f2d62ff4ceca2a33e557e5fb8bc4a2f6f3de872a Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E4=B8=80=E6=8C=83?= <yizhi.fzh@alibaba-inc.com>
Date: Sun, 25 Jul 2021 16:26:45 +0800
Subject: [PATCH v3 2/6] Just some utils functions.

---
 src/backend/nodes/list.c                | 21 ++++++++++++
 src/backend/optimizer/path/equivclass.c | 45 +++++++++++++++++++++++++
 src/backend/optimizer/util/tlist.c      | 19 +++++++++++
 src/include/nodes/pg_list.h             |  2 ++
 src/include/optimizer/optimizer.h       |  1 +
 src/include/optimizer/paths.h           |  6 ++++
 6 files changed, 94 insertions(+)

diff --git a/src/backend/nodes/list.c b/src/backend/nodes/list.c
index 94fb236daf..8f2b00012e 100644
--- a/src/backend/nodes/list.c
+++ b/src/backend/nodes/list.c
@@ -702,6 +702,27 @@ list_member_oid(const List *list, Oid datum)
 	return false;
 }
 
+
+/*
+ * list_is_subset_ptr - is A a subset of B?
+ */
+bool
+list_is_subset_ptr(const List *a, const List *b)
+{
+	ListCell *lc;
+	Assert(IsPointerList(a));
+	check_list_invariants(a);
+	Assert(IsPointerList(b));
+	check_list_invariants(b);
+	foreach(lc, a)
+	{
+		if (!list_member_ptr(b, lfirst(lc)))
+			return false;
+	}
+	return true;
+}
+
+
 /*
  * Delete the n'th cell (counting from 0) in list.
  *
diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index 2f688807af..b8ada6e8a3 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -770,6 +770,26 @@ get_eclass_for_sort_expr(PlannerInfo *root,
 	return newec;
 }
 
+/*
+ * find_ec_member_matching_expr
+ *		Locate an EquivalenceClass matching the given expr, if any;
+ *		return NULL if no match.
+ */
+EquivalenceClass *
+find_ec_matching_expr(PlannerInfo *root,
+					  Expr *expr,
+					  RelOptInfo *baserel)
+{
+	int i = -1;
+	while ((i = bms_next_member(baserel->eclass_indexes, i)) >= 0)
+	{
+		EquivalenceClass *ec = list_nth(root->eq_classes, i);
+		if (find_ec_member_matching_expr(ec, expr, baserel->relids))
+			return ec;
+	}
+	return NULL;
+}
+
 /*
  * find_ec_member_matching_expr
  *		Locate an EquivalenceClass member matching the given expr, if any;
@@ -960,6 +980,31 @@ find_em_expr_for_rel(EquivalenceClass *ec, RelOptInfo *rel)
 	return NULL;
 }
 
+
+/*
+ * build_equivalanceclass_list_for_exprs
+ *
+ * 	Given a list of expr, find the related ECs for everyone of them.
+ * if any exprs has no EC related, return NIL.
+ */
+List *
+build_equivalanceclass_list_for_exprs(PlannerInfo *root,
+									  List *exprs,
+									  RelOptInfo *rel)
+{
+	ListCell	*lc;
+	List	*ecs = NIL;
+
+	foreach(lc, exprs)
+	{
+		EquivalenceClass *ec = find_ec_matching_expr(root, lfirst(lc), rel);
+		if (!ec)
+			return NIL;
+		ecs = lappend(ecs, ec);
+	}
+	return ecs;
+}
+
 /*
  * ec_useful_for_merging
  *	check if the ec exists in rel's merageable restrictinfo_lists.
diff --git a/src/backend/optimizer/util/tlist.c b/src/backend/optimizer/util/tlist.c
index 311579d059..5d8493c8d6 100644
--- a/src/backend/optimizer/util/tlist.c
+++ b/src/backend/optimizer/util/tlist.c
@@ -359,6 +359,25 @@ get_sortgroupclause_tle(SortGroupClause *sgClause,
 	return get_sortgroupref_tle(sgClause->tleSortGroupRef, targetList);
 }
 
+/*
+ * get_tle_from_expr
+ *
+ *	Find the targetlist entry matching the given expr and return it.
+ */
+TargetEntry *
+get_tle_from_expr(Expr *expr, List *targetlist)
+{
+	ListCell	*lc;
+
+	foreach(lc, targetlist)
+	{
+		TargetEntry *tle = (TargetEntry *) lfirst(lc);
+
+		if (equal(tle->expr, expr))
+			return tle;
+	}
+	return NULL;
+}
 /*
  * get_sortgroupclause_expr
  *		Find the targetlist entry matching the given SortGroupClause
diff --git a/src/include/nodes/pg_list.h b/src/include/nodes/pg_list.h
index 30f98c4595..bbe0209d7e 100644
--- a/src/include/nodes/pg_list.h
+++ b/src/include/nodes/pg_list.h
@@ -558,6 +558,8 @@ extern bool list_member_ptr(const List *list, const void *datum);
 extern bool list_member_int(const List *list, int datum);
 extern bool list_member_oid(const List *list, Oid datum);
 
+extern bool list_is_subset_ptr(const List *a, const List *b);
+
 extern pg_nodiscard List *list_delete(List *list, void *datum);
 extern pg_nodiscard List *list_delete_ptr(List *list, void *datum);
 extern pg_nodiscard List *list_delete_int(List *list, int datum);
diff --git a/src/include/optimizer/optimizer.h b/src/include/optimizer/optimizer.h
index 41b49b2662..5911014a32 100644
--- a/src/include/optimizer/optimizer.h
+++ b/src/include/optimizer/optimizer.h
@@ -171,6 +171,7 @@ extern TargetEntry *get_sortgroupref_tle(Index sortref,
 										 List *targetList);
 extern TargetEntry *get_sortgroupclause_tle(SortGroupClause *sgClause,
 											List *targetList);
+extern TargetEntry *get_tle_from_expr(Expr *expr, List *targetlist);
 extern Node *get_sortgroupclause_expr(SortGroupClause *sgClause,
 									  List *targetList);
 extern List *get_sortgrouplist_exprs(List *sgClauses,
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index a29f616423..e813d82483 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -135,6 +135,9 @@ extern EquivalenceClass *get_eclass_for_sort_expr(PlannerInfo *root,
 												  Index sortref,
 												  Relids rel,
 												  bool create_it);
+extern EquivalenceClass *find_ec_matching_expr(PlannerInfo *root,
+											   Expr *expr,
+											   RelOptInfo *rel);
 extern EquivalenceMember *find_ec_member_matching_expr(EquivalenceClass *ec,
 													   Expr *expr,
 													   Relids relids);
@@ -146,6 +149,9 @@ extern EquivalenceMember *find_computable_ec_member(PlannerInfo *root,
 extern bool ec_useful_for_merging(PlannerInfo *root, RelOptInfo *rel,
 								  EquivalenceClass *ec);
 extern Expr *find_em_expr_for_rel(EquivalenceClass *ec, RelOptInfo *rel);
+extern List *build_equivalanceclass_list_for_exprs(PlannerInfo *root,
+												   List *exprs,
+												   RelOptInfo *rel);
 extern bool relation_can_be_sorted_early(PlannerInfo *root, RelOptInfo *rel,
 										 EquivalenceClass *ec,
 										 bool require_parallel_safe);
-- 
2.21.0

v3-0005-Support-UniqueKey-on-JoinRel.patchapplication/octet-stream; name=v3-0005-Support-UniqueKey-on-JoinRel.patchDownload
From c4ab5a7ff2f47d3159ba6450ae39ef98acb5f727 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E4=B8=80=E6=8C=83?= <yizhi.fzh@alibaba-inc.com>
Date: Mon, 9 Aug 2021 19:43:58 +0800
Subject: [PATCH v3 5/6] Support UniqueKey on JoinRel.

---
 src/backend/optimizer/path/uniquekey.c  | 377 ++++++++++++++++++++++++
 src/backend/optimizer/util/relnode.c    |   2 +
 src/include/optimizer/paths.h           |   5 +
 src/test/regress/expected/uniquekey.out | 327 ++++++++++++++++++++
 src/test/regress/sql/uniquekey.sql      |  79 +++++
 5 files changed, 790 insertions(+)

diff --git a/src/backend/optimizer/path/uniquekey.c b/src/backend/optimizer/path/uniquekey.c
index c93075656d..badc843253 100644
--- a/src/backend/optimizer/path/uniquekey.c
+++ b/src/backend/optimizer/path/uniquekey.c
@@ -27,6 +27,26 @@ static bool add_uniquekey_for_uniqueindex(PlannerInfo *root,
 										  List *mergeable_const_peer,
 										  List *expr_opfamilies);
 
+static bool is_uniquekey_nulls_removed(PlannerInfo *root,
+									   UniqueKey *ukey,
+									   RelOptInfo *rel);
+static UniqueKey *adjust_uniquekey_multinull_for_joinrel(PlannerInfo *root,
+														 UniqueKey *joinrel_ukey,
+														 RelOptInfo *rel,
+														 bool below_outer_side);
+
+static bool populate_joinrel_uniquekey_for_rel(PlannerInfo *root, RelOptInfo *joinrel,
+											   RelOptInfo *rel, RelOptInfo *other_rel,
+											   List *restrictlist, JoinType jointype);
+static void populate_joinrel_composited_uniquekey(PlannerInfo *root,
+												  RelOptInfo *joinrel,
+												  RelOptInfo *outerrel,
+												  RelOptInfo *innerrel,
+												  List	*restrictlist,
+												  JoinType jointype,
+												  bool outeruk_still_valid,
+												  bool inneruk_still_valid);
+
 /* UniqueKey is subset of .. */
 static bool uniquekey_contains_in(PlannerInfo *root, UniqueKey *ukey,
 								  List *ecs, Relids relids);
@@ -35,6 +55,9 @@ static bool uniquekey_contains_in(PlannerInfo *root, UniqueKey *ukey,
 static bool unique_ecs_useful_for_distinct(PlannerInfo *root, List *ecs);
 static bool unique_ecs_useful_for_merging(PlannerInfo *root, RelOptInfo *rel,
 										  List *unique_ecs);
+static bool is_uniquekey_useful_afterjoin(PlannerInfo *root, UniqueKey *ukey,
+										  RelOptInfo *joinrel);
+
 /* Helper functions to create UniqueKey. */
 static UniqueKey *make_uniquekey(Bitmapset *unique_expr_indexes,
 								 bool multi_null,
@@ -90,6 +113,78 @@ populate_baserel_uniquekeys(PlannerInfo *root, RelOptInfo *rel)
 	print_uniquekey(root, rel);
 }
 
+/*
+ * populate_joinrel_uniquekeys
+ */
+void
+populate_joinrel_uniquekeys(PlannerInfo *root, RelOptInfo *joinrel,
+							RelOptInfo *outerrel, RelOptInfo *innerrel,
+							List *restrictlist, JoinType jointype)
+{
+	bool outeruk_still_valid = false, inneruk_still_valid = false;
+	if (jointype == JOIN_SEMI || jointype == JOIN_ANTI)
+	{
+		ListCell	*lc;
+		foreach(lc, outerrel->uniquekeys)
+		{
+			/*
+			 * SEMI/ANTI join can be used to remove NULL values as well.
+			 * So we need to adjust multi_nulls for join.
+			 */
+			joinrel->uniquekeys = lappend(joinrel->uniquekeys,
+										  adjust_uniquekey_multinull_for_joinrel(root,
+																				 lfirst(lc),
+																				 joinrel,
+																				 false));
+			return;
+		}
+	}
+
+	if (outerrel->uniquekeys == NIL || innerrel->uniquekeys == NIL)
+		return;
+
+	switch(jointype)
+	{
+		case JOIN_INNER:
+			outeruk_still_valid = populate_joinrel_uniquekey_for_rel(root, joinrel, outerrel,
+																	 innerrel, restrictlist, jointype);
+			inneruk_still_valid = populate_joinrel_uniquekey_for_rel(root, joinrel, innerrel,
+																	 outerrel, restrictlist, jointype);
+			break;
+
+		case JOIN_LEFT:
+			/*
+			 * For left join, we are sure the innerrel's multi_nulls would be true
+			 * and it can't become to multi_nulls=false any more. so just discard it
+			 * and only check the outerrel and composited ones.
+			 */
+			outeruk_still_valid = populate_joinrel_uniquekey_for_rel(root, joinrel, outerrel,
+																	 innerrel, restrictlist, jointype);
+			break;
+
+		case JOIN_FULL:
+			/*
+			 * Both sides would contains multi_nulls, don't maintain it
+			 * any more.
+			 */
+			break;
+
+		default:
+			elog(ERROR, "unexpected join_type %d", jointype);
+	}
+
+	populate_joinrel_composited_uniquekey(root, joinrel,
+										  outerrel,
+										  innerrel,
+										  restrictlist,
+										  jointype,
+										  outeruk_still_valid,
+										  inneruk_still_valid);
+
+
+	return;
+}
+
 /*
  * relation_is_distinct_for
  *		Check if the relation is distinct for.
@@ -238,6 +333,251 @@ add_uniquekey_for_uniqueindex(PlannerInfo *root, IndexOptInfo *unique_index,
 											 used_for_distinct));
 	return false;
 }
+
+/*
+ * is_uniquekey_nulls_removed
+ *
+ *	note this function will not consider the OUTER JOIN impacts. Caller should
+ * take care of it.
+ *	-- Use my way temporary (RelOptInfo.notnull_attrs) until Tom's is ready.
+ */
+static bool
+is_uniquekey_nulls_removed(PlannerInfo *root,
+						   UniqueKey *ukey,
+						   RelOptInfo *joinrel)
+{
+	int i = -1;
+
+	while((i = bms_next_member(ukey->unique_expr_indexes, i)) >= 0)
+	{
+		Node *node = list_nth(root->unique_exprs, i);
+		List	*ecs;
+		ListCell	*lc;
+		if (IsA(node, SingleRow))
+			continue;
+		ecs = castNode(List, node);
+		foreach(lc, ecs)
+		{
+			EquivalenceClass *ec = lfirst_node(EquivalenceClass, lc);
+			ListCell *emc;
+			foreach(emc, ec->ec_members)
+			{
+				EquivalenceMember *em = lfirst_node(EquivalenceMember, emc);
+				int relid;
+				Var *var;
+				Bitmapset *notnull_attrs;
+				if (!bms_is_subset(em->em_relids, joinrel->relids))
+					continue;
+
+				if (!bms_get_singleton_member(em->em_relids, &relid))
+					continue;
+
+				if (!IsA(em->em_expr, Var))
+					continue;
+
+				var = castNode(Var, em->em_expr);
+
+				if (relid != var->varno)
+					continue;
+
+				notnull_attrs = joinrel->notnull_attrs[var->varno];
+
+				if (!bms_is_member(var->varattno - FirstLowInvalidHeapAttributeNumber,
+								   notnull_attrs))
+					return false;
+				else
+					break; /* Break to check next ECs */
+			}
+		}
+	}
+	return true;
+}
+
+/*
+ * adjust_uniquekey_multinull_for_joinrel
+ *
+ *	After the join, some NULL values can be removed due to join-clauses.
+ * but the outer join can generated null values again. Return the final
+ * state of the UniqueKey on joinrel.
+ */
+static UniqueKey *
+adjust_uniquekey_multinull_for_joinrel(PlannerInfo *root,
+									   UniqueKey *ukey,
+									   RelOptInfo *joinrel,
+									   bool below_outer_side)
+{
+	if (below_outer_side)
+	{
+		if (ukey->multi_nulls)
+			/* we need it to be multi_nulls, but it is already, just return it. */
+			return ukey;
+		else
+			/* we need it to be multi_nulls, but it is not, create a new one. */
+			return make_uniquekey(ukey->unique_expr_indexes,
+								  true,
+								  ukey->use_for_distinct);
+	}
+	else
+	{
+		/*
+		 * We need to check if the join clauses can remove the NULL values. However
+		 * if it doesn't contain NULL values at the first, we don't need to check it.
+		 */
+		if (!ukey->multi_nulls)
+			return ukey;
+		else
+		{
+			/*
+			 * Multi null values exists. It's time to check if the nulls values
+			 * are removed via outer join.
+			 */
+			if (!is_uniquekey_nulls_removed(root, ukey, joinrel))
+				/* null values can be removed, return the original one. */
+				return ukey;
+			else
+				return make_uniquekey(ukey->unique_expr_indexes,
+									  false, ukey->use_for_distinct);
+		}
+	}
+}
+
+/*
+ * populate_joinrel_uniquekey_for_rel
+ *
+ *    Check if rel.any_column = other_rel.unique_key_columns.
+ * The return value is if the rel->uniquekeys still valid.
+ */
+static bool
+populate_joinrel_uniquekey_for_rel(PlannerInfo *root, RelOptInfo *joinrel,
+								   RelOptInfo *rel, RelOptInfo *other_rel,
+								   List *restrictlist, JoinType type)
+{
+	bool	rel_keep_unique = false;
+	List *other_ecs = NIL;
+	Relids	other_relids = NULL;
+	ListCell	*lc;
+
+	/*
+	 * Gather all the other ECs regarding to rel, if all the unique ecs contains
+	 * in this list, then it hits our expectations.
+	 */
+	foreach(lc, restrictlist)
+	{
+		RestrictInfo *r = lfirst_node(RestrictInfo, lc);
+
+		if (r->mergeopfamilies == NIL)
+			continue;
+
+		if (bms_equal(r->left_relids, rel->relids) && r->right_ec != NULL)
+		{
+			other_ecs = lappend(other_ecs, r->right_ec);
+			other_relids = bms_add_members(other_relids, r->right_relids);
+		}
+		else if (bms_equal(r->right_relids, rel->relids) && r->left_ec != NULL)
+		{
+			other_ecs = lappend(other_ecs, r->right_ec);
+			other_relids = bms_add_members(other_relids, r->left_relids);
+		}
+	}
+
+	foreach(lc, other_rel->uniquekeys)
+	{
+		UniqueKey *ukey = lfirst_node(UniqueKey, lc);
+		if (uniquekey_contains_in(root, ukey, other_ecs, other_relids))
+		{
+			rel_keep_unique = true;
+			break;
+		}
+	}
+
+	if (!rel_keep_unique)
+		return false;
+
+	foreach(lc, rel->uniquekeys)
+	{
+
+		UniqueKey *ukey = lfirst_node(UniqueKey, lc);
+
+		if (is_uniquekey_useful_afterjoin(root, ukey, joinrel))
+		{
+			ukey = adjust_uniquekey_multinull_for_joinrel(root,
+														  ukey,
+														  joinrel,
+														  false /* outer_side, caller grantees this */);
+			joinrel->uniquekeys = lappend(joinrel->uniquekeys, ukey);
+		}
+	}
+
+	return true;
+}
+
+
+/*
+ * Populate_joinrel_composited_uniquekey
+ *
+ *	A composited unqiuekey is valid no matter with join type and restrictlist.
+ */
+static void
+populate_joinrel_composited_uniquekey(PlannerInfo *root,
+									  RelOptInfo *joinrel,
+									  RelOptInfo *outerrel,
+									  RelOptInfo *innerrel,
+									  List	*restrictlist,
+									  JoinType jointype,
+									  bool left_added,
+									  bool right_added)
+{
+	ListCell	*lc;
+	if (left_added || right_added)
+		/* No need to create the composited ones */
+		return;
+
+	foreach(lc, outerrel->uniquekeys)
+	{
+		UniqueKey	*outer_ukey = adjust_uniquekey_multinull_for_joinrel(root,
+																		 lfirst(lc),
+																		 joinrel,
+																		 jointype == JOIN_FULL);
+		ListCell	*lc2;
+
+		if (!is_uniquekey_useful_afterjoin(root, outer_ukey, joinrel))
+			continue;
+
+		foreach(lc2, innerrel->uniquekeys)
+		{
+			UniqueKey	*inner_ukey = adjust_uniquekey_multinull_for_joinrel(root,
+																			 lfirst(lc2),
+																			 joinrel,
+																			 (jointype == JOIN_FULL || jointype == JOIN_LEFT)
+				);
+
+			UniqueKey	*comp_ukey;
+
+			if (!is_uniquekey_useful_afterjoin(root, inner_ukey, joinrel))
+				continue;
+
+			comp_ukey = make_uniquekey(
+				/* unique_expr_indexes is easy, just union the both sides. */
+				bms_union(outer_ukey->unique_expr_indexes, inner_ukey->unique_expr_indexes),
+				/*
+				 * If both are !multi_nulls, then the composited one is !multi_null
+				 * no matter with jointype and join clauses. otherwise, it is multi
+				 * nulls no matter with other factors.
+				 *
+				 */
+				outer_ukey->multi_nulls || inner_ukey->multi_nulls,
+				/*
+				 * we need both sides are used in distinct to say the composited
+				 * one is used for distinct as well.
+				 */
+				outer_ukey->use_for_distinct && inner_ukey->use_for_distinct);
+
+			joinrel->uniquekeys = lappend(joinrel->uniquekeys, comp_ukey);
+		}
+	}
+}
+
+
 /*
  * uniquekey_contains_in
  *	Return if UniqueKey contains in the list of EquivalenceClass
@@ -333,6 +673,43 @@ unique_ecs_useful_for_merging(PlannerInfo *root, RelOptInfo *rel, List *unique_e
 
 	return true;
 }
+
+/*
+ * is_uniquekey_useful_afterjoin
+ *
+ *  is useful when it contains in distinct_pathkey or in mergable join clauses.
+ */
+static bool
+is_uniquekey_useful_afterjoin(PlannerInfo *root, UniqueKey *ukey,
+							 RelOptInfo *joinrel)
+{
+	int	i = -1;
+
+	if (ukey->use_for_distinct)
+		return true;
+
+	while((i = bms_next_member(ukey->unique_expr_indexes, i)) >= 0)
+	{
+		Node *exprs =  list_nth(root->unique_exprs, i);
+		if (IsA(exprs, List))
+		{
+			if (!unique_ecs_useful_for_merging(root, joinrel, (List *)exprs))
+				return false;
+		}
+		else
+		{
+			Assert(IsA(exprs, SingleRow));
+			/*
+			 * Ideally we should check if there are a expr on SingleRow
+			 * used in joinrel's joinclauses, but it can't be checked effectively
+			 * for now, so we just check the rest part. so just think
+			 * it is useful.
+			 */
+		}
+	}
+	return true;
+}
+
 /*
  *	make_uniquekey
  */
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index b75e1679e6..6695f9f2a6 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -846,6 +846,8 @@ build_join_rel(PlannerInfo *root,
 	}
 
 	set_joinrel_notnull_attrs(joinrel, outer_rel, inner_rel, restrictlist, sjinfo);
+	populate_joinrel_uniquekeys(root, joinrel, outer_rel, inner_rel,
+								restrictlist, sjinfo->jointype);
 
 	return joinrel;
 }
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 68b8b40ca9..f233837e59 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -265,6 +265,11 @@ extern void add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
 
 extern void populate_baserel_uniquekeys(PlannerInfo *root,
 										RelOptInfo *baserel);
+extern void populate_baserel_uniquekeys(PlannerInfo *root,
+										RelOptInfo *baserel);
+extern void populate_joinrel_uniquekeys(PlannerInfo *root, RelOptInfo *joinrel,
+										RelOptInfo *outerrel, RelOptInfo *innerrel,
+										List *restrictlist, JoinType jointype);
 extern bool relation_is_distinct_for(PlannerInfo *root, RelOptInfo *rel,
 									 List *distinct_pathkey);
 #endif							/* PATHS_H */
diff --git a/src/test/regress/expected/uniquekey.out b/src/test/regress/expected/uniquekey.out
index d9a8634e80..c2bd1fa619 100644
--- a/src/test/regress/expected/uniquekey.out
+++ b/src/test/regress/expected/uniquekey.out
@@ -80,3 +80,330 @@ EXPLAIN (COSTS OFF) SELECT DISTINCT ON(pk) d FROM uqk1;
  Seq Scan on uqk1
 (1 row)
 
+------------------------------------------------------
+-- Test UniqueKey on one side still valid after join.
+-----------------------------------------------------
+-- uqk1(c, d) is the uniquekey with mutli nulls at single relation access.
+-- so distinct is not no-op.
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.c, uqk1.d FROM uqk1, uqk2
+WHERE uqk1.a = uqk2.pk;
+                 QUERY PLAN                  
+---------------------------------------------
+ Unique
+   ->  Sort
+         Sort Key: uqk1.c, uqk1.d
+         ->  Hash Join
+               Hash Cond: (uqk1.a = uqk2.pk)
+               ->  Seq Scan on uqk1
+               ->  Hash
+                     ->  Seq Scan on uqk2
+(8 rows)
+
+-- Both uqk1 (c,d) are a valid uniquekey. 
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.c, uqk1.d FROM uqk1, uqk2
+WHERE uqk1.c is NOT NULL AND uqk1.a = uqk2.pk;
+           QUERY PLAN            
+---------------------------------
+ Hash Join
+   Hash Cond: (uqk1.a = uqk2.pk)
+   ->  Seq Scan on uqk1
+         Filter: (c IS NOT NULL)
+   ->  Hash
+         ->  Seq Scan on uqk2
+(6 rows)
+
+-- uqk1.c is null at baserel, but the null values are removed after join.
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.c, uqk1.d FROM uqk1, uqk2
+WHERE  uqk1.a = uqk2.pk and uqk1.c = uqk2.c;
+                       QUERY PLAN                        
+---------------------------------------------------------
+ Hash Join
+   Hash Cond: ((uqk1.a = uqk2.pk) AND (uqk1.c = uqk2.c))
+   ->  Seq Scan on uqk1
+   ->  Hash
+         ->  Seq Scan on uqk2
+(5 rows)
+
+-- uqk1.c is null at baserel, but the null values are removed after join
+-- but new null values are generated due to outer join again. so distinct
+-- is still needed.
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.c, uqk1.d FROM uqk1 right join uqk2
+on uqk1.a = uqk2.pk and uqk1.c = uqk2.c;
+                             QUERY PLAN                              
+---------------------------------------------------------------------
+ Unique
+   ->  Sort
+         Sort Key: uqk1.c, uqk1.d
+         ->  Hash Right Join
+               Hash Cond: ((uqk1.a = uqk2.pk) AND (uqk1.c = uqk2.c))
+               ->  Seq Scan on uqk1
+               ->  Hash
+                     ->  Seq Scan on uqk2
+(8 rows)
+
+------------------------------------------------------
+-- Test join: Composited UniqueKey
+-----------------------------------------------------
+-- both t1.pk and t1.pk is valid uniquekey.
+EXPLAIN SELECT DISTINCT t1.pk, t2.pk FROM uqk1 t1 cross join uqk2 t2;
+                            QUERY PLAN                             
+-------------------------------------------------------------------
+ Nested Loop  (cost=0.00..2.27 rows=15 width=8)
+   ->  Seq Scan on uqk1 t1  (cost=0.00..1.05 rows=5 width=4)
+   ->  Materialize  (cost=0.00..1.04 rows=3 width=4)
+         ->  Seq Scan on uqk2 t2  (cost=0.00..1.03 rows=3 width=4)
+(4 rows)
+
+SELECT DISTINCT t1.pk, t2.pk FROM uqk1 t1 cross join uqk2 t2 order by 1, 2;
+ pk | pk 
+----+----
+  1 |  1
+  1 |  4
+  1 |  5
+  2 |  1
+  2 |  4
+  2 |  5
+  3 |  1
+  3 |  4
+  3 |  5
+  4 |  1
+  4 |  4
+  4 |  5
+  5 |  1
+  5 |  4
+  5 |  5
+(15 rows)
+
+-- NOT OK, since t1.c includes multi nulls. 
+EXPLAIN SELECT DISTINCT t1.c, t1.d, t2.pk FROM uqk1 t1 cross join uqk2 t2 where t1.c is null;
+                               QUERY PLAN                                
+-------------------------------------------------------------------------
+ HashAggregate  (cost=2.20..2.26 rows=6 width=12)
+   Group Key: t1.c, t1.d, t2.pk
+   ->  Nested Loop  (cost=0.00..2.16 rows=6 width=12)
+         ->  Seq Scan on uqk2 t2  (cost=0.00..1.03 rows=3 width=4)
+         ->  Materialize  (cost=0.00..1.06 rows=2 width=8)
+               ->  Seq Scan on uqk1 t1  (cost=0.00..1.05 rows=2 width=8)
+                     Filter: (c IS NULL)
+(7 rows)
+
+SELECT DISTINCT t1.c, t1.d, t2.pk FROM uqk1 t1 cross join uqk2 t2 where t1.c is null order by 1, 2,3;
+ c | d | pk 
+---+---+----
+   | 4 |  1
+   | 4 |  4
+   | 4 |  5
+(3 rows)
+
+SELECT t1.c, t1.d, t2.pk FROM uqk1 t1 cross join uqk2 t2 where t1.c is null order by 1, 2,3;
+ c | d | pk 
+---+---+----
+   | 4 |  1
+   | 4 |  1
+   | 4 |  4
+   | 4 |  4
+   | 4 |  5
+   | 4 |  5
+(6 rows)
+
+-- let's remove the t1.c's multi null values
+EXPLAIN SELECT DISTINCT t1.c, t1.d, t2.pk FROM uqk1 t1 cross join uqk2 t2 where t1.c is not null;
+                            QUERY PLAN                             
+-------------------------------------------------------------------
+ Nested Loop  (cost=0.00..2.20 rows=9 width=12)
+   ->  Seq Scan on uqk1 t1  (cost=0.00..1.05 rows=3 width=8)
+         Filter: (c IS NOT NULL)
+   ->  Materialize  (cost=0.00..1.04 rows=3 width=4)
+         ->  Seq Scan on uqk2 t2  (cost=0.00..1.03 rows=3 width=4)
+(5 rows)
+
+SELECT DISTINCT t1.c, t1.d, t2.pk FROM uqk1 t1 cross join uqk2 t2 where t1.c is not null order by 1, 2, 3 ;
+ c | d | pk 
+---+---+----
+ 1 | 1 |  1
+ 1 | 1 |  4
+ 1 | 1 |  5
+ 2 | 2 |  1
+ 2 | 2 |  4
+ 2 | 2 |  5
+ 3 | 3 |  1
+ 3 | 3 |  4
+ 3 | 3 |  5
+(9 rows)
+
+SELECT t1.c, t1.d, t2.pk FROM uqk1 t1 cross join uqk2 t2 where t1.c is not null order by 1, 2, 3;
+ c | d | pk 
+---+---+----
+ 1 | 1 |  1
+ 1 | 1 |  4
+ 1 | 1 |  5
+ 2 | 2 |  1
+ 2 | 2 |  4
+ 2 | 2 |  5
+ 3 | 3 |  1
+ 3 | 3 |  4
+ 3 | 3 |  5
+(9 rows)
+
+-- test onerow case with composited cases.
+-- t2.c is onerow. OK
+EXPLAIN SELECT DISTINCT t1.c, t1.d, t2.c FROM uqk1 t1 cross join uqk2 t2 where t1.c is not null and t2.pk = 1;
+                         QUERY PLAN                          
+-------------------------------------------------------------
+ Nested Loop  (cost=0.00..2.12 rows=3 width=12)
+   ->  Seq Scan on uqk2 t2  (cost=0.00..1.04 rows=1 width=4)
+         Filter: (pk = 1)
+   ->  Seq Scan on uqk1 t1  (cost=0.00..1.05 rows=3 width=8)
+         Filter: (c IS NOT NULL)
+(5 rows)
+
+SELECT DISTINCT t1.c, t1.d, t2.c FROM uqk1 t1 cross join uqk2 t2 where t1.c is not null and t2.pk = 1;
+ c | d | c 
+---+---+---
+ 1 | 1 | 1
+ 2 | 2 | 1
+ 3 | 3 | 1
+(3 rows)
+
+SELECT t1.c, t1.d, t2.c FROM uqk1 t1 cross join uqk2 t2 where t1.c is not null and t2.pk = 1;
+ c | d | c 
+---+---+---
+ 1 | 1 | 1
+ 2 | 2 | 1
+ 3 | 3 | 1
+(3 rows)
+
+-- t2.c is onerow, but t1.c has multi-nulls, NOt OK.
+EXPLAIN SELECT DISTINCT t1.c, t1.d, t2.c FROM uqk1 t1 cross join uqk2 t2 where t1.c is null and t2.pk = 1;
+                               QUERY PLAN                                
+-------------------------------------------------------------------------
+ Unique  (cost=2.12..2.14 rows=2 width=12)
+   ->  Sort  (cost=2.12..2.12 rows=2 width=12)
+         Sort Key: t1.c, t1.d, t2.c
+         ->  Nested Loop  (cost=0.00..2.11 rows=2 width=12)
+               ->  Seq Scan on uqk2 t2  (cost=0.00..1.04 rows=1 width=4)
+                     Filter: (pk = 1)
+               ->  Seq Scan on uqk1 t1  (cost=0.00..1.05 rows=2 width=8)
+                     Filter: (c IS NULL)
+(8 rows)
+
+SELECT DISTINCT t1.c, t1.d, t2.c FROM uqk1 t1 cross join uqk2 t2 where t1.c is null and t2.pk = 1;
+ c | d | c 
+---+---+---
+   | 4 | 1
+(1 row)
+
+SELECT t1.c, t1.d, t2.c FROM uqk1 t1 cross join uqk2 t2 where t1.c is null and t2.pk = 1;
+ c | d | c 
+---+---+---
+   | 4 | 1
+   | 4 | 1
+(2 rows)
+
+-- Test Semi/Anti JOIN
+EXPLAIN (COSTS OFF) SELECT DISTINCT pk FROM uqk1 WHERE d in (SELECT d FROM uqk2);
+           QUERY PLAN           
+--------------------------------
+ Hash Semi Join
+   Hash Cond: (uqk1.d = uqk2.d)
+   ->  Seq Scan on uqk1
+   ->  Hash
+         ->  Seq Scan on uqk2
+(5 rows)
+
+EXPLAIN (COSTS OFF) SELECT DISTINCT pk FROM uqk1 WHERE d NOT in (SELECT d FROM uqk2);
+             QUERY PLAN             
+------------------------------------
+ Seq Scan on uqk1
+   Filter: (NOT (hashed SubPlan 1))
+   SubPlan 1
+     ->  Seq Scan on uqk2
+(4 rows)
+
+-----------------------------------
+-- Test Join: Special OneRow case.
+-----------------------------------
+-- Test Unique Key FOR one-row case, DISTINCT is NOT needed as well.
+-- uqk1.d is the a uniquekey due to onerow rule. uqk2.pk is pk.
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.d FROM uqk1, uqk2 WHERE uqk1.pk = 1 AND uqk1.c = uqk2.pk;
+            QUERY PLAN             
+-----------------------------------
+ Nested Loop
+   Join Filter: (uqk1.c = uqk2.pk)
+   ->  Seq Scan on uqk1
+         Filter: (pk = 1)
+   ->  Seq Scan on uqk2
+(5 rows)
+
+SELECT uqk1.d FROM uqk1, uqk2 WHERE uqk1.pk = 1 AND uqk1.c = uqk2.pk order BY 1;
+ d 
+---
+ 1
+(1 row)
+
+-- Both uqk1.d AND uqk2.c are the a uniquekey due to onerow rule
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.d FROM uqk1, uqk2 WHERE uqk1.pk = 1
+AND uqk2.pk = 1 AND uqk1.d = uqk2.d;
+            QUERY PLAN            
+----------------------------------
+ Nested Loop
+   Join Filter: (uqk1.d = uqk2.d)
+   ->  Seq Scan on uqk1
+         Filter: (pk = 1)
+   ->  Seq Scan on uqk2
+         Filter: (pk = 1)
+(6 rows)
+
+SELECT uqk1.d FROM uqk1, uqk2 WHERE uqk1.pk = 1 AND uqk2.pk = 1
+AND uqk1.d = uqk2.d order BY 1;
+ d 
+---
+ 1
+(1 row)
+
+-- Both UniqueKey in targetList, so distinct is not needed.
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.c, uqk2.c FROM uqk1, uqk2 WHERE uqk1.pk = 2 AND uqk2.pk = 1;
+        QUERY PLAN        
+--------------------------
+ Nested Loop
+   ->  Seq Scan on uqk1
+         Filter: (pk = 2)
+   ->  Seq Scan on uqk2
+         Filter: (pk = 1)
+(5 rows)
+
+SELECT uqk1.c, uqk2.c FROM uqk1, uqk2 WHERE uqk1.pk = 2 AND uqk2.pk = 1 order BY 1, 2;
+ c | c 
+---+---
+ 2 | 1
+(1 row)
+
+-----------------------------------------
+-- Test more non-unique cases after join.
+-----------------------------------------
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.pk FROM uqk1, uqk2 WHERE uqk1.c = uqk2.c;
+                 QUERY PLAN                 
+--------------------------------------------
+ Unique
+   ->  Sort
+         Sort Key: uqk1.pk
+         ->  Hash Join
+               Hash Cond: (uqk1.c = uqk2.c)
+               ->  Seq Scan on uqk1
+               ->  Hash
+                     ->  Seq Scan on uqk2
+(8 rows)
+
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.d FROM uqk1, uqk2 WHERE uqk1.pk = 1 AND uqk1.c = uqk2.c;
+                  QUERY PLAN                  
+----------------------------------------------
+ Unique
+   ->  Sort
+         Sort Key: uqk1.d
+         ->  Nested Loop
+               Join Filter: (uqk1.c = uqk2.c)
+               ->  Seq Scan on uqk1
+                     Filter: (pk = 1)
+               ->  Seq Scan on uqk2
+(8 rows)
+
diff --git a/src/test/regress/sql/uniquekey.sql b/src/test/regress/sql/uniquekey.sql
index a1b538d1c1..3f93872246 100644
--- a/src/test/regress/sql/uniquekey.sql
+++ b/src/test/regress/sql/uniquekey.sql
@@ -27,3 +27,82 @@ EXPLAIN (COSTS OFF) SELECT DISTINCT a FROM uqk1 WHERE c = 1 and d = 1;
 
 -- Test Distinct ON
 EXPLAIN (COSTS OFF) SELECT DISTINCT ON(pk) d FROM uqk1;
+
+------------------------------------------------------
+-- Test UniqueKey on one side still valid after join.
+-----------------------------------------------------
+-- uqk1(c, d) is the uniquekey with mutli nulls at single relation access.
+-- so distinct is not no-op.
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.c, uqk1.d FROM uqk1, uqk2
+WHERE uqk1.a = uqk2.pk;
+
+-- Both uqk1 (c,d) are a valid uniquekey. 
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.c, uqk1.d FROM uqk1, uqk2
+WHERE uqk1.c is NOT NULL AND uqk1.a = uqk2.pk;
+
+-- uqk1.c is null at baserel, but the null values are removed after join.
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.c, uqk1.d FROM uqk1, uqk2
+WHERE  uqk1.a = uqk2.pk and uqk1.c = uqk2.c;
+
+-- uqk1.c is null at baserel, but the null values are removed after join
+-- but new null values are generated due to outer join again. so distinct
+-- is still needed.
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.c, uqk1.d FROM uqk1 right join uqk2
+on uqk1.a = uqk2.pk and uqk1.c = uqk2.c;
+
+
+------------------------------------------------------
+-- Test join: Composited UniqueKey
+-----------------------------------------------------
+-- both t1.pk and t1.pk is valid uniquekey.
+EXPLAIN SELECT DISTINCT t1.pk, t2.pk FROM uqk1 t1 cross join uqk2 t2;
+SELECT DISTINCT t1.pk, t2.pk FROM uqk1 t1 cross join uqk2 t2 order by 1, 2;
+
+-- NOT OK, since t1.c includes multi nulls. 
+EXPLAIN SELECT DISTINCT t1.c, t1.d, t2.pk FROM uqk1 t1 cross join uqk2 t2 where t1.c is null;
+SELECT DISTINCT t1.c, t1.d, t2.pk FROM uqk1 t1 cross join uqk2 t2 where t1.c is null order by 1, 2,3;
+SELECT t1.c, t1.d, t2.pk FROM uqk1 t1 cross join uqk2 t2 where t1.c is null order by 1, 2,3;
+
+-- let's remove the t1.c's multi null values
+EXPLAIN SELECT DISTINCT t1.c, t1.d, t2.pk FROM uqk1 t1 cross join uqk2 t2 where t1.c is not null;
+SELECT DISTINCT t1.c, t1.d, t2.pk FROM uqk1 t1 cross join uqk2 t2 where t1.c is not null order by 1, 2, 3 ;
+SELECT t1.c, t1.d, t2.pk FROM uqk1 t1 cross join uqk2 t2 where t1.c is not null order by 1, 2, 3;
+
+-- test onerow case with composited cases.
+
+-- t2.c is onerow. OK
+EXPLAIN SELECT DISTINCT t1.c, t1.d, t2.c FROM uqk1 t1 cross join uqk2 t2 where t1.c is not null and t2.pk = 1;
+SELECT DISTINCT t1.c, t1.d, t2.c FROM uqk1 t1 cross join uqk2 t2 where t1.c is not null and t2.pk = 1;
+SELECT t1.c, t1.d, t2.c FROM uqk1 t1 cross join uqk2 t2 where t1.c is not null and t2.pk = 1;
+
+-- t2.c is onerow, but t1.c has multi-nulls, NOt OK.
+EXPLAIN SELECT DISTINCT t1.c, t1.d, t2.c FROM uqk1 t1 cross join uqk2 t2 where t1.c is null and t2.pk = 1;
+SELECT DISTINCT t1.c, t1.d, t2.c FROM uqk1 t1 cross join uqk2 t2 where t1.c is null and t2.pk = 1;
+SELECT t1.c, t1.d, t2.c FROM uqk1 t1 cross join uqk2 t2 where t1.c is null and t2.pk = 1;
+
+
+-- Test Semi/Anti JOIN
+EXPLAIN (COSTS OFF) SELECT DISTINCT pk FROM uqk1 WHERE d in (SELECT d FROM uqk2);
+EXPLAIN (COSTS OFF) SELECT DISTINCT pk FROM uqk1 WHERE d NOT in (SELECT d FROM uqk2);
+
+-----------------------------------
+-- Test Join: Special OneRow case.
+-----------------------------------
+-- Test Unique Key FOR one-row case, DISTINCT is NOT needed as well.
+-- uqk1.d is the a uniquekey due to onerow rule. uqk2.pk is pk.
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.d FROM uqk1, uqk2 WHERE uqk1.pk = 1 AND uqk1.c = uqk2.pk;
+SELECT uqk1.d FROM uqk1, uqk2 WHERE uqk1.pk = 1 AND uqk1.c = uqk2.pk order BY 1;
+-- Both uqk1.d AND uqk2.c are the a uniquekey due to onerow rule
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.d FROM uqk1, uqk2 WHERE uqk1.pk = 1
+AND uqk2.pk = 1 AND uqk1.d = uqk2.d;
+SELECT uqk1.d FROM uqk1, uqk2 WHERE uqk1.pk = 1 AND uqk2.pk = 1
+AND uqk1.d = uqk2.d order BY 1;
+-- Both UniqueKey in targetList, so distinct is not needed.
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.c, uqk2.c FROM uqk1, uqk2 WHERE uqk1.pk = 2 AND uqk2.pk = 1;
+SELECT uqk1.c, uqk2.c FROM uqk1, uqk2 WHERE uqk1.pk = 2 AND uqk2.pk = 1 order BY 1, 2;
+
+-----------------------------------------
+-- Test more non-unique cases after join.
+-----------------------------------------
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.pk FROM uqk1, uqk2 WHERE uqk1.c = uqk2.c;
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.d FROM uqk1, uqk2 WHERE uqk1.pk = 1 AND uqk1.c = uqk2.c;
-- 
2.21.0

v3-0003-add-the-not-null-attrs-for-RelOptInfo.-Here-is-ho.patchapplication/octet-stream; name=v3-0003-add-the-not-null-attrs-for-RelOptInfo.-Here-is-ho.patchDownload
From 867dbc9e866ce149b07ff55631598e0843046b61 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E4=B8=80=E6=8C=83?= <yizhi.fzh@alibaba-inc.com>
Date: Sat, 3 Jul 2021 21:06:32 +0800
Subject: [PATCH v3 3/6] add the not null attrs for RelOptInfo. Here is how it
 works.

For baserel, it records the notnull attrs as a bitmapset and store it to
RelOptInfo->notnull_attrs[0].  As for the joinrel, suppose the relids is {1,3,
5}, then the notnull_attrs[1/3/5] will be used to store notnull_attrs bitmapset
for relation 1,3,5 separately. I don't handle this stuff for all kinds of upper
relation and subquery so far since it doesn't pass the design review yet.
---
 notnulltest.sql                       | 24 ++++++++
 src/backend/nodes/bitmapset.c         | 14 +++++
 src/backend/optimizer/path/allpaths.c | 36 ++++++++++++
 src/backend/optimizer/util/plancat.c  |  9 +++
 src/backend/optimizer/util/relnode.c  | 80 +++++++++++++++++++++++++++
 src/include/nodes/bitmapset.h         |  1 +
 src/include/nodes/pathnodes.h         |  6 ++
 7 files changed, 170 insertions(+)
 create mode 100644 notnulltest.sql

diff --git a/notnulltest.sql b/notnulltest.sql
new file mode 100644
index 0000000000..2a36bd0c7f
--- /dev/null
+++ b/notnulltest.sql
@@ -0,0 +1,24 @@
+create table t1(a int, b int not null, c int, d int);
+create table t2(a int, b int not null, c int, d int);
+
+-- single rel
+select * from t1;
+select * from t1 where a > 1;
+select * from t2 where a > 1 or c > 1;
+
+-- partitioned relation.
+create table p (a int, b int, c int not null) partition by range(a);
+create table p_1 partition of p for values from (0) to (10000) partition by list(b);
+create table p_1_1(b int,  c int not null, a int);
+alter table p_1 attach partition p_1_1 for values in (1);
+
+
+select * from p;
+select * from p where a > 1;
+
+
+-- test join:
+select * from t1, t2 where t1.a = t2.c;
+select t1.a, t2.b, t2.c from t1 left join t2 on t1.a = t2.c;
+select * from t1 full join t2 on t1.a = t2.c;
+
diff --git a/src/backend/nodes/bitmapset.c b/src/backend/nodes/bitmapset.c
index 649478b0d4..fa71f36aaf 100644
--- a/src/backend/nodes/bitmapset.c
+++ b/src/backend/nodes/bitmapset.c
@@ -663,6 +663,20 @@ bms_num_members(const Bitmapset *a)
 	return result;
 }
 
+/*
+ * bms_max_member - the max member in this bitmap.
+ */
+int
+bms_max_member(const Bitmapset *a)
+{
+	int result;
+	if (a == NULL || bms_is_empty(a))
+		elog(ERROR, "Must be an non-empty bitmapset.");
+	result = (a->nwords - 1) * BITS_PER_BITMAPWORD;
+	result += bmw_leftmost_one_pos(a->words[a->nwords - 1]);
+	return result;
+}
+
 /*
  * bms_membership - does a set have zero, one, or multiple members?
  *
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 296dd75c1b..a63a3dfe00 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -354,6 +354,40 @@ set_base_rel_pathlists(PlannerInfo *root)
 	}
 }
 
+/*
+ * set_baserel_notnull_attrs
+ *
+ *	Set baserel's notnullattrs based on baserestrictinfo
+ */
+static void
+set_baserel_notnull_attrs(RelOptInfo *rel)
+{
+	List *clauses = extract_actual_clauses(rel->baserestrictinfo, false);
+	ListCell	*lc;
+	foreach(lc, find_nonnullable_vars((Node *)clauses))
+	{
+		Var *var = (Var *) lfirst(lc);
+		if (var->varno != rel->relid)
+		{
+			/* Lateral Join */
+			continue;
+		}
+		Assert(var->varno == rel->relid);
+		rel->notnull_attrs[0] = bms_add_member(rel->notnull_attrs[0],
+											   var->varattno - FirstLowInvalidHeapAttributeNumber);
+	}
+
+	/* Debug Only, Will be removed at last. */
+	if (false)
+	{
+		elog(INFO, "FirstLowInvalidHeapAttributeNumber = %d, BaseRel(%d), notnull_attrs = %s",
+			 FirstLowInvalidHeapAttributeNumber,
+			 rel->relid,
+			 bmsToString(rel->notnull_attrs[0])
+			);
+	}
+}
+
 /*
  * set_rel_size
  *	  Set size estimates for a base relation
@@ -457,6 +491,8 @@ set_rel_size(PlannerInfo *root, RelOptInfo *rel,
 		}
 	}
 
+	set_baserel_notnull_attrs(rel);
+
 	/*
 	 * We insist that all non-dummy rels have a nonzero rowcount estimate.
 	 */
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index c5194fdbbf..7d3b40090e 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -117,6 +117,7 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 	Relation	relation;
 	bool		hasindex;
 	List	   *indexinfos = NIL;
+	int		i;
 
 	/*
 	 * We need not lock the relation since it was already locked, either by
@@ -471,6 +472,14 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 	if (inhparent && relation->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
 		set_relation_partition_info(root, rel, relation);
 
+	for (i = 0; i < relation->rd_att->natts; i++)
+	{
+		FormData_pg_attribute attr = relation->rd_att->attrs[i];
+		if (attr.attnotnull)
+			rel->notnull_attrs[0] = bms_add_member(rel->notnull_attrs[0],
+												   attr.attnum - FirstLowInvalidHeapAttributeNumber);
+	}
+
 	table_close(relation, NoLock);
 
 	/*
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 47769cea45..f017a7d52d 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -16,6 +16,7 @@
 
 #include <limits.h>
 
+#include "access/sysattr.h"
 #include "miscadmin.h"
 #include "nodes/nodeFuncs.h"
 #include "optimizer/appendinfo.h"
@@ -259,6 +260,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
 	rel->all_partrels = NULL;
 	rel->partexprs = NULL;
 	rel->nullable_partexprs = NULL;
+	rel->notnull_attrs = palloc0(sizeof(Bitmapset *) * 1);
 
 	/*
 	 * Pass assorted information down the inheritance hierarchy.
@@ -557,6 +559,81 @@ add_join_rel(PlannerInfo *root, RelOptInfo *joinrel)
 	}
 }
 
+static void
+copy_notnull_attrs_to_joinrel(RelOptInfo *joinrel, RelOptInfo *rel)
+{
+	int relid;
+	if (bms_get_singleton_member(rel->relids, &relid))
+		joinrel->notnull_attrs[relid] = bms_copy(rel->notnull_attrs[0]);
+	else
+	{
+		relid = -1;
+		while ((relid = bms_next_member(rel->relids, relid)) >= 0)
+			joinrel->notnull_attrs[relid] = bms_copy(rel->notnull_attrs[relid]);
+	}
+}
+
+/*
+ *
+ */
+static void
+set_joinrel_notnull_attrs(RelOptInfo *joinrel,
+						  RelOptInfo *outer_rel,
+						  RelOptInfo *inner_rel,
+						  List *restrictlist,
+						  SpecialJoinInfo *sjinfo)
+{
+	if (sjinfo->jointype == JOIN_FULL)
+		/* Both sides are nullable. */
+		return;
+	/* If it is not FULL join, the outer side is not changed. */
+	copy_notnull_attrs_to_joinrel(joinrel, outer_rel);
+	switch(sjinfo->jointype)
+	{
+		case JOIN_ANTI:
+		case JOIN_SEMI:
+		case JOIN_INNER:
+			copy_notnull_attrs_to_joinrel(joinrel, inner_rel);
+			{
+				ListCell	*lc;
+				List *clauses = extract_actual_clauses(restrictlist, false);
+				foreach(lc, find_nonnullable_vars((Node *) clauses))
+				{
+					Var *var = lfirst_node(Var, lc);
+					if (!bms_is_member(var->varno, joinrel->relids))
+					{
+						/* lateral join */
+						continue;
+					}
+					joinrel->notnull_attrs[var->varno] = bms_add_member(
+						joinrel->notnull_attrs[var->varno],
+						var->varattno - FirstLowInvalidHeapAttributeNumber);
+				}
+			}
+			break;
+		case JOIN_LEFT:
+			break;
+		default:
+			elog(ERROR, "Unexpected join type %d", sjinfo->jointype);
+	}
+	/* Debug Only, will be removed at last. */
+	if (false)
+	{
+		int relid = -1;
+		int eLevel = INFO;
+		elog(eLevel, "Dump notnull for JoinRel(%s)", bmsToString(joinrel->relids));
+		while((relid = bms_next_member(joinrel->relids, relid)) >= 0)
+		{
+			Bitmapset *notnullattrs = joinrel->notnull_attrs[relid];
+			if (notnullattrs != NULL)
+				elog(eLevel, "FirstLowInvalidHeapAttributeNumber = %d, RELID = (%d), notnull_attrs: %s",
+					 FirstLowInvalidHeapAttributeNumber,
+					 relid,
+					 bmsToString(notnullattrs));
+		}
+	}
+
+}
 /*
  * build_join_rel
  *	  Returns relation entry corresponding to the union of two given rels,
@@ -674,6 +751,7 @@ build_join_rel(PlannerInfo *root,
 	joinrel->all_partrels = NULL;
 	joinrel->partexprs = NULL;
 	joinrel->nullable_partexprs = NULL;
+	joinrel->notnull_attrs = palloc0(sizeof(Bitmapset *) * (bms_max_member(joinrel->relids) + 1));
 
 	/* Compute information relevant to the foreign relations. */
 	set_foreign_rel_properties(joinrel, outer_rel, inner_rel);
@@ -765,6 +843,8 @@ build_join_rel(PlannerInfo *root,
 			lappend(root->join_rel_level[root->join_cur_level], joinrel);
 	}
 
+	set_joinrel_notnull_attrs(joinrel, outer_rel, inner_rel, restrictlist, sjinfo);
+
 	return joinrel;
 }
 
diff --git a/src/include/nodes/bitmapset.h b/src/include/nodes/bitmapset.h
index 1fd12de698..303873a546 100644
--- a/src/include/nodes/bitmapset.h
+++ b/src/include/nodes/bitmapset.h
@@ -94,6 +94,7 @@ extern bool bms_nonempty_difference(const Bitmapset *a, const Bitmapset *b);
 extern int	bms_singleton_member(const Bitmapset *a);
 extern bool bms_get_singleton_member(const Bitmapset *a, int *member);
 extern int	bms_num_members(const Bitmapset *a);
+extern int	bms_max_member(const Bitmapset *a);
 
 /* optimized tests when we don't need to know exact membership count: */
 extern BMS_Membership bms_membership(const Bitmapset *a);
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 6e068f2c8b..7bf1896e12 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -686,6 +686,12 @@ typedef struct RelOptInfo
 	/* default result targetlist for Paths scanning this relation */
 	struct PathTarget *reltarget;	/* list of Vars/Exprs, cost, width */
 
+	Bitmapset	**notnull_attrs; /* The attno which is not null after evalating
+								  * all the quals on this relation, for baserel,
+								  * the len would always 1. and for others the array
+								  * index is relid from relids.
+								  */
+
 	/* materialization information */
 	List	   *pathlist;		/* Path structures */
 	List	   *ppilist;		/* ParamPathInfos used in pathlist */
-- 
2.21.0

v3-0004-Support-UniqueKey-on-BaseRel.patchapplication/octet-stream; name=v3-0004-Support-UniqueKey-on-BaseRel.patchDownload
From ac411f30a1a30f58037234fffc7e88e6f2eb636c Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E4=B8=80=E6=8C=83?= <yizhi.fzh@alibaba-inc.com>
Date: Mon, 9 Aug 2021 19:10:35 +0800
Subject: [PATCH v3 4/6] Support UniqueKey on BaseRel.

---
 src/backend/optimizer/path/Makefile     |   3 +-
 src/backend/optimizer/path/allpaths.c   |   7 +-
 src/backend/optimizer/path/uniquekey.c  | 400 ++++++++++++++++++++++++
 src/backend/optimizer/plan/planner.c    |  10 +
 src/backend/optimizer/util/relnode.c    |   2 +
 src/include/nodes/nodes.h               |   3 +-
 src/include/nodes/pathnodes.h           |  20 ++
 src/include/optimizer/paths.h           |   4 +
 src/test/regress/expected/join.out      |  11 +-
 src/test/regress/expected/uniquekey.out |  82 +++++
 src/test/regress/parallel_schedule      |   2 +-
 src/test/regress/sql/uniquekey.sql      |  29 ++
 12 files changed, 562 insertions(+), 11 deletions(-)
 create mode 100644 src/backend/optimizer/path/uniquekey.c
 create mode 100644 src/test/regress/expected/uniquekey.out
 create mode 100644 src/test/regress/sql/uniquekey.sql

diff --git a/src/backend/optimizer/path/Makefile b/src/backend/optimizer/path/Makefile
index 1e199ff66f..63cc1505d9 100644
--- a/src/backend/optimizer/path/Makefile
+++ b/src/backend/optimizer/path/Makefile
@@ -21,6 +21,7 @@ OBJS = \
 	joinpath.o \
 	joinrels.o \
 	pathkeys.o \
-	tidpath.o
+	tidpath.o \
+	uniquekey.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index a63a3dfe00..343253d694 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -396,6 +396,9 @@ static void
 set_rel_size(PlannerInfo *root, RelOptInfo *rel,
 			 Index rti, RangeTblEntry *rte)
 {
+	/* Set the notnull before the UniqueKey population. */
+	set_baserel_notnull_attrs(rel);
+
 	if (rel->reloptkind == RELOPT_BASEREL &&
 		relation_excluded_by_constraints(root, rel, rte))
 	{
@@ -491,7 +494,7 @@ set_rel_size(PlannerInfo *root, RelOptInfo *rel,
 		}
 	}
 
-	set_baserel_notnull_attrs(rel);
+
 
 	/*
 	 * We insist that all non-dummy rels have a nonzero rowcount estimate.
@@ -616,6 +619,8 @@ set_plain_rel_size(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 	 */
 	check_index_predicates(root, rel);
 
+	populate_baserel_uniquekeys(root, rel);
+
 	/* Mark rel with estimated output rows, width, etc */
 	set_baserel_size_estimates(root, rel);
 }
diff --git a/src/backend/optimizer/path/uniquekey.c b/src/backend/optimizer/path/uniquekey.c
new file mode 100644
index 0000000000..c93075656d
--- /dev/null
+++ b/src/backend/optimizer/path/uniquekey.c
@@ -0,0 +1,400 @@
+/*-------------------------------------------------------------------------
+ *
+ * uniquekey.c
+ *	  Utilities for maintaining uniquekey.
+ *
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ *	  src/backend/optimizer/path/uniquekey.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/sysattr.h"
+#include "nodes/nodeFuncs.h"
+#include "nodes/pathnodes.h"
+#include "optimizer/optimizer.h"
+#include "optimizer/paths.h"
+
+
+/* Functions to populate UniqueKey */
+static bool add_uniquekey_for_uniqueindex(PlannerInfo *root,
+										  IndexOptInfo *unique_index,
+										  List *mergeable_const_peer,
+										  List *expr_opfamilies);
+
+/* UniqueKey is subset of .. */
+static bool uniquekey_contains_in(PlannerInfo *root, UniqueKey *ukey,
+								  List *ecs, Relids relids);
+
+/* Avoid useless UniqueKey. */
+static bool unique_ecs_useful_for_distinct(PlannerInfo *root, List *ecs);
+static bool unique_ecs_useful_for_merging(PlannerInfo *root, RelOptInfo *rel,
+										  List *unique_ecs);
+/* Helper functions to create UniqueKey. */
+static UniqueKey *make_uniquekey(Bitmapset *unique_expr_indexes,
+								 bool multi_null,
+								 bool useful_for_distinct);
+static void mark_rel_singlerow(PlannerInfo *root, RelOptInfo *rel);
+
+/* Debug only */
+static void print_uniquekey(PlannerInfo *root, RelOptInfo *rel);
+
+/*
+ * populate_baserel_uniquekeys
+ */
+void
+populate_baserel_uniquekeys(PlannerInfo *root, RelOptInfo *rel)
+{
+	ListCell	*lc;
+	List	*mergeable_const_peer = NIL, *expr_opfamilies = NIL;
+
+	/*
+	 * ColX = {Const} AND ColY = {Const2} AND ColZ > {Const3},
+	 * gather ColX and ColY into mergeable_const_peer.
+	 */
+	foreach(lc, rel->baserestrictinfo)
+	{
+		RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+
+		if (rinfo->mergeopfamilies == NIL)
+			continue;
+
+		if (bms_is_empty(rinfo->left_relids))
+			mergeable_const_peer = lappend(mergeable_const_peer, get_rightop(rinfo->clause));
+		else if (bms_is_empty(rinfo->right_relids))
+			mergeable_const_peer = lappend(mergeable_const_peer, get_leftop(rinfo->clause));
+		else
+			continue;
+		expr_opfamilies = lappend(expr_opfamilies, rinfo->mergeopfamilies);
+	}
+
+	foreach(lc, rel->indexlist)
+	{
+		IndexOptInfo *index = (IndexOptInfo *)lfirst(lc);
+		if (!index->unique || !index->immediate ||
+			(index->indpred != NIL && !index->predOK))
+			continue;
+
+		if (add_uniquekey_for_uniqueindex(root, index,
+										  mergeable_const_peer,
+										  expr_opfamilies))
+			/* Find a singlerow case, no need to go through any more. */
+			return;
+	}
+
+	print_uniquekey(root, rel);
+}
+
+/*
+ * relation_is_distinct_for
+ *		Check if the relation is distinct for.
+ */
+bool
+relation_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *distinct_pathkey)
+{
+	ListCell	*lc;
+	List	*lecs = NIL;
+	Relids	relids = NULL;
+	foreach(lc, distinct_pathkey)
+	{
+		PathKey *pathkey = lfirst(lc);
+		lecs = lappend(lecs, pathkey->pk_eclass);
+		/*
+		 * Note that ec_relids doesn't include child member, but
+		 * distinct would not operate on childrel as well.
+		 */
+		relids = bms_union(relids, pathkey->pk_eclass->ec_relids);
+	}
+
+	foreach(lc, rel->uniquekeys)
+	{
+		UniqueKey *ukey = lfirst(lc);
+		if (ukey->multi_nulls)
+			continue;
+
+		if (uniquekey_contains_in(root, ukey, lecs, relids))
+			return true;
+	}
+	return false;
+}
+
+/*
+ * add_uniquekey_for_uniqueindex
+ *	 populate a UniqueKey if necessary, return true iff the UniqueKey is an
+ * SingleRow.
+ */
+static bool
+add_uniquekey_for_uniqueindex(PlannerInfo *root, IndexOptInfo *unique_index,
+							  List *mergeable_const_peer, List *expr_opfamilies)
+{
+	List	*unique_exprs = NIL, *unique_ecs = NIL;
+	ListCell	*indexpr_item;
+	int	c = 0;
+	RelOptInfo *rel = unique_index->rel;
+	bool	multinull = false;
+	bool	used_for_distinct = false;
+	Bitmapset *unique_exprs_index;
+
+	indexpr_item = list_head(unique_index->indexprs);
+	/* Gather all the non-const exprs */
+	for (c = 0; c < unique_index->nkeycolumns; c++)
+	{
+		int attr = unique_index->indexkeys[c];
+		Expr *expr;
+		bool	matched_const = false;
+		ListCell	*lc1, *lc2;
+		if (attr > 0)
+		{
+			Var *var;
+			expr = list_nth_node(TargetEntry, unique_index->indextlist, c)->expr;
+			var = castNode(Var, expr);
+			Assert(IsA(expr, Var));
+			if (!bms_is_member(var->varattno - FirstLowInvalidHeapAttributeNumber,
+							  rel->notnull_attrs[0]))
+				multinull = true;
+		}
+		else if (attr == 0)
+		{
+			/* Expression index */
+			expr = lfirst(indexpr_item);
+			indexpr_item = lnext(unique_index->indexprs, indexpr_item);
+			/* We can't grantee an FuncExpr will not return NULLs */
+			multinull = true;
+		}
+		else /* attr < 0 */
+		{
+			/* Index on OID is possible, not handle it for now. */
+			return false;
+		}
+
+		/*
+		 * Check index_col = Const case with regarding to opfamily checking
+		 * If so, we can remove the index_col from the final UniqueKey->exprs.
+		 */
+		forboth(lc1, mergeable_const_peer, lc2, expr_opfamilies)
+		{
+			if (list_member_oid((List *) lfirst(lc2), unique_index->opfamily[c]) &&
+				match_index_to_operand((Node *) lfirst(lc1), c, unique_index))
+			{
+				matched_const = true;
+				break;
+			}
+		}
+
+		if (matched_const)
+			continue;
+
+		unique_exprs = lappend(unique_exprs, expr);
+	}
+
+	if (unique_exprs == NIL)
+	{
+		/*
+		 * SingleRow case. Checking if it is useful is ineffective
+		 * so just keep it.
+		 */
+		mark_rel_singlerow(root, rel);
+		return true;
+	}
+
+	unique_ecs = build_equivalanceclass_list_for_exprs(root, unique_exprs, rel);
+
+	if (unique_ecs == NIL)
+	{
+		/* It is neither used in distinct_pathkey nor mergeable clause */
+		return false;
+	}
+
+	/*
+	 * Check if we need to setup the UniqueKey and set the used_for_distinct accordingly.
+	 */
+	if (unique_ecs_useful_for_distinct(root, unique_ecs))
+	{
+		used_for_distinct = true;
+	}
+	else if (!unique_ecs_useful_for_merging(root, rel, unique_ecs))
+		/*
+		 * Neither used in distinct pathkey nor used in mergeable clause.
+		 * this is possible even if unique_ecs != NIL.
+		 */
+		return false;
+	else
+	{
+		/*
+		 * unique_ecs_useful_for_merging(root, rel, unique_ecs) is true,
+		 * we did nothing in this case.
+		 */
+	}
+	unique_exprs_index = bms_make_singleton(list_length(root->unique_exprs));
+	root->unique_exprs = lappend(root->unique_exprs, unique_ecs);
+	rel->uniquekeys = lappend(rel->uniquekeys,
+							  make_uniquekey(unique_exprs_index,
+											 multinull,
+											 used_for_distinct));
+	return false;
+}
+/*
+ * uniquekey_contains_in
+ *	Return if UniqueKey contains in the list of EquivalenceClass
+ * or the UniqueKey's SingleRow contains in relids.
+ *
+ */
+static bool
+uniquekey_contains_in(PlannerInfo *root, UniqueKey *ukey, List *ecs, Relids relids)
+{
+	int i = -1;
+	while ((i = bms_next_member(ukey->unique_expr_indexes, i)) >= 0)
+	{
+		Node *exprs = list_nth(root->unique_exprs, i);
+		if (IsA(exprs, SingleRow))
+		{
+			SingleRow *singlerow = castNode(SingleRow, exprs);
+			if (!bms_is_member(singlerow->relid, relids))
+				/*
+				 * UniqueKey request a ANY expr on relid on the relid(which
+				 * indicates we don't have other EquivalenceClass for this
+				 * relation), but the relid doesn't contains in relids, which
+				 * indicate there is no such Expr in target, then we are sure
+				 * to return false.
+				 */
+				return false;
+			else
+			{
+				/*
+				 * We have SingleRow on relid, and the relid is in relids.
+				 * We don't need to check any more for this expr. This is
+				 * right for sure.
+				 */
+			}
+		}
+		else
+		{
+			Assert(IsA(exprs, List));
+			if (!list_is_subset_ptr((List *)exprs, ecs))
+				return false;
+		}
+	}
+	return true;
+}
+
+/*
+ * unique_ecs_useful_for_distinct
+ *	return true if all the EquivalenceClass in ecs exists in root->distinct_pathkey.
+ */
+static bool
+unique_ecs_useful_for_distinct(PlannerInfo *root, List *ecs)
+{
+	ListCell *lc;
+	foreach(lc, ecs)
+	{
+		EquivalenceClass *ec = lfirst_node(EquivalenceClass, lc);
+		ListCell *p;
+		bool found = false;
+		foreach(p,  root->distinct_pathkeys)
+		{
+			PathKey *pathkey = lfirst_node(PathKey, p);
+			/*
+			 * Both of them should point to an element in root->eq_classes.
+			 * so the address should be same. and equal function doesn't
+			 * support EquivalenceClass yet.
+			 */
+			if (ec == pathkey->pk_eclass)
+			{
+				found = true;
+				break;
+			}
+		}
+		if (!found)
+			return false;
+	}
+	return true;
+}
+
+/*
+ * unique_ecs_useful_for_merging
+ *	return true if all the unique_ecs exists in rel's join restrictInfo.
+ */
+static bool
+unique_ecs_useful_for_merging(PlannerInfo *root, RelOptInfo *rel, List *unique_ecs)
+{
+	ListCell	*lc;
+
+	foreach(lc, unique_ecs)
+	{
+		EquivalenceClass *ec = lfirst(lc);
+		if (!ec_useful_for_merging(root, rel, ec))
+			return false;
+	}
+
+	return true;
+}
+/*
+ *	make_uniquekey
+ */
+static UniqueKey *
+make_uniquekey(Bitmapset *unique_expr_indexes, bool multi_null, bool useful_for_distinct)
+{
+	UniqueKey *ukey = makeNode(UniqueKey);
+	ukey->unique_expr_indexes = unique_expr_indexes;
+	ukey->multi_nulls = multi_null;
+	ukey->use_for_distinct = useful_for_distinct;
+	return ukey;
+}
+
+/*
+ * mark_rel_singlerow
+ *	mark a relation as singlerow.
+ */
+static void
+mark_rel_singlerow(PlannerInfo *root, RelOptInfo *rel)
+{
+	int exprs_pos = list_length(root->unique_exprs);
+	Bitmapset *unique_exprs_index = bms_make_singleton(exprs_pos);
+	SingleRow *singlerow = makeNode(SingleRow);
+	singlerow->relid = rel->relid;
+	root->unique_exprs = lappend(root->unique_exprs, singlerow);
+	rel->uniquekeys = list_make1(make_uniquekey(unique_exprs_index,
+												false /* multi-null */,
+												true /* arbitrary decision */));
+}
+
+/*
+ * print_uniquekey
+ *	Used for easier reivew, should be removed before commit.
+ */
+static void
+print_uniquekey(PlannerInfo *root, RelOptInfo *rel)
+{
+	if (false)
+	{
+		ListCell	*lc;
+		elog(INFO, "Rel = %s", bmsToString(rel->relids));
+		foreach(lc, rel->uniquekeys)
+		{
+			UniqueKey *ukey = lfirst_node(UniqueKey, lc);
+			int i = -1;
+			elog(INFO, "UNIQUEKEY{indexes=%s, multinull=%d}",
+				 bmsToString(ukey->unique_expr_indexes),
+				 ukey->multi_nulls
+				);
+
+			while ((i = bms_next_member(ukey->unique_expr_indexes, i)) >= 0)
+			{
+				Node *node = (Node *) list_nth(root->unique_exprs, i);
+				if (IsA(node, SingleRow))
+					elog(INFO,
+						 "Expr(%d) SingleRow{relid = %d}",
+						 i, castNode(SingleRow, node)->relid);
+				else
+					elog(INFO,
+						 "EC(%d), %s", i, nodeToString(node)
+						);
+			}
+		}
+	}
+}
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 2cd691191c..ed2cff00fc 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -4238,6 +4238,16 @@ create_distinct_paths(PlannerInfo *root,
 	Path	   *path;
 	ListCell   *lc;
 
+	/*
+	 * distinct_pathkeys may be NIL if it distinctClause is sortable.
+	 * see standard_qp_callback. But for the efficiency of relation_is_distinct_for
+	 * we can't use distinctClause (rather than EC) here. Fortunately not sortable
+	 * clause is rare in real case.
+	 */
+	if (root->distinct_pathkeys &&
+		relation_is_distinct_for(root, input_rel, root->distinct_pathkeys))
+		return input_rel;
+
 	/* For now, do all work in the (DISTINCT, NULL) upperrel */
 	distinct_rel = fetch_upper_rel(root, UPPERREL_DISTINCT, NULL);
 
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index f017a7d52d..b75e1679e6 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -261,6 +261,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
 	rel->partexprs = NULL;
 	rel->nullable_partexprs = NULL;
 	rel->notnull_attrs = palloc0(sizeof(Bitmapset *) * 1);
+	rel->uniquekeys = NIL;
 
 	/*
 	 * Pass assorted information down the inheritance hierarchy.
@@ -752,6 +753,7 @@ build_join_rel(PlannerInfo *root,
 	joinrel->partexprs = NULL;
 	joinrel->nullable_partexprs = NULL;
 	joinrel->notnull_attrs = palloc0(sizeof(Bitmapset *) * (bms_max_member(joinrel->relids) + 1));
+	joinrel->uniquekeys = NIL;
 
 	/* Compute information relevant to the foreign relations. */
 	set_foreign_rel_properties(joinrel, outer_rel, inner_rel);
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 6a4d82f0a8..1bf220f373 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -280,7 +280,8 @@ typedef enum NodeTag
 	T_RollupData,
 	T_GroupingSetData,
 	T_StatisticExtInfo,
-
+	T_UniqueKey,
+	T_SingleRow,
 	/*
 	 * TAGS FOR MEMORY NODES (memnodes.h)
 	 */
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 7bf1896e12..3581b03a75 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -246,6 +246,7 @@ struct PlannerInfo
 									 * subquery outputs */
 
 	List	   *eq_classes;		/* list of active EquivalenceClasses */
+	List	   *unique_exprs;		/* List of unique expr */
 
 	bool		ec_merging_done;	/* set true once ECs are canonical */
 
@@ -691,6 +692,7 @@ typedef struct RelOptInfo
 								  * the len would always 1. and for others the array
 								  * index is relid from relids.
 								  */
+	List		*uniquekeys; /* A list of UniqueKey. */
 
 	/* materialization information */
 	List	   *pathlist;		/* Path structures */
@@ -1070,6 +1072,24 @@ typedef struct PathKey
 	bool		pk_nulls_first; /* do NULLs come before normal values? */
 } PathKey;
 
+
+typedef struct UnqiueKey
+{
+	NodeTag	type;
+	Bitmapset	*unique_expr_indexes;
+	bool	multi_nulls;
+	bool	use_for_distinct;  /* true if it is used in distinct-pathkey, in this case
+								* we would never check if we should discard it during
+								* join search.
+								*/
+} UniqueKey;
+
+typedef struct SingleRow
+{
+	NodeTag	type;
+	Index		relid;
+} SingleRow;
+
 /*
  * VolatileFunctionStatus -- allows nodes to cache their
  * contain_volatile_functions properties. VOLATILITY_UNKNOWN means not yet
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index e813d82483..68b8b40ca9 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -263,4 +263,8 @@ extern PathKey *make_canonical_pathkey(PlannerInfo *root,
 extern void add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
 									List *live_childrels);
 
+extern void populate_baserel_uniquekeys(PlannerInfo *root,
+										RelOptInfo *baserel);
+extern bool relation_is_distinct_for(PlannerInfo *root, RelOptInfo *rel,
+									 List *distinct_pathkey);
 #endif							/* PATHS_H */
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index f3589d0dbb..db72374302 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -4612,18 +4612,15 @@ select d.* from d left join (select * from b group by b.id, b.c_id) s
 explain (costs off)
 select d.* from d left join (select distinct * from b) s
   on d.a = s.id;
-              QUERY PLAN              
---------------------------------------
+             QUERY PLAN             
+------------------------------------
  Merge Right Join
    Merge Cond: (b.id = d.a)
-   ->  Unique
-         ->  Sort
-               Sort Key: b.id, b.c_id
-               ->  Seq Scan on b
+   ->  Index Scan using b_pkey on b
    ->  Sort
          Sort Key: d.a
          ->  Seq Scan on d
-(9 rows)
+(6 rows)
 
 -- check join removal works when uniqueness of the join condition is enforced
 -- by a UNION
diff --git a/src/test/regress/expected/uniquekey.out b/src/test/regress/expected/uniquekey.out
new file mode 100644
index 0000000000..d9a8634e80
--- /dev/null
+++ b/src/test/regress/expected/uniquekey.out
@@ -0,0 +1,82 @@
+CREATE TABLE uqk1(a int, pk int primary key, c int,  d int);
+CREATE TABLE uqk2(a int, pk int primary key, c int,  d int);
+INSERT INTO uqk1 VALUES(1, 1, 1, 1), (2, 2, 2, 2), (3, 3, 3, 3), (4, 4, null, 4), (5, 5, null, 4);
+INSERT INTO uqk2 VALUES(1, 1, 1, 1), (4, 4, 4, 4), (5, 5, 5, 5);
+ANALYZE uqk1;
+ANALYZE uqk2;
+-- Test single table primary key.
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM uqk1;
+    QUERY PLAN    
+------------------
+ Seq Scan on uqk1
+(1 row)
+
+-- Test EC case.
+EXPLAIN (COSTS OFF) SELECT DISTINCT d FROM uqk1 WHERE d = pk;
+     QUERY PLAN     
+--------------------
+ Seq Scan on uqk1
+   Filter: (d = pk)
+(2 rows)
+
+-- Test UniqueKey indexes.
+CREATE UNIQUE INDEX uqk1_ukcd ON uqk1(c, d);
+-- Test not null quals and not null per catalog.
+EXPLAIN (COSTS OFF) SELECT DISTINCT c, d FROM uqk1;
+       QUERY PLAN       
+------------------------
+ HashAggregate
+   Group Key: c, d
+   ->  Seq Scan on uqk1
+(3 rows)
+
+EXPLAIN (COSTS OFF) SELECT DISTINCT c, d FROM uqk1 WHERE c is NOT NULL;
+              QUERY PLAN               
+---------------------------------------
+ Unique
+   ->  Sort
+         Sort Key: c, d
+         ->  Seq Scan on uqk1
+               Filter: (c IS NOT NULL)
+(5 rows)
+
+EXPLAIN (COSTS OFF) SELECT DISTINCT d FROM uqk1 WHERE c = 1;
+          QUERY PLAN           
+-------------------------------
+ Unique
+   ->  Sort
+         Sort Key: d
+         ->  Seq Scan on uqk1
+               Filter: (c = 1)
+(5 rows)
+
+ALTER TABLE uqk1 ALTER COLUMN d SET NOT NULL;
+EXPLAIN (COSTS OFF) SELECT DISTINCT c, d FROM uqk1 WHERE c is NOT NULL;
+        QUERY PLAN         
+---------------------------
+ Seq Scan on uqk1
+   Filter: (c IS NOT NULL)
+(2 rows)
+
+-- Test UniqueKey column reduction.
+EXPLAIN (COSTS OFF) SELECT DISTINCT d FROM uqk1 WHERE c = 1;
+    QUERY PLAN     
+-------------------
+ Seq Scan on uqk1
+   Filter: (c = 1)
+(2 rows)
+
+EXPLAIN (COSTS OFF) SELECT DISTINCT a FROM uqk1 WHERE c = 1 and d = 1;
+           QUERY PLAN            
+---------------------------------
+ Seq Scan on uqk1
+   Filter: ((c = 1) AND (d = 1))
+(2 rows)
+
+-- Test Distinct ON
+EXPLAIN (COSTS OFF) SELECT DISTINCT ON(pk) d FROM uqk1;
+    QUERY PLAN    
+------------------
+ Seq Scan on uqk1
+(1 row)
+
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 7be89178f0..41454448f1 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -72,7 +72,7 @@ test: sanity_check
 # ----------
 ignore: random
 test: select_into select_distinct select_distinct_on select_implicit select_having subselect union case join aggregates transactions random portals arrays btree_index hash_index update delete namespace prepared_xacts
-
+test: uniquekey
 # ----------
 # Another group of parallel tests
 # ----------
diff --git a/src/test/regress/sql/uniquekey.sql b/src/test/regress/sql/uniquekey.sql
new file mode 100644
index 0000000000..a1b538d1c1
--- /dev/null
+++ b/src/test/regress/sql/uniquekey.sql
@@ -0,0 +1,29 @@
+CREATE TABLE uqk1(a int, pk int primary key, c int,  d int);
+CREATE TABLE uqk2(a int, pk int primary key, c int,  d int);
+INSERT INTO uqk1 VALUES(1, 1, 1, 1), (2, 2, 2, 2), (3, 3, 3, 3), (4, 4, null, 4), (5, 5, null, 4);
+INSERT INTO uqk2 VALUES(1, 1, 1, 1), (4, 4, 4, 4), (5, 5, 5, 5);
+ANALYZE uqk1;
+ANALYZE uqk2;
+
+-- Test single table primary key.
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM uqk1;
+
+-- Test EC case.
+EXPLAIN (COSTS OFF) SELECT DISTINCT d FROM uqk1 WHERE d = pk;
+
+-- Test UniqueKey indexes.
+CREATE UNIQUE INDEX uqk1_ukcd ON uqk1(c, d);
+
+-- Test not null quals and not null per catalog.
+EXPLAIN (COSTS OFF) SELECT DISTINCT c, d FROM uqk1;
+EXPLAIN (COSTS OFF) SELECT DISTINCT c, d FROM uqk1 WHERE c is NOT NULL;
+EXPLAIN (COSTS OFF) SELECT DISTINCT d FROM uqk1 WHERE c = 1;
+ALTER TABLE uqk1 ALTER COLUMN d SET NOT NULL;
+EXPLAIN (COSTS OFF) SELECT DISTINCT c, d FROM uqk1 WHERE c is NOT NULL;
+
+-- Test UniqueKey column reduction.
+EXPLAIN (COSTS OFF) SELECT DISTINCT d FROM uqk1 WHERE c = 1;
+EXPLAIN (COSTS OFF) SELECT DISTINCT a FROM uqk1 WHERE c = 1 and d = 1;
+
+-- Test Distinct ON
+EXPLAIN (COSTS OFF) SELECT DISTINCT ON(pk) d FROM uqk1;
-- 
2.21.0

v3-0006-Maintain-the-UniqueKey-on-Subquery-and-UpperRel-l.patchapplication/octet-stream; name=v3-0006-Maintain-the-UniqueKey-on-Subquery-and-UpperRel-l.patchDownload
From e5e97f0006d9292632a32b8f53d639ec7ecfa298 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E4=B8=80=E6=8C=83?= <yizhi.fzh@alibaba-inc.com>
Date: Mon, 9 Aug 2021 19:46:13 +0800
Subject: [PATCH v3 6/6] Maintain the UniqueKey on Subquery and UpperRel level.

---
 src/backend/optimizer/path/allpaths.c   |   2 +
 src/backend/optimizer/path/pathkeys.c   |   3 +-
 src/backend/optimizer/path/uniquekey.c  | 179 ++++++++++++++++++++++++
 src/backend/optimizer/plan/planner.c    |  17 ++-
 src/include/optimizer/paths.h           |   6 +
 src/test/regress/expected/uniquekey.out |  36 +++++
 src/test/regress/sql/uniquekey.sql      |  21 +++
 7 files changed, 261 insertions(+), 3 deletions(-)

diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 343253d694..9225bf738e 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -2288,6 +2288,8 @@ set_subquery_pathlist(PlannerInfo *root, RelOptInfo *rel,
 		return;
 	}
 
+	populate_subquery_uniquekeys(root, rel, sub_final_rel);
+
 	/*
 	 * Mark rel with estimated output rows, width, etc.  Note that we have to
 	 * do this before generating outer-query paths, else cost_subqueryscan is
diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c
index 21f4f2d4bc..9a8a5b654c 100644
--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -33,7 +33,6 @@ static bool pathkey_is_redundant(PathKey *new_pathkey, List *pathkeys);
 static bool matches_boolean_partition_clause(RestrictInfo *rinfo,
 											 RelOptInfo *partrel,
 											 int partkeycol);
-static Var *find_var_for_subquery_tle(RelOptInfo *rel, TargetEntry *tle);
 static bool right_merge_direction(PlannerInfo *root, PathKey *pathkey);
 
 
@@ -1035,7 +1034,7 @@ convert_subquery_pathkeys(PlannerInfo *root, RelOptInfo *rel,
  * We need this to ensure that we don't return pathkeys describing values
  * that are unavailable above the level of the subquery scan.
  */
-static Var *
+Var *
 find_var_for_subquery_tle(RelOptInfo *rel, TargetEntry *tle)
 {
 	ListCell   *lc;
diff --git a/src/backend/optimizer/path/uniquekey.c b/src/backend/optimizer/path/uniquekey.c
index badc843253..3bc0b8f3cc 100644
--- a/src/backend/optimizer/path/uniquekey.c
+++ b/src/backend/optimizer/path/uniquekey.c
@@ -47,6 +47,13 @@ static void populate_joinrel_composited_uniquekey(PlannerInfo *root,
 												  bool outeruk_still_valid,
 												  bool inneruk_still_valid);
 
+static void convert_subquery_uniquekey(PlannerInfo *root, RelOptInfo *rel, UniqueKey *sub_ukey);
+static EquivalenceClass * find_outer_ec_with_subquery_em(PlannerInfo *root, RelOptInfo *rel,
+														 EquivalenceClass *sub_ec,
+														 EquivalenceMember *sub_em);
+static List *convert_subquery_eclass_list(PlannerInfo *root, RelOptInfo *rel,
+										  List *sub_eclass_list);
+
 /* UniqueKey is subset of .. */
 static bool uniquekey_contains_in(PlannerInfo *root, UniqueKey *ukey,
 								  List *ecs, Relids relids);
@@ -185,6 +192,57 @@ populate_joinrel_uniquekeys(PlannerInfo *root, RelOptInfo *joinrel,
 	return;
 }
 
+/*
+ * populate_subquery_uniquekeys
+ *
+ * 'rel': outer query's RelOptInfo for the subquery relation.
+ * 'subquery_uniquekeys': the subquery's output pathkeys, in its terms.
+ * 'subquery_tlist': the subquery's output targetlist, in its terms.
+ *
+ *  subquery issues: a). tlist mapping.  b). interesting uniquekey. c). not nulls.
+ */
+void
+populate_subquery_uniquekeys(PlannerInfo *root, RelOptInfo *rel, RelOptInfo *sub_final_rel)
+{
+	List	*sub_uniquekeys = sub_final_rel->uniquekeys;
+	ListCell	*lc;
+	foreach(lc, sub_uniquekeys)
+	{
+		UniqueKey *sub_ukey = lfirst_node(UniqueKey, lc);
+		convert_subquery_uniquekey(root, rel, sub_ukey);
+	}
+}
+
+/*
+ * populate_uniquekeys_from_pathkeys
+ *
+ */
+void
+populate_uniquekeys_from_pathkeys(PlannerInfo *root, RelOptInfo *rel, List *pathkeys)
+{
+	ListCell *lc;
+	List	*unique_exprs = NIL;
+	if (pathkeys == NIL)
+		return;
+	foreach(lc, pathkeys)
+	{
+		PathKey *pathkey = lfirst(lc);
+		unique_exprs = lappend(unique_exprs, pathkey->pk_eclass);
+	}
+	rel->uniquekeys = list_make1(
+		make_uniquekey(bms_make_singleton(list_length(root->unique_exprs)),
+					   false,
+					   true));
+	root->unique_exprs = lappend(root->unique_exprs, unique_exprs);
+}
+
+
+void
+simple_copy_uniquekeys(RelOptInfo *tarrel, RelOptInfo *srcrel)
+{
+	tarrel->uniquekeys = srcrel->uniquekeys;
+}
+
 /*
  * relation_is_distinct_for
  *		Check if the relation is distinct for.
@@ -710,6 +768,127 @@ is_uniquekey_useful_afterjoin(PlannerInfo *root, UniqueKey *ukey,
 	return true;
 }
 
+/*
+ * find_outer_ec_with_subquery_em
+ *
+ *	Given a em in subquery, return the related EquivalenceClass outside.
+ */
+static EquivalenceClass *
+find_outer_ec_with_subquery_em(PlannerInfo *root, RelOptInfo *rel,
+							   EquivalenceClass *sub_ec, EquivalenceMember *sub_em)
+{
+	TargetEntry *sub_tle;
+	Var *outer_var;
+	EquivalenceClass *outer_ec;
+
+	sub_tle = get_tle_from_expr(sub_em->em_expr, rel->subroot->processed_tlist);
+
+	if (!sub_tle)
+		return NULL;
+
+	outer_var = find_var_for_subquery_tle(rel, sub_tle);
+	if (!outer_var)
+		return NULL;
+
+	outer_ec = get_eclass_for_sort_expr(root,
+										(Expr *)outer_var,
+										NULL,
+										sub_ec->ec_opfamilies,
+										sub_em->em_datatype,
+										sub_ec->ec_collation,
+										0,
+										rel->relids,
+										false);
+	return outer_ec;
+}
+
+
+/*
+ * convert_subquery_eclass_list
+ *
+ *		Given a list of eclass in subquery, find the corresponding eclass in outer side.
+ * return NULL if no related eclass outside is found for any eclass in subquery.
+ */
+static List *
+convert_subquery_eclass_list(PlannerInfo *root, RelOptInfo *rel, List *sub_eclass_list)
+{
+	ListCell	*lc;
+	List	*ec_list = NIL;
+	foreach(lc, sub_eclass_list)
+	{
+		EquivalenceClass *sub_ec = lfirst_node(EquivalenceClass, lc);
+		EquivalenceClass *ec = NULL;
+		ListCell	*emc;
+		foreach(emc, sub_ec->ec_members)
+		{
+			EquivalenceMember *sub_em = lfirst(emc);
+			if ((ec = find_outer_ec_with_subquery_em(root, rel, sub_ec, sub_em)) != NULL)
+				break;
+		}
+		if (!ec)
+			return NIL;
+		ec_list = lappend(ec_list, ec);
+	}
+	return ec_list;
+}
+
+
+/*
+ * convert_subquery_uniquekey
+ *
+ */
+static void
+convert_subquery_uniquekey(PlannerInfo *root, RelOptInfo *rel, UniqueKey *sub_ukey)
+{
+	PlannerInfo *sub_root = rel->subroot;
+	List	*unique_exprs_list = NIL;
+	Bitmapset	*unique_exprs_indexes = NULL;
+	UniqueKey	*ukey = NULL;
+	int i = -1;
+	ListCell	*lc;
+	while((i = bms_next_member(sub_ukey->unique_expr_indexes, i)) >= 0)
+	{
+		Node *sub_eq_list = list_nth(sub_root->unique_exprs, i);
+		if (IsA(sub_eq_list, SingleRow))
+		{
+			/*
+			 * TODO: Unclear what to do, don't think it hard before the overall
+			 * design is accepted.
+			 */
+			return;
+		}
+		else
+		{
+			List *upper_eq_list;
+			Assert(IsA(sub_eq_list, List));
+			/*
+			 * Note: upper_eq_list is just part of uniquekey's exprs, to covert the whole
+			 * UniqueKey, we needs all the parts are shown in the upper rel.
+			 */
+			upper_eq_list = convert_subquery_eclass_list(root, rel, (List *)sub_eq_list);
+			if (upper_eq_list == NIL)
+			{
+				if (unique_exprs_list != NIL)
+					pfree(unique_exprs_list);
+				return;
+			}
+			unique_exprs_list = lappend(unique_exprs_list, upper_eq_list);
+		}
+	}
+
+	foreach(lc, unique_exprs_list)
+	{
+		unique_exprs_indexes = bms_add_member(unique_exprs_indexes, list_length(root->unique_exprs));
+		root->unique_exprs = lappend(root->unique_exprs, lfirst(lc));
+	}
+
+	ukey = make_uniquekey(unique_exprs_indexes,
+						  sub_ukey->multi_nulls,
+						  /* TODO: need check again, case SELECT * FROM (SELECT u FROM x OFFSET 0) v where x.u = 0; */
+						  true);
+	rel->uniquekeys = lappend(rel->uniquekeys, ukey);
+}
+
 /*
  *	make_uniquekey
  */
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index ed2cff00fc..a268737a4c 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -1650,7 +1650,7 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
 	 * Now we are prepared to build the final-output upperrel.
 	 */
 	final_rel = fetch_upper_rel(root, UPPERREL_FINAL, NULL);
-
+	simple_copy_uniquekeys(final_rel, current_rel);
 	/*
 	 * If the input rel is marked consider_parallel and there's nothing that's
 	 * not parallel-safe in the LIMIT clause, then the final_rel can be marked
@@ -3622,6 +3622,19 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
 									  gd,
 									  extra->targetList);
 
+	if (root->parse->groupingSets)
+	{
+		/* nothing to do */
+	}
+	else if (root->parse->groupClause && root->group_pathkeys != NIL)
+	{
+		populate_uniquekeys_from_pathkeys(root, grouped_rel, root->group_pathkeys);
+	}
+	else
+	{
+		/* SingleRow Case */
+	}
+
 	/* Build final grouping paths */
 	add_paths_to_grouping_rel(root, input_rel, grouped_rel,
 							  partially_grouped_rel, agg_costs, gd,
@@ -4251,6 +4264,8 @@ create_distinct_paths(PlannerInfo *root,
 	/* For now, do all work in the (DISTINCT, NULL) upperrel */
 	distinct_rel = fetch_upper_rel(root, UPPERREL_DISTINCT, NULL);
 
+	populate_uniquekeys_from_pathkeys(root, distinct_rel, root->distinct_pathkeys);
+
 	/*
 	 * We don't compute anything at this level, so distinct_rel will be
 	 * parallel-safe if the input rel is parallel-safe.  In particular, if
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index f233837e59..b9570807af 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -265,11 +265,17 @@ extern void add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
 
 extern void populate_baserel_uniquekeys(PlannerInfo *root,
 										RelOptInfo *baserel);
+extern Var *find_var_for_subquery_tle(RelOptInfo *rel, TargetEntry *tle);
 extern void populate_baserel_uniquekeys(PlannerInfo *root,
 										RelOptInfo *baserel);
 extern void populate_joinrel_uniquekeys(PlannerInfo *root, RelOptInfo *joinrel,
 										RelOptInfo *outerrel, RelOptInfo *innerrel,
 										List *restrictlist, JoinType jointype);
+extern void populate_uniquekeys_from_pathkeys(PlannerInfo *root, RelOptInfo *rel,
+											  List *pathkeys);
+extern void populate_subquery_uniquekeys(PlannerInfo *root, RelOptInfo *rel,
+										 RelOptInfo *sub_final_rel);
+extern void simple_copy_uniquekeys(RelOptInfo *srcrel, RelOptInfo *tarrel);
 extern bool relation_is_distinct_for(PlannerInfo *root, RelOptInfo *rel,
 									 List *distinct_pathkey);
 #endif							/* PATHS_H */
diff --git a/src/test/regress/expected/uniquekey.out b/src/test/regress/expected/uniquekey.out
index c2bd1fa619..a86e413fcf 100644
--- a/src/test/regress/expected/uniquekey.out
+++ b/src/test/regress/expected/uniquekey.out
@@ -407,3 +407,39 @@ EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.d FROM uqk1, uqk2 WHERE uqk1.pk = 1 AND
                ->  Seq Scan on uqk2
 (8 rows)
 
+-----------------------------------------
+-- Test DISTINCT/GROUP BY CASE.
+-----------------------------------------
+--------------------------------------------------------------------------------------------
+-- Test subquery cases.
+-- Note that current the UniqueKey still not push down the interesting UniqueKey to subquery.
+-- like uniquekey, so the below test case need a "DISTINCT" in subquery to make sure the
+-- UniqueKey is maintain.
+--------------------------------------------------------------------------------------------
+-- Test a normal case - one side
+EXPLAIN SELECT DISTINCT v.* FROM
+(SELECT DISTINCT uqk1.c, uqk1.d FROM uqk1, uqk2
+WHERE uqk1.a = uqk2.pk AND uqk1.c is not null offset 0) v;
+                           QUERY PLAN                           
+----------------------------------------------------------------
+ Hash Join  (cost=1.07..2.14 rows=2 width=8)
+   Hash Cond: (uqk1.a = uqk2.pk)
+   ->  Seq Scan on uqk1  (cost=0.00..1.05 rows=3 width=12)
+         Filter: (c IS NOT NULL)
+   ->  Hash  (cost=1.03..1.03 rows=3 width=4)
+         ->  Seq Scan on uqk2  (cost=0.00..1.03 rows=3 width=4)
+(6 rows)
+
+-- Test a normal case - composited side.
+EXPLAIN SELECT DISTINCT v.* FROM
+(SELECT DISTINCT t1.c, t1.d, t2.pk FROM uqk1 t1 cross join uqk2 t2 where t1.c is not null OFFSET 0)
+v;
+                            QUERY PLAN                             
+-------------------------------------------------------------------
+ Nested Loop  (cost=0.00..2.20 rows=9 width=12)
+   ->  Seq Scan on uqk1 t1  (cost=0.00..1.05 rows=3 width=8)
+         Filter: (c IS NOT NULL)
+   ->  Materialize  (cost=0.00..1.04 rows=3 width=4)
+         ->  Seq Scan on uqk2 t2  (cost=0.00..1.03 rows=3 width=4)
+(5 rows)
+
diff --git a/src/test/regress/sql/uniquekey.sql b/src/test/regress/sql/uniquekey.sql
index 3f93872246..22e236eafb 100644
--- a/src/test/regress/sql/uniquekey.sql
+++ b/src/test/regress/sql/uniquekey.sql
@@ -106,3 +106,24 @@ SELECT uqk1.c, uqk2.c FROM uqk1, uqk2 WHERE uqk1.pk = 2 AND uqk2.pk = 1 order BY
 -----------------------------------------
 EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.pk FROM uqk1, uqk2 WHERE uqk1.c = uqk2.c;
 EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.d FROM uqk1, uqk2 WHERE uqk1.pk = 1 AND uqk1.c = uqk2.c;
+
+-----------------------------------------
+-- Test DISTINCT/GROUP BY CASE.
+-----------------------------------------
+
+
+--------------------------------------------------------------------------------------------
+-- Test subquery cases.
+-- Note that current the UniqueKey still not push down the interesting UniqueKey to subquery.
+-- like uniquekey, so the below test case need a "DISTINCT" in subquery to make sure the
+-- UniqueKey is maintain.
+--------------------------------------------------------------------------------------------
+-- Test a normal case - one side
+EXPLAIN SELECT DISTINCT v.* FROM
+(SELECT DISTINCT uqk1.c, uqk1.d FROM uqk1, uqk2
+WHERE uqk1.a = uqk2.pk AND uqk1.c is not null offset 0) v;
+
+-- Test a normal case - composited side.
+EXPLAIN SELECT DISTINCT v.* FROM
+(SELECT DISTINCT t1.c, t1.d, t2.pk FROM uqk1 t1 cross join uqk2 t2 where t1.c is not null OFFSET 0)
+v;
-- 
2.21.0

v3-0001-Just-refactor-pathkeys_useful_for_merging-split-a.patchapplication/octet-stream; name=v3-0001-Just-refactor-pathkeys_useful_for_merging-split-a.patchDownload
From 32886c5b14709d1f7ad27e5e024c3e083b97026c Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E4=B8=80=E6=8C=83?= <yizhi.fzh@alibaba-inc.com>
Date: Sun, 25 Jul 2021 16:28:30 +0800
Subject: [PATCH v3 1/6] Just refactor pathkeys_useful_for_merging, split a new
 function

ec_useful_for_merging which is useful for UniqueKey as well.
---
 src/backend/optimizer/path/equivclass.c | 42 +++++++++++++++++++++++++
 src/backend/optimizer/path/pathkeys.c   | 33 +------------------
 src/include/optimizer/paths.h           |  2 ++
 3 files changed, 45 insertions(+), 32 deletions(-)

diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index 6f1abbe47d..2f688807af 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -960,6 +960,48 @@ find_em_expr_for_rel(EquivalenceClass *ec, RelOptInfo *rel)
 	return NULL;
 }
 
+/*
+ * ec_useful_for_merging
+ *	check if the ec exists in rel's merageable restrictinfo_lists.
+ */
+bool
+ec_useful_for_merging(PlannerInfo *root, RelOptInfo *rel,
+					  EquivalenceClass *ec)
+{
+	/*
+	 * First look into the EquivalenceClass to see if there are any members
+	 * not yet joined to the rel.
+	 */
+	if (rel->has_eclass_joins &&
+		eclass_useful_for_merging(root, ec, rel))
+		return true;
+	else
+	{
+		/*
+		 * Otherwise search the rel's joininfo list, which contains
+		 * non-EquivalenceClass-derivable join clauses that might
+		 * nonetheless be mergejoinable.
+		 */
+		ListCell	*j;
+		foreach(j, rel->joininfo)
+		{
+			RestrictInfo *restrictinfo = (RestrictInfo *) lfirst(j);
+
+			if (restrictinfo->mergeopfamilies == NIL)
+				continue;
+			update_mergeclause_eclasses(root, restrictinfo);
+
+			if (ec == restrictinfo->left_ec ||
+			    ec == restrictinfo->right_ec)
+			{
+				return true;
+			}
+		}
+	}
+
+	return false;
+}
+
 /*
  * relation_can_be_sorted_early
  *		Can this relation be sorted on this EC before the final output step?
diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c
index 216dd26385..21f4f2d4bc 100644
--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -1751,43 +1751,12 @@ pathkeys_useful_for_merging(PlannerInfo *root, RelOptInfo *rel, List *pathkeys)
 	{
 		PathKey    *pathkey = (PathKey *) lfirst(i);
 		bool		matched = false;
-		ListCell   *j;
 
 		/* If "wrong" direction, not useful for merging */
 		if (!right_merge_direction(root, pathkey))
 			break;
 
-		/*
-		 * First look into the EquivalenceClass of the pathkey, to see if
-		 * there are any members not yet joined to the rel.  If so, it's
-		 * surely possible to generate a mergejoin clause using them.
-		 */
-		if (rel->has_eclass_joins &&
-			eclass_useful_for_merging(root, pathkey->pk_eclass, rel))
-			matched = true;
-		else
-		{
-			/*
-			 * Otherwise search the rel's joininfo list, which contains
-			 * non-EquivalenceClass-derivable join clauses that might
-			 * nonetheless be mergejoinable.
-			 */
-			foreach(j, rel->joininfo)
-			{
-				RestrictInfo *restrictinfo = (RestrictInfo *) lfirst(j);
-
-				if (restrictinfo->mergeopfamilies == NIL)
-					continue;
-				update_mergeclause_eclasses(root, restrictinfo);
-
-				if (pathkey->pk_eclass == restrictinfo->left_ec ||
-					pathkey->pk_eclass == restrictinfo->right_ec)
-				{
-					matched = true;
-					break;
-				}
-			}
-		}
+		matched = ec_useful_for_merging(root, rel, pathkey->pk_eclass);
 
 		/*
 		 * If we didn't find a mergeclause, we're done --- any additional
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index f1d111063c..a29f616423 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -143,6 +143,8 @@ extern EquivalenceMember *find_computable_ec_member(PlannerInfo *root,
 													List *exprs,
 													Relids relids,
 													bool require_parallel_safe);
+extern bool ec_useful_for_merging(PlannerInfo *root, RelOptInfo *rel,
+								  EquivalenceClass *ec);
 extern Expr *find_em_expr_for_rel(EquivalenceClass *ec, RelOptInfo *rel);
 extern bool relation_can_be_sorted_early(PlannerInfo *root, RelOptInfo *rel,
 										 EquivalenceClass *ec,
-- 
2.21.0

#32Zhihong Yu
zyu@yugabyte.com
In reply to: Andy Fan (#31)
Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)

On Sun, Aug 15, 2021 at 7:33 AM Andy Fan <zhihui.fan1213@gmail.com> wrote:

Hi:

I have finished the parts for baserel, joinrel, subquery, distinctrel. I
think
the hardest ones have been verified. Here is the design overview.

1. Use EC instead of expr to cover more UniqueKey cases.
2. Redesign the UniqueKey as below:

@@ -246,6 +246,7 @@ struct PlannerInfo

List *eq_classes; /* list of active EquivalenceClasses */
+ List *unique_exprs; /* List of unique expr */

bool ec_merging_done; /* set true once ECs are canonical */

+typedef struct UniqueKey
+{
+ NodeTag type;
+ Bitmapset *unique_expr_indexes;
+ bool multi_nulls;
+} UniqueKey;
+

PlannerInfo.unique_exprs is a List of unique exprs. Unique Exprs is a set
of
EquivalenceClass. for example:

CREATE TABLE T1(A INT NOT NULL, B INT NOT NULL, C INT, pk INT primary
key);
CREATE UNIQUE INDEX ON t1(a, b);

SELECT DISTINCT * FROM T1 WHERE a = c;

Then we would have PlannerInfo.unique_exprs as below
[
[EC(a, c), EC(b)],
[EC(pk)]
]

RelOptInfo(t1) would have 2 UniqueKeys.
UniqueKey1 {unique_expr_indexes=bms{0}, multinull=false]
UniqueKey2 {unique_expr_indexes=bms{1}, multinull=false]

The design will benefit many table joins cases. For instance a 10- tables
join,
each table has a primary key (a, b). Then we would have a UniqueKey like
this.

JoinRel{1,2,3,4} - {t1.a, t1.b, t2.a, t2.b, t3.a, t3.b t4.a t4.b}
JoinRel{1,2,3,4,5} - {t1.a, t1.b, t2.a, t2.b, t3.a, t3.b t4.a t4.b t5.a
t5.b}

When more tables are joined, the list would be longer and longer, build
the list
consumes both CPU cycles and memory.

With the method as above, we can store it as:

root->unique_exprs = /* All the UniqueKey is stored once */
[
[t1.a, t1.b], -- EC is ignored in document.
[t2.a, t2.b],
[t3.a, t3.b],
[t4.a, t4.b],
[t5.a, t5.b],
[t6.a, t6.b],
[t7.a, t7.b],
[t8.a, t8.b],
[t9.a, t9.b],
[t10.a, t10.b],
]

JoinRel{1,2,3,4} - Bitmapset{0,1,2,3} -- one bitmapword.
JoinRel{1,2,3,4,5} - Bitmapset{0,1,2,3,4} -- one bitmapword.

The member in the bitmap is the index of root->unique_exprs rather than
root->eq_classes because we need to store the SingleRow case in
root->unqiue_exprs as well.

3. Define a new SingleRow node and use it in joinrel as well.

+typedef struct SingleRow
+{
+ NodeTag type;
+ Index relid;
+} SingleRow;

SELECT * FROM t1, t2 WHERE t2.pk = 1;

root->unique_exprs
[
[t1.a, t1.b],
SingleRow{relid=2}
]

JoinRel{t1} - Bitmapset{0}
JoinRel{t2} - Bitmapset{1}

JoinRelt{1, 2} Bitmapset{0, 1} -- SingleRow will never be expanded to
dedicated
exprs.

4. Interesting UniqueKey to remove the Useless UniqueKey as soon as
possible.

The overall idea is similar with PathKey, I distinguish the UniqueKey for
distinct purpose and useful_for_merging purpose.

SELECT DISTINCT pk FROM t; -- OK, maintain it all the time, just like
root->query_pathkey.

SELECT DISTINCT t2.c FROM t1, t2 WHERE t1.d = t2.pk; -- T2's UniqueKey PK
is
use before t1 join t2, but not useful after it.

Currently the known issue I didn't pass the "interesting UniqueKey" info to
subquery well [2], I'd like to talk more about this when we discuss about
UnqiueKey on subquery part.

5. relation_is_distinct_for

Now I design the function as

+ bool
+ relation_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List
*distinct_pathkey)

It is "List *distinct_pathkey", rather than "List *exprs". The reason
pathkey
has EC in it, and all the UniqueKey has EC as well. if so, the subset-like
checking is very effective. As for the distinct/group as no-op case, we
have
pathkey all the time. The only drawback of it is some clauses are
not-sortable,
in this case, the root->distinct_pathkey and root->group_pathkey is not
set. However it should be rare by practice, so I ignore this part for
now. After all, I can have a relation_is_disticnt_for version for Exprs. I
just
not implemented it so far.

6. EC overhead in UnqiueKey & UNION case.

Until now I didn't create any new EC for the UniqueKey case, I just used
the
existing ones. However I can't handle the case like

SELECT a, b FROM t1
UNION
SELECT x, y FROM t2;

In this case, there is no EC created with existing code. and I don't want
to
create them for the UniqueKey case as well. so my plan is just not to
handle
the case for UNION.

Since we need some big effort from the reviewer, I split the patch into
many
smaller chunks.

Patch 1 / Patch 2: I just split some code which UniqueKey uses but not
very
interrelated. Splitting them out to reduce the core patch size.
Patch 3: still the notnull stuff. This one doesn't play a big role
overall,
even if the design is changed at last, we can just modify some small
stuff. IMO,
I don't think it is a blocker issue to move on.
Patch 4: Support the UniqueKey for baserel.
Patch 5: Support the UniqueKey for joinrel.
Patch 6: Support the UniqueKey for some upper relation, like distinctrel,
groupby rel.

I'd suggest moving on like this:
1. Have an overall review to see if any outstanding issues the patch has.
2. If not, we can review and commit patch 1 & patch 2 to reduce the patch
size.
3. Decide which method to take for not null stuff. IMO Tom's method
would be a bit
complex and the benefit is not very clear to me[1]. So the choices
include: a). move on UniqueKey stuff until Tom's method is ready. b).
Move
on the UniqueKey with my notnull way, and changes to Tom's method when
necessary. I prefer method b).
4. Review & Commit the UniqueKey for BaseRel part.
5. Review & Commit the UniqueKey for JoinRel part.
6. Review & Commit the UniqueKey for SubQuery part *without* the
Interesting
UniqueKey well handled.
7. Review & Commit the UniqueKey for SubQuery part *with* the Interesting
UniqueKey well handled.
8. Discuss about the UniqueKey on partitioned rel case and
develop/review/commit
it.
9. Apply UniqueKey stuff on more user cases rather than DISTINCT.

What do you think about this?

[1]
/messages/by-id/CAApHDvo5b2pYX+dbPy+ysGBSarezRSfXthX32TZNFm0wPdfKGQ@mail.gmail.com
[2]
/messages/by-id/CAKU4AWo6-=9mg3UQ5UJhGCMw6wyTPyPGgV5oh6dFvwEN=+hb_w@mail.gmail.com

Thanks

Hi,
For v3-0005-Support-UniqueKey-on-JoinRel.patch :

+static void populate_joinrel_composited_uniquekey(PlannerInfo *root,

populate_joinrel_composited_uniquekey
-> populate_joinrel_composite_uniquekey (without the trailing d for
composite)

For populate_joinrel_uniquekeys():

+       foreach(lc, outerrel->uniquekeys)
+       {
...
+           return;

Should the remaining unique keys be considered ?

For populate_joinrel_uniquekey_for_rel():

+       else if (bms_equal(r->right_relids, rel->relids) && r->left_ec !=
NULL)
+       {
+           other_ecs = lappend(other_ecs, r->right_ec);
+           other_relids = bms_add_members(other_relids, r->left_relids);

It seems the append to other_ecs is the same as the one for the
`bms_equal(r->left_relids, rel->relids) && r->right_ec != NULL` case. Is
this correct ?

Cheers

#33Andy Fan
zhihui.fan1213@gmail.com
In reply to: Zhihong Yu (#32)
Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)

Hi Zhihong,

On Mon, Aug 16, 2021 at 12:35 AM Zhihong Yu <zyu@yugabyte.com> wrote:

On Sun, Aug 15, 2021 at 7:33 AM Andy Fan <zhihui.fan1213@gmail.com> wrote:

Hi:

I have finished the parts for baserel, joinrel, subquery, distinctrel. I think
the hardest ones have been verified. Here is the design overview.

1. Use EC instead of expr to cover more UniqueKey cases.
2. Redesign the UniqueKey as below:

@@ -246,6 +246,7 @@ struct PlannerInfo

List *eq_classes; /* list of active EquivalenceClasses */
+ List *unique_exprs; /* List of unique expr */

bool ec_merging_done; /* set true once ECs are canonical */

+typedef struct UniqueKey
+{
+ NodeTag type;
+ Bitmapset *unique_expr_indexes;
+ bool multi_nulls;
+} UniqueKey;
+

PlannerInfo.unique_exprs is a List of unique exprs. Unique Exprs is a set of
EquivalenceClass. for example:

CREATE TABLE T1(A INT NOT NULL, B INT NOT NULL, C INT, pk INT primary key);
CREATE UNIQUE INDEX ON t1(a, b);

SELECT DISTINCT * FROM T1 WHERE a = c;

Then we would have PlannerInfo.unique_exprs as below
[
[EC(a, c), EC(b)],
[EC(pk)]
]

RelOptInfo(t1) would have 2 UniqueKeys.
UniqueKey1 {unique_expr_indexes=bms{0}, multinull=false]
UniqueKey2 {unique_expr_indexes=bms{1}, multinull=false]

The design will benefit many table joins cases. For instance a 10- tables join,
each table has a primary key (a, b). Then we would have a UniqueKey like
this.

JoinRel{1,2,3,4} - {t1.a, t1.b, t2.a, t2.b, t3.a, t3.b t4.a t4.b}
JoinRel{1,2,3,4,5} - {t1.a, t1.b, t2.a, t2.b, t3.a, t3.b t4.a t4.b t5.a t5.b}

When more tables are joined, the list would be longer and longer, build the list
consumes both CPU cycles and memory.

With the method as above, we can store it as:

root->unique_exprs = /* All the UniqueKey is stored once */
[
[t1.a, t1.b], -- EC is ignored in document.
[t2.a, t2.b],
[t3.a, t3.b],
[t4.a, t4.b],
[t5.a, t5.b],
[t6.a, t6.b],
[t7.a, t7.b],
[t8.a, t8.b],
[t9.a, t9.b],
[t10.a, t10.b],
]

JoinRel{1,2,3,4} - Bitmapset{0,1,2,3} -- one bitmapword.
JoinRel{1,2,3,4,5} - Bitmapset{0,1,2,3,4} -- one bitmapword.

The member in the bitmap is the index of root->unique_exprs rather than
root->eq_classes because we need to store the SingleRow case in
root->unqiue_exprs as well.

3. Define a new SingleRow node and use it in joinrel as well.

+typedef struct SingleRow
+{
+ NodeTag type;
+ Index relid;
+} SingleRow;

SELECT * FROM t1, t2 WHERE t2.pk = 1;

root->unique_exprs
[
[t1.a, t1.b],
SingleRow{relid=2}
]

JoinRel{t1} - Bitmapset{0}
JoinRel{t2} - Bitmapset{1}

JoinRelt{1, 2} Bitmapset{0, 1} -- SingleRow will never be expanded to dedicated
exprs.

4. Interesting UniqueKey to remove the Useless UniqueKey as soon as possible.

The overall idea is similar with PathKey, I distinguish the UniqueKey for
distinct purpose and useful_for_merging purpose.

SELECT DISTINCT pk FROM t; -- OK, maintain it all the time, just like
root->query_pathkey.

SELECT DISTINCT t2.c FROM t1, t2 WHERE t1.d = t2.pk; -- T2's UniqueKey PK is
use before t1 join t2, but not useful after it.

Currently the known issue I didn't pass the "interesting UniqueKey" info to
subquery well [2], I'd like to talk more about this when we discuss about
UnqiueKey on subquery part.

5. relation_is_distinct_for

Now I design the function as

+ bool
+ relation_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List
*distinct_pathkey)

It is "List *distinct_pathkey", rather than "List *exprs". The reason pathkey
has EC in it, and all the UniqueKey has EC as well. if so, the subset-like
checking is very effective. As for the distinct/group as no-op case, we have
pathkey all the time. The only drawback of it is some clauses are not-sortable,
in this case, the root->distinct_pathkey and root->group_pathkey is not
set. However it should be rare by practice, so I ignore this part for
now. After all, I can have a relation_is_disticnt_for version for Exprs. I just
not implemented it so far.

6. EC overhead in UnqiueKey & UNION case.

Until now I didn't create any new EC for the UniqueKey case, I just used the
existing ones. However I can't handle the case like

SELECT a, b FROM t1
UNION
SELECT x, y FROM t2;

In this case, there is no EC created with existing code. and I don't want to
create them for the UniqueKey case as well. so my plan is just not to handle
the case for UNION.

Since we need some big effort from the reviewer, I split the patch into many
smaller chunks.

Patch 1 / Patch 2: I just split some code which UniqueKey uses but not very
interrelated. Splitting them out to reduce the core patch size.
Patch 3: still the notnull stuff. This one doesn't play a big role overall,
even if the design is changed at last, we can just modify some small stuff. IMO,
I don't think it is a blocker issue to move on.
Patch 4: Support the UniqueKey for baserel.
Patch 5: Support the UniqueKey for joinrel.
Patch 6: Support the UniqueKey for some upper relation, like distinctrel,
groupby rel.

I'd suggest moving on like this:
1. Have an overall review to see if any outstanding issues the patch has.
2. If not, we can review and commit patch 1 & patch 2 to reduce the patch size.
3. Decide which method to take for not null stuff. IMO Tom's method
would be a bit
complex and the benefit is not very clear to me[1]. So the choices
include: a). move on UniqueKey stuff until Tom's method is ready. b). Move
on the UniqueKey with my notnull way, and changes to Tom's method when
necessary. I prefer method b).
4. Review & Commit the UniqueKey for BaseRel part.
5. Review & Commit the UniqueKey for JoinRel part.
6. Review & Commit the UniqueKey for SubQuery part *without* the Interesting
UniqueKey well handled.
7. Review & Commit the UniqueKey for SubQuery part *with* the Interesting
UniqueKey well handled.
8. Discuss about the UniqueKey on partitioned rel case and develop/review/commit
it.
9. Apply UniqueKey stuff on more user cases rather than DISTINCT.

What do you think about this?

[1] /messages/by-id/CAApHDvo5b2pYX+dbPy+ysGBSarezRSfXthX32TZNFm0wPdfKGQ@mail.gmail.com
[2] /messages/by-id/CAKU4AWo6-=9mg3UQ5UJhGCMw6wyTPyPGgV5oh6dFvwEN=+hb_w@mail.gmail.com

Thanks

Hi,
For v3-0005-Support-UniqueKey-on-JoinRel.patch :

+static void populate_joinrel_composited_uniquekey(PlannerInfo *root,

populate_joinrel_composited_uniquekey -> populate_joinrel_composite_uniquekey (without the trailing d for composite)

For populate_joinrel_uniquekeys():

+       foreach(lc, outerrel->uniquekeys)
+       {
...
+           return;

Should the remaining unique keys be considered ?

For populate_joinrel_uniquekey_for_rel():

+       else if (bms_equal(r->right_relids, rel->relids) && r->left_ec != NULL)
+       {
+           other_ecs = lappend(other_ecs, r->right_ec);
+           other_relids = bms_add_members(other_relids, r->left_relids);

It seems the append to other_ecs is the same as the one for the `bms_equal(r->left_relids, rel->relids) && r->right_ec != NULL` case. Is this correct ?

Correct, both of them are bugs, I will fix them in the next version,
including the "tailing d".
Thanks for your review!

--
Best Regards
Andy Fan (https://www.aliyun.com/)

#34Andy Fan
zhihui.fan1213@gmail.com
In reply to: Andy Fan (#33)
6 attachment(s)
Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)

Hi:

patch v4 fixed the 2 bugs Zhihong reported.

Any feedback is welcome.

Attachments:

v4-0005-Support-UniqueKey-on-JoinRel.patchapplication/octet-stream; name=v4-0005-Support-UniqueKey-on-JoinRel.patchDownload
From e1940cc02e97da2e84fd062feecc0a38bdbedd4e Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E4=B8=80=E6=8C=83?= <yizhi.fzh@alibaba-inc.com>
Date: Mon, 9 Aug 2021 19:43:58 +0800
Subject: [PATCH v4 5/6] Support UniqueKey on JoinRel.

---
 src/backend/optimizer/path/uniquekey.c  | 379 ++++++++++++++++++++++++
 src/backend/optimizer/util/relnode.c    |   2 +
 src/include/optimizer/paths.h           |   5 +
 src/test/regress/expected/uniquekey.out | 327 ++++++++++++++++++++
 src/test/regress/sql/uniquekey.sql      |  79 +++++
 5 files changed, 792 insertions(+)

diff --git a/src/backend/optimizer/path/uniquekey.c b/src/backend/optimizer/path/uniquekey.c
index c93075656d..815ade02c3 100644
--- a/src/backend/optimizer/path/uniquekey.c
+++ b/src/backend/optimizer/path/uniquekey.c
@@ -27,6 +27,26 @@ static bool add_uniquekey_for_uniqueindex(PlannerInfo *root,
 										  List *mergeable_const_peer,
 										  List *expr_opfamilies);
 
+static bool is_uniquekey_nulls_removed(PlannerInfo *root,
+									   UniqueKey *ukey,
+									   RelOptInfo *rel);
+static UniqueKey *adjust_uniquekey_multinull_for_joinrel(PlannerInfo *root,
+														 UniqueKey *joinrel_ukey,
+														 RelOptInfo *rel,
+														 bool below_outer_side);
+
+static bool populate_joinrel_uniquekey_for_rel(PlannerInfo *root, RelOptInfo *joinrel,
+											   RelOptInfo *rel, RelOptInfo *other_rel,
+											   List *restrictlist, JoinType jointype);
+static void populate_joinrel_composite_uniquekey(PlannerInfo *root,
+												 RelOptInfo *joinrel,
+												 RelOptInfo *outerrel,
+												 RelOptInfo *innerrel,
+												 List	*restrictlist,
+												 JoinType jointype,
+												 bool outeruk_still_valid,
+												 bool inneruk_still_valid);
+
 /* UniqueKey is subset of .. */
 static bool uniquekey_contains_in(PlannerInfo *root, UniqueKey *ukey,
 								  List *ecs, Relids relids);
@@ -35,6 +55,9 @@ static bool uniquekey_contains_in(PlannerInfo *root, UniqueKey *ukey,
 static bool unique_ecs_useful_for_distinct(PlannerInfo *root, List *ecs);
 static bool unique_ecs_useful_for_merging(PlannerInfo *root, RelOptInfo *rel,
 										  List *unique_ecs);
+static bool is_uniquekey_useful_afterjoin(PlannerInfo *root, UniqueKey *ukey,
+										  RelOptInfo *joinrel);
+
 /* Helper functions to create UniqueKey. */
 static UniqueKey *make_uniquekey(Bitmapset *unique_expr_indexes,
 								 bool multi_null,
@@ -90,6 +113,78 @@ populate_baserel_uniquekeys(PlannerInfo *root, RelOptInfo *rel)
 	print_uniquekey(root, rel);
 }
 
+/*
+ * populate_joinrel_uniquekeys
+ */
+void
+populate_joinrel_uniquekeys(PlannerInfo *root, RelOptInfo *joinrel,
+							RelOptInfo *outerrel, RelOptInfo *innerrel,
+							List *restrictlist, JoinType jointype)
+{
+	bool outeruk_still_valid = false, inneruk_still_valid = false;
+	if (jointype == JOIN_SEMI || jointype == JOIN_ANTI)
+	{
+		ListCell	*lc;
+		foreach(lc, outerrel->uniquekeys)
+		{
+			/*
+			 * SEMI/ANTI join can be used to remove NULL values as well.
+			 * So we need to adjust multi_nulls for join.
+			 */
+			joinrel->uniquekeys = lappend(joinrel->uniquekeys,
+										  adjust_uniquekey_multinull_for_joinrel(root,
+																				 lfirst(lc),
+																				 joinrel,
+																				 false));
+		}
+		return;
+	}
+
+	if (outerrel->uniquekeys == NIL || innerrel->uniquekeys == NIL)
+		return;
+
+	switch(jointype)
+	{
+		case JOIN_INNER:
+			outeruk_still_valid = populate_joinrel_uniquekey_for_rel(root, joinrel, outerrel,
+																	 innerrel, restrictlist, jointype);
+			inneruk_still_valid = populate_joinrel_uniquekey_for_rel(root, joinrel, innerrel,
+																	 outerrel, restrictlist, jointype);
+			break;
+
+		case JOIN_LEFT:
+			/*
+			 * For left join, we are sure the innerrel's multi_nulls would be true
+			 * and it can't become to multi_nulls=false any more. so just discard it
+			 * and only check the outerrel and composited ones.
+			 */
+			outeruk_still_valid = populate_joinrel_uniquekey_for_rel(root, joinrel, outerrel,
+																	 innerrel, restrictlist, jointype);
+			break;
+
+		case JOIN_FULL:
+			/*
+			 * Both sides would contains multi_nulls, don't maintain it
+			 * any more.
+			 */
+			break;
+
+		default:
+			elog(ERROR, "unexpected join_type %d", jointype);
+	}
+
+	populate_joinrel_composite_uniquekey(root, joinrel,
+										 outerrel,
+										 innerrel,
+										 restrictlist,
+										 jointype,
+										 outeruk_still_valid,
+										 inneruk_still_valid);
+
+
+	return;
+}
+
 /*
  * relation_is_distinct_for
  *		Check if the relation is distinct for.
@@ -238,6 +333,253 @@ add_uniquekey_for_uniqueindex(PlannerInfo *root, IndexOptInfo *unique_index,
 											 used_for_distinct));
 	return false;
 }
+
+/*
+ * is_uniquekey_nulls_removed
+ *
+ *	note this function will not consider the OUTER JOIN impacts. Caller should
+ * take care of it.
+ *	-- Use my way temporary (RelOptInfo.notnull_attrs) until Tom's is ready.
+ */
+static bool
+is_uniquekey_nulls_removed(PlannerInfo *root,
+						   UniqueKey *ukey,
+						   RelOptInfo *joinrel)
+{
+	int i = -1;
+
+	while((i = bms_next_member(ukey->unique_expr_indexes, i)) >= 0)
+	{
+		Node *node = list_nth(root->unique_exprs, i);
+		List	*ecs;
+		ListCell	*lc;
+		if (IsA(node, SingleRow))
+			continue;
+		ecs = castNode(List, node);
+		foreach(lc, ecs)
+		{
+			EquivalenceClass *ec = lfirst_node(EquivalenceClass, lc);
+			ListCell *emc;
+			foreach(emc, ec->ec_members)
+			{
+				EquivalenceMember *em = lfirst_node(EquivalenceMember, emc);
+				int relid;
+				Var *var;
+				Bitmapset *notnull_attrs;
+				if (!bms_is_subset(em->em_relids, joinrel->relids))
+					continue;
+
+				if (!bms_get_singleton_member(em->em_relids, &relid))
+					continue;
+
+				if (!IsA(em->em_expr, Var))
+					continue;
+
+				var = castNode(Var, em->em_expr);
+
+				if (relid != var->varno)
+					continue;
+
+				notnull_attrs = joinrel->notnull_attrs[var->varno];
+
+				if (!bms_is_member(var->varattno - FirstLowInvalidHeapAttributeNumber,
+								   notnull_attrs))
+					return false;
+				else
+					break; /* Break to check next ECs */
+			}
+		}
+	}
+	return true;
+}
+
+/*
+ * adjust_uniquekey_multinull_for_joinrel
+ *
+ *	After the join, some NULL values can be removed due to join-clauses.
+ * but the outer join can generated null values again. Return the final
+ * state of the UniqueKey on joinrel.
+ */
+static UniqueKey *
+adjust_uniquekey_multinull_for_joinrel(PlannerInfo *root,
+									   UniqueKey *ukey,
+									   RelOptInfo *joinrel,
+									   bool below_outer_side)
+{
+	if (below_outer_side)
+	{
+		if (ukey->multi_nulls)
+			/* we need it to be multi_nulls, but it is already, just return it. */
+			return ukey;
+		else
+			/* we need it to be multi_nulls, but it is not, create a new one. */
+			return make_uniquekey(ukey->unique_expr_indexes,
+								  true,
+								  ukey->use_for_distinct);
+	}
+	else
+	{
+		/*
+		 * We need to check if the join clauses can remove the NULL values. However
+		 * if it doesn't contain NULL values at the first, we don't need to check it.
+		 */
+		if (!ukey->multi_nulls)
+			return ukey;
+		else
+		{
+			/*
+			 * Multi null values exists. It's time to check if the nulls values
+			 * are removed via outer join.
+			 */
+			if (!is_uniquekey_nulls_removed(root, ukey, joinrel))
+				/* null values can be removed, return the original one. */
+				return ukey;
+			else
+				return make_uniquekey(ukey->unique_expr_indexes,
+									  false, ukey->use_for_distinct);
+		}
+	}
+}
+
+/*
+ * populate_joinrel_uniquekey_for_rel
+ *
+ *    Check if rel.any_column = other_rel.unique_key_columns.
+ * The return value is if the rel->uniquekeys still valid. If
+ * yes, added the uniquekeys in rel to joinrel and return true.
+ * otherwise, return false.
+ */
+static bool
+populate_joinrel_uniquekey_for_rel(PlannerInfo *root, RelOptInfo *joinrel,
+								   RelOptInfo *rel, RelOptInfo *other_rel,
+								   List *restrictlist, JoinType type)
+{
+	bool	rel_keep_unique = false;
+	List *other_ecs = NIL;
+	Relids	other_relids = NULL;
+	ListCell	*lc;
+
+	/*
+	 * Gather all the other ECs regarding to rel, if all the unique ecs contains
+	 * in this list, then it hits our expectations.
+	 */
+	foreach(lc, restrictlist)
+	{
+		RestrictInfo *r = lfirst_node(RestrictInfo, lc);
+
+		if (r->mergeopfamilies == NIL)
+			continue;
+
+		if (bms_equal(r->left_relids, rel->relids) && r->right_ec != NULL)
+		{
+			other_ecs = lappend(other_ecs, r->right_ec);
+			other_relids = bms_add_members(other_relids, r->right_relids);
+		}
+		else if (bms_equal(r->right_relids, rel->relids) && r->left_ec != NULL)
+		{
+			other_ecs = lappend(other_ecs, r->right_ec);
+			other_relids = bms_add_members(other_relids, r->left_relids);
+		}
+	}
+
+	foreach(lc, other_rel->uniquekeys)
+	{
+		UniqueKey *ukey = lfirst_node(UniqueKey, lc);
+		if (uniquekey_contains_in(root, ukey, other_ecs, other_relids))
+		{
+			rel_keep_unique = true;
+			break;
+		}
+	}
+
+	if (!rel_keep_unique)
+		return false;
+
+	foreach(lc, rel->uniquekeys)
+	{
+
+		UniqueKey *ukey = lfirst_node(UniqueKey, lc);
+
+		if (is_uniquekey_useful_afterjoin(root, ukey, joinrel))
+		{
+			ukey = adjust_uniquekey_multinull_for_joinrel(root,
+														  ukey,
+														  joinrel,
+														  false /* outer_side, caller grantees this */);
+			joinrel->uniquekeys = lappend(joinrel->uniquekeys, ukey);
+		}
+	}
+
+	return true;
+}
+
+
+/*
+ * Populate_joinrel_composited_uniquekey
+ *
+ *	A composited unqiuekey is valid no matter with join type and restrictlist.
+ */
+static void
+populate_joinrel_composite_uniquekey(PlannerInfo *root,
+									 RelOptInfo *joinrel,
+									 RelOptInfo *outerrel,
+									 RelOptInfo *innerrel,
+									 List	*restrictlist,
+									 JoinType jointype,
+									 bool left_added,
+									 bool right_added)
+{
+	ListCell	*lc;
+	if (left_added || right_added)
+		/* No need to create the composited ones */
+		return;
+
+	foreach(lc, outerrel->uniquekeys)
+	{
+		UniqueKey	*outer_ukey = adjust_uniquekey_multinull_for_joinrel(root,
+																		 lfirst(lc),
+																		 joinrel,
+																		 jointype == JOIN_FULL);
+		ListCell	*lc2;
+
+		if (!is_uniquekey_useful_afterjoin(root, outer_ukey, joinrel))
+			continue;
+
+		foreach(lc2, innerrel->uniquekeys)
+		{
+			UniqueKey	*inner_ukey = adjust_uniquekey_multinull_for_joinrel(root,
+																			 lfirst(lc2),
+																			 joinrel,
+																			 (jointype == JOIN_FULL || jointype == JOIN_LEFT)
+				);
+
+			UniqueKey	*comp_ukey;
+
+			if (!is_uniquekey_useful_afterjoin(root, inner_ukey, joinrel))
+				continue;
+
+			comp_ukey = make_uniquekey(
+				/* unique_expr_indexes is easy, just union the both sides. */
+				bms_union(outer_ukey->unique_expr_indexes, inner_ukey->unique_expr_indexes),
+				/*
+				 * If both are !multi_nulls, then the composited one is !multi_null
+				 * no matter with jointype and join clauses. otherwise, it is multi
+				 * nulls no matter with other factors.
+				 *
+				 */
+				outer_ukey->multi_nulls || inner_ukey->multi_nulls,
+				/*
+				 * we need both sides are used in distinct to say the composited
+				 * one is used for distinct as well.
+				 */
+				outer_ukey->use_for_distinct && inner_ukey->use_for_distinct);
+
+			joinrel->uniquekeys = lappend(joinrel->uniquekeys, comp_ukey);
+		}
+	}
+}
+
+
 /*
  * uniquekey_contains_in
  *	Return if UniqueKey contains in the list of EquivalenceClass
@@ -333,6 +675,43 @@ unique_ecs_useful_for_merging(PlannerInfo *root, RelOptInfo *rel, List *unique_e
 
 	return true;
 }
+
+/*
+ * is_uniquekey_useful_afterjoin
+ *
+ *  is useful when it contains in distinct_pathkey or in mergable join clauses.
+ */
+static bool
+is_uniquekey_useful_afterjoin(PlannerInfo *root, UniqueKey *ukey,
+							 RelOptInfo *joinrel)
+{
+	int	i = -1;
+
+	if (ukey->use_for_distinct)
+		return true;
+
+	while((i = bms_next_member(ukey->unique_expr_indexes, i)) >= 0)
+	{
+		Node *exprs =  list_nth(root->unique_exprs, i);
+		if (IsA(exprs, List))
+		{
+			if (!unique_ecs_useful_for_merging(root, joinrel, (List *)exprs))
+				return false;
+		}
+		else
+		{
+			Assert(IsA(exprs, SingleRow));
+			/*
+			 * Ideally we should check if there are a expr on SingleRow
+			 * used in joinrel's joinclauses, but it can't be checked effectively
+			 * for now, so we just check the rest part. so just think
+			 * it is useful.
+			 */
+		}
+	}
+	return true;
+}
+
 /*
  *	make_uniquekey
  */
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index b75e1679e6..6695f9f2a6 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -846,6 +846,8 @@ build_join_rel(PlannerInfo *root,
 	}
 
 	set_joinrel_notnull_attrs(joinrel, outer_rel, inner_rel, restrictlist, sjinfo);
+	populate_joinrel_uniquekeys(root, joinrel, outer_rel, inner_rel,
+								restrictlist, sjinfo->jointype);
 
 	return joinrel;
 }
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 68b8b40ca9..f233837e59 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -265,6 +265,11 @@ extern void add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
 
 extern void populate_baserel_uniquekeys(PlannerInfo *root,
 										RelOptInfo *baserel);
+extern void populate_baserel_uniquekeys(PlannerInfo *root,
+										RelOptInfo *baserel);
+extern void populate_joinrel_uniquekeys(PlannerInfo *root, RelOptInfo *joinrel,
+										RelOptInfo *outerrel, RelOptInfo *innerrel,
+										List *restrictlist, JoinType jointype);
 extern bool relation_is_distinct_for(PlannerInfo *root, RelOptInfo *rel,
 									 List *distinct_pathkey);
 #endif							/* PATHS_H */
diff --git a/src/test/regress/expected/uniquekey.out b/src/test/regress/expected/uniquekey.out
index d9a8634e80..c2bd1fa619 100644
--- a/src/test/regress/expected/uniquekey.out
+++ b/src/test/regress/expected/uniquekey.out
@@ -80,3 +80,330 @@ EXPLAIN (COSTS OFF) SELECT DISTINCT ON(pk) d FROM uqk1;
  Seq Scan on uqk1
 (1 row)
 
+------------------------------------------------------
+-- Test UniqueKey on one side still valid after join.
+-----------------------------------------------------
+-- uqk1(c, d) is the uniquekey with mutli nulls at single relation access.
+-- so distinct is not no-op.
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.c, uqk1.d FROM uqk1, uqk2
+WHERE uqk1.a = uqk2.pk;
+                 QUERY PLAN                  
+---------------------------------------------
+ Unique
+   ->  Sort
+         Sort Key: uqk1.c, uqk1.d
+         ->  Hash Join
+               Hash Cond: (uqk1.a = uqk2.pk)
+               ->  Seq Scan on uqk1
+               ->  Hash
+                     ->  Seq Scan on uqk2
+(8 rows)
+
+-- Both uqk1 (c,d) are a valid uniquekey. 
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.c, uqk1.d FROM uqk1, uqk2
+WHERE uqk1.c is NOT NULL AND uqk1.a = uqk2.pk;
+           QUERY PLAN            
+---------------------------------
+ Hash Join
+   Hash Cond: (uqk1.a = uqk2.pk)
+   ->  Seq Scan on uqk1
+         Filter: (c IS NOT NULL)
+   ->  Hash
+         ->  Seq Scan on uqk2
+(6 rows)
+
+-- uqk1.c is null at baserel, but the null values are removed after join.
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.c, uqk1.d FROM uqk1, uqk2
+WHERE  uqk1.a = uqk2.pk and uqk1.c = uqk2.c;
+                       QUERY PLAN                        
+---------------------------------------------------------
+ Hash Join
+   Hash Cond: ((uqk1.a = uqk2.pk) AND (uqk1.c = uqk2.c))
+   ->  Seq Scan on uqk1
+   ->  Hash
+         ->  Seq Scan on uqk2
+(5 rows)
+
+-- uqk1.c is null at baserel, but the null values are removed after join
+-- but new null values are generated due to outer join again. so distinct
+-- is still needed.
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.c, uqk1.d FROM uqk1 right join uqk2
+on uqk1.a = uqk2.pk and uqk1.c = uqk2.c;
+                             QUERY PLAN                              
+---------------------------------------------------------------------
+ Unique
+   ->  Sort
+         Sort Key: uqk1.c, uqk1.d
+         ->  Hash Right Join
+               Hash Cond: ((uqk1.a = uqk2.pk) AND (uqk1.c = uqk2.c))
+               ->  Seq Scan on uqk1
+               ->  Hash
+                     ->  Seq Scan on uqk2
+(8 rows)
+
+------------------------------------------------------
+-- Test join: Composited UniqueKey
+-----------------------------------------------------
+-- both t1.pk and t1.pk is valid uniquekey.
+EXPLAIN SELECT DISTINCT t1.pk, t2.pk FROM uqk1 t1 cross join uqk2 t2;
+                            QUERY PLAN                             
+-------------------------------------------------------------------
+ Nested Loop  (cost=0.00..2.27 rows=15 width=8)
+   ->  Seq Scan on uqk1 t1  (cost=0.00..1.05 rows=5 width=4)
+   ->  Materialize  (cost=0.00..1.04 rows=3 width=4)
+         ->  Seq Scan on uqk2 t2  (cost=0.00..1.03 rows=3 width=4)
+(4 rows)
+
+SELECT DISTINCT t1.pk, t2.pk FROM uqk1 t1 cross join uqk2 t2 order by 1, 2;
+ pk | pk 
+----+----
+  1 |  1
+  1 |  4
+  1 |  5
+  2 |  1
+  2 |  4
+  2 |  5
+  3 |  1
+  3 |  4
+  3 |  5
+  4 |  1
+  4 |  4
+  4 |  5
+  5 |  1
+  5 |  4
+  5 |  5
+(15 rows)
+
+-- NOT OK, since t1.c includes multi nulls. 
+EXPLAIN SELECT DISTINCT t1.c, t1.d, t2.pk FROM uqk1 t1 cross join uqk2 t2 where t1.c is null;
+                               QUERY PLAN                                
+-------------------------------------------------------------------------
+ HashAggregate  (cost=2.20..2.26 rows=6 width=12)
+   Group Key: t1.c, t1.d, t2.pk
+   ->  Nested Loop  (cost=0.00..2.16 rows=6 width=12)
+         ->  Seq Scan on uqk2 t2  (cost=0.00..1.03 rows=3 width=4)
+         ->  Materialize  (cost=0.00..1.06 rows=2 width=8)
+               ->  Seq Scan on uqk1 t1  (cost=0.00..1.05 rows=2 width=8)
+                     Filter: (c IS NULL)
+(7 rows)
+
+SELECT DISTINCT t1.c, t1.d, t2.pk FROM uqk1 t1 cross join uqk2 t2 where t1.c is null order by 1, 2,3;
+ c | d | pk 
+---+---+----
+   | 4 |  1
+   | 4 |  4
+   | 4 |  5
+(3 rows)
+
+SELECT t1.c, t1.d, t2.pk FROM uqk1 t1 cross join uqk2 t2 where t1.c is null order by 1, 2,3;
+ c | d | pk 
+---+---+----
+   | 4 |  1
+   | 4 |  1
+   | 4 |  4
+   | 4 |  4
+   | 4 |  5
+   | 4 |  5
+(6 rows)
+
+-- let's remove the t1.c's multi null values
+EXPLAIN SELECT DISTINCT t1.c, t1.d, t2.pk FROM uqk1 t1 cross join uqk2 t2 where t1.c is not null;
+                            QUERY PLAN                             
+-------------------------------------------------------------------
+ Nested Loop  (cost=0.00..2.20 rows=9 width=12)
+   ->  Seq Scan on uqk1 t1  (cost=0.00..1.05 rows=3 width=8)
+         Filter: (c IS NOT NULL)
+   ->  Materialize  (cost=0.00..1.04 rows=3 width=4)
+         ->  Seq Scan on uqk2 t2  (cost=0.00..1.03 rows=3 width=4)
+(5 rows)
+
+SELECT DISTINCT t1.c, t1.d, t2.pk FROM uqk1 t1 cross join uqk2 t2 where t1.c is not null order by 1, 2, 3 ;
+ c | d | pk 
+---+---+----
+ 1 | 1 |  1
+ 1 | 1 |  4
+ 1 | 1 |  5
+ 2 | 2 |  1
+ 2 | 2 |  4
+ 2 | 2 |  5
+ 3 | 3 |  1
+ 3 | 3 |  4
+ 3 | 3 |  5
+(9 rows)
+
+SELECT t1.c, t1.d, t2.pk FROM uqk1 t1 cross join uqk2 t2 where t1.c is not null order by 1, 2, 3;
+ c | d | pk 
+---+---+----
+ 1 | 1 |  1
+ 1 | 1 |  4
+ 1 | 1 |  5
+ 2 | 2 |  1
+ 2 | 2 |  4
+ 2 | 2 |  5
+ 3 | 3 |  1
+ 3 | 3 |  4
+ 3 | 3 |  5
+(9 rows)
+
+-- test onerow case with composited cases.
+-- t2.c is onerow. OK
+EXPLAIN SELECT DISTINCT t1.c, t1.d, t2.c FROM uqk1 t1 cross join uqk2 t2 where t1.c is not null and t2.pk = 1;
+                         QUERY PLAN                          
+-------------------------------------------------------------
+ Nested Loop  (cost=0.00..2.12 rows=3 width=12)
+   ->  Seq Scan on uqk2 t2  (cost=0.00..1.04 rows=1 width=4)
+         Filter: (pk = 1)
+   ->  Seq Scan on uqk1 t1  (cost=0.00..1.05 rows=3 width=8)
+         Filter: (c IS NOT NULL)
+(5 rows)
+
+SELECT DISTINCT t1.c, t1.d, t2.c FROM uqk1 t1 cross join uqk2 t2 where t1.c is not null and t2.pk = 1;
+ c | d | c 
+---+---+---
+ 1 | 1 | 1
+ 2 | 2 | 1
+ 3 | 3 | 1
+(3 rows)
+
+SELECT t1.c, t1.d, t2.c FROM uqk1 t1 cross join uqk2 t2 where t1.c is not null and t2.pk = 1;
+ c | d | c 
+---+---+---
+ 1 | 1 | 1
+ 2 | 2 | 1
+ 3 | 3 | 1
+(3 rows)
+
+-- t2.c is onerow, but t1.c has multi-nulls, NOt OK.
+EXPLAIN SELECT DISTINCT t1.c, t1.d, t2.c FROM uqk1 t1 cross join uqk2 t2 where t1.c is null and t2.pk = 1;
+                               QUERY PLAN                                
+-------------------------------------------------------------------------
+ Unique  (cost=2.12..2.14 rows=2 width=12)
+   ->  Sort  (cost=2.12..2.12 rows=2 width=12)
+         Sort Key: t1.c, t1.d, t2.c
+         ->  Nested Loop  (cost=0.00..2.11 rows=2 width=12)
+               ->  Seq Scan on uqk2 t2  (cost=0.00..1.04 rows=1 width=4)
+                     Filter: (pk = 1)
+               ->  Seq Scan on uqk1 t1  (cost=0.00..1.05 rows=2 width=8)
+                     Filter: (c IS NULL)
+(8 rows)
+
+SELECT DISTINCT t1.c, t1.d, t2.c FROM uqk1 t1 cross join uqk2 t2 where t1.c is null and t2.pk = 1;
+ c | d | c 
+---+---+---
+   | 4 | 1
+(1 row)
+
+SELECT t1.c, t1.d, t2.c FROM uqk1 t1 cross join uqk2 t2 where t1.c is null and t2.pk = 1;
+ c | d | c 
+---+---+---
+   | 4 | 1
+   | 4 | 1
+(2 rows)
+
+-- Test Semi/Anti JOIN
+EXPLAIN (COSTS OFF) SELECT DISTINCT pk FROM uqk1 WHERE d in (SELECT d FROM uqk2);
+           QUERY PLAN           
+--------------------------------
+ Hash Semi Join
+   Hash Cond: (uqk1.d = uqk2.d)
+   ->  Seq Scan on uqk1
+   ->  Hash
+         ->  Seq Scan on uqk2
+(5 rows)
+
+EXPLAIN (COSTS OFF) SELECT DISTINCT pk FROM uqk1 WHERE d NOT in (SELECT d FROM uqk2);
+             QUERY PLAN             
+------------------------------------
+ Seq Scan on uqk1
+   Filter: (NOT (hashed SubPlan 1))
+   SubPlan 1
+     ->  Seq Scan on uqk2
+(4 rows)
+
+-----------------------------------
+-- Test Join: Special OneRow case.
+-----------------------------------
+-- Test Unique Key FOR one-row case, DISTINCT is NOT needed as well.
+-- uqk1.d is the a uniquekey due to onerow rule. uqk2.pk is pk.
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.d FROM uqk1, uqk2 WHERE uqk1.pk = 1 AND uqk1.c = uqk2.pk;
+            QUERY PLAN             
+-----------------------------------
+ Nested Loop
+   Join Filter: (uqk1.c = uqk2.pk)
+   ->  Seq Scan on uqk1
+         Filter: (pk = 1)
+   ->  Seq Scan on uqk2
+(5 rows)
+
+SELECT uqk1.d FROM uqk1, uqk2 WHERE uqk1.pk = 1 AND uqk1.c = uqk2.pk order BY 1;
+ d 
+---
+ 1
+(1 row)
+
+-- Both uqk1.d AND uqk2.c are the a uniquekey due to onerow rule
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.d FROM uqk1, uqk2 WHERE uqk1.pk = 1
+AND uqk2.pk = 1 AND uqk1.d = uqk2.d;
+            QUERY PLAN            
+----------------------------------
+ Nested Loop
+   Join Filter: (uqk1.d = uqk2.d)
+   ->  Seq Scan on uqk1
+         Filter: (pk = 1)
+   ->  Seq Scan on uqk2
+         Filter: (pk = 1)
+(6 rows)
+
+SELECT uqk1.d FROM uqk1, uqk2 WHERE uqk1.pk = 1 AND uqk2.pk = 1
+AND uqk1.d = uqk2.d order BY 1;
+ d 
+---
+ 1
+(1 row)
+
+-- Both UniqueKey in targetList, so distinct is not needed.
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.c, uqk2.c FROM uqk1, uqk2 WHERE uqk1.pk = 2 AND uqk2.pk = 1;
+        QUERY PLAN        
+--------------------------
+ Nested Loop
+   ->  Seq Scan on uqk1
+         Filter: (pk = 2)
+   ->  Seq Scan on uqk2
+         Filter: (pk = 1)
+(5 rows)
+
+SELECT uqk1.c, uqk2.c FROM uqk1, uqk2 WHERE uqk1.pk = 2 AND uqk2.pk = 1 order BY 1, 2;
+ c | c 
+---+---
+ 2 | 1
+(1 row)
+
+-----------------------------------------
+-- Test more non-unique cases after join.
+-----------------------------------------
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.pk FROM uqk1, uqk2 WHERE uqk1.c = uqk2.c;
+                 QUERY PLAN                 
+--------------------------------------------
+ Unique
+   ->  Sort
+         Sort Key: uqk1.pk
+         ->  Hash Join
+               Hash Cond: (uqk1.c = uqk2.c)
+               ->  Seq Scan on uqk1
+               ->  Hash
+                     ->  Seq Scan on uqk2
+(8 rows)
+
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.d FROM uqk1, uqk2 WHERE uqk1.pk = 1 AND uqk1.c = uqk2.c;
+                  QUERY PLAN                  
+----------------------------------------------
+ Unique
+   ->  Sort
+         Sort Key: uqk1.d
+         ->  Nested Loop
+               Join Filter: (uqk1.c = uqk2.c)
+               ->  Seq Scan on uqk1
+                     Filter: (pk = 1)
+               ->  Seq Scan on uqk2
+(8 rows)
+
diff --git a/src/test/regress/sql/uniquekey.sql b/src/test/regress/sql/uniquekey.sql
index a1b538d1c1..3f93872246 100644
--- a/src/test/regress/sql/uniquekey.sql
+++ b/src/test/regress/sql/uniquekey.sql
@@ -27,3 +27,82 @@ EXPLAIN (COSTS OFF) SELECT DISTINCT a FROM uqk1 WHERE c = 1 and d = 1;
 
 -- Test Distinct ON
 EXPLAIN (COSTS OFF) SELECT DISTINCT ON(pk) d FROM uqk1;
+
+------------------------------------------------------
+-- Test UniqueKey on one side still valid after join.
+-----------------------------------------------------
+-- uqk1(c, d) is the uniquekey with mutli nulls at single relation access.
+-- so distinct is not no-op.
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.c, uqk1.d FROM uqk1, uqk2
+WHERE uqk1.a = uqk2.pk;
+
+-- Both uqk1 (c,d) are a valid uniquekey. 
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.c, uqk1.d FROM uqk1, uqk2
+WHERE uqk1.c is NOT NULL AND uqk1.a = uqk2.pk;
+
+-- uqk1.c is null at baserel, but the null values are removed after join.
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.c, uqk1.d FROM uqk1, uqk2
+WHERE  uqk1.a = uqk2.pk and uqk1.c = uqk2.c;
+
+-- uqk1.c is null at baserel, but the null values are removed after join
+-- but new null values are generated due to outer join again. so distinct
+-- is still needed.
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.c, uqk1.d FROM uqk1 right join uqk2
+on uqk1.a = uqk2.pk and uqk1.c = uqk2.c;
+
+
+------------------------------------------------------
+-- Test join: Composited UniqueKey
+-----------------------------------------------------
+-- both t1.pk and t1.pk is valid uniquekey.
+EXPLAIN SELECT DISTINCT t1.pk, t2.pk FROM uqk1 t1 cross join uqk2 t2;
+SELECT DISTINCT t1.pk, t2.pk FROM uqk1 t1 cross join uqk2 t2 order by 1, 2;
+
+-- NOT OK, since t1.c includes multi nulls. 
+EXPLAIN SELECT DISTINCT t1.c, t1.d, t2.pk FROM uqk1 t1 cross join uqk2 t2 where t1.c is null;
+SELECT DISTINCT t1.c, t1.d, t2.pk FROM uqk1 t1 cross join uqk2 t2 where t1.c is null order by 1, 2,3;
+SELECT t1.c, t1.d, t2.pk FROM uqk1 t1 cross join uqk2 t2 where t1.c is null order by 1, 2,3;
+
+-- let's remove the t1.c's multi null values
+EXPLAIN SELECT DISTINCT t1.c, t1.d, t2.pk FROM uqk1 t1 cross join uqk2 t2 where t1.c is not null;
+SELECT DISTINCT t1.c, t1.d, t2.pk FROM uqk1 t1 cross join uqk2 t2 where t1.c is not null order by 1, 2, 3 ;
+SELECT t1.c, t1.d, t2.pk FROM uqk1 t1 cross join uqk2 t2 where t1.c is not null order by 1, 2, 3;
+
+-- test onerow case with composited cases.
+
+-- t2.c is onerow. OK
+EXPLAIN SELECT DISTINCT t1.c, t1.d, t2.c FROM uqk1 t1 cross join uqk2 t2 where t1.c is not null and t2.pk = 1;
+SELECT DISTINCT t1.c, t1.d, t2.c FROM uqk1 t1 cross join uqk2 t2 where t1.c is not null and t2.pk = 1;
+SELECT t1.c, t1.d, t2.c FROM uqk1 t1 cross join uqk2 t2 where t1.c is not null and t2.pk = 1;
+
+-- t2.c is onerow, but t1.c has multi-nulls, NOt OK.
+EXPLAIN SELECT DISTINCT t1.c, t1.d, t2.c FROM uqk1 t1 cross join uqk2 t2 where t1.c is null and t2.pk = 1;
+SELECT DISTINCT t1.c, t1.d, t2.c FROM uqk1 t1 cross join uqk2 t2 where t1.c is null and t2.pk = 1;
+SELECT t1.c, t1.d, t2.c FROM uqk1 t1 cross join uqk2 t2 where t1.c is null and t2.pk = 1;
+
+
+-- Test Semi/Anti JOIN
+EXPLAIN (COSTS OFF) SELECT DISTINCT pk FROM uqk1 WHERE d in (SELECT d FROM uqk2);
+EXPLAIN (COSTS OFF) SELECT DISTINCT pk FROM uqk1 WHERE d NOT in (SELECT d FROM uqk2);
+
+-----------------------------------
+-- Test Join: Special OneRow case.
+-----------------------------------
+-- Test Unique Key FOR one-row case, DISTINCT is NOT needed as well.
+-- uqk1.d is the a uniquekey due to onerow rule. uqk2.pk is pk.
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.d FROM uqk1, uqk2 WHERE uqk1.pk = 1 AND uqk1.c = uqk2.pk;
+SELECT uqk1.d FROM uqk1, uqk2 WHERE uqk1.pk = 1 AND uqk1.c = uqk2.pk order BY 1;
+-- Both uqk1.d AND uqk2.c are the a uniquekey due to onerow rule
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.d FROM uqk1, uqk2 WHERE uqk1.pk = 1
+AND uqk2.pk = 1 AND uqk1.d = uqk2.d;
+SELECT uqk1.d FROM uqk1, uqk2 WHERE uqk1.pk = 1 AND uqk2.pk = 1
+AND uqk1.d = uqk2.d order BY 1;
+-- Both UniqueKey in targetList, so distinct is not needed.
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.c, uqk2.c FROM uqk1, uqk2 WHERE uqk1.pk = 2 AND uqk2.pk = 1;
+SELECT uqk1.c, uqk2.c FROM uqk1, uqk2 WHERE uqk1.pk = 2 AND uqk2.pk = 1 order BY 1, 2;
+
+-----------------------------------------
+-- Test more non-unique cases after join.
+-----------------------------------------
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.pk FROM uqk1, uqk2 WHERE uqk1.c = uqk2.c;
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.d FROM uqk1, uqk2 WHERE uqk1.pk = 1 AND uqk1.c = uqk2.c;
-- 
2.21.0

v4-0002-Just-some-utils-functions.patchapplication/octet-stream; name=v4-0002-Just-some-utils-functions.patchDownload
From f2d62ff4ceca2a33e557e5fb8bc4a2f6f3de872a Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E4=B8=80=E6=8C=83?= <yizhi.fzh@alibaba-inc.com>
Date: Sun, 25 Jul 2021 16:26:45 +0800
Subject: [PATCH v4 2/6] Just some utils functions.

---
 src/backend/nodes/list.c                | 21 ++++++++++++
 src/backend/optimizer/path/equivclass.c | 45 +++++++++++++++++++++++++
 src/backend/optimizer/util/tlist.c      | 19 +++++++++++
 src/include/nodes/pg_list.h             |  2 ++
 src/include/optimizer/optimizer.h       |  1 +
 src/include/optimizer/paths.h           |  6 ++++
 6 files changed, 94 insertions(+)

diff --git a/src/backend/nodes/list.c b/src/backend/nodes/list.c
index 94fb236daf..8f2b00012e 100644
--- a/src/backend/nodes/list.c
+++ b/src/backend/nodes/list.c
@@ -702,6 +702,27 @@ list_member_oid(const List *list, Oid datum)
 	return false;
 }
 
+
+/*
+ * list_is_subset_ptr - is A a subset of B?
+ */
+bool
+list_is_subset_ptr(const List *a, const List *b)
+{
+	ListCell *lc;
+	Assert(IsPointerList(a));
+	check_list_invariants(a);
+	Assert(IsPointerList(b));
+	check_list_invariants(b);
+	foreach(lc, a)
+	{
+		if (!list_member_ptr(b, lfirst(lc)))
+			return false;
+	}
+	return true;
+}
+
+
 /*
  * Delete the n'th cell (counting from 0) in list.
  *
diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index 2f688807af..b8ada6e8a3 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -770,6 +770,26 @@ get_eclass_for_sort_expr(PlannerInfo *root,
 	return newec;
 }
 
+/*
+ * find_ec_member_matching_expr
+ *		Locate an EquivalenceClass matching the given expr, if any;
+ *		return NULL if no match.
+ */
+EquivalenceClass *
+find_ec_matching_expr(PlannerInfo *root,
+					  Expr *expr,
+					  RelOptInfo *baserel)
+{
+	int i = -1;
+	while ((i = bms_next_member(baserel->eclass_indexes, i)) >= 0)
+	{
+		EquivalenceClass *ec = list_nth(root->eq_classes, i);
+		if (find_ec_member_matching_expr(ec, expr, baserel->relids))
+			return ec;
+	}
+	return NULL;
+}
+
 /*
  * find_ec_member_matching_expr
  *		Locate an EquivalenceClass member matching the given expr, if any;
@@ -960,6 +980,31 @@ find_em_expr_for_rel(EquivalenceClass *ec, RelOptInfo *rel)
 	return NULL;
 }
 
+
+/*
+ * build_equivalanceclass_list_for_exprs
+ *
+ * 	Given a list of expr, find the related ECs for everyone of them.
+ * if any exprs has no EC related, return NIL.
+ */
+List *
+build_equivalanceclass_list_for_exprs(PlannerInfo *root,
+									  List *exprs,
+									  RelOptInfo *rel)
+{
+	ListCell	*lc;
+	List	*ecs = NIL;
+
+	foreach(lc, exprs)
+	{
+		EquivalenceClass *ec = find_ec_matching_expr(root, lfirst(lc), rel);
+		if (!ec)
+			return NIL;
+		ecs = lappend(ecs, ec);
+	}
+	return ecs;
+}
+
 /*
  * ec_useful_for_merging
  *	check if the ec exists in rel's merageable restrictinfo_lists.
diff --git a/src/backend/optimizer/util/tlist.c b/src/backend/optimizer/util/tlist.c
index 311579d059..5d8493c8d6 100644
--- a/src/backend/optimizer/util/tlist.c
+++ b/src/backend/optimizer/util/tlist.c
@@ -359,6 +359,25 @@ get_sortgroupclause_tle(SortGroupClause *sgClause,
 	return get_sortgroupref_tle(sgClause->tleSortGroupRef, targetList);
 }
 
+/*
+ * get_tle_from_expr
+ *
+ *	Find the targetlist entry matching the given expr and return it.
+ */
+TargetEntry *
+get_tle_from_expr(Expr *expr, List *targetlist)
+{
+	ListCell	*lc;
+
+	foreach(lc, targetlist)
+	{
+		TargetEntry *tle = (TargetEntry *) lfirst(lc);
+
+		if (equal(tle->expr, expr))
+			return tle;
+	}
+	return NULL;
+}
 /*
  * get_sortgroupclause_expr
  *		Find the targetlist entry matching the given SortGroupClause
diff --git a/src/include/nodes/pg_list.h b/src/include/nodes/pg_list.h
index 30f98c4595..bbe0209d7e 100644
--- a/src/include/nodes/pg_list.h
+++ b/src/include/nodes/pg_list.h
@@ -558,6 +558,8 @@ extern bool list_member_ptr(const List *list, const void *datum);
 extern bool list_member_int(const List *list, int datum);
 extern bool list_member_oid(const List *list, Oid datum);
 
+extern bool list_is_subset_ptr(const List *a, const List *b);
+
 extern pg_nodiscard List *list_delete(List *list, void *datum);
 extern pg_nodiscard List *list_delete_ptr(List *list, void *datum);
 extern pg_nodiscard List *list_delete_int(List *list, int datum);
diff --git a/src/include/optimizer/optimizer.h b/src/include/optimizer/optimizer.h
index 41b49b2662..5911014a32 100644
--- a/src/include/optimizer/optimizer.h
+++ b/src/include/optimizer/optimizer.h
@@ -171,6 +171,7 @@ extern TargetEntry *get_sortgroupref_tle(Index sortref,
 										 List *targetList);
 extern TargetEntry *get_sortgroupclause_tle(SortGroupClause *sgClause,
 											List *targetList);
+extern TargetEntry *get_tle_from_expr(Expr *expr, List *targetlist);
 extern Node *get_sortgroupclause_expr(SortGroupClause *sgClause,
 									  List *targetList);
 extern List *get_sortgrouplist_exprs(List *sgClauses,
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index a29f616423..e813d82483 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -135,6 +135,9 @@ extern EquivalenceClass *get_eclass_for_sort_expr(PlannerInfo *root,
 												  Index sortref,
 												  Relids rel,
 												  bool create_it);
+extern EquivalenceClass *find_ec_matching_expr(PlannerInfo *root,
+											   Expr *expr,
+											   RelOptInfo *rel);
 extern EquivalenceMember *find_ec_member_matching_expr(EquivalenceClass *ec,
 													   Expr *expr,
 													   Relids relids);
@@ -146,6 +149,9 @@ extern EquivalenceMember *find_computable_ec_member(PlannerInfo *root,
 extern bool ec_useful_for_merging(PlannerInfo *root, RelOptInfo *rel,
 								  EquivalenceClass *ec);
 extern Expr *find_em_expr_for_rel(EquivalenceClass *ec, RelOptInfo *rel);
+extern List *build_equivalanceclass_list_for_exprs(PlannerInfo *root,
+												   List *exprs,
+												   RelOptInfo *rel);
 extern bool relation_can_be_sorted_early(PlannerInfo *root, RelOptInfo *rel,
 										 EquivalenceClass *ec,
 										 bool require_parallel_safe);
-- 
2.21.0

v4-0004-Support-UniqueKey-on-BaseRel.patchapplication/octet-stream; name=v4-0004-Support-UniqueKey-on-BaseRel.patchDownload
From ac411f30a1a30f58037234fffc7e88e6f2eb636c Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E4=B8=80=E6=8C=83?= <yizhi.fzh@alibaba-inc.com>
Date: Mon, 9 Aug 2021 19:10:35 +0800
Subject: [PATCH v4 4/6] Support UniqueKey on BaseRel.

---
 src/backend/optimizer/path/Makefile     |   3 +-
 src/backend/optimizer/path/allpaths.c   |   7 +-
 src/backend/optimizer/path/uniquekey.c  | 400 ++++++++++++++++++++++++
 src/backend/optimizer/plan/planner.c    |  10 +
 src/backend/optimizer/util/relnode.c    |   2 +
 src/include/nodes/nodes.h               |   3 +-
 src/include/nodes/pathnodes.h           |  20 ++
 src/include/optimizer/paths.h           |   4 +
 src/test/regress/expected/join.out      |  11 +-
 src/test/regress/expected/uniquekey.out |  82 +++++
 src/test/regress/parallel_schedule      |   2 +-
 src/test/regress/sql/uniquekey.sql      |  29 ++
 12 files changed, 562 insertions(+), 11 deletions(-)
 create mode 100644 src/backend/optimizer/path/uniquekey.c
 create mode 100644 src/test/regress/expected/uniquekey.out
 create mode 100644 src/test/regress/sql/uniquekey.sql

diff --git a/src/backend/optimizer/path/Makefile b/src/backend/optimizer/path/Makefile
index 1e199ff66f..63cc1505d9 100644
--- a/src/backend/optimizer/path/Makefile
+++ b/src/backend/optimizer/path/Makefile
@@ -21,6 +21,7 @@ OBJS = \
 	joinpath.o \
 	joinrels.o \
 	pathkeys.o \
-	tidpath.o
+	tidpath.o \
+	uniquekey.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index a63a3dfe00..343253d694 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -396,6 +396,9 @@ static void
 set_rel_size(PlannerInfo *root, RelOptInfo *rel,
 			 Index rti, RangeTblEntry *rte)
 {
+	/* Set the notnull before the UniqueKey population. */
+	set_baserel_notnull_attrs(rel);
+
 	if (rel->reloptkind == RELOPT_BASEREL &&
 		relation_excluded_by_constraints(root, rel, rte))
 	{
@@ -491,7 +494,7 @@ set_rel_size(PlannerInfo *root, RelOptInfo *rel,
 		}
 	}
 
-	set_baserel_notnull_attrs(rel);
+
 
 	/*
 	 * We insist that all non-dummy rels have a nonzero rowcount estimate.
@@ -616,6 +619,8 @@ set_plain_rel_size(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 	 */
 	check_index_predicates(root, rel);
 
+	populate_baserel_uniquekeys(root, rel);
+
 	/* Mark rel with estimated output rows, width, etc */
 	set_baserel_size_estimates(root, rel);
 }
diff --git a/src/backend/optimizer/path/uniquekey.c b/src/backend/optimizer/path/uniquekey.c
new file mode 100644
index 0000000000..c93075656d
--- /dev/null
+++ b/src/backend/optimizer/path/uniquekey.c
@@ -0,0 +1,400 @@
+/*-------------------------------------------------------------------------
+ *
+ * uniquekey.c
+ *	  Utilities for maintaining uniquekey.
+ *
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ *	  src/backend/optimizer/path/uniquekey.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/sysattr.h"
+#include "nodes/nodeFuncs.h"
+#include "nodes/pathnodes.h"
+#include "optimizer/optimizer.h"
+#include "optimizer/paths.h"
+
+
+/* Functions to populate UniqueKey */
+static bool add_uniquekey_for_uniqueindex(PlannerInfo *root,
+										  IndexOptInfo *unique_index,
+										  List *mergeable_const_peer,
+										  List *expr_opfamilies);
+
+/* UniqueKey is subset of .. */
+static bool uniquekey_contains_in(PlannerInfo *root, UniqueKey *ukey,
+								  List *ecs, Relids relids);
+
+/* Avoid useless UniqueKey. */
+static bool unique_ecs_useful_for_distinct(PlannerInfo *root, List *ecs);
+static bool unique_ecs_useful_for_merging(PlannerInfo *root, RelOptInfo *rel,
+										  List *unique_ecs);
+/* Helper functions to create UniqueKey. */
+static UniqueKey *make_uniquekey(Bitmapset *unique_expr_indexes,
+								 bool multi_null,
+								 bool useful_for_distinct);
+static void mark_rel_singlerow(PlannerInfo *root, RelOptInfo *rel);
+
+/* Debug only */
+static void print_uniquekey(PlannerInfo *root, RelOptInfo *rel);
+
+/*
+ * populate_baserel_uniquekeys
+ */
+void
+populate_baserel_uniquekeys(PlannerInfo *root, RelOptInfo *rel)
+{
+	ListCell	*lc;
+	List	*mergeable_const_peer = NIL, *expr_opfamilies = NIL;
+
+	/*
+	 * ColX = {Const} AND ColY = {Const2} AND ColZ > {Const3},
+	 * gather ColX and ColY into mergeable_const_peer.
+	 */
+	foreach(lc, rel->baserestrictinfo)
+	{
+		RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+
+		if (rinfo->mergeopfamilies == NIL)
+			continue;
+
+		if (bms_is_empty(rinfo->left_relids))
+			mergeable_const_peer = lappend(mergeable_const_peer, get_rightop(rinfo->clause));
+		else if (bms_is_empty(rinfo->right_relids))
+			mergeable_const_peer = lappend(mergeable_const_peer, get_leftop(rinfo->clause));
+		else
+			continue;
+		expr_opfamilies = lappend(expr_opfamilies, rinfo->mergeopfamilies);
+	}
+
+	foreach(lc, rel->indexlist)
+	{
+		IndexOptInfo *index = (IndexOptInfo *)lfirst(lc);
+		if (!index->unique || !index->immediate ||
+			(index->indpred != NIL && !index->predOK))
+			continue;
+
+		if (add_uniquekey_for_uniqueindex(root, index,
+										  mergeable_const_peer,
+										  expr_opfamilies))
+			/* Find a singlerow case, no need to go through any more. */
+			return;
+	}
+
+	print_uniquekey(root, rel);
+}
+
+/*
+ * relation_is_distinct_for
+ *		Check if the relation is distinct for.
+ */
+bool
+relation_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *distinct_pathkey)
+{
+	ListCell	*lc;
+	List	*lecs = NIL;
+	Relids	relids = NULL;
+	foreach(lc, distinct_pathkey)
+	{
+		PathKey *pathkey = lfirst(lc);
+		lecs = lappend(lecs, pathkey->pk_eclass);
+		/*
+		 * Note that ec_relids doesn't include child member, but
+		 * distinct would not operate on childrel as well.
+		 */
+		relids = bms_union(relids, pathkey->pk_eclass->ec_relids);
+	}
+
+	foreach(lc, rel->uniquekeys)
+	{
+		UniqueKey *ukey = lfirst(lc);
+		if (ukey->multi_nulls)
+			continue;
+
+		if (uniquekey_contains_in(root, ukey, lecs, relids))
+			return true;
+	}
+	return false;
+}
+
+/*
+ * add_uniquekey_for_uniqueindex
+ *	 populate a UniqueKey if necessary, return true iff the UniqueKey is an
+ * SingleRow.
+ */
+static bool
+add_uniquekey_for_uniqueindex(PlannerInfo *root, IndexOptInfo *unique_index,
+							  List *mergeable_const_peer, List *expr_opfamilies)
+{
+	List	*unique_exprs = NIL, *unique_ecs = NIL;
+	ListCell	*indexpr_item;
+	int	c = 0;
+	RelOptInfo *rel = unique_index->rel;
+	bool	multinull = false;
+	bool	used_for_distinct = false;
+	Bitmapset *unique_exprs_index;
+
+	indexpr_item = list_head(unique_index->indexprs);
+	/* Gather all the non-const exprs */
+	for (c = 0; c < unique_index->nkeycolumns; c++)
+	{
+		int attr = unique_index->indexkeys[c];
+		Expr *expr;
+		bool	matched_const = false;
+		ListCell	*lc1, *lc2;
+		if (attr > 0)
+		{
+			Var *var;
+			expr = list_nth_node(TargetEntry, unique_index->indextlist, c)->expr;
+			var = castNode(Var, expr);
+			Assert(IsA(expr, Var));
+			if (!bms_is_member(var->varattno - FirstLowInvalidHeapAttributeNumber,
+							  rel->notnull_attrs[0]))
+				multinull = true;
+		}
+		else if (attr == 0)
+		{
+			/* Expression index */
+			expr = lfirst(indexpr_item);
+			indexpr_item = lnext(unique_index->indexprs, indexpr_item);
+			/* We can't grantee an FuncExpr will not return NULLs */
+			multinull = true;
+		}
+		else /* attr < 0 */
+		{
+			/* Index on OID is possible, not handle it for now. */
+			return false;
+		}
+
+		/*
+		 * Check index_col = Const case with regarding to opfamily checking
+		 * If so, we can remove the index_col from the final UniqueKey->exprs.
+		 */
+		forboth(lc1, mergeable_const_peer, lc2, expr_opfamilies)
+		{
+			if (list_member_oid((List *) lfirst(lc2), unique_index->opfamily[c]) &&
+				match_index_to_operand((Node *) lfirst(lc1), c, unique_index))
+			{
+				matched_const = true;
+				break;
+			}
+		}
+
+		if (matched_const)
+			continue;
+
+		unique_exprs = lappend(unique_exprs, expr);
+	}
+
+	if (unique_exprs == NIL)
+	{
+		/*
+		 * SingleRow case. Checking if it is useful is ineffective
+		 * so just keep it.
+		 */
+		mark_rel_singlerow(root, rel);
+		return true;
+	}
+
+	unique_ecs = build_equivalanceclass_list_for_exprs(root, unique_exprs, rel);
+
+	if (unique_ecs == NIL)
+	{
+		/* It is neither used in distinct_pathkey nor mergeable clause */
+		return false;
+	}
+
+	/*
+	 * Check if we need to setup the UniqueKey and set the used_for_distinct accordingly.
+	 */
+	if (unique_ecs_useful_for_distinct(root, unique_ecs))
+	{
+		used_for_distinct = true;
+	}
+	else if (!unique_ecs_useful_for_merging(root, rel, unique_ecs))
+		/*
+		 * Neither used in distinct pathkey nor used in mergeable clause.
+		 * this is possible even if unique_ecs != NIL.
+		 */
+		return false;
+	else
+	{
+		/*
+		 * unique_ecs_useful_for_merging(root, rel, unique_ecs) is true,
+		 * we did nothing in this case.
+		 */
+	}
+	unique_exprs_index = bms_make_singleton(list_length(root->unique_exprs));
+	root->unique_exprs = lappend(root->unique_exprs, unique_ecs);
+	rel->uniquekeys = lappend(rel->uniquekeys,
+							  make_uniquekey(unique_exprs_index,
+											 multinull,
+											 used_for_distinct));
+	return false;
+}
+/*
+ * uniquekey_contains_in
+ *	Return if UniqueKey contains in the list of EquivalenceClass
+ * or the UniqueKey's SingleRow contains in relids.
+ *
+ */
+static bool
+uniquekey_contains_in(PlannerInfo *root, UniqueKey *ukey, List *ecs, Relids relids)
+{
+	int i = -1;
+	while ((i = bms_next_member(ukey->unique_expr_indexes, i)) >= 0)
+	{
+		Node *exprs = list_nth(root->unique_exprs, i);
+		if (IsA(exprs, SingleRow))
+		{
+			SingleRow *singlerow = castNode(SingleRow, exprs);
+			if (!bms_is_member(singlerow->relid, relids))
+				/*
+				 * UniqueKey request a ANY expr on relid on the relid(which
+				 * indicates we don't have other EquivalenceClass for this
+				 * relation), but the relid doesn't contains in relids, which
+				 * indicate there is no such Expr in target, then we are sure
+				 * to return false.
+				 */
+				return false;
+			else
+			{
+				/*
+				 * We have SingleRow on relid, and the relid is in relids.
+				 * We don't need to check any more for this expr. This is
+				 * right for sure.
+				 */
+			}
+		}
+		else
+		{
+			Assert(IsA(exprs, List));
+			if (!list_is_subset_ptr((List *)exprs, ecs))
+				return false;
+		}
+	}
+	return true;
+}
+
+/*
+ * unique_ecs_useful_for_distinct
+ *	return true if all the EquivalenceClass in ecs exists in root->distinct_pathkey.
+ */
+static bool
+unique_ecs_useful_for_distinct(PlannerInfo *root, List *ecs)
+{
+	ListCell *lc;
+	foreach(lc, ecs)
+	{
+		EquivalenceClass *ec = lfirst_node(EquivalenceClass, lc);
+		ListCell *p;
+		bool found = false;
+		foreach(p,  root->distinct_pathkeys)
+		{
+			PathKey *pathkey = lfirst_node(PathKey, p);
+			/*
+			 * Both of them should point to an element in root->eq_classes.
+			 * so the address should be same. and equal function doesn't
+			 * support EquivalenceClass yet.
+			 */
+			if (ec == pathkey->pk_eclass)
+			{
+				found = true;
+				break;
+			}
+		}
+		if (!found)
+			return false;
+	}
+	return true;
+}
+
+/*
+ * unique_ecs_useful_for_merging
+ *	return true if all the unique_ecs exists in rel's join restrictInfo.
+ */
+static bool
+unique_ecs_useful_for_merging(PlannerInfo *root, RelOptInfo *rel, List *unique_ecs)
+{
+	ListCell	*lc;
+
+	foreach(lc, unique_ecs)
+	{
+		EquivalenceClass *ec = lfirst(lc);
+		if (!ec_useful_for_merging(root, rel, ec))
+			return false;
+	}
+
+	return true;
+}
+/*
+ *	make_uniquekey
+ */
+static UniqueKey *
+make_uniquekey(Bitmapset *unique_expr_indexes, bool multi_null, bool useful_for_distinct)
+{
+	UniqueKey *ukey = makeNode(UniqueKey);
+	ukey->unique_expr_indexes = unique_expr_indexes;
+	ukey->multi_nulls = multi_null;
+	ukey->use_for_distinct = useful_for_distinct;
+	return ukey;
+}
+
+/*
+ * mark_rel_singlerow
+ *	mark a relation as singlerow.
+ */
+static void
+mark_rel_singlerow(PlannerInfo *root, RelOptInfo *rel)
+{
+	int exprs_pos = list_length(root->unique_exprs);
+	Bitmapset *unique_exprs_index = bms_make_singleton(exprs_pos);
+	SingleRow *singlerow = makeNode(SingleRow);
+	singlerow->relid = rel->relid;
+	root->unique_exprs = lappend(root->unique_exprs, singlerow);
+	rel->uniquekeys = list_make1(make_uniquekey(unique_exprs_index,
+												false /* multi-null */,
+												true /* arbitrary decision */));
+}
+
+/*
+ * print_uniquekey
+ *	Used for easier reivew, should be removed before commit.
+ */
+static void
+print_uniquekey(PlannerInfo *root, RelOptInfo *rel)
+{
+	if (false)
+	{
+		ListCell	*lc;
+		elog(INFO, "Rel = %s", bmsToString(rel->relids));
+		foreach(lc, rel->uniquekeys)
+		{
+			UniqueKey *ukey = lfirst_node(UniqueKey, lc);
+			int i = -1;
+			elog(INFO, "UNIQUEKEY{indexes=%s, multinull=%d}",
+				 bmsToString(ukey->unique_expr_indexes),
+				 ukey->multi_nulls
+				);
+
+			while ((i = bms_next_member(ukey->unique_expr_indexes, i)) >= 0)
+			{
+				Node *node = (Node *) list_nth(root->unique_exprs, i);
+				if (IsA(node, SingleRow))
+					elog(INFO,
+						 "Expr(%d) SingleRow{relid = %d}",
+						 i, castNode(SingleRow, node)->relid);
+				else
+					elog(INFO,
+						 "EC(%d), %s", i, nodeToString(node)
+						);
+			}
+		}
+	}
+}
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 2cd691191c..ed2cff00fc 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -4238,6 +4238,16 @@ create_distinct_paths(PlannerInfo *root,
 	Path	   *path;
 	ListCell   *lc;
 
+	/*
+	 * distinct_pathkeys may be NIL if it distinctClause is sortable.
+	 * see standard_qp_callback. But for the efficiency of relation_is_distinct_for
+	 * we can't use distinctClause (rather than EC) here. Fortunately not sortable
+	 * clause is rare in real case.
+	 */
+	if (root->distinct_pathkeys &&
+		relation_is_distinct_for(root, input_rel, root->distinct_pathkeys))
+		return input_rel;
+
 	/* For now, do all work in the (DISTINCT, NULL) upperrel */
 	distinct_rel = fetch_upper_rel(root, UPPERREL_DISTINCT, NULL);
 
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index f017a7d52d..b75e1679e6 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -261,6 +261,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
 	rel->partexprs = NULL;
 	rel->nullable_partexprs = NULL;
 	rel->notnull_attrs = palloc0(sizeof(Bitmapset *) * 1);
+	rel->uniquekeys = NIL;
 
 	/*
 	 * Pass assorted information down the inheritance hierarchy.
@@ -752,6 +753,7 @@ build_join_rel(PlannerInfo *root,
 	joinrel->partexprs = NULL;
 	joinrel->nullable_partexprs = NULL;
 	joinrel->notnull_attrs = palloc0(sizeof(Bitmapset *) * (bms_max_member(joinrel->relids) + 1));
+	joinrel->uniquekeys = NIL;
 
 	/* Compute information relevant to the foreign relations. */
 	set_foreign_rel_properties(joinrel, outer_rel, inner_rel);
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 6a4d82f0a8..1bf220f373 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -280,7 +280,8 @@ typedef enum NodeTag
 	T_RollupData,
 	T_GroupingSetData,
 	T_StatisticExtInfo,
-
+	T_UniqueKey,
+	T_SingleRow,
 	/*
 	 * TAGS FOR MEMORY NODES (memnodes.h)
 	 */
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 7bf1896e12..3581b03a75 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -246,6 +246,7 @@ struct PlannerInfo
 									 * subquery outputs */
 
 	List	   *eq_classes;		/* list of active EquivalenceClasses */
+	List	   *unique_exprs;		/* List of unique expr */
 
 	bool		ec_merging_done;	/* set true once ECs are canonical */
 
@@ -691,6 +692,7 @@ typedef struct RelOptInfo
 								  * the len would always 1. and for others the array
 								  * index is relid from relids.
 								  */
+	List		*uniquekeys; /* A list of UniqueKey. */
 
 	/* materialization information */
 	List	   *pathlist;		/* Path structures */
@@ -1070,6 +1072,24 @@ typedef struct PathKey
 	bool		pk_nulls_first; /* do NULLs come before normal values? */
 } PathKey;
 
+
+typedef struct UnqiueKey
+{
+	NodeTag	type;
+	Bitmapset	*unique_expr_indexes;
+	bool	multi_nulls;
+	bool	use_for_distinct;  /* true if it is used in distinct-pathkey, in this case
+								* we would never check if we should discard it during
+								* join search.
+								*/
+} UniqueKey;
+
+typedef struct SingleRow
+{
+	NodeTag	type;
+	Index		relid;
+} SingleRow;
+
 /*
  * VolatileFunctionStatus -- allows nodes to cache their
  * contain_volatile_functions properties. VOLATILITY_UNKNOWN means not yet
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index e813d82483..68b8b40ca9 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -263,4 +263,8 @@ extern PathKey *make_canonical_pathkey(PlannerInfo *root,
 extern void add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
 									List *live_childrels);
 
+extern void populate_baserel_uniquekeys(PlannerInfo *root,
+										RelOptInfo *baserel);
+extern bool relation_is_distinct_for(PlannerInfo *root, RelOptInfo *rel,
+									 List *distinct_pathkey);
 #endif							/* PATHS_H */
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index f3589d0dbb..db72374302 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -4612,18 +4612,15 @@ select d.* from d left join (select * from b group by b.id, b.c_id) s
 explain (costs off)
 select d.* from d left join (select distinct * from b) s
   on d.a = s.id;
-              QUERY PLAN              
---------------------------------------
+             QUERY PLAN             
+------------------------------------
  Merge Right Join
    Merge Cond: (b.id = d.a)
-   ->  Unique
-         ->  Sort
-               Sort Key: b.id, b.c_id
-               ->  Seq Scan on b
+   ->  Index Scan using b_pkey on b
    ->  Sort
          Sort Key: d.a
          ->  Seq Scan on d
-(9 rows)
+(6 rows)
 
 -- check join removal works when uniqueness of the join condition is enforced
 -- by a UNION
diff --git a/src/test/regress/expected/uniquekey.out b/src/test/regress/expected/uniquekey.out
new file mode 100644
index 0000000000..d9a8634e80
--- /dev/null
+++ b/src/test/regress/expected/uniquekey.out
@@ -0,0 +1,82 @@
+CREATE TABLE uqk1(a int, pk int primary key, c int,  d int);
+CREATE TABLE uqk2(a int, pk int primary key, c int,  d int);
+INSERT INTO uqk1 VALUES(1, 1, 1, 1), (2, 2, 2, 2), (3, 3, 3, 3), (4, 4, null, 4), (5, 5, null, 4);
+INSERT INTO uqk2 VALUES(1, 1, 1, 1), (4, 4, 4, 4), (5, 5, 5, 5);
+ANALYZE uqk1;
+ANALYZE uqk2;
+-- Test single table primary key.
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM uqk1;
+    QUERY PLAN    
+------------------
+ Seq Scan on uqk1
+(1 row)
+
+-- Test EC case.
+EXPLAIN (COSTS OFF) SELECT DISTINCT d FROM uqk1 WHERE d = pk;
+     QUERY PLAN     
+--------------------
+ Seq Scan on uqk1
+   Filter: (d = pk)
+(2 rows)
+
+-- Test UniqueKey indexes.
+CREATE UNIQUE INDEX uqk1_ukcd ON uqk1(c, d);
+-- Test not null quals and not null per catalog.
+EXPLAIN (COSTS OFF) SELECT DISTINCT c, d FROM uqk1;
+       QUERY PLAN       
+------------------------
+ HashAggregate
+   Group Key: c, d
+   ->  Seq Scan on uqk1
+(3 rows)
+
+EXPLAIN (COSTS OFF) SELECT DISTINCT c, d FROM uqk1 WHERE c is NOT NULL;
+              QUERY PLAN               
+---------------------------------------
+ Unique
+   ->  Sort
+         Sort Key: c, d
+         ->  Seq Scan on uqk1
+               Filter: (c IS NOT NULL)
+(5 rows)
+
+EXPLAIN (COSTS OFF) SELECT DISTINCT d FROM uqk1 WHERE c = 1;
+          QUERY PLAN           
+-------------------------------
+ Unique
+   ->  Sort
+         Sort Key: d
+         ->  Seq Scan on uqk1
+               Filter: (c = 1)
+(5 rows)
+
+ALTER TABLE uqk1 ALTER COLUMN d SET NOT NULL;
+EXPLAIN (COSTS OFF) SELECT DISTINCT c, d FROM uqk1 WHERE c is NOT NULL;
+        QUERY PLAN         
+---------------------------
+ Seq Scan on uqk1
+   Filter: (c IS NOT NULL)
+(2 rows)
+
+-- Test UniqueKey column reduction.
+EXPLAIN (COSTS OFF) SELECT DISTINCT d FROM uqk1 WHERE c = 1;
+    QUERY PLAN     
+-------------------
+ Seq Scan on uqk1
+   Filter: (c = 1)
+(2 rows)
+
+EXPLAIN (COSTS OFF) SELECT DISTINCT a FROM uqk1 WHERE c = 1 and d = 1;
+           QUERY PLAN            
+---------------------------------
+ Seq Scan on uqk1
+   Filter: ((c = 1) AND (d = 1))
+(2 rows)
+
+-- Test Distinct ON
+EXPLAIN (COSTS OFF) SELECT DISTINCT ON(pk) d FROM uqk1;
+    QUERY PLAN    
+------------------
+ Seq Scan on uqk1
+(1 row)
+
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 7be89178f0..41454448f1 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -72,7 +72,7 @@ test: sanity_check
 # ----------
 ignore: random
 test: select_into select_distinct select_distinct_on select_implicit select_having subselect union case join aggregates transactions random portals arrays btree_index hash_index update delete namespace prepared_xacts
-
+test: uniquekey
 # ----------
 # Another group of parallel tests
 # ----------
diff --git a/src/test/regress/sql/uniquekey.sql b/src/test/regress/sql/uniquekey.sql
new file mode 100644
index 0000000000..a1b538d1c1
--- /dev/null
+++ b/src/test/regress/sql/uniquekey.sql
@@ -0,0 +1,29 @@
+CREATE TABLE uqk1(a int, pk int primary key, c int,  d int);
+CREATE TABLE uqk2(a int, pk int primary key, c int,  d int);
+INSERT INTO uqk1 VALUES(1, 1, 1, 1), (2, 2, 2, 2), (3, 3, 3, 3), (4, 4, null, 4), (5, 5, null, 4);
+INSERT INTO uqk2 VALUES(1, 1, 1, 1), (4, 4, 4, 4), (5, 5, 5, 5);
+ANALYZE uqk1;
+ANALYZE uqk2;
+
+-- Test single table primary key.
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM uqk1;
+
+-- Test EC case.
+EXPLAIN (COSTS OFF) SELECT DISTINCT d FROM uqk1 WHERE d = pk;
+
+-- Test UniqueKey indexes.
+CREATE UNIQUE INDEX uqk1_ukcd ON uqk1(c, d);
+
+-- Test not null quals and not null per catalog.
+EXPLAIN (COSTS OFF) SELECT DISTINCT c, d FROM uqk1;
+EXPLAIN (COSTS OFF) SELECT DISTINCT c, d FROM uqk1 WHERE c is NOT NULL;
+EXPLAIN (COSTS OFF) SELECT DISTINCT d FROM uqk1 WHERE c = 1;
+ALTER TABLE uqk1 ALTER COLUMN d SET NOT NULL;
+EXPLAIN (COSTS OFF) SELECT DISTINCT c, d FROM uqk1 WHERE c is NOT NULL;
+
+-- Test UniqueKey column reduction.
+EXPLAIN (COSTS OFF) SELECT DISTINCT d FROM uqk1 WHERE c = 1;
+EXPLAIN (COSTS OFF) SELECT DISTINCT a FROM uqk1 WHERE c = 1 and d = 1;
+
+-- Test Distinct ON
+EXPLAIN (COSTS OFF) SELECT DISTINCT ON(pk) d FROM uqk1;
-- 
2.21.0

v4-0001-Just-refactor-pathkeys_useful_for_merging-split-a.patchapplication/octet-stream; name=v4-0001-Just-refactor-pathkeys_useful_for_merging-split-a.patchDownload
From 32886c5b14709d1f7ad27e5e024c3e083b97026c Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E4=B8=80=E6=8C=83?= <yizhi.fzh@alibaba-inc.com>
Date: Sun, 25 Jul 2021 16:28:30 +0800
Subject: [PATCH v4 1/6] Just refactor pathkeys_useful_for_merging, split a new
 function

ec_useful_for_merging which is useful for UniqueKey as well.
---
 src/backend/optimizer/path/equivclass.c | 42 +++++++++++++++++++++++++
 src/backend/optimizer/path/pathkeys.c   | 33 +------------------
 src/include/optimizer/paths.h           |  2 ++
 3 files changed, 45 insertions(+), 32 deletions(-)

diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index 6f1abbe47d..2f688807af 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -960,6 +960,48 @@ find_em_expr_for_rel(EquivalenceClass *ec, RelOptInfo *rel)
 	return NULL;
 }
 
+/*
+ * ec_useful_for_merging
+ *	check if the ec exists in rel's merageable restrictinfo_lists.
+ */
+bool
+ec_useful_for_merging(PlannerInfo *root, RelOptInfo *rel,
+					  EquivalenceClass *ec)
+{
+	/*
+	 * First look into the EquivalenceClass to see if there are any members
+	 * not yet joined to the rel.
+	 */
+	if (rel->has_eclass_joins &&
+		eclass_useful_for_merging(root, ec, rel))
+		return true;
+	else
+	{
+		/*
+		 * Otherwise search the rel's joininfo list, which contains
+		 * non-EquivalenceClass-derivable join clauses that might
+		 * nonetheless be mergejoinable.
+		 */
+		ListCell	*j;
+		foreach(j, rel->joininfo)
+		{
+			RestrictInfo *restrictinfo = (RestrictInfo *) lfirst(j);
+
+			if (restrictinfo->mergeopfamilies == NIL)
+				continue;
+			update_mergeclause_eclasses(root, restrictinfo);
+
+			if (ec == restrictinfo->left_ec ||
+			    ec == restrictinfo->right_ec)
+			{
+				return true;
+			}
+		}
+	}
+
+	return false;
+}
+
 /*
  * relation_can_be_sorted_early
  *		Can this relation be sorted on this EC before the final output step?
diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c
index 216dd26385..21f4f2d4bc 100644
--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -1751,43 +1751,12 @@ pathkeys_useful_for_merging(PlannerInfo *root, RelOptInfo *rel, List *pathkeys)
 	{
 		PathKey    *pathkey = (PathKey *) lfirst(i);
 		bool		matched = false;
-		ListCell   *j;
 
 		/* If "wrong" direction, not useful for merging */
 		if (!right_merge_direction(root, pathkey))
 			break;
 
-		/*
-		 * First look into the EquivalenceClass of the pathkey, to see if
-		 * there are any members not yet joined to the rel.  If so, it's
-		 * surely possible to generate a mergejoin clause using them.
-		 */
-		if (rel->has_eclass_joins &&
-			eclass_useful_for_merging(root, pathkey->pk_eclass, rel))
-			matched = true;
-		else
-		{
-			/*
-			 * Otherwise search the rel's joininfo list, which contains
-			 * non-EquivalenceClass-derivable join clauses that might
-			 * nonetheless be mergejoinable.
-			 */
-			foreach(j, rel->joininfo)
-			{
-				RestrictInfo *restrictinfo = (RestrictInfo *) lfirst(j);
-
-				if (restrictinfo->mergeopfamilies == NIL)
-					continue;
-				update_mergeclause_eclasses(root, restrictinfo);
-
-				if (pathkey->pk_eclass == restrictinfo->left_ec ||
-					pathkey->pk_eclass == restrictinfo->right_ec)
-				{
-					matched = true;
-					break;
-				}
-			}
-		}
+		matched = ec_useful_for_merging(root, rel, pathkey->pk_eclass);
 
 		/*
 		 * If we didn't find a mergeclause, we're done --- any additional
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index f1d111063c..a29f616423 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -143,6 +143,8 @@ extern EquivalenceMember *find_computable_ec_member(PlannerInfo *root,
 													List *exprs,
 													Relids relids,
 													bool require_parallel_safe);
+extern bool ec_useful_for_merging(PlannerInfo *root, RelOptInfo *rel,
+								  EquivalenceClass *ec);
 extern Expr *find_em_expr_for_rel(EquivalenceClass *ec, RelOptInfo *rel);
 extern bool relation_can_be_sorted_early(PlannerInfo *root, RelOptInfo *rel,
 										 EquivalenceClass *ec,
-- 
2.21.0

v4-0003-add-the-not-null-attrs-for-RelOptInfo.-Here-is-ho.patchapplication/octet-stream; name=v4-0003-add-the-not-null-attrs-for-RelOptInfo.-Here-is-ho.patchDownload
From 867dbc9e866ce149b07ff55631598e0843046b61 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E4=B8=80=E6=8C=83?= <yizhi.fzh@alibaba-inc.com>
Date: Sat, 3 Jul 2021 21:06:32 +0800
Subject: [PATCH v4 3/6] add the not null attrs for RelOptInfo. Here is how it
 works.

For baserel, it records the notnull attrs as a bitmapset and store it to
RelOptInfo->notnull_attrs[0].  As for the joinrel, suppose the relids is {1,3,
5}, then the notnull_attrs[1/3/5] will be used to store notnull_attrs bitmapset
for relation 1,3,5 separately. I don't handle this stuff for all kinds of upper
relation and subquery so far since it doesn't pass the design review yet.
---
 notnulltest.sql                       | 24 ++++++++
 src/backend/nodes/bitmapset.c         | 14 +++++
 src/backend/optimizer/path/allpaths.c | 36 ++++++++++++
 src/backend/optimizer/util/plancat.c  |  9 +++
 src/backend/optimizer/util/relnode.c  | 80 +++++++++++++++++++++++++++
 src/include/nodes/bitmapset.h         |  1 +
 src/include/nodes/pathnodes.h         |  6 ++
 7 files changed, 170 insertions(+)
 create mode 100644 notnulltest.sql

diff --git a/notnulltest.sql b/notnulltest.sql
new file mode 100644
index 0000000000..2a36bd0c7f
--- /dev/null
+++ b/notnulltest.sql
@@ -0,0 +1,24 @@
+create table t1(a int, b int not null, c int, d int);
+create table t2(a int, b int not null, c int, d int);
+
+-- single rel
+select * from t1;
+select * from t1 where a > 1;
+select * from t2 where a > 1 or c > 1;
+
+-- partitioned relation.
+create table p (a int, b int, c int not null) partition by range(a);
+create table p_1 partition of p for values from (0) to (10000) partition by list(b);
+create table p_1_1(b int,  c int not null, a int);
+alter table p_1 attach partition p_1_1 for values in (1);
+
+
+select * from p;
+select * from p where a > 1;
+
+
+-- test join:
+select * from t1, t2 where t1.a = t2.c;
+select t1.a, t2.b, t2.c from t1 left join t2 on t1.a = t2.c;
+select * from t1 full join t2 on t1.a = t2.c;
+
diff --git a/src/backend/nodes/bitmapset.c b/src/backend/nodes/bitmapset.c
index 649478b0d4..fa71f36aaf 100644
--- a/src/backend/nodes/bitmapset.c
+++ b/src/backend/nodes/bitmapset.c
@@ -663,6 +663,20 @@ bms_num_members(const Bitmapset *a)
 	return result;
 }
 
+/*
+ * bms_max_member - the max member in this bitmap.
+ */
+int
+bms_max_member(const Bitmapset *a)
+{
+	int result;
+	if (a == NULL || bms_is_empty(a))
+		elog(ERROR, "Must be an non-empty bitmapset.");
+	result = (a->nwords - 1) * BITS_PER_BITMAPWORD;
+	result += bmw_leftmost_one_pos(a->words[a->nwords - 1]);
+	return result;
+}
+
 /*
  * bms_membership - does a set have zero, one, or multiple members?
  *
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 296dd75c1b..a63a3dfe00 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -354,6 +354,40 @@ set_base_rel_pathlists(PlannerInfo *root)
 	}
 }
 
+/*
+ * set_baserel_notnull_attrs
+ *
+ *	Set baserel's notnullattrs based on baserestrictinfo
+ */
+static void
+set_baserel_notnull_attrs(RelOptInfo *rel)
+{
+	List *clauses = extract_actual_clauses(rel->baserestrictinfo, false);
+	ListCell	*lc;
+	foreach(lc, find_nonnullable_vars((Node *)clauses))
+	{
+		Var *var = (Var *) lfirst(lc);
+		if (var->varno != rel->relid)
+		{
+			/* Lateral Join */
+			continue;
+		}
+		Assert(var->varno == rel->relid);
+		rel->notnull_attrs[0] = bms_add_member(rel->notnull_attrs[0],
+											   var->varattno - FirstLowInvalidHeapAttributeNumber);
+	}
+
+	/* Debug Only, Will be removed at last. */
+	if (false)
+	{
+		elog(INFO, "FirstLowInvalidHeapAttributeNumber = %d, BaseRel(%d), notnull_attrs = %s",
+			 FirstLowInvalidHeapAttributeNumber,
+			 rel->relid,
+			 bmsToString(rel->notnull_attrs[0])
+			);
+	}
+}
+
 /*
  * set_rel_size
  *	  Set size estimates for a base relation
@@ -457,6 +491,8 @@ set_rel_size(PlannerInfo *root, RelOptInfo *rel,
 		}
 	}
 
+	set_baserel_notnull_attrs(rel);
+
 	/*
 	 * We insist that all non-dummy rels have a nonzero rowcount estimate.
 	 */
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index c5194fdbbf..7d3b40090e 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -117,6 +117,7 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 	Relation	relation;
 	bool		hasindex;
 	List	   *indexinfos = NIL;
+	int		i;
 
 	/*
 	 * We need not lock the relation since it was already locked, either by
@@ -471,6 +472,14 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 	if (inhparent && relation->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
 		set_relation_partition_info(root, rel, relation);
 
+	for (i = 0; i < relation->rd_att->natts; i++)
+	{
+		FormData_pg_attribute attr = relation->rd_att->attrs[i];
+		if (attr.attnotnull)
+			rel->notnull_attrs[0] = bms_add_member(rel->notnull_attrs[0],
+												   attr.attnum - FirstLowInvalidHeapAttributeNumber);
+	}
+
 	table_close(relation, NoLock);
 
 	/*
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 47769cea45..f017a7d52d 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -16,6 +16,7 @@
 
 #include <limits.h>
 
+#include "access/sysattr.h"
 #include "miscadmin.h"
 #include "nodes/nodeFuncs.h"
 #include "optimizer/appendinfo.h"
@@ -259,6 +260,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
 	rel->all_partrels = NULL;
 	rel->partexprs = NULL;
 	rel->nullable_partexprs = NULL;
+	rel->notnull_attrs = palloc0(sizeof(Bitmapset *) * 1);
 
 	/*
 	 * Pass assorted information down the inheritance hierarchy.
@@ -557,6 +559,81 @@ add_join_rel(PlannerInfo *root, RelOptInfo *joinrel)
 	}
 }
 
+static void
+copy_notnull_attrs_to_joinrel(RelOptInfo *joinrel, RelOptInfo *rel)
+{
+	int relid;
+	if (bms_get_singleton_member(rel->relids, &relid))
+		joinrel->notnull_attrs[relid] = bms_copy(rel->notnull_attrs[0]);
+	else
+	{
+		relid = -1;
+		while ((relid = bms_next_member(rel->relids, relid)) >= 0)
+			joinrel->notnull_attrs[relid] = bms_copy(rel->notnull_attrs[relid]);
+	}
+}
+
+/*
+ *
+ */
+static void
+set_joinrel_notnull_attrs(RelOptInfo *joinrel,
+						  RelOptInfo *outer_rel,
+						  RelOptInfo *inner_rel,
+						  List *restrictlist,
+						  SpecialJoinInfo *sjinfo)
+{
+	if (sjinfo->jointype == JOIN_FULL)
+		/* Both sides are nullable. */
+		return;
+	/* If it is not FULL join, the outer side is not changed. */
+	copy_notnull_attrs_to_joinrel(joinrel, outer_rel);
+	switch(sjinfo->jointype)
+	{
+		case JOIN_ANTI:
+		case JOIN_SEMI:
+		case JOIN_INNER:
+			copy_notnull_attrs_to_joinrel(joinrel, inner_rel);
+			{
+				ListCell	*lc;
+				List *clauses = extract_actual_clauses(restrictlist, false);
+				foreach(lc, find_nonnullable_vars((Node *) clauses))
+				{
+					Var *var = lfirst_node(Var, lc);
+					if (!bms_is_member(var->varno, joinrel->relids))
+					{
+						/* lateral join */
+						continue;
+					}
+					joinrel->notnull_attrs[var->varno] = bms_add_member(
+						joinrel->notnull_attrs[var->varno],
+						var->varattno - FirstLowInvalidHeapAttributeNumber);
+				}
+			}
+			break;
+		case JOIN_LEFT:
+			break;
+		default:
+			elog(ERROR, "Unexpected join type %d", sjinfo->jointype);
+	}
+	/* Debug Only, will be removed at last. */
+	if (false)
+	{
+		int relid = -1;
+		int eLevel = INFO;
+		elog(eLevel, "Dump notnull for JoinRel(%s)", bmsToString(joinrel->relids));
+		while((relid = bms_next_member(joinrel->relids, relid)) >= 0)
+		{
+			Bitmapset *notnullattrs = joinrel->notnull_attrs[relid];
+			if (notnullattrs != NULL)
+				elog(eLevel, "FirstLowInvalidHeapAttributeNumber = %d, RELID = (%d), notnull_attrs: %s",
+					 FirstLowInvalidHeapAttributeNumber,
+					 relid,
+					 bmsToString(notnullattrs));
+		}
+	}
+
+}
 /*
  * build_join_rel
  *	  Returns relation entry corresponding to the union of two given rels,
@@ -674,6 +751,7 @@ build_join_rel(PlannerInfo *root,
 	joinrel->all_partrels = NULL;
 	joinrel->partexprs = NULL;
 	joinrel->nullable_partexprs = NULL;
+	joinrel->notnull_attrs = palloc0(sizeof(Bitmapset *) * (bms_max_member(joinrel->relids) + 1));
 
 	/* Compute information relevant to the foreign relations. */
 	set_foreign_rel_properties(joinrel, outer_rel, inner_rel);
@@ -765,6 +843,8 @@ build_join_rel(PlannerInfo *root,
 			lappend(root->join_rel_level[root->join_cur_level], joinrel);
 	}
 
+	set_joinrel_notnull_attrs(joinrel, outer_rel, inner_rel, restrictlist, sjinfo);
+
 	return joinrel;
 }
 
diff --git a/src/include/nodes/bitmapset.h b/src/include/nodes/bitmapset.h
index 1fd12de698..303873a546 100644
--- a/src/include/nodes/bitmapset.h
+++ b/src/include/nodes/bitmapset.h
@@ -94,6 +94,7 @@ extern bool bms_nonempty_difference(const Bitmapset *a, const Bitmapset *b);
 extern int	bms_singleton_member(const Bitmapset *a);
 extern bool bms_get_singleton_member(const Bitmapset *a, int *member);
 extern int	bms_num_members(const Bitmapset *a);
+extern int	bms_max_member(const Bitmapset *a);
 
 /* optimized tests when we don't need to know exact membership count: */
 extern BMS_Membership bms_membership(const Bitmapset *a);
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 6e068f2c8b..7bf1896e12 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -686,6 +686,12 @@ typedef struct RelOptInfo
 	/* default result targetlist for Paths scanning this relation */
 	struct PathTarget *reltarget;	/* list of Vars/Exprs, cost, width */
 
+	Bitmapset	**notnull_attrs; /* The attno which is not null after evalating
+								  * all the quals on this relation, for baserel,
+								  * the len would always 1. and for others the array
+								  * index is relid from relids.
+								  */
+
 	/* materialization information */
 	List	   *pathlist;		/* Path structures */
 	List	   *ppilist;		/* ParamPathInfos used in pathlist */
-- 
2.21.0

v4-0006-Maintain-the-UniqueKey-on-Subquery-and-UpperRel-l.patchapplication/octet-stream; name=v4-0006-Maintain-the-UniqueKey-on-Subquery-and-UpperRel-l.patchDownload
From 54b2f71215c864aa96f7424a7fb57d984de8c7a3 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E4=B8=80=E6=8C=83?= <yizhi.fzh@alibaba-inc.com>
Date: Mon, 9 Aug 2021 19:46:13 +0800
Subject: [PATCH v4 6/6] Maintain the UniqueKey on Subquery and UpperRel level.

---
 src/backend/optimizer/path/allpaths.c   |   2 +
 src/backend/optimizer/path/pathkeys.c   |   3 +-
 src/backend/optimizer/path/uniquekey.c  | 179 ++++++++++++++++++++++++
 src/backend/optimizer/plan/planner.c    |  17 ++-
 src/include/optimizer/paths.h           |   6 +
 src/test/regress/expected/uniquekey.out |  36 +++++
 src/test/regress/sql/uniquekey.sql      |  21 +++
 7 files changed, 261 insertions(+), 3 deletions(-)

diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 343253d694..9225bf738e 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -2288,6 +2288,8 @@ set_subquery_pathlist(PlannerInfo *root, RelOptInfo *rel,
 		return;
 	}
 
+	populate_subquery_uniquekeys(root, rel, sub_final_rel);
+
 	/*
 	 * Mark rel with estimated output rows, width, etc.  Note that we have to
 	 * do this before generating outer-query paths, else cost_subqueryscan is
diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c
index 21f4f2d4bc..9a8a5b654c 100644
--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -33,7 +33,6 @@ static bool pathkey_is_redundant(PathKey *new_pathkey, List *pathkeys);
 static bool matches_boolean_partition_clause(RestrictInfo *rinfo,
 											 RelOptInfo *partrel,
 											 int partkeycol);
-static Var *find_var_for_subquery_tle(RelOptInfo *rel, TargetEntry *tle);
 static bool right_merge_direction(PlannerInfo *root, PathKey *pathkey);
 
 
@@ -1035,7 +1034,7 @@ convert_subquery_pathkeys(PlannerInfo *root, RelOptInfo *rel,
  * We need this to ensure that we don't return pathkeys describing values
  * that are unavailable above the level of the subquery scan.
  */
-static Var *
+Var *
 find_var_for_subquery_tle(RelOptInfo *rel, TargetEntry *tle)
 {
 	ListCell   *lc;
diff --git a/src/backend/optimizer/path/uniquekey.c b/src/backend/optimizer/path/uniquekey.c
index 815ade02c3..a073563c59 100644
--- a/src/backend/optimizer/path/uniquekey.c
+++ b/src/backend/optimizer/path/uniquekey.c
@@ -47,6 +47,13 @@ static void populate_joinrel_composite_uniquekey(PlannerInfo *root,
 												 bool outeruk_still_valid,
 												 bool inneruk_still_valid);
 
+static void convert_subquery_uniquekey(PlannerInfo *root, RelOptInfo *rel, UniqueKey *sub_ukey);
+static EquivalenceClass * find_outer_ec_with_subquery_em(PlannerInfo *root, RelOptInfo *rel,
+														 EquivalenceClass *sub_ec,
+														 EquivalenceMember *sub_em);
+static List *convert_subquery_eclass_list(PlannerInfo *root, RelOptInfo *rel,
+										  List *sub_eclass_list);
+
 /* UniqueKey is subset of .. */
 static bool uniquekey_contains_in(PlannerInfo *root, UniqueKey *ukey,
 								  List *ecs, Relids relids);
@@ -185,6 +192,57 @@ populate_joinrel_uniquekeys(PlannerInfo *root, RelOptInfo *joinrel,
 	return;
 }
 
+/*
+ * populate_subquery_uniquekeys
+ *
+ * 'rel': outer query's RelOptInfo for the subquery relation.
+ * 'subquery_uniquekeys': the subquery's output pathkeys, in its terms.
+ * 'subquery_tlist': the subquery's output targetlist, in its terms.
+ *
+ *  subquery issues: a). tlist mapping.  b). interesting uniquekey. c). not nulls.
+ */
+void
+populate_subquery_uniquekeys(PlannerInfo *root, RelOptInfo *rel, RelOptInfo *sub_final_rel)
+{
+	List	*sub_uniquekeys = sub_final_rel->uniquekeys;
+	ListCell	*lc;
+	foreach(lc, sub_uniquekeys)
+	{
+		UniqueKey *sub_ukey = lfirst_node(UniqueKey, lc);
+		convert_subquery_uniquekey(root, rel, sub_ukey);
+	}
+}
+
+/*
+ * populate_uniquekeys_from_pathkeys
+ *
+ */
+void
+populate_uniquekeys_from_pathkeys(PlannerInfo *root, RelOptInfo *rel, List *pathkeys)
+{
+	ListCell *lc;
+	List	*unique_exprs = NIL;
+	if (pathkeys == NIL)
+		return;
+	foreach(lc, pathkeys)
+	{
+		PathKey *pathkey = lfirst(lc);
+		unique_exprs = lappend(unique_exprs, pathkey->pk_eclass);
+	}
+	rel->uniquekeys = list_make1(
+		make_uniquekey(bms_make_singleton(list_length(root->unique_exprs)),
+					   false,
+					   true));
+	root->unique_exprs = lappend(root->unique_exprs, unique_exprs);
+}
+
+
+void
+simple_copy_uniquekeys(RelOptInfo *tarrel, RelOptInfo *srcrel)
+{
+	tarrel->uniquekeys = srcrel->uniquekeys;
+}
+
 /*
  * relation_is_distinct_for
  *		Check if the relation is distinct for.
@@ -712,6 +770,127 @@ is_uniquekey_useful_afterjoin(PlannerInfo *root, UniqueKey *ukey,
 	return true;
 }
 
+/*
+ * find_outer_ec_with_subquery_em
+ *
+ *	Given a em in subquery, return the related EquivalenceClass outside.
+ */
+static EquivalenceClass *
+find_outer_ec_with_subquery_em(PlannerInfo *root, RelOptInfo *rel,
+							   EquivalenceClass *sub_ec, EquivalenceMember *sub_em)
+{
+	TargetEntry *sub_tle;
+	Var *outer_var;
+	EquivalenceClass *outer_ec;
+
+	sub_tle = get_tle_from_expr(sub_em->em_expr, rel->subroot->processed_tlist);
+
+	if (!sub_tle)
+		return NULL;
+
+	outer_var = find_var_for_subquery_tle(rel, sub_tle);
+	if (!outer_var)
+		return NULL;
+
+	outer_ec = get_eclass_for_sort_expr(root,
+										(Expr *)outer_var,
+										NULL,
+										sub_ec->ec_opfamilies,
+										sub_em->em_datatype,
+										sub_ec->ec_collation,
+										0,
+										rel->relids,
+										false);
+	return outer_ec;
+}
+
+
+/*
+ * convert_subquery_eclass_list
+ *
+ *		Given a list of eclass in subquery, find the corresponding eclass in outer side.
+ * return NULL if no related eclass outside is found for any eclass in subquery.
+ */
+static List *
+convert_subquery_eclass_list(PlannerInfo *root, RelOptInfo *rel, List *sub_eclass_list)
+{
+	ListCell	*lc;
+	List	*ec_list = NIL;
+	foreach(lc, sub_eclass_list)
+	{
+		EquivalenceClass *sub_ec = lfirst_node(EquivalenceClass, lc);
+		EquivalenceClass *ec = NULL;
+		ListCell	*emc;
+		foreach(emc, sub_ec->ec_members)
+		{
+			EquivalenceMember *sub_em = lfirst(emc);
+			if ((ec = find_outer_ec_with_subquery_em(root, rel, sub_ec, sub_em)) != NULL)
+				break;
+		}
+		if (!ec)
+			return NIL;
+		ec_list = lappend(ec_list, ec);
+	}
+	return ec_list;
+}
+
+
+/*
+ * convert_subquery_uniquekey
+ *
+ */
+static void
+convert_subquery_uniquekey(PlannerInfo *root, RelOptInfo *rel, UniqueKey *sub_ukey)
+{
+	PlannerInfo *sub_root = rel->subroot;
+	List	*unique_exprs_list = NIL;
+	Bitmapset	*unique_exprs_indexes = NULL;
+	UniqueKey	*ukey = NULL;
+	int i = -1;
+	ListCell	*lc;
+	while((i = bms_next_member(sub_ukey->unique_expr_indexes, i)) >= 0)
+	{
+		Node *sub_eq_list = list_nth(sub_root->unique_exprs, i);
+		if (IsA(sub_eq_list, SingleRow))
+		{
+			/*
+			 * TODO: Unclear what to do, don't think it hard before the overall
+			 * design is accepted.
+			 */
+			return;
+		}
+		else
+		{
+			List *upper_eq_list;
+			Assert(IsA(sub_eq_list, List));
+			/*
+			 * Note: upper_eq_list is just part of uniquekey's exprs, to covert the whole
+			 * UniqueKey, we needs all the parts are shown in the upper rel.
+			 */
+			upper_eq_list = convert_subquery_eclass_list(root, rel, (List *)sub_eq_list);
+			if (upper_eq_list == NIL)
+			{
+				if (unique_exprs_list != NIL)
+					pfree(unique_exprs_list);
+				return;
+			}
+			unique_exprs_list = lappend(unique_exprs_list, upper_eq_list);
+		}
+	}
+
+	foreach(lc, unique_exprs_list)
+	{
+		unique_exprs_indexes = bms_add_member(unique_exprs_indexes, list_length(root->unique_exprs));
+		root->unique_exprs = lappend(root->unique_exprs, lfirst(lc));
+	}
+
+	ukey = make_uniquekey(unique_exprs_indexes,
+						  sub_ukey->multi_nulls,
+						  /* TODO: need check again, case SELECT * FROM (SELECT u FROM x OFFSET 0) v where x.u = 0; */
+						  true);
+	rel->uniquekeys = lappend(rel->uniquekeys, ukey);
+}
+
 /*
  *	make_uniquekey
  */
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index ed2cff00fc..a268737a4c 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -1650,7 +1650,7 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
 	 * Now we are prepared to build the final-output upperrel.
 	 */
 	final_rel = fetch_upper_rel(root, UPPERREL_FINAL, NULL);
-
+	simple_copy_uniquekeys(final_rel, current_rel);
 	/*
 	 * If the input rel is marked consider_parallel and there's nothing that's
 	 * not parallel-safe in the LIMIT clause, then the final_rel can be marked
@@ -3622,6 +3622,19 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
 									  gd,
 									  extra->targetList);
 
+	if (root->parse->groupingSets)
+	{
+		/* nothing to do */
+	}
+	else if (root->parse->groupClause && root->group_pathkeys != NIL)
+	{
+		populate_uniquekeys_from_pathkeys(root, grouped_rel, root->group_pathkeys);
+	}
+	else
+	{
+		/* SingleRow Case */
+	}
+
 	/* Build final grouping paths */
 	add_paths_to_grouping_rel(root, input_rel, grouped_rel,
 							  partially_grouped_rel, agg_costs, gd,
@@ -4251,6 +4264,8 @@ create_distinct_paths(PlannerInfo *root,
 	/* For now, do all work in the (DISTINCT, NULL) upperrel */
 	distinct_rel = fetch_upper_rel(root, UPPERREL_DISTINCT, NULL);
 
+	populate_uniquekeys_from_pathkeys(root, distinct_rel, root->distinct_pathkeys);
+
 	/*
 	 * We don't compute anything at this level, so distinct_rel will be
 	 * parallel-safe if the input rel is parallel-safe.  In particular, if
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index f233837e59..b9570807af 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -265,11 +265,17 @@ extern void add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
 
 extern void populate_baserel_uniquekeys(PlannerInfo *root,
 										RelOptInfo *baserel);
+extern Var *find_var_for_subquery_tle(RelOptInfo *rel, TargetEntry *tle);
 extern void populate_baserel_uniquekeys(PlannerInfo *root,
 										RelOptInfo *baserel);
 extern void populate_joinrel_uniquekeys(PlannerInfo *root, RelOptInfo *joinrel,
 										RelOptInfo *outerrel, RelOptInfo *innerrel,
 										List *restrictlist, JoinType jointype);
+extern void populate_uniquekeys_from_pathkeys(PlannerInfo *root, RelOptInfo *rel,
+											  List *pathkeys);
+extern void populate_subquery_uniquekeys(PlannerInfo *root, RelOptInfo *rel,
+										 RelOptInfo *sub_final_rel);
+extern void simple_copy_uniquekeys(RelOptInfo *srcrel, RelOptInfo *tarrel);
 extern bool relation_is_distinct_for(PlannerInfo *root, RelOptInfo *rel,
 									 List *distinct_pathkey);
 #endif							/* PATHS_H */
diff --git a/src/test/regress/expected/uniquekey.out b/src/test/regress/expected/uniquekey.out
index c2bd1fa619..a86e413fcf 100644
--- a/src/test/regress/expected/uniquekey.out
+++ b/src/test/regress/expected/uniquekey.out
@@ -407,3 +407,39 @@ EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.d FROM uqk1, uqk2 WHERE uqk1.pk = 1 AND
                ->  Seq Scan on uqk2
 (8 rows)
 
+-----------------------------------------
+-- Test DISTINCT/GROUP BY CASE.
+-----------------------------------------
+--------------------------------------------------------------------------------------------
+-- Test subquery cases.
+-- Note that current the UniqueKey still not push down the interesting UniqueKey to subquery.
+-- like uniquekey, so the below test case need a "DISTINCT" in subquery to make sure the
+-- UniqueKey is maintain.
+--------------------------------------------------------------------------------------------
+-- Test a normal case - one side
+EXPLAIN SELECT DISTINCT v.* FROM
+(SELECT DISTINCT uqk1.c, uqk1.d FROM uqk1, uqk2
+WHERE uqk1.a = uqk2.pk AND uqk1.c is not null offset 0) v;
+                           QUERY PLAN                           
+----------------------------------------------------------------
+ Hash Join  (cost=1.07..2.14 rows=2 width=8)
+   Hash Cond: (uqk1.a = uqk2.pk)
+   ->  Seq Scan on uqk1  (cost=0.00..1.05 rows=3 width=12)
+         Filter: (c IS NOT NULL)
+   ->  Hash  (cost=1.03..1.03 rows=3 width=4)
+         ->  Seq Scan on uqk2  (cost=0.00..1.03 rows=3 width=4)
+(6 rows)
+
+-- Test a normal case - composited side.
+EXPLAIN SELECT DISTINCT v.* FROM
+(SELECT DISTINCT t1.c, t1.d, t2.pk FROM uqk1 t1 cross join uqk2 t2 where t1.c is not null OFFSET 0)
+v;
+                            QUERY PLAN                             
+-------------------------------------------------------------------
+ Nested Loop  (cost=0.00..2.20 rows=9 width=12)
+   ->  Seq Scan on uqk1 t1  (cost=0.00..1.05 rows=3 width=8)
+         Filter: (c IS NOT NULL)
+   ->  Materialize  (cost=0.00..1.04 rows=3 width=4)
+         ->  Seq Scan on uqk2 t2  (cost=0.00..1.03 rows=3 width=4)
+(5 rows)
+
diff --git a/src/test/regress/sql/uniquekey.sql b/src/test/regress/sql/uniquekey.sql
index 3f93872246..22e236eafb 100644
--- a/src/test/regress/sql/uniquekey.sql
+++ b/src/test/regress/sql/uniquekey.sql
@@ -106,3 +106,24 @@ SELECT uqk1.c, uqk2.c FROM uqk1, uqk2 WHERE uqk1.pk = 2 AND uqk2.pk = 1 order BY
 -----------------------------------------
 EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.pk FROM uqk1, uqk2 WHERE uqk1.c = uqk2.c;
 EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.d FROM uqk1, uqk2 WHERE uqk1.pk = 1 AND uqk1.c = uqk2.c;
+
+-----------------------------------------
+-- Test DISTINCT/GROUP BY CASE.
+-----------------------------------------
+
+
+--------------------------------------------------------------------------------------------
+-- Test subquery cases.
+-- Note that current the UniqueKey still not push down the interesting UniqueKey to subquery.
+-- like uniquekey, so the below test case need a "DISTINCT" in subquery to make sure the
+-- UniqueKey is maintain.
+--------------------------------------------------------------------------------------------
+-- Test a normal case - one side
+EXPLAIN SELECT DISTINCT v.* FROM
+(SELECT DISTINCT uqk1.c, uqk1.d FROM uqk1, uqk2
+WHERE uqk1.a = uqk2.pk AND uqk1.c is not null offset 0) v;
+
+-- Test a normal case - composited side.
+EXPLAIN SELECT DISTINCT v.* FROM
+(SELECT DISTINCT t1.c, t1.d, t2.pk FROM uqk1 t1 cross join uqk2 t2 where t1.c is not null OFFSET 0)
+v;
-- 
2.21.0

#35Andy Fan
zhihui.fan1213@gmail.com
In reply to: Tom Lane (#23)
Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)

On Tue, Jul 6, 2021 at 9:14 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

David Rowley <dgrowleyml@gmail.com> writes:

Tom, I'm wondering if you might get a chance to draw up a design for
what you've got in mind with this? I assume adding a new field in
Var, but I'm drawing a few blanks on how things might work for equal()
when one Var has the field set and another does not.

As I said before, it hasn't progressed much past the handwaving stage,
but it does seem like it's time to get it done. I doubt I'll have any
cycles for it during the commitfest, but maybe I can devote a block of
time during August.

regards, tom lane

Hi Tom: do you get a chance to work on this? Looks like we have to fix
this one before we can move on to the uniquekey stuff.

--
Best Regards
Andy Fan

#36Andy Fan
zhihui.fan1213@gmail.com
In reply to: Andy Fan (#35)
Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)

Unknown why people have so little interest in this, AFAICS, there are more great
usages of UniqueKey rather than the 'marking-distinct-as-noop'. The most
exciting usage for me is it is helpful for JoinRel's pathkey.

Take an example of this:

SELECT .. FROM t1 JOIN t2 ON t1.any_column = t2.uniquekey;
SELECT .. FROM t1 LEFT JOIN t2 ON t1.any_column = t2.uniquekey;

Suppose before the join, t1 has a pathkey X, t2 has PathKey y. Then
(t1.X, t2.Y) is
ordered as well for JoinRel(t1, t2). Then the pathkey of JoinRel(t1,
t2) has a lot
of usage again. Currently after the joining, only the outer join's
pathkey is maintained.

As for the extra planning cost of this, it looks like our current
infrastructure can support it
well since we know all the information when we generate the pathkey
for the Join Path.

--
Best Regards
Andy Fan (https://www.aliyun.com/)

#37Dmitry Dolgov
9erthalion6@gmail.com
In reply to: David Rowley (#25)
Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)

On Wed, Jul 07, 2021 at 01:20:24PM +1200, David Rowley wrote:
On Wed, 7 Jul 2021 at 13:04, Andy Fan <zhihui.fan1213@gmail.com> wrote:

Looking forward to watching this change closely, thank you both David and Tom!
But I still don't understand what the faults my way have , do you mind telling the
details?

The problem is that we don't need 6 different ways to determine if a
Var can be NULL or not. You're proposing to add a method using
Bitmapsets and Tom has some proposing ideas around tracking
nullability in Vars. We don't need both.

It seems to me that having it in Var allows us to have a much finer
gradient about where exactly a Var can be NULL.

For example: SELECT nullablecol FROM tab WHERE nullablecol = <value>;

If the equality operator is strict then the nullablecol can be NULL in
the WHERE clause but not in the SELECT list. Tom's idea should allow
us to determine both of those things but your idea cannot tell them
apart, so, in theory at least, Tom's idea seems better to me.

Hi,

This patch still occupies some place in my head, so I would like to ask few
questions to see where it's going:

* From the last emails in this thread I gather that the main obstacle from the
design side of things is functionality around figuring out if a Var could be
NULL or not, and everyone is waiting for a counterproposal about how to do
that better. Is that correct?

* Is this thread only about notnullattrs field in RelOptInfo, or about the
UniqueKeys patch series after all? The title indicates the first one, but the
last posted patch series included everything as far as I can see.

* Putting my archaeologist's hat on, I've tried to find out what this
alternative proposal was about. The result findings are scattered through the
archives -- which proves that it's a hard topic indeed -- and participants of
this thread are probably more aware about them than I am. The most detailed
handwaving I found in the thread [1]/messages/by-id/25142.1580847861@sss.pgh.pa.us, with an idea to introduce NullableVar
wrapper created by parser, is that it? It makes more clear why such approach
could be more beneficial than a new field in RelOptInfo. And if the thread is
only about the notnullattrs, I guess it would be indeed enough to object.

* Now, how essential is notnullattrs functionality for the UniqueKeys patch
series? From what I understand, it's being used to set multi_nulls field of
every UniqueKey to indicate whether this key could produce NULL or not (which
means no guaranties about uniqueness could be provided). Is there a way to
limit the scope of the patch series and introduce UniqueKeys without require
multi_nulls at all, or (again, in some limited situations) fetch necessary
information somehow on the fly e.g. only from catcache without introducing
any new infrastructure?

[1]: /messages/by-id/25142.1580847861@sss.pgh.pa.us

#38Andy Fan
zhihui.fan1213@gmail.com
In reply to: Dmitry Dolgov (#37)
Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)

Hi Dmitry:

On Wed, Nov 17, 2021 at 11:20 PM Dmitry Dolgov <9erthalion6@gmail.com> wrote:

On Wed, Jul 07, 2021 at 01:20:24PM +1200, David Rowley wrote:
On Wed, 7 Jul 2021 at 13:04, Andy Fan <zhihui.fan1213@gmail.com> wrote:

Looking forward to watching this change closely, thank you both David and Tom!
But I still don't understand what the faults my way have , do you mind telling the
details?

The problem is that we don't need 6 different ways to determine if a
Var can be NULL or not. You're proposing to add a method using
Bitmapsets and Tom has some proposing ideas around tracking
nullability in Vars. We don't need both.

It seems to me that having it in Var allows us to have a much finer
gradient about where exactly a Var can be NULL.

For example: SELECT nullablecol FROM tab WHERE nullablecol = <value>;

If the equality operator is strict then the nullablecol can be NULL in
the WHERE clause but not in the SELECT list. Tom's idea should allow
us to determine both of those things but your idea cannot tell them
apart, so, in theory at least, Tom's idea seems better to me.

Hi,

This patch still occupies some place in my head, so I would like to ask few
questions to see where it's going:

Thanks for that!

* From the last emails in this thread I gather that the main obstacle from the
design side of things is functionality around figuring out if a Var could be
NULL or not, and everyone is waiting for a counterproposal about how to do
that better. Is that correct?

That is correct.

* Is this thread only about notnullattrs field in RelOptInfo, or about the
UniqueKeys patch series after all? The title indicates the first one, but the
last posted patch series included everything as far as I can see.

This thread is talking about the path series after all. Not null maintenance
is the first step of the UniqueKey patch. If the not null issue can't
be addressed,
the overall UniqueKey patch would be hopeless.

* Putting my archaeologist's hat on, I've tried to find out what this
alternative proposal was about. The result findings are scattered through the
archives -- which proves that it's a hard topic indeed -- and participants of
this thread are probably more aware about them than I am. The most detailed
handwaving I found in the thread [1], with an idea to introduce NullableVar
wrapper created by parser, is that it? It makes more clear why such approach
could be more beneficial than a new field in RelOptInfo. And if the thread is
only about the notnullattrs, I guess it would be indeed enough to object.

* Now, how essential is notnullattrs functionality for the UniqueKeys patch
series? From what I understand, it's being used to set multi_nulls field of
every UniqueKey to indicate whether this key could produce NULL or not (which
means no guaranties about uniqueness could be provided). Is there a way to
limit the scope of the patch series and introduce UniqueKeys without require
multi_nulls at all, or (again, in some limited situations) fetch necessary
information somehow on the fly e.g. only from catcache without introducing
any new infrastructure?

_I_ think there is no way to bypass that. I guess Tom has a bigger plan on
Var (not only for notnull), but no time to invest in them so far. If
that is the case,
personally I think we can go ahead with my method first and continue the review
process.

--
Best Regards
Andy Fan

#39Greg Stark
stark@mit.edu
In reply to: Andy Fan (#38)
Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)

On Mon, 22 Nov 2021 at 02:14, Andy Fan <zhihui.fan1213@gmail.com> wrote:

_I_ think there is no way to bypass that. I guess Tom has a bigger plan on
Var (not only for notnull), but no time to invest in them so far. If
that is the case,
personally I think we can go ahead with my method first and continue the review
process.

This discussion has gone on for two years now and meandered into
different directions. There have been a number of interesting
proposals and patches in that time but it's not clear now what patch
is even under consideration and what questions remain for it. And I
think this message from last November is the last comment on it so I
wonder if it's reached a bit of an impasse.

I think I would suggest starting a fresh thread with a patch distilled
from the previous discussions. Then once that's settled repeat with
additional patches, keeping the discussion focused just on the current
change.

Personally I think these kinds of optimizations are important because
they're what allow people to use SQL without micro-optimizing each
query.

--
greg