Tracking notnull attributes inside Var

Started by Andy Fanover 3 years ago6 messages

zhihui.fan1213@gmail.com

over 3 years ago

1 attachment(s)

notnulls discussion is forked from UniqueKey stuff, you can see the
attachment
for the UnqiueKey introduction. Tom raised his opinion to track the
nullability
inside Var[1]/messages/by-id/1551312.1613142245@sss.pgh.pa.us[2]/messages/by-id/CAApHDvrRwhWCPKUD5H-EQoezHf=fnUUsPgTAnXsEOV8f8SF7XQ@mail.gmail.com[3]/messages/by-id/1664320.1625577290@sss.pgh.pa.us, this thread would start from there based on my
understanding.

Generally tracking the null attributes inside Var would have something like:

struct Var
{
...;
int nullable; // -1 unknown, 0 - not nullable. 1 - nullable
};

and then semantics of Var->nullable must be attached to a RelOptInfo. For
example:

CREATE TABLE t1(a int, b int);

SELECT abs(a) FROM t1 WHERE a > -100;

The var in RelOptInfo->reltarget should have nullable = 0 but the var in
RelOptInfo->baserestrictinfo should have nullable = 1; The beauty of this
are: a). It can distinguish the two situations perfectly b). Whenever we
want
to know the nullable attribute of a Var for an expression, it is super easy
to
know. In summary, we need to maintain the nullable attribute at 2 different
places. one is the before the filters are executed(baserestrictinfo,
joininfo,
ec_list at least). one is after the filters are executed
(RelOptInfo.reltarget
only?)

Come to JoinRel, we still need to maintain the 2 different cases as well.

As for the joinrel.reltarget, currently it looks up the inputrel's
reltarget to
get the Var, so it is easy to inherit from Var->nullable from inputrel, but
we need to consider the new changes introduced by current join,
Like new NOT nullable attributes because of join clauses OR new nullable
attributes because of outer join. Everything looks good for now.

The hard part is RelOptInfo.joininfo & root->eq_classes. All of them uses
the shared RestrictInfo, and it is unclear which Var->nullable should be
used in
them. To not provide a wrong answer, I think we can assume nullable=-1
(unknown)
and let the upper layer decides what to do (do we have known use cases to
use
the nullable attribute here?).

More considerations about this strategy:
1. We might use more memory for different var copies, the only known cases
RelOptInfo->reltarget for now.
2. _equalVar() has more complex semantics: shall we consider nulls or not.

My recent experience reminds me of another interesting use case of UniqueKey
which may reduce the planning time a lot IIUC (Value 3 in then attachment).
Since
PG15 has just been released, I wonder if more people have time to discuss
this topic
again. Do I think the way in the right direction?

[1]: /messages/by-id/1551312.1613142245@sss.pgh.pa.us
[2]: /messages/by-id/CAApHDvrRwhWCPKUD5H-EQoezHf=fnUUsPgTAnXsEOV8f8SF7XQ@mail.gmail.com
/messages/by-id/CAApHDvrRwhWCPKUD5H-EQoezHf=fnUUsPgTAnXsEOV8f8SF7XQ@mail.gmail.com
[3]: /messages/by-id/1664320.1625577290@sss.pgh.pa.us

--
Best Regards
Andy Fan

Ashutosh Bapat

ashutosh.bapat.oss@gmail.com

over 3 years ago

In reply to: Andy Fan (#1)

Re: Tracking notnull attributes inside Var

On Sun, May 15, 2022 at 8:41 AM Andy Fan <zhihui.fan1213@gmail.com> wrote:

The var in RelOptInfo->reltarget should have nullable = 0 but the var in
RelOptInfo->baserestrictinfo should have nullable = 1; The beauty of this
are: a). It can distinguish the two situations perfectly b). Whenever we want
to know the nullable attribute of a Var for an expression, it is super easy to
know. In summary, we need to maintain the nullable attribute at 2 different
places. one is the before the filters are executed(baserestrictinfo, joininfo,
ec_list at least). one is after the filters are executed (RelOptInfo.reltarget
only?)

Thanks for identifying this. What you have written makes sense and it
might open a few optimization opportunities. But let me put down some
other thoughts here. You might want to take those into consideration
when designing your solution.

Do we want to just track nullable and non-nullable. May be we want
expand this class to nullable (var may be null), non-nullable (Var is
definitely non-NULL), null (Var will be always NULL).

But the other way to look at this is along the lines of equivalence
classes. Equivalence classes record the expressions which are equal in
the final result of the query. The equivalence class members are not
equal at all the stages of query execution. But because they are
equal in the final result, we can impose that restriction on the lower
levels as well. Can we think of nullable in that fashion? If a Var is
non-nullable in the final result, we can impose that restriction on
the intermediate stages since rows with NULL values for that Var will
be filtered out somewhere. Similarly we could argue for null Var. But
knowledge that a Var is nullable in the final result does not impose a
NULL, non-NULL restriction on the intermediate stages. If we follow
this thought process, we don't need to differentiate Var at different
stages in query.

Come to JoinRel, we still need to maintain the 2 different cases as well.

As for the joinrel.reltarget, currently it looks up the inputrel's reltarget to
get the Var, so it is easy to inherit from Var->nullable from inputrel, but
we need to consider the new changes introduced by current join,
Like new NOT nullable attributes because of join clauses OR new nullable
attributes because of outer join. Everything looks good for now.

Yes, if we want to maintain nullness at different stages in the query.

The hard part is RelOptInfo.joininfo & root->eq_classes. All of them uses
the shared RestrictInfo, and it is unclear which Var->nullable should be used in
them. To not provide a wrong answer, I think we can assume nullable=-1 (unknown)
and let the upper layer decides what to do (do we have known use cases to use
the nullable attribute here?).

I think what applies to baserestrictinfo and reltarget also applies to
joininfo and join's reltarget. There will be three stages - join
clauses, join quals and reltarget.

In EQs the Vars in RestrictInfo will come from joininfo but EQ member
Vars will derive their nullable-ness from corresponding reltarget. I
can be wrong though.

More considerations about this strategy:
1. We might use more memory for different var copies, the only known cases
RelOptInfo->reltarget for now.

When a Var is copied the whole expression tree needs to be copied.
That might be more memory than just copies of Var nodes.

2. _equalVar() has more complex semantics: shall we consider nulls or not.

This is interesting. It might have impact on set_plan_references and
planner's ability to search and match expressions.

But if we take the approach I have suggested earlier, this question
will not arise.

--
Best Wishes,
Ashutosh Bapat

Tom Lane

tgl@sss.pgh.pa.us

over 3 years ago

In reply to: Andy Fan (#1)

Re: Tracking notnull attributes inside Var

Andy Fan <zhihui.fan1213@gmail.com> writes:

notnulls discussion is forked from UniqueKey stuff, you can see the
attachment
for the UnqiueKey introduction. Tom raised his opinion to track the
nullability
inside Var[1][2][3], this thread would start from there based on my
understanding.

I'm pretty certain that I never suggested this:

struct Var
{
...;
int nullable; // -1 unknown, 0 - not nullable. 1 - nullable
};

You're free to pursue it if you like, but I think it will be a dead end.
The fundamental problem as you note is that equalVar() cannot do anything
sane with a field defined that way. Also, we'd have to generate Vars
initially with nullable = unknown (else, for example, ALTER SET/DROP NOT
NULL breaks stored views referring to the column). It'd be on the planner
to run through the tree and replace that with "nullable" or "not
nullable". It's hard to see how that's more advantageous than just
keeping the info in the associated RelOptInfo.

Also, I think you're confusing two related but distinct issues. For
certain optimization issues, we'd like to keep track of whether a column
stored in a table is known NOT NULL. However, that's not the same thing
as the question that I've muttered about, which is how to treat a Var
that's been possibly forced to null due to null-extension of an outer
join. That is a different value from the Var as read from the table,
but we currently represent it the same within the planner, which causes
various sorts of undesirable complication. We cannot fix that by setting
Var.nullable = true in above-the-join instances, because it might also
be true in below-the-join instances. "Known not null in the table" is
not the inverse of "potentially nulled by an outer join". Moreover, we
probably need to know *which* join is the one potentially nulling the Var,
so a bool is not likely enough anyway.

The schemes I've been toying with tend to look more like putting a
PlaceHolderVar-ish wrapper around the Var or expression that represents
the below-the-join value. The wrapper node could carry any join ID
info that we find necessary. The thing that I'm kind of stalled on is
how to define this struct so that it's not a big headache for join
strength reduction (which could remove the need for a wrapper altogether)
or outer-join reordering (which makes it a bit harder to define which
join we think is the one nulling the value).

regards, tom lane

Andy Fan

zhihui.fan1213@gmail.com

over 3 years ago

In reply to: Tom Lane (#3)

1 attachment(s)

Re: Tracking notnull attributes inside Var

Hi Tom:

Thanks for your attention!

On Wed, May 18, 2022 at 1:25 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Andy Fan <zhihui.fan1213@gmail.com> writes:

notnulls discussion is forked from UniqueKey stuff, you can see the
attachment
for the UnqiueKey introduction. Tom raised his opinion to track the
nullability
inside Var[1][2][3], this thread would start from there based on my
understanding.

I'm pretty certain that I never suggested this:

struct Var
{
...;
int nullable; // -1 unknown, 0 - not nullable. 1 - nullable
};

You're free to pursue it if you like, but I think it will be a dead end.

OK, Here is a huge misunderstanding. I have my own solution at the
beginning and then I think you want to go with this direction and I think
it is really hard to understand, so I started this thread to make things
clear. It is so great that the gap is filled now.

The fundamental problem as you note is that equalVar() cannot do anything

sane with a field defined that way. Also, we'd have to generate Vars
initially with nullable = unknown (else, for example, ALTER SET/DROP NOT
NULL breaks stored views referring to the column). It'd be on the planner
to run through the tree and replace that with "nullable" or "not
nullable". It's hard to see how that's more advantageous than just
keeping the info in the associated RelOptInfo.

Agreed.

Also, I think you're confusing two related but distinct issues. For
certain optimization issues, we'd like to keep track of whether a column
stored in a table is known NOT NULL. However, that's not the same thing
as the question that I've muttered about, which is how to treat a Var
that's been possibly forced to null due to null-extension of an outer
join. That is a different value from the Var as read from the table,
but we currently represent it the same within the planner, which causes
various sorts of undesirable complication. We cannot fix that by setting

Var.nullable = true in above-the-join instances, because it might also

be true in below-the-join instances. "Known not null in the table" is
not the inverse of "potentially nulled by an outer join". Moreover, we
probably need to know *which* join is the one potentially nulling the Var,
so a bool is not likely enough anyway.

I read the above graph several times, but *I think probably my code can
express better than my words*. It would be great that you can have a
look at them. Just one point to mention now: Seems you didn't mention the
case where the NULL values are filtered by qual, not sure it is negligible
or by mistake.

CREATE TABLE t(a int);
SELECT * FROM t WHERE a > 1;

My patch is my previous solution not the Inside Var one.

The schemes I've been toying with tend to look more like putting a
PlaceHolderVar-ish wrapper around the Var or expression that represents
the below-the-join value. The wrapper node could carry any join ID
info that we find necessary. The thing that I'm kind of stalled on is
how to define this struct so that it's not a big headache for join
strength reduction (which could remove the need for a wrapper altogether)
or outer-join reordering (which makes it a bit harder to define which
join we think is the one nulling the value).

Not sure if the "NULL values are filtered by qual '' matters in this
solution,
and I'm pretty open for direction. But to avoid further misunderstanding
from me, I would like to fill more gaps first by raising my patch now
and continue talking in this direction.

--
Best Regards
Andy Fan

Attachments:

v1-0001-Introduce-notnull_attrs-for-RelOptInfo.patchapplication/octet-stream; name=v1-0001-Introduce-notnull_attrs-for-RelOptInfo.patchDownload

From 2f4706a83869a21ade8c9d818b3c9826a610ea50 Mon Sep 17 00:00:00 2001
From: Andy Fan <zhihui.fan1213@gmail.com>
Date: Fri, 20 May 2022 11:17:52 +0800
Subject: [PATCH v1] Introduce notnull_attrs for RelOptInfo.

RelOptInfo.notnull_attrs records which Vars are not nullable
for the given RelOptInfo after all the quals on this RelOptInfo
are executed.

For a joinrel, the length of notnull_attrs array equals the
(max value + 1) in relids, the array index is varno and the
value is a Bitmapset* which presents which varattnos are
not nullable for the varno.

For baserel, to save space, notnull_attrs[0] is always used.

We calculate the notnull_attrs mainly from 2 places. one is
catalog. which is set during get_relation_info. the other one
is the qual filters, I set them during set_rel_size & set_joinrel_size
to covers as many as relkind as possible. Inherit can be handled
automatically in this way. nulls introduced by outer join
is also handled during build_join_rel.

I assumed upper relation except SETOP would have the same
notnull_attrs as the input relation, like SORT / GROUP
and so on.

SubQuery is handled but SetOp upper relation is not handled so far.

XXX: There are some code like

if (!enable_geqo)
{
elog(INFO, XXXX);
}

They are just there for easily review & testing, will be removed
at last.
---
 src/backend/nodes/bitmapset.c             |  14 ++
 src/backend/optimizer/path/allpaths.c     | 104 +++++++-
 src/backend/optimizer/util/plancat.c      |  12 +
 src/backend/optimizer/util/relnode.c      | 186 +++++++++++++-
 src/include/nodes/bitmapset.h             |   1 +
 src/include/nodes/pathnodes.h             |  14 ++
 src/test/regress/expected/notnulltest.out | 283 ++++++++++++++++++++++
 src/test/regress/parallel_schedule        |   2 +
 src/test/regress/sql/notnulltest.sql      |  87 +++++++
 9 files changed, 697 insertions(+), 6 deletions(-)
 create mode 100644 src/test/regress/expected/notnulltest.out
 create mode 100644 src/test/regress/sql/notnulltest.sql

diff --git a/src/backend/nodes/bitmapset.c b/src/backend/nodes/bitmapset.c
index 0a6c30e4ebc..8bc8034c3fb 100644
--- a/src/backend/nodes/bitmapset.c
+++ b/src/backend/nodes/bitmapset.c
@@ -665,6 +665,20 @@ bms_num_members(const Bitmapset *a)
 	return result;
 }
 
+/*
+ * bms_max_member - the max member in this bitmap.
+ */
+int
+bms_max_member(const Bitmapset *a)
+{
+	int result;
+	if (a == NULL || bms_is_empty(a))
+		elog(ERROR, "Must be an non-empty bitmapset.");
+	result = (a->nwords - 1) * BITS_PER_BITMAPWORD;
+	result += bmw_leftmost_one_pos(a->words[a->nwords - 1]);
+	return result;
+}
+
 /*
  * bms_membership - does a set have zero, one, or multiple members?
  *
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 7ac116a791f..c40b94a725e 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -143,7 +143,8 @@ static void subquery_push_qual(Query *subquery,
 static void recurse_push_qual(Node *setOp, Query *topquery,
 							  RangeTblEntry *rte, Index rti, Node *qual);
 static void remove_unused_subquery_outputs(Query *subquery, RelOptInfo *rel);
-
+static void set_baserel_notnull_attrs_for_quals(RelOptInfo *rel);
+static void set_subquery_rel_notnull_attrs(RelOptInfo *rel);
 
 /*
  * make_one_rel
@@ -422,6 +423,9 @@ set_rel_size(PlannerInfo *root, RelOptInfo *rel,
 				 * and build their paths immediately.
 				 */
 				set_subquery_pathlist(root, rel, rti, rte);
+
+				set_subquery_rel_notnull_attrs(rel);
+
 				break;
 			case RTE_FUNCTION:
 				set_function_size_estimates(root, rel);
@@ -458,6 +462,13 @@ set_rel_size(PlannerInfo *root, RelOptInfo *rel,
 		}
 	}
 
+	/*
+	 * Set the notnull attributes for all the kinds of baserel, including
+	 * RTE_RELATION(foreign table, partitioned table), RTE_SUBQUERY etc
+	 * Here doesn't handle the notnull attributes from catalog.
+	 */
+	set_baserel_notnull_attrs_for_quals(rel);
+
 	/*
 	 * We insist that all non-dummy rels have a nonzero rowcount estimate.
 	 */
@@ -3270,6 +3281,15 @@ make_rel_from_joinlist(PlannerInfo *root, List *joinlist)
 		initial_rels = lappend(initial_rels, thisrel);
 	}
 
+	/*
+	 * We put the initial_rels list into a PlannerInfo field because
+	 * has_legal_joinclause() needs to look at it (ugly :-().
+	 *
+	 * XXX: I kept this since I need to access the last join rel
+	 * during set_uppper_rel_notnull_attrs stage.
+	 */
+	root->initial_rels = initial_rels;
+
 	if (levels_needed == 1)
 	{
 		/*
@@ -3282,11 +3302,7 @@ make_rel_from_joinlist(PlannerInfo *root, List *joinlist)
 		/*
 		 * Consider the different orders in which we could join the rels,
 		 * using a plugin, GEQO, or the regular join search code.
-		 *
-		 * We put the initial_rels list into a PlannerInfo field because
-		 * has_legal_joinclause() needs to look at it (ugly :-().
 		 */
-		root->initial_rels = initial_rels;
 
 		if (join_search_hook)
 			return (*join_search_hook) (root, levels_needed, initial_rels);
@@ -4558,3 +4574,81 @@ debug_print_rel(PlannerInfo *root, RelOptInfo *rel)
 }
 
 #endif							/* OPTIMIZER_DEBUG */
+
+/*
+ * set_baserel_notnull_attrs_for_quals
+ *
+ *	Set baserel's notnullattrs based on baserestrictinfo.
+ */
+static void
+set_baserel_notnull_attrs_for_quals(RelOptInfo *rel)
+{
+	List *clauses = extract_actual_clauses(rel->baserestrictinfo, false);
+	ListCell	*lc;
+	foreach(lc, find_nonnullable_vars((Node *)clauses))
+	{
+		Var *var = (Var *) lfirst(lc);
+		if (var->varno != rel->relid)
+		{
+			/* Lateral Join */
+			continue;
+		}
+		Assert(var->varno == rel->relid);
+		rel->notnull_attrs[0] = bms_add_member(rel->notnull_attrs[0],
+											   var->varattno - FirstLowInvalidHeapAttributeNumber);
+	}
+
+	/* Debug Only, Will be removed at last. */
+	if (!enable_geqo)
+	{
+		elog(INFO, "FirstLowInvalidHeapAttributeNumber = %d, BaseRel(%d), notnull_attrs = %s",
+			 FirstLowInvalidHeapAttributeNumber,
+			 rel->relid,
+			 bmsToString(rel->notnull_attrs[0]));
+	}
+}
+
+/*
+ * set_subquery_rel_notnull_attrs
+ *
+ *	We only maintained Var level notnull_attrs, that means we can only
+ * know the VAR ONLY expr in subroot->processed_tlist is nullable or not.
+ * Build notnull_attrs with the position of these Vars in
+ * subroot->processed_tlist for subquery rel.
+ */
+static void
+set_subquery_rel_notnull_attrs(RelOptInfo *rel)
+{
+	PlannerInfo *subroot = rel->subroot;
+	RelOptInfo *subrel = fetch_upper_rel(subroot, UPPERREL_FINAL, NULL);
+	bool sub_single_rel = subroot->join_rel_list == NIL;
+	int attno = 0;
+	ListCell *lc;
+
+	if (subroot->initial_rels == NIL)
+		return;
+
+	foreach(lc, subroot->processed_tlist)
+	{
+		Var *inner_var;
+		int idx;
+		attno += 1;
+
+		inner_var = (Var *)lfirst_node(TargetEntry, lc)->expr;
+
+		if (!inner_var || !IsA(inner_var, Var))
+			continue;
+
+		/*
+		 * If the subquery is from a single baserel, the Bitmapset's
+		 * index is 0, or else, it is varno.
+		 */
+		idx = sub_single_rel ? 0 : inner_var->varno;
+
+		if (bms_is_member(inner_var->varattno - FirstLowInvalidHeapAttributeNumber,
+						  subrel->notnull_attrs[idx]))
+			/* Set the notnull_attrs for upper subquery_rel. */
+			rel->notnull_attrs[0] = bms_add_member(rel->notnull_attrs[0],
+												   inner_var->varattno - FirstLowInvalidHeapAttributeNumber);
+	}
+}
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 5012bfe1425..4715845c180 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -118,6 +118,7 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 	Relation	relation;
 	bool		hasindex;
 	List	   *indexinfos = NIL;
+	int		i;
 
 	/*
 	 * We need not lock the relation since it was already locked, either by
@@ -472,6 +473,17 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 	if (inhparent && relation->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
 		set_relation_partition_info(root, rel, relation);
 
+	/*
+	 * Set baserel's notnull_attrs based on catalog.
+	 */
+	for (i = 0; i < relation->rd_att->natts; i++)
+	{
+		FormData_pg_attribute attr = relation->rd_att->attrs[i];
+		if (attr.attnotnull)
+			rel->notnull_attrs[0] = bms_add_member(rel->notnull_attrs[0],
+												   attr.attnum - FirstLowInvalidHeapAttributeNumber);
+	}
+
 	table_close(relation, NoLock);
 
 	/*
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 520409f4ba0..302f60c5397 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -16,6 +16,7 @@
 
 #include <limits.h>
 
+#include "access/sysattr.h"
 #include "miscadmin.h"
 #include "nodes/nodeFuncs.h"
 #include "optimizer/appendinfo.h"
@@ -72,7 +73,18 @@ static void build_child_join_reltarget(PlannerInfo *root,
 									   RelOptInfo *childrel,
 									   int nappinfos,
 									   AppendRelInfo **appinfos);
-
+static void set_uppper_rel_notnull_attrs(PlannerInfo *root,
+										 RelOptInfo *upper_rel,
+										 UpperRelationKind kind);
+
+static void copy_notnull_attrs_to_joinrel(RelOptInfo *joinrel, RelOptInfo *rel);
+static void set_joinrel_notnull_attrs(RelOptInfo *joinrel,
+									  RelOptInfo *outer_rel,
+									  RelOptInfo *inner_rel,
+									  List *restrictlist,
+									  SpecialJoinInfo *sjinfo);
+static void set_uppper_rel_notnull_attrs(PlannerInfo *root, RelOptInfo *upper_rel,
+										 UpperRelationKind kind);
 
 /*
  * setup_simple_rel_arrays
@@ -259,6 +271,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
 	rel->all_partrels = NULL;
 	rel->partexprs = NULL;
 	rel->nullable_partexprs = NULL;
+	rel->notnull_attrs = palloc0(sizeof(Bitmapset *) * 1);
 
 	/*
 	 * Pass assorted information down the inheritance hierarchy.
@@ -674,6 +687,7 @@ build_join_rel(PlannerInfo *root,
 	joinrel->all_partrels = NULL;
 	joinrel->partexprs = NULL;
 	joinrel->nullable_partexprs = NULL;
+	joinrel->notnull_attrs = palloc0(sizeof(Bitmapset *) * (bms_max_member(joinrel->relids) + 1));
 
 	/* Compute information relevant to the foreign relations. */
 	set_foreign_rel_properties(joinrel, outer_rel, inner_rel);
@@ -765,6 +779,8 @@ build_join_rel(PlannerInfo *root,
 			lappend(root->join_rel_level[root->join_cur_level], joinrel);
 	}
 
+	set_joinrel_notnull_attrs(joinrel, outer_rel, inner_rel, restrictlist, sjinfo);
+
 	return joinrel;
 }
 
@@ -857,6 +873,8 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
 	joinrel->top_parent_relids = bms_union(outer_rel->top_parent_relids,
 										   inner_rel->top_parent_relids);
 
+	joinrel->notnull_attrs = palloc0(sizeof(Bitmapset *) * (bms_max_member(joinrel->relids) + 1));
+
 	/* Compute information relevant to foreign relations. */
 	set_foreign_rel_properties(joinrel, outer_rel, inner_rel);
 
@@ -916,6 +934,8 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
 
 	pfree(appinfos);
 
+	set_joinrel_notnull_attrs(joinrel, outer_rel, inner_rel, restrictlist, sjinfo);
+
 	return joinrel;
 }
 
@@ -1245,6 +1265,8 @@ fetch_upper_rel(PlannerInfo *root, UpperRelationKind kind, Relids relids)
 
 	root->upper_rels[kind] = lappend(root->upper_rels[kind], upperrel);
 
+	set_uppper_rel_notnull_attrs(root, upperrel, kind);
+
 	return upperrel;
 }
 
@@ -2045,3 +2067,165 @@ build_child_join_reltarget(PlannerInfo *root,
 	childrel->reltarget->cost.per_tuple = parentrel->reltarget->cost.per_tuple;
 	childrel->reltarget->width = parentrel->reltarget->width;
 }
+
+/*
+ * copy_notnull_attrs_to_joinrel
+ *
+ *	Copy the notnull_attrs from rel to joinrel's notnull_attrs.
+ */
+static void
+copy_notnull_attrs_to_joinrel(RelOptInfo *joinrel, RelOptInfo *rel)
+{
+	int relid;
+	if (bms_get_singleton_member(rel->relids, &relid))
+		joinrel->notnull_attrs[relid] = bms_copy(rel->notnull_attrs[0]);
+	else
+	{
+		relid = -1;
+		while ((relid = bms_next_member(rel->relids, relid)) >= 0)
+			joinrel->notnull_attrs[relid] = bms_copy(rel->notnull_attrs[relid]);
+	}
+}
+
+/*
+ * set_joinrel_notnull_attrs
+ *
+ *	Set joinrel's notnull attrs infomation based on the both sides
+ * notnull_attrs info, join type, join quals.
+ */
+static void
+set_joinrel_notnull_attrs(RelOptInfo *joinrel,
+						  RelOptInfo *outer_rel,
+						  RelOptInfo *inner_rel,
+						  List *restrictlist,
+						  SpecialJoinInfo *sjinfo)
+{
+	if (sjinfo->jointype == JOIN_FULL)
+		/* Both sides are nullable. */
+		return;
+
+	/* If it is not FULL join, the outer side is not changed. */
+	copy_notnull_attrs_to_joinrel(joinrel, outer_rel);
+	switch(sjinfo->jointype)
+	{
+		case JOIN_ANTI:
+		case JOIN_SEMI:
+		case JOIN_INNER:
+			/*
+			 * No chances to add extra nullable Vars for ANTI/SEMI/INNER
+			 * join. So copy the inner_rel's notnull_attrs as well.
+			 */
+			copy_notnull_attrs_to_joinrel(joinrel, inner_rel);
+			break;
+		case JOIN_LEFT:
+			break;
+		default:
+			elog(ERROR, "Unexpected join type %d", sjinfo->jointype);
+	}
+
+	/* The join clauses may produce more not null vars. */
+	{
+		ListCell	*lc;
+		List *clauses = extract_actual_clauses(restrictlist, false);
+		Relids not_nullable_reilds = sjinfo->jointype == JOIN_LEFT ?
+				outer_rel->relids : joinrel->relids;
+
+		foreach(lc, find_nonnullable_vars((Node *) clauses))
+		{
+			Var *var = lfirst_node(Var, lc);
+			if (!bms_is_member(var->varno, not_nullable_reilds))
+			{
+				continue;
+			}
+			joinrel->notnull_attrs[var->varno] = bms_add_member(
+				joinrel->notnull_attrs[var->varno],
+				var->varattno - FirstLowInvalidHeapAttributeNumber);
+		}
+	}
+
+	/* Debug Only, will be removed at last. */
+	if (!enable_geqo)
+	{
+		int relid = -1;
+		int eLevel = INFO;
+		elog(eLevel, "Dump notnull for JoinRel(%s)", bmsToString(joinrel->relids));
+		while((relid = bms_next_member(joinrel->relids, relid)) >= 0)
+		{
+			Bitmapset *notnullattrs = joinrel->notnull_attrs[relid];
+			if (notnullattrs != NULL)
+				elog(eLevel, "FirstLowInvalidHeapAttributeNumber = %d, RELID = (%d), notnull_attrs: %s",
+					 FirstLowInvalidHeapAttributeNumber,
+					 relid,
+					 bmsToString(notnullattrs));
+		}
+	}
+}
+
+/*
+ * set_uppper_rel_notnull_attrs
+ *
+ *	Set notnull_attrs for upper relation.
+ *
+ *	Upper rel is impossible to generate new nullable vars.
+ * Just let upper_rel use the same notnull_attrs as the
+ * last joinrel.
+ *
+ * SetOp is an exception since we needs consider both sides.
+ */
+static void
+set_uppper_rel_notnull_attrs(PlannerInfo *root, RelOptInfo *upper_rel, UpperRelationKind kind)
+{
+	RelOptInfo *final_joinrel;
+	int relid;
+
+	if (kind == UPPERREL_SETOP)
+		/*
+		 * Need to handle specially for SETOP, both sides need considering
+		 * to set a SETOP rel.
+		 */
+		return;
+
+	if (root->initial_rels == NIL)
+		/* there is no initial rels at all, so impossible to have notnull vars. */
+		return;
+
+	if (root->join_rel_list == NIL)
+		final_joinrel = linitial_node(RelOptInfo, root->initial_rels);
+	else
+		final_joinrel = llast_node(RelOptInfo, root->join_rel_list);
+
+	upper_rel->notnull_attrs = final_joinrel->notnull_attrs;
+
+	/* Debug Only, should be removed from code review. */
+	if (!enable_geqo)
+	{
+		int eLevel = INFO;
+		elog(eLevel, "Dump notnull for UpperRel(%s) for kind %d",
+			 bmsToString(upper_rel->relids),
+			 kind);
+		if (bms_get_singleton_member(final_joinrel->relids, &relid))
+		{
+			Bitmapset *notnullattrs = upper_rel->notnull_attrs[0];
+			if (notnullattrs != NULL)
+				elog(eLevel, "kind = %d FirstLowInvalidHeapAttributeNumber = %d, RELID = (%d), notnull_attrs: %s",
+					 kind,
+					 FirstLowInvalidHeapAttributeNumber,
+					 relid,
+					 bmsToString(notnullattrs));
+		}
+		else
+		{
+			relid = -1;
+			while((relid = bms_next_member(final_joinrel->relids, relid)) >= 0)
+			{
+				Bitmapset *notnullattrs = upper_rel->notnull_attrs[relid];
+				if (notnullattrs != NULL)
+					elog(eLevel, "kind = %d FirstLowInvalidHeapAttributeNumber = %d, RELID = (%d), notnull_attrs: %s",
+						 kind,
+						 FirstLowInvalidHeapAttributeNumber,
+						 relid,
+						 bmsToString(notnullattrs));
+			}
+		}
+	}
+}
diff --git a/src/include/nodes/bitmapset.h b/src/include/nodes/bitmapset.h
index 75b5ce1a8e4..e1d44e14246 100644
--- a/src/include/nodes/bitmapset.h
+++ b/src/include/nodes/bitmapset.h
@@ -94,6 +94,7 @@ extern bool bms_nonempty_difference(const Bitmapset *a, const Bitmapset *b);
 extern int	bms_singleton_member(const Bitmapset *a);
 extern bool bms_get_singleton_member(const Bitmapset *a, int *member);
 extern int	bms_num_members(const Bitmapset *a);
+extern int	bms_max_member(const Bitmapset *a);
 
 /* optimized tests when we don't need to know exact membership count: */
 extern BMS_Membership bms_membership(const Bitmapset *a);
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index a6e5db4eecc..9bc729a9802 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -692,6 +692,20 @@ typedef struct RelOptInfo
 	/* default result targetlist for Paths scanning this relation */
 	struct PathTarget *reltarget;	/* list of Vars/Exprs, cost, width */
 
+	/*
+	 * Record which Var is not nullable for current RelOptInfo
+	 * after all the quals on this RelOptInfo are executed.
+	 *
+	 * For a joinrel, the length of notnull_attrs array equals the
+	 * max value in relids, the array index is varno and the value
+	 * is a Bitmapset * which presents which varattnos are not
+	 * nullable for the varno.
+	 *
+	 * For baserel, to save space, notnull_attrs[0] is always used
+	 * for current RelOptInfo.
+	 */
+	Bitmapset	**notnull_attrs;
+
 	/* materialization information */
 	List	   *pathlist;		/* Path structures */
 	List	   *ppilist;		/* ParamPathInfos used in pathlist */
diff --git a/src/test/regress/expected/notnulltest.out b/src/test/regress/expected/notnulltest.out
new file mode 100644
index 00000000000..cacae16edff
--- /dev/null
+++ b/src/test/regress/expected/notnulltest.out
@@ -0,0 +1,283 @@
+set geqo to off;
+create table t1(a int, b int not null, c int, d int);
+create table t2(a int, b int not null, c int, d int);
+-- single rel
+select * from t1;
+INFO:  FirstLowInvalidHeapAttributeNumber = -7, BaseRel(1), notnull_attrs = (b 9)
+INFO:  Dump notnull for UpperRel((b)) for kind 7
+INFO:  kind = 7 FirstLowInvalidHeapAttributeNumber = -7, RELID = (1), notnull_attrs: (b 9)
+ a | b | c | d 
+---+---+---+---
+(0 rows)
+
+select * from t1 where a > 1;
+INFO:  FirstLowInvalidHeapAttributeNumber = -7, BaseRel(1), notnull_attrs = (b 8 9)
+INFO:  Dump notnull for UpperRel((b)) for kind 7
+INFO:  kind = 7 FirstLowInvalidHeapAttributeNumber = -7, RELID = (1), notnull_attrs: (b 8 9)
+ a | b | c | d 
+---+---+---+---
+(0 rows)
+
+select * from t2 where a > 1 or c > 1;
+INFO:  FirstLowInvalidHeapAttributeNumber = -7, BaseRel(1), notnull_attrs = (b 9)
+INFO:  Dump notnull for UpperRel((b)) for kind 7
+INFO:  kind = 7 FirstLowInvalidHeapAttributeNumber = -7, RELID = (1), notnull_attrs: (b 9)
+ a | b | c | d 
+---+---+---+---
+(0 rows)
+
+-- partitioned relation.
+-- append rel: all the childrel are not nullable.
+create table p (a int, b int, c int not null) partition by range(a);
+create table p_1 partition of p for values from (0) to (10000) partition by list(b);
+create table p_1_1(b int,  c int not null, a int);
+alter table p_1 attach partition p_1_1 for values in (1);
+create table p_2 partition of p for values from (10001) to (20000);
+-- p(1)  - 3(c)
+-- p_1(2) - 3(c)
+-- p_1_1(3) - 2(c)
+select * from p;
+INFO:  FirstLowInvalidHeapAttributeNumber = -7, BaseRel(3), notnull_attrs = (b 9)
+INFO:  FirstLowInvalidHeapAttributeNumber = -7, BaseRel(2), notnull_attrs = (b 10)
+INFO:  FirstLowInvalidHeapAttributeNumber = -7, BaseRel(4), notnull_attrs = (b 10)
+INFO:  FirstLowInvalidHeapAttributeNumber = -7, BaseRel(1), notnull_attrs = (b 10)
+INFO:  Dump notnull for UpperRel((b)) for kind 7
+INFO:  kind = 7 FirstLowInvalidHeapAttributeNumber = -7, RELID = (1), notnull_attrs: (b 10)
+ a | b | c 
+---+---+---
+(0 rows)
+
+-- p(1)  - 3(c) 1(a)
+-- p_1(2) - 3(c) 1(a)
+-- p_1_1(3) - 2(c) 3(a)
+select * from p where a > 1;
+INFO:  FirstLowInvalidHeapAttributeNumber = -7, BaseRel(3), notnull_attrs = (b 9 10)
+INFO:  FirstLowInvalidHeapAttributeNumber = -7, BaseRel(2), notnull_attrs = (b 8 10)
+INFO:  FirstLowInvalidHeapAttributeNumber = -7, BaseRel(4), notnull_attrs = (b 8 10)
+INFO:  FirstLowInvalidHeapAttributeNumber = -7, BaseRel(1), notnull_attrs = (b 8 10)
+INFO:  Dump notnull for UpperRel((b)) for kind 7
+INFO:  kind = 7 FirstLowInvalidHeapAttributeNumber = -7, RELID = (1), notnull_attrs: (b 8 10)
+ a | b | c 
+---+---+---
+(0 rows)
+
+-- test join:
+-- t1: b
+-- t2: b
+-- t{1, 2}:  t1.a, t1.b t2.b t2.c
+select * from t1, t2 where t1.a = t2.c;
+INFO:  FirstLowInvalidHeapAttributeNumber = -7, BaseRel(1), notnull_attrs = (b 9)
+INFO:  FirstLowInvalidHeapAttributeNumber = -7, BaseRel(2), notnull_attrs = (b 9)
+INFO:  Dump notnull for JoinRel((b 1 2))
+INFO:  FirstLowInvalidHeapAttributeNumber = -7, RELID = (1), notnull_attrs: (b 8 9)
+INFO:  FirstLowInvalidHeapAttributeNumber = -7, RELID = (2), notnull_attrs: (b 9 10)
+INFO:  Dump notnull for UpperRel((b)) for kind 7
+INFO:  kind = 7 FirstLowInvalidHeapAttributeNumber = -7, RELID = (1), notnull_attrs: (b 8 9)
+INFO:  kind = 7 FirstLowInvalidHeapAttributeNumber = -7, RELID = (2), notnull_attrs: (b 9 10)
+ a | b | c | d | a | b | c | d 
+---+---+---+---+---+---+---+---
+(0 rows)
+
+-- t1: b
+-- t2: b
+-- t{1, 2}:  none due to full join
+select * from t1 full join t2 on t1.a = t2.a;
+INFO:  FirstLowInvalidHeapAttributeNumber = -7, BaseRel(1), notnull_attrs = (b 9)
+INFO:  FirstLowInvalidHeapAttributeNumber = -7, BaseRel(2), notnull_attrs = (b 9)
+INFO:  Dump notnull for UpperRel((b)) for kind 7
+ a | b | c | d | a | b | c | d 
+---+---+---+---+---+---+---+---
+(0 rows)
+
+-- t1: b
+-- t2: b
+-- t{1, 2}:  t1.b t1.a   (t2.a t2.b is nullable due to outer join)
+select * from t1 left join t2 on t1.a = t2.a;
+INFO:  FirstLowInvalidHeapAttributeNumber = -7, BaseRel(1), notnull_attrs = (b 9)
+INFO:  FirstLowInvalidHeapAttributeNumber = -7, BaseRel(2), notnull_attrs = (b 9)
+INFO:  Dump notnull for JoinRel((b 1 2))
+INFO:  FirstLowInvalidHeapAttributeNumber = -7, RELID = (1), notnull_attrs: (b 8 9)
+INFO:  Dump notnull for UpperRel((b)) for kind 7
+INFO:  kind = 7 FirstLowInvalidHeapAttributeNumber = -7, RELID = (1), notnull_attrs: (b 8 9)
+ a | b | c | d | a | b | c | d 
+---+---+---+---+---+---+---+---
+(0 rows)
+
+-- test upper rel.
+select * from t1 order by a;
+INFO:  FirstLowInvalidHeapAttributeNumber = -7, BaseRel(1), notnull_attrs = (b 9)
+INFO:  Dump notnull for UpperRel((b)) for kind 6
+INFO:  kind = 6 FirstLowInvalidHeapAttributeNumber = -7, RELID = (1), notnull_attrs: (b 9)
+INFO:  Dump notnull for UpperRel((b)) for kind 7
+INFO:  kind = 7 FirstLowInvalidHeapAttributeNumber = -7, RELID = (1), notnull_attrs: (b 9)
+ a | b | c | d 
+---+---+---+---
+(0 rows)
+
+select * from t1 where a > 1 order by a;
+INFO:  FirstLowInvalidHeapAttributeNumber = -7, BaseRel(1), notnull_attrs = (b 8 9)
+INFO:  Dump notnull for UpperRel((b)) for kind 6
+INFO:  kind = 6 FirstLowInvalidHeapAttributeNumber = -7, RELID = (1), notnull_attrs: (b 8 9)
+INFO:  Dump notnull for UpperRel((b)) for kind 7
+INFO:  kind = 7 FirstLowInvalidHeapAttributeNumber = -7, RELID = (1), notnull_attrs: (b 8 9)
+ a | b | c | d 
+---+---+---+---
+(0 rows)
+
+select * from t2 where a > 1 or c > 1 order by a;
+INFO:  FirstLowInvalidHeapAttributeNumber = -7, BaseRel(1), notnull_attrs = (b 9)
+INFO:  Dump notnull for UpperRel((b)) for kind 6
+INFO:  kind = 6 FirstLowInvalidHeapAttributeNumber = -7, RELID = (1), notnull_attrs: (b 9)
+INFO:  Dump notnull for UpperRel((b)) for kind 7
+INFO:  kind = 7 FirstLowInvalidHeapAttributeNumber = -7, RELID = (1), notnull_attrs: (b 9)
+ a | b | c | d 
+---+---+---+---
+(0 rows)
+
+select a, count(*) from t1 group by a;
+INFO:  FirstLowInvalidHeapAttributeNumber = -7, BaseRel(1), notnull_attrs = (b 9)
+INFO:  Dump notnull for UpperRel((b)) for kind 2
+INFO:  kind = 2 FirstLowInvalidHeapAttributeNumber = -7, RELID = (1), notnull_attrs: (b 9)
+INFO:  Dump notnull for UpperRel((b)) for kind 7
+INFO:  kind = 7 FirstLowInvalidHeapAttributeNumber = -7, RELID = (1), notnull_attrs: (b 9)
+ a | count 
+---+-------
+(0 rows)
+
+select a, count(*) from t1 where a > 1 group by a;
+INFO:  FirstLowInvalidHeapAttributeNumber = -7, BaseRel(1), notnull_attrs = (b 8 9)
+INFO:  Dump notnull for UpperRel((b)) for kind 2
+INFO:  kind = 2 FirstLowInvalidHeapAttributeNumber = -7, RELID = (1), notnull_attrs: (b 8 9)
+INFO:  Dump notnull for UpperRel((b)) for kind 7
+INFO:  kind = 7 FirstLowInvalidHeapAttributeNumber = -7, RELID = (1), notnull_attrs: (b 8 9)
+ a | count 
+---+-------
+(0 rows)
+
+select a, count(*) from t2 where a > 1 or c > 1 group by a;
+INFO:  FirstLowInvalidHeapAttributeNumber = -7, BaseRel(1), notnull_attrs = (b 9)
+INFO:  Dump notnull for UpperRel((b)) for kind 2
+INFO:  kind = 2 FirstLowInvalidHeapAttributeNumber = -7, RELID = (1), notnull_attrs: (b 9)
+INFO:  Dump notnull for UpperRel((b)) for kind 7
+INFO:  kind = 7 FirstLowInvalidHeapAttributeNumber = -7, RELID = (1), notnull_attrs: (b 9)
+ a | count 
+---+-------
+(0 rows)
+
+select DISTINCT * from t1 order by a;
+INFO:  FirstLowInvalidHeapAttributeNumber = -7, BaseRel(1), notnull_attrs = (b 9)
+INFO:  Dump notnull for UpperRel((b)) for kind 5
+INFO:  kind = 5 FirstLowInvalidHeapAttributeNumber = -7, RELID = (1), notnull_attrs: (b 9)
+INFO:  Dump notnull for UpperRel((b)) for kind 6
+INFO:  kind = 6 FirstLowInvalidHeapAttributeNumber = -7, RELID = (1), notnull_attrs: (b 9)
+INFO:  Dump notnull for UpperRel((b)) for kind 7
+INFO:  kind = 7 FirstLowInvalidHeapAttributeNumber = -7, RELID = (1), notnull_attrs: (b 9)
+ a | b | c | d 
+---+---+---+---
+(0 rows)
+
+select DISTINCT * from t1 where a > 1 order by a;
+INFO:  FirstLowInvalidHeapAttributeNumber = -7, BaseRel(1), notnull_attrs = (b 8 9)
+INFO:  Dump notnull for UpperRel((b)) for kind 5
+INFO:  kind = 5 FirstLowInvalidHeapAttributeNumber = -7, RELID = (1), notnull_attrs: (b 8 9)
+INFO:  Dump notnull for UpperRel((b)) for kind 6
+INFO:  kind = 6 FirstLowInvalidHeapAttributeNumber = -7, RELID = (1), notnull_attrs: (b 8 9)
+INFO:  Dump notnull for UpperRel((b)) for kind 7
+INFO:  kind = 7 FirstLowInvalidHeapAttributeNumber = -7, RELID = (1), notnull_attrs: (b 8 9)
+ a | b | c | d 
+---+---+---+---
+(0 rows)
+
+select DISTINCT * from t2 where a > 1 or c > 1 order by a;
+INFO:  FirstLowInvalidHeapAttributeNumber = -7, BaseRel(1), notnull_attrs = (b 9)
+INFO:  Dump notnull for UpperRel((b)) for kind 5
+INFO:  kind = 5 FirstLowInvalidHeapAttributeNumber = -7, RELID = (1), notnull_attrs: (b 9)
+INFO:  Dump notnull for UpperRel((b)) for kind 6
+INFO:  kind = 6 FirstLowInvalidHeapAttributeNumber = -7, RELID = (1), notnull_attrs: (b 9)
+INFO:  Dump notnull for UpperRel((b)) for kind 7
+INFO:  kind = 7 FirstLowInvalidHeapAttributeNumber = -7, RELID = (1), notnull_attrs: (b 9)
+ a | b | c | d 
+---+---+---+---
+(0 rows)
+
+select * from t1 left join t2 on t1.a = t2.a order by t2.a;
+INFO:  FirstLowInvalidHeapAttributeNumber = -7, BaseRel(1), notnull_attrs = (b 9)
+INFO:  FirstLowInvalidHeapAttributeNumber = -7, BaseRel(2), notnull_attrs = (b 9)
+INFO:  Dump notnull for JoinRel((b 1 2))
+INFO:  FirstLowInvalidHeapAttributeNumber = -7, RELID = (1), notnull_attrs: (b 8 9)
+INFO:  Dump notnull for UpperRel((b)) for kind 6
+INFO:  kind = 6 FirstLowInvalidHeapAttributeNumber = -7, RELID = (1), notnull_attrs: (b 8 9)
+INFO:  Dump notnull for UpperRel((b)) for kind 7
+INFO:  kind = 7 FirstLowInvalidHeapAttributeNumber = -7, RELID = (1), notnull_attrs: (b 8 9)
+ a | b | c | d | a | b | c | d 
+---+---+---+---+---+---+---+---
+(0 rows)
+
+select * from t1 join t2 on t1.a = t2.a order by t2.a;
+INFO:  FirstLowInvalidHeapAttributeNumber = -7, BaseRel(1), notnull_attrs = (b 9)
+INFO:  FirstLowInvalidHeapAttributeNumber = -7, BaseRel(2), notnull_attrs = (b 9)
+INFO:  Dump notnull for JoinRel((b 1 2))
+INFO:  FirstLowInvalidHeapAttributeNumber = -7, RELID = (1), notnull_attrs: (b 8 9)
+INFO:  FirstLowInvalidHeapAttributeNumber = -7, RELID = (2), notnull_attrs: (b 8 9)
+INFO:  Dump notnull for UpperRel((b)) for kind 6
+INFO:  kind = 6 FirstLowInvalidHeapAttributeNumber = -7, RELID = (1), notnull_attrs: (b 8 9)
+INFO:  kind = 6 FirstLowInvalidHeapAttributeNumber = -7, RELID = (2), notnull_attrs: (b 8 9)
+INFO:  Dump notnull for UpperRel((b)) for kind 7
+INFO:  kind = 7 FirstLowInvalidHeapAttributeNumber = -7, RELID = (1), notnull_attrs: (b 8 9)
+INFO:  kind = 7 FirstLowInvalidHeapAttributeNumber = -7, RELID = (2), notnull_attrs: (b 8 9)
+ a | b | c | d | a | b | c | d 
+---+---+---+---+---+---+---+---
+(0 rows)
+
+--  subquery
+select * from
+(select t1.a as t1a, t1.b as t1b, t1.c as t1c,
+ t2.* from t1 join t2 on t1.a = t2.a order by t2.a limit 3) as x
+where t1c > 3;
+INFO:  FirstLowInvalidHeapAttributeNumber = -7, BaseRel(1), notnull_attrs = (b 9)
+INFO:  FirstLowInvalidHeapAttributeNumber = -7, BaseRel(2), notnull_attrs = (b 9)
+INFO:  Dump notnull for JoinRel((b 1 2))
+INFO:  FirstLowInvalidHeapAttributeNumber = -7, RELID = (1), notnull_attrs: (b 8 9)
+INFO:  FirstLowInvalidHeapAttributeNumber = -7, RELID = (2), notnull_attrs: (b 8 9)
+INFO:  Dump notnull for UpperRel((b)) for kind 6
+INFO:  kind = 6 FirstLowInvalidHeapAttributeNumber = -7, RELID = (1), notnull_attrs: (b 8 9)
+INFO:  kind = 6 FirstLowInvalidHeapAttributeNumber = -7, RELID = (2), notnull_attrs: (b 8 9)
+INFO:  Dump notnull for UpperRel((b)) for kind 7
+INFO:  kind = 7 FirstLowInvalidHeapAttributeNumber = -7, RELID = (1), notnull_attrs: (b 8 9)
+INFO:  kind = 7 FirstLowInvalidHeapAttributeNumber = -7, RELID = (2), notnull_attrs: (b 8 9)
+INFO:  FirstLowInvalidHeapAttributeNumber = -7, BaseRel(1), notnull_attrs = (b 8 9 10)
+INFO:  Dump notnull for UpperRel((b)) for kind 7
+INFO:  kind = 7 FirstLowInvalidHeapAttributeNumber = -7, RELID = (1), notnull_attrs: (b 8 9 10)
+ t1a | t1b | t1c | a | b | c | d 
+-----+-----+-----+---+---+---+---
+(0 rows)
+
+-- SetOp RelOptInfo.
+-- simple union all
+select * from t1
+union all
+select * from t2;
+INFO:  FirstLowInvalidHeapAttributeNumber = -7, BaseRel(4), notnull_attrs = (b 9)
+INFO:  FirstLowInvalidHeapAttributeNumber = -7, BaseRel(5), notnull_attrs = (b 9)
+INFO:  FirstLowInvalidHeapAttributeNumber = -7, BaseRel(1), notnull_attrs = (b)
+INFO:  Dump notnull for UpperRel((b)) for kind 7
+ a | b | c | d 
+---+---+---+---
+(0 rows)
+
+select * from t1
+union
+select * from t2;
+INFO:  FirstLowInvalidHeapAttributeNumber = -7, BaseRel(1), notnull_attrs = (b 9)
+INFO:  Dump notnull for UpperRel((b)) for kind 7
+INFO:  kind = 7 FirstLowInvalidHeapAttributeNumber = -7, RELID = (1), notnull_attrs: (b 9)
+INFO:  FirstLowInvalidHeapAttributeNumber = -7, BaseRel(1), notnull_attrs = (b 9)
+INFO:  Dump notnull for UpperRel((b)) for kind 7
+INFO:  kind = 7 FirstLowInvalidHeapAttributeNumber = -7, RELID = (1), notnull_attrs: (b 9)
+ a | b | c | d 
+---+---+---+---
+(0 rows)
+
+drop table t1;
+drop table t2;
+drop table p;
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 103e11483d2..f1493cdb66b 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -135,3 +135,5 @@ test: event_trigger oidjoins
 
 # this test also uses event triggers, so likewise run it by itself
 test: fast_default
+
+test: notnulltest
diff --git a/src/test/regress/sql/notnulltest.sql b/src/test/regress/sql/notnulltest.sql
new file mode 100644
index 00000000000..e6c127b7355
--- /dev/null
+++ b/src/test/regress/sql/notnulltest.sql
@@ -0,0 +1,87 @@
+set geqo to off;
+create table t1(a int, b int not null, c int, d int);
+create table t2(a int, b int not null, c int, d int);
+
+-- single rel
+select * from t1;
+select * from t1 where a > 1;
+select * from t2 where a > 1 or c > 1;
+
+-- partitioned relation.
+-- append rel: all the childrel are not nullable.
+create table p (a int, b int, c int not null) partition by range(a);
+create table p_1 partition of p for values from (0) to (10000) partition by list(b);
+create table p_1_1(b int,  c int not null, a int);
+alter table p_1 attach partition p_1_1 for values in (1);
+create table p_2 partition of p for values from (10001) to (20000);
+
+-- p(1)  - 3(c)
+-- p_1(2) - 3(c)
+-- p_1_1(3) - 2(c)
+select * from p;
+-- p(1)  - 3(c) 1(a)
+-- p_1(2) - 3(c) 1(a)
+-- p_1_1(3) - 2(c) 3(a)
+select * from p where a > 1;
+
+-- test join:
+-- t1: b
+-- t2: b
+-- t{1, 2}:  t1.a, t1.b t2.b t2.c
+select * from t1, t2 where t1.a = t2.c;
+
+-- t1: b
+-- t2: b
+-- t{1, 2}:  none due to full join
+select * from t1 full join t2 on t1.a = t2.a;
+
+-- t1: b
+-- t2: b
+-- t{1, 2}:  t1.b t1.a   (t2.a t2.b is nullable due to outer join)
+select * from t1 left join t2 on t1.a = t2.a;
+
+
+-- test upper rel.
+select * from t1 order by a;
+select * from t1 where a > 1 order by a;
+select * from t2 where a > 1 or c > 1 order by a;
+
+select a, count(*) from t1 group by a;
+select a, count(*) from t1 where a > 1 group by a;
+select a, count(*) from t2 where a > 1 or c > 1 group by a;
+
+select DISTINCT * from t1 order by a;
+select DISTINCT * from t1 where a > 1 order by a;
+select DISTINCT * from t2 where a > 1 or c > 1 order by a;
+
+select * from t1 left join t2 on t1.a = t2.a order by t2.a;
+
+select * from t1 join t2 on t1.a = t2.a order by t2.a;
+
+--  subquery
+
+select * from
+(select t1.a as t1a, t1.b as t1b, t1.c as t1c,
+ t2.* from t1 join t2 on t1.a = t2.a order by t2.a limit 3) as x
+where t1c > 3;
+
+
+-- SetOp RelOptInfo.
+
+
+-- simple union all
+
+select * from t1
+union all
+select * from t2;
+
+select * from t1
+union
+select * from t2;
+
+
+
+
+drop table t1;
+drop table t2;
+drop table p;
-- 
2.21.0

Andy Fan

zhihui.fan1213@gmail.com

over 3 years ago

In reply to: Ashutosh Bapat (#2)

Re: Tracking notnull attributes inside Var

Hi Ashutosh:

Nice to see you again!

On Tue, May 17, 2022 at 8:50 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
wrote:

On Sun, May 15, 2022 at 8:41 AM Andy Fan <zhihui.fan1213@gmail.com> wrote:

The var in RelOptInfo->reltarget should have nullable = 0 but the var in
RelOptInfo->baserestrictinfo should have nullable = 1; The beauty of

this

are: a). It can distinguish the two situations perfectly b). Whenever we

want

to know the nullable attribute of a Var for an expression, it is super

easy to

know. In summary, we need to maintain the nullable attribute at 2

different

places. one is the before the filters are executed(baserestrictinfo,

joininfo,

ec_list at least). one is after the filters are executed

(RelOptInfo.reltarget

only?)

Thanks for identifying this. What you have written makes sense and it
might open a few optimization opportunities. But let me put down some
other thoughts here. You might want to take those into consideration
when designing your solution.

Thanks.

Do we want to just track nullable and non-nullable. May be we want
expand this class to nullable (var may be null), non-nullable (Var is
definitely non-NULL), null (Var will be always NULL).

Currently it doesn't support "Var will be always NULL" . Do you have any
use cases for this? and I can't think of too many cases where we can get
such information except something like "SELECT a FROM t WHERE a
IS NULL".

But the other way to look at this is along the lines of equivalence

classes. Equivalence classes record the expressions which are equal in
the final result of the query. The equivalence class members are not
equal at all the stages of query execution. But because they are
equal in the final result, we can impose that restriction on the lower
levels as well. Can we think of nullable in that fashion? If a Var is
non-nullable in the final result, we can impose that restriction on
the intermediate stages since rows with NULL values for that Var will
be filtered out somewhere. Similarly we could argue for null Var. But
knowledge that a Var is nullable in the final result does not impose a
NULL, non-NULL restriction on the intermediate stages. If we follow
this thought process, we don't need to differentiate Var at different
stages in query.

I agree this is an option. If so we need to track it under the PlannerInfo
struct but it would not be as fine-grained as my previous. Without
intermediate information, We can't know if a UnqiueKey contains multiple
NULLs, this would not be an issue for the "MARK Distinct as no-op" case,
but I'm not sure it is OK for other UniqueKey user cases. So my current
idea
is I still prefer to maintain the intermediate information, unless we are
sure it
costs too much or it is too complex to implement which I don't think so for
now
at least. So if you have time to look at the attached patch, that would be
super
great as well.

--
Best Regards
Andy Fan

Andy Fan

zhihui.fan1213@gmail.com

over 3 years ago

In reply to: Tom Lane (#3)

1 attachment(s)

Re: Tracking notnull attributes inside Var

I thought about the strategy below in the past few days, and think it
is better because it uses less cycles to get the same answer. IIUC, the
related structs should be created during / after deconstruct_jointree rather
than join_search_xx stage.

The schemes I've been toying with tend to look more like putting a
PlaceHolderVar-ish wrapper around the Var or expression that represents
the below-the-join value. The wrapper node could carry any join ID
info that we find necessary.

Just to confirm my understanding, the wrapper node should answer some
questions like this.

/*
* rel_is_nullable_side
*
* For the given join ID joinrelids, return if the relid is in the nullable
* side.
*/
static bool
rel_is_nullable_side(PlannerInfo *root, Relids joinrelids, Index relid)
{
Assert(bms_is_member(relid, joinrelids));
...
}

The thing that I'm kind of stalled on is
how to define this struct so that it's not a big headache for join
strength reduction (which could remove the need for a wrapper altogether)

or outer-join reordering (which makes it a bit harder to define which

join we think is the one nulling the value).

I think about the outer-join reorder case, can we just rely on
SpecialJoinInfo.min_lefthands & min_righthands to get the answer?
The attached patch is based on that. and I did some test in the patch
as well, looks the answer is correct.

What's more, if the above is correct and the calls of rel_is_nullable_side
is small, do we still need think about more effective data struct?

Thanks!

--
Best Regards
Andy Fan

Attachments:

not_null_based_on_special_joininfo.diffapplication/octet-stream; name=not_null_based_on_special_joininfo.diffDownload

diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 9da3ff2f9ab..96e95a04ea9 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -52,7 +52,7 @@ static void compute_partition_bounds(PlannerInfo *root, RelOptInfo *rel1,
 static void get_matching_part_pairs(PlannerInfo *root, RelOptInfo *joinrel,
 									RelOptInfo *rel1, RelOptInfo *rel2,
 									List **parts1, List **parts2);
-
+static bool rel_is_nullable_side(PlannerInfo *root, Relids joinrelids, Index relid);
 
 /*
  * join_search_one_level
@@ -746,6 +746,25 @@ make_join_rel(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2)
 	joinrel = build_join_rel(root, joinrelids, rel1, rel2, sjinfo,
 							 &restrictlist);
 
+	/* test code. */
+
+	if (sjinfo->jointype == JOIN_LEFT)
+	{
+		int relid = -1;
+		while((relid = bms_next_member(rel2->relids, relid)) >= 0)
+		{
+			Assert(rel_is_nullable_side(root, joinrelids, relid));
+		}
+	}
+	else if (sjinfo->jointype == JOIN_FULL)
+	{
+		int relid = -1;
+		while((relid = bms_next_member(joinrelids, relid)) >= 0)
+		{
+			Assert(rel_is_nullable_side(root, joinrelids, relid));
+		}
+	}
+
 	/*
 	 * If we've already proven this join is empty, we needn't consider any
 	 * more paths for it.
@@ -1781,3 +1800,69 @@ get_matching_part_pairs(PlannerInfo *root, RelOptInfo *joinrel,
 		*parts2 = lappend(*parts2, child_rel2);
 	}
 }
+
+
+/*
+ * rel_in_nullable_side
+ *
+ *	After the SpecialJoinInfo has been applied, return if relid
+ * is in the nullable side.
+ */
+static bool
+rel_in_nullable_side(SpecialJoinInfo *sjinfo, Index relid)
+{
+	if (sjinfo->jointype == JOIN_FULL)
+	{
+		if (bms_is_member(relid, sjinfo->min_lefthand) ||
+			bms_is_member(relid, sjinfo->min_righthand))
+			return true;
+	}
+	else if (sjinfo->jointype == JOIN_LEFT)
+	{
+		if (bms_is_member(relid, sjinfo->min_righthand))
+			return true;
+	}
+
+	return false;
+}
+
+
+/*
+ * rel_is_nullable_side
+ *
+ *	For the given join ID joinrelids, return if the relid is in the nullable
+ * side.
+ */
+static bool
+rel_is_nullable_side(PlannerInfo *root, Relids joinrelids, Index relid)
+{
+	ListCell *lc;
+	Assert(bms_is_member(relid, joinrelids));
+
+	/* Find the SepcialJoinInfo which is possible nullable the relid. */
+	foreach(lc, root->join_info_list)
+	{
+		SpecialJoinInfo *sjinfo = (SpecialJoinInfo *) lfirst(lc);
+
+		/*
+		 * If the sjinfo is possible takes effects on joinrelis, then it must
+		 * contain both sides of min relids.
+		 */
+		Relids min_relids = bms_union(sjinfo->min_lefthand, sjinfo->min_righthand);
+
+		if (!bms_is_subset(min_relids, joinrelids))
+		{
+			bms_free(min_relids);
+			continue;
+		}
+
+		if (rel_in_nullable_side(sjinfo, relid))
+		{
+			bms_free(min_relids);
+			return true;
+		}
+		bms_free(min_relids);
+	}
+
+	return false;
+}