Reduce "Var IS [NOT] NULL" quals during constant folding

Started by Richard Guo, 10 months ago, 56 messages

guofenglinux@gmail.com

10 months ago

1 attachment(s)

Currently, we have an optimization that reduces an IS [NOT] NULL qual
on a NOT NULL column to constant true or constant false, provided we
can prove that the input expression of the NullTest is not nullable by
any outer joins. This deduction happens pretty late in the planner,
during the distribution of quals to relations in query_planner(). As
mentioned in [1], doing it at such a late stage has some drawbacks.

Ideally, this deduction should happen during constant folding.
However, we don't have the per-relation information about which
columns are defined as NOT NULL available at that point. That
information is collected from catalogs when building RelOptInfos for
base or other relations.

I'm wondering whether we can collect that information while building
the RangeTblEntry for a base or other relation, so that it's available
before constant folding. This could also enable other optimizations,
such as checking if a NOT IN subquery's output columns and its
left-hand expressions are all certainly not NULL, in which case we can
convert it to an anti-join.

Attached is a draft patch to reduce NullTest on a NOT NULL column in
eval_const_expressions.
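
As a rough illustration of the reduction, here is a standalone toy
model. It is not code from the patch: ToyVar, toy_var_is_nonnullable
and toy_fold_nulltest are made-up names, a 32-bit mask stands in for
the Bitmapset, and the inheritance-parent check is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-ins for planner structures; all names are hypothetical. */
typedef enum { TOY_IS_NULL, TOY_IS_NOT_NULL } ToyNullTestType;
typedef enum { TOY_FOLD_FALSE, TOY_FOLD_TRUE, TOY_KEEP } ToyFoldResult;

typedef struct ToyVar
{
	int			attno;			/* > 0 user column, < 0 system column */
	uint32_t	nullingrels;	/* bitmask of joins that can null the Var */
} ToyVar;

/* notnull_mask: bit N set means column N is declared NOT NULL */
static bool
toy_var_is_nonnullable(const ToyVar *var, uint32_t notnull_mask)
{
	/* could the Var be nulled by an outer join? */
	if (var->nullingrels != 0)
		return false;

	/* system columns cannot be NULL */
	if (var->attno < 0)
		return true;

	return var->attno > 0 && var->attno < 32 &&
		(notnull_mask & (UINT32_C(1) << var->attno)) != 0;
}

/* Fold "Var IS [NOT] NULL" to a constant when that is provably safe. */
static ToyFoldResult
toy_fold_nulltest(const ToyVar *var, ToyNullTestType type,
				  uint32_t notnull_mask)
{
	if (!toy_var_is_nonnullable(var, notnull_mask))
		return TOY_KEEP;		/* leave the NullTest in place */

	return (type == TOY_IS_NOT_NULL) ? TOY_FOLD_TRUE : TOY_FOLD_FALSE;
}
```

In this model, a Var for a NOT NULL column with an empty nullingrels
mask folds IS NULL to false and IS NOT NULL to true, while any Var with
a nonzero nullingrels mask keeps its NullTest.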

Initially, I planned to get rid of restriction_is_always_true and
restriction_is_always_false altogether, since we now perform the
reduction of "Var IS [NOT] NULL" quals in eval_const_expressions.
However, removing them would prevent us from reducing some IS [NOT]
NULL quals that we were previously able to reduce, because (a) the
self-join elimination may introduce new IS NOT NULL quals after the
constant folding, and (b) if some outer joins are converted to inner
joins, previously irreducible NullTest quals may become reducible.

So I think maybe we'd better keep restriction_is_always_true and
restriction_is_always_false as-is.
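
Case (b) above can be sketched with a small standalone model. This is
only an illustration under simplifying assumptions: ToyVar,
toy_var_is_nonnullable and toy_reduce_outer_join are hypothetical
names, and bitmasks replace the planner's relid sets.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy Var: a column number plus the joins that can null it. */
typedef struct ToyVar
{
	int			attno;
	uint32_t	nullingrels;
} ToyVar;

/* Same test at both stages: no nulling joins and a NOT NULL column. */
static bool
toy_var_is_nonnullable(const ToyVar *var, uint32_t notnull_mask)
{
	assert(var->attno > 0 && var->attno < 32);

	if (var->nullingrels != 0)
		return false;

	return (notnull_mask & (UINT32_C(1) << var->attno)) != 0;
}

/*
 * Model of outer-join reduction: join "ojrelid" has become an inner
 * join, so it can no longer null the Var.
 */
static void
toy_reduce_outer_join(ToyVar *var, int ojrelid)
{
	assert(ojrelid >= 0 && ojrelid < 32);
	var->nullingrels &= ~(UINT32_C(1) << ojrelid);
}
```

A Var that is nulled by join 3 fails the check when constants are
folded. Once toy_reduce_outer_join() clears that bit, the same check
succeeds, which is why a later pass still has work to do.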

Any thoughts?

[1]: /messages/by-id/2323997.1740623184@sss.pgh.pa.us

Thanks
Richard

Attachments:

v1-0001-Reduce-Var-IS-NOT-NULL-quals-during-constant-folding.patch (application/octet-stream)

From ab04f6d91498c85dba26ea7bd2b84526e22ba99b Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Wed, 19 Mar 2025 16:16:12 +0900
Subject: [PATCH v1] Reduce "Var IS [NOT] NULL" quals during constant folding

---
 src/backend/optimizer/plan/initsplan.c        | 26 +------
 src/backend/optimizer/util/clauses.c          | 69 ++++++++++++++++++-
 src/backend/optimizer/util/inherit.c          | 29 +++++---
 src/backend/optimizer/util/plancat.c          | 30 --------
 src/backend/optimizer/util/relnode.c          |  3 -
 src/backend/parser/parse_relation.c           | 30 ++++++++
 src/include/nodes/parsenodes.h                |  5 ++
 src/include/nodes/pathnodes.h                 |  6 --
 src/include/optimizer/optimizer.h             |  2 +
 src/include/parser/parse_relation.h           |  1 +
 .../regress/expected/generated_virtual.out    |  6 +-
 src/test/regress/expected/join.out            |  6 +-
 src/test/regress/expected/predicate.out       |  6 +-
 13 files changed, 136 insertions(+), 83 deletions(-)

diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index 1d1aa27d450..fc6f4f2cef4 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -545,7 +545,7 @@ remove_useless_groupby_columns(PlannerInfo *root)
 				 */
 				if (!index->nullsnotdistinct &&
 					!bms_is_member(index->indexkeys[i],
-								   rel->notnullattnums))
+								   rte->notnullattnums))
 				{
 					nulls_check_ok = false;
 					break;
@@ -3048,36 +3048,16 @@ add_base_clause_to_rel(PlannerInfo *root, Index relid,
  * expr_is_nonnullable
  *	  Check to see if the Expr cannot be NULL
  *
- * If the Expr is a simple Var that is defined NOT NULL and meanwhile is not
- * nulled by any outer joins, then we can know that it cannot be NULL.
+ * Currently we only support simple Vars.
  */
 static bool
 expr_is_nonnullable(PlannerInfo *root, Expr *expr)
 {
-	RelOptInfo *rel;
-	Var		   *var;
-
 	/* For now only check simple Vars */
 	if (!IsA(expr, Var))
 		return false;
 
-	var = (Var *) expr;
-
-	/* could the Var be nulled by any outer joins? */
-	if (!bms_is_empty(var->varnullingrels))
-		return false;
-
-	/* system columns cannot be NULL */
-	if (var->varattno < 0)
-		return true;
-
-	/* is the column defined NOT NULL? */
-	rel = find_base_rel(root, var->varno);
-	if (var->varattno > 0 &&
-		bms_is_member(var->varattno, rel->notnullattnums))
-		return true;
-
-	return false;
+	return var_is_nonnullable(root, (Var *) expr);
 }
 
 /*
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index 43dfecfb47f..196407d423b 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -20,6 +20,7 @@
 #include "postgres.h"
 
 #include "access/htup_details.h"
+#include "catalog/pg_class.h"
 #include "catalog/pg_language.h"
 #include "catalog/pg_operator.h"
 #include "catalog/pg_proc.h"
@@ -41,6 +42,7 @@
 #include "parser/analyze.h"
 #include "parser/parse_coerce.h"
 #include "parser/parse_func.h"
+#include "parser/parsetree.h"
 #include "rewrite/rewriteHandler.h"
 #include "rewrite/rewriteManip.h"
 #include "tcop/tcopprot.h"
@@ -2240,7 +2242,8 @@ rowtype_field_matches(Oid rowtypeid, int fieldnum,
  * only operators and functions that are reasonable to try to execute.
  *
  * NOTE: "root" can be passed as NULL if the caller never wants to do any
- * Param substitutions nor receive info about inlined functions.
+ * Param substitutions nor receive info about inlined functions nor reduce
+ * NullTest for Vars to constant true or constant false.
  *
  * NOTE: the planner assumes that this will always flatten nested AND and
  * OR clauses into N-argument form.  See comments in prepqual.c.
@@ -3535,6 +3538,31 @@ eval_const_expressions_mutator(Node *node,
 
 					return makeBoolConst(result, false);
 				}
+				if (!ntest->argisrow && arg && IsA(arg, Var) && context->root)
+				{
+					Var		   *varg = (Var *) arg;
+					bool		result;
+
+					if (var_is_nonnullable(context->root, varg))
+					{
+						switch (ntest->nulltesttype)
+						{
+							case IS_NULL:
+								result = false;
+								break;
+							case IS_NOT_NULL:
+								result = true;
+								break;
+							default:
+								elog(ERROR, "unrecognized nulltesttype: %d",
+									 (int) ntest->nulltesttype);
+								result = false; /* keep compiler quiet */
+								break;
+						}
+
+						return makeBoolConst(result, false);
+					}
+				}
 
 				newntest = makeNode(NullTest);
 				newntest->arg = (Expr *) arg;
@@ -4153,6 +4181,45 @@ simplify_function(Oid funcid, Oid result_type, int32 result_typmod,
 	return newexpr;
 }
 
+/*
+ * var_is_nonnullable
+ *	  Check to see if the Var cannot be NULL
+ *
+ * If the Var is defined NOT NULL and meanwhile is not nulled by any outer
+ * joins or grouping sets, then we can know that it cannot be NULL.
+ */
+bool
+var_is_nonnullable(PlannerInfo *root, Var *var)
+{
+	RangeTblEntry *rte;
+
+	Assert(IsA(var, Var));
+
+	/* could the Var be nulled by any outer joins or grouping sets? */
+	if (!bms_is_empty(var->varnullingrels))
+		return false;
+
+	/* system columns cannot be NULL */
+	if (var->varattno < 0)
+		return true;
+
+	/*
+	 * Check if the Var is defined as NOT NULL.  We must skip inheritance
+	 * parent tables, as some child tables may have a NOT NULL constraint for
+	 * a column while others may not.  This cannot happen with partitioned
+	 * tables, though.
+	 */
+	rte = planner_rt_fetch(var->varno, root);
+	if (rte->inh && rte->relkind != RELKIND_PARTITIONED_TABLE)
+		return false;
+
+	if (var->varattno > 0 &&
+		bms_is_member(var->varattno, rte->notnullattnums))
+		return true;
+
+	return false;
+}
+
 /*
  * expand_function_arguments: convert named-notation args to positional args
  * and/or insert default args, as needed
diff --git a/src/backend/optimizer/util/inherit.c b/src/backend/optimizer/util/inherit.c
index 17e51cd75d7..dda72d8adf8 100644
--- a/src/backend/optimizer/util/inherit.c
+++ b/src/backend/optimizer/util/inherit.c
@@ -466,8 +466,7 @@ expand_single_inheritance_child(PlannerInfo *root, RangeTblEntry *parentrte,
 								Index *childRTindex_p)
 {
 	Query	   *parse = root->parse;
-	Oid			parentOID PG_USED_FOR_ASSERTS_ONLY =
-		RelationGetRelid(parentrel);
+	Oid			parentOID = RelationGetRelid(parentrel);
 	Oid			childOID = RelationGetRelid(childrel);
 	RangeTblEntry *childrte;
 	Index		childRTindex;
@@ -479,15 +478,16 @@ expand_single_inheritance_child(PlannerInfo *root, RangeTblEntry *parentrte,
 	/*
 	 * Build an RTE for the child, and attach to query's rangetable list. We
 	 * copy most scalar fields of the parent's RTE, but replace relation OID,
-	 * relkind, and inh for the child.  Set the child's securityQuals to
-	 * empty, because we only want to apply the parent's RLS conditions
-	 * regardless of what RLS properties individual children may have. (This
-	 * is an intentional choice to make inherited RLS work like regular
-	 * permissions checks.) The parent securityQuals will be propagated to
-	 * children along with other base restriction clauses, so we don't need to
-	 * do it here.  Other infrastructure of the parent RTE has to be
-	 * translated to match the child table's column ordering, which we do
-	 * below, so a "flat" copy is sufficient to start with.
+	 * relkind, and inh for the child.  We also replace notnullattnums for the
+	 * child if its relation OID is different from the parent's.  Set the
+	 * child's securityQuals to empty, because we only want to apply the
+	 * parent's RLS conditions regardless of what RLS properties individual
+	 * children may have. (This is an intentional choice to make inherited RLS
+	 * work like regular permissions checks.) The parent securityQuals will be
+	 * propagated to children along with other base restriction clauses, so we
+	 * don't need to do it here.  Other infrastructure of the parent RTE has
+	 * to be translated to match the child table's column ordering, which we
+	 * do below, so a "flat" copy is sufficient to start with.
 	 */
 	childrte = makeNode(RangeTblEntry);
 	memcpy(childrte, parentrte, sizeof(RangeTblEntry));
@@ -507,6 +507,13 @@ expand_single_inheritance_child(PlannerInfo *root, RangeTblEntry *parentrte,
 	/* No permission checking for child RTEs. */
 	childrte->perminfoindex = 0;
 
+	/* Record NOT NULL columns for the child if needed. */
+	if (childOID != parentOID)
+	{
+		childrte->notnullattnums = NULL;
+		getRelationAttrs(childrel, childrte);
+	}
+
 	/* Link not-yet-fully-filled child RTE into data structures */
 	parse->rtable = lappend(parse->rtable, childrte);
 	childRTindex = list_length(parse->rtable);
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 0489ad36644..4a44ff18a29 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -162,36 +162,6 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 	rel->attr_widths = (int32 *)
 		palloc0((rel->max_attr - rel->min_attr + 1) * sizeof(int32));
 
-	/*
-	 * Record which columns are defined as NOT NULL.  We leave this
-	 * unpopulated for non-partitioned inheritance parent relations as it's
-	 * ambiguous as to what it means.  Some child tables may have a NOT NULL
-	 * constraint for a column while others may not.  We could work harder and
-	 * build a unioned set of all child relations notnullattnums, but there's
-	 * currently no need.  The RelOptInfo corresponding to the !inh
-	 * RangeTblEntry does get populated.
-	 */
-	if (!inhparent || relation->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
-	{
-		for (int i = 0; i < relation->rd_att->natts; i++)
-		{
-			CompactAttribute *attr = TupleDescCompactAttr(relation->rd_att, i);
-
-			if (attr->attnotnull)
-			{
-				rel->notnullattnums = bms_add_member(rel->notnullattnums,
-													 i + 1);
-
-				/*
-				 * Per RemoveAttributeById(), dropped columns will have their
-				 * attnotnull unset, so we needn't check for dropped columns
-				 * in the above condition.
-				 */
-				Assert(!attr->attisdropped);
-			}
-		}
-	}
-
 	/*
 	 * Estimate relation size --- unless it's an inheritance parent, in which
 	 * case the size we want is not the rel's own size but the size of its
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index ff507331a06..6db5b1273ab 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -222,7 +222,6 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
 	rel->relid = relid;
 	rel->rtekind = rte->rtekind;
 	/* min_attr, max_attr, attr_needed, attr_widths are set below */
-	rel->notnullattnums = NULL;
 	rel->lateral_vars = NIL;
 	rel->indexlist = NIL;
 	rel->statlist = NIL;
@@ -727,7 +726,6 @@ build_join_rel(PlannerInfo *root,
 	joinrel->max_attr = 0;
 	joinrel->attr_needed = NULL;
 	joinrel->attr_widths = NULL;
-	joinrel->notnullattnums = NULL;
 	joinrel->nulling_relids = NULL;
 	joinrel->lateral_vars = NIL;
 	joinrel->lateral_referencers = NULL;
@@ -916,7 +914,6 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
 	joinrel->max_attr = 0;
 	joinrel->attr_needed = NULL;
 	joinrel->attr_widths = NULL;
-	joinrel->notnullattnums = NULL;
 	joinrel->nulling_relids = NULL;
 	joinrel->lateral_vars = NIL;
 	joinrel->lateral_referencers = NULL;
diff --git a/src/backend/parser/parse_relation.c b/src/backend/parser/parse_relation.c
index 04ecf64b1fc..7f719e8e398 100644
--- a/src/backend/parser/parse_relation.c
+++ b/src/backend/parser/parse_relation.c
@@ -1520,6 +1520,7 @@ addRangeTableEntry(ParseState *pstate,
 	rte->inh = inh;
 	rte->relkind = rel->rd_rel->relkind;
 	rte->rellockmode = lockmode;
+	getRelationAttrs(rel, rte);
 
 	/*
 	 * Build the list of effective column names using user-supplied aliases
@@ -1605,6 +1606,7 @@ addRangeTableEntryForRelation(ParseState *pstate,
 	rte->inh = inh;
 	rte->relkind = rel->rd_rel->relkind;
 	rte->rellockmode = lockmode;
+	getRelationAttrs(rel, rte);
 
 	/*
 	 * Build the list of effective column names using user-supplied aliases
@@ -4022,3 +4024,31 @@ getRTEPermissionInfo(List *rteperminfos, RangeTblEntry *rte)
 
 	return perminfo;
 }
+
+/*
+ * getRelationAttrs
+ *		Record which columns of the given relation are defined as NOT NULL.
+ */
+void
+getRelationAttrs(Relation relation, RangeTblEntry *rte)
+{
+	Assert(rte->rtekind == RTE_RELATION);
+
+	for (int i = 0; i < relation->rd_att->natts; i++)
+	{
+		CompactAttribute *attr = TupleDescCompactAttr(relation->rd_att, i);
+
+		if (attr->attnotnull)
+		{
+			rte->notnullattnums = bms_add_member(rte->notnullattnums,
+												 i + 1);
+
+			/*
+			 * Per RemoveAttributeById(), dropped columns will have their
+			 * attnotnull unset, so we needn't check for dropped columns in
+			 * the above condition.
+			 */
+			Assert(!attr->attisdropped);
+		}
+	}
+}
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 23c9e3c5abf..0ba80b792ef 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -1080,6 +1080,9 @@ typedef struct RangeTblEntry
 	 * this RTE in the containing struct's list of same; 0 if permissions need
 	 * not be checked for this RTE.
 	 *
+	 * notnullattnums is zero-based set containing attnums of NOT NULL
+	 * columns.
+	 *
 	 * As a special case, relid, relkind, rellockmode, and perminfoindex can
 	 * also be set (nonzero) in an RTE_SUBQUERY RTE.  This occurs when we
 	 * convert an RTE_RELATION RTE naming a view into an RTE_SUBQUERY
@@ -1105,6 +1108,8 @@ typedef struct RangeTblEntry
 	Index		perminfoindex pg_node_attr(query_jumble_ignore);
 	/* sampling info, or NULL */
 	struct TableSampleClause *tablesample;
+	/* columns defined as NOT NULL */
+	Bitmapset  *notnullattnums;
 
 	/*
 	 * Fields valid for a subquery RTE (else NULL):
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index c24a1fc8514..11334d5bc1b 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -955,12 +955,6 @@ typedef struct RelOptInfo
 	Relids	   *attr_needed pg_node_attr(read_write_ignore);
 	/* array indexed [min_attr .. max_attr] */
 	int32	   *attr_widths pg_node_attr(read_write_ignore);
-
-	/*
-	 * Zero-based set containing attnums of NOT NULL columns.  Not populated
-	 * for rels corresponding to non-partitioned inh==true RTEs.
-	 */
-	Bitmapset  *notnullattnums;
 	/* relids of outer joins that can null this baserel */
 	Relids		nulling_relids;
 	/* LATERAL Vars and PHVs referenced by rel */
diff --git a/src/include/optimizer/optimizer.h b/src/include/optimizer/optimizer.h
index 78e05d88c8e..748556c9163 100644
--- a/src/include/optimizer/optimizer.h
+++ b/src/include/optimizer/optimizer.h
@@ -154,6 +154,8 @@ extern Node *estimate_expression_value(PlannerInfo *root, Node *node);
 extern Expr *evaluate_expr(Expr *expr, Oid result_type, int32 result_typmod,
 						   Oid result_collation);
 
+extern bool var_is_nonnullable(PlannerInfo *root, Var *var);
+
 extern List *expand_function_arguments(List *args, bool include_out_arguments,
 									   Oid result_type,
 									   struct HeapTupleData *func_tuple);
diff --git a/src/include/parser/parse_relation.h b/src/include/parser/parse_relation.h
index d59599cf242..b7a55b92ef6 100644
--- a/src/include/parser/parse_relation.h
+++ b/src/include/parser/parse_relation.h
@@ -106,6 +106,7 @@ extern RTEPermissionInfo *addRTEPermissionInfo(List **rteperminfos,
 											   RangeTblEntry *rte);
 extern RTEPermissionInfo *getRTEPermissionInfo(List *rteperminfos,
 											   RangeTblEntry *rte);
+extern void getRelationAttrs(Relation relation, RangeTblEntry *rte);
 extern bool isLockedRefname(ParseState *pstate, const char *refname);
 extern void addNSItemToQuery(ParseState *pstate, ParseNamespaceItem *nsitem,
 							 bool addToJoinList,
diff --git a/src/test/regress/expected/generated_virtual.out b/src/test/regress/expected/generated_virtual.out
index dc09c85938e..a93f25a417e 100644
--- a/src/test/regress/expected/generated_virtual.out
+++ b/src/test/regress/expected/generated_virtual.out
@@ -1472,11 +1472,11 @@ where coalesce(t2.b, 1) = 2;
 explain (costs off)
 select t1.a from gtest32 t1 left join gtest32 t2 on t1.a = t2.a
 where coalesce(t2.b, 1) = 2 or t1.a is null;
-                         QUERY PLAN                          
--------------------------------------------------------------
+               QUERY PLAN                
+-----------------------------------------
  Hash Left Join
    Hash Cond: (t1.a = t2.a)
-   Filter: ((COALESCE((t2.a * 2), 1) = 2) OR (t1.a IS NULL))
+   Filter: (COALESCE((t2.a * 2), 1) = 2)
    ->  Seq Scan on gtest32 t1
    ->  Hash
          ->  Seq Scan on gtest32 t2
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index a57bb18c24f..69877e310f6 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -3639,8 +3639,8 @@ from nt3 as nt3
     ) as ss2
     on ss2.id = nt3.nt2_id
 where nt3.id = 1 and ss2.b3;
-                  QUERY PLAN                   
------------------------------------------------
+                  QUERY PLAN                  
+----------------------------------------------
  Nested Loop
    ->  Nested Loop
          ->  Index Scan using nt3_pkey on nt3
@@ -3649,7 +3649,7 @@ where nt3.id = 1 and ss2.b3;
                Index Cond: (id = nt3.nt2_id)
    ->  Index Only Scan using nt1_pkey on nt1
          Index Cond: (id = nt2.nt1_id)
-         Filter: (nt2.b1 AND (id IS NOT NULL))
+         Filter: (nt2.b1 AND true)
 (9 rows)
 
 select nt3.id
diff --git a/src/test/regress/expected/predicate.out b/src/test/regress/expected/predicate.out
index b79037748b7..e025c05261d 100644
--- a/src/test/regress/expected/predicate.out
+++ b/src/test/regress/expected/predicate.out
@@ -84,10 +84,10 @@ SELECT * FROM pred_tab t WHERE t.a IS NULL OR t.c IS NULL;
 -- are provably false
 EXPLAIN (COSTS OFF)
 SELECT * FROM pred_tab t WHERE t.b IS NULL OR t.c IS NULL;
-               QUERY PLAN               
-----------------------------------------
+       QUERY PLAN       
+------------------------
  Seq Scan on pred_tab t
-   Filter: ((b IS NULL) OR (c IS NULL))
+   Filter: (b IS NULL)
 (2 rows)
 
 --
-- 
2.43.0

Richard Guo

guofenglinux@gmail.com

10 months ago

In reply to: Richard Guo (#1)

Re: Reduce "Var IS [NOT] NULL" quals during constant folding

On Fri, Mar 21, 2025 at 6:14 PM Richard Guo <guofenglinux@gmail.com> wrote:

I'm wondering whether we can collect that information while building
the RangeTblEntry for a base or other relation, so that it's available
before constant folding. This could also enable other optimizations,
such as checking if a NOT IN subquery's output columns and its
left-hand expressions are all certainly not NULL, in which case we can
convert it to an anti-join.

Attached is a draft patch to reduce NullTest on a NOT NULL column in
eval_const_expressions.

FWIW, reducing "Var IS [NOT] NULL" quals during constant folding can
somewhat influence the decision on join ordering later. For instance,

create table t (a int not null, b int);

select * from t t1 left join
(t t2 left join t t3 on t2.a is not null)
on t1.b = t2.b;

For this query, "t2.a is not null" is reduced to true during constant
folding and then ignored, which leaves us unable to commute the
t1/t2 join with the t2/t3 join.

OTOH, constant-folding NullTest for Vars may enable join orders that
were previously impossible. For instance,

select * from t t1 left join
(t t2 left join t t3 on t2.a is null or t2.b = t3.b)
on t1.b = t2.b;

Previously, the t2/t3 join's clause was not strict for t2 due to the IS
NULL qual, which prevented the t2/t3 join from commuting with the t1/t2
join. Now, the IS NULL qual is removed during constant folding, allowing
us to generate a plan with the join order (t1/t2)/t3.

Not quite sure if this is something we need to worry about.

Thanks
Richard

Robert Haas

robertmhaas@gmail.com

10 months ago

In reply to: Richard Guo (#2)

Re: Reduce "Var IS [NOT] NULL" quals during constant folding

On Fri, Mar 21, 2025 at 10:21 AM Richard Guo <guofenglinux@gmail.com> wrote:

Not quite sure if this is something we need to worry about.

I haven't really dug into this but I bet it's not that serious, in the
sense that we could probably work around it with more logic if we
really need to.

However, I'm a bit concerned about the overall premise of the patch
set. It feels like it is moving something that really ought to happen
at optimization time back to parse time. I have a feeling that's going
to break something, although I am not sure right now exactly what.
Wouldn't it be better to have this still happen in the planner, but
sooner than it does now?

--
Robert Haas
EDB: http://www.enterprisedb.com

Tom Lane

tgl@sss.pgh.pa.us

10 months ago

In reply to: Robert Haas (#3)

Re: Reduce "Var IS [NOT] NULL" quals during constant folding

Robert Haas <robertmhaas@gmail.com> writes:

However, I'm a bit concerned about the overall premise of the patch
set. It feels like it is moving something that really ought to happen
at optimization time back to parse time. I have a feeling that's going
to break something, although I am not sure right now exactly what.

Ugh, no, that is *completely* unworkable. Suppose that the user
does CREATE VIEW, and the parse tree recorded for that claims that
column X is not-nullable. Then the user drops the not-null
constraint, and then asks to execute the view. We'll optimize on
the basis of stale information.

The way to make this work is what I said before: move the planner's
collection of relation information to somewhere a bit earlier in
the planner. But not to outside the planner.
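
The hazard can be modelled in a few lines of standalone C. This is a
toy sketch, not PostgreSQL code: ToyCatalog, ToyStoredView and the
toy_* functions are invented for illustration, and a bitmask stands in
for the per-column attnotnull flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model; none of these names exist in PostgreSQL. */
typedef struct ToyCatalog
{
	uint32_t	notnull_mask;	/* bit N set: column N is NOT NULL now */
} ToyCatalog;

typedef struct ToyStoredView
{
	uint32_t	notnull_snapshot;	/* copied at CREATE VIEW (parse) time */
} ToyStoredView;

/* Parse time: the stored tree captures the catalog state of that moment. */
static ToyStoredView
toy_create_view(const ToyCatalog *cat)
{
	ToyStoredView view = { cat->notnull_mask };

	return view;
}

/* ALTER TABLE ... DROP NOT NULL, applied after the view was stored. */
static void
toy_drop_not_null(ToyCatalog *cat, int attno)
{
	assert(attno > 0 && attno < 32);
	cat->notnull_mask &= ~(UINT32_C(1) << attno);
}

/* Unsafe: trusts what was recorded when the view was parsed. */
static bool
toy_can_fold_from_snapshot(const ToyStoredView *view, int attno)
{
	return (view->notnull_snapshot & (UINT32_C(1) << attno)) != 0;
}

/* Safe: consults the catalog again each time the query is planned. */
static bool
toy_can_fold_at_plan_time(const ToyCatalog *cat, int attno)
{
	return (cat->notnull_mask & (UINT32_C(1) << attno)) != 0;
}
```

After toy_create_view() followed by toy_drop_not_null(), the snapshot
still claims the column is NOT NULL while the plan-time lookup does
not, so only the latter is safe to fold on.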

regards, tom lane

David G. Johnston

david.g.johnston@gmail.com

10 months ago

In reply to: Tom Lane (#4)

Re: Reduce "Var IS [NOT] NULL" quals during constant folding

On Fri, Mar 21, 2025 at 10:21 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

However, I'm a bit concerned about the overall premise of the patch
set. It feels like it is moving something that really ought to happen
at optimization time back to parse time. I have a feeling that's going
to break something, although I am not sure right now exactly what.

Ugh, no, that is *completely* unworkable. Suppose that the user
does CREATE VIEW, and the parse tree recorded for that claims that
column X is not-nullable. Then the user drops the not-null
constraint, and then asks to execute the view. We'll optimize on
the basis of stale information.

The way to make this work is what I said before: move the planner's
collection of relation information to somewhere a bit earlier in
the planner. But not to outside the planner.

Reading this reminded me of the existing issue in [1] where we've broken
session isolation of temporary relation data. There it feels like we are
making decisions in the parser that really belong in the planner since
catalog data is needed to determine relpersistence in many cases. If we
are looking for a spot "earlier in the planner" to attach relation
information, figuring out how to use that to improve matters related to
relpersistence seems warranted.

David J.

[1]: /messages/by-id/4xxau766dofbwugeyvjftra3g5f7ifaal2clgrbpr7jqotr4av@d3ige2krpoza

Richard Guo

guofenglinux@gmail.com

10 months ago

In reply to: Robert Haas (#3)

Re: Reduce "Var IS [NOT] NULL" quals during constant folding

On Sat, Mar 22, 2025 at 1:12 AM Robert Haas <robertmhaas@gmail.com> wrote:

However, I'm a bit concerned about the overall premise of the patch
set. It feels like it is moving something that really ought to happen
at optimization time back to parse time. I have a feeling that's going
to break something, although I am not sure right now exactly what.
Wouldn't it be better to have this still happen in the planner, but
sooner than it does now?

You're right. It's just flat wrong to collect catalog information in
the parser and use it in the planner. As Tom pointed out, the catalog
information could change in between, which would cause us to use stale
data.

Yeah, this should still happen in the planner, perhaps before
pull_up_sublinks, if we plan to leverage that info to convert NOT IN
to anti-join.

Thanks
Richard

Richard Guo

guofenglinux@gmail.com

10 months ago

In reply to: Tom Lane (#4)

1 attachment(s)

Re: Reduce "Var IS [NOT] NULL" quals during constant folding

On Sat, Mar 22, 2025 at 2:21 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Ugh, no, that is *completely* unworkable. Suppose that the user
does CREATE VIEW, and the parse tree recorded for that claims that
column X is not-nullable. Then the user drops the not-null
constraint, and then asks to execute the view. We'll optimize on
the basis of stale information.

Thanks for pointing this out.

The way to make this work is what I said before: move the planner's
collection of relation information to somewhere a bit earlier in
the planner. But not to outside the planner.

I'm considering moving the collection of attnotnull information before
pull_up_sublinks, in hopes of leveraging this info to pull up NOT IN
in the future, something like attached.

Maybe we could also collect the attgenerated information in the same
routine, making life easier for expand_virtual_generated_columns.

Another issue I found is that in convert_EXISTS_to_ANY, we pass the
parent's root to eval_const_expressions, which can cause problems when
reducing "Var IS [NOT] NULL" quals. To fix this, the v2 patch constructs
a dummy PlannerInfo with its "glob" link set to the parent's and its
"parse" link set to the subquery. I believe these are the only fields used.
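
To show why the parent's root is the wrong thing to pass, here is a
standalone toy model. It is an assumption-laden sketch rather than the
patch's code: ToyRoot, ToyQuery, ToyRTE, ToyGlobal and the toy_*
functions are hypothetical, and the only point is that a varno is an
index into the range table of whichever query the root points at.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for RangeTblEntry, Query and PlannerInfo. */
typedef struct ToyRTE
{
	uint32_t	notnull_mask;	/* bit N set: column N is NOT NULL */
} ToyRTE;

typedef struct ToyQuery
{
	const ToyRTE *rtable;		/* range table, indexed by varno - 1 */
	int			nrtes;
} ToyQuery;

typedef struct ToyGlobal
{
	int			unused;			/* placeholder for shared planner state */
} ToyGlobal;

typedef struct ToyRoot
{
	ToyGlobal  *glob;
	const ToyQuery *parse;
} ToyRoot;

/* Resolve the Var against the range table of the root's own query. */
static bool
toy_column_is_notnull(const ToyRoot *root, int varno, int attno)
{
	assert(varno >= 1 && varno <= root->parse->nrtes);
	assert(attno > 0 && attno < 32);

	return (root->parse->rtable[varno - 1].notnull_mask &
			(UINT32_C(1) << attno)) != 0;
}

/* Dummy root for a subquery: share glob, but point parse at the subquery. */
static ToyRoot
toy_make_subquery_root(const ToyRoot *parent, const ToyQuery *subquery)
{
	ToyRoot		subroot = { parent->glob, subquery };

	return subroot;
}
```

With the parent's root, varno 1 of the subquery is looked up in the
parent's range table and may hit an unrelated relation. With the dummy
root, the same varno resolves to the subquery's own entry.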

Thanks
Richard

Attachments:

v2-0001-Reduce-Var-IS-NOT-NULL-quals-during-constant-folding.patch (application/octet-stream)

From 44db6b0dd66380cf0957304d0bc4369b7939a956 Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Wed, 19 Mar 2025 16:16:12 +0900
Subject: [PATCH v2] Reduce "Var IS [NOT] NULL" quals during constant folding

---
 .../postgres_fdw/expected/postgres_fdw.out    |  8 +--
 contrib/postgres_fdw/sql/postgres_fdw.sql     |  2 +-
 src/backend/optimizer/plan/initsplan.c        | 26 +------
 src/backend/optimizer/plan/planner.c          |  9 +++
 src/backend/optimizer/plan/subselect.c        | 21 ++++--
 src/backend/optimizer/prep/prepjointree.c     | 41 +++++++++++
 src/backend/optimizer/util/clauses.c          | 72 ++++++++++++++++++-
 src/backend/optimizer/util/inherit.c          | 26 ++++---
 src/backend/optimizer/util/plancat.c          | 63 ++++++++--------
 src/backend/optimizer/util/relnode.c          |  3 -
 src/include/nodes/parsenodes.h                |  5 ++
 src/include/nodes/pathnodes.h                 |  6 --
 src/include/optimizer/optimizer.h             |  2 +
 src/include/optimizer/plancat.h               |  2 +
 src/include/optimizer/prep.h                  |  1 +
 .../regress/expected/generated_virtual.out    |  6 +-
 src/test/regress/expected/join.out            |  6 +-
 src/test/regress/expected/predicate.out       |  6 +-
 18 files changed, 211 insertions(+), 94 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index bb4ed3059c4..ac21ebc5431 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -699,12 +699,12 @@ EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE c1 = -c1;          -- Op
    Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" = (- "C 1")))
 (3 rows)
 
-EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE (c1 IS NOT NULL) IS DISTINCT FROM (c1 IS NOT NULL); -- DistinctExpr
-                                                                 QUERY PLAN                                                                 
---------------------------------------------------------------------------------------------------------------------------------------------
+EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE (c3 IS NOT NULL) IS DISTINCT FROM (c3 IS NOT NULL); -- DistinctExpr
+                                                              QUERY PLAN                                                              
+--------------------------------------------------------------------------------------------------------------------------------------
  Foreign Scan on public.ft1 t1
    Output: c1, c2, c3, c4, c5, c6, c7, c8
-   Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE ((("C 1" IS NOT NULL) IS DISTINCT FROM ("C 1" IS NOT NULL)))
+   Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (((c3 IS NOT NULL) IS DISTINCT FROM (c3 IS NOT NULL)))
 (3 rows)
 
 EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE c1 = ANY(ARRAY[c2, 1, c1 + 0]); -- ScalarArrayOpExpr
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index d45e9f8ab52..e7c304bb421 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -344,7 +344,7 @@ EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE c3 IS NULL;        -- Nu
 EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE c3 IS NOT NULL;    -- NullTest
 EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE round(abs(c1), 0) = 1; -- FuncExpr
 EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE c1 = -c1;          -- OpExpr(l)
-EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE (c1 IS NOT NULL) IS DISTINCT FROM (c1 IS NOT NULL); -- DistinctExpr
+EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE (c3 IS NOT NULL) IS DISTINCT FROM (c3 IS NOT NULL); -- DistinctExpr
 EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE c1 = ANY(ARRAY[c2, 1, c1 + 0]); -- ScalarArrayOpExpr
 EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE c1 = (ARRAY[c1,c2,3])[1]; -- SubscriptingRef
 EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE c6 = E'foo''s\\bar';  -- check special chars
diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index 1d1aa27d450..fc6f4f2cef4 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -545,7 +545,7 @@ remove_useless_groupby_columns(PlannerInfo *root)
 				 */
 				if (!index->nullsnotdistinct &&
 					!bms_is_member(index->indexkeys[i],
-								   rel->notnullattnums))
+								   rte->notnullattnums))
 				{
 					nulls_check_ok = false;
 					break;
@@ -3048,36 +3048,16 @@ add_base_clause_to_rel(PlannerInfo *root, Index relid,
  * expr_is_nonnullable
  *	  Check to see if the Expr cannot be NULL
  *
- * If the Expr is a simple Var that is defined NOT NULL and meanwhile is not
- * nulled by any outer joins, then we can know that it cannot be NULL.
+ * Currently we only support simple Vars.
  */
 static bool
 expr_is_nonnullable(PlannerInfo *root, Expr *expr)
 {
-	RelOptInfo *rel;
-	Var		   *var;
-
 	/* For now only check simple Vars */
 	if (!IsA(expr, Var))
 		return false;
 
-	var = (Var *) expr;
-
-	/* could the Var be nulled by any outer joins? */
-	if (!bms_is_empty(var->varnullingrels))
-		return false;
-
-	/* system columns cannot be NULL */
-	if (var->varattno < 0)
-		return true;
-
-	/* is the column defined NOT NULL? */
-	rel = find_base_rel(root, var->varno);
-	if (var->varattno > 0 &&
-		bms_is_member(var->varattno, rel->notnullattnums))
-		return true;
-
-	return false;
+	return var_is_nonnullable(root, (Var *) expr);
 }
 
 /*
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 141177e7413..01855dee285 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -719,6 +719,15 @@ subquery_planner(PlannerGlobal *glob, Query *parse, PlannerInfo *parent_root,
 	 */
 	replace_empty_jointree(parse);
 
+	/*
+	 * Scan the query's rangetable for ordinary relations and retrieve
+	 * attribute information from the system catalogs for each of them.  Note
+	 * that this step does not descend into SubLinks and subqueries; if we
+	 * pull up any SubLinks or subqueries below, their rangetables are scanned
+	 * just before pulling them up.
+	 */
+	collect_relation_attrs(parse);
+
 	/*
 	 * Look for ANY and EXISTS SubLinks in WHERE and JOIN/ON clauses, and try
 	 * to transform them into joins.  Note that this step does not descend
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index 8230cbea3c3..c89a0562500 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -1440,6 +1440,12 @@ convert_EXISTS_sublink_to_join(PlannerInfo *root, SubLink *sublink,
 	 */
 	replace_empty_jointree(subselect);
 
+	/*
+	 * Scan the subquery's rangetable for ordinary relations and retrieve
+	 * attribute information from the system catalogs for each of them.
+	 */
+	collect_relation_attrs(subselect);
+
 	/*
 	 * Prepare to pull up the sub-select into top range table.
 	 *
@@ -1652,6 +1658,7 @@ convert_EXISTS_to_ANY(PlannerInfo *root, Query *subselect,
 					  Node **testexpr, List **paramIds)
 {
 	Node	   *whereClause;
+	PlannerInfo subroot;
 	List	   *leftargs,
 			   *rightargs,
 			   *opids,
@@ -1711,12 +1718,14 @@ convert_EXISTS_to_ANY(PlannerInfo *root, Query *subselect,
 	 * parent aliases were flattened already, and we're not going to pull any
 	 * child Vars (of any description) into the parent.
 	 *
-	 * Note: passing the parent's root to eval_const_expressions is
-	 * technically wrong, but we can get away with it since only the
-	 * boundParams (if any) are used, and those would be the same in a
-	 * subroot.
-	 */
-	whereClause = eval_const_expressions(root, whereClause);
+	 * Note: we construct an entirely dummy PlannerInfo to pass to
+	 * eval_const_expressions.  This is fine because only the "glob" and
+	 * "parse" links are used by eval_const_expressions.
+	 */
+	MemSet(&subroot, 0, sizeof(subroot));
+	subroot.glob = root->glob;
+	subroot.parse = subselect;
+	whereClause = eval_const_expressions(&subroot, whereClause);
 	whereClause = (Node *) canonicalize_qual((Expr *) whereClause, false);
 	whereClause = (Node *) make_ands_implicit((Expr *) whereClause);
 
diff --git a/src/backend/optimizer/prep/prepjointree.c b/src/backend/optimizer/prep/prepjointree.c
index d131a5bbc59..83d71065e3d 100644
--- a/src/backend/optimizer/prep/prepjointree.c
+++ b/src/backend/optimizer/prep/prepjointree.c
@@ -5,6 +5,7 @@
  *
  * NOTE: the intended sequence for invoking these operations is
  *		replace_empty_jointree
+ *		collect_relation_attrs
  *		pull_up_sublinks
  *		preprocess_function_rtes
  *		expand_virtual_generated_columns
@@ -36,6 +37,7 @@
 #include "optimizer/clauses.h"
 #include "optimizer/optimizer.h"
 #include "optimizer/placeholder.h"
+#include "optimizer/plancat.h"
 #include "optimizer/prep.h"
 #include "optimizer/subselect.h"
 #include "optimizer/tlist.h"
@@ -436,6 +438,39 @@ replace_empty_jointree(Query *parse)
 	parse->jointree->fromlist = list_make1(rtr);
 }
 
+/*
+ * collect_relation_attrs
+ *		Scan the query's rangetable for ordinary relations and retrieve
+ *		attribute information from the system catalogs for each of them.
+ */
+void
+collect_relation_attrs(Query *parse)
+{
+	ListCell   *lc;
+
+	foreach(lc, parse->rtable)
+	{
+		RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc);
+		Relation	rel;
+
+		/* We only collect attribute info for ordinary relations. */
+		if (rte->rtekind != RTE_RELATION)
+			continue;
+
+		/*
+		 * We need not lock the relation since it was already locked, either
+		 * by the rewriter or when expand_inherited_rtentry() added it to the
+		 * query's rangetable.
+		 */
+		rel = table_open(rte->relid, NoLock);
+
+		/* Record NOT NULL columns for this relation. */
+		get_relation_notnullatts(rel, rte);
+
+		table_close(rel, NoLock);
+	}
+}
+
 /*
  * pull_up_sublinks
  *		Attempt to pull up ANY and EXISTS SubLinks to be treated as
@@ -1327,6 +1362,12 @@ pull_up_simple_subquery(PlannerInfo *root, Node *jtnode, RangeTblEntry *rte,
 	 */
 	replace_empty_jointree(subquery);
 
+	/*
+	 * Scan the subquery's rangetable for ordinary relations and retrieve
+	 * attribute information from the system catalogs for each of them.
+	 */
+	collect_relation_attrs(subquery);
+
 	/*
 	 * Pull up any SubLinks within the subquery's quals, so that we don't
 	 * leave unoptimized SubLinks behind.
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index 43dfecfb47f..258acdcaadf 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -20,6 +20,7 @@
 #include "postgres.h"
 
 #include "access/htup_details.h"
+#include "catalog/pg_class.h"
 #include "catalog/pg_language.h"
 #include "catalog/pg_operator.h"
 #include "catalog/pg_proc.h"
@@ -41,6 +42,7 @@
 #include "parser/analyze.h"
 #include "parser/parse_coerce.h"
 #include "parser/parse_func.h"
+#include "parser/parsetree.h"
 #include "rewrite/rewriteHandler.h"
 #include "rewrite/rewriteManip.h"
 #include "tcop/tcopprot.h"
@@ -2240,7 +2242,8 @@ rowtype_field_matches(Oid rowtypeid, int fieldnum,
  * only operators and functions that are reasonable to try to execute.
  *
  * NOTE: "root" can be passed as NULL if the caller never wants to do any
- * Param substitutions nor receive info about inlined functions.
+ * Param substitutions nor receive info about inlined functions nor reduce
+ * NullTest for Vars to constant true or constant false.
  *
  * NOTE: the planner assumes that this will always flatten nested AND and
  * OR clauses into N-argument form.  See comments in prepqual.c.
@@ -3535,6 +3538,31 @@ eval_const_expressions_mutator(Node *node,
 
 					return makeBoolConst(result, false);
 				}
+				if (!ntest->argisrow && arg && IsA(arg, Var) && context->root)
+				{
+					Var		   *varg = (Var *) arg;
+					bool		result;
+
+					if (var_is_nonnullable(context->root, varg))
+					{
+						switch (ntest->nulltesttype)
+						{
+							case IS_NULL:
+								result = false;
+								break;
+							case IS_NOT_NULL:
+								result = true;
+								break;
+							default:
+								elog(ERROR, "unrecognized nulltesttype: %d",
+									 (int) ntest->nulltesttype);
+								result = false; /* keep compiler quiet */
+								break;
+						}
+
+						return makeBoolConst(result, false);
+					}
+				}
 
 				newntest = makeNode(NullTest);
 				newntest->arg = (Expr *) arg;
@@ -4153,6 +4181,48 @@ simplify_function(Oid funcid, Oid result_type, int32 result_typmod,
 	return newexpr;
 }
 
+/*
+ * var_is_nonnullable
+ *	  Check to see if the Var cannot be NULL
+ *
+ * If the Var is defined NOT NULL and meanwhile is not nulled by any outer
+ * joins or grouping sets, then we can know that it cannot be NULL.
+ */
+bool
+var_is_nonnullable(PlannerInfo *root, Var *var)
+{
+	RangeTblEntry *rte;
+
+	Assert(IsA(var, Var));
+
+	if (var->varlevelsup != 0)
+		return false;
+
+	/* could the Var be nulled by any outer joins or grouping sets? */
+	if (!bms_is_empty(var->varnullingrels))
+		return false;
+
+	/* system columns cannot be NULL */
+	if (var->varattno < 0)
+		return true;
+
+	/*
+	 * Check if the Var is defined as NOT NULL.  We must skip inheritance
+	 * parent tables, as some child tables may have a NOT NULL constraint for
+	 * a column while others may not.  This cannot happen with partitioned
+	 * tables, though.
+	 */
+	rte = planner_rt_fetch(var->varno, root);
+	if (rte->inh && rte->relkind != RELKIND_PARTITIONED_TABLE)
+		return false;
+
+	if (var->varattno > 0 &&
+		bms_is_member(var->varattno, rte->notnullattnums))
+		return true;
+
+	return false;
+}
+
 /*
  * expand_function_arguments: convert named-notation args to positional args
  * and/or insert default args, as needed
diff --git a/src/backend/optimizer/util/inherit.c b/src/backend/optimizer/util/inherit.c
index 17e51cd75d7..448d487cd0b 100644
--- a/src/backend/optimizer/util/inherit.c
+++ b/src/backend/optimizer/util/inherit.c
@@ -466,8 +466,7 @@ expand_single_inheritance_child(PlannerInfo *root, RangeTblEntry *parentrte,
 								Index *childRTindex_p)
 {
 	Query	   *parse = root->parse;
-	Oid			parentOID PG_USED_FOR_ASSERTS_ONLY =
-		RelationGetRelid(parentrel);
+	Oid			parentOID = RelationGetRelid(parentrel);
 	Oid			childOID = RelationGetRelid(childrel);
 	RangeTblEntry *childrte;
 	Index		childRTindex;
@@ -479,15 +478,16 @@ expand_single_inheritance_child(PlannerInfo *root, RangeTblEntry *parentrte,
 	/*
 	 * Build an RTE for the child, and attach to query's rangetable list. We
 	 * copy most scalar fields of the parent's RTE, but replace relation OID,
-	 * relkind, and inh for the child.  Set the child's securityQuals to
-	 * empty, because we only want to apply the parent's RLS conditions
-	 * regardless of what RLS properties individual children may have. (This
-	 * is an intentional choice to make inherited RLS work like regular
-	 * permissions checks.) The parent securityQuals will be propagated to
-	 * children along with other base restriction clauses, so we don't need to
-	 * do it here.  Other infrastructure of the parent RTE has to be
-	 * translated to match the child table's column ordering, which we do
-	 * below, so a "flat" copy is sufficient to start with.
+	 * relkind, and inh for the child.  We also replace notnullattnums for the
+	 * child if its relation OID is different from the parent's.  Set the
+	 * child's securityQuals to empty, because we only want to apply the
+	 * parent's RLS conditions regardless of what RLS properties individual
+	 * children may have. (This is an intentional choice to make inherited RLS
+	 * work like regular permissions checks.) The parent securityQuals will be
+	 * propagated to children along with other base restriction clauses, so we
+	 * don't need to do it here.  Other infrastructure of the parent RTE has
+	 * to be translated to match the child table's column ordering, which we
+	 * do below, so a "flat" copy is sufficient to start with.
 	 */
 	childrte = makeNode(RangeTblEntry);
 	memcpy(childrte, parentrte, sizeof(RangeTblEntry));
@@ -507,6 +507,10 @@ expand_single_inheritance_child(PlannerInfo *root, RangeTblEntry *parentrte,
 	/* No permission checking for child RTEs. */
 	childrte->perminfoindex = 0;
 
+	/* Record NOT NULL columns for the child if needed. */
+	if (childOID != parentOID)
+		get_relation_notnullatts(childrel, childrte);
+
 	/* Link not-yet-fully-filled child RTE into data structures */
 	parse->rtable = lappend(parse->rtable, childrte);
 	childRTindex = list_length(parse->rtable);
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 0489ad36644..956341c14d1 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -162,36 +162,6 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 	rel->attr_widths = (int32 *)
 		palloc0((rel->max_attr - rel->min_attr + 1) * sizeof(int32));
 
-	/*
-	 * Record which columns are defined as NOT NULL.  We leave this
-	 * unpopulated for non-partitioned inheritance parent relations as it's
-	 * ambiguous as to what it means.  Some child tables may have a NOT NULL
-	 * constraint for a column while others may not.  We could work harder and
-	 * build a unioned set of all child relations notnullattnums, but there's
-	 * currently no need.  The RelOptInfo corresponding to the !inh
-	 * RangeTblEntry does get populated.
-	 */
-	if (!inhparent || relation->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
-	{
-		for (int i = 0; i < relation->rd_att->natts; i++)
-		{
-			CompactAttribute *attr = TupleDescCompactAttr(relation->rd_att, i);
-
-			if (attr->attnotnull)
-			{
-				rel->notnullattnums = bms_add_member(rel->notnullattnums,
-													 i + 1);
-
-				/*
-				 * Per RemoveAttributeById(), dropped columns will have their
-				 * attnotnull unset, so we needn't check for dropped columns
-				 * in the above condition.
-				 */
-				Assert(!attr->attisdropped);
-			}
-		}
-	}
-
 	/*
 	 * Estimate relation size --- unless it's an inheritance parent, in which
 	 * case the size we want is not the rel's own size but the size of its
@@ -681,6 +651,39 @@ get_relation_foreign_keys(PlannerInfo *root, RelOptInfo *rel,
 	}
 }
 
+/*
+ * get_relation_notnullatts
+ *		Record which columns of the given relation are defined as NOT NULL.
+ */
+void
+get_relation_notnullatts(Relation relation, RangeTblEntry *rte)
+{
+	Assert(rte->rtekind == RTE_RELATION);
+
+	rte->notnullattnums = NULL;
+
+	if (relation->rd_att->constr && relation->rd_att->constr->has_not_null)
+	{
+		for (int i = 0; i < relation->rd_att->natts; i++)
+		{
+			CompactAttribute *attr = TupleDescCompactAttr(relation->rd_att, i);
+
+			if (attr->attnotnull)
+			{
+				rte->notnullattnums = bms_add_member(rte->notnullattnums,
+													 i + 1);
+
+				/*
+				 * Per RemoveAttributeById(), dropped columns will have their
+				 * attnotnull unset, so we needn't check for dropped columns
+				 * in the above condition.
+				 */
+				Assert(!attr->attisdropped);
+			}
+		}
+	}
+}
+
 /*
  * infer_arbiter_indexes -
  *	  Determine the unique indexes used to arbitrate speculative insertion.
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index ff507331a06..6db5b1273ab 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -222,7 +222,6 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
 	rel->relid = relid;
 	rel->rtekind = rte->rtekind;
 	/* min_attr, max_attr, attr_needed, attr_widths are set below */
-	rel->notnullattnums = NULL;
 	rel->lateral_vars = NIL;
 	rel->indexlist = NIL;
 	rel->statlist = NIL;
@@ -727,7 +726,6 @@ build_join_rel(PlannerInfo *root,
 	joinrel->max_attr = 0;
 	joinrel->attr_needed = NULL;
 	joinrel->attr_widths = NULL;
-	joinrel->notnullattnums = NULL;
 	joinrel->nulling_relids = NULL;
 	joinrel->lateral_vars = NIL;
 	joinrel->lateral_referencers = NULL;
@@ -916,7 +914,6 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
 	joinrel->max_attr = 0;
 	joinrel->attr_needed = NULL;
 	joinrel->attr_widths = NULL;
-	joinrel->notnullattnums = NULL;
 	joinrel->nulling_relids = NULL;
 	joinrel->lateral_vars = NIL;
 	joinrel->lateral_referencers = NULL;
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 23c9e3c5abf..0ba80b792ef 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -1080,6 +1080,9 @@ typedef struct RangeTblEntry
 	 * this RTE in the containing struct's list of same; 0 if permissions need
 	 * not be checked for this RTE.
 	 *
+	 * notnullattnums is a zero-based set containing attnums of NOT NULL
+	 * columns.
+	 *
 	 * As a special case, relid, relkind, rellockmode, and perminfoindex can
 	 * also be set (nonzero) in an RTE_SUBQUERY RTE.  This occurs when we
 	 * convert an RTE_RELATION RTE naming a view into an RTE_SUBQUERY
@@ -1105,6 +1108,8 @@ typedef struct RangeTblEntry
 	Index		perminfoindex pg_node_attr(query_jumble_ignore);
 	/* sampling info, or NULL */
 	struct TableSampleClause *tablesample;
+	/* columns defined as NOT NULL */
+	Bitmapset  *notnullattnums;
 
 	/*
 	 * Fields valid for a subquery RTE (else NULL):
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index c24a1fc8514..11334d5bc1b 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -955,12 +955,6 @@ typedef struct RelOptInfo
 	Relids	   *attr_needed pg_node_attr(read_write_ignore);
 	/* array indexed [min_attr .. max_attr] */
 	int32	   *attr_widths pg_node_attr(read_write_ignore);
-
-	/*
-	 * Zero-based set containing attnums of NOT NULL columns.  Not populated
-	 * for rels corresponding to non-partitioned inh==true RTEs.
-	 */
-	Bitmapset  *notnullattnums;
 	/* relids of outer joins that can null this baserel */
 	Relids		nulling_relids;
 	/* LATERAL Vars and PHVs referenced by rel */
diff --git a/src/include/optimizer/optimizer.h b/src/include/optimizer/optimizer.h
index 78e05d88c8e..748556c9163 100644
--- a/src/include/optimizer/optimizer.h
+++ b/src/include/optimizer/optimizer.h
@@ -154,6 +154,8 @@ extern Node *estimate_expression_value(PlannerInfo *root, Node *node);
 extern Expr *evaluate_expr(Expr *expr, Oid result_type, int32 result_typmod,
 						   Oid result_collation);
 
+extern bool var_is_nonnullable(PlannerInfo *root, Var *var);
+
 extern List *expand_function_arguments(List *args, bool include_out_arguments,
 									   Oid result_type,
 									   struct HeapTupleData *func_tuple);
diff --git a/src/include/optimizer/plancat.h b/src/include/optimizer/plancat.h
index cd74e4b1e8b..c3c818ec116 100644
--- a/src/include/optimizer/plancat.h
+++ b/src/include/optimizer/plancat.h
@@ -28,6 +28,8 @@ extern PGDLLIMPORT get_relation_info_hook_type get_relation_info_hook;
 extern void get_relation_info(PlannerInfo *root, Oid relationObjectId,
 							  bool inhparent, RelOptInfo *rel);
 
+extern void get_relation_notnullatts(Relation relation, RangeTblEntry *rte);
+
 extern List *infer_arbiter_indexes(PlannerInfo *root);
 
 extern void estimate_rel_size(Relation rel, int32 *attr_widths,
diff --git a/src/include/optimizer/prep.h b/src/include/optimizer/prep.h
index df56202777c..50bb9830260 100644
--- a/src/include/optimizer/prep.h
+++ b/src/include/optimizer/prep.h
@@ -23,6 +23,7 @@
  */
 extern void transform_MERGE_to_join(Query *parse);
 extern void replace_empty_jointree(Query *parse);
+extern void collect_relation_attrs(Query *parse);
 extern void pull_up_sublinks(PlannerInfo *root);
 extern void preprocess_function_rtes(PlannerInfo *root);
 extern Query *expand_virtual_generated_columns(PlannerInfo *root);
diff --git a/src/test/regress/expected/generated_virtual.out b/src/test/regress/expected/generated_virtual.out
index dc09c85938e..a93f25a417e 100644
--- a/src/test/regress/expected/generated_virtual.out
+++ b/src/test/regress/expected/generated_virtual.out
@@ -1472,11 +1472,11 @@ where coalesce(t2.b, 1) = 2;
 explain (costs off)
 select t1.a from gtest32 t1 left join gtest32 t2 on t1.a = t2.a
 where coalesce(t2.b, 1) = 2 or t1.a is null;
-                         QUERY PLAN                          
--------------------------------------------------------------
+               QUERY PLAN                
+-----------------------------------------
  Hash Left Join
    Hash Cond: (t1.a = t2.a)
-   Filter: ((COALESCE((t2.a * 2), 1) = 2) OR (t1.a IS NULL))
+   Filter: (COALESCE((t2.a * 2), 1) = 2)
    ->  Seq Scan on gtest32 t1
    ->  Hash
          ->  Seq Scan on gtest32 t2
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index a57bb18c24f..69877e310f6 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -3639,8 +3639,8 @@ from nt3 as nt3
     ) as ss2
     on ss2.id = nt3.nt2_id
 where nt3.id = 1 and ss2.b3;
-                  QUERY PLAN                   
------------------------------------------------
+                  QUERY PLAN                  
+----------------------------------------------
  Nested Loop
    ->  Nested Loop
          ->  Index Scan using nt3_pkey on nt3
@@ -3649,7 +3649,7 @@ where nt3.id = 1 and ss2.b3;
                Index Cond: (id = nt3.nt2_id)
    ->  Index Only Scan using nt1_pkey on nt1
          Index Cond: (id = nt2.nt1_id)
-         Filter: (nt2.b1 AND (id IS NOT NULL))
+         Filter: (nt2.b1 AND true)
 (9 rows)
 
 select nt3.id
diff --git a/src/test/regress/expected/predicate.out b/src/test/regress/expected/predicate.out
index b79037748b7..e025c05261d 100644
--- a/src/test/regress/expected/predicate.out
+++ b/src/test/regress/expected/predicate.out
@@ -84,10 +84,10 @@ SELECT * FROM pred_tab t WHERE t.a IS NULL OR t.c IS NULL;
 -- are provably false
 EXPLAIN (COSTS OFF)
 SELECT * FROM pred_tab t WHERE t.b IS NULL OR t.c IS NULL;
-               QUERY PLAN               
-----------------------------------------
+       QUERY PLAN       
+------------------------
  Seq Scan on pred_tab t
-   Filter: ((b IS NULL) OR (c IS NULL))
+   Filter: (b IS NULL)
 (2 rows)
 
 --
-- 
2.43.0
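
The predicate.out hunk at the end of the patch above shows why folding early matters: once `c IS NULL` is reduced to constant false during constant folding, the surrounding OR can be simplified too, leaving only `b IS NULL`. The following is a minimal standalone sketch of that follow-on step, not PostgreSQL code. The names `ToyTerm`, `ToyOrResult`, and `toy_simplify_or` are hypothetical and exist only for this illustration; each OR arm is modeled as either an opaque expression or an already-folded boolean constant.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of one OR arm after constant folding (hypothetical types). */
typedef enum { TERM_OPAQUE, TERM_FALSE, TERM_TRUE } ToyTerm;

/* Outcome for the whole OR clause. */
typedef enum { OR_KEEP, OR_FALSE, OR_TRUE } ToyOrResult;

/*
 * Simplify an OR list in place:
 *   - any constant-true arm makes the whole OR true (*nterms is set to 0);
 *   - constant-false arms are dropped;
 *   - if nothing is left, the OR is constant false;
 *   - otherwise the remaining arms are compacted to the front of the
 *     array and *nterms is updated.
 */
ToyOrResult
toy_simplify_or(ToyTerm *terms, size_t *nterms)
{
    size_t kept = 0;

    assert(terms != NULL && nterms != NULL);

    for (size_t i = 0; i < *nterms; i++)
    {
        if (terms[i] == TERM_TRUE)
        {
            *nterms = 0;
            return OR_TRUE;
        }
        if (terms[i] == TERM_FALSE)
            continue;           /* drop "OR false" */
        terms[kept++] = terms[i];
    }

    *nterms = kept;
    return (kept == 0) ? OR_FALSE : OR_KEEP;
}
```

In this model, `b IS NULL OR c IS NULL` with `c` declared NOT NULL is the two-arm list `{TERM_OPAQUE, TERM_FALSE}`, which collapses to the single opaque arm. When the reduction happens later, during qual distribution, this kind of follow-on simplification has already been missed.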

Richard Guo

guofenglinux@gmail.com

10 months ago

In reply to: Richard Guo (#7)

1 attachment(s)

Re: Reduce "Var IS [NOT] NULL" quals during constant folding

On Sun, Mar 23, 2025 at 6:25 PM Richard Guo <guofenglinux@gmail.com> wrote:
> On Sat, Mar 22, 2025 at 2:21 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > The way to make this work is what I said before: move the planner's
> > collection of relation information to somewhere a bit earlier in
> > the planner. But not to outside the planner.
>
> I'm considering moving the collection of attnotnull information before
> pull_up_sublinks, in hopes of leveraging this info to pull up NOT IN
> in the future, something like attached.

Here is an updated version of the patch with some cosmetic changes and
a more readable commit message. I'm wondering if it's good enough to
be pushed. Any comments?

Thanks
Richard
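
For readers skimming the attached patch, the decision made by its var_is_nonnullable() and the resulting NullTest reduction can be modeled in a few lines. This is only a simplified sketch under stated assumptions, not the patch itself: `ToyRte`, `ToyVar`, `toy_var_is_nonnullable`, and `toy_fold_nulltest` are hypothetical stand-ins, the Bitmapset fields are replaced by a 64-bit mask, and the relkind test is replaced by a boolean flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy range-table entry (hypothetical; not PostgreSQL's RangeTblEntry). */
typedef struct ToyRte
{
    bool     inh;             /* inheritance parent? */
    bool     is_partitioned;  /* stands in for RELKIND_PARTITIONED_TABLE */
    uint64_t notnullattnums;  /* bit N set => attribute number N is NOT NULL */
} ToyRte;

/* Toy Var (hypothetical; not PostgreSQL's Var). */
typedef struct ToyVar
{
    int      varno;           /* 1-based index into the range table */
    int      varattno;        /* > 0 user column, < 0 system column */
    int      varlevelsup;     /* > 0 means an outer-query reference */
    uint64_t varnullingrels;  /* nonzero => may be nulled by an outer join
                               * or grouping set */
} ToyVar;

typedef enum { TOY_IS_NULL, TOY_IS_NOT_NULL } ToyNullTestType;
typedef enum { FOLD_KEEP, FOLD_FALSE, FOLD_TRUE } ToyFoldResult;

/* Mirrors the order of checks in the patch's var_is_nonnullable(). */
bool
toy_var_is_nonnullable(const ToyRte *rtable, const ToyVar *var)
{
    const ToyRte *rte;

    assert(rtable != NULL && var != NULL);

    if (var->varlevelsup != 0)
        return false;
    if (var->varnullingrels != 0)
        return false;
    if (var->varattno < 0)
        return true;            /* system columns cannot be NULL */

    /* Plain inheritance parents are skipped; children may differ. */
    rte = &rtable[var->varno - 1];
    if (rte->inh && !rte->is_partitioned)
        return false;

    return var->varattno > 0 && var->varattno < 64 &&
        (rte->notnullattnums & (UINT64_C(1) << var->varattno)) != 0;
}

/* Reduce "Var IS [NOT] NULL" to a constant when the Var cannot be NULL. */
ToyFoldResult
toy_fold_nulltest(const ToyRte *rtable, const ToyVar *var,
                  ToyNullTestType testtype)
{
    if (!toy_var_is_nonnullable(rtable, var))
        return FOLD_KEEP;
    return (testtype == TOY_IS_NULL) ? FOLD_FALSE : FOLD_TRUE;
}
```

The point of the sketch is that every input to the decision lives on the Var and its range-table entry, which is why recording the NOT NULL columns in the RTE early makes the check usable at constant-folding time.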

Attachments:

v3-0001-Reduce-Var-IS-NOT-NULL-quals-during-constant-folding.patch (application/octet-stream)

From 3129bc9700f339570517f94db2f472c35378e62f Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Wed, 19 Mar 2025 16:16:12 +0900
Subject: [PATCH v3] Reduce "Var IS [NOT] NULL" quals during constant folding

In commit b262ad440, we introduced an optimization that reduces an IS
[NOT] NULL qual on a NOT NULL column to constant true or constant
false, provided we can prove that the input expression of the NullTest
is not nullable by any outer joins or grouping sets.  This deduction
happens quite late in the planner, during the distribution of quals to
rels in query_planner.  However, this approach has some drawbacks: we
can't perform any further folding with the constant, and it turns out
to be prone to bugs.

Ideally, this deduction should happen during constant folding.
However, the per-relation information about which columns are defined
as NOT NULL is not available at that point.  This information is
currently collected from catalogs when building RelOptInfos for base
or "other" relations.

This patch moves the collection of NOT NULL attribute information for
relations before pull_up_sublinks and performs the NullTest deduction
for Vars during constant folding.  This also makes it possible to
leverage this information to pull up NOT IN subqueries.

Note that this patch does not get rid of restriction_is_always_true
and restriction_is_always_false.  Removing them would prevent us from
reducing some IS [NOT] NULL quals that we were previously able to
reduce, because (a) the self-join elimination may introduce new IS NOT
NULL quals after constant folding, and (b) if some outer joins are
converted to inner joins, previously irreducible NullTest quals may
become reducible.

Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAMbWs4-bFJ1At4btk5wqbezdu8PLtQ3zv-aiaY3ry9Ymm=jgFQ@mail.gmail.com
---
 .../postgres_fdw/expected/postgres_fdw.out    |  8 +-
 contrib/postgres_fdw/sql/postgres_fdw.sql     |  2 +-
 src/backend/optimizer/plan/initsplan.c        | 26 +------
 src/backend/optimizer/plan/planner.c          |  9 +++
 src/backend/optimizer/plan/subselect.c        | 22 ++++--
 src/backend/optimizer/prep/prepjointree.c     | 41 +++++++++++
 src/backend/optimizer/util/clauses.c          | 73 ++++++++++++++++++-
 src/backend/optimizer/util/inherit.c          | 26 ++++---
 src/backend/optimizer/util/plancat.c          | 63 ++++++++--------
 src/backend/optimizer/util/relnode.c          |  3 -
 src/include/nodes/parsenodes.h                |  5 ++
 src/include/nodes/pathnodes.h                 |  6 --
 src/include/optimizer/optimizer.h             |  2 +
 src/include/optimizer/plancat.h               |  2 +
 src/include/optimizer/prep.h                  |  1 +
 .../regress/expected/generated_virtual.out    |  6 +-
 src/test/regress/expected/join.out            |  6 +-
 src/test/regress/expected/predicate.out       |  6 +-
 18 files changed, 213 insertions(+), 94 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index d1acee5a5fa..a20897a6e49 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -699,12 +699,12 @@ EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE c1 = -c1;          -- Op
    Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" = (- "C 1")))
 (3 rows)
 
-EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE (c1 IS NOT NULL) IS DISTINCT FROM (c1 IS NOT NULL); -- DistinctExpr
-                                                                 QUERY PLAN                                                                 
---------------------------------------------------------------------------------------------------------------------------------------------
+EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE (c3 IS NOT NULL) IS DISTINCT FROM (c3 IS NOT NULL); -- DistinctExpr
+                                                              QUERY PLAN                                                              
+--------------------------------------------------------------------------------------------------------------------------------------
  Foreign Scan on public.ft1 t1
    Output: c1, c2, c3, c4, c5, c6, c7, c8
-   Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE ((("C 1" IS NOT NULL) IS DISTINCT FROM ("C 1" IS NOT NULL)))
+   Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (((c3 IS NOT NULL) IS DISTINCT FROM (c3 IS NOT NULL)))
 (3 rows)
 
 EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE c1 = ANY(ARRAY[c2, 1, c1 + 0]); -- ScalarArrayOpExpr
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index ea6287b03fd..26576b71cae 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -344,7 +344,7 @@ EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE c3 IS NULL;        -- Nu
 EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE c3 IS NOT NULL;    -- NullTest
 EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE round(abs(c1), 0) = 1; -- FuncExpr
 EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE c1 = -c1;          -- OpExpr(l)
-EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE (c1 IS NOT NULL) IS DISTINCT FROM (c1 IS NOT NULL); -- DistinctExpr
+EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE (c3 IS NOT NULL) IS DISTINCT FROM (c3 IS NOT NULL); -- DistinctExpr
 EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE c1 = ANY(ARRAY[c2, 1, c1 + 0]); -- ScalarArrayOpExpr
 EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE c1 = (ARRAY[c1,c2,3])[1]; -- SubscriptingRef
 EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE c6 = E'foo''s\\bar';  -- check special chars
diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index 1d1aa27d450..fc6f4f2cef4 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -545,7 +545,7 @@ remove_useless_groupby_columns(PlannerInfo *root)
 				 */
 				if (!index->nullsnotdistinct &&
 					!bms_is_member(index->indexkeys[i],
-								   rel->notnullattnums))
+								   rte->notnullattnums))
 				{
 					nulls_check_ok = false;
 					break;
@@ -3048,36 +3048,16 @@ add_base_clause_to_rel(PlannerInfo *root, Index relid,
  * expr_is_nonnullable
  *	  Check to see if the Expr cannot be NULL
  *
- * If the Expr is a simple Var that is defined NOT NULL and meanwhile is not
- * nulled by any outer joins, then we can know that it cannot be NULL.
+ * Currently we only support simple Vars.
  */
 static bool
 expr_is_nonnullable(PlannerInfo *root, Expr *expr)
 {
-	RelOptInfo *rel;
-	Var		   *var;
-
 	/* For now only check simple Vars */
 	if (!IsA(expr, Var))
 		return false;
 
-	var = (Var *) expr;
-
-	/* could the Var be nulled by any outer joins? */
-	if (!bms_is_empty(var->varnullingrels))
-		return false;
-
-	/* system columns cannot be NULL */
-	if (var->varattno < 0)
-		return true;
-
-	/* is the column defined NOT NULL? */
-	rel = find_base_rel(root, var->varno);
-	if (var->varattno > 0 &&
-		bms_is_member(var->varattno, rel->notnullattnums))
-		return true;
-
-	return false;
+	return var_is_nonnullable(root, (Var *) expr);
 }
 
 /*
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 566ce5b3cb4..ab6f7f42109 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -723,6 +723,15 @@ subquery_planner(PlannerGlobal *glob, Query *parse, PlannerInfo *parent_root,
 	 */
 	replace_empty_jointree(parse);
 
+	/*
+	 * Scan the query's rangetable for ordinary relations and retrieve
+	 * attribute information from the system catalogs for each of them.  Note
+	 * that this step does not descend into SubLinks and subqueries; if we
+	 * pull up any SubLinks or subqueries below, their rangetables are scanned
+	 * just before pulling them up.
+	 */
+	collect_relation_attrs(parse);
+
 	/*
 	 * Look for ANY and EXISTS SubLinks in WHERE and JOIN/ON clauses, and try
 	 * to transform them into joins.  Note that this step does not descend
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index 8230cbea3c3..1a9b3d2e0af 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -1440,6 +1440,12 @@ convert_EXISTS_sublink_to_join(PlannerInfo *root, SubLink *sublink,
 	 */
 	replace_empty_jointree(subselect);
 
+	/*
+	 * Scan the subquery's rangetable for ordinary relations and retrieve
+	 * attribute information from the system catalogs for each of them.
+	 */
+	collect_relation_attrs(subselect);
+
 	/*
 	 * Prepare to pull up the sub-select into top range table.
 	 *
@@ -1652,6 +1658,7 @@ convert_EXISTS_to_ANY(PlannerInfo *root, Query *subselect,
 					  Node **testexpr, List **paramIds)
 {
 	Node	   *whereClause;
+	PlannerInfo subroot;
 	List	   *leftargs,
 			   *rightargs,
 			   *opids,
@@ -1711,12 +1718,15 @@ convert_EXISTS_to_ANY(PlannerInfo *root, Query *subselect,
 	 * parent aliases were flattened already, and we're not going to pull any
 	 * child Vars (of any description) into the parent.
 	 *
-	 * Note: passing the parent's root to eval_const_expressions is
-	 * technically wrong, but we can get away with it since only the
-	 * boundParams (if any) are used, and those would be the same in a
-	 * subroot.
-	 */
-	whereClause = eval_const_expressions(root, whereClause);
+	 * Note: we construct an entirely dummy PlannerInfo to pass to
+	 * eval_const_expressions.  This is fine because only the "glob" and
+	 * "parse" links are used by eval_const_expressions.
+	 */
+	MemSet(&subroot, 0, sizeof(subroot));
+	subroot.type = T_PlannerInfo;
+	subroot.glob = root->glob;
+	subroot.parse = subselect;
+	whereClause = eval_const_expressions(&subroot, whereClause);
 	whereClause = (Node *) canonicalize_qual((Expr *) whereClause, false);
 	whereClause = (Node *) make_ands_implicit((Expr *) whereClause);
 
diff --git a/src/backend/optimizer/prep/prepjointree.c b/src/backend/optimizer/prep/prepjointree.c
index d131a5bbc59..0546ca84c10 100644
--- a/src/backend/optimizer/prep/prepjointree.c
+++ b/src/backend/optimizer/prep/prepjointree.c
@@ -5,6 +5,7 @@
  *
  * NOTE: the intended sequence for invoking these operations is
  *		replace_empty_jointree
+ *		collect_relation_attrs
  *		pull_up_sublinks
  *		preprocess_function_rtes
  *		expand_virtual_generated_columns
@@ -36,6 +37,7 @@
 #include "optimizer/clauses.h"
 #include "optimizer/optimizer.h"
 #include "optimizer/placeholder.h"
+#include "optimizer/plancat.h"
 #include "optimizer/prep.h"
 #include "optimizer/subselect.h"
 #include "optimizer/tlist.h"
@@ -436,6 +438,39 @@ replace_empty_jointree(Query *parse)
 	parse->jointree->fromlist = list_make1(rtr);
 }
 
+/*
+ * collect_relation_attrs
+ *		Scan the query's rangetable for ordinary relations and retrieve
+ *		attribute information from the system catalogs for each of them.
+ */
+void
+collect_relation_attrs(Query *parse)
+{
+	ListCell   *lc;
+
+	foreach(lc, parse->rtable)
+	{
+		RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc);
+		Relation	relation;
+
+		/* We only collect attribute info for ordinary relations. */
+		if (rte->rtekind != RTE_RELATION)
+			continue;
+
+		/*
+		 * We need not lock the relation since it was already locked, either
+		 * by the rewriter or when expand_inherited_rtentry() added it to the
+		 * query's rangetable.
+		 */
+		relation = table_open(rte->relid, NoLock);
+
+		/* Record NOT NULL columns for this relation. */
+		get_relation_notnullatts(relation, rte);
+
+		table_close(relation, NoLock);
+	}
+}
+
 /*
  * pull_up_sublinks
  *		Attempt to pull up ANY and EXISTS SubLinks to be treated as
@@ -1327,6 +1362,12 @@ pull_up_simple_subquery(PlannerInfo *root, Node *jtnode, RangeTblEntry *rte,
 	 */
 	replace_empty_jointree(subquery);
 
+	/*
+	 * Scan the subquery's rangetable for ordinary relations and retrieve
+	 * attribute information from the system catalogs for each of them.
+	 */
+	collect_relation_attrs(subquery);
+
 	/*
 	 * Pull up any SubLinks within the subquery's quals, so that we don't
 	 * leave unoptimized SubLinks behind.
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index 43dfecfb47f..e067273ca01 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -20,6 +20,7 @@
 #include "postgres.h"
 
 #include "access/htup_details.h"
+#include "catalog/pg_class.h"
 #include "catalog/pg_language.h"
 #include "catalog/pg_operator.h"
 #include "catalog/pg_proc.h"
@@ -41,6 +42,7 @@
 #include "parser/analyze.h"
 #include "parser/parse_coerce.h"
 #include "parser/parse_func.h"
+#include "parser/parsetree.h"
 #include "rewrite/rewriteHandler.h"
 #include "rewrite/rewriteManip.h"
 #include "tcop/tcopprot.h"
@@ -2240,7 +2242,8 @@ rowtype_field_matches(Oid rowtypeid, int fieldnum,
  * only operators and functions that are reasonable to try to execute.
  *
  * NOTE: "root" can be passed as NULL if the caller never wants to do any
- * Param substitutions nor receive info about inlined functions.
+ * Param substitutions nor receive info about inlined functions nor reduce
+ * NullTest for Vars to constant true or constant false.
  *
  * NOTE: the planner assumes that this will always flatten nested AND and
  * OR clauses into N-argument form.  See comments in prepqual.c.
@@ -3535,6 +3538,31 @@ eval_const_expressions_mutator(Node *node,
 
 					return makeBoolConst(result, false);
 				}
+				if (!ntest->argisrow && arg && IsA(arg, Var) && context->root)
+				{
+					Var		   *varg = (Var *) arg;
+					bool		result;
+
+					if (var_is_nonnullable(context->root, varg))
+					{
+						switch (ntest->nulltesttype)
+						{
+							case IS_NULL:
+								result = false;
+								break;
+							case IS_NOT_NULL:
+								result = true;
+								break;
+							default:
+								elog(ERROR, "unrecognized nulltesttype: %d",
+									 (int) ntest->nulltesttype);
+								result = false; /* keep compiler quiet */
+								break;
+						}
+
+						return makeBoolConst(result, false);
+					}
+				}
 
 				newntest = makeNode(NullTest);
 				newntest->arg = (Expr *) arg;
@@ -4153,6 +4181,49 @@ simplify_function(Oid funcid, Oid result_type, int32 result_typmod,
 	return newexpr;
 }
 
+/*
+ * var_is_nonnullable
+ *	  Check to see if the Var cannot be NULL
+ *
+ * If the Var is defined NOT NULL and meanwhile is not nulled by any outer
+ * joins or grouping sets, then we can know that it cannot be NULL.
+ */
+bool
+var_is_nonnullable(PlannerInfo *root, Var *var)
+{
+	RangeTblEntry *rte;
+
+	Assert(IsA(var, Var));
+
+	/* skip upper-level Vars */
+	if (var->varlevelsup != 0)
+		return false;
+
+	/* could the Var be nulled by any outer joins or grouping sets? */
+	if (!bms_is_empty(var->varnullingrels))
+		return false;
+
+	/* system columns cannot be NULL */
+	if (var->varattno < 0)
+		return true;
+
+	/*
+	 * Check if the Var is defined as NOT NULL.  We must skip inheritance
+	 * parent tables, as some child tables may have a NOT NULL constraint for
+	 * a column while others may not.  This cannot happen with partitioned
+	 * tables, though.
+	 */
+	rte = planner_rt_fetch(var->varno, root);
+	if (rte->inh && rte->relkind != RELKIND_PARTITIONED_TABLE)
+		return false;
+
+	if (var->varattno > 0 &&
+		bms_is_member(var->varattno, rte->notnullattnums))
+		return true;
+
+	return false;
+}
+
 /*
  * expand_function_arguments: convert named-notation args to positional args
  * and/or insert default args, as needed
diff --git a/src/backend/optimizer/util/inherit.c b/src/backend/optimizer/util/inherit.c
index 17e51cd75d7..448d487cd0b 100644
--- a/src/backend/optimizer/util/inherit.c
+++ b/src/backend/optimizer/util/inherit.c
@@ -466,8 +466,7 @@ expand_single_inheritance_child(PlannerInfo *root, RangeTblEntry *parentrte,
 								Index *childRTindex_p)
 {
 	Query	   *parse = root->parse;
-	Oid			parentOID PG_USED_FOR_ASSERTS_ONLY =
-		RelationGetRelid(parentrel);
+	Oid			parentOID = RelationGetRelid(parentrel);
 	Oid			childOID = RelationGetRelid(childrel);
 	RangeTblEntry *childrte;
 	Index		childRTindex;
@@ -479,15 +478,16 @@ expand_single_inheritance_child(PlannerInfo *root, RangeTblEntry *parentrte,
 	/*
 	 * Build an RTE for the child, and attach to query's rangetable list. We
 	 * copy most scalar fields of the parent's RTE, but replace relation OID,
-	 * relkind, and inh for the child.  Set the child's securityQuals to
-	 * empty, because we only want to apply the parent's RLS conditions
-	 * regardless of what RLS properties individual children may have. (This
-	 * is an intentional choice to make inherited RLS work like regular
-	 * permissions checks.) The parent securityQuals will be propagated to
-	 * children along with other base restriction clauses, so we don't need to
-	 * do it here.  Other infrastructure of the parent RTE has to be
-	 * translated to match the child table's column ordering, which we do
-	 * below, so a "flat" copy is sufficient to start with.
+	 * relkind, and inh for the child.  We also replace notnullattnums for the
+	 * child if its relation OID is different from the parent's.  Set the
+	 * child's securityQuals to empty, because we only want to apply the
+	 * parent's RLS conditions regardless of what RLS properties individual
+	 * children may have. (This is an intentional choice to make inherited RLS
+	 * work like regular permissions checks.) The parent securityQuals will be
+	 * propagated to children along with other base restriction clauses, so we
+	 * don't need to do it here.  Other infrastructure of the parent RTE has
+	 * to be translated to match the child table's column ordering, which we
+	 * do below, so a "flat" copy is sufficient to start with.
 	 */
 	childrte = makeNode(RangeTblEntry);
 	memcpy(childrte, parentrte, sizeof(RangeTblEntry));
@@ -507,6 +507,10 @@ expand_single_inheritance_child(PlannerInfo *root, RangeTblEntry *parentrte,
 	/* No permission checking for child RTEs. */
 	childrte->perminfoindex = 0;
 
+	/* Record NOT NULL columns for the child if needed. */
+	if (childOID != parentOID)
+		get_relation_notnullatts(childrel, childrte);
+
 	/* Link not-yet-fully-filled child RTE into data structures */
 	parse->rtable = lappend(parse->rtable, childrte);
 	childRTindex = list_length(parse->rtable);
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 0489ad36644..956341c14d1 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -162,36 +162,6 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 	rel->attr_widths = (int32 *)
 		palloc0((rel->max_attr - rel->min_attr + 1) * sizeof(int32));
 
-	/*
-	 * Record which columns are defined as NOT NULL.  We leave this
-	 * unpopulated for non-partitioned inheritance parent relations as it's
-	 * ambiguous as to what it means.  Some child tables may have a NOT NULL
-	 * constraint for a column while others may not.  We could work harder and
-	 * build a unioned set of all child relations notnullattnums, but there's
-	 * currently no need.  The RelOptInfo corresponding to the !inh
-	 * RangeTblEntry does get populated.
-	 */
-	if (!inhparent || relation->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
-	{
-		for (int i = 0; i < relation->rd_att->natts; i++)
-		{
-			CompactAttribute *attr = TupleDescCompactAttr(relation->rd_att, i);
-
-			if (attr->attnotnull)
-			{
-				rel->notnullattnums = bms_add_member(rel->notnullattnums,
-													 i + 1);
-
-				/*
-				 * Per RemoveAttributeById(), dropped columns will have their
-				 * attnotnull unset, so we needn't check for dropped columns
-				 * in the above condition.
-				 */
-				Assert(!attr->attisdropped);
-			}
-		}
-	}
-
 	/*
 	 * Estimate relation size --- unless it's an inheritance parent, in which
 	 * case the size we want is not the rel's own size but the size of its
@@ -681,6 +651,39 @@ get_relation_foreign_keys(PlannerInfo *root, RelOptInfo *rel,
 	}
 }
 
+/*
+ * get_relation_notnullatts
+ *		Record which columns of the given relation are defined as NOT NULL.
+ */
+void
+get_relation_notnullatts(Relation relation, RangeTblEntry *rte)
+{
+	Assert(rte->rtekind == RTE_RELATION);
+
+	rte->notnullattnums = NULL;
+
+	if (relation->rd_att->constr && relation->rd_att->constr->has_not_null)
+	{
+		for (int i = 0; i < relation->rd_att->natts; i++)
+		{
+			CompactAttribute *attr = TupleDescCompactAttr(relation->rd_att, i);
+
+			if (attr->attnotnull)
+			{
+				rte->notnullattnums = bms_add_member(rte->notnullattnums,
+													 i + 1);
+
+				/*
+				 * Per RemoveAttributeById(), dropped columns will have their
+				 * attnotnull unset, so we needn't check for dropped columns
+				 * in the above condition.
+				 */
+				Assert(!attr->attisdropped);
+			}
+		}
+	}
+}
+
 /*
  * infer_arbiter_indexes -
  *	  Determine the unique indexes used to arbitrate speculative insertion.
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index ff507331a06..6db5b1273ab 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -222,7 +222,6 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
 	rel->relid = relid;
 	rel->rtekind = rte->rtekind;
 	/* min_attr, max_attr, attr_needed, attr_widths are set below */
-	rel->notnullattnums = NULL;
 	rel->lateral_vars = NIL;
 	rel->indexlist = NIL;
 	rel->statlist = NIL;
@@ -727,7 +726,6 @@ build_join_rel(PlannerInfo *root,
 	joinrel->max_attr = 0;
 	joinrel->attr_needed = NULL;
 	joinrel->attr_widths = NULL;
-	joinrel->notnullattnums = NULL;
 	joinrel->nulling_relids = NULL;
 	joinrel->lateral_vars = NIL;
 	joinrel->lateral_referencers = NULL;
@@ -916,7 +914,6 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
 	joinrel->max_attr = 0;
 	joinrel->attr_needed = NULL;
 	joinrel->attr_widths = NULL;
-	joinrel->notnullattnums = NULL;
 	joinrel->nulling_relids = NULL;
 	joinrel->lateral_vars = NIL;
 	joinrel->lateral_referencers = NULL;
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 23c9e3c5abf..0ba80b792ef 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -1080,6 +1080,9 @@ typedef struct RangeTblEntry
 	 * this RTE in the containing struct's list of same; 0 if permissions need
 	 * not be checked for this RTE.
 	 *
+	 * notnullattnums is zero-based set containing attnums of NOT NULL
+	 * columns.
+	 *
 	 * As a special case, relid, relkind, rellockmode, and perminfoindex can
 	 * also be set (nonzero) in an RTE_SUBQUERY RTE.  This occurs when we
 	 * convert an RTE_RELATION RTE naming a view into an RTE_SUBQUERY
@@ -1105,6 +1108,8 @@ typedef struct RangeTblEntry
 	Index		perminfoindex pg_node_attr(query_jumble_ignore);
 	/* sampling info, or NULL */
 	struct TableSampleClause *tablesample;
+	/* columns defined as NOT NULL */
+	Bitmapset  *notnullattnums;
 
 	/*
 	 * Fields valid for a subquery RTE (else NULL):
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index c24a1fc8514..11334d5bc1b 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -955,12 +955,6 @@ typedef struct RelOptInfo
 	Relids	   *attr_needed pg_node_attr(read_write_ignore);
 	/* array indexed [min_attr .. max_attr] */
 	int32	   *attr_widths pg_node_attr(read_write_ignore);
-
-	/*
-	 * Zero-based set containing attnums of NOT NULL columns.  Not populated
-	 * for rels corresponding to non-partitioned inh==true RTEs.
-	 */
-	Bitmapset  *notnullattnums;
 	/* relids of outer joins that can null this baserel */
 	Relids		nulling_relids;
 	/* LATERAL Vars and PHVs referenced by rel */
diff --git a/src/include/optimizer/optimizer.h b/src/include/optimizer/optimizer.h
index 78e05d88c8e..748556c9163 100644
--- a/src/include/optimizer/optimizer.h
+++ b/src/include/optimizer/optimizer.h
@@ -154,6 +154,8 @@ extern Node *estimate_expression_value(PlannerInfo *root, Node *node);
 extern Expr *evaluate_expr(Expr *expr, Oid result_type, int32 result_typmod,
 						   Oid result_collation);
 
+extern bool var_is_nonnullable(PlannerInfo *root, Var *var);
+
 extern List *expand_function_arguments(List *args, bool include_out_arguments,
 									   Oid result_type,
 									   struct HeapTupleData *func_tuple);
diff --git a/src/include/optimizer/plancat.h b/src/include/optimizer/plancat.h
index cd74e4b1e8b..c3c818ec116 100644
--- a/src/include/optimizer/plancat.h
+++ b/src/include/optimizer/plancat.h
@@ -28,6 +28,8 @@ extern PGDLLIMPORT get_relation_info_hook_type get_relation_info_hook;
 extern void get_relation_info(PlannerInfo *root, Oid relationObjectId,
 							  bool inhparent, RelOptInfo *rel);
 
+extern void get_relation_notnullatts(Relation relation, RangeTblEntry *rte);
+
 extern List *infer_arbiter_indexes(PlannerInfo *root);
 
 extern void estimate_rel_size(Relation rel, int32 *attr_widths,
diff --git a/src/include/optimizer/prep.h b/src/include/optimizer/prep.h
index df56202777c..50bb9830260 100644
--- a/src/include/optimizer/prep.h
+++ b/src/include/optimizer/prep.h
@@ -23,6 +23,7 @@
  */
 extern void transform_MERGE_to_join(Query *parse);
 extern void replace_empty_jointree(Query *parse);
+extern void collect_relation_attrs(Query *parse);
 extern void pull_up_sublinks(PlannerInfo *root);
 extern void preprocess_function_rtes(PlannerInfo *root);
 extern Query *expand_virtual_generated_columns(PlannerInfo *root);
diff --git a/src/test/regress/expected/generated_virtual.out b/src/test/regress/expected/generated_virtual.out
index dc09c85938e..a93f25a417e 100644
--- a/src/test/regress/expected/generated_virtual.out
+++ b/src/test/regress/expected/generated_virtual.out
@@ -1472,11 +1472,11 @@ where coalesce(t2.b, 1) = 2;
 explain (costs off)
 select t1.a from gtest32 t1 left join gtest32 t2 on t1.a = t2.a
 where coalesce(t2.b, 1) = 2 or t1.a is null;
-                         QUERY PLAN                          
--------------------------------------------------------------
+               QUERY PLAN                
+-----------------------------------------
  Hash Left Join
    Hash Cond: (t1.a = t2.a)
-   Filter: ((COALESCE((t2.a * 2), 1) = 2) OR (t1.a IS NULL))
+   Filter: (COALESCE((t2.a * 2), 1) = 2)
    ->  Seq Scan on gtest32 t1
    ->  Hash
          ->  Seq Scan on gtest32 t2
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index a57bb18c24f..69877e310f6 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -3639,8 +3639,8 @@ from nt3 as nt3
     ) as ss2
     on ss2.id = nt3.nt2_id
 where nt3.id = 1 and ss2.b3;
-                  QUERY PLAN                   
------------------------------------------------
+                  QUERY PLAN                  
+----------------------------------------------
  Nested Loop
    ->  Nested Loop
          ->  Index Scan using nt3_pkey on nt3
@@ -3649,7 +3649,7 @@ where nt3.id = 1 and ss2.b3;
                Index Cond: (id = nt3.nt2_id)
    ->  Index Only Scan using nt1_pkey on nt1
          Index Cond: (id = nt2.nt1_id)
-         Filter: (nt2.b1 AND (id IS NOT NULL))
+         Filter: (nt2.b1 AND true)
 (9 rows)
 
 select nt3.id
diff --git a/src/test/regress/expected/predicate.out b/src/test/regress/expected/predicate.out
index b79037748b7..e025c05261d 100644
--- a/src/test/regress/expected/predicate.out
+++ b/src/test/regress/expected/predicate.out
@@ -84,10 +84,10 @@ SELECT * FROM pred_tab t WHERE t.a IS NULL OR t.c IS NULL;
 -- are provably false
 EXPLAIN (COSTS OFF)
 SELECT * FROM pred_tab t WHERE t.b IS NULL OR t.c IS NULL;
-               QUERY PLAN               
-----------------------------------------
+       QUERY PLAN       
+------------------------
  Seq Scan on pred_tab t
-   Filter: ((b IS NULL) OR (c IS NULL))
+   Filter: (b IS NULL)
 (2 rows)
 
 --
-- 
2.43.0
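
For readers skimming the patch, the decision made by var_is_nonnullable() and the new NullTest branch in eval_const_expressions_mutator() can be modeled roughly as below. This is only a toy sketch: ToyVar, ToyRte, FoldResult and the 64-bit mask are hypothetical stand-ins for Var, RangeTblEntry and Bitmapset, and none of it is PostgreSQL code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a Var node. */
typedef struct ToyVar
{
    int      levelsup;     /* 0 means the Var belongs to the current query level */
    uint64_t nullingrels;  /* nonzero if an outer join or grouping set can null it */
    int      attno;        /* < 0 system column, 0 whole-row, > 0 user column */
} ToyVar;

/* Hypothetical stand-in for a RangeTblEntry of kind RTE_RELATION. */
typedef struct ToyRte
{
    bool     inh;          /* inheritance parent? */
    bool     partitioned;  /* partitioned table? */
    uint64_t notnullatts;  /* bit N set means column N is defined NOT NULL */
} ToyRte;

typedef enum { FOLD_FALSE, FOLD_TRUE, FOLD_KEEP } FoldResult;

/* Mirrors the order of checks in the patch's var_is_nonnullable(). */
static bool
toy_var_is_nonnullable(const ToyVar *var, const ToyRte *rte)
{
    assert(var != NULL && rte != NULL);

    if (var->levelsup != 0)
        return false;           /* skip upper-level Vars */
    if (var->nullingrels != 0)
        return false;           /* may be nulled above the scan */
    if (var->attno < 0)
        return true;            /* system columns cannot be NULL */
    if (rte->inh && !rte->partitioned)
        return false;           /* children may disagree about NOT NULL */
    if (var->attno > 0 && var->attno < 64)
        return ((rte->notnullatts >> var->attno) & 1) != 0;
    return false;               /* whole-row Var or out of toy range */
}

/* Fold "var IS [NOT] NULL" to a constant when that is provably safe. */
static FoldResult
toy_fold_nulltest(const ToyVar *var, const ToyRte *rte, bool is_not_null_test)
{
    if (!toy_var_is_nonnullable(var, rte))
        return FOLD_KEEP;       /* leave the NullTest in place */
    return is_not_null_test ? FOLD_TRUE : FOLD_FALSE;
}
```

With column 2 marked NOT NULL, a test on column 2 folds to a constant, while a test on an unmarked column, or on a Var with a nonempty nullingrels, is kept as-is.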

Tender Wang

tndrwang@gmail.com

10 months ago

In reply to: Richard Guo (#8)

Re: Reduce "Var IS [NOT] NULL" quals during constant folding

Richard Guo <guofenglinux@gmail.com> wrote on Wed, Mar 26, 2025 at 10:16:

On Sun, Mar 23, 2025 at 6:25 PM Richard Guo <guofenglinux@gmail.com>
wrote:

On Sat, Mar 22, 2025 at 2:21 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

The way to make this work is what I said before: move the planner's
collection of relation information to somewhere a bit earlier in
the planner. But not to outside the planner.

I'm considering moving the collection of attnotnull information before
pull_up_sublinks, in hopes of leveraging this info to pull up NOT IN
in the future, something like attached.

Here is an updated version of the patch with some cosmetic changes and
a more readable commit message. I'm wondering if it's good enough to
be pushed. Any comments?

The comment about notnullattnums in struct RangeTblEntry says that:
* notnullattnums is zero-based set containing attnums of NOT NULL
* columns.

But in get_relation_notnullatts():
rte->notnullattnums = bms_add_member(rte->notnullattnums,
i + 1);

The notnullattnums seem to be 1-based.

--
Thanks,
Tender Wang

#10

Richard Guo

guofenglinux@gmail.com

10 months ago

In reply to: Tender Wang (#9)

Re: Reduce "Var IS [NOT] NULL" quals during constant folding

On Wed, Mar 26, 2025 at 3:06 PM Tender Wang <tndrwang@gmail.com> wrote:

The comment about notnullattnums in struct RangeTblEntry says that:
* notnullattnums is zero-based set containing attnums of NOT NULL
* columns.

But in get_relation_notnullatts():
rte->notnullattnums = bms_add_member(rte->notnullattnums,
i + 1);

The notnullattnums seem to be 1-based.

This corresponds to the attribute numbers in Var nodes; you can
consider zero as representing a whole-row Var.

Thanks
Richard

#11

David Rowley

dgrowleyml@gmail.com

10 months ago

In reply to: Richard Guo (#10)

Re: Reduce "Var IS [NOT] NULL" quals during constant folding

On Wed, 26 Mar 2025 at 19:31, Richard Guo <guofenglinux@gmail.com> wrote:

On Wed, Mar 26, 2025 at 3:06 PM Tender Wang <tndrwang@gmail.com> wrote:

The comment about notnullattnums in struct RangeTblEntry says that:
* notnullattnums is zero-based set containing attnums of NOT NULL
* columns.

But in get_relation_notnullatts():
rte->notnullattnums = bms_add_member(rte->notnullattnums,
i + 1);

The notnullattnums seem to be 1-based.

This corresponds to the attribute numbers in Var nodes; you can
consider zero as representing a whole-row Var.

Yeah, and a negative number is a system attribute, which the Bitmapset
can't represent... The zero-based comment is meant to inform the
reader that they don't need to offset by
FirstLowInvalidHeapAttributeNumber when indexing the Bitmapset. If
there's some confusion about that then maybe the wording could be
improved. I used "zero-based" because I wanted to state what it was
and that was the most brief terminology that I could think of to do
that. The only other way I thought about was to say that "it's not
offset by FirstLowInvalidHeapAttributeNumber", but I thought it was
better to say what it is rather than what it isn't.

I'm open to suggestions if people are confused about this.

David
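
To make the two numbering conventions discussed above concrete, here is a small sketch. It uses a plain 64-bit mask as a hypothetical stand-in for Bitmapset, and TOY_FIRST_LOW_INVALID_ATTNO is an illustrative constant assumed for this example; the real FirstLowInvalidHeapAttributeNumber lives in PostgreSQL's access/sysattr.h and should be taken from there.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative value only; assumed for this sketch. */
#define TOY_FIRST_LOW_INVALID_ATTNO (-7)

typedef uint64_t ToySet;    /* hypothetical stand-in for a Bitmapset */

/*
 * "Zero-based" convention, as used for notnullattnums: the attribute
 * number itself is the member.  User columns start at 1, so bit 0 is
 * never set, and negative (system) attnums cannot be stored.
 */
static ToySet
direct_add(ToySet set, int attno)
{
    assert(attno >= 0 && attno < 64);
    return set | ((ToySet) 1 << attno);
}

static bool
direct_member(ToySet set, int attno)
{
    if (attno < 0 || attno >= 64)
        return false;
    return ((set >> attno) & 1) != 0;
}

/*
 * Offset convention: members are shifted so that system attributes,
 * which have negative attnums, can be represented as well.
 */
static ToySet
offset_add(ToySet set, int attno)
{
    int bit = attno - TOY_FIRST_LOW_INVALID_ATTNO;

    assert(bit > 0 && bit < 64);
    return set | ((ToySet) 1 << bit);
}

static bool
offset_member(ToySet set, int attno)
{
    int bit = attno - TOY_FIRST_LOW_INVALID_ATTNO;

    if (bit <= 0 || bit >= 64)
        return false;
    return ((set >> bit) & 1) != 0;
}
```

Under the first convention a lookup is simply "is var->varattno a member", which matches the `i + 1` in get_relation_notnullatts(); under the second, every caller has to remember to subtract the offset.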

#12

Richard Guo

guofenglinux@gmail.com

10 months ago

In reply to: David Rowley (#11)

Re: Reduce "Var IS [NOT] NULL" quals during constant folding

On Wed, Mar 26, 2025 at 6:45 PM David Rowley <dgrowleyml@gmail.com> wrote:

On Wed, 26 Mar 2025 at 19:31, Richard Guo <guofenglinux@gmail.com> wrote:

On Wed, Mar 26, 2025 at 3:06 PM Tender Wang <tndrwang@gmail.com> wrote:

The comment about notnullattnums in struct RangeTblEntry says that:
* notnullattnums is zero-based set containing attnums of NOT NULL
* columns.

But in get_relation_notnullatts():
rte->notnullattnums = bms_add_member(rte->notnullattnums,
i + 1);

The notnullattnums seem to be 1-based.

This corresponds to the attribute numbers in Var nodes; you can
consider zero as representing a whole-row Var.

Yeah, and a negative number is a system attribute, which the Bitmapset
can't represent... The zero-based comment is meant to inform the
reader that they don't need to offset by
FirstLowInvalidHeapAttributeNumber when indexing the Bitmapset. If
there's some confusion about that then maybe the wording could be
improved. I used "zero-based" because I wanted to state what it was
and that was the most brief terminology that I could think of to do
that. The only other way I thought about was to say that "it's not
offset by FirstLowInvalidHeapAttributeNumber", but I thought it was
better to say what it is rather than what it isn't.

I'm open to suggestions if people are confused about this.

I searched the current terminology used in code and can find "offset
by FirstLowInvalidHeapAttributeNumber", but not "not offset by
FirstLowInvalidHeapAttributeNumber". I think "zero-based" should be
sufficient to indicate that this bitmapset is offset by zero, not by
FirstLowInvalidHeapAttributeNumber. So I'm fine to go with
"zero-based".

I'm planning to push this patch soon, barring any objections.

Thanks
Richard

#13

Tom Lane

tgl@sss.pgh.pa.us

10 months ago

In reply to: Richard Guo (#12)

Re: Reduce "Var IS [NOT] NULL" quals during constant folding

Richard Guo <guofenglinux@gmail.com> writes:

I'm planning to push this patch soon, barring any objections.

FWIW, I have not reviewed it at all.

regards, tom lane

#14

Richard Guo

guofenglinux@gmail.com

10 months ago

In reply to: Tom Lane (#13)

Re: Reduce "Var IS [NOT] NULL" quals during constant folding

On Thu, Mar 27, 2025 at 10:53 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Richard Guo <guofenglinux@gmail.com> writes:

I'm planning to push this patch soon, barring any objections.

FWIW, I have not reviewed it at all.

Oh, sorry. I'll hold off on pushing it.

Thanks
Richard

#15

Robert Haas

robertmhaas@gmail.com

10 months ago

In reply to: Richard Guo (#14)

Re: Reduce "Var IS [NOT] NULL" quals during constant folding

On Thu, Mar 27, 2025 at 10:08 AM Richard Guo <guofenglinux@gmail.com> wrote:

On Thu, Mar 27, 2025 at 10:53 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Richard Guo <guofenglinux@gmail.com> writes:

I'm planning to push this patch soon, barring any objections.

FWIW, I have not reviewed it at all.

Oh, sorry. I'll hold off on pushing it.

As a general point, non-trivial patches should really get some
substantive review before they are pushed. Please don't be in a rush
to commit. It is very common for the time from when a patch is first
posted to commit to be 3-6 months even for committers. Posting a
brand-new feature patch on March 21st and then pressing to commit on
March 27th is really not something you should be doing. I think it's
particularly inappropriate here where you actually got a review that
pointed out a serious design problem and then had to change the
design. If you didn't get it right on the first try, you shouldn't be
too confident that you did it perfectly the second time, either.

I took a look at this today and I'm not entirely comfortable with this:

+ rel = table_open(rte->relid, NoLock);
+
+ /* Record NOT NULL columns for this relation. */
+ get_relation_notnullatts(rel, rte);
+
+ table_close(rel, NoLock);

As a general principle, I have found that it's usually a sign that
something has been designed poorly when you find yourself wanting to
open a relation, get exactly one piece of information, and close the
relation again. That is why, today, all the information that the
planner needs about a particular relation is retrieved by
get_relation_info(). We do not just wander around doing random catalog
lookups wherever we need some critical detail. This patch increases
the number of places where we fetch relation data from 1 to 2, but
it's still the case that almost everything happens in
get_relation_info(), and there's now just exactly this 1 thing that is
done in a different place. That doesn't seem especially nice. I
thought the idea was going to be to move get_relation_info() to an
earlier stage, not split one thing out of it while leaving everything
else the same.

--
Robert Haas
EDB: http://www.enterprisedb.com

#16

Richard Guo

guofenglinux@gmail.com

10 months ago

In reply to: Robert Haas (#15)

Re: Reduce "Var IS [NOT] NULL" quals during constant folding

On Tue, Apr 1, 2025 at 1:55 AM Robert Haas <robertmhaas@gmail.com> wrote:

As a general principle, I have found that it's usually a sign that
something has been designed poorly when you find yourself wanting to
open a relation, get exactly one piece of information, and close the
relation again. That is why, today, all the information that the
planner needs about a particular relation is retrieved by
get_relation_info(). We do not just wander around doing random catalog
lookups wherever we need some critical detail. This patch increases
the number of places where we fetch relation data from 1 to 2, but
it's still the case that almost everything happens in
get_relation_info(), and there's now just exactly this 1 thing that is
done in a different place. That doesn't seem especially nice. I
thought the idea was going to be to move get_relation_info() to an
earlier stage, not split one thing out of it while leaving everything
else the same.

I initially considered moving get_relation_info() to an earlier stage,
where we would collect all the per-relation data by relation OID.
Later, when building the RelOptInfos, we could link them to the
per-relation-OID data.

However, I gave up this idea because I realized it would require
retrieving a whole bundle of catalog information that isn't needed
until after the RelOptInfos are built, such as max_attr, pages,
tuples, reltablespace, parallel_workers, extended statistics, etc.
And we may also need to create the IndexOptInfos for the relation's
indexes. It seems to me that it's not a trivial task to move
get_relation_info() before building the RelOptInfos, and more
importantly, it's unnecessary most of the time.

In other words, of the many pieces of catalog information we need to
retrieve for a relation, only a small portion is needed at an early
stage. As far as I can see, this small portion includes:

* relhassubclass, which we retrieve immediately after we have finished
adding rangetable entries.

* attgenerated, which we retrieve in expand_virtual_generated_columns.

* attnotnull, as discussed here, which should ideally be retrieved
before pull_up_sublinks.

My idea is to retrieve only this small portion at an early stage, and
still defer collecting the majority of the catalog information until
building the RelOptInfos. It might be possible to optimize by
retrieving this small portion in one place instead of three, but
moving the entire get_relation_info() to an earlier stage doesn't seem
like a good idea to me.
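
A rough sketch of what "retrieving this small portion in one place" might look like, purely for illustration. ToyAttr, ToyCatalogRel, ToyEarlyInfo and toy_collect_early_info() are hypothetical names invented here, and the 64-bit masks stand in for Bitmapsets; this is not a proposal for the actual data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-column catalog facts. */
typedef struct ToyAttr
{
    bool notnull;             /* models attnotnull */
    bool generated_virtual;   /* models a virtual generated column */
} ToyAttr;

/* Hypothetical view of an already-opened relation. */
typedef struct ToyCatalogRel
{
    bool           has_subclass;  /* models relhassubclass */
    int            natts;
    const ToyAttr *atts;
} ToyCatalogRel;

/* The small portion needed before expression preprocessing. */
typedef struct ToyEarlyInfo
{
    bool     has_subclass;
    uint64_t generated_atts;  /* bit N set means column N is generated */
    uint64_t notnull_atts;    /* bit N set means column N is NOT NULL */
} ToyEarlyInfo;

/*
 * Gather all three early facts in a single pass over the relation, while
 * sizes, indexes, statistics and so on stay deferred until the
 * RelOptInfo is built.
 */
static ToyEarlyInfo
toy_collect_early_info(const ToyCatalogRel *rel)
{
    ToyEarlyInfo info = { rel->has_subclass, 0, 0 };

    assert(rel->natts >= 0 && rel->natts < 64);

    for (int i = 0; i < rel->natts; i++)
    {
        int attno = i + 1;    /* attribute numbers start at 1 */

        if (rel->atts[i].generated_virtual)
            info.generated_atts |= (uint64_t) 1 << attno;
        if (rel->atts[i].notnull)
            info.notnull_atts |= (uint64_t) 1 << attno;
    }
    return info;
}
```

The point of the sketch is only the shape of the split: one cheap pass for the early facts, with everything else left where it is today.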

It could be argued that the separation of catalog information
collection isn't very great, but it seems this isn't something new in
this patch. So I respectfully disagree with your statement that "all
the information that the planner needs about a particular relation is
retrieved by get_relation_info()", and that "there's now just exactly
this 1 thing that is done in a different place". For instance,
relhassubclass and attgenerated are retrieved before expression
preprocessing, a relation's constraint expressions are retrieved when
setting the relation's size estimates, and more.

Thanks
Richard

#17

Robert Haas

robertmhaas@gmail.com

10 months ago

In reply to: Richard Guo (#16)

Re: Reduce "Var IS [NOT] NULL" quals during constant folding

On Tue, Apr 1, 2025 at 2:34 AM Richard Guo <guofenglinux@gmail.com> wrote:

However, I gave up this idea because I realized it would require
retrieving a whole bundle of catalog information that isn't needed
until after the RelOptInfos are built, such as max_attr, pages,
tuples, reltablespace, parallel_workers, extended statistics, etc.

Why is that bad? I mean, if we're going to need that information
anyway, then gathering it at an earlier stage doesn't hurt. Of course, if
we move it too early, say before partition pruning, then we might
gather information we don't really need and hurt performance. But
otherwise it doesn't seem to hurt anything.

And we may also need to create the IndexOptInfos for the relation's
indexes. It seems to me that it's not a trivial task to move
get_relation_info() before building the RelOptInfos, and more
importantly, it's unnecessary most of the time.

But again, if the work is going to have to be done anyway, who cares?

It could be argued that the separation of catalog information
collection isn't very great, but it seems this isn't something new in
this patch. So I respectfully disagree with your statement that "all
the information that the planner needs about a particular relation is
retrieved by get_relation_info()", and that "there's now just exactly
this 1 thing that is done in a different place". For instance,
relhassubclass and attgenerated are retrieved before expression
preprocessing, a relation's constraint expressions are retrieved when
setting the relation's size estimates, and more.

Nonetheless I think we ought to be trying to consolidate more things
into get_relation_info(), not disperse some of the things that are
there to other places.

--
Robert Haas
EDB: http://www.enterprisedb.com

#18

Richard Guo

guofenglinux@gmail.com

10 months ago

In reply to: Robert Haas (#17)

Re: Reduce "Var IS [NOT] NULL" quals during constant folding

On Wed, Apr 2, 2025 at 4:34 AM Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, Apr 1, 2025 at 2:34 AM Richard Guo <guofenglinux@gmail.com> wrote:

However, I gave up this idea because I realized it would require
retrieving a whole bundle of catalog information that isn't needed
until after the RelOptInfos are built, such as max_attr, pages,
tuples, reltablespace, parallel_workers, extended statistics, etc.

Why is that bad? I mean, if we're going to need that information
anyway, then gathering it at an earlier stage doesn't hurt. Of course, if
we move it too early, say before partition pruning, then we might
gather information we don't really need and hurt performance. But
otherwise it doesn't seem to hurt anything.

The attnotnull catalog information being discussed here is intended
for use during constant folding (and possibly sublink pull-up), which
occurs long before partition pruning. Am I missing something?

Additionally, I'm doubtful that the collection of relhassubclass can
be moved after partition pruning. How can we determine whether a
relation is inheritable without retrieving its relhassubclass
information?

As for attgenerated, I also doubt that it can be retrieved after
partition pruning. It is used to expand virtual generated columns,
which can result in new PlaceHolderVars. This means it must be done
before deconstruct_jointree, as make_outerjoininfo requires all active
PlaceHolderVars to be present in root->placeholder_list.

If these pieces of information cannot be retrieved after partition
pruning, and for performance reasons we don't want to move the
gathering of the majority of the catalog information before partition
pruning, then it seems to me that moving get_relation_info() to an
earlier stage might not be very meaningful. What do you think?

It could be argued that the separation of catalog information
collection isn't very great, but it seems this isn't something new in
this patch. So I respectfully disagree with your statement that "all
the information that the planner needs about a particular relation is
retrieved by get_relation_info()", and that "there's now just exactly
this 1 thing that is done in a different place". For instance,
relhassubclass and attgenerated are retrieved before expression
preprocessing, a relation's constraint expressions are retrieved when
setting the relation's size estimates, and more.

Nonetheless I think we ought to be trying to consolidate more things
into get_relation_info(), not disperse some of the things that are
there to other places.

I agree with the general idea of more consolidation and less
dispersion, but squashing all information collection into
get_relation_info() seems quite challenging. However, I think we can
make at least one optimization, as I mentioned upthread — retrieving
the small portion of catalog information needed at an early stage in
one place instead of three. Perhaps we could start with moving the
retrieval of relhassubclass into collect_relation_attrs() and rename
this function to better reflect this change.

Thanks
Richard

#19

Robert Haas

robertmhaas@gmail.com

9 months ago

In reply to: Richard Guo (#18)

Re: Reduce "Var IS [NOT] NULL" quals during constant folding

On Tue, Apr 1, 2025 at 10:14 PM Richard Guo <guofenglinux@gmail.com> wrote:

The attnotnull catalog information being discussed here is intended
for use during constant folding (and possibly sublink pull-up), which
occurs long before partition pruning. Am I missing something?

Hmm, OK, so you think that we need to gather this information early,
so that we can do constant folding correctly, but you don't want to
gather everything that get_relation_info() does at this stage, because
then we're doing extra work on partitions that might later be pruned.
Is that correct?

Additionally, I'm doubtful that the collection of relhassubclass can
be moved after partition pruning. How can we determine whether a
relation is inheritable without retrieving its relhassubclass
information?

We can't -- but notice that we open the relation before fetching
relhassubclass, and then pass down the Relation object to where
get_relation_info() is ultimately called, so that we do not repeatedly
open and close the Relation. I don't know if I can say exactly what's
going to go wrong if we add an extra table_open()/table_close() as you
do in the patch, but I've seen enough performance and correctness
problems with such code over the years to make me skeptical.

--
Robert Haas
EDB: http://www.enterprisedb.com

#20

Richard Guo

guofenglinux@gmail.com

9 months ago

In reply to: Robert Haas (#19)

Re: Reduce "Var IS [NOT] NULL" quals during constant folding

On Sat, Apr 5, 2025 at 4:14 AM Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, Apr 1, 2025 at 10:14 PM Richard Guo <guofenglinux@gmail.com> wrote:

The attnotnull catalog information being discussed here is intended
for use during constant folding (and possibly sublink pull-up), which
occurs long before partition pruning. Am I missing something?

Hmm, OK, so you think that we need to gather this information early,
so that we can do constant folding correctly, but you don't want to
gather everything that get_relation_info() does at this stage, because
then we're doing extra work on partitions that might later be pruned.
Is that correct?

That's correct. As I mentioned earlier, I believe attnotnull isn't
the only piece of information we need to gather early on. My general
idea is to separate the collection of catalog information into two
phases:

* Phase 1 occurs at an early stage and collects the small portion of
catalog information that is needed for constant folding, setting the
inh flag for a relation, or expanding virtual generated columns. All
these happen very early in the planner, before partition pruning.

* Phase 2 collects the majority of the catalog information and occurs
when building the RelOptInfos, like what get_relation_info does.
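The two-phase split above can be sketched as a toy model. This is only an illustration under stated assumptions: ToyRelState, toy_collect_early, toy_collect_late and the bitmask layout are hypothetical names invented here, not PostgreSQL structures or functions. The point it shows is that phase 1 runs for every relation in the rangetable, while phase 2 is skipped for relations that were pruned in between.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-relation record; not a PostgreSQL struct. */
typedef struct ToyRelState
{
    uint32_t notnull_attrs;  /* phase 1 result: bit (attno - 1) set => NOT NULL */
    bool     early_done;     /* phase 1 ran for this relation */
    bool     pruned;         /* set by a later step, e.g. partition pruning */
    bool     late_done;      /* phase 2 ran for this relation */
} ToyRelState;

/* Phase 1: cheap facts, gathered for every relation in the rangetable. */
void
toy_collect_early(ToyRelState *rels, const uint32_t *catalog_notnull, size_t n)
{
    for (size_t i = 0; i < n; i++)
    {
        rels[i].notnull_attrs = catalog_notnull[i];
        rels[i].early_done = true;
    }
}

/* Phase 2: bulk info, only for survivors.  Returns how many were processed. */
size_t
toy_collect_late(ToyRelState *rels, size_t n)
{
    size_t processed = 0;

    for (size_t i = 0; i < n; i++)
    {
        assert(rels[i].early_done);  /* phase 1 must already have run */
        if (rels[i].pruned)
            continue;                /* no bulk catalog work for pruned rels */
        rels[i].late_done = true;
        processed++;
    }
    return processed;
}

/* Usage: three relations, the middle one pruned between the two phases. */
size_t
toy_demo_late_count(void)
{
    ToyRelState    rels[3] = {{0}};
    const uint32_t catalog_notnull[3] = {0x1, 0x3, 0x0};

    toy_collect_early(rels, catalog_notnull, 3);
    rels[1].pruned = true;
    return toy_collect_late(rels, 3);
}
```

In this model the expensive work scales with the number of surviving relations, which is the reason given above for not moving all of get_relation_info to the early stage.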

FWIW, aside from partition pruning, I suspect there may be other cases
where a relation doesn't end up having a RelOptInfo created for it.
And the comment for add_base_rels_to_query further strengthens my
suspicion.

* Note: the reason we find the baserels by searching the jointree, rather
* than scanning the rangetable, is that the rangetable may contain RTEs
* for rels not actively part of the query, for example views. We don't
* want to make RelOptInfos for them.

If my suspicion is correct, then partition pruning isn't the only
reason we might not want to move get_relation_info to an earlier
stage.

Additionally, I'm doubtful that the collection of relhassubclass can
be moved after partition pruning. How can we determine whether a
relation is inheritable without retrieving its relhassubclass
information?

We can't -- but notice that we open the relation before fetching
relhassubclass, and then pass down the Relation object to where
get_relation_info() is ultimately called, so that we do not repeatedly
open and close the Relation. I don't know if I can say exactly what's
going to go wrong if we add an extra table_open()/table_close() as you
do in the patch, but I've seen enough performance and correctness
problems with such code over the years to make me skeptical.

I'm confused here. AFAICS, we don't open the relation before fetching
relhassubclass, according to the code that sets the inh flag in
subquery_planner. Additionally, I do not see that we pass down the
Relation object to get_relation_info. In get_relation_info, we call
table_open to obtain the Relation object, use it to retrieve the
catalog information, and then call table_close to close the Relation.

Am I missing something, or do you mean that the relcache entry is
actually built earlier, and that table_open/table_close call in
get_relation_info merely increments/decrements the reference count?

IIUC, you're concerned about calling table_open/table_close to
retrieve catalog information. Do you know of a better way to retrieve
catalog information without calling table_open/table_close? I find
the table_open/table_close pattern is quite common in the current
code. In addition to get_relation_info(), I've also seen it in
get_relation_constraints(), get_relation_data_width(), and others.

Thanks
Richard

#21

Robert Haas

robertmhaas@gmail.com

9 months ago

In reply to: Richard Guo (#20)

Re: Reduce "Var IS [NOT] NULL" quals during constant folding

On Sun, Apr 6, 2025 at 10:59 PM Richard Guo <guofenglinux@gmail.com> wrote:

That's correct. As I mentioned earlier, I believe attnotnull isn't
the only piece of information we need to gather early on. My general
idea is to separate the collection of catalog information into two
phases:

* Phase 1 occurs at an early stage and collects the small portion of
catalog information that is needed for constant folding, setting the
inh flag for a relation, or expanding virtual generated columns. All
these happen very early in the planner, before partition pruning.

* Phase 2 collects the majority of the catalog information and occurs
when building the RelOptInfos, like what get_relation_info does.

OK. Maybe I shouldn't be worrying about the table_open() /
table_close() here, because I see that you are right that
has_subclass() is nearby, which admittedly does not involve opening
the relation, but it does involve fetching from the syscache a tuple
that we wouldn't need to fetch if we had a Relation available at that
point. And I see now that expand_virtual_generated_columns() is also
in that area and works similar to your proposed function in that it
just opens and closes all the relations. Perhaps that's just the way
we do things at this very early stage of the planner? But I feel like
it might make sense to do some reorganization of this code, though, so
that it more resembles the phase 1 and phase 2 as you describe them.
Both expand_virtual_generated_columns() and collect_relation_attrs()
care about exactly those relations with rte->rtekind == RTE_RELATION,
and even if we have to open and close all of those relations once to
do this processing, perhaps we can avoid doing it twice, and maybe
avoid the has_subclass() call along the way? Maybe we can hoist a loop
over parse->rtable up into subquery_planner and then have it call a
virtual-column expansion function and a non-null collection function
once per RTE_RELATION entry?

A related point that I'm noticing is that you record the not-NULL
information in the RangeTblEntry. I wonder whether that's going to be
a problem, because I think of the RangeTblEntry as a parse-time
structure and the RelOptInfo as a plan-time structure, meaning that we
shouldn't scribble on the former and that we should record any
plan-time details we need in the latter. I understand that the problem
is precisely that the RelOptInfo isn't yet available, but I'm not sure
that makes it OK to use the RangeTblEntry instead.

I'm confused here. AFAICS, we don't open the relation before fetching
relhassubclass, according to the code that sets the inh flag in
subquery_planner. Additionally, I do not see that we pass down the
Relation object to get_relation_info. In get_relation_info, we call
table_open to obtain the Relation object, use it to retrieve the
catalog information, and then call table_close to close the Relation.

You're right. I don't know what I was thinking.

IIUC, you're concerned about calling table_open/table_close to
retrieve catalog information. Do you know of a better way to retrieve
catalog information without calling table_open/table_close? I find
the table_open/table_close pattern is quite common in the current
code. In addition to get_relation_info(), I've also seen it in
get_relation_constraints(), get_relation_data_width(), and others.

You're also right about this.

--
Robert Haas
EDB: http://www.enterprisedb.com

#22

Tom Lane

tgl@sss.pgh.pa.us

9 months ago

In reply to: Robert Haas (#21)

Re: Reduce "Var IS [NOT] NULL" quals during constant folding

Robert Haas <robertmhaas@gmail.com> writes:

OK. Maybe I shouldn't be worrying about the table_open() /
table_close() here, because I see that you are right that
has_subclass() is nearby, which admittedly does not involve opening
the relation, but it does involve fetching from the syscache a tuple
that we wouldn't need to fetch if we had a Relation available at that
point. And I see now that expand_virtual_generated_columns() is also
in that area and works similar to your proposed function in that it
just opens and closes all the relations. Perhaps that's just the way
we do things at this very early stage of the planner? But I feel like
it might make sense to do some reorganization of this code, though, so
that it more resembles the phase 1 and phase 2 as you describe them.

Indeed, I think those are hacks that we should get rid of, not
emulate. Note in particular that expand_virtual_generated_columns
is new in v18 and has exactly zero credibility as precedent. In fact,
I'm probably going to harass Peter about fixing it before v18 ships.
Randomly adding table_opens in the planner is not a route to high
planning performance.

A related point that I'm noticing is that you record the not-NULL
information in the RangeTblEntry.

Did we not just complain about that w.r.t. the v1 version of this
patch? RangeTblEntry is not where to store this info. We need
a new data structure, and IMO the data structure should be a hashtable
based on table OID, not relid. That way we can amortize across
multiple references to the same table within a query.
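A minimal sketch of the idea of keying on table OID rather than rangetable index, assuming a tiny fixed-size open-addressing table. Every name here (ToyNotNullCache, toy_notnull_lookup, toy_load_notnull_attrs) is hypothetical and the "catalog read" is faked; a real implementation would presumably use the backend's own hash table facilities. It shows that a table referenced twice in one query triggers only one catalog load.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NSLOTS 64           /* power of two; toy capacity, assumed ample */

typedef struct ToyNotNullEntry
{
    bool     used;
    uint32_t reloid;            /* hash key: table OID, not rangetable index */
    uint64_t notnull_attrs;     /* bit (attno - 1) set => column is NOT NULL */
} ToyNotNullEntry;

typedef struct ToyNotNullCache
{
    ToyNotNullEntry slots[TOY_NSLOTS];
    int             catalog_loads;  /* how often we "read the catalog" */
} ToyNotNullCache;

/* Hypothetical stand-in for a catalog read; the returned data is made up. */
uint64_t
toy_load_notnull_attrs(uint32_t reloid)
{
    return (reloid % 2 == 0) ? 0x5 : 0x1;
}

/* Find the entry for reloid, loading it on first reference only. */
ToyNotNullEntry *
toy_notnull_lookup(ToyNotNullCache *cache, uint32_t reloid)
{
    size_t idx = (reloid * 2654435761u) & (TOY_NSLOTS - 1);

    for (size_t probes = 0; probes < TOY_NSLOTS; probes++)
    {
        ToyNotNullEntry *e = &cache->slots[idx];

        if (!e->used)
        {
            /* First reference to this table: load once and remember. */
            e->used = true;
            e->reloid = reloid;
            e->notnull_attrs = toy_load_notnull_attrs(reloid);
            cache->catalog_loads++;
            return e;
        }
        if (e->reloid == reloid)
            return e;           /* later reference: no catalog access */
        idx = (idx + 1) & (TOY_NSLOTS - 1);  /* linear probing */
    }
    return NULL;                /* table full; not expected in this sketch */
}

/* Usage: a self-join references OID 16384 twice under different relids. */
int
toy_demo_loads(void)
{
    ToyNotNullCache cache = {0};
    const uint32_t  rtable_oids[3] = {16384, 16390, 16384};

    for (size_t i = 0; i < 3; i++)
    {
        ToyNotNullEntry *e = toy_notnull_lookup(&cache, rtable_oids[i]);

        assert(e != NULL);
    }
    return cache.catalog_loads;
}
```

With a relid-keyed structure the same self-join would need one entry per rangetable reference; keying on OID is what lets the cost be shared.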

regards, tom lane

#23

Robert Haas

robertmhaas@gmail.com

9 months ago

In reply to: Tom Lane (#22)

Re: Reduce "Var IS [NOT] NULL" quals during constant folding

On Thu, Apr 10, 2025 at 3:54 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

A related point that I'm noticing is that you record the not-NULL
information in the RangeTblEntry.

Did we not just complain about that w.r.t. the v1 version of this
patch? RangeTblEntry is not where to store this info. We need
a new data structure, and IMO the data structure should be a hashtable
based on table OID, not relid. That way we can amortize across
multiple references to the same table within a query.

It's not quite the same complaint, because the earlier complaint was
that it was actually being done at parse time, and this complaint is
that it is scribbling on a parse-time data structure.

--
Robert Haas
EDB: http://www.enterprisedb.com

#24

Tom Lane

tgl@sss.pgh.pa.us

9 months ago

In reply to: Robert Haas (#23)

Re: Reduce "Var IS [NOT] NULL" quals during constant folding

Robert Haas <robertmhaas@gmail.com> writes:

It's not quite the same complaint, because the earlier complaint was
that it was actually being done at parse time, and this complaint is
that it is scribbling on a parse-time data structure.

Ah, right. But that's still not the direction we want to be
going in [1].

regards, tom lane

[1]: /messages/by-id/2531459.1743871597@sss.pgh.pa.us

#25

Richard Guo

guofenglinux@gmail.com

9 months ago

In reply to: Robert Haas (#21)

Re: Reduce "Var IS [NOT] NULL" quals during constant folding

On Fri, Apr 11, 2025 at 4:45 AM Robert Haas <robertmhaas@gmail.com> wrote:

OK. Maybe I shouldn't be worrying about the table_open() /
table_close() here, because I see that you are right that
has_subclass() is nearby, which admittedly does not involve opening
the relation, but it does involve fetching from the syscache a tuple
that we wouldn't need to fetch if we had a Relation available at that
point. And I see now that expand_virtual_generated_columns() is also
in that area and works similar to your proposed function in that it
just opens and closes all the relations. Perhaps that's just the way
we do things at this very early stage of the planner? But I feel like
it might make sense to do some reorganization of this code, though, so
that it more resembles the phase 1 and phase 2 as you describe them.
Both expand_virtual_generated_columns() and collect_relation_attrs()
care about exactly those relations with rte->rtekind == RTE_RELATION,
and even if we have to open and close all of those relations once to
do this processing, perhaps we can avoid doing it twice, and maybe
avoid the has_subclass() call along the way?

Yeah, I agree on this. This is the optimization I mentioned upthread
in the last paragraph of [1] - retrieving the small portion of catalog
information needed at an early stage in one place instead of three.
Hopefully, this way we only need one table_open/table_close at the
early stage of the planner.

Maybe we can hoist a loop
over parse->rtable up into subquery_planner and then have it call a
virtual-column expansion function and a non-null collection function
once per RTE_RELATION entry?

Hmm, I'm afraid there might be some difficulty with this approach.
The virtual-column expansion needs to be done after sublink pull-up to
ensure that the virtual-column references within the SubLinks that
should be transformed into joins can get expanded, while non-null
collection needs to be done before sublink pull-up since we might want
to leverage the non-null information to convert NOT IN sublinks to
anti joins.

What I had in mind is that we hoist a loop over parse->rtable before
pull_up_sublinks to gather information about which columns of each
relation are defined as NOT NULL and which are virtually generated.
This information will be used in sublink pull-up and virtual-column
expansion. We also call has_subclass for each relation that is marked
'inh' within that loop and clear the inh flag if needed.
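The single hoisted loop described above might look roughly like the following toy model. All types and functions here (ToyRte, ToyRelation, ToyEarlyInfo, toy_table_open, toy_preprocess_rtes) are hypothetical stand-ins, not the planner's real data structures, and the early information is kept in a side array rather than in the rangetable entry. The sketch opens each plain relation once, records its NOT NULL and virtual generated columns, and clears the inh flag when the relation has no children.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum ToyRteKind { TOY_RTE_RELATION, TOY_RTE_SUBQUERY } ToyRteKind;

/* Hypothetical stand-in for an opened relation's catalog data. */
typedef struct ToyRelation
{
    bool     relhassubclass;
    uint64_t notnull_attrs;     /* bit (attno - 1) set => NOT NULL */
    uint64_t virtual_attrs;     /* bit (attno - 1) set => virtual generated */
} ToyRelation;

typedef struct ToyRte
{
    ToyRteKind         kind;
    const ToyRelation *rel;     /* NULL for non-relation entries */
    bool               inh;
} ToyRte;

/* Early-stage facts, kept outside the rangetable entry. */
typedef struct ToyEarlyInfo
{
    uint64_t notnull_attrs;
    uint64_t virtual_attrs;
} ToyEarlyInfo;

int toy_open_count = 0;         /* counts simulated table opens */

const ToyRelation *
toy_table_open(const ToyRte *rte)
{
    toy_open_count++;
    return rte->rel;
}

/* One pass over the rangetable, one open per plain relation. */
void
toy_preprocess_rtes(ToyRte *rtable, ToyEarlyInfo *info, size_t n)
{
    for (size_t i = 0; i < n; i++)
    {
        const ToyRelation *rel;

        info[i].notnull_attrs = 0;
        info[i].virtual_attrs = 0;
        if (rtable[i].kind != TOY_RTE_RELATION)
            continue;

        rel = toy_table_open(&rtable[i]);
        info[i].notnull_attrs = rel->notnull_attrs;
        info[i].virtual_attrs = rel->virtual_attrs;

        /* With the relation open, no separate has_subclass lookup is needed. */
        if (rtable[i].inh && !rel->relhassubclass)
            rtable[i].inh = false;
    }
}

/* Usage: two plain relations and one subquery entry. */
int
toy_demo_opens(void)
{
    ToyRelation  plain = {false, 0x1, 0x0};
    ToyRelation  parent = {true, 0x3, 0x4};
    ToyRte       rtable[3] = {
        {TOY_RTE_RELATION, &plain, true},
        {TOY_RTE_SUBQUERY, NULL, false},
        {TOY_RTE_RELATION, &parent, true},
    };
    ToyEarlyInfo info[3];

    toy_open_count = 0;
    toy_preprocess_rtes(rtable, info, 3);
    assert(!rtable[0].inh);     /* no children, so inh was cleared */
    assert(rtable[2].inh);      /* has children, so inh is kept */
    assert(info[2].virtual_attrs == 0x4);
    return toy_open_count;
}
```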

This seems to require a fair amount of code changes, so I'd like to
get some feedback on this approach before proceeding with the
implementation.

A related point that I'm noticing is that you record the not-NULL
information in the RangeTblEntry. I wonder whether that's going to be
a problem, because I think of the RangeTblEntry as a parse-time
structure and the RelOptInfo as a plan-time structure, meaning that we
shouldn't scribble on the former and that we should record any
plan-time details we need in the latter. I understand that the problem
is precisely that the RelOptInfo isn't yet available, but I'm not sure
that makes it OK to use the RangeTblEntry instead.

Fair point. We should do our best to refrain from scribbling on the
parsetree in the planner. I initially went with the hashtable
approach as Tom suggested, but later found it quite handy to store the
not-null information in the RangeTblEntry, especially since we do
something similar with rte->inh. However, I've come to realize that
inh may not be a good example to follow after all, so I'll go back to
the hashtable method.

Thank you for pointing that out before I went too far down the wrong
path.

[1]: /messages/by-id/CAMbWs4-DryEm_U-juPn3HwUiwZRXW3jhfX18b_AFgrgihgq4kg@mail.gmail.com

Thanks
Richard

#26

Richard Guo

guofenglinux@gmail.com

9 months ago

In reply to: Richard Guo (#25)

3 attachment(s)

Re: Reduce "Var IS [NOT] NULL" quals during constant folding

On Fri, Apr 11, 2025 at 3:51 PM Richard Guo <guofenglinux@gmail.com> wrote:

On Fri, Apr 11, 2025 at 4:45 AM Robert Haas <robertmhaas@gmail.com> wrote:

OK. Maybe I shouldn't be worrying about the table_open() /
table_close() here, because I see that you are right that
has_subclass() is nearby, which admittedly does not involve opening
the relation, but it does involve fetching from the syscache a tuple
that we wouldn't need to fetch if we had a Relation available at that
point. And I see now that expand_virtual_generated_columns() is also
in that area and works similar to your proposed function in that it
just opens and closes all the relations. Perhaps that's just the way
we do things at this very early stage of the planner? But I feel like
it might make sense to do some reorganization of this code, though, so
that it more resembles the phase 1 and phase 2 as you describe them.
Both expand_virtual_generated_columns() and collect_relation_attrs()
care about exactly those relations with rte->rtekind == RTE_RELATION,
and even if we have to open and close all of those relations once to
do this processing, perhaps we can avoid doing it twice, and maybe
avoid the has_subclass() call along the way?

Yeah, I agree on this. This is the optimization I mentioned upthread
in the last paragraph of [1] - retrieving the small portion of catalog
information needed at an early stage in one place instead of three.
Hopefully, this way we only need one table_open/table_close at the
early stage of the planner.

Here is the patchset that implements this optimization. 0001 moves
the expansion of virtual generated columns to occur before sublink
pull-up. 0002 introduces a new function, preprocess_relation_rtes,
which scans the rangetable for relation RTEs and performs inh flag
updates and virtual generated column expansion in a single loop, so
that only one table_open/table_close call is required for each
relation. 0003 collects NOT NULL attribute information for each
relation within the same loop, stores it in a relation OID based hash
table, and uses this information to reduce NullTest quals during
constant folding.
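As a rough illustration of the reduction that 0003 performs, here is a self-contained sketch of the decision rule as stated at the start of this thread: a NullTest on a NOT NULL column folds to a constant only if the Var cannot be nulled by an outer join. The function name, the enum, and the bitmask representation are assumptions made for this example and do not reflect the patch's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum ToyFold
{
    TOY_KEEP_QUAL,              /* cannot prove anything; leave the qual alone */
    TOY_CONST_TRUE,
    TOY_CONST_FALSE
} ToyFold;

/*
 * Decide how to fold "Var IS NULL" (is_not_null_test = false) or
 * "Var IS NOT NULL" (is_not_null_test = true).
 *
 * notnull_attrs has bit (attno - 1) set for each column declared NOT NULL.
 */
ToyFold
toy_fold_nulltest(int attno, bool nullable_by_outer_join,
                  bool is_not_null_test, uint64_t notnull_attrs)
{
    assert(attno >= 1 && attno <= 64);

    /* An outer join above the Var can still produce NULLs. */
    if (nullable_by_outer_join)
        return TOY_KEEP_QUAL;

    /* The column is not declared NOT NULL, so nothing can be proven. */
    if ((notnull_attrs & (UINT64_C(1) << (attno - 1))) == 0)
        return TOY_KEEP_QUAL;

    return is_not_null_test ? TOY_CONST_TRUE : TOY_CONST_FALSE;
}
```

For example, with columns 1 and 3 declared NOT NULL (mask 0x5), "col1 IS NULL" folds to constant false, while the same test on a Var that sits on the nullable side of an outer join is left untouched.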

I think the code now more closely resembles the phase 1 and phase 2
described earlier: it collects all required early-stage catalog
information within a single loop over the rangetable, allowing each
relation to be opened and closed only once. It also avoids the
has_subclass() call along the way.

Thanks
Richard

Attachments:

v4-0001-Expand-virtual-generated-columns-before-sublink-p.patch

From a1ecd7ddc47d28c05f987423658b288ae7e45f2d Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Wed, 23 Apr 2025 10:29:15 +0900
Subject: [PATCH v4 1/3] Expand virtual generated columns before sublink
 pull-up

Currently, we expand virtual generated columns after we have pulled up
any SubLinks within the query's quals.  This ensures that the virtual
generated column references within SubLinks that should be transformed
into joins are correctly expanded.  This approach works well and has
posed no issues.

In an upcoming patch, we plan to centralize the collection of catalog
information needed early in the planner.  This will help avoid
repeated table_open/table_close calls for relations in the rangetable.
Since this information is required during sublink pull-up, we are
moving the expansion of virtual generated columns to occur beforehand.

To achieve this, if any EXISTS SubLinks can be pulled up, their
rangetables are processed just before pulling them up.
---
 src/backend/optimizer/plan/planner.c          | 17 +++++++-------
 src/backend/optimizer/plan/subselect.c        | 16 ++++++++++++++
 src/backend/optimizer/prep/prepjointree.c     | 20 +++++++----------
 src/include/optimizer/prep.h                  |  2 +-
 .../regress/expected/generated_virtual.out    | 22 +++++++++++++++++++
 src/test/regress/sql/generated_virtual.sql    |  9 ++++++++
 6 files changed, 65 insertions(+), 21 deletions(-)

diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index beafac8c0b0..0dc83cd3c07 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -717,6 +717,15 @@ subquery_planner(PlannerGlobal *glob, Query *parse, PlannerInfo *parent_root,
 	 */
 	transform_MERGE_to_join(parse);
 
+	/*
+	 * Scan the rangetable for relations with virtual generated columns, and
+	 * replace all Var nodes in the query that reference these columns with
+	 * the generation expressions.  Note that this step does not descend into
+	 * sublinks and subqueries; if we pull up any sublinks or subqueries
+	 * below, their rangetables are processed just before pulling them up.
+	 */
+	parse = root->parse = expand_virtual_generated_columns(root);
+
 	/*
 	 * If the FROM clause is empty, replace it with a dummy RTE_RESULT RTE, so
 	 * that we don't need so many special cases to deal with that situation.
@@ -740,14 +749,6 @@ subquery_planner(PlannerGlobal *glob, Query *parse, PlannerInfo *parent_root,
 	 */
 	preprocess_function_rtes(root);
 
-	/*
-	 * Scan the rangetable for relations with virtual generated columns, and
-	 * replace all Var nodes in the query that reference these columns with
-	 * the generation expressions.  Recursion issues here are handled in the
-	 * same way as for SubLinks.
-	 */
-	parse = root->parse = expand_virtual_generated_columns(root);
-
 	/*
 	 * Check to see if any subqueries in the jointree can be merged into this
 	 * query.
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index e7cb3fede66..89e6873da08 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -1458,6 +1458,7 @@ convert_EXISTS_sublink_to_join(PlannerInfo *root, SubLink *sublink,
 	int			varno;
 	Relids		clause_varnos;
 	Relids		upper_varnos;
+	PlannerInfo subroot;
 
 	Assert(sublink->subLinkType == EXISTS_SUBLINK);
 
@@ -1487,6 +1488,21 @@ convert_EXISTS_sublink_to_join(PlannerInfo *root, SubLink *sublink,
 	if (!simplify_EXISTS_query(root, subselect))
 		return NULL;
 
+	/*
+	 * Scan the rangetable for relations with virtual generated columns, and
+	 * replace all Var nodes in the subquery that reference these columns with
+	 * the generation expressions.
+	 *
+	 * Note: we construct up an entirely dummy PlannerInfo for use here.  This
+	 * is fine because only the "glob" and "parse" links will be used in this
+	 * case.
+	 */
+	MemSet(&subroot, 0, sizeof(subroot));
+	subroot.type = T_PlannerInfo;
+	subroot.glob = root->glob;
+	subroot.parse = subselect;
+	subselect = expand_virtual_generated_columns(&subroot);
+
 	/*
 	 * Separate out the WHERE clause.  (We could theoretically also remove
 	 * top-level plain JOIN/ON clauses, but it's probably not worth the
diff --git a/src/backend/optimizer/prep/prepjointree.c b/src/backend/optimizer/prep/prepjointree.c
index 87dc6f56b57..8140d22de70 100644
--- a/src/backend/optimizer/prep/prepjointree.c
+++ b/src/backend/optimizer/prep/prepjointree.c
@@ -4,10 +4,10 @@
  *	  Planner preprocessing for subqueries and join tree manipulation.
  *
  * NOTE: the intended sequence for invoking these operations is
+ *		expand_virtual_generated_columns
  *		replace_empty_jointree
  *		pull_up_sublinks
  *		preprocess_function_rtes
- *		expand_virtual_generated_columns
  *		pull_up_subqueries
  *		flatten_simple_union_all
  *		do expression preprocessing (including flattening JOIN alias vars)
@@ -958,10 +958,6 @@ preprocess_function_rtes(PlannerInfo *root)
  * generation expressions.  Note that we do not descend into subqueries; that
  * is taken care of when the subqueries are planned.
  *
- * This has to be done after we have pulled up any SubLinks within the query's
- * quals; otherwise any virtual generated column references within the SubLinks
- * that should be transformed into joins wouldn't get expanded.
- *
  * Returns a modified copy of the query tree, if any relations with virtual
  * generated columns are present.
  */
@@ -1333,6 +1329,13 @@ pull_up_simple_subquery(PlannerInfo *root, Node *jtnode, RangeTblEntry *rte,
 	/* No CTEs to worry about */
 	Assert(subquery->cteList == NIL);
 
+	/*
+	 * Scan the rangetable for relations with virtual generated columns, and
+	 * replace all Var nodes in the subquery that reference these columns with
+	 * the generation expressions.
+	 */
+	subquery = subroot->parse = expand_virtual_generated_columns(subroot);
+
 	/*
 	 * If the FROM clause is empty, replace it with a dummy RTE_RESULT RTE, so
 	 * that we don't need so many special cases to deal with that situation.
@@ -1352,13 +1355,6 @@ pull_up_simple_subquery(PlannerInfo *root, Node *jtnode, RangeTblEntry *rte,
 	 */
 	preprocess_function_rtes(subroot);
 
-	/*
-	 * Scan the rangetable for relations with virtual generated columns, and
-	 * replace all Var nodes in the query that reference these columns with
-	 * the generation expressions.
-	 */
-	subquery = subroot->parse = expand_virtual_generated_columns(subroot);
-
 	/*
 	 * Recursively pull up the subquery's subqueries, so that
 	 * pull_up_subqueries' processing is complete for its jointree and
diff --git a/src/include/optimizer/prep.h b/src/include/optimizer/prep.h
index df56202777c..ceb731bcf5e 100644
--- a/src/include/optimizer/prep.h
+++ b/src/include/optimizer/prep.h
@@ -22,10 +22,10 @@
  * prototypes for prepjointree.c
  */
 extern void transform_MERGE_to_join(Query *parse);
+extern Query *expand_virtual_generated_columns(PlannerInfo *root);
 extern void replace_empty_jointree(Query *parse);
 extern void pull_up_sublinks(PlannerInfo *root);
 extern void preprocess_function_rtes(PlannerInfo *root);
-extern Query *expand_virtual_generated_columns(PlannerInfo *root);
 extern void pull_up_subqueries(PlannerInfo *root);
 extern void flatten_simple_union_all(PlannerInfo *root);
 extern void reduce_outer_joins(PlannerInfo *root);
diff --git a/src/test/regress/expected/generated_virtual.out b/src/test/regress/expected/generated_virtual.out
index 6300e7c1d96..b766ccb1dc2 100644
--- a/src/test/regress/expected/generated_virtual.out
+++ b/src/test/regress/expected/generated_virtual.out
@@ -1591,4 +1591,26 @@ select * from gtest32 t group by grouping sets (a, b, c, d) having c = 20;
    |   | 20 |  
 (1 row)
 
+-- Ensure that virtual generated column references within SubLinks that should
+-- be transformed into joins can get expanded
+explain (costs off)
+select 1 from gtest32 t1 where exists
+  (select 1 from gtest32 t2 where t1.a > t2.a and t2.b = 2);
+             QUERY PLAN              
+-------------------------------------
+ Nested Loop Semi Join
+   Join Filter: (t1.a > t2.a)
+   ->  Seq Scan on gtest32 t1
+   ->  Materialize
+         ->  Seq Scan on gtest32 t2
+               Filter: ((a * 2) = 2)
+(6 rows)
+
+select 1 from gtest32 t1 where exists
+  (select 1 from gtest32 t2 where t1.a > t2.a and t2.b = 2);
+ ?column? 
+----------
+        1
+(1 row)
+
 drop table gtest32;
diff --git a/src/test/regress/sql/generated_virtual.sql b/src/test/regress/sql/generated_virtual.sql
index b4eedeee2fb..5dd68381e1c 100644
--- a/src/test/regress/sql/generated_virtual.sql
+++ b/src/test/regress/sql/generated_virtual.sql
@@ -832,4 +832,13 @@ explain (verbose, costs off)
 select * from gtest32 t group by grouping sets (a, b, c, d) having c = 20;
 select * from gtest32 t group by grouping sets (a, b, c, d) having c = 20;
 
+-- Ensure that virtual generated column references within SubLinks that should
+-- be transformed into joins can get expanded
+explain (costs off)
+select 1 from gtest32 t1 where exists
+  (select 1 from gtest32 t2 where t1.a > t2.a and t2.b = 2);
+
+select 1 from gtest32 t1 where exists
+  (select 1 from gtest32 t2 where t1.a > t2.a and t2.b = 2);
+
 drop table gtest32;
-- 
2.43.0

Attachment: v4-0002-Centralize-collection-of-catalog-info-needed-earl.patch (application/octet-stream)

From f8f0e7bfad998fe56ab1857d5432665076289e3d Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Thu, 24 Apr 2025 14:58:03 +0900
Subject: [PATCH v4 2/3] Centralize collection of catalog info needed early in
 the planner

There are several pieces of catalog information that need to be
retrieved for a relation during the early stage of planning.  These
include relhassubclass, which is used to clear the inh flag if the
relation has no children, as well as a column's attgenerated and
default value, which are needed to expand virtual generated columns.
More such information may be required in the future.

Currently, these pieces of catalog data are collected in multiple
places, resulting in repeated table_open/table_close calls for each
relation in the rangetable.  This patch centralizes the collection of
all required early-stage catalog information into a single loop over
the rangetable, allowing each relation to be opened and closed only
once.
---
 src/backend/optimizer/plan/planner.c      |  31 +--
 src/backend/optimizer/plan/subselect.c    |   9 +-
 src/backend/optimizer/prep/prepjointree.c | 299 +++++++++++++---------
 src/include/optimizer/prep.h              |   2 +-
 4 files changed, 190 insertions(+), 151 deletions(-)

diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 0dc83cd3c07..2033e24d388 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -718,13 +718,15 @@ subquery_planner(PlannerGlobal *glob, Query *parse, PlannerInfo *parent_root,
 	transform_MERGE_to_join(parse);
 
 	/*
-	 * Scan the rangetable for relations with virtual generated columns, and
-	 * replace all Var nodes in the query that reference these columns with
-	 * the generation expressions.  Note that this step does not descend into
-	 * sublinks and subqueries; if we pull up any sublinks or subqueries
-	 * below, their rangetables are processed just before pulling them up.
+	 * Scan the rangetable for relation RTEs and retrieve the necessary
+	 * catalog information for each relation.  Using this information, clear
+	 * the inh flag for any relation that has no children, and expand virtual
+	 * generated columns for any relation that contains them.  Note that this
+	 * step does not descend into sublinks and subqueries; if we pull up any
+	 * sublinks or subqueries below, their relation RTEs are processed just
+	 * before pulling them up.
 	 */
-	parse = root->parse = expand_virtual_generated_columns(root);
+	parse = root->parse = preprocess_relation_rtes(root);
 
 	/*
 	 * If the FROM clause is empty, replace it with a dummy RTE_RESULT RTE, so
@@ -785,23 +787,6 @@ subquery_planner(PlannerGlobal *glob, Query *parse, PlannerInfo *parent_root,
 
 		switch (rte->rtekind)
 		{
-			case RTE_RELATION:
-				if (rte->inh)
-				{
-					/*
-					 * Check to see if the relation actually has any children;
-					 * if not, clear the inh flag so we can treat it as a
-					 * plain base relation.
-					 *
-					 * Note: this could give a false-positive result, if the
-					 * rel once had children but no longer does.  We used to
-					 * be able to clear rte->inh later on when we discovered
-					 * that, but no more; we have to handle such cases as
-					 * full-fledged inheritance.
-					 */
-					rte->inh = has_subclass(rte->relid);
-				}
-				break;
 			case RTE_JOIN:
 				root->hasJoinRTEs = true;
 				if (IS_OUTER_JOIN(rte->jointype))
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index 89e6873da08..65fc3f49d39 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -1489,9 +1489,10 @@ convert_EXISTS_sublink_to_join(PlannerInfo *root, SubLink *sublink,
 		return NULL;
 
 	/*
-	 * Scan the rangetable for relations with virtual generated columns, and
-	 * replace all Var nodes in the subquery that reference these columns with
-	 * the generation expressions.
+	 * Scan the rangetable for relation RTEs and retrieve the necessary
+	 * catalog information for each relation.  Using this information, clear
+	 * the inh flag for any relation that has no children, and expand virtual
+	 * generated columns for any relation that contains them.
 	 *
 	 * Note: we construct up an entirely dummy PlannerInfo for use here.  This
 	 * is fine because only the "glob" and "parse" links will be used in this
@@ -1501,7 +1502,7 @@ convert_EXISTS_sublink_to_join(PlannerInfo *root, SubLink *sublink,
 	subroot.type = T_PlannerInfo;
 	subroot.glob = root->glob;
 	subroot.parse = subselect;
-	subselect = expand_virtual_generated_columns(&subroot);
+	subselect = preprocess_relation_rtes(&subroot);
 
 	/*
 	 * Separate out the WHERE clause.  (We could theoretically also remove
diff --git a/src/backend/optimizer/prep/prepjointree.c b/src/backend/optimizer/prep/prepjointree.c
index 8140d22de70..7d355bc4295 100644
--- a/src/backend/optimizer/prep/prepjointree.c
+++ b/src/backend/optimizer/prep/prepjointree.c
@@ -4,7 +4,7 @@
  *	  Planner preprocessing for subqueries and join tree manipulation.
  *
  * NOTE: the intended sequence for invoking these operations is
- *		expand_virtual_generated_columns
+ *		preprocess_relation_rtes
  *		replace_empty_jointree
  *		pull_up_sublinks
  *		preprocess_function_rtes
@@ -102,6 +102,9 @@ typedef struct reduce_outer_joins_partial_state
 	Relids		unreduced_side; /* relids in its still-nullable side */
 } reduce_outer_joins_partial_state;
 
+static Query *expand_virtual_generated_columns(PlannerInfo *root, Query *parse,
+											   RangeTblEntry *rte, int rt_index,
+											   Relation relation);
 static Node *pull_up_sublinks_jointree_recurse(PlannerInfo *root, Node *jtnode,
 											   Relids *relids);
 static Node *pull_up_sublinks_qual_recurse(PlannerInfo *root, Node *node,
@@ -392,6 +395,173 @@ transform_MERGE_to_join(Query *parse)
 		parse->mergeJoinCondition = NULL;	/* join condition not needed */
 }
 
+/*
+ * preprocess_relation_rtes
+ *		Do the preprocessing work for any relation RTEs in the FROM clause.
+ *
+ * This scans the rangetable for relation RTEs and retrieves the necessary
+ * catalog information for each relation.  Using this information, it clears
+ * the inh flag for any relation that has no children, and expands virtual
+ * generated columns for any relation that contains them.
+ *
+ * Note that expanding virtual generated columns may cause the query tree to
+ * have new copies of rangetable entries.  Therefore, we have to use list_nth
+ * instead of foreach when iterating over the query's rangetable.
+ *
+ * Returns a modified copy of the query tree, if any relations with virtual
+ * generated columns are present.
+ */
+Query *
+preprocess_relation_rtes(PlannerInfo *root)
+{
+	Query	   *parse = root->parse;
+	int			rtable_size;
+	int			rt_index;
+
+	rtable_size = list_length(parse->rtable);
+
+	for (rt_index = 0; rt_index < rtable_size; rt_index++)
+	{
+		RangeTblEntry *rte = (RangeTblEntry *) list_nth(parse->rtable, rt_index);
+		Relation	relation;
+
+		/* We only care about relation RTEs. */
+		if (rte->rtekind != RTE_RELATION)
+			continue;
+
+		/*
+		 * We need not lock the relation since it was already locked by the
+		 * rewriter.
+		 */
+		relation = table_open(rte->relid, NoLock);
+
+		/*
+		 * Check to see if the relation actually has any children; if not,
+		 * clear the inh flag so we can treat it as a plain base relation.
+		 *
+		 * Note: this could give a false-positive result, if the rel once had
+		 * children but no longer does.  We used to be able to clear rte->inh
+		 * later on when we discovered that, but no more; we have to handle
+		 * such cases as full-fledged inheritance.
+		 */
+		if (rte->inh)
+			rte->inh = relation->rd_rel->relhassubclass;
+
+		/*
+		 * Check to see if the relation has any virtual generated columns; if
+		 * so, replace all Var nodes in the query that reference these columns
+		 * with the generation expressions.
+		 */
+		parse = expand_virtual_generated_columns(root, parse,
+												 rte, rt_index + 1,
+												 relation);
+
+		table_close(relation, NoLock);
+	}
+
+	return parse;
+}
+
+/*
+ * expand_virtual_generated_columns
+ *		Expand virtual generated columns for the given relation.
+ *
+ * This checks whether the given relation has any virtual generated columns,
+ * and if so, replaces all Var nodes in the query that reference those columns
+ * with their generation expressions.
+ *
+ * Returns a modified copy of the query tree if the relation contains virtual
+ * generated columns.
+ */
+static Query *
+expand_virtual_generated_columns(PlannerInfo *root, Query *parse,
+								 RangeTblEntry *rte, int rt_index,
+								 Relation relation)
+{
+	TupleDesc	tupdesc;
+
+	/* Only normal relations can have virtual generated columns */
+	Assert(rte->rtekind == RTE_RELATION);
+
+	tupdesc = RelationGetDescr(relation);
+	if (tupdesc->constr && tupdesc->constr->has_generated_virtual)
+	{
+		List	   *tlist = NIL;
+		pullup_replace_vars_context rvcontext;
+
+		for (int i = 0; i < tupdesc->natts; i++)
+		{
+			Form_pg_attribute attr = TupleDescAttr(tupdesc, i);
+			TargetEntry *tle;
+
+			if (attr->attgenerated == ATTRIBUTE_GENERATED_VIRTUAL)
+			{
+				Node	   *defexpr;
+
+				defexpr = build_generation_expression(relation, i + 1);
+				ChangeVarNodes(defexpr, 1, rt_index, 0);
+
+				tle = makeTargetEntry((Expr *) defexpr, i + 1, 0, false);
+				tlist = lappend(tlist, tle);
+			}
+			else
+			{
+				Var		   *var;
+
+				var = makeVar(rt_index,
+							  i + 1,
+							  attr->atttypid,
+							  attr->atttypmod,
+							  attr->attcollation,
+							  0);
+
+				tle = makeTargetEntry((Expr *) var, i + 1, 0, false);
+				tlist = lappend(tlist, tle);
+			}
+		}
+
+		Assert(list_length(tlist) > 0);
+		Assert(!rte->lateral);
+
+		/*
+		 * The relation's targetlist items are now in the appropriate form to
+		 * insert into the query, except that we may need to wrap them in
+		 * PlaceHolderVars.  Set up required context data for
+		 * pullup_replace_vars.
+		 */
+		rvcontext.root = root;
+		rvcontext.targetlist = tlist;
+		rvcontext.target_rte = rte;
+		rvcontext.result_relation = parse->resultRelation;
+		/* won't need these values */
+		rvcontext.relids = NULL;
+		rvcontext.nullinfo = NULL;
+		/* pass NULL for outer_hasSubLinks */
+		rvcontext.outer_hasSubLinks = NULL;
+		rvcontext.varno = rt_index;
+		/* this flag will be set below, if needed */
+		rvcontext.wrap_option = REPLACE_WRAP_NONE;
+		/* initialize cache array with indexes 0 .. length(tlist) */
+		rvcontext.rv_cache = palloc0((list_length(tlist) + 1) *
+									 sizeof(Node *));
+
+		/*
+		 * If the query uses grouping sets, we need a PlaceHolderVar for each
+		 * expression of the relation's targetlist items.  (See comments in
+		 * pull_up_simple_subquery().)
+		 */
+		if (parse->groupingSets)
+			rvcontext.wrap_option = REPLACE_WRAP_ALL;
+
+		/*
+		 * Apply pullup variable replacement throughout the query tree.
+		 */
+		parse = (Query *) pullup_replace_vars((Node *) parse, &rvcontext);
+	}
+
+	return parse;
+}
+
 /*
  * replace_empty_jointree
  *		If the Query's jointree is empty, replace it with a dummy RTE_RESULT
@@ -949,124 +1119,6 @@ preprocess_function_rtes(PlannerInfo *root)
 	}
 }
 
-/*
- * expand_virtual_generated_columns
- *		Expand all virtual generated column references in a query.
- *
- * This scans the rangetable for relations with virtual generated columns, and
- * replaces all Var nodes in the query that reference these columns with the
- * generation expressions.  Note that we do not descend into subqueries; that
- * is taken care of when the subqueries are planned.
- *
- * Returns a modified copy of the query tree, if any relations with virtual
- * generated columns are present.
- */
-Query *
-expand_virtual_generated_columns(PlannerInfo *root)
-{
-	Query	   *parse = root->parse;
-	int			rt_index;
-	ListCell   *lc;
-
-	rt_index = 0;
-	foreach(lc, parse->rtable)
-	{
-		RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc);
-		Relation	rel;
-		TupleDesc	tupdesc;
-
-		++rt_index;
-
-		/*
-		 * Only normal relations can have virtual generated columns.
-		 */
-		if (rte->rtekind != RTE_RELATION)
-			continue;
-
-		rel = table_open(rte->relid, NoLock);
-
-		tupdesc = RelationGetDescr(rel);
-		if (tupdesc->constr && tupdesc->constr->has_generated_virtual)
-		{
-			List	   *tlist = NIL;
-			pullup_replace_vars_context rvcontext;
-
-			for (int i = 0; i < tupdesc->natts; i++)
-			{
-				Form_pg_attribute attr = TupleDescAttr(tupdesc, i);
-				TargetEntry *tle;
-
-				if (attr->attgenerated == ATTRIBUTE_GENERATED_VIRTUAL)
-				{
-					Node	   *defexpr;
-
-					defexpr = build_generation_expression(rel, i + 1);
-					ChangeVarNodes(defexpr, 1, rt_index, 0);
-
-					tle = makeTargetEntry((Expr *) defexpr, i + 1, 0, false);
-					tlist = lappend(tlist, tle);
-				}
-				else
-				{
-					Var		   *var;
-
-					var = makeVar(rt_index,
-								  i + 1,
-								  attr->atttypid,
-								  attr->atttypmod,
-								  attr->attcollation,
-								  0);
-
-					tle = makeTargetEntry((Expr *) var, i + 1, 0, false);
-					tlist = lappend(tlist, tle);
-				}
-			}
-
-			Assert(list_length(tlist) > 0);
-			Assert(!rte->lateral);
-
-			/*
-			 * The relation's targetlist items are now in the appropriate form
-			 * to insert into the query, except that we may need to wrap them
-			 * in PlaceHolderVars.  Set up required context data for
-			 * pullup_replace_vars.
-			 */
-			rvcontext.root = root;
-			rvcontext.targetlist = tlist;
-			rvcontext.target_rte = rte;
-			rvcontext.result_relation = parse->resultRelation;
-			/* won't need these values */
-			rvcontext.relids = NULL;
-			rvcontext.nullinfo = NULL;
-			/* pass NULL for outer_hasSubLinks */
-			rvcontext.outer_hasSubLinks = NULL;
-			rvcontext.varno = rt_index;
-			/* this flag will be set below, if needed */
-			rvcontext.wrap_option = REPLACE_WRAP_NONE;
-			/* initialize cache array with indexes 0 .. length(tlist) */
-			rvcontext.rv_cache = palloc0((list_length(tlist) + 1) *
-										 sizeof(Node *));
-
-			/*
-			 * If the query uses grouping sets, we need a PlaceHolderVar for
-			 * each expression of the relation's targetlist items.  (See
-			 * comments in pull_up_simple_subquery().)
-			 */
-			if (parse->groupingSets)
-				rvcontext.wrap_option = REPLACE_WRAP_ALL;
-
-			/*
-			 * Apply pullup variable replacement throughout the query tree.
-			 */
-			parse = (Query *) pullup_replace_vars((Node *) parse, &rvcontext);
-		}
-
-		table_close(rel, NoLock);
-	}
-
-	return parse;
-}
-
 /*
  * pull_up_subqueries
  *		Look for subqueries in the rangetable that can be pulled up into
@@ -1330,11 +1382,12 @@ pull_up_simple_subquery(PlannerInfo *root, Node *jtnode, RangeTblEntry *rte,
 	Assert(subquery->cteList == NIL);
 
 	/*
-	 * Scan the rangetable for relations with virtual generated columns, and
-	 * replace all Var nodes in the subquery that reference these columns with
-	 * the generation expressions.
+	 * Scan the rangetable for relation RTEs and retrieve the necessary
+	 * catalog information for each relation.  Using this information, clear
+	 * the inh flag for any relation that has no children, and expand virtual
+	 * generated columns for any relation that contains them.
 	 */
-	subquery = subroot->parse = expand_virtual_generated_columns(subroot);
+	subquery = subroot->parse = preprocess_relation_rtes(subroot);
 
 	/*
 	 * If the FROM clause is empty, replace it with a dummy RTE_RESULT RTE, so
diff --git a/src/include/optimizer/prep.h b/src/include/optimizer/prep.h
index ceb731bcf5e..4fbecdb4462 100644
--- a/src/include/optimizer/prep.h
+++ b/src/include/optimizer/prep.h
@@ -22,7 +22,7 @@
  * prototypes for prepjointree.c
  */
 extern void transform_MERGE_to_join(Query *parse);
-extern Query *expand_virtual_generated_columns(PlannerInfo *root);
+extern Query *preprocess_relation_rtes(PlannerInfo *root);
 extern void replace_empty_jointree(Query *parse);
 extern void pull_up_sublinks(PlannerInfo *root);
 extern void preprocess_function_rtes(PlannerInfo *root);
-- 
2.43.0
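
The commit message of the 0002 patch above argues that gathering all early-stage catalog data in one loop lets each relation be opened and closed only once. A minimal, self-contained sketch of that idea follows. It does not use PostgreSQL's real structures: `MockCatalogEntry`, `MockRte`, `mock_open` and `preprocess_rtes_single_pass` are hypothetical names invented for illustration, and the "open" here is only a counter standing in for table_open/table_close.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for per-relation catalog data (not a PostgreSQL struct). */
typedef struct MockCatalogEntry
{
	bool		has_subclass;			/* plays the role of relhassubclass */
	bool		has_generated_virtual;	/* plays the role of has_generated_virtual */
	int			open_count;				/* how often this entry was "opened" */
} MockCatalogEntry;

typedef enum MockRteKind
{
	MOCK_RTE_RELATION,
	MOCK_RTE_OTHER
} MockRteKind;

/* Hypothetical stand-in for a rangetable entry. */
typedef struct MockRte
{
	MockRteKind kind;
	bool		inh;				/* inheritance flag, may be cleared */
	bool		needs_expansion;	/* set if virtual generated columns exist */
	MockCatalogEntry *catalog;		/* NULL for non-relation entries */
} MockRte;

/* Simulated table_open: just counts the access. */
MockCatalogEntry *
mock_open(MockRte *rte)
{
	assert(rte->catalog != NULL);
	rte->catalog->open_count++;
	return rte->catalog;
}

/*
 * Walk the rangetable by index and derive every early-stage fact from a
 * single open per relation.  Returns the number of opens performed.
 */
int
preprocess_rtes_single_pass(MockRte *rtable, size_t ntables)
{
	int			opens = 0;

	for (size_t i = 0; i < ntables; i++)
	{
		MockRte    *rte = &rtable[i];
		MockCatalogEntry *cat;

		/* Only relation entries carry catalog data. */
		if (rte->kind != MOCK_RTE_RELATION)
			continue;

		cat = mock_open(rte);
		opens++;

		/* Clear the inh flag if the relation has no children. */
		if (rte->inh)
			rte->inh = cat->has_subclass;

		/* Record whether virtual generated columns must be expanded. */
		rte->needs_expansion = cat->has_generated_virtual;
	}

	return opens;
}
```

Calling `preprocess_rtes_single_pass` on an array that mixes relation and non-relation entries touches each relation entry exactly once, whereas two separate passes (one for the inh flag, one for generated columns) would open each relation twice. Indexing into the array rather than holding an iterator loosely mirrors the patch's use of list_nth, which it needs because expanding generated columns can replace the rangetable entries.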

Attachment: v4-0003-Reduce-Var-IS-NOT-NULL-quals-during-constant-fold.patch (application/octet-stream)

From 54bb735405d4aa393ceea996f9db0294db0fc68c Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Wed, 30 Apr 2025 18:50:37 +0900
Subject: [PATCH v4 3/3] Reduce "Var IS [NOT] NULL" quals during constant
 folding

In commit b262ad440, we introduced an optimization that reduces an IS
[NOT] NULL qual on a NOT NULL column to constant true or constant
false, provided we can prove that the input expression of the NullTest
is not nullable by any outer joins or grouping sets.  This deduction
happens quite late in the planner, during the distribution of quals to
rels in query_planner.  However, this approach has some drawbacks: we
can't perform any further folding with the constant, and it turns out
to be prone to bugs.

Ideally, this deduction should happen during constant folding.
However, the per-relation information about which columns are defined
as NOT NULL is not available at that point.  This information is
currently collected from catalogs when building RelOptInfos for base
or "other" relations.

This patch moves the collection of NOT NULL attribute information for
relations before pull_up_sublinks, storing it in a hash table keyed by
relation OID.  It then uses this information to perform the NullTest
deduction for Vars during constant folding.  This also makes it
possible to leverage this information to pull up NOT IN subqueries.

Note that this patch does not get rid of restriction_is_always_true
and restriction_is_always_false.  Removing them would prevent us from
reducing some IS [NOT] NULL quals that we were previously able to
reduce, because (a) the self-join elimination may introduce new IS NOT
NULL quals after constant folding, and (b) if some outer joins are
converted to inner joins, previously irreducible NullTest quals may
become reducible.
---
 .../postgres_fdw/expected/postgres_fdw.out    |   8 +-
 contrib/postgres_fdw/sql/postgres_fdw.sql     |   2 +-
 src/backend/optimizer/plan/initsplan.c        |  24 +---
 src/backend/optimizer/plan/planner.c          |  12 +-
 src/backend/optimizer/plan/subselect.c        |  20 ++-
 src/backend/optimizer/prep/prepjointree.c     |  19 ++-
 src/backend/optimizer/util/clauses.c          |  91 ++++++++++++-
 src/backend/optimizer/util/inherit.c          |  10 +-
 src/backend/optimizer/util/plancat.c          | 127 +++++++++++++++---
 src/include/nodes/pathnodes.h                 |  12 +-
 src/include/optimizer/optimizer.h             |   2 +
 src/include/optimizer/plancat.h               |   4 +
 .../regress/expected/generated_virtual.out    |   6 +-
 src/test/regress/expected/join.out            |   6 +-
 src/test/regress/expected/predicate.out       |  54 +++++++-
 src/test/regress/sql/predicate.sql            |  18 +++
 src/tools/pgindent/typedefs.list              |   1 +
 17 files changed, 335 insertions(+), 81 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 24ff5f70cce..12759d397d9 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -710,12 +710,12 @@ EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE c1 = -c1;          -- Op
    Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" = (- "C 1")))
 (3 rows)
 
-EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE (c1 IS NOT NULL) IS DISTINCT FROM (c1 IS NOT NULL); -- DistinctExpr
-                                                                 QUERY PLAN                                                                 
---------------------------------------------------------------------------------------------------------------------------------------------
+EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE (c3 IS NOT NULL) IS DISTINCT FROM (c3 IS NOT NULL); -- DistinctExpr
+                                                              QUERY PLAN                                                              
+--------------------------------------------------------------------------------------------------------------------------------------
  Foreign Scan on public.ft1 t1
    Output: c1, c2, c3, c4, c5, c6, c7, c8
-   Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE ((("C 1" IS NOT NULL) IS DISTINCT FROM ("C 1" IS NOT NULL)))
+   Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (((c3 IS NOT NULL) IS DISTINCT FROM (c3 IS NOT NULL)))
 (3 rows)
 
 EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE c1 = ANY(ARRAY[c2, 1, c1 + 0]); -- ScalarArrayOpExpr
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 1f27260bafe..7810cc4208d 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -352,7 +352,7 @@ EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE c3 IS NULL;        -- Nu
 EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE c3 IS NOT NULL;    -- NullTest
 EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE round(abs(c1), 0) = 1; -- FuncExpr
 EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE c1 = -c1;          -- OpExpr(l)
-EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE (c1 IS NOT NULL) IS DISTINCT FROM (c1 IS NOT NULL); -- DistinctExpr
+EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE (c3 IS NOT NULL) IS DISTINCT FROM (c3 IS NOT NULL); -- DistinctExpr
 EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE c1 = ANY(ARRAY[c2, 1, c1 + 0]); -- ScalarArrayOpExpr
 EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE c1 = (ARRAY[c1,c2,3])[1]; -- SubscriptingRef
 EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE c6 = E'foo''s\\bar';  -- check special chars
diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index 01804b085b3..3e3fec89252 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -3048,36 +3048,16 @@ add_base_clause_to_rel(PlannerInfo *root, Index relid,
  * expr_is_nonnullable
  *	  Check to see if the Expr cannot be NULL
  *
- * If the Expr is a simple Var that is defined NOT NULL and meanwhile is not
- * nulled by any outer joins, then we can know that it cannot be NULL.
+ * Currently we only support simple Vars.
  */
 static bool
 expr_is_nonnullable(PlannerInfo *root, Expr *expr)
 {
-	RelOptInfo *rel;
-	Var		   *var;
-
 	/* For now only check simple Vars */
 	if (!IsA(expr, Var))
 		return false;
 
-	var = (Var *) expr;
-
-	/* could the Var be nulled by any outer joins? */
-	if (!bms_is_empty(var->varnullingrels))
-		return false;
-
-	/* system columns cannot be NULL */
-	if (var->varattno < 0)
-		return true;
-
-	/* is the column defined NOT NULL? */
-	rel = find_base_rel(root, var->varno);
-	if (var->varattno > 0 &&
-		bms_is_member(var->varattno, rel->notnullattnums))
-		return true;
-
-	return false;
+	return var_is_nonnullable(root, (Var *) expr, true);
 }
 
 /*
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 2033e24d388..2f8045cfe7d 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -338,6 +338,7 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
 	glob->lastPlanNodeId = 0;
 	glob->transientPlan = false;
 	glob->dependsOnRole = false;
+	glob->rel_notnullatts_hash = NULL;
 
 	/*
 	 * Assess whether it's feasible to use parallel mode for this query. We
@@ -720,11 +721,12 @@ subquery_planner(PlannerGlobal *glob, Query *parse, PlannerInfo *parent_root,
 	/*
 	 * Scan the rangetable for relation RTEs and retrieve the necessary
 	 * catalog information for each relation.  Using this information, clear
-	 * the inh flag for any relation that has no children, and expand virtual
-	 * generated columns for any relation that contains them.  Note that this
-	 * step does not descend into sublinks and subqueries; if we pull up any
-	 * sublinks or subqueries below, their relation RTEs are processed just
-	 * before pulling them up.
+	 * the inh flag for any relation that has no children, collect not-null
+	 * attribute numbers for any relation that has column not-null
+	 * constraints, and expand virtual generated columns for any relation that
+	 * contains them.  Note that this step does not descend into sublinks and
+	 * subqueries; if we pull up any sublinks or subqueries below, their
+	 * relation RTEs are processed just before pulling them up.
 	 */
 	parse = root->parse = preprocess_relation_rtes(root);
 
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index 65fc3f49d39..8ea20061594 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -1491,8 +1491,10 @@ convert_EXISTS_sublink_to_join(PlannerInfo *root, SubLink *sublink,
 	/*
 	 * Scan the rangetable for relation RTEs and retrieve the necessary
 	 * catalog information for each relation.  Using this information, clear
-	 * the inh flag for any relation that has no children, and expand virtual
-	 * generated columns for any relation that contains them.
+	 * the inh flag for any relation that has no children, collect not-null
+	 * attribute numbers for any relation that has column not-null
+	 * constraints, and expand virtual generated columns for any relation that
+	 * contains them.
 	 *
 	 * Note: we construct up an entirely dummy PlannerInfo for use here.  This
 	 * is fine because only the "glob" and "parse" links will be used in this
@@ -1749,6 +1751,7 @@ convert_EXISTS_to_ANY(PlannerInfo *root, Query *subselect,
 					  Node **testexpr, List **paramIds)
 {
 	Node	   *whereClause;
+	PlannerInfo subroot;
 	List	   *leftargs,
 			   *rightargs,
 			   *opids,
@@ -1808,12 +1811,15 @@ convert_EXISTS_to_ANY(PlannerInfo *root, Query *subselect,
 	 * parent aliases were flattened already, and we're not going to pull any
 	 * child Vars (of any description) into the parent.
 	 *
-	 * Note: passing the parent's root to eval_const_expressions is
-	 * technically wrong, but we can get away with it since only the
-	 * boundParams (if any) are used, and those would be the same in a
-	 * subroot.
+	 * Note: we construct an entirely dummy PlannerInfo to pass to
+	 * eval_const_expressions.  This is fine because only the "glob" and
+	 * "parse" links are used by eval_const_expressions.
 	 */
-	whereClause = eval_const_expressions(root, whereClause);
+	MemSet(&subroot, 0, sizeof(subroot));
+	subroot.type = T_PlannerInfo;
+	subroot.glob = root->glob;
+	subroot.parse = subselect;
+	whereClause = eval_const_expressions(&subroot, whereClause);
 	whereClause = (Node *) canonicalize_qual((Expr *) whereClause, false);
 	whereClause = (Node *) make_ands_implicit((Expr *) whereClause);
 
diff --git a/src/backend/optimizer/prep/prepjointree.c b/src/backend/optimizer/prep/prepjointree.c
index 7d355bc4295..40af48bce1e 100644
--- a/src/backend/optimizer/prep/prepjointree.c
+++ b/src/backend/optimizer/prep/prepjointree.c
@@ -36,6 +36,7 @@
 #include "optimizer/clauses.h"
 #include "optimizer/optimizer.h"
 #include "optimizer/placeholder.h"
+#include "optimizer/plancat.h"
 #include "optimizer/prep.h"
 #include "optimizer/subselect.h"
 #include "optimizer/tlist.h"
@@ -401,8 +402,9 @@ transform_MERGE_to_join(Query *parse)
  *
  * This scans the rangetable for relation RTEs and retrieves the necessary
  * catalog information for each relation.  Using this information, it clears
- * the inh flag for any relation that has no children, and expands virtual
- * generated columns for any relation that contains them.
+ * the inh flag for any relation that has no children, collects not-null
+ * attribute numbers for any relation that has column not-null constraints, and
+ * expands virtual generated columns for any relation that contains them.
  *
  * Note that expanding virtual generated columns may cause the query tree to
  * have new copies of rangetable entries.  Therefore, we have to use list_nth
@@ -447,6 +449,13 @@ preprocess_relation_rtes(PlannerInfo *root)
 		if (rte->inh)
 			rte->inh = relation->rd_rel->relhassubclass;
 
+		/*
+		 * Check to see if the relation has any column not-null constraints;
+		 * if so, retrieve the constraint information and store it in a hash
+		 * table keyed by relation OID.
+		 */
+		collect_relation_notnullatts(root, relation);
+
 		/*
 		 * Check to see if the relation has any virtual generated columns; if
 		 * so, replace all Var nodes in the query that reference these columns
@@ -1384,8 +1393,10 @@ pull_up_simple_subquery(PlannerInfo *root, Node *jtnode, RangeTblEntry *rte,
 	/*
 	 * Scan the rangetable for relation RTEs and retrieve the necessary
 	 * catalog information for each relation.  Using this information, clear
-	 * the inh flag for any relation that has no children, and expand virtual
-	 * generated columns for any relation that contains them.
+	 * the inh flag for any relation that has no children, collect not-null
+	 * attribute numbers for any relation that has column not-null
+	 * constraints, and expand virtual generated columns for any relation that
+	 * contains them.
 	 */
 	subquery = subroot->parse = preprocess_relation_rtes(subroot);
 
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index 26a3e050086..5a64418488b 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -36,6 +36,7 @@
 #include "optimizer/clauses.h"
 #include "optimizer/cost.h"
 #include "optimizer/optimizer.h"
+#include "optimizer/pathnode.h"
 #include "optimizer/plancat.h"
 #include "optimizer/planmain.h"
 #include "parser/analyze.h"
@@ -43,6 +44,7 @@
 #include "parser/parse_collate.h"
 #include "parser/parse_func.h"
 #include "parser/parse_oper.h"
+#include "parser/parsetree.h"
 #include "rewrite/rewriteHandler.h"
 #include "rewrite/rewriteManip.h"
 #include "tcop/tcopprot.h"
@@ -2242,7 +2244,8 @@ rowtype_field_matches(Oid rowtypeid, int fieldnum,
  * only operators and functions that are reasonable to try to execute.
  *
  * NOTE: "root" can be passed as NULL if the caller never wants to do any
- * Param substitutions nor receive info about inlined functions.
+ * Param substitutions nor receive info about inlined functions nor reduce
+ * NullTest for Vars to constant true or constant false.
  *
  * NOTE: the planner assumes that this will always flatten nested AND and
  * OR clauses into N-argument form.  See comments in prepqual.c.
@@ -3537,6 +3540,31 @@ eval_const_expressions_mutator(Node *node,
 
 					return makeBoolConst(result, false);
 				}
+				if (!ntest->argisrow && arg && IsA(arg, Var) && context->root)
+				{
+					Var		   *varg = (Var *) arg;
+					bool		result;
+
+					if (var_is_nonnullable(context->root, varg, false))
+					{
+						switch (ntest->nulltesttype)
+						{
+							case IS_NULL:
+								result = false;
+								break;
+							case IS_NOT_NULL:
+								result = true;
+								break;
+							default:
+								elog(ERROR, "unrecognized nulltesttype: %d",
+									 (int) ntest->nulltesttype);
+								result = false; /* keep compiler quiet */
+								break;
+						}
+
+						return makeBoolConst(result, false);
+					}
+				}
 
 				newntest = makeNode(NullTest);
 				newntest->arg = (Expr *) arg;
@@ -4155,6 +4183,67 @@ simplify_function(Oid funcid, Oid result_type, int32 result_typmod,
 	return newexpr;
 }
 
+/*
+ * var_is_nonnullable: check to see if the Var cannot be NULL
+ *
+ * If the Var is defined NOT NULL and meanwhile is not nulled by any outer
+ * joins or grouping sets, then we can know that it cannot be NULL.
+ *
+ * use_rel_info indicates whether the corresponding RelOptInfo is available for
+ * use.
+ */
+bool
+var_is_nonnullable(PlannerInfo *root, Var *var, bool use_rel_info)
+{
+	Relids		notnullattnums = NULL;
+
+	Assert(IsA(var, Var));
+
+	/* skip upper-level Vars */
+	if (var->varlevelsup != 0)
+		return false;
+
+	/* could the Var be nulled by any outer joins or grouping sets? */
+	if (!bms_is_empty(var->varnullingrels))
+		return false;
+
+	/* system columns cannot be NULL */
+	if (var->varattno < 0)
+		return true;
+
+	/*
+	 * Check if the Var is defined as NOT NULL.  We retrieve the column NOT
+	 * NULL constraint information from the corresponding RelOptInfo if it is
+	 * available; otherwise, we search the hash table for this information.
+	 */
+	if (use_rel_info)
+	{
+		RelOptInfo *rel = find_base_rel(root, var->varno);
+
+		notnullattnums = rel->notnullattnums;
+	}
+	else
+	{
+		RangeTblEntry *rte = planner_rt_fetch(var->varno, root);
+
+		/*
+		 * We must skip inheritance parent tables, as some child tables may
+		 * have a NOT NULL constraint for a column while others may not.  This
+		 * cannot happen with partitioned tables, though.
+		 */
+		if (rte->inh && rte->relkind != RELKIND_PARTITIONED_TABLE)
+			return false;
+
+		notnullattnums = get_relation_notnullatts(root, rte->relid);
+	}
+
+	if (var->varattno > 0 &&
+		bms_is_member(var->varattno, notnullattnums))
+		return true;
+
+	return false;
+}
+
 /*
  * expand_function_arguments: convert named-notation args to positional args
  * and/or insert default args, as needed
diff --git a/src/backend/optimizer/util/inherit.c b/src/backend/optimizer/util/inherit.c
index 17e51cd75d7..4d4ef65d2a4 100644
--- a/src/backend/optimizer/util/inherit.c
+++ b/src/backend/optimizer/util/inherit.c
@@ -466,8 +466,7 @@ expand_single_inheritance_child(PlannerInfo *root, RangeTblEntry *parentrte,
 								Index *childRTindex_p)
 {
 	Query	   *parse = root->parse;
-	Oid			parentOID PG_USED_FOR_ASSERTS_ONLY =
-		RelationGetRelid(parentrel);
+	Oid			parentOID = RelationGetRelid(parentrel);
 	Oid			childOID = RelationGetRelid(childrel);
 	RangeTblEntry *childrte;
 	Index		childRTindex;
@@ -513,6 +512,13 @@ expand_single_inheritance_child(PlannerInfo *root, RangeTblEntry *parentrte,
 	*childrte_p = childrte;
 	*childRTindex_p = childRTindex;
 
+	/*
+	 * Retrieve column not-null constraint information for the child relation
+	 * if its relation OID is different from the parent's.
+	 */
+	if (childOID != parentOID)
+		collect_relation_notnullatts(root, childrel);
+
 	/*
 	 * Build an AppendRelInfo struct for each parent/child pair.
 	 */
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 59233b64730..4011cd5435b 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -59,6 +59,12 @@ int			constraint_exclusion = CONSTRAINT_EXCLUSION_PARTITION;
 /* Hook for plugins to get control in get_relation_info() */
 get_relation_info_hook_type get_relation_info_hook = NULL;
 
+typedef struct NotnullHashEntry
+{
+	Oid			relid;			/* OID of the relation */
+	Relids		notnullattnums; /* attnums of NOT NULL columns */
+} NotnullHashEntry;
+
 
 static void get_relation_foreign_keys(PlannerInfo *root, RelOptInfo *rel,
 									  Relation relation, bool inhparent);
@@ -172,27 +178,7 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 	 * RangeTblEntry does get populated.
 	 */
 	if (!inhparent || relation->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
-	{
-		for (int i = 0; i < relation->rd_att->natts; i++)
-		{
-			CompactAttribute *attr = TupleDescCompactAttr(relation->rd_att, i);
-
-			Assert(attr->attnullability != ATTNULLABLE_UNKNOWN);
-
-			if (attr->attnullability == ATTNULLABLE_VALID)
-			{
-				rel->notnullattnums = bms_add_member(rel->notnullattnums,
-													 i + 1);
-
-				/*
-				 * Per RemoveAttributeById(), dropped columns will have their
-				 * attnotnull unset, so we needn't check for dropped columns
-				 * in the above condition.
-				 */
-				Assert(!attr->attisdropped);
-			}
-		}
-	}
+		rel->notnullattnums = get_relation_notnullatts(root, relationObjectId);
 
 	/*
 	 * Estimate relation size --- unless it's an inheritance parent, in which
@@ -683,6 +669,105 @@ get_relation_foreign_keys(PlannerInfo *root, RelOptInfo *rel,
 	}
 }
 
+/*
+ * collect_relation_notnullatts -
+ *	  Retrieves column not-null constraint information for a given relation.
+ *
+ * We do this while we have the relcache entry open, and store the column
+ * not-null constraint information in a hash table based on the relation OID.
+ */
+void
+collect_relation_notnullatts(PlannerInfo *root, Relation relation)
+{
+	Oid			relid = RelationGetRelid(relation);
+	NotnullHashEntry *hentry;
+	bool		found;
+	Relids		notnullattnums = NULL;
+
+	/* bail out if the relation has no not-null constraints */
+	if (relation->rd_att->constr == NULL ||
+		!relation->rd_att->constr->has_not_null)
+		return;
+
+	/* create the hash table if it hasn't been created yet */
+	if (root->glob->rel_notnullatts_hash == NULL)
+	{
+		HTAB	   *hashtab;
+		HASHCTL		hash_ctl;
+
+		hash_ctl.keysize = sizeof(Oid);
+		hash_ctl.entrysize = sizeof(NotnullHashEntry);
+		hash_ctl.hcxt = CurrentMemoryContext;
+
+		hashtab = hash_create("Relation NOT NULL attnums",
+							  64L,	/* arbitrary initial size */
+							  &hash_ctl,
+							  HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+
+		root->glob->rel_notnullatts_hash = hashtab;
+	}
+
+	/*
+	 * Create a hash entry for this relation OID, if we don't have one
+	 * already.
+	 */
+	hentry = (NotnullHashEntry *) hash_search(root->glob->rel_notnullatts_hash,
+											  &relid,
+											  HASH_ENTER,
+											  &found);
+
+	/* bail out if a hash entry already exists for this relation OID */
+	if (found)
+		return;
+
+	/* collect the column not-null constraint information for this relation */
+	for (int i = 0; i < relation->rd_att->natts; i++)
+	{
+		CompactAttribute *attr = TupleDescCompactAttr(relation->rd_att, i);
+
+		Assert(attr->attnullability != ATTNULLABLE_UNKNOWN);
+
+		if (attr->attnullability == ATTNULLABLE_VALID)
+		{
+			notnullattnums = bms_add_member(notnullattnums, i + 1);
+
+			/*
+			 * Per RemoveAttributeById(), dropped columns will have their
+			 * attnotnull unset, so we needn't check for dropped columns in
+			 * the above condition.
+			 */
+			Assert(!attr->attisdropped);
+		}
+	}
+
+	/* ... and initialize the new hash entry */
+	hentry->notnullattnums = notnullattnums;
+}
+
+/*
+ * get_relation_notnullatts -
+ *	  Searches the hash table and returns the column not-null constraint
+ *	  information for a given relation.
+ */
+Relids
+get_relation_notnullatts(PlannerInfo *root, Oid relid)
+{
+	NotnullHashEntry *hentry;
+	bool		found;
+
+	if (root->glob->rel_notnullatts_hash == NULL)
+		return NULL;
+
+	hentry = (NotnullHashEntry *) hash_search(root->glob->rel_notnullatts_hash,
+											  &relid,
+											  HASH_FIND,
+											  &found);
+	if (!found)
+		return NULL;
+
+	return hentry->notnullattnums;
+}
+
 /*
  * infer_arbiter_indexes -
  *	  Determine the unique indexes used to arbitrate speculative insertion.
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 011e5a811c3..023185d4902 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -182,6 +182,9 @@ typedef struct PlannerGlobal
 
 	/* partition descriptors */
 	PartitionDirectory partition_directory pg_node_attr(read_write_ignore);
+
+	/* hash table for NOT NULL attnums of relations */
+	struct HTAB *rel_notnullatts_hash pg_node_attr(read_write_ignore);
 } PlannerGlobal;
 
 /* macro for fetching the Plan associated with a SubPlan node */
@@ -722,6 +725,9 @@ typedef struct PartitionSchemeData *PartitionScheme;
  *				the attribute is needed as part of final targetlist
  *		attr_widths - cache space for per-attribute width estimates;
  *					  zero means not computed yet
+ *		notnullattnums - zero-based set containing attnums of NOT NULL
+ *						 columns (not populated for rels corresponding to
+ *						 non-partitioned inh==true RTEs)
  *		nulling_relids - relids of outer joins that can null this rel
  *		lateral_vars - lateral cross-references of rel, if any (list of
  *					   Vars and PlaceHolderVars)
@@ -955,11 +961,7 @@ typedef struct RelOptInfo
 	Relids	   *attr_needed pg_node_attr(read_write_ignore);
 	/* array indexed [min_attr .. max_attr] */
 	int32	   *attr_widths pg_node_attr(read_write_ignore);
-
-	/*
-	 * Zero-based set containing attnums of NOT NULL columns.  Not populated
-	 * for rels corresponding to non-partitioned inh==true RTEs.
-	 */
+	/* zero-based set containing attnums of NOT NULL columns */
 	Bitmapset  *notnullattnums;
 	/* relids of outer joins that can null this baserel */
 	Relids		nulling_relids;
diff --git a/src/include/optimizer/optimizer.h b/src/include/optimizer/optimizer.h
index 546828b54bd..37bc13c2cbd 100644
--- a/src/include/optimizer/optimizer.h
+++ b/src/include/optimizer/optimizer.h
@@ -154,6 +154,8 @@ extern Node *estimate_expression_value(PlannerInfo *root, Node *node);
 extern Expr *evaluate_expr(Expr *expr, Oid result_type, int32 result_typmod,
 						   Oid result_collation);
 
+extern bool var_is_nonnullable(PlannerInfo *root, Var *var, bool use_rel_info);
+
 extern List *expand_function_arguments(List *args, bool include_out_arguments,
 									   Oid result_type,
 									   struct HeapTupleData *func_tuple);
diff --git a/src/include/optimizer/plancat.h b/src/include/optimizer/plancat.h
index cd74e4b1e8b..f22cdabbb3b 100644
--- a/src/include/optimizer/plancat.h
+++ b/src/include/optimizer/plancat.h
@@ -28,6 +28,10 @@ extern PGDLLIMPORT get_relation_info_hook_type get_relation_info_hook;
 extern void get_relation_info(PlannerInfo *root, Oid relationObjectId,
 							  bool inhparent, RelOptInfo *rel);
 
+extern void collect_relation_notnullatts(PlannerInfo *root, Relation relation);
+
+extern Relids get_relation_notnullatts(PlannerInfo *root, Oid relid);
+
 extern List *infer_arbiter_indexes(PlannerInfo *root);
 
 extern void estimate_rel_size(Relation rel, int32 *attr_widths,
diff --git a/src/test/regress/expected/generated_virtual.out b/src/test/regress/expected/generated_virtual.out
index b766ccb1dc2..47c049998cd 100644
--- a/src/test/regress/expected/generated_virtual.out
+++ b/src/test/regress/expected/generated_virtual.out
@@ -1531,11 +1531,11 @@ where coalesce(t2.b, 1) = 2;
 explain (costs off)
 select t1.a from gtest32 t1 left join gtest32 t2 on t1.a = t2.a
 where coalesce(t2.b, 1) = 2 or t1.a is null;
-                         QUERY PLAN                          
--------------------------------------------------------------
+               QUERY PLAN                
+-----------------------------------------
  Hash Left Join
    Hash Cond: (t1.a = t2.a)
-   Filter: ((COALESCE((t2.a * 2), 1) = 2) OR (t1.a IS NULL))
+   Filter: (COALESCE((t2.a * 2), 1) = 2)
    ->  Seq Scan on gtest32 t1
    ->  Hash
          ->  Seq Scan on gtest32 t2
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index f35a0b18c37..632ae2afe7c 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -3639,8 +3639,8 @@ from nt3 as nt3
     ) as ss2
     on ss2.id = nt3.nt2_id
 where nt3.id = 1 and ss2.b3;
-                  QUERY PLAN                   
------------------------------------------------
+                  QUERY PLAN                  
+----------------------------------------------
  Nested Loop
    ->  Nested Loop
          ->  Index Scan using nt3_pkey on nt3
@@ -3649,7 +3649,7 @@ where nt3.id = 1 and ss2.b3;
                Index Cond: (id = nt3.nt2_id)
    ->  Index Only Scan using nt1_pkey on nt1
          Index Cond: (id = nt2.nt1_id)
-         Filter: (nt2.b1 AND (id IS NOT NULL))
+         Filter: (nt2.b1 AND true)
 (9 rows)
 
 select nt3.id
diff --git a/src/test/regress/expected/predicate.out b/src/test/regress/expected/predicate.out
index b79037748b7..59bfe33bb1c 100644
--- a/src/test/regress/expected/predicate.out
+++ b/src/test/regress/expected/predicate.out
@@ -84,10 +84,10 @@ SELECT * FROM pred_tab t WHERE t.a IS NULL OR t.c IS NULL;
 -- are provably false
 EXPLAIN (COSTS OFF)
 SELECT * FROM pred_tab t WHERE t.b IS NULL OR t.c IS NULL;
-               QUERY PLAN               
-----------------------------------------
+       QUERY PLAN       
+------------------------
  Seq Scan on pred_tab t
-   Filter: ((b IS NULL) OR (c IS NULL))
+   Filter: (b IS NULL)
 (2 rows)
 
 --
@@ -231,6 +231,54 @@ SELECT * FROM pred_tab t1
          ->  Seq Scan on pred_tab t3
 (9 rows)
 
+--
+-- Tests for NullTest reduction in EXISTS sublink
+--
+-- Ensure the IS_NOT_NULL qual is ignored
+EXPLAIN (COSTS OFF)
+SELECT * FROM pred_tab t1
+    LEFT JOIN pred_tab t2 ON EXISTS
+        (SELECT 1 FROM pred_tab t3, pred_tab t4, pred_tab t5, pred_tab t6
+         WHERE t1.a = t3.a AND t6.a IS NOT NULL);
+                       QUERY PLAN                        
+---------------------------------------------------------
+ Nested Loop Left Join
+   Join Filter: EXISTS(SubPlan 1)
+   ->  Seq Scan on pred_tab t1
+   ->  Materialize
+         ->  Seq Scan on pred_tab t2
+   SubPlan 1
+     ->  Nested Loop
+           ->  Nested Loop
+                 ->  Nested Loop
+                       ->  Seq Scan on pred_tab t4
+                       ->  Materialize
+                             ->  Seq Scan on pred_tab t3
+                                   Filter: (t1.a = a)
+                 ->  Materialize
+                       ->  Seq Scan on pred_tab t5
+           ->  Materialize
+                 ->  Seq Scan on pred_tab t6
+(17 rows)
+
+-- Ensure the IS_NULL qual is reduced to constant-FALSE
+EXPLAIN (COSTS OFF)
+SELECT * FROM pred_tab t1
+    LEFT JOIN pred_tab t2 ON EXISTS
+        (SELECT 1 FROM pred_tab t3, pred_tab t4, pred_tab t5, pred_tab t6
+         WHERE t1.a = t3.a AND t6.a IS NULL);
+             QUERY PLAN              
+-------------------------------------
+ Nested Loop Left Join
+   Join Filter: (InitPlan 1).col1
+   InitPlan 1
+     ->  Result
+           One-Time Filter: false
+   ->  Seq Scan on pred_tab t1
+   ->  Materialize
+         ->  Seq Scan on pred_tab t2
+(8 rows)
+
 DROP TABLE pred_tab;
 -- Validate we handle IS NULL and IS NOT NULL quals correctly with inheritance
 -- parents.
diff --git a/src/test/regress/sql/predicate.sql b/src/test/regress/sql/predicate.sql
index 9dcb81b1bc5..d92277353a0 100644
--- a/src/test/regress/sql/predicate.sql
+++ b/src/test/regress/sql/predicate.sql
@@ -115,6 +115,24 @@ SELECT * FROM pred_tab t1
     LEFT JOIN pred_tab t2 ON t1.a = 1
     LEFT JOIN pred_tab t3 ON t2.a IS NULL OR t2.c IS NULL;
 
+--
+-- Tests for NullTest reduction in EXISTS sublink
+--
+
+-- Ensure the IS_NOT_NULL qual is ignored
+EXPLAIN (COSTS OFF)
+SELECT * FROM pred_tab t1
+    LEFT JOIN pred_tab t2 ON EXISTS
+        (SELECT 1 FROM pred_tab t3, pred_tab t4, pred_tab t5, pred_tab t6
+         WHERE t1.a = t3.a AND t6.a IS NOT NULL);
+
+-- Ensure the IS_NULL qual is reduced to constant-FALSE
+EXPLAIN (COSTS OFF)
+SELECT * FROM pred_tab t1
+    LEFT JOIN pred_tab t2 ON EXISTS
+        (SELECT 1 FROM pred_tab t3, pred_tab t4, pred_tab t5, pred_tab t6
+         WHERE t1.a = t3.a AND t6.a IS NULL);
+
 DROP TABLE pred_tab;
 
 -- Validate we handle IS NULL and IS NOT NULL quals correctly with inheritance
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index e5879e00dff..90cfa41cc3f 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1753,6 +1753,7 @@ NonEmptyRange
 Notification
 NotificationList
 NotifyStmt
+NotnullHashEntry
 Nsrt
 NtDllRoutine
 NtFlushBuffersFileEx_t
-- 
2.43.0

#27

Chengpeng Yan

chengpeng_yan@Outlook.com

8 months ago

In reply to: Richard Guo (#26)

Re: Reduce "Var IS [NOT] NULL" quals during constant folding

On May 1, 2025, at 16:33, Richard Guo <guofenglinux@gmail.com> wrote:

Here is the patchset that implements this optimization. 0001 moves
the expansion of virtual generated columns to occur before sublink
pull-up. 0002 introduces a new function, preprocess_relation_rtes,
which scans the rangetable for relation RTEs and performs inh flag
updates and virtual generated column expansion in a single loop, so
that only one table_open/table_close call is required for each
relation. 0003 collects NOT NULL attribute information for each
relation within the same loop, stores it in a relation OID based hash
table, and uses this information to reduce NullTest quals during
constant folding.

I think the code now more closely resembles the phase 1 and phase 2
described earlier: it collects all required early-stage catalog
information within a single loop over the rangetable, allowing each
relation to be opened and closed only once. It also avoids the
has_subclass() call along the way.

Thanks
Richard
<v4-0001-Expand-virtual-generated-columns-before-sublink-p.patch><v4-0002-Centralize-collection-of-catalog-info-needed-earl.patch><v4-0003-Reduce-Var-IS-NOT-NULL-quals-during-constant-fold.patch>

Hi,

I've been following the V4 patches (focusing on 1 and 2 for now): Patch 2's preprocess_relation_rtes is a nice improvement for efficiently gathering early catalog info like inh and attgenerated definitions in one pass.

However, Patch 1 needs to add expansion calls inside specific pull-up functions (like convert_EXISTS_sublink_to_join) because the main expansion work was moved before pull_up_sublinks.

Could we perhaps simplify this? What if preprocess_relation_rtes only collected the attgenerated definitions (storing them, maybe in a hashtable like planned for attnotnull in Patch 3), but didn't perform the actual expansion (Var replacement)?

Then, we could perform the actual expansion (Var replacement) in a separate, single, global step later on. Perhaps after pull_up_sublinks (closer to the original timing), or maybe even later still, for instance after flatten_simple_union_all, once the main query structure including pulled-up subqueries/links has stabilized? A unified expansion after the major structural changes seems cleaner. I'm not sure which of these positions is better at this point.

This might avoid the need for the extra expansion calls within convert_EXISTS_sublink_to_join, etc., keeping the information gathering separate from the expression transformation and potentially making the overall flow a bit cleaner.

Any thoughts?

Thanks,

Chengpeng Yan

#28

Richard Guo

guofenglinux@gmail.com

8 months ago

In reply to: Chengpeng Yan (#27)

Re: Reduce "Var IS [NOT] NULL" quals during constant folding

On Sat, May 3, 2025 at 7:48 PM Chengpeng Yan <chengpeng_yan@outlook.com> wrote:

I've been following the V4 patches (focusing on 1 and 2 for now): Patch 2's preprocess_relation_rtes is a nice improvement for efficiently gathering early catalog info like inh and attgenerated definitions in one pass.

However, Patch 1 needs to add expansion calls inside specific pull-up functions (like convert_EXISTS_sublink_to_join) because the main expansion work was moved before pull_up_sublinks.

Could we perhaps simplify this? What if preprocess_relation_rtes only collected the attgenerated definitions (storing them, maybe in a hashtable like planned for attnotnull in Patch 3), but didn't perform the actual expansion (Var replacement)?

Then, we could perform the actual expansion (Var replacement) in a separate, single, global step later on. Perhaps after pull_up_sublinks (closer to the original timing), or maybe even later still, for instance after flatten_simple_union_all, once the main query structure including pulled-up subqueries/links has stabilized? A unified expansion after the major structural changes seems cleaner. I'm not sure which of these positions is better at this point.

This might avoid the need for the extra expansion calls within convert_EXISTS_sublink_to_join, etc., keeping the information gathering separate from the expression transformation and potentially making the overall flow a bit cleaner.

Any thoughts?

This approach is possible, but I chose not to go that route because 1)
it would require an additional loop over the rangetable; 2) it would
involve collecting and storing in a hash table a lot more information
that is only used during the expansion of virtual generated columns.
This includes not only the attgenerated attributes of columns you
mentioned, but also the default values of columns and the total number
of attributes in the tuple.

Therefore, it seems to me that expanding the virtual generated columns
within the same loop is cleaner and more efficient.
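
To make the single-loop argument concrete, here is a toy model of one pass over the rangetable that opens each relation once and performs all three early tasks while it is open. This is a sketch only: ToyRel, ToyRTE and toy_preprocess_relation_rtes are hypothetical names, open_count stands in for table_open/table_close, and the actual Var replacement is reduced to a needs_expansion flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an open relcache entry. */
typedef struct ToyRel
{
    bool        has_children;        /* does the relation have inheritance children? */
    bool        has_virtual_gencols; /* any virtual generated columns? */
    unsigned    notnull_mask;        /* bit N set: attribute N is NOT NULL */
    int         open_count;          /* how many times the relation was opened */
} ToyRel;

/* Hypothetical stand-in for a relation RangeTblEntry. */
typedef struct ToyRTE
{
    ToyRel     *rel;
    bool        inh;                 /* inheritance expansion requested */
    bool        needs_expansion;     /* virtual generated columns to expand */
    unsigned    notnull_mask;        /* NOT NULL info collected for later folding */
} ToyRTE;

/*
 * One loop over the rangetable: each relation is opened once, and while it
 * is open we clear the inh flag if it has no children, record its NOT NULL
 * attributes, and note whether virtual generated columns must be expanded.
 */
static void
toy_preprocess_relation_rtes(ToyRTE *rtes, size_t nrtes)
{
    assert(rtes != NULL || nrtes == 0);

    for (size_t i = 0; i < nrtes; i++)
    {
        ToyRel     *rel = rtes[i].rel;

        rel->open_count++;      /* models the single table_open() */

        if (rtes[i].inh && !rel->has_children)
            rtes[i].inh = false;

        rtes[i].notnull_mask = rel->notnull_mask;
        rtes[i].needs_expansion = rel->has_virtual_gencols;

        /* the matching table_close() would go here */
    }
}
```

Splitting expansion into a later step would mean either a second open of each relation or stashing everything the expansion needs (generation expressions, defaults, attribute counts) between the two loops, which is the extra cost described above.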

Please note that even if we move the expansion of virtual generated
columns into a separate loop, it still needs to occur before subquery
pull-up. This is because we must ensure that RTE_RELATION RTEs do not
have lateral markers. In other words, the expansion still needs to
take place within the subquery pull-up function.

Thanks
Richard

#29

Robert Haas

robertmhaas@gmail.com

8 months ago

In reply to: Richard Guo (#26)

Re: Reduce "Var IS [NOT] NULL" quals during constant folding

On Thu, May 1, 2025 at 4:33 AM Richard Guo <guofenglinux@gmail.com> wrote:

Here is the patchset that implements this optimization. 0001 moves
the expansion of virtual generated columns to occur before sublink
pull-up. 0002 introduces a new function, preprocess_relation_rtes,
which scans the rangetable for relation RTEs and performs inh flag
updates and virtual generated column expansion in a single loop, so
that only one table_open/table_close call is required for each
relation. 0003 collects NOT NULL attribute information for each
relation within the same loop, stores it in a relation OID based hash
table, and uses this information to reduce NullTest quals during
constant folding.

I think the code now more closely resembles the phase 1 and phase 2
described earlier: it collects all required early-stage catalog
information within a single loop over the rangetable, allowing each
relation to be opened and closed only once. It also avoids the
has_subclass() call along the way.

Before we commit to something along these lines, I think we need to
understand whether Tom intends to press Peter for some bigger change
around expand_virtual_generated_columns.

If Tom doesn't respond right away, I suggest that we need to add an
open item for /messages/by-id/602561.1744314879@sss.pgh.pa.us

--
Robert Haas
EDB: http://www.enterprisedb.com

#30

Tom Lane

tgl@sss.pgh.pa.us

8 months ago

In reply to: Robert Haas (#29)

Re: Reduce "Var IS [NOT] NULL" quals during constant folding

Robert Haas <robertmhaas@gmail.com> writes:

Before we commit to something along these lines, I think we need to
understand whether Tom intends to press Peter for some bigger change
around expand_virtual_generated_columns.
If Tom doesn't respond right away, I suggest that we need to add an
open item for /messages/by-id/602561.1744314879@sss.pgh.pa.us

I think that we do need to do something about that, but it may be
in the too-late-for-v18 category by now. Not sure. I definitely
don't love the idea of table_open'ing every table in every query
an extra time just to find out (about 99.44% of the time) that it
does not have any virtual generated columns.

I wonder if a better answer would be to make the rewriter responsible
for this. If you hold your head at the correct angle, a table with
virtual generated columns looks a good deal like a view, and we don't
ask the planner to handle those.

BTW, in my mind the current thread is certainly v19 material,
so I have not looked at Richard's patch yet.

regards, tom lane

#31

Richard Guo

guofenglinux@gmail.com

8 months ago

In reply to: Tom Lane (#30)

Re: Reduce "Var IS [NOT] NULL" quals during constant folding

On Thu, May 22, 2025 at 11:51 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

I wonder if a better answer would be to make the rewriter responsible
for this. If you hold your head at the correct angle, a table with
virtual generated columns looks a good deal like a view, and we don't
ask the planner to handle those.

In Peter's initial commit (83ea6c540), it was the rewriter that was
responsible for expanding virtual generated columns. However, this
approach introduced several problems (see the reports starting from
[1]). In some cases we cannot simply replace Vars that reference
virtual columns with their corresponding generation expressions. To
preserve correctness, we may need to wrap those expressions in
PlaceHolderVars, for example when the Vars come from the nullable
side of an outer join or are used in grouping sets.

So in commit 1e4351af3, Dean and I proposed moving the expansion of
virtual generated columns into the planner, so that we can insert
PlaceHolderVars when needed.

Yeah, the extra table_open call is annoying. In this patchset, we're
performing some additional tasks while the relation is open — such as
retrieving relhassubclass and attnotnull information. We also get rid
of the has_subclass() call along the way. Maybe this would help
justify the added cost?

[1]: /messages/by-id/75eb1a6f-d59f-42e6-8a78-124ee808cda7@gmail.com

Thanks
Richard

#32

Richard Guo

guofenglinux@gmail.com

8 months ago

In reply to: Tom Lane (#30)

3 attachment(s)

Re: Reduce "Var IS [NOT] NULL" quals during constant folding

On Thu, May 22, 2025 at 11:51 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

BTW, in my mind the current thread is certainly v19 material,
so I have not looked at Richard's patch yet.

Yeah, this patchset is targeted for v19. Maybe we could be more
aggressive and have 0001 and 0002 in v18? (no chance for 0003 though)

This patchset does not apply anymore due to 2c0ed86d3. Here is a new
rebase.

Thanks
Richard

Attachments:

v5-0001-Expand-virtual-generated-columns-before-sublink-p.patch (application/octet-stream)

From 59f5a51626ac8767ad106fda4ae4e7be9b2e62ff Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Wed, 23 Apr 2025 10:29:15 +0900
Subject: [PATCH v5 1/3] Expand virtual generated columns before sublink
 pull-up

Currently, we expand virtual generated columns after we have pulled up
any SubLinks within the query's quals.  This ensures that the virtual
generated column references within SubLinks that should be transformed
into joins are correctly expanded.  This approach works well and has
posed no issues.

In an upcoming patch, we plan to centralize the collection of catalog
information needed early in the planner.  This will help avoid
repeated table_open/table_close calls for relations in the rangetable.
Since this information is required during sublink pull-up, we are
moving the expansion of virtual generated columns to occur beforehand.

To achieve this, if any EXISTS SubLinks can be pulled up, their
rangetables are processed just before pulling them up.
---
 src/backend/optimizer/plan/planner.c          | 17 +++++++-------
 src/backend/optimizer/plan/subselect.c        | 16 ++++++++++++++
 src/backend/optimizer/prep/prepjointree.c     | 20 +++++++----------
 src/include/optimizer/prep.h                  |  2 +-
 .../regress/expected/generated_virtual.out    | 22 +++++++++++++++++++
 src/test/regress/sql/generated_virtual.sql    |  9 ++++++++
 6 files changed, 65 insertions(+), 21 deletions(-)

diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index ff65867eebe..d9c2ac374cf 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -720,6 +720,15 @@ subquery_planner(PlannerGlobal *glob, Query *parse, PlannerInfo *parent_root,
 	 */
 	transform_MERGE_to_join(parse);
 
+	/*
+	 * Scan the rangetable for relations with virtual generated columns, and
+	 * replace all Var nodes in the query that reference these columns with
+	 * the generation expressions.  Note that this step does not descend into
+	 * sublinks and subqueries; if we pull up any sublinks or subqueries
+	 * below, their rangetables are processed just before pulling them up.
+	 */
+	parse = root->parse = expand_virtual_generated_columns(root);
+
 	/*
 	 * If the FROM clause is empty, replace it with a dummy RTE_RESULT RTE, so
 	 * that we don't need so many special cases to deal with that situation.
@@ -743,14 +752,6 @@ subquery_planner(PlannerGlobal *glob, Query *parse, PlannerInfo *parent_root,
 	 */
 	preprocess_function_rtes(root);
 
-	/*
-	 * Scan the rangetable for relations with virtual generated columns, and
-	 * replace all Var nodes in the query that reference these columns with
-	 * the generation expressions.  Recursion issues here are handled in the
-	 * same way as for SubLinks.
-	 */
-	parse = root->parse = expand_virtual_generated_columns(root);
-
 	/*
 	 * Check to see if any subqueries in the jointree can be merged into this
 	 * query.
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index e7cb3fede66..89e6873da08 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -1458,6 +1458,7 @@ convert_EXISTS_sublink_to_join(PlannerInfo *root, SubLink *sublink,
 	int			varno;
 	Relids		clause_varnos;
 	Relids		upper_varnos;
+	PlannerInfo subroot;
 
 	Assert(sublink->subLinkType == EXISTS_SUBLINK);
 
@@ -1487,6 +1488,21 @@ convert_EXISTS_sublink_to_join(PlannerInfo *root, SubLink *sublink,
 	if (!simplify_EXISTS_query(root, subselect))
 		return NULL;
 
+	/*
+	 * Scan the rangetable for relations with virtual generated columns, and
+	 * replace all Var nodes in the subquery that reference these columns with
+	 * the generation expressions.
+	 *
+	 * Note: we construct an entirely dummy PlannerInfo for use here.  This
+	 * is fine because only the "glob" and "parse" links will be used in this
+	 * case.
+	 */
+	MemSet(&subroot, 0, sizeof(subroot));
+	subroot.type = T_PlannerInfo;
+	subroot.glob = root->glob;
+	subroot.parse = subselect;
+	subselect = expand_virtual_generated_columns(&subroot);
+
 	/*
 	 * Separate out the WHERE clause.  (We could theoretically also remove
 	 * top-level plain JOIN/ON clauses, but it's probably not worth the
diff --git a/src/backend/optimizer/prep/prepjointree.c b/src/backend/optimizer/prep/prepjointree.c
index 87dc6f56b57..8140d22de70 100644
--- a/src/backend/optimizer/prep/prepjointree.c
+++ b/src/backend/optimizer/prep/prepjointree.c
@@ -4,10 +4,10 @@
  *	  Planner preprocessing for subqueries and join tree manipulation.
  *
  * NOTE: the intended sequence for invoking these operations is
+ *		expand_virtual_generated_columns
  *		replace_empty_jointree
  *		pull_up_sublinks
  *		preprocess_function_rtes
- *		expand_virtual_generated_columns
  *		pull_up_subqueries
  *		flatten_simple_union_all
  *		do expression preprocessing (including flattening JOIN alias vars)
@@ -958,10 +958,6 @@ preprocess_function_rtes(PlannerInfo *root)
  * generation expressions.  Note that we do not descend into subqueries; that
  * is taken care of when the subqueries are planned.
  *
- * This has to be done after we have pulled up any SubLinks within the query's
- * quals; otherwise any virtual generated column references within the SubLinks
- * that should be transformed into joins wouldn't get expanded.
- *
  * Returns a modified copy of the query tree, if any relations with virtual
  * generated columns are present.
  */
@@ -1333,6 +1329,13 @@ pull_up_simple_subquery(PlannerInfo *root, Node *jtnode, RangeTblEntry *rte,
 	/* No CTEs to worry about */
 	Assert(subquery->cteList == NIL);
 
+	/*
+	 * Scan the rangetable for relations with virtual generated columns, and
+	 * replace all Var nodes in the subquery that reference these columns with
+	 * the generation expressions.
+	 */
+	subquery = subroot->parse = expand_virtual_generated_columns(subroot);
+
 	/*
 	 * If the FROM clause is empty, replace it with a dummy RTE_RESULT RTE, so
 	 * that we don't need so many special cases to deal with that situation.
@@ -1352,13 +1355,6 @@ pull_up_simple_subquery(PlannerInfo *root, Node *jtnode, RangeTblEntry *rte,
 	 */
 	preprocess_function_rtes(subroot);
 
-	/*
-	 * Scan the rangetable for relations with virtual generated columns, and
-	 * replace all Var nodes in the query that reference these columns with
-	 * the generation expressions.
-	 */
-	subquery = subroot->parse = expand_virtual_generated_columns(subroot);
-
 	/*
 	 * Recursively pull up the subquery's subqueries, so that
 	 * pull_up_subqueries' processing is complete for its jointree and
diff --git a/src/include/optimizer/prep.h b/src/include/optimizer/prep.h
index df56202777c..ceb731bcf5e 100644
--- a/src/include/optimizer/prep.h
+++ b/src/include/optimizer/prep.h
@@ -22,10 +22,10 @@
  * prototypes for prepjointree.c
  */
 extern void transform_MERGE_to_join(Query *parse);
+extern Query *expand_virtual_generated_columns(PlannerInfo *root);
 extern void replace_empty_jointree(Query *parse);
 extern void pull_up_sublinks(PlannerInfo *root);
 extern void preprocess_function_rtes(PlannerInfo *root);
-extern Query *expand_virtual_generated_columns(PlannerInfo *root);
 extern void pull_up_subqueries(PlannerInfo *root);
 extern void flatten_simple_union_all(PlannerInfo *root);
 extern void reduce_outer_joins(PlannerInfo *root);
diff --git a/src/test/regress/expected/generated_virtual.out b/src/test/regress/expected/generated_virtual.out
index 6300e7c1d96..b766ccb1dc2 100644
--- a/src/test/regress/expected/generated_virtual.out
+++ b/src/test/regress/expected/generated_virtual.out
@@ -1591,4 +1591,26 @@ select * from gtest32 t group by grouping sets (a, b, c, d) having c = 20;
    |   | 20 |  
 (1 row)
 
+-- Ensure that virtual generated column references within SubLinks that should
+-- be transformed into joins can get expanded
+explain (costs off)
+select 1 from gtest32 t1 where exists
+  (select 1 from gtest32 t2 where t1.a > t2.a and t2.b = 2);
+             QUERY PLAN              
+-------------------------------------
+ Nested Loop Semi Join
+   Join Filter: (t1.a > t2.a)
+   ->  Seq Scan on gtest32 t1
+   ->  Materialize
+         ->  Seq Scan on gtest32 t2
+               Filter: ((a * 2) = 2)
+(6 rows)
+
+select 1 from gtest32 t1 where exists
+  (select 1 from gtest32 t2 where t1.a > t2.a and t2.b = 2);
+ ?column? 
+----------
+        1
+(1 row)
+
 drop table gtest32;
diff --git a/src/test/regress/sql/generated_virtual.sql b/src/test/regress/sql/generated_virtual.sql
index b4eedeee2fb..5dd68381e1c 100644
--- a/src/test/regress/sql/generated_virtual.sql
+++ b/src/test/regress/sql/generated_virtual.sql
@@ -832,4 +832,13 @@ explain (verbose, costs off)
 select * from gtest32 t group by grouping sets (a, b, c, d) having c = 20;
 select * from gtest32 t group by grouping sets (a, b, c, d) having c = 20;
 
+-- Ensure that virtual generated column references within SubLinks that should
+-- be transformed into joins can get expanded
+explain (costs off)
+select 1 from gtest32 t1 where exists
+  (select 1 from gtest32 t2 where t1.a > t2.a and t2.b = 2);
+
+select 1 from gtest32 t1 where exists
+  (select 1 from gtest32 t2 where t1.a > t2.a and t2.b = 2);
+
 drop table gtest32;
-- 
2.43.0

Attachment: v5-0002-Centralize-collection-of-catalog-info-needed-earl.patch (application/octet-stream)

From 0dafcb5e7c95526b96819a2610fe27593646fc07 Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Thu, 24 Apr 2025 14:58:03 +0900
Subject: [PATCH v5 2/3] Centralize collection of catalog info needed early in
 the planner

There are several pieces of catalog information that need to be
retrieved for a relation during the early stage of planning.  These
include relhassubclass, which is used to clear the inh flag if the
relation has no children, as well as a column's attgenerated and
default value, which are needed to expand virtual generated columns.
More such information may be required in the future.

Currently, these pieces of catalog data are collected in multiple
places, resulting in repeated table_open/table_close calls for each
relation in the rangetable.  This patch centralizes the collection of
all required early-stage catalog information into a single loop over
the rangetable, allowing each relation to be opened and closed only
once.
---
 src/backend/optimizer/plan/planner.c      |  31 +--
 src/backend/optimizer/plan/subselect.c    |   9 +-
 src/backend/optimizer/prep/prepjointree.c | 299 +++++++++++++---------
 src/include/optimizer/prep.h              |   2 +-
 4 files changed, 190 insertions(+), 151 deletions(-)
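
The header comment on preprocess_relation_rtes in this patch says the loop
must fetch each rangetable entry by index rather than iterate with foreach,
because expanding virtual generated columns can return a modified copy of the
query tree.  The following is only a toy model of that pattern, not planner
code: ToyRte, ToyQuery, toy_expand and toy_preprocess are hypothetical names,
and plain arrays stand in for PostgreSQL Lists.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for RangeTblEntry and Query; not PostgreSQL types. */
typedef struct ToyRte
{
    int         is_relation;    /* plays the role of rtekind == RTE_RELATION */
    int         has_virtual;    /* relation has a virtual generated column */
    int         visited;        /* set when the loop processes this entry */
} ToyRte;

typedef struct ToyQuery
{
    int         nrtes;
    ToyRte     *rtable;
} ToyQuery;

/*
 * Toy analogue of the expansion step: if the entry needs expansion, return a
 * fresh copy of the query with a fresh rtable array; otherwise return the
 * input unchanged.  The old copy is left as it was (and, in this toy, the
 * new copy is never freed).
 */
static ToyQuery *
toy_expand(ToyQuery *parse, int rt_index)
{
    ToyQuery   *copy;

    assert(rt_index >= 1 && rt_index <= parse->nrtes);
    if (!parse->rtable[rt_index - 1].has_virtual)
        return parse;

    copy = malloc(sizeof(ToyQuery));
    assert(copy != NULL);
    copy->nrtes = parse->nrtes;
    copy->rtable = malloc(sizeof(ToyRte) * parse->nrtes);
    assert(copy->rtable != NULL);
    memcpy(copy->rtable, parse->rtable, sizeof(ToyRte) * parse->nrtes);
    return copy;
}

/*
 * Toy analogue of the per-relation loop: the entry is looked up through the
 * current "parse" pointer on every iteration, so entries processed after an
 * expansion are the ones in the new copy, not the stale one.
 */
static ToyQuery *
toy_preprocess(ToyQuery *parse)
{
    int         nrtes = parse->nrtes;

    for (int i = 0; i < nrtes; i++)
    {
        ToyRte     *rte = &parse->rtable[i];

        if (!rte->is_relation)
            continue;
        rte->visited = 1;       /* per-relation work, e.g. adjusting a flag */
        parse = toy_expand(parse, i + 1);
    }
    return parse;
}
```

With three relation entries of which only the second has a virtual generated
column, the returned tree is a new copy in which all three entries are marked
visited, while the third entry of the original array is left untouched.  A
loop that kept walking the original array would have updated the stale copy
instead.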

diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index d9c2ac374cf..e6d16b799b5 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -721,13 +721,15 @@ subquery_planner(PlannerGlobal *glob, Query *parse, PlannerInfo *parent_root,
 	transform_MERGE_to_join(parse);
 
 	/*
-	 * Scan the rangetable for relations with virtual generated columns, and
-	 * replace all Var nodes in the query that reference these columns with
-	 * the generation expressions.  Note that this step does not descend into
-	 * sublinks and subqueries; if we pull up any sublinks or subqueries
-	 * below, their rangetables are processed just before pulling them up.
+	 * Scan the rangetable for relation RTEs and retrieve the necessary
+	 * catalog information for each relation.  Using this information, clear
+	 * the inh flag for any relation that has no children, and expand virtual
+	 * generated columns for any relation that contains them.  Note that this
+	 * step does not descend into sublinks and subqueries; if we pull up any
+	 * sublinks or subqueries below, their relation RTEs are processed just
+	 * before pulling them up.
 	 */
-	parse = root->parse = expand_virtual_generated_columns(root);
+	parse = root->parse = preprocess_relation_rtes(root);
 
 	/*
 	 * If the FROM clause is empty, replace it with a dummy RTE_RESULT RTE, so
@@ -788,23 +790,6 @@ subquery_planner(PlannerGlobal *glob, Query *parse, PlannerInfo *parent_root,
 
 		switch (rte->rtekind)
 		{
-			case RTE_RELATION:
-				if (rte->inh)
-				{
-					/*
-					 * Check to see if the relation actually has any children;
-					 * if not, clear the inh flag so we can treat it as a
-					 * plain base relation.
-					 *
-					 * Note: this could give a false-positive result, if the
-					 * rel once had children but no longer does.  We used to
-					 * be able to clear rte->inh later on when we discovered
-					 * that, but no more; we have to handle such cases as
-					 * full-fledged inheritance.
-					 */
-					rte->inh = has_subclass(rte->relid);
-				}
-				break;
 			case RTE_JOIN:
 				root->hasJoinRTEs = true;
 				if (IS_OUTER_JOIN(rte->jointype))
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index 89e6873da08..65fc3f49d39 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -1489,9 +1489,10 @@ convert_EXISTS_sublink_to_join(PlannerInfo *root, SubLink *sublink,
 		return NULL;
 
 	/*
-	 * Scan the rangetable for relations with virtual generated columns, and
-	 * replace all Var nodes in the subquery that reference these columns with
-	 * the generation expressions.
+	 * Scan the rangetable for relation RTEs and retrieve the necessary
+	 * catalog information for each relation.  Using this information, clear
+	 * the inh flag for any relation that has no children, and expand virtual
+	 * generated columns for any relation that contains them.
 	 *
 	 * Note: we construct up an entirely dummy PlannerInfo for use here.  This
 	 * is fine because only the "glob" and "parse" links will be used in this
@@ -1501,7 +1502,7 @@ convert_EXISTS_sublink_to_join(PlannerInfo *root, SubLink *sublink,
 	subroot.type = T_PlannerInfo;
 	subroot.glob = root->glob;
 	subroot.parse = subselect;
-	subselect = expand_virtual_generated_columns(&subroot);
+	subselect = preprocess_relation_rtes(&subroot);
 
 	/*
 	 * Separate out the WHERE clause.  (We could theoretically also remove
diff --git a/src/backend/optimizer/prep/prepjointree.c b/src/backend/optimizer/prep/prepjointree.c
index 8140d22de70..4b38851bd42 100644
--- a/src/backend/optimizer/prep/prepjointree.c
+++ b/src/backend/optimizer/prep/prepjointree.c
@@ -4,7 +4,7 @@
  *	  Planner preprocessing for subqueries and join tree manipulation.
  *
  * NOTE: the intended sequence for invoking these operations is
- *		expand_virtual_generated_columns
+ *		preprocess_relation_rtes
  *		replace_empty_jointree
  *		pull_up_sublinks
  *		preprocess_function_rtes
@@ -102,6 +102,9 @@ typedef struct reduce_outer_joins_partial_state
 	Relids		unreduced_side; /* relids in its still-nullable side */
 } reduce_outer_joins_partial_state;
 
+static Query *expand_virtual_generated_columns(PlannerInfo *root, Query *parse,
+											   RangeTblEntry *rte, int rt_index,
+											   Relation relation);
 static Node *pull_up_sublinks_jointree_recurse(PlannerInfo *root, Node *jtnode,
 											   Relids *relids);
 static Node *pull_up_sublinks_qual_recurse(PlannerInfo *root, Node *node,
@@ -392,6 +395,173 @@ transform_MERGE_to_join(Query *parse)
 		parse->mergeJoinCondition = NULL;	/* join condition not needed */
 }
 
+/*
+ * preprocess_relation_rtes
+ *		Do the preprocessing work for any relation RTEs in the FROM clause.
+ *
+ * This scans the rangetable for relation RTEs and retrieves the necessary
+ * catalog information for each relation.  Using this information, it clears
+ * the inh flag for any relation that has no children, and expands virtual
+ * generated columns for any relation that contains them.
+ *
+ * Note that expanding virtual generated columns may cause the query tree to
+ * have new copies of rangetable entries.  Therefore, we have to use list_nth
+ * instead of foreach when iterating over the query's rangetable.
+ *
+ * Returns a modified copy of the query tree, if any relations with virtual
+ * generated columns are present.
+ */
+Query *
+preprocess_relation_rtes(PlannerInfo *root)
+{
+	Query	   *parse = root->parse;
+	int			rtable_size;
+	int			rt_index;
+
+	rtable_size = list_length(parse->rtable);
+
+	for (rt_index = 0; rt_index < rtable_size; rt_index++)
+	{
+		RangeTblEntry *rte = rt_fetch(rt_index + 1, parse->rtable);
+		Relation	relation;
+
+		/* We only care about relation RTEs. */
+		if (rte->rtekind != RTE_RELATION)
+			continue;
+
+		/*
+		 * We need not lock the relation since it was already locked by the
+		 * rewriter.
+		 */
+		relation = table_open(rte->relid, NoLock);
+
+		/*
+		 * Check to see if the relation actually has any children; if not,
+		 * clear the inh flag so we can treat it as a plain base relation.
+		 *
+		 * Note: this could give a false-positive result, if the rel once had
+		 * children but no longer does.  We used to be able to clear rte->inh
+		 * later on when we discovered that, but no more; we have to handle
+		 * such cases as full-fledged inheritance.
+		 */
+		if (rte->inh)
+			rte->inh = relation->rd_rel->relhassubclass;
+
+		/*
+		 * Check to see if the relation has any virtual generated columns; if
+		 * so, replace all Var nodes in the query that reference these columns
+		 * with the generation expressions.
+		 */
+		parse = expand_virtual_generated_columns(root, parse,
+												 rte, rt_index + 1,
+												 relation);
+
+		table_close(relation, NoLock);
+	}
+
+	return parse;
+}
+
+/*
+ * expand_virtual_generated_columns
+ *		Expand virtual generated columns for the given relation.
+ *
+ * This checks whether the given relation has any virtual generated columns,
+ * and if so, replaces all Var nodes in the query that reference those columns
+ * with their generation expressions.
+ *
+ * Returns a modified copy of the query tree if the relation contains virtual
+ * generated columns.
+ */
+static Query *
+expand_virtual_generated_columns(PlannerInfo *root, Query *parse,
+								 RangeTblEntry *rte, int rt_index,
+								 Relation relation)
+{
+	TupleDesc	tupdesc;
+
+	/* Only normal relations can have virtual generated columns */
+	Assert(rte->rtekind == RTE_RELATION);
+
+	tupdesc = RelationGetDescr(relation);
+	if (tupdesc->constr && tupdesc->constr->has_generated_virtual)
+	{
+		List	   *tlist = NIL;
+		pullup_replace_vars_context rvcontext;
+
+		for (int i = 0; i < tupdesc->natts; i++)
+		{
+			Form_pg_attribute attr = TupleDescAttr(tupdesc, i);
+			TargetEntry *tle;
+
+			if (attr->attgenerated == ATTRIBUTE_GENERATED_VIRTUAL)
+			{
+				Node	   *defexpr;
+
+				defexpr = build_generation_expression(relation, i + 1);
+				ChangeVarNodes(defexpr, 1, rt_index, 0);
+
+				tle = makeTargetEntry((Expr *) defexpr, i + 1, 0, false);
+				tlist = lappend(tlist, tle);
+			}
+			else
+			{
+				Var		   *var;
+
+				var = makeVar(rt_index,
+							  i + 1,
+							  attr->atttypid,
+							  attr->atttypmod,
+							  attr->attcollation,
+							  0);
+
+				tle = makeTargetEntry((Expr *) var, i + 1, 0, false);
+				tlist = lappend(tlist, tle);
+			}
+		}
+
+		Assert(list_length(tlist) > 0);
+		Assert(!rte->lateral);
+
+		/*
+		 * The relation's targetlist items are now in the appropriate form to
+		 * insert into the query, except that we may need to wrap them in
+		 * PlaceHolderVars.  Set up required context data for
+		 * pullup_replace_vars.
+		 */
+		rvcontext.root = root;
+		rvcontext.targetlist = tlist;
+		rvcontext.target_rte = rte;
+		rvcontext.result_relation = parse->resultRelation;
+		/* won't need these values */
+		rvcontext.relids = NULL;
+		rvcontext.nullinfo = NULL;
+		/* pass NULL for outer_hasSubLinks */
+		rvcontext.outer_hasSubLinks = NULL;
+		rvcontext.varno = rt_index;
+		/* this flag will be set below, if needed */
+		rvcontext.wrap_option = REPLACE_WRAP_NONE;
+		/* initialize cache array with indexes 0 .. length(tlist) */
+		rvcontext.rv_cache = palloc0((list_length(tlist) + 1) *
+									 sizeof(Node *));
+
+		/*
+		 * If the query uses grouping sets, we need a PlaceHolderVar for each
+		 * expression of the relation's targetlist items.  (See comments in
+		 * pull_up_simple_subquery().)
+		 */
+		if (parse->groupingSets)
+			rvcontext.wrap_option = REPLACE_WRAP_ALL;
+
+		/*
+		 * Apply pullup variable replacement throughout the query tree.
+		 */
+		parse = (Query *) pullup_replace_vars((Node *) parse, &rvcontext);
+	}
+
+	return parse;
+}
+
 /*
  * replace_empty_jointree
  *		If the Query's jointree is empty, replace it with a dummy RTE_RESULT
@@ -949,124 +1119,6 @@ preprocess_function_rtes(PlannerInfo *root)
 	}
 }
 
-/*
- * expand_virtual_generated_columns
- *		Expand all virtual generated column references in a query.
- *
- * This scans the rangetable for relations with virtual generated columns, and
- * replaces all Var nodes in the query that reference these columns with the
- * generation expressions.  Note that we do not descend into subqueries; that
- * is taken care of when the subqueries are planned.
- *
- * Returns a modified copy of the query tree, if any relations with virtual
- * generated columns are present.
- */
-Query *
-expand_virtual_generated_columns(PlannerInfo *root)
-{
-	Query	   *parse = root->parse;
-	int			rt_index;
-	ListCell   *lc;
-
-	rt_index = 0;
-	foreach(lc, parse->rtable)
-	{
-		RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc);
-		Relation	rel;
-		TupleDesc	tupdesc;
-
-		++rt_index;
-
-		/*
-		 * Only normal relations can have virtual generated columns.
-		 */
-		if (rte->rtekind != RTE_RELATION)
-			continue;
-
-		rel = table_open(rte->relid, NoLock);
-
-		tupdesc = RelationGetDescr(rel);
-		if (tupdesc->constr && tupdesc->constr->has_generated_virtual)
-		{
-			List	   *tlist = NIL;
-			pullup_replace_vars_context rvcontext;
-
-			for (int i = 0; i < tupdesc->natts; i++)
-			{
-				Form_pg_attribute attr = TupleDescAttr(tupdesc, i);
-				TargetEntry *tle;
-
-				if (attr->attgenerated == ATTRIBUTE_GENERATED_VIRTUAL)
-				{
-					Node	   *defexpr;
-
-					defexpr = build_generation_expression(rel, i + 1);
-					ChangeVarNodes(defexpr, 1, rt_index, 0);
-
-					tle = makeTargetEntry((Expr *) defexpr, i + 1, 0, false);
-					tlist = lappend(tlist, tle);
-				}
-				else
-				{
-					Var		   *var;
-
-					var = makeVar(rt_index,
-								  i + 1,
-								  attr->atttypid,
-								  attr->atttypmod,
-								  attr->attcollation,
-								  0);
-
-					tle = makeTargetEntry((Expr *) var, i + 1, 0, false);
-					tlist = lappend(tlist, tle);
-				}
-			}
-
-			Assert(list_length(tlist) > 0);
-			Assert(!rte->lateral);
-
-			/*
-			 * The relation's targetlist items are now in the appropriate form
-			 * to insert into the query, except that we may need to wrap them
-			 * in PlaceHolderVars.  Set up required context data for
-			 * pullup_replace_vars.
-			 */
-			rvcontext.root = root;
-			rvcontext.targetlist = tlist;
-			rvcontext.target_rte = rte;
-			rvcontext.result_relation = parse->resultRelation;
-			/* won't need these values */
-			rvcontext.relids = NULL;
-			rvcontext.nullinfo = NULL;
-			/* pass NULL for outer_hasSubLinks */
-			rvcontext.outer_hasSubLinks = NULL;
-			rvcontext.varno = rt_index;
-			/* this flag will be set below, if needed */
-			rvcontext.wrap_option = REPLACE_WRAP_NONE;
-			/* initialize cache array with indexes 0 .. length(tlist) */
-			rvcontext.rv_cache = palloc0((list_length(tlist) + 1) *
-										 sizeof(Node *));
-
-			/*
-			 * If the query uses grouping sets, we need a PlaceHolderVar for
-			 * each expression of the relation's targetlist items.  (See
-			 * comments in pull_up_simple_subquery().)
-			 */
-			if (parse->groupingSets)
-				rvcontext.wrap_option = REPLACE_WRAP_ALL;
-
-			/*
-			 * Apply pullup variable replacement throughout the query tree.
-			 */
-			parse = (Query *) pullup_replace_vars((Node *) parse, &rvcontext);
-		}
-
-		table_close(rel, NoLock);
-	}
-
-	return parse;
-}
-
 /*
  * pull_up_subqueries
  *		Look for subqueries in the rangetable that can be pulled up into
@@ -1330,11 +1382,12 @@ pull_up_simple_subquery(PlannerInfo *root, Node *jtnode, RangeTblEntry *rte,
 	Assert(subquery->cteList == NIL);
 
 	/*
-	 * Scan the rangetable for relations with virtual generated columns, and
-	 * replace all Var nodes in the subquery that reference these columns with
-	 * the generation expressions.
+	 * Scan the rangetable for relation RTEs and retrieve the necessary
+	 * catalog information for each relation.  Using this information, clear
+	 * the inh flag for any relation that has no children, and expand virtual
+	 * generated columns for any relation that contains them.
 	 */
-	subquery = subroot->parse = expand_virtual_generated_columns(subroot);
+	subquery = subroot->parse = preprocess_relation_rtes(subroot);
 
 	/*
 	 * If the FROM clause is empty, replace it with a dummy RTE_RESULT RTE, so
diff --git a/src/include/optimizer/prep.h b/src/include/optimizer/prep.h
index ceb731bcf5e..4fbecdb4462 100644
--- a/src/include/optimizer/prep.h
+++ b/src/include/optimizer/prep.h
@@ -22,7 +22,7 @@
  * prototypes for prepjointree.c
  */
 extern void transform_MERGE_to_join(Query *parse);
-extern Query *expand_virtual_generated_columns(PlannerInfo *root);
+extern Query *preprocess_relation_rtes(PlannerInfo *root);
 extern void replace_empty_jointree(Query *parse);
 extern void pull_up_sublinks(PlannerInfo *root);
 extern void preprocess_function_rtes(PlannerInfo *root);
-- 
2.43.0

Attachment: v5-0003-Reduce-Var-IS-NOT-NULL-quals-during-constant-fold.patch (application/octet-stream)

From 2b25cef8d9f28155110a0747f0bb8bae606a00bb Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Wed, 30 Apr 2025 18:50:37 +0900
Subject: [PATCH v5 3/3] Reduce "Var IS [NOT] NULL" quals during constant
 folding

In commit b262ad440, we introduced an optimization that reduces an IS
[NOT] NULL qual on a NOT NULL column to constant true or constant
false, provided we can prove that the input expression of the NullTest
is not nullable by any outer joins or grouping sets.  This deduction
happens quite late in the planner, during the distribution of quals to
rels in query_planner.  However, this approach has some drawbacks: we
can't perform any further folding with the constant, and it turns out
to be prone to bugs.

Ideally, this deduction should happen during constant folding.
However, the per-relation information about which columns are defined
as NOT NULL is not available at that point.  This information is
currently collected from catalogs when building RelOptInfos for base
or "other" relations.

This patch moves the collection of NOT NULL attribute information for
relations before pull_up_sublinks, storing it in a hash table keyed by
relation OID.  It then uses this information to perform the NullTest
deduction for Vars during constant folding.  This also makes it
possible to leverage this information to pull up NOT IN subqueries.

Note that this patch does not get rid of restriction_is_always_true
and restriction_is_always_false.  Removing them would prevent us from
reducing some IS [NOT] NULL quals that we were previously able to
reduce, because (a) the self-join elimination may introduce new IS NOT
NULL quals after constant folding, and (b) if some outer joins are
converted to inner joins, previously irreducible NullTest quals may
become reducible.
---
 .../postgres_fdw/expected/postgres_fdw.out    |   8 +-
 contrib/postgres_fdw/sql/postgres_fdw.sql     |   2 +-
 src/backend/optimizer/plan/initsplan.c        |  24 +---
 src/backend/optimizer/plan/planner.c          |  12 +-
 src/backend/optimizer/plan/subselect.c        |  20 ++-
 src/backend/optimizer/prep/prepjointree.c     |  19 ++-
 src/backend/optimizer/util/clauses.c          |  92 ++++++++++++-
 src/backend/optimizer/util/inherit.c          |  10 +-
 src/backend/optimizer/util/plancat.c          | 127 +++++++++++++++---
 src/include/nodes/pathnodes.h                 |  12 +-
 src/include/optimizer/optimizer.h             |   2 +
 src/include/optimizer/plancat.h               |   4 +
 .../regress/expected/generated_virtual.out    |   6 +-
 src/test/regress/expected/join.out            |   6 +-
 src/test/regress/expected/predicate.out       |  54 +++++++-
 src/test/regress/sql/predicate.sql            |  18 +++
 src/tools/pgindent/typedefs.list              |   1 +
 17 files changed, 336 insertions(+), 81 deletions(-)
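
As a rough illustration of the deduction described in the commit message
above, here is a self-contained toy model.  It is a sketch under stated
assumptions, not the patch's implementation: every Toy* name is hypothetical,
a linear array stands in for the hash table keyed by relation OID, and a
64-bit mask stands in for the set of NOT NULL attribute numbers.  The checks
follow the order of the logic this patch removes from expr_is_nonnullable in
initsplan.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { TOY_IS_NULL, TOY_IS_NOT_NULL } ToyNullTestType;
typedef enum { TOY_FALSE, TOY_TRUE, TOY_KEEP } ToyFoldResult;

/* Hypothetical per-relation entry: which attributes are declared NOT NULL. */
typedef struct ToyRelNotNull
{
    uint32_t    relid;          /* stands in for the relation OID */
    uint64_t    notnullatts;    /* bit N set means attribute N is NOT NULL */
} ToyRelNotNull;

/* Hypothetical, heavily simplified Var. */
typedef struct ToyVar
{
    uint32_t    relid;
    int         attno;          /* <0 system column, 0 whole-row, >0 user column */
    bool        nullable_above; /* nulled by an outer join or grouping set */
} ToyVar;

/* Linear search standing in for a hash lookup by relation OID. */
static const ToyRelNotNull *
toy_lookup(const ToyRelNotNull *tab, size_t n, uint32_t relid)
{
    for (size_t i = 0; i < n; i++)
    {
        if (tab[i].relid == relid)
            return &tab[i];
    }
    return NULL;
}

static bool
toy_var_is_nonnullable(const ToyRelNotNull *tab, size_t n, const ToyVar *var)
{
    const ToyRelNotNull *entry;

    /* Could the Var be nulled above the scan level? */
    if (var->nullable_above)
        return false;
    /* System columns cannot be NULL. */
    if (var->attno < 0)
        return true;
    /* Whole-row references, or attnos this toy mask cannot represent. */
    if (var->attno == 0 || var->attno >= 64)
        return false;
    /* Is the column defined NOT NULL? */
    entry = toy_lookup(tab, n, var->relid);
    return entry != NULL && ((entry->notnullatts >> var->attno) & 1) != 0;
}

/* Fold "Var IS [NOT] NULL" to a constant when the Var is provably non-NULL. */
static ToyFoldResult
toy_fold_nulltest(const ToyRelNotNull *tab, size_t n,
                  const ToyVar *var, ToyNullTestType type)
{
    if (!toy_var_is_nonnullable(tab, n, var))
        return TOY_KEEP;        /* leave the NullTest in place */
    return (type == TOY_IS_NULL) ? TOY_FALSE : TOY_TRUE;
}
```

In this model, a test on a NOT NULL column that is not nullable above the scan
folds to a constant, and every other case keeps the NullTest.  That mirrors
why the commit message keeps restriction_is_always_true and
restriction_is_always_false: a qual that is kept here may still become
reducible later, for example after an outer join is converted to an inner
join.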

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 2185b42bb4f..e3621a47bf1 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -710,12 +710,12 @@ EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE c1 = -c1;          -- Op
    Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" = (- "C 1")))
 (3 rows)
 
-EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE (c1 IS NOT NULL) IS DISTINCT FROM (c1 IS NOT NULL); -- DistinctExpr
-                                                                 QUERY PLAN                                                                 
---------------------------------------------------------------------------------------------------------------------------------------------
+EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE (c3 IS NOT NULL) IS DISTINCT FROM (c3 IS NOT NULL); -- DistinctExpr
+                                                              QUERY PLAN                                                              
+--------------------------------------------------------------------------------------------------------------------------------------
  Foreign Scan on public.ft1 t1
    Output: c1, c2, c3, c4, c5, c6, c7, c8
-   Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE ((("C 1" IS NOT NULL) IS DISTINCT FROM ("C 1" IS NOT NULL)))
+   Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (((c3 IS NOT NULL) IS DISTINCT FROM (c3 IS NOT NULL)))
 (3 rows)
 
 EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE c1 = ANY(ARRAY[c2, 1, c1 + 0]); -- ScalarArrayOpExpr
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index e534b40de3c..036c8749914 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -352,7 +352,7 @@ EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE c3 IS NULL;        -- Nu
 EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE c3 IS NOT NULL;    -- NullTest
 EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE round(abs(c1), 0) = 1; -- FuncExpr
 EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE c1 = -c1;          -- OpExpr(l)
-EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE (c1 IS NOT NULL) IS DISTINCT FROM (c1 IS NOT NULL); -- DistinctExpr
+EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE (c3 IS NOT NULL) IS DISTINCT FROM (c3 IS NOT NULL); -- DistinctExpr
 EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE c1 = ANY(ARRAY[c2, 1, c1 + 0]); -- ScalarArrayOpExpr
 EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE c1 = (ARRAY[c1,c2,3])[1]; -- SubscriptingRef
 EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE c6 = E'foo''s\\bar';  -- check special chars
diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index 01804b085b3..3e3fec89252 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -3048,36 +3048,16 @@ add_base_clause_to_rel(PlannerInfo *root, Index relid,
  * expr_is_nonnullable
  *	  Check to see if the Expr cannot be NULL
  *
- * If the Expr is a simple Var that is defined NOT NULL and meanwhile is not
- * nulled by any outer joins, then we can know that it cannot be NULL.
+ * Currently we only support simple Vars.
  */
 static bool
 expr_is_nonnullable(PlannerInfo *root, Expr *expr)
 {
-	RelOptInfo *rel;
-	Var		   *var;
-
 	/* For now only check simple Vars */
 	if (!IsA(expr, Var))
 		return false;
 
-	var = (Var *) expr;
-
-	/* could the Var be nulled by any outer joins? */
-	if (!bms_is_empty(var->varnullingrels))
-		return false;
-
-	/* system columns cannot be NULL */
-	if (var->varattno < 0)
-		return true;
-
-	/* is the column defined NOT NULL? */
-	rel = find_base_rel(root, var->varno);
-	if (var->varattno > 0 &&
-		bms_is_member(var->varattno, rel->notnullattnums))
-		return true;
-
-	return false;
+	return var_is_nonnullable(root, (Var *) expr, true);
 }
 
 /*
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index e6d16b799b5..66f1f416364 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -342,6 +342,7 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
 	glob->transientPlan = false;
 	glob->dependsOnRole = false;
 	glob->partition_directory = NULL;
+	glob->rel_notnullatts_hash = NULL;
 
 	/*
 	 * Assess whether it's feasible to use parallel mode for this query. We
@@ -723,11 +724,12 @@ subquery_planner(PlannerGlobal *glob, Query *parse, PlannerInfo *parent_root,
 	/*
 	 * Scan the rangetable for relation RTEs and retrieve the necessary
 	 * catalog information for each relation.  Using this information, clear
-	 * the inh flag for any relation that has no children, and expand virtual
-	 * generated columns for any relation that contains them.  Note that this
-	 * step does not descend into sublinks and subqueries; if we pull up any
-	 * sublinks or subqueries below, their relation RTEs are processed just
-	 * before pulling them up.
+	 * the inh flag for any relation that has no children, collect not-null
+	 * attribute numbers for any relation that has column not-null
+	 * constraints, and expand virtual generated columns for any relation that
+	 * contains them.  Note that this step does not descend into sublinks and
+	 * subqueries; if we pull up any sublinks or subqueries below, their
+	 * relation RTEs are processed just before pulling them up.
 	 */
 	parse = root->parse = preprocess_relation_rtes(root);
 
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index 65fc3f49d39..8ea20061594 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -1491,8 +1491,10 @@ convert_EXISTS_sublink_to_join(PlannerInfo *root, SubLink *sublink,
 	/*
 	 * Scan the rangetable for relation RTEs and retrieve the necessary
 	 * catalog information for each relation.  Using this information, clear
-	 * the inh flag for any relation that has no children, and expand virtual
-	 * generated columns for any relation that contains them.
+	 * the inh flag for any relation that has no children, collect not-null
+	 * attribute numbers for any relation that has column not-null
+	 * constraints, and expand virtual generated columns for any relation that
+	 * contains them.
 	 *
 	 * Note: we construct up an entirely dummy PlannerInfo for use here.  This
 	 * is fine because only the "glob" and "parse" links will be used in this
@@ -1749,6 +1751,7 @@ convert_EXISTS_to_ANY(PlannerInfo *root, Query *subselect,
 					  Node **testexpr, List **paramIds)
 {
 	Node	   *whereClause;
+	PlannerInfo subroot;
 	List	   *leftargs,
 			   *rightargs,
 			   *opids,
@@ -1808,12 +1811,15 @@ convert_EXISTS_to_ANY(PlannerInfo *root, Query *subselect,
 	 * parent aliases were flattened already, and we're not going to pull any
 	 * child Vars (of any description) into the parent.
 	 *
-	 * Note: passing the parent's root to eval_const_expressions is
-	 * technically wrong, but we can get away with it since only the
-	 * boundParams (if any) are used, and those would be the same in a
-	 * subroot.
+	 * Note: we construct up an entirely dummy PlannerInfo to pass to
+	 * eval_const_expressions.  This is fine because only the "glob" and
+	 * "parse" links are used by eval_const_expressions.
 	 */
-	whereClause = eval_const_expressions(root, whereClause);
+	MemSet(&subroot, 0, sizeof(subroot));
+	subroot.type = T_PlannerInfo;
+	subroot.glob = root->glob;
+	subroot.parse = subselect;
+	whereClause = eval_const_expressions(&subroot, whereClause);
 	whereClause = (Node *) canonicalize_qual((Expr *) whereClause, false);
 	whereClause = (Node *) make_ands_implicit((Expr *) whereClause);
 
diff --git a/src/backend/optimizer/prep/prepjointree.c b/src/backend/optimizer/prep/prepjointree.c
index 4b38851bd42..35e8d3c183b 100644
--- a/src/backend/optimizer/prep/prepjointree.c
+++ b/src/backend/optimizer/prep/prepjointree.c
@@ -36,6 +36,7 @@
 #include "optimizer/clauses.h"
 #include "optimizer/optimizer.h"
 #include "optimizer/placeholder.h"
+#include "optimizer/plancat.h"
 #include "optimizer/prep.h"
 #include "optimizer/subselect.h"
 #include "optimizer/tlist.h"
@@ -401,8 +402,9 @@ transform_MERGE_to_join(Query *parse)
  *
  * This scans the rangetable for relation RTEs and retrieves the necessary
  * catalog information for each relation.  Using this information, it clears
- * the inh flag for any relation that has no children, and expands virtual
- * generated columns for any relation that contains them.
+ * the inh flag for any relation that has no children, collects not-null
+ * attribute numbers for any relation that has column not-null constraints, and
+ * expands virtual generated columns for any relation that contains them.
  *
  * Note that expanding virtual generated columns may cause the query tree to
  * have new copies of rangetable entries.  Therefore, we have to use list_nth
@@ -447,6 +449,13 @@ preprocess_relation_rtes(PlannerInfo *root)
 		if (rte->inh)
 			rte->inh = relation->rd_rel->relhassubclass;
 
+		/*
+		 * Check to see if the relation has any column not-null constraints;
+		 * if so, retrieve the constraint information and store it in a
+		 * relation OID based hash table.
+		 */
+		get_relation_notnullatts(root, relation);
+
 		/*
 		 * Check to see if the relation has any virtual generated columns; if
 		 * so, replace all Var nodes in the query that reference these columns
@@ -1384,8 +1393,10 @@ pull_up_simple_subquery(PlannerInfo *root, Node *jtnode, RangeTblEntry *rte,
 	/*
 	 * Scan the rangetable for relation RTEs and retrieve the necessary
 	 * catalog information for each relation.  Using this information, clear
-	 * the inh flag for any relation that has no children, and expand virtual
-	 * generated columns for any relation that contains them.
+	 * the inh flag for any relation that has no children, collect not-null
+	 * attribute numbers for any relation that has column not-null
+	 * constraints, and expand virtual generated columns for any relation that
+	 * contains them.
 	 */
 	subquery = subroot->parse = preprocess_relation_rtes(subroot);
 
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index 26a3e050086..e83f4b061d7 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -20,6 +20,7 @@
 #include "postgres.h"
 
 #include "access/htup_details.h"
+#include "catalog/pg_class.h"
 #include "catalog/pg_language.h"
 #include "catalog/pg_operator.h"
 #include "catalog/pg_proc.h"
@@ -36,6 +37,7 @@
 #include "optimizer/clauses.h"
 #include "optimizer/cost.h"
 #include "optimizer/optimizer.h"
+#include "optimizer/pathnode.h"
 #include "optimizer/plancat.h"
 #include "optimizer/planmain.h"
 #include "parser/analyze.h"
@@ -43,6 +45,7 @@
 #include "parser/parse_collate.h"
 #include "parser/parse_func.h"
 #include "parser/parse_oper.h"
+#include "parser/parsetree.h"
 #include "rewrite/rewriteHandler.h"
 #include "rewrite/rewriteManip.h"
 #include "tcop/tcopprot.h"
@@ -2242,7 +2245,8 @@ rowtype_field_matches(Oid rowtypeid, int fieldnum,
  * only operators and functions that are reasonable to try to execute.
  *
  * NOTE: "root" can be passed as NULL if the caller never wants to do any
- * Param substitutions nor receive info about inlined functions.
+ * Param substitutions nor receive info about inlined functions nor reduce
+ * NullTest for Vars to constant true or constant false.
  *
  * NOTE: the planner assumes that this will always flatten nested AND and
  * OR clauses into N-argument form.  See comments in prepqual.c.
@@ -3537,6 +3541,31 @@ eval_const_expressions_mutator(Node *node,
 
 					return makeBoolConst(result, false);
 				}
+				if (!ntest->argisrow && arg && IsA(arg, Var) && context->root)
+				{
+					Var		   *varg = (Var *) arg;
+					bool		result;
+
+					if (var_is_nonnullable(context->root, varg, false))
+					{
+						switch (ntest->nulltesttype)
+						{
+							case IS_NULL:
+								result = false;
+								break;
+							case IS_NOT_NULL:
+								result = true;
+								break;
+							default:
+								elog(ERROR, "unrecognized nulltesttype: %d",
+									 (int) ntest->nulltesttype);
+								result = false; /* keep compiler quiet */
+								break;
+						}
+
+						return makeBoolConst(result, false);
+					}
+				}
 
 				newntest = makeNode(NullTest);
 				newntest->arg = (Expr *) arg;
@@ -4155,6 +4184,67 @@ simplify_function(Oid funcid, Oid result_type, int32 result_typmod,
 	return newexpr;
 }
 
+/*
+ * var_is_nonnullable: check to see if the Var cannot be NULL
+ *
+ * If the Var is defined NOT NULL and meanwhile is not nulled by any outer
+ * joins or grouping sets, then we can know that it cannot be NULL.
+ *
+ * use_rel_info indicates whether the corresponding RelOptInfo is available for
+ * use.
+ */
+bool
+var_is_nonnullable(PlannerInfo *root, Var *var, bool use_rel_info)
+{
+	Relids		notnullattnums = NULL;
+
+	Assert(IsA(var, Var));
+
+	/* skip upper-level Vars */
+	if (var->varlevelsup != 0)
+		return false;
+
+	/* could the Var be nulled by any outer joins or grouping sets? */
+	if (!bms_is_empty(var->varnullingrels))
+		return false;
+
+	/* system columns cannot be NULL */
+	if (var->varattno < 0)
+		return true;
+
+	/*
+	 * Check if the Var is defined as NOT NULL.  We retrieve the column NOT
+	 * NULL constraint information from the corresponding RelOptInfo if it is
+	 * available; otherwise, we search the hash table for this information.
+	 */
+	if (use_rel_info)
+	{
+		RelOptInfo *rel = find_base_rel(root, var->varno);
+
+		notnullattnums = rel->notnullattnums;
+	}
+	else
+	{
+		RangeTblEntry *rte = planner_rt_fetch(var->varno, root);
+
+		/*
+		 * We must skip inheritance parent tables, as some child tables may
+		 * have a NOT NULL constraint for a column while others may not.  This
+		 * cannot happen with partitioned tables, though.
+		 */
+		if (rte->inh && rte->relkind != RELKIND_PARTITIONED_TABLE)
+			return false;
+
+		notnullattnums = find_relation_notnullatts(root, rte->relid);
+	}
+
+	if (var->varattno > 0 &&
+		bms_is_member(var->varattno, notnullattnums))
+		return true;
+
+	return false;
+}
+
 /*
  * expand_function_arguments: convert named-notation args to positional args
  * and/or insert default args, as needed
diff --git a/src/backend/optimizer/util/inherit.c b/src/backend/optimizer/util/inherit.c
index 17e51cd75d7..30d158069e3 100644
--- a/src/backend/optimizer/util/inherit.c
+++ b/src/backend/optimizer/util/inherit.c
@@ -466,8 +466,7 @@ expand_single_inheritance_child(PlannerInfo *root, RangeTblEntry *parentrte,
 								Index *childRTindex_p)
 {
 	Query	   *parse = root->parse;
-	Oid			parentOID PG_USED_FOR_ASSERTS_ONLY =
-		RelationGetRelid(parentrel);
+	Oid			parentOID = RelationGetRelid(parentrel);
 	Oid			childOID = RelationGetRelid(childrel);
 	RangeTblEntry *childrte;
 	Index		childRTindex;
@@ -513,6 +512,13 @@ expand_single_inheritance_child(PlannerInfo *root, RangeTblEntry *parentrte,
 	*childrte_p = childrte;
 	*childRTindex_p = childRTindex;
 
+	/*
+	 * Retrieve column not-null constraint information for the child relation
+	 * if its relation OID is different from the parent's.
+	 */
+	if (childOID != parentOID)
+		get_relation_notnullatts(root, childrel);
+
 	/*
 	 * Build an AppendRelInfo struct for each parent/child pair.
 	 */
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 59233b64730..c6a58afc5e5 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -59,6 +59,12 @@ int			constraint_exclusion = CONSTRAINT_EXCLUSION_PARTITION;
 /* Hook for plugins to get control in get_relation_info() */
 get_relation_info_hook_type get_relation_info_hook = NULL;
 
+typedef struct NotnullHashEntry
+{
+	Oid			relid;			/* OID of the relation */
+	Relids		notnullattnums; /* attnums of NOT NULL columns */
+} NotnullHashEntry;
+
 
 static void get_relation_foreign_keys(PlannerInfo *root, RelOptInfo *rel,
 									  Relation relation, bool inhparent);
@@ -172,27 +178,7 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 	 * RangeTblEntry does get populated.
 	 */
 	if (!inhparent || relation->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
-	{
-		for (int i = 0; i < relation->rd_att->natts; i++)
-		{
-			CompactAttribute *attr = TupleDescCompactAttr(relation->rd_att, i);
-
-			Assert(attr->attnullability != ATTNULLABLE_UNKNOWN);
-
-			if (attr->attnullability == ATTNULLABLE_VALID)
-			{
-				rel->notnullattnums = bms_add_member(rel->notnullattnums,
-													 i + 1);
-
-				/*
-				 * Per RemoveAttributeById(), dropped columns will have their
-				 * attnotnull unset, so we needn't check for dropped columns
-				 * in the above condition.
-				 */
-				Assert(!attr->attisdropped);
-			}
-		}
-	}
+		rel->notnullattnums = find_relation_notnullatts(root, relationObjectId);
 
 	/*
 	 * Estimate relation size --- unless it's an inheritance parent, in which
@@ -683,6 +669,105 @@ get_relation_foreign_keys(PlannerInfo *root, RelOptInfo *rel,
 	}
 }
 
+/*
+ * get_relation_notnullatts -
+ *	  Retrieves column not-null constraint information for a given relation.
+ *
+ * We do this while we have the relcache entry open, and store the column
+ * not-null constraint information in a hash table based on the relation OID.
+ */
+void
+get_relation_notnullatts(PlannerInfo *root, Relation relation)
+{
+	Oid			relid = RelationGetRelid(relation);
+	NotnullHashEntry *hentry;
+	bool		found;
+	Relids		notnullattnums = NULL;
+
+	/* bail out if the relation has no not-null constraints */
+	if (relation->rd_att->constr == NULL ||
+		!relation->rd_att->constr->has_not_null)
+		return;
+
+	/* create the hash table if it hasn't been created yet */
+	if (root->glob->rel_notnullatts_hash == NULL)
+	{
+		HTAB	   *hashtab;
+		HASHCTL		hash_ctl;
+
+		hash_ctl.keysize = sizeof(Oid);
+		hash_ctl.entrysize = sizeof(NotnullHashEntry);
+		hash_ctl.hcxt = CurrentMemoryContext;
+
+		hashtab = hash_create("Relation NOT NULL attnums",
+							  64L,	/* arbitrary initial size */
+							  &hash_ctl,
+							  HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+
+		root->glob->rel_notnullatts_hash = hashtab;
+	}
+
+	/*
+	 * Create a hash entry for this relation OID, if we don't have one
+	 * already.
+	 */
+	hentry = (NotnullHashEntry *) hash_search(root->glob->rel_notnullatts_hash,
+											  &relid,
+											  HASH_ENTER,
+											  &found);
+
+	/* bail out if a hash entry already exists for this relation OID */
+	if (found)
+		return;
+
+	/* collect the column not-null constraint information for this relation */
+	for (int i = 0; i < relation->rd_att->natts; i++)
+	{
+		CompactAttribute *attr = TupleDescCompactAttr(relation->rd_att, i);
+
+		Assert(attr->attnullability != ATTNULLABLE_UNKNOWN);
+
+		if (attr->attnullability == ATTNULLABLE_VALID)
+		{
+			notnullattnums = bms_add_member(notnullattnums, i + 1);
+
+			/*
+			 * Per RemoveAttributeById(), dropped columns will have their
+			 * attnotnull unset, so we needn't check for dropped columns in
+			 * the above condition.
+			 */
+			Assert(!attr->attisdropped);
+		}
+	}
+
+	/* ... and initialize the new hash entry */
+	hentry->notnullattnums = notnullattnums;
+}
+
+/*
+ * find_relation_notnullatts -
+ *	  Searches the hash table and returns the column not-null constraint
+ *	  information for a given relation.
+ */
+Relids
+find_relation_notnullatts(PlannerInfo *root, Oid relid)
+{
+	NotnullHashEntry *hentry;
+	bool		found;
+
+	if (root->glob->rel_notnullatts_hash == NULL)
+		return NULL;
+
+	hentry = (NotnullHashEntry *) hash_search(root->glob->rel_notnullatts_hash,
+											  &relid,
+											  HASH_FIND,
+											  &found);
+	if (!found)
+		return NULL;
+
+	return hentry->notnullattnums;
+}
+
 /*
  * infer_arbiter_indexes -
  *	  Determine the unique indexes used to arbitrate speculative insertion.
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 6567759595d..e5dd15098f6 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -179,6 +179,9 @@ typedef struct PlannerGlobal
 
 	/* partition descriptors */
 	PartitionDirectory partition_directory pg_node_attr(read_write_ignore);
+
+	/* hash table for NOT NULL attnums of relations */
+	struct HTAB *rel_notnullatts_hash pg_node_attr(read_write_ignore);
 } PlannerGlobal;
 
 /* macro for fetching the Plan associated with a SubPlan node */
@@ -719,6 +722,9 @@ typedef struct PartitionSchemeData *PartitionScheme;
  *				the attribute is needed as part of final targetlist
  *		attr_widths - cache space for per-attribute width estimates;
  *					  zero means not computed yet
+ *		notnullattnums - zero-based set containing attnums of NOT NULL
+ *						 columns (not populated for rels corresponding to
+ *						 non-partitioned inh==true RTEs)
  *		nulling_relids - relids of outer joins that can null this rel
  *		lateral_vars - lateral cross-references of rel, if any (list of
  *					   Vars and PlaceHolderVars)
@@ -952,11 +958,7 @@ typedef struct RelOptInfo
 	Relids	   *attr_needed pg_node_attr(read_write_ignore);
 	/* array indexed [min_attr .. max_attr] */
 	int32	   *attr_widths pg_node_attr(read_write_ignore);
-
-	/*
-	 * Zero-based set containing attnums of NOT NULL columns.  Not populated
-	 * for rels corresponding to non-partitioned inh==true RTEs.
-	 */
+	/* zero-based set containing attnums of NOT NULL columns */
 	Bitmapset  *notnullattnums;
 	/* relids of outer joins that can null this baserel */
 	Relids		nulling_relids;
diff --git a/src/include/optimizer/optimizer.h b/src/include/optimizer/optimizer.h
index 546828b54bd..37bc13c2cbd 100644
--- a/src/include/optimizer/optimizer.h
+++ b/src/include/optimizer/optimizer.h
@@ -154,6 +154,8 @@ extern Node *estimate_expression_value(PlannerInfo *root, Node *node);
 extern Expr *evaluate_expr(Expr *expr, Oid result_type, int32 result_typmod,
 						   Oid result_collation);
 
+extern bool var_is_nonnullable(PlannerInfo *root, Var *var, bool use_rel_info);
+
 extern List *expand_function_arguments(List *args, bool include_out_arguments,
 									   Oid result_type,
 									   struct HeapTupleData *func_tuple);
diff --git a/src/include/optimizer/plancat.h b/src/include/optimizer/plancat.h
index cd74e4b1e8b..d6f6f4ad2d7 100644
--- a/src/include/optimizer/plancat.h
+++ b/src/include/optimizer/plancat.h
@@ -28,6 +28,10 @@ extern PGDLLIMPORT get_relation_info_hook_type get_relation_info_hook;
 extern void get_relation_info(PlannerInfo *root, Oid relationObjectId,
 							  bool inhparent, RelOptInfo *rel);
 
+extern void get_relation_notnullatts(PlannerInfo *root, Relation relation);
+
+extern Relids find_relation_notnullatts(PlannerInfo *root, Oid relid);
+
 extern List *infer_arbiter_indexes(PlannerInfo *root);
 
 extern void estimate_rel_size(Relation rel, int32 *attr_widths,
diff --git a/src/test/regress/expected/generated_virtual.out b/src/test/regress/expected/generated_virtual.out
index b766ccb1dc2..47c049998cd 100644
--- a/src/test/regress/expected/generated_virtual.out
+++ b/src/test/regress/expected/generated_virtual.out
@@ -1531,11 +1531,11 @@ where coalesce(t2.b, 1) = 2;
 explain (costs off)
 select t1.a from gtest32 t1 left join gtest32 t2 on t1.a = t2.a
 where coalesce(t2.b, 1) = 2 or t1.a is null;
-                         QUERY PLAN                          
--------------------------------------------------------------
+               QUERY PLAN                
+-----------------------------------------
  Hash Left Join
    Hash Cond: (t1.a = t2.a)
-   Filter: ((COALESCE((t2.a * 2), 1) = 2) OR (t1.a IS NULL))
+   Filter: (COALESCE((t2.a * 2), 1) = 2)
    ->  Seq Scan on gtest32 t1
    ->  Hash
          ->  Seq Scan on gtest32 t2
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index f35a0b18c37..632ae2afe7c 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -3639,8 +3639,8 @@ from nt3 as nt3
     ) as ss2
     on ss2.id = nt3.nt2_id
 where nt3.id = 1 and ss2.b3;
-                  QUERY PLAN                   
------------------------------------------------
+                  QUERY PLAN                  
+----------------------------------------------
  Nested Loop
    ->  Nested Loop
          ->  Index Scan using nt3_pkey on nt3
@@ -3649,7 +3649,7 @@ where nt3.id = 1 and ss2.b3;
                Index Cond: (id = nt3.nt2_id)
    ->  Index Only Scan using nt1_pkey on nt1
          Index Cond: (id = nt2.nt1_id)
-         Filter: (nt2.b1 AND (id IS NOT NULL))
+         Filter: (nt2.b1 AND true)
 (9 rows)
 
 select nt3.id
diff --git a/src/test/regress/expected/predicate.out b/src/test/regress/expected/predicate.out
index b79037748b7..59bfe33bb1c 100644
--- a/src/test/regress/expected/predicate.out
+++ b/src/test/regress/expected/predicate.out
@@ -84,10 +84,10 @@ SELECT * FROM pred_tab t WHERE t.a IS NULL OR t.c IS NULL;
 -- are provably false
 EXPLAIN (COSTS OFF)
 SELECT * FROM pred_tab t WHERE t.b IS NULL OR t.c IS NULL;
-               QUERY PLAN               
-----------------------------------------
+       QUERY PLAN       
+------------------------
  Seq Scan on pred_tab t
-   Filter: ((b IS NULL) OR (c IS NULL))
+   Filter: (b IS NULL)
 (2 rows)
 
 --
@@ -231,6 +231,54 @@ SELECT * FROM pred_tab t1
          ->  Seq Scan on pred_tab t3
 (9 rows)
 
+--
+-- Tests for NullTest reduction in EXISTS sublink
+--
+-- Ensure the IS_NOT_NULL qual is ignored
+EXPLAIN (COSTS OFF)
+SELECT * FROM pred_tab t1
+    LEFT JOIN pred_tab t2 ON EXISTS
+        (SELECT 1 FROM pred_tab t3, pred_tab t4, pred_tab t5, pred_tab t6
+         WHERE t1.a = t3.a AND t6.a IS NOT NULL);
+                       QUERY PLAN                        
+---------------------------------------------------------
+ Nested Loop Left Join
+   Join Filter: EXISTS(SubPlan 1)
+   ->  Seq Scan on pred_tab t1
+   ->  Materialize
+         ->  Seq Scan on pred_tab t2
+   SubPlan 1
+     ->  Nested Loop
+           ->  Nested Loop
+                 ->  Nested Loop
+                       ->  Seq Scan on pred_tab t4
+                       ->  Materialize
+                             ->  Seq Scan on pred_tab t3
+                                   Filter: (t1.a = a)
+                 ->  Materialize
+                       ->  Seq Scan on pred_tab t5
+           ->  Materialize
+                 ->  Seq Scan on pred_tab t6
+(17 rows)
+
+-- Ensure the IS_NULL qual is reduced to constant-FALSE
+EXPLAIN (COSTS OFF)
+SELECT * FROM pred_tab t1
+    LEFT JOIN pred_tab t2 ON EXISTS
+        (SELECT 1 FROM pred_tab t3, pred_tab t4, pred_tab t5, pred_tab t6
+         WHERE t1.a = t3.a AND t6.a IS NULL);
+             QUERY PLAN              
+-------------------------------------
+ Nested Loop Left Join
+   Join Filter: (InitPlan 1).col1
+   InitPlan 1
+     ->  Result
+           One-Time Filter: false
+   ->  Seq Scan on pred_tab t1
+   ->  Materialize
+         ->  Seq Scan on pred_tab t2
+(8 rows)
+
 DROP TABLE pred_tab;
 -- Validate we handle IS NULL and IS NOT NULL quals correctly with inheritance
 -- parents.
diff --git a/src/test/regress/sql/predicate.sql b/src/test/regress/sql/predicate.sql
index 9dcb81b1bc5..d92277353a0 100644
--- a/src/test/regress/sql/predicate.sql
+++ b/src/test/regress/sql/predicate.sql
@@ -115,6 +115,24 @@ SELECT * FROM pred_tab t1
     LEFT JOIN pred_tab t2 ON t1.a = 1
     LEFT JOIN pred_tab t3 ON t2.a IS NULL OR t2.c IS NULL;
 
+--
+-- Tests for NullTest reduction in EXISTS sublink
+--
+
+-- Ensure the IS_NOT_NULL qual is ignored
+EXPLAIN (COSTS OFF)
+SELECT * FROM pred_tab t1
+    LEFT JOIN pred_tab t2 ON EXISTS
+        (SELECT 1 FROM pred_tab t3, pred_tab t4, pred_tab t5, pred_tab t6
+         WHERE t1.a = t3.a AND t6.a IS NOT NULL);
+
+-- Ensure the IS_NULL qual is reduced to constant-FALSE
+EXPLAIN (COSTS OFF)
+SELECT * FROM pred_tab t1
+    LEFT JOIN pred_tab t2 ON EXISTS
+        (SELECT 1 FROM pred_tab t3, pred_tab t4, pred_tab t5, pred_tab t6
+         WHERE t1.a = t3.a AND t6.a IS NULL);
+
 DROP TABLE pred_tab;
 
 -- Validate we handle IS NULL and IS NOT NULL quals correctly with inheritance
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index a8346cda633..ac36a8888a3 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1750,6 +1750,7 @@ NonEmptyRange
 Notification
 NotificationList
 NotifyStmt
+NotnullHashEntry
 Nsrt
 NtDllRoutine
 NtFlushBuffersFileEx_t
-- 
2.43.0

#33

Richard Guo

guofenglinux@gmail.com

7 months ago

In reply to: Richard Guo (#32)

3 attachment(s)

Re: Reduce "Var IS [NOT] NULL" quals during constant folding

On Wed, May 28, 2025 at 6:28 PM Richard Guo <guofenglinux@gmail.com> wrote:

> Yeah, this patchset is targeted for v19. Maybe we could be more
> aggressive and have 0001 and 0002 in v18? (no chance for 0003 though)
>
> This patchset does not apply anymore due to 2c0ed86d3. Here is a new
> rebase.

This patchset does not apply anymore, due to 5069fef1c this time.
Here is a new rebase.

Thanks
Richard

Attachments:

v6-0001-Expand-virtual-generated-columns-before-sublink-p.patch (application/octet-stream)

From 2c35bf06afaf01b09cdc65306b672aea7b5ee4d5 Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Wed, 23 Apr 2025 10:29:15 +0900
Subject: [PATCH v6 1/3] Expand virtual generated columns before sublink
 pull-up

Currently, we expand virtual generated columns after we have pulled up
any SubLinks within the query's quals.  This ensures that the virtual
generated column references within SubLinks that should be transformed
into joins are correctly expanded.  This approach works well and has
posed no issues.

In an upcoming patch, we plan to centralize the collection of catalog
information needed early in the planner.  This will help avoid
repeated table_open/table_close calls for relations in the rangetable.
Since this information is required during sublink pull-up, we are
moving the expansion of virtual generated columns to occur beforehand.

To achieve this, if any EXISTS SubLinks can be pulled up, their
rangetables are processed just before pulling them up.
---
 src/backend/optimizer/plan/planner.c          | 17 +++++++-------
 src/backend/optimizer/plan/subselect.c        | 16 ++++++++++++++
 src/backend/optimizer/prep/prepjointree.c     | 20 +++++++----------
 src/include/optimizer/prep.h                  |  2 +-
 .../regress/expected/generated_virtual.out    | 22 +++++++++++++++++++
 src/test/regress/sql/generated_virtual.sql    |  9 ++++++++
 6 files changed, 65 insertions(+), 21 deletions(-)

diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 549aedcfa99..fbbc42f1600 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -720,6 +720,15 @@ subquery_planner(PlannerGlobal *glob, Query *parse, PlannerInfo *parent_root,
 	 */
 	transform_MERGE_to_join(parse);
 
+	/*
+	 * Scan the rangetable for relations with virtual generated columns, and
+	 * replace all Var nodes in the query that reference these columns with
+	 * the generation expressions.  Note that this step does not descend into
+	 * sublinks and subqueries; if we pull up any sublinks or subqueries
+	 * below, their rangetables are processed just before pulling them up.
+	 */
+	parse = root->parse = expand_virtual_generated_columns(root);
+
 	/*
 	 * If the FROM clause is empty, replace it with a dummy RTE_RESULT RTE, so
 	 * that we don't need so many special cases to deal with that situation.
@@ -743,14 +752,6 @@ subquery_planner(PlannerGlobal *glob, Query *parse, PlannerInfo *parent_root,
 	 */
 	preprocess_function_rtes(root);
 
-	/*
-	 * Scan the rangetable for relations with virtual generated columns, and
-	 * replace all Var nodes in the query that reference these columns with
-	 * the generation expressions.  Recursion issues here are handled in the
-	 * same way as for SubLinks.
-	 */
-	parse = root->parse = expand_virtual_generated_columns(root);
-
 	/*
 	 * Check to see if any subqueries in the jointree can be merged into this
 	 * query.
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index e7cb3fede66..89e6873da08 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -1458,6 +1458,7 @@ convert_EXISTS_sublink_to_join(PlannerInfo *root, SubLink *sublink,
 	int			varno;
 	Relids		clause_varnos;
 	Relids		upper_varnos;
+	PlannerInfo subroot;
 
 	Assert(sublink->subLinkType == EXISTS_SUBLINK);
 
@@ -1487,6 +1488,21 @@ convert_EXISTS_sublink_to_join(PlannerInfo *root, SubLink *sublink,
 	if (!simplify_EXISTS_query(root, subselect))
 		return NULL;
 
+	/*
+	 * Scan the rangetable for relations with virtual generated columns, and
+	 * replace all Var nodes in the subquery that reference these columns with
+	 * the generation expressions.
+	 *
+	 * Note: we construct up an entirely dummy PlannerInfo for use here.  This
+	 * is fine because only the "glob" and "parse" links will be used in this
+	 * case.
+	 */
+	MemSet(&subroot, 0, sizeof(subroot));
+	subroot.type = T_PlannerInfo;
+	subroot.glob = root->glob;
+	subroot.parse = subselect;
+	subselect = expand_virtual_generated_columns(&subroot);
+
 	/*
 	 * Separate out the WHERE clause.  (We could theoretically also remove
 	 * top-level plain JOIN/ON clauses, but it's probably not worth the
diff --git a/src/backend/optimizer/prep/prepjointree.c b/src/backend/optimizer/prep/prepjointree.c
index 87dc6f56b57..8140d22de70 100644
--- a/src/backend/optimizer/prep/prepjointree.c
+++ b/src/backend/optimizer/prep/prepjointree.c
@@ -4,10 +4,10 @@
  *	  Planner preprocessing for subqueries and join tree manipulation.
  *
  * NOTE: the intended sequence for invoking these operations is
+ *		expand_virtual_generated_columns
  *		replace_empty_jointree
  *		pull_up_sublinks
  *		preprocess_function_rtes
- *		expand_virtual_generated_columns
  *		pull_up_subqueries
  *		flatten_simple_union_all
  *		do expression preprocessing (including flattening JOIN alias vars)
@@ -958,10 +958,6 @@ preprocess_function_rtes(PlannerInfo *root)
  * generation expressions.  Note that we do not descend into subqueries; that
  * is taken care of when the subqueries are planned.
  *
- * This has to be done after we have pulled up any SubLinks within the query's
- * quals; otherwise any virtual generated column references within the SubLinks
- * that should be transformed into joins wouldn't get expanded.
- *
  * Returns a modified copy of the query tree, if any relations with virtual
  * generated columns are present.
  */
@@ -1333,6 +1329,13 @@ pull_up_simple_subquery(PlannerInfo *root, Node *jtnode, RangeTblEntry *rte,
 	/* No CTEs to worry about */
 	Assert(subquery->cteList == NIL);
 
+	/*
+	 * Scan the rangetable for relations with virtual generated columns, and
+	 * replace all Var nodes in the subquery that reference these columns with
+	 * the generation expressions.
+	 */
+	subquery = subroot->parse = expand_virtual_generated_columns(subroot);
+
 	/*
 	 * If the FROM clause is empty, replace it with a dummy RTE_RESULT RTE, so
 	 * that we don't need so many special cases to deal with that situation.
@@ -1352,13 +1355,6 @@ pull_up_simple_subquery(PlannerInfo *root, Node *jtnode, RangeTblEntry *rte,
 	 */
 	preprocess_function_rtes(subroot);
 
-	/*
-	 * Scan the rangetable for relations with virtual generated columns, and
-	 * replace all Var nodes in the query that reference these columns with
-	 * the generation expressions.
-	 */
-	subquery = subroot->parse = expand_virtual_generated_columns(subroot);
-
 	/*
 	 * Recursively pull up the subquery's subqueries, so that
 	 * pull_up_subqueries' processing is complete for its jointree and
diff --git a/src/include/optimizer/prep.h b/src/include/optimizer/prep.h
index df56202777c..ceb731bcf5e 100644
--- a/src/include/optimizer/prep.h
+++ b/src/include/optimizer/prep.h
@@ -22,10 +22,10 @@
  * prototypes for prepjointree.c
  */
 extern void transform_MERGE_to_join(Query *parse);
+extern Query *expand_virtual_generated_columns(PlannerInfo *root);
 extern void replace_empty_jointree(Query *parse);
 extern void pull_up_sublinks(PlannerInfo *root);
 extern void preprocess_function_rtes(PlannerInfo *root);
-extern Query *expand_virtual_generated_columns(PlannerInfo *root);
 extern void pull_up_subqueries(PlannerInfo *root);
 extern void flatten_simple_union_all(PlannerInfo *root);
 extern void reduce_outer_joins(PlannerInfo *root);
diff --git a/src/test/regress/expected/generated_virtual.out b/src/test/regress/expected/generated_virtual.out
index df704b5166f..9ecf73de814 100644
--- a/src/test/regress/expected/generated_virtual.out
+++ b/src/test/regress/expected/generated_virtual.out
@@ -1604,4 +1604,26 @@ select * from gtest32 t group by grouping sets (a, b, c, d, e) having c = 20;
 
 -- Ensure that the virtual generated columns in ALTER COLUMN TYPE USING expression are expanded
 alter table gtest32 alter column e type bigint using b;
+-- Ensure that virtual generated column references within SubLinks that should
+-- be transformed into joins can get expanded
+explain (costs off)
+select 1 from gtest32 t1 where exists
+  (select 1 from gtest32 t2 where t1.a > t2.a and t2.b = 2);
+             QUERY PLAN              
+-------------------------------------
+ Nested Loop Semi Join
+   Join Filter: (t1.a > t2.a)
+   ->  Seq Scan on gtest32 t1
+   ->  Materialize
+         ->  Seq Scan on gtest32 t2
+               Filter: ((a * 2) = 2)
+(6 rows)
+
+select 1 from gtest32 t1 where exists
+  (select 1 from gtest32 t2 where t1.a > t2.a and t2.b = 2);
+ ?column? 
+----------
+        1
+(1 row)
+
 drop table gtest32;
diff --git a/src/test/regress/sql/generated_virtual.sql b/src/test/regress/sql/generated_virtual.sql
index 6fa986515b9..fb0d88cf21a 100644
--- a/src/test/regress/sql/generated_virtual.sql
+++ b/src/test/regress/sql/generated_virtual.sql
@@ -845,4 +845,13 @@ select * from gtest32 t group by grouping sets (a, b, c, d, e) having c = 20;
 -- Ensure that the virtual generated columns in ALTER COLUMN TYPE USING expression are expanded
 alter table gtest32 alter column e type bigint using b;
 
+-- Ensure that virtual generated column references within SubLinks that should
+-- be transformed into joins can get expanded
+explain (costs off)
+select 1 from gtest32 t1 where exists
+  (select 1 from gtest32 t2 where t1.a > t2.a and t2.b = 2);
+
+select 1 from gtest32 t1 where exists
+  (select 1 from gtest32 t2 where t1.a > t2.a and t2.b = 2);
+
 drop table gtest32;
-- 
2.43.0
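
The "dummy PlannerInfo" idiom that the patch above uses in convert_EXISTS_sublink_to_join() can be illustrated on its own. The sketch below is an assumption-laden miniature, not planner code: SketchGlobal, SketchQuery, SketchRoot, and sketch_init_dummy_root() are hypothetical names, and the real PlannerInfo has many more fields. The point is only that the caller zero-fills a stack-allocated context and populates the two links the callee is known to read, sharing the parent's global state while pointing at the subquery.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for PlannerGlobal, Query, and PlannerInfo. */
typedef struct SketchGlobal
{
	int			placeholder;	/* shared, per-planning-run state */
} SketchGlobal;

typedef struct SketchQuery
{
	int			query_id;
} SketchQuery;

typedef struct SketchRoot
{
	SketchGlobal *glob;			/* link to shared global state */
	SketchQuery *parse;			/* the query this context describes */
	void	   *join_info;		/* example of a field the callee must not need */
} SketchRoot;

/*
 * Build a throwaway context for a subquery: everything is zeroed except
 * the "glob" and "parse" links.  This is only safe as long as the callee
 * reads nothing else from the context.
 */
void
sketch_init_dummy_root(SketchRoot *subroot, const SketchRoot *parent,
					   SketchQuery *subselect)
{
	assert(subroot != NULL && parent != NULL && subselect != NULL);

	memset(subroot, 0, sizeof(*subroot));
	subroot->glob = parent->glob;	/* share the parent's global state */
	subroot->parse = subselect;		/* but describe the subquery */
}
```

The trade-off is that the idiom depends on an unchecked contract: if the callee later starts reading another field, it sees zeroed data rather than the parent's values, which is why the patch documents exactly which links are used.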

v6-0002-Centralize-collection-of-catalog-info-needed-earl.patch (application/octet-stream)

From da7437703697e6e667195f8856c52774811f3643 Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Thu, 24 Apr 2025 14:58:03 +0900
Subject: [PATCH v6 2/3] Centralize collection of catalog info needed early in
 the planner

There are several pieces of catalog information that need to be
retrieved for a relation during the early stage of planning.  These
include relhassubclass, which is used to clear the inh flag if the
relation has no children, as well as a column's attgenerated and
default value, which are needed to expand virtual generated columns.
More such information may be required in the future.

Currently, these pieces of catalog data are collected in multiple
places, resulting in repeated table_open/table_close calls for each
relation in the rangetable.  This patch centralizes the collection of
all required early-stage catalog information into a single loop over
the rangetable, allowing each relation to be opened and closed only
once.
---
 src/backend/optimizer/plan/planner.c      |  31 +--
 src/backend/optimizer/plan/subselect.c    |   9 +-
 src/backend/optimizer/prep/prepjointree.c | 299 +++++++++++++---------
 src/include/optimizer/prep.h              |   2 +-
 4 files changed, 190 insertions(+), 151 deletions(-)

diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index fbbc42f1600..fc13d921d0c 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -721,13 +721,15 @@ subquery_planner(PlannerGlobal *glob, Query *parse, PlannerInfo *parent_root,
 	transform_MERGE_to_join(parse);
 
 	/*
-	 * Scan the rangetable for relations with virtual generated columns, and
-	 * replace all Var nodes in the query that reference these columns with
-	 * the generation expressions.  Note that this step does not descend into
-	 * sublinks and subqueries; if we pull up any sublinks or subqueries
-	 * below, their rangetables are processed just before pulling them up.
+	 * Scan the rangetable for relation RTEs and retrieve the necessary
+	 * catalog information for each relation.  Using this information, clear
+	 * the inh flag for any relation that has no children, and expand virtual
+	 * generated columns for any relation that contains them.  Note that this
+	 * step does not descend into sublinks and subqueries; if we pull up any
+	 * sublinks or subqueries below, their relation RTEs are processed just
+	 * before pulling them up.
 	 */
-	parse = root->parse = expand_virtual_generated_columns(root);
+	parse = root->parse = preprocess_relation_rtes(root);
 
 	/*
 	 * If the FROM clause is empty, replace it with a dummy RTE_RESULT RTE, so
@@ -788,23 +790,6 @@ subquery_planner(PlannerGlobal *glob, Query *parse, PlannerInfo *parent_root,
 
 		switch (rte->rtekind)
 		{
-			case RTE_RELATION:
-				if (rte->inh)
-				{
-					/*
-					 * Check to see if the relation actually has any children;
-					 * if not, clear the inh flag so we can treat it as a
-					 * plain base relation.
-					 *
-					 * Note: this could give a false-positive result, if the
-					 * rel once had children but no longer does.  We used to
-					 * be able to clear rte->inh later on when we discovered
-					 * that, but no more; we have to handle such cases as
-					 * full-fledged inheritance.
-					 */
-					rte->inh = has_subclass(rte->relid);
-				}
-				break;
 			case RTE_JOIN:
 				root->hasJoinRTEs = true;
 				if (IS_OUTER_JOIN(rte->jointype))
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index 89e6873da08..65fc3f49d39 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -1489,9 +1489,10 @@ convert_EXISTS_sublink_to_join(PlannerInfo *root, SubLink *sublink,
 		return NULL;
 
 	/*
-	 * Scan the rangetable for relations with virtual generated columns, and
-	 * replace all Var nodes in the subquery that reference these columns with
-	 * the generation expressions.
+	 * Scan the rangetable for relation RTEs and retrieve the necessary
+	 * catalog information for each relation.  Using this information, clear
+	 * the inh flag for any relation that has no children, and expand virtual
+	 * generated columns for any relation that contains them.
 	 *
 	 * Note: we construct up an entirely dummy PlannerInfo for use here.  This
 	 * is fine because only the "glob" and "parse" links will be used in this
@@ -1501,7 +1502,7 @@ convert_EXISTS_sublink_to_join(PlannerInfo *root, SubLink *sublink,
 	subroot.type = T_PlannerInfo;
 	subroot.glob = root->glob;
 	subroot.parse = subselect;
-	subselect = expand_virtual_generated_columns(&subroot);
+	subselect = preprocess_relation_rtes(&subroot);
 
 	/*
 	 * Separate out the WHERE clause.  (We could theoretically also remove
diff --git a/src/backend/optimizer/prep/prepjointree.c b/src/backend/optimizer/prep/prepjointree.c
index 8140d22de70..4b38851bd42 100644
--- a/src/backend/optimizer/prep/prepjointree.c
+++ b/src/backend/optimizer/prep/prepjointree.c
@@ -4,7 +4,7 @@
  *	  Planner preprocessing for subqueries and join tree manipulation.
  *
  * NOTE: the intended sequence for invoking these operations is
- *		expand_virtual_generated_columns
+ *		preprocess_relation_rtes
  *		replace_empty_jointree
  *		pull_up_sublinks
  *		preprocess_function_rtes
@@ -102,6 +102,9 @@ typedef struct reduce_outer_joins_partial_state
 	Relids		unreduced_side; /* relids in its still-nullable side */
 } reduce_outer_joins_partial_state;
 
+static Query *expand_virtual_generated_columns(PlannerInfo *root, Query *parse,
+											   RangeTblEntry *rte, int rt_index,
+											   Relation relation);
 static Node *pull_up_sublinks_jointree_recurse(PlannerInfo *root, Node *jtnode,
 											   Relids *relids);
 static Node *pull_up_sublinks_qual_recurse(PlannerInfo *root, Node *node,
@@ -392,6 +395,173 @@ transform_MERGE_to_join(Query *parse)
 		parse->mergeJoinCondition = NULL;	/* join condition not needed */
 }
 
+/*
+ * preprocess_relation_rtes
+ *		Do the preprocessing work for any relation RTEs in the FROM clause.
+ *
+ * This scans the rangetable for relation RTEs and retrieves the necessary
+ * catalog information for each relation.  Using this information, it clears
+ * the inh flag for any relation that has no children, and expands virtual
+ * generated columns for any relation that contains them.
+ *
+ * Note that expanding virtual generated columns may cause the query tree to
+ * have new copies of rangetable entries.  Therefore, we have to use list_nth
+ * instead of foreach when iterating over the query's rangetable.
+ *
+ * Returns a modified copy of the query tree, if any relations with virtual
+ * generated columns are present.
+ */
+Query *
+preprocess_relation_rtes(PlannerInfo *root)
+{
+	Query	   *parse = root->parse;
+	int			rtable_size;
+	int			rt_index;
+
+	rtable_size = list_length(parse->rtable);
+
+	for (rt_index = 0; rt_index < rtable_size; rt_index++)
+	{
+		RangeTblEntry *rte = rt_fetch(rt_index + 1, parse->rtable);
+		Relation	relation;
+
+		/* We only care about relation RTEs. */
+		if (rte->rtekind != RTE_RELATION)
+			continue;
+
+		/*
+		 * We need not lock the relation since it was already locked by the
+		 * rewriter.
+		 */
+		relation = table_open(rte->relid, NoLock);
+
+		/*
+		 * Check to see if the relation actually has any children; if not,
+		 * clear the inh flag so we can treat it as a plain base relation.
+		 *
+		 * Note: this could give a false-positive result, if the rel once had
+		 * children but no longer does.  We used to be able to clear rte->inh
+		 * later on when we discovered that, but no more; we have to handle
+		 * such cases as full-fledged inheritance.
+		 */
+		if (rte->inh)
+			rte->inh = relation->rd_rel->relhassubclass;
+
+		/*
+		 * Check to see if the relation has any virtual generated columns; if
+		 * so, replace all Var nodes in the query that reference these columns
+		 * with the generation expressions.
+		 */
+		parse = expand_virtual_generated_columns(root, parse,
+												 rte, rt_index + 1,
+												 relation);
+
+		table_close(relation, NoLock);
+	}
+
+	return parse;
+}
+
+/*
+ * expand_virtual_generated_columns
+ *		Expand virtual generated columns for the given relation.
+ *
+ * This checks whether the given relation has any virtual generated columns,
+ * and if so, replaces all Var nodes in the query that reference those columns
+ * with their generation expressions.
+ *
+ * Returns a modified copy of the query tree if the relation contains virtual
+ * generated columns.
+ */
+static Query *
+expand_virtual_generated_columns(PlannerInfo *root, Query *parse,
+								 RangeTblEntry *rte, int rt_index,
+								 Relation relation)
+{
+	TupleDesc	tupdesc;
+
+	/* Only normal relations can have virtual generated columns */
+	Assert(rte->rtekind == RTE_RELATION);
+
+	tupdesc = RelationGetDescr(relation);
+	if (tupdesc->constr && tupdesc->constr->has_generated_virtual)
+	{
+		List	   *tlist = NIL;
+		pullup_replace_vars_context rvcontext;
+
+		for (int i = 0; i < tupdesc->natts; i++)
+		{
+			Form_pg_attribute attr = TupleDescAttr(tupdesc, i);
+			TargetEntry *tle;
+
+			if (attr->attgenerated == ATTRIBUTE_GENERATED_VIRTUAL)
+			{
+				Node	   *defexpr;
+
+				defexpr = build_generation_expression(relation, i + 1);
+				ChangeVarNodes(defexpr, 1, rt_index, 0);
+
+				tle = makeTargetEntry((Expr *) defexpr, i + 1, 0, false);
+				tlist = lappend(tlist, tle);
+			}
+			else
+			{
+				Var		   *var;
+
+				var = makeVar(rt_index,
+							  i + 1,
+							  attr->atttypid,
+							  attr->atttypmod,
+							  attr->attcollation,
+							  0);
+
+				tle = makeTargetEntry((Expr *) var, i + 1, 0, false);
+				tlist = lappend(tlist, tle);
+			}
+		}
+
+		Assert(list_length(tlist) > 0);
+		Assert(!rte->lateral);
+
+		/*
+		 * The relation's targetlist items are now in the appropriate form to
+		 * insert into the query, except that we may need to wrap them in
+		 * PlaceHolderVars.  Set up required context data for
+		 * pullup_replace_vars.
+		 */
+		rvcontext.root = root;
+		rvcontext.targetlist = tlist;
+		rvcontext.target_rte = rte;
+		rvcontext.result_relation = parse->resultRelation;
+		/* won't need these values */
+		rvcontext.relids = NULL;
+		rvcontext.nullinfo = NULL;
+		/* pass NULL for outer_hasSubLinks */
+		rvcontext.outer_hasSubLinks = NULL;
+		rvcontext.varno = rt_index;
+		/* this flag will be set below, if needed */
+		rvcontext.wrap_option = REPLACE_WRAP_NONE;
+		/* initialize cache array with indexes 0 .. length(tlist) */
+		rvcontext.rv_cache = palloc0((list_length(tlist) + 1) *
+									 sizeof(Node *));
+
+		/*
+		 * If the query uses grouping sets, we need a PlaceHolderVar for each
+		 * expression of the relation's targetlist items.  (See comments in
+		 * pull_up_simple_subquery().)
+		 */
+		if (parse->groupingSets)
+			rvcontext.wrap_option = REPLACE_WRAP_ALL;
+
+		/*
+		 * Apply pullup variable replacement throughout the query tree.
+		 */
+		parse = (Query *) pullup_replace_vars((Node *) parse, &rvcontext);
+	}
+
+	return parse;
+}
+
 /*
  * replace_empty_jointree
  *		If the Query's jointree is empty, replace it with a dummy RTE_RESULT
@@ -949,124 +1119,6 @@ preprocess_function_rtes(PlannerInfo *root)
 	}
 }
 
-/*
- * expand_virtual_generated_columns
- *		Expand all virtual generated column references in a query.
- *
- * This scans the rangetable for relations with virtual generated columns, and
- * replaces all Var nodes in the query that reference these columns with the
- * generation expressions.  Note that we do not descend into subqueries; that
- * is taken care of when the subqueries are planned.
- *
- * Returns a modified copy of the query tree, if any relations with virtual
- * generated columns are present.
- */
-Query *
-expand_virtual_generated_columns(PlannerInfo *root)
-{
-	Query	   *parse = root->parse;
-	int			rt_index;
-	ListCell   *lc;
-
-	rt_index = 0;
-	foreach(lc, parse->rtable)
-	{
-		RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc);
-		Relation	rel;
-		TupleDesc	tupdesc;
-
-		++rt_index;
-
-		/*
-		 * Only normal relations can have virtual generated columns.
-		 */
-		if (rte->rtekind != RTE_RELATION)
-			continue;
-
-		rel = table_open(rte->relid, NoLock);
-
-		tupdesc = RelationGetDescr(rel);
-		if (tupdesc->constr && tupdesc->constr->has_generated_virtual)
-		{
-			List	   *tlist = NIL;
-			pullup_replace_vars_context rvcontext;
-
-			for (int i = 0; i < tupdesc->natts; i++)
-			{
-				Form_pg_attribute attr = TupleDescAttr(tupdesc, i);
-				TargetEntry *tle;
-
-				if (attr->attgenerated == ATTRIBUTE_GENERATED_VIRTUAL)
-				{
-					Node	   *defexpr;
-
-					defexpr = build_generation_expression(rel, i + 1);
-					ChangeVarNodes(defexpr, 1, rt_index, 0);
-
-					tle = makeTargetEntry((Expr *) defexpr, i + 1, 0, false);
-					tlist = lappend(tlist, tle);
-				}
-				else
-				{
-					Var		   *var;
-
-					var = makeVar(rt_index,
-								  i + 1,
-								  attr->atttypid,
-								  attr->atttypmod,
-								  attr->attcollation,
-								  0);
-
-					tle = makeTargetEntry((Expr *) var, i + 1, 0, false);
-					tlist = lappend(tlist, tle);
-				}
-			}
-
-			Assert(list_length(tlist) > 0);
-			Assert(!rte->lateral);
-
-			/*
-			 * The relation's targetlist items are now in the appropriate form
-			 * to insert into the query, except that we may need to wrap them
-			 * in PlaceHolderVars.  Set up required context data for
-			 * pullup_replace_vars.
-			 */
-			rvcontext.root = root;
-			rvcontext.targetlist = tlist;
-			rvcontext.target_rte = rte;
-			rvcontext.result_relation = parse->resultRelation;
-			/* won't need these values */
-			rvcontext.relids = NULL;
-			rvcontext.nullinfo = NULL;
-			/* pass NULL for outer_hasSubLinks */
-			rvcontext.outer_hasSubLinks = NULL;
-			rvcontext.varno = rt_index;
-			/* this flag will be set below, if needed */
-			rvcontext.wrap_option = REPLACE_WRAP_NONE;
-			/* initialize cache array with indexes 0 .. length(tlist) */
-			rvcontext.rv_cache = palloc0((list_length(tlist) + 1) *
-										 sizeof(Node *));
-
-			/*
-			 * If the query uses grouping sets, we need a PlaceHolderVar for
-			 * each expression of the relation's targetlist items.  (See
-			 * comments in pull_up_simple_subquery().)
-			 */
-			if (parse->groupingSets)
-				rvcontext.wrap_option = REPLACE_WRAP_ALL;
-
-			/*
-			 * Apply pullup variable replacement throughout the query tree.
-			 */
-			parse = (Query *) pullup_replace_vars((Node *) parse, &rvcontext);
-		}
-
-		table_close(rel, NoLock);
-	}
-
-	return parse;
-}
-
 /*
  * pull_up_subqueries
  *		Look for subqueries in the rangetable that can be pulled up into
@@ -1330,11 +1382,12 @@ pull_up_simple_subquery(PlannerInfo *root, Node *jtnode, RangeTblEntry *rte,
 	Assert(subquery->cteList == NIL);
 
 	/*
-	 * Scan the rangetable for relations with virtual generated columns, and
-	 * replace all Var nodes in the subquery that reference these columns with
-	 * the generation expressions.
+	 * Scan the rangetable for relation RTEs and retrieve the necessary
+	 * catalog information for each relation.  Using this information, clear
+	 * the inh flag for any relation that has no children, and expand virtual
+	 * generated columns for any relation that contains them.
 	 */
-	subquery = subroot->parse = expand_virtual_generated_columns(subroot);
+	subquery = subroot->parse = preprocess_relation_rtes(subroot);
 
 	/*
 	 * If the FROM clause is empty, replace it with a dummy RTE_RESULT RTE, so
diff --git a/src/include/optimizer/prep.h b/src/include/optimizer/prep.h
index ceb731bcf5e..4fbecdb4462 100644
--- a/src/include/optimizer/prep.h
+++ b/src/include/optimizer/prep.h
@@ -22,7 +22,7 @@
  * prototypes for prepjointree.c
  */
 extern void transform_MERGE_to_join(Query *parse);
-extern Query *expand_virtual_generated_columns(PlannerInfo *root);
+extern Query *preprocess_relation_rtes(PlannerInfo *root);
 extern void replace_empty_jointree(Query *parse);
 extern void pull_up_sublinks(PlannerInfo *root);
 extern void preprocess_function_rtes(PlannerInfo *root);
-- 
2.43.0

Attachment: v6-0003-Reduce-Var-IS-NOT-NULL-quals-during-constant-fold.patch (application/octet-stream)

From 9b6e98e3e7e99da23048e6ba7b57856b1af6ec3d Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Wed, 30 Apr 2025 18:50:37 +0900
Subject: [PATCH v6 3/3] Reduce "Var IS [NOT] NULL" quals during constant
 folding

In commit b262ad440, we introduced an optimization that reduces an IS
[NOT] NULL qual on a NOT NULL column to constant true or constant
false, provided we can prove that the input expression of the NullTest
is not nullable by any outer joins or grouping sets.  This deduction
happens quite late in the planner, during the distribution of quals to
rels in query_planner.  However, this approach has some drawbacks: we
can't perform any further folding with the constant, and it turns out
to be prone to bugs.

Ideally, this deduction should happen during constant folding.
However, the per-relation information about which columns are defined
as NOT NULL is not available at that point.  This information is
currently collected from catalogs when building RelOptInfos for base
or "other" relations.

This patch moves the collection of NOT NULL attribute information for
relations before pull_up_sublinks, storing it in a hash table keyed by
relation OID.  It then uses this information to perform the NullTest
deduction for Vars during constant folding.  This also makes it
possible to leverage this information to pull up NOT IN subqueries.

Note that this patch does not get rid of restriction_is_always_true
and restriction_is_always_false.  Removing them would prevent us from
reducing some IS [NOT] NULL quals that we were previously able to
reduce, because (a) the self-join elimination may introduce new IS NOT
NULL quals after constant folding, and (b) if some outer joins are
converted to inner joins, previously irreducible NullTest quals may
become reducible.
---
 .../postgres_fdw/expected/postgres_fdw.out    |   8 +-
 contrib/postgres_fdw/sql/postgres_fdw.sql     |   2 +-
 src/backend/optimizer/plan/initsplan.c        |  24 +---
 src/backend/optimizer/plan/planner.c          |  12 +-
 src/backend/optimizer/plan/subselect.c        |  20 ++-
 src/backend/optimizer/prep/prepjointree.c     |  19 ++-
 src/backend/optimizer/util/clauses.c          |  92 ++++++++++++-
 src/backend/optimizer/util/inherit.c          |  10 +-
 src/backend/optimizer/util/plancat.c          | 127 +++++++++++++++---
 src/include/nodes/pathnodes.h                 |  12 +-
 src/include/optimizer/optimizer.h             |   2 +
 src/include/optimizer/plancat.h               |   4 +
 .../regress/expected/generated_virtual.out    |   6 +-
 src/test/regress/expected/join.out            |   6 +-
 src/test/regress/expected/predicate.out       |  54 +++++++-
 src/test/regress/sql/predicate.sql            |  18 +++
 src/tools/pgindent/typedefs.list              |   1 +
 17 files changed, 336 insertions(+), 81 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 2185b42bb4f..e3621a47bf1 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -710,12 +710,12 @@ EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE c1 = -c1;          -- Op
    Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" = (- "C 1")))
 (3 rows)
 
-EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE (c1 IS NOT NULL) IS DISTINCT FROM (c1 IS NOT NULL); -- DistinctExpr
-                                                                 QUERY PLAN                                                                 
---------------------------------------------------------------------------------------------------------------------------------------------
+EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE (c3 IS NOT NULL) IS DISTINCT FROM (c3 IS NOT NULL); -- DistinctExpr
+                                                              QUERY PLAN                                                              
+--------------------------------------------------------------------------------------------------------------------------------------
  Foreign Scan on public.ft1 t1
    Output: c1, c2, c3, c4, c5, c6, c7, c8
-   Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE ((("C 1" IS NOT NULL) IS DISTINCT FROM ("C 1" IS NOT NULL)))
+   Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (((c3 IS NOT NULL) IS DISTINCT FROM (c3 IS NOT NULL)))
 (3 rows)
 
 EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE c1 = ANY(ARRAY[c2, 1, c1 + 0]); -- ScalarArrayOpExpr
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index e534b40de3c..036c8749914 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -352,7 +352,7 @@ EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE c3 IS NULL;        -- Nu
 EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE c3 IS NOT NULL;    -- NullTest
 EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE round(abs(c1), 0) = 1; -- FuncExpr
 EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE c1 = -c1;          -- OpExpr(l)
-EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE (c1 IS NOT NULL) IS DISTINCT FROM (c1 IS NOT NULL); -- DistinctExpr
+EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE (c3 IS NOT NULL) IS DISTINCT FROM (c3 IS NOT NULL); -- DistinctExpr
 EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE c1 = ANY(ARRAY[c2, 1, c1 + 0]); -- ScalarArrayOpExpr
 EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE c1 = (ARRAY[c1,c2,3])[1]; -- SubscriptingRef
 EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE c6 = E'foo''s\\bar';  -- check special chars
diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index 01804b085b3..3e3fec89252 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -3048,36 +3048,16 @@ add_base_clause_to_rel(PlannerInfo *root, Index relid,
  * expr_is_nonnullable
  *	  Check to see if the Expr cannot be NULL
  *
- * If the Expr is a simple Var that is defined NOT NULL and meanwhile is not
- * nulled by any outer joins, then we can know that it cannot be NULL.
+ * Currently we only support simple Vars.
  */
 static bool
 expr_is_nonnullable(PlannerInfo *root, Expr *expr)
 {
-	RelOptInfo *rel;
-	Var		   *var;
-
 	/* For now only check simple Vars */
 	if (!IsA(expr, Var))
 		return false;
 
-	var = (Var *) expr;
-
-	/* could the Var be nulled by any outer joins? */
-	if (!bms_is_empty(var->varnullingrels))
-		return false;
-
-	/* system columns cannot be NULL */
-	if (var->varattno < 0)
-		return true;
-
-	/* is the column defined NOT NULL? */
-	rel = find_base_rel(root, var->varno);
-	if (var->varattno > 0 &&
-		bms_is_member(var->varattno, rel->notnullattnums))
-		return true;
-
-	return false;
+	return var_is_nonnullable(root, (Var *) expr, true);
 }
 
 /*
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index fc13d921d0c..c989e72cac5 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -342,6 +342,7 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
 	glob->transientPlan = false;
 	glob->dependsOnRole = false;
 	glob->partition_directory = NULL;
+	glob->rel_notnullatts_hash = NULL;
 
 	/*
 	 * Assess whether it's feasible to use parallel mode for this query. We
@@ -723,11 +724,12 @@ subquery_planner(PlannerGlobal *glob, Query *parse, PlannerInfo *parent_root,
 	/*
 	 * Scan the rangetable for relation RTEs and retrieve the necessary
 	 * catalog information for each relation.  Using this information, clear
-	 * the inh flag for any relation that has no children, and expand virtual
-	 * generated columns for any relation that contains them.  Note that this
-	 * step does not descend into sublinks and subqueries; if we pull up any
-	 * sublinks or subqueries below, their relation RTEs are processed just
-	 * before pulling them up.
+	 * the inh flag for any relation that has no children, collect not-null
+	 * attribute numbers for any relation that has column not-null
+	 * constraints, and expand virtual generated columns for any relation that
+	 * contains them.  Note that this step does not descend into sublinks and
+	 * subqueries; if we pull up any sublinks or subqueries below, their
+	 * relation RTEs are processed just before pulling them up.
 	 */
 	parse = root->parse = preprocess_relation_rtes(root);
 
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index 65fc3f49d39..8ea20061594 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -1491,8 +1491,10 @@ convert_EXISTS_sublink_to_join(PlannerInfo *root, SubLink *sublink,
 	/*
 	 * Scan the rangetable for relation RTEs and retrieve the necessary
 	 * catalog information for each relation.  Using this information, clear
-	 * the inh flag for any relation that has no children, and expand virtual
-	 * generated columns for any relation that contains them.
+	 * the inh flag for any relation that has no children, collect not-null
+	 * attribute numbers for any relation that has column not-null
+	 * constraints, and expand virtual generated columns for any relation that
+	 * contains them.
 	 *
 	 * Note: we construct up an entirely dummy PlannerInfo for use here.  This
 	 * is fine because only the "glob" and "parse" links will be used in this
@@ -1749,6 +1751,7 @@ convert_EXISTS_to_ANY(PlannerInfo *root, Query *subselect,
 					  Node **testexpr, List **paramIds)
 {
 	Node	   *whereClause;
+	PlannerInfo subroot;
 	List	   *leftargs,
 			   *rightargs,
 			   *opids,
@@ -1808,12 +1811,15 @@ convert_EXISTS_to_ANY(PlannerInfo *root, Query *subselect,
 	 * parent aliases were flattened already, and we're not going to pull any
 	 * child Vars (of any description) into the parent.
 	 *
-	 * Note: passing the parent's root to eval_const_expressions is
-	 * technically wrong, but we can get away with it since only the
-	 * boundParams (if any) are used, and those would be the same in a
-	 * subroot.
+	 * Note: we construct an entirely dummy PlannerInfo to pass to
+	 * eval_const_expressions.  This is fine because only the "glob" and
+	 * "parse" links are used by eval_const_expressions.
 	 */
-	whereClause = eval_const_expressions(root, whereClause);
+	MemSet(&subroot, 0, sizeof(subroot));
+	subroot.type = T_PlannerInfo;
+	subroot.glob = root->glob;
+	subroot.parse = subselect;
+	whereClause = eval_const_expressions(&subroot, whereClause);
 	whereClause = (Node *) canonicalize_qual((Expr *) whereClause, false);
 	whereClause = (Node *) make_ands_implicit((Expr *) whereClause);
 
diff --git a/src/backend/optimizer/prep/prepjointree.c b/src/backend/optimizer/prep/prepjointree.c
index 4b38851bd42..35e8d3c183b 100644
--- a/src/backend/optimizer/prep/prepjointree.c
+++ b/src/backend/optimizer/prep/prepjointree.c
@@ -36,6 +36,7 @@
 #include "optimizer/clauses.h"
 #include "optimizer/optimizer.h"
 #include "optimizer/placeholder.h"
+#include "optimizer/plancat.h"
 #include "optimizer/prep.h"
 #include "optimizer/subselect.h"
 #include "optimizer/tlist.h"
@@ -401,8 +402,9 @@ transform_MERGE_to_join(Query *parse)
  *
  * This scans the rangetable for relation RTEs and retrieves the necessary
  * catalog information for each relation.  Using this information, it clears
- * the inh flag for any relation that has no children, and expands virtual
- * generated columns for any relation that contains them.
+ * the inh flag for any relation that has no children, collects not-null
+ * attribute numbers for any relation that has column not-null constraints, and
+ * expands virtual generated columns for any relation that contains them.
  *
  * Note that expanding virtual generated columns may cause the query tree to
  * have new copies of rangetable entries.  Therefore, we have to use list_nth
@@ -447,6 +449,13 @@ preprocess_relation_rtes(PlannerInfo *root)
 		if (rte->inh)
 			rte->inh = relation->rd_rel->relhassubclass;
 
+		/*
+		 * Check to see if the relation has any column not-null constraints;
+		 * if so, retrieve the constraint information and store it in a hash
+		 * table keyed by relation OID.
+		 */
+		get_relation_notnullatts(root, relation);
+
 		/*
 		 * Check to see if the relation has any virtual generated columns; if
 		 * so, replace all Var nodes in the query that reference these columns
@@ -1384,8 +1393,10 @@ pull_up_simple_subquery(PlannerInfo *root, Node *jtnode, RangeTblEntry *rte,
 	/*
 	 * Scan the rangetable for relation RTEs and retrieve the necessary
 	 * catalog information for each relation.  Using this information, clear
-	 * the inh flag for any relation that has no children, and expand virtual
-	 * generated columns for any relation that contains them.
+	 * the inh flag for any relation that has no children, collect not-null
+	 * attribute numbers for any relation that has column not-null
+	 * constraints, and expand virtual generated columns for any relation that
+	 * contains them.
 	 */
 	subquery = subroot->parse = preprocess_relation_rtes(subroot);
 
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index 26a3e050086..e83f4b061d7 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -20,6 +20,7 @@
 #include "postgres.h"
 
 #include "access/htup_details.h"
+#include "catalog/pg_class.h"
 #include "catalog/pg_language.h"
 #include "catalog/pg_operator.h"
 #include "catalog/pg_proc.h"
@@ -36,6 +37,7 @@
 #include "optimizer/clauses.h"
 #include "optimizer/cost.h"
 #include "optimizer/optimizer.h"
+#include "optimizer/pathnode.h"
 #include "optimizer/plancat.h"
 #include "optimizer/planmain.h"
 #include "parser/analyze.h"
@@ -43,6 +45,7 @@
 #include "parser/parse_collate.h"
 #include "parser/parse_func.h"
 #include "parser/parse_oper.h"
+#include "parser/parsetree.h"
 #include "rewrite/rewriteHandler.h"
 #include "rewrite/rewriteManip.h"
 #include "tcop/tcopprot.h"
@@ -2242,7 +2245,8 @@ rowtype_field_matches(Oid rowtypeid, int fieldnum,
  * only operators and functions that are reasonable to try to execute.
  *
  * NOTE: "root" can be passed as NULL if the caller never wants to do any
- * Param substitutions nor receive info about inlined functions.
+ * Param substitutions, receive info about inlined functions, or reduce
+ * NullTests on Vars to constant true or constant false.
  *
  * NOTE: the planner assumes that this will always flatten nested AND and
  * OR clauses into N-argument form.  See comments in prepqual.c.
@@ -3537,6 +3541,31 @@ eval_const_expressions_mutator(Node *node,
 
 					return makeBoolConst(result, false);
 				}
+				if (!ntest->argisrow && arg && IsA(arg, Var) && context->root)
+				{
+					Var		   *varg = (Var *) arg;
+					bool		result;
+
+					if (var_is_nonnullable(context->root, varg, false))
+					{
+						switch (ntest->nulltesttype)
+						{
+							case IS_NULL:
+								result = false;
+								break;
+							case IS_NOT_NULL:
+								result = true;
+								break;
+							default:
+								elog(ERROR, "unrecognized nulltesttype: %d",
+									 (int) ntest->nulltesttype);
+								result = false; /* keep compiler quiet */
+								break;
+						}
+
+						return makeBoolConst(result, false);
+					}
+				}
 
 				newntest = makeNode(NullTest);
 				newntest->arg = (Expr *) arg;
@@ -4155,6 +4184,67 @@ simplify_function(Oid funcid, Oid result_type, int32 result_typmod,
 	return newexpr;
 }
 
+/*
+ * var_is_nonnullable: check to see if the Var cannot be NULL
+ *
+ * If the Var is defined NOT NULL and is not nulled by any outer joins or
+ * grouping sets, then we know that it cannot be NULL.
+ *
+ * use_rel_info indicates whether the corresponding RelOptInfo is available for
+ * use.
+ */
+bool
+var_is_nonnullable(PlannerInfo *root, Var *var, bool use_rel_info)
+{
+	Relids		notnullattnums = NULL;
+
+	Assert(IsA(var, Var));
+
+	/* skip upper-level Vars */
+	if (var->varlevelsup != 0)
+		return false;
+
+	/* could the Var be nulled by any outer joins or grouping sets? */
+	if (!bms_is_empty(var->varnullingrels))
+		return false;
+
+	/* system columns cannot be NULL */
+	if (var->varattno < 0)
+		return true;
+
+	/*
+	 * Check if the Var is defined as NOT NULL.  We retrieve the column NOT
+	 * NULL constraint information from the corresponding RelOptInfo if it is
+	 * available; otherwise, we search the hash table for this information.
+	 */
+	if (use_rel_info)
+	{
+		RelOptInfo *rel = find_base_rel(root, var->varno);
+
+		notnullattnums = rel->notnullattnums;
+	}
+	else
+	{
+		RangeTblEntry *rte = planner_rt_fetch(var->varno, root);
+
+		/*
+		 * We must skip inheritance parent tables, as some child tables may
+		 * have a NOT NULL constraint for a column while others may not.  This
+		 * cannot happen with partitioned tables, though.
+		 */
+		if (rte->inh && rte->relkind != RELKIND_PARTITIONED_TABLE)
+			return false;
+
+		notnullattnums = find_relation_notnullatts(root, rte->relid);
+	}
+
+	if (var->varattno > 0 &&
+		bms_is_member(var->varattno, notnullattnums))
+		return true;
+
+	return false;
+}
+
 /*
  * expand_function_arguments: convert named-notation args to positional args
  * and/or insert default args, as needed
diff --git a/src/backend/optimizer/util/inherit.c b/src/backend/optimizer/util/inherit.c
index 17e51cd75d7..30d158069e3 100644
--- a/src/backend/optimizer/util/inherit.c
+++ b/src/backend/optimizer/util/inherit.c
@@ -466,8 +466,7 @@ expand_single_inheritance_child(PlannerInfo *root, RangeTblEntry *parentrte,
 								Index *childRTindex_p)
 {
 	Query	   *parse = root->parse;
-	Oid			parentOID PG_USED_FOR_ASSERTS_ONLY =
-		RelationGetRelid(parentrel);
+	Oid			parentOID = RelationGetRelid(parentrel);
 	Oid			childOID = RelationGetRelid(childrel);
 	RangeTblEntry *childrte;
 	Index		childRTindex;
@@ -513,6 +512,13 @@ expand_single_inheritance_child(PlannerInfo *root, RangeTblEntry *parentrte,
 	*childrte_p = childrte;
 	*childRTindex_p = childRTindex;
 
+	/*
+	 * Retrieve column not-null constraint information for the child relation
+	 * if its relation OID is different from the parent's.
+	 */
+	if (childOID != parentOID)
+		get_relation_notnullatts(root, childrel);
+
 	/*
 	 * Build an AppendRelInfo struct for each parent/child pair.
 	 */
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 59233b64730..c6a58afc5e5 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -59,6 +59,12 @@ int			constraint_exclusion = CONSTRAINT_EXCLUSION_PARTITION;
 /* Hook for plugins to get control in get_relation_info() */
 get_relation_info_hook_type get_relation_info_hook = NULL;
 
+typedef struct NotnullHashEntry
+{
+	Oid			relid;			/* OID of the relation */
+	Relids		notnullattnums; /* attnums of NOT NULL columns */
+} NotnullHashEntry;
+
 
 static void get_relation_foreign_keys(PlannerInfo *root, RelOptInfo *rel,
 									  Relation relation, bool inhparent);
@@ -172,27 +178,7 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 	 * RangeTblEntry does get populated.
 	 */
 	if (!inhparent || relation->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
-	{
-		for (int i = 0; i < relation->rd_att->natts; i++)
-		{
-			CompactAttribute *attr = TupleDescCompactAttr(relation->rd_att, i);
-
-			Assert(attr->attnullability != ATTNULLABLE_UNKNOWN);
-
-			if (attr->attnullability == ATTNULLABLE_VALID)
-			{
-				rel->notnullattnums = bms_add_member(rel->notnullattnums,
-													 i + 1);
-
-				/*
-				 * Per RemoveAttributeById(), dropped columns will have their
-				 * attnotnull unset, so we needn't check for dropped columns
-				 * in the above condition.
-				 */
-				Assert(!attr->attisdropped);
-			}
-		}
-	}
+		rel->notnullattnums = find_relation_notnullatts(root, relationObjectId);
 
 	/*
 	 * Estimate relation size --- unless it's an inheritance parent, in which
@@ -683,6 +669,105 @@ get_relation_foreign_keys(PlannerInfo *root, RelOptInfo *rel,
 	}
 }
 
+/*
+ * get_relation_notnullatts -
+ *	  Retrieves column not-null constraint information for a given relation.
+ *
+ * We do this while we have the relcache entry open, and store the column
+ * not-null constraint information in a hash table based on the relation OID.
+ */
+void
+get_relation_notnullatts(PlannerInfo *root, Relation relation)
+{
+	Oid			relid = RelationGetRelid(relation);
+	NotnullHashEntry *hentry;
+	bool		found;
+	Relids		notnullattnums = NULL;
+
+	/* bail out if the relation has no not-null constraints */
+	if (relation->rd_att->constr == NULL ||
+		!relation->rd_att->constr->has_not_null)
+		return;
+
+	/* create the hash table if it hasn't been created yet */
+	if (root->glob->rel_notnullatts_hash == NULL)
+	{
+		HTAB	   *hashtab;
+		HASHCTL		hash_ctl;
+
+		hash_ctl.keysize = sizeof(Oid);
+		hash_ctl.entrysize = sizeof(NotnullHashEntry);
+		hash_ctl.hcxt = CurrentMemoryContext;
+
+		hashtab = hash_create("Relation NOT NULL attnums",
+							  64L,	/* arbitrary initial size */
+							  &hash_ctl,
+							  HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+
+		root->glob->rel_notnullatts_hash = hashtab;
+	}
+
+	/*
+	 * Create a hash entry for this relation OID, if we don't have one
+	 * already.
+	 */
+	hentry = (NotnullHashEntry *) hash_search(root->glob->rel_notnullatts_hash,
+											  &relid,
+											  HASH_ENTER,
+											  &found);
+
+	/* bail out if a hash entry already exists for this relation OID */
+	if (found)
+		return;
+
+	/* collect the column not-null constraint information for this relation */
+	for (int i = 0; i < relation->rd_att->natts; i++)
+	{
+		CompactAttribute *attr = TupleDescCompactAttr(relation->rd_att, i);
+
+		Assert(attr->attnullability != ATTNULLABLE_UNKNOWN);
+
+		if (attr->attnullability == ATTNULLABLE_VALID)
+		{
+			notnullattnums = bms_add_member(notnullattnums, i + 1);
+
+			/*
+			 * Per RemoveAttributeById(), dropped columns will have their
+			 * attnotnull unset, so we needn't check for dropped columns in
+			 * the above condition.
+			 */
+			Assert(!attr->attisdropped);
+		}
+	}
+
+	/* ... and initialize the new hash entry */
+	hentry->notnullattnums = notnullattnums;
+}
+
+/*
+ * find_relation_notnullatts -
+ *	  Searches the hash table and returns the column not-null constraint
+ *	  information for a given relation.
+ */
+Relids
+find_relation_notnullatts(PlannerInfo *root, Oid relid)
+{
+	NotnullHashEntry *hentry;
+	bool		found;
+
+	if (root->glob->rel_notnullatts_hash == NULL)
+		return NULL;
+
+	hentry = (NotnullHashEntry *) hash_search(root->glob->rel_notnullatts_hash,
+											  &relid,
+											  HASH_FIND,
+											  &found);
+	if (!found)
+		return NULL;
+
+	return hentry->notnullattnums;
+}
+
 /*
  * infer_arbiter_indexes -
  *	  Determine the unique indexes used to arbitrate speculative insertion.
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 6567759595d..e5dd15098f6 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -179,6 +179,9 @@ typedef struct PlannerGlobal
 
 	/* partition descriptors */
 	PartitionDirectory partition_directory pg_node_attr(read_write_ignore);
+
+	/* hash table for NOT NULL attnums of relations */
+	struct HTAB *rel_notnullatts_hash pg_node_attr(read_write_ignore);
 } PlannerGlobal;
 
 /* macro for fetching the Plan associated with a SubPlan node */
@@ -719,6 +722,9 @@ typedef struct PartitionSchemeData *PartitionScheme;
  *				the attribute is needed as part of final targetlist
  *		attr_widths - cache space for per-attribute width estimates;
  *					  zero means not computed yet
+ *		notnullattnums - zero-based set containing attnums of NOT NULL
+ *						 columns (not populated for rels corresponding to
+ *						 non-partitioned inh==true RTEs)
  *		nulling_relids - relids of outer joins that can null this rel
  *		lateral_vars - lateral cross-references of rel, if any (list of
  *					   Vars and PlaceHolderVars)
@@ -952,11 +958,7 @@ typedef struct RelOptInfo
 	Relids	   *attr_needed pg_node_attr(read_write_ignore);
 	/* array indexed [min_attr .. max_attr] */
 	int32	   *attr_widths pg_node_attr(read_write_ignore);
-
-	/*
-	 * Zero-based set containing attnums of NOT NULL columns.  Not populated
-	 * for rels corresponding to non-partitioned inh==true RTEs.
-	 */
+	/* zero-based set containing attnums of NOT NULL columns */
 	Bitmapset  *notnullattnums;
 	/* relids of outer joins that can null this baserel */
 	Relids		nulling_relids;
diff --git a/src/include/optimizer/optimizer.h b/src/include/optimizer/optimizer.h
index 546828b54bd..37bc13c2cbd 100644
--- a/src/include/optimizer/optimizer.h
+++ b/src/include/optimizer/optimizer.h
@@ -154,6 +154,8 @@ extern Node *estimate_expression_value(PlannerInfo *root, Node *node);
 extern Expr *evaluate_expr(Expr *expr, Oid result_type, int32 result_typmod,
 						   Oid result_collation);
 
+extern bool var_is_nonnullable(PlannerInfo *root, Var *var, bool use_rel_info);
+
 extern List *expand_function_arguments(List *args, bool include_out_arguments,
 									   Oid result_type,
 									   struct HeapTupleData *func_tuple);
diff --git a/src/include/optimizer/plancat.h b/src/include/optimizer/plancat.h
index cd74e4b1e8b..d6f6f4ad2d7 100644
--- a/src/include/optimizer/plancat.h
+++ b/src/include/optimizer/plancat.h
@@ -28,6 +28,10 @@ extern PGDLLIMPORT get_relation_info_hook_type get_relation_info_hook;
 extern void get_relation_info(PlannerInfo *root, Oid relationObjectId,
 							  bool inhparent, RelOptInfo *rel);
 
+extern void get_relation_notnullatts(PlannerInfo *root, Relation relation);
+
+extern Relids find_relation_notnullatts(PlannerInfo *root, Oid relid);
+
 extern List *infer_arbiter_indexes(PlannerInfo *root);
 
 extern void estimate_rel_size(Relation rel, int32 *attr_widths,
diff --git a/src/test/regress/expected/generated_virtual.out b/src/test/regress/expected/generated_virtual.out
index 9ecf73de814..74279bb816b 100644
--- a/src/test/regress/expected/generated_virtual.out
+++ b/src/test/regress/expected/generated_virtual.out
@@ -1541,11 +1541,11 @@ where coalesce(t2.b, 1) = 2;
 explain (costs off)
 select t1.a from gtest32 t1 left join gtest32 t2 on t1.a = t2.a
 where coalesce(t2.b, 1) = 2 or t1.a is null;
-                         QUERY PLAN                          
--------------------------------------------------------------
+               QUERY PLAN                
+-----------------------------------------
  Hash Left Join
    Hash Cond: (t1.a = t2.a)
-   Filter: ((COALESCE((t2.a * 2), 1) = 2) OR (t1.a IS NULL))
+   Filter: (COALESCE((t2.a * 2), 1) = 2)
    ->  Seq Scan on gtest32 t1
    ->  Hash
          ->  Seq Scan on gtest32 t2
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 390aabfb34b..034f3127f0c 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -3639,8 +3639,8 @@ from nt3 as nt3
     ) as ss2
     on ss2.id = nt3.nt2_id
 where nt3.id = 1 and ss2.b3;
-                  QUERY PLAN                   
------------------------------------------------
+                  QUERY PLAN                  
+----------------------------------------------
  Nested Loop
    ->  Nested Loop
          ->  Index Scan using nt3_pkey on nt3
@@ -3649,7 +3649,7 @@ where nt3.id = 1 and ss2.b3;
                Index Cond: (id = nt3.nt2_id)
    ->  Index Only Scan using nt1_pkey on nt1
          Index Cond: (id = nt2.nt1_id)
-         Filter: (nt2.b1 AND (id IS NOT NULL))
+         Filter: (nt2.b1 AND true)
 (9 rows)
 
 select nt3.id
diff --git a/src/test/regress/expected/predicate.out b/src/test/regress/expected/predicate.out
index b79037748b7..59bfe33bb1c 100644
--- a/src/test/regress/expected/predicate.out
+++ b/src/test/regress/expected/predicate.out
@@ -84,10 +84,10 @@ SELECT * FROM pred_tab t WHERE t.a IS NULL OR t.c IS NULL;
 -- are provably false
 EXPLAIN (COSTS OFF)
 SELECT * FROM pred_tab t WHERE t.b IS NULL OR t.c IS NULL;
-               QUERY PLAN               
-----------------------------------------
+       QUERY PLAN       
+------------------------
  Seq Scan on pred_tab t
-   Filter: ((b IS NULL) OR (c IS NULL))
+   Filter: (b IS NULL)
 (2 rows)
 
 --
@@ -231,6 +231,54 @@ SELECT * FROM pred_tab t1
          ->  Seq Scan on pred_tab t3
 (9 rows)
 
+--
+-- Tests for NullTest reduction in EXISTS sublink
+--
+-- Ensure the IS_NOT_NULL qual is ignored
+EXPLAIN (COSTS OFF)
+SELECT * FROM pred_tab t1
+    LEFT JOIN pred_tab t2 ON EXISTS
+        (SELECT 1 FROM pred_tab t3, pred_tab t4, pred_tab t5, pred_tab t6
+         WHERE t1.a = t3.a AND t6.a IS NOT NULL);
+                       QUERY PLAN                        
+---------------------------------------------------------
+ Nested Loop Left Join
+   Join Filter: EXISTS(SubPlan 1)
+   ->  Seq Scan on pred_tab t1
+   ->  Materialize
+         ->  Seq Scan on pred_tab t2
+   SubPlan 1
+     ->  Nested Loop
+           ->  Nested Loop
+                 ->  Nested Loop
+                       ->  Seq Scan on pred_tab t4
+                       ->  Materialize
+                             ->  Seq Scan on pred_tab t3
+                                   Filter: (t1.a = a)
+                 ->  Materialize
+                       ->  Seq Scan on pred_tab t5
+           ->  Materialize
+                 ->  Seq Scan on pred_tab t6
+(17 rows)
+
+-- Ensure the IS_NULL qual is reduced to constant-FALSE
+EXPLAIN (COSTS OFF)
+SELECT * FROM pred_tab t1
+    LEFT JOIN pred_tab t2 ON EXISTS
+        (SELECT 1 FROM pred_tab t3, pred_tab t4, pred_tab t5, pred_tab t6
+         WHERE t1.a = t3.a AND t6.a IS NULL);
+             QUERY PLAN              
+-------------------------------------
+ Nested Loop Left Join
+   Join Filter: (InitPlan 1).col1
+   InitPlan 1
+     ->  Result
+           One-Time Filter: false
+   ->  Seq Scan on pred_tab t1
+   ->  Materialize
+         ->  Seq Scan on pred_tab t2
+(8 rows)
+
 DROP TABLE pred_tab;
 -- Validate we handle IS NULL and IS NOT NULL quals correctly with inheritance
 -- parents.
diff --git a/src/test/regress/sql/predicate.sql b/src/test/regress/sql/predicate.sql
index 9dcb81b1bc5..d92277353a0 100644
--- a/src/test/regress/sql/predicate.sql
+++ b/src/test/regress/sql/predicate.sql
@@ -115,6 +115,24 @@ SELECT * FROM pred_tab t1
     LEFT JOIN pred_tab t2 ON t1.a = 1
     LEFT JOIN pred_tab t3 ON t2.a IS NULL OR t2.c IS NULL;
 
+--
+-- Tests for NullTest reduction in EXISTS sublink
+--
+
+-- Ensure the IS_NOT_NULL qual is ignored
+EXPLAIN (COSTS OFF)
+SELECT * FROM pred_tab t1
+    LEFT JOIN pred_tab t2 ON EXISTS
+        (SELECT 1 FROM pred_tab t3, pred_tab t4, pred_tab t5, pred_tab t6
+         WHERE t1.a = t3.a AND t6.a IS NOT NULL);
+
+-- Ensure the IS_NULL qual is reduced to constant-FALSE
+EXPLAIN (COSTS OFF)
+SELECT * FROM pred_tab t1
+    LEFT JOIN pred_tab t2 ON EXISTS
+        (SELECT 1 FROM pred_tab t3, pred_tab t4, pred_tab t5, pred_tab t6
+         WHERE t1.a = t3.a AND t6.a IS NULL);
+
 DROP TABLE pred_tab;
 
 -- Validate we handle IS NULL and IS NOT NULL quals correctly with inheritance
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 32d6e718adc..50203edb8c0 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1756,6 +1756,7 @@ NonEmptyRange
 Notification
 NotificationList
 NotifyStmt
+NotnullHashEntry
 Nsrt
 NtDllRoutine
 NtFlushBuffersFileEx_t
-- 
2.43.0

#34

Andrei Lepikhov

lepihov@gmail.com

6 months ago

In reply to: Richard Guo (#33)

Re: Reduce "Var IS [NOT] NULL" quals during constant folding

On 30/6/2025 09:26, Richard Guo wrote:

On Wed, May 28, 2025 at 6:28 PM Richard Guo <guofenglinux@gmail.com> wrote:

Yeah, this patchset is targeted for v19. Maybe we could be more
aggressive and have 0001 and 0002 in v18? (no chance for 0003 though)

This patchset does not apply anymore due to 2c0ed86d3. Here is a new
rebase.

This patchset does not apply anymore, due to 5069fef1c this time.
Here is a new rebase.

I like the general idea of this work. But I wonder, why is a new hash
table designed to store only the notnullattnums field? From the
discussion, it is not apparent why not to cache all (or most of) the
data needed for get_relation_info. In cases where multiple subqueries
reference the same table, it could save some cycles and memory.

--
regards, Andrei Lepikhov

#35

Richard Guo

guofenglinux@gmail.com

6 months ago

In reply to: Andrei Lepikhov (#34)

Re: Reduce "Var IS [NOT] NULL" quals during constant folding

On Tue, Jul 1, 2025 at 10:57 PM Andrei Lepikhov <lepihov@gmail.com> wrote:

I like the general idea of this work. But I wonder, why is a new hash
table designed to store only the notnullattnums field? From the
discussion, it is not apparent why not to cache all (or most of) the
data needed for get_relation_info. In cases where multiple subqueries
reference the same table, it could save some cycles and memory.

I think this idea was already thoroughly discussed earlier in this
thread when Robert proposed moving get_relation_info() to an earlier
stage. One reason against it is that not every RTE_RELATION relation
will be actively part of the query. Collecting the whole bundle of
catalog information for such relations is wasteful and can negatively
impact performance.

Thanks
Richard

#36

Andrei Lepikhov

lepihov@gmail.com

6 months ago

In reply to: Richard Guo (#35)

Re: Reduce "Var IS [NOT] NULL" quals during constant folding

On 2/7/2025 03:24, Richard Guo wrote:

On Tue, Jul 1, 2025 at 10:57 PM Andrei Lepikhov <lepihov@gmail.com> wrote:

I like the general idea of this work. But I wonder, why is a new hash
table designed to store only the notnullattnums field? From the
discussion, it is not apparent why not to cache all (or most of) the
data needed for get_relation_info. In cases where multiple subqueries
reference the same table, it could save some cycles and memory.

I think this idea was already thoroughly discussed earlier in this
thread when Robert proposed moving get_relation_info() to an earlier
stage. One reason against it is that not every RTE_RELATION relation
will be actively part of the query. Collecting the whole bundle of
catalog information for such relations is wasteful and can negatively
impact performance.

I'm trying to understand the phrase "not every relation ...". Could you
clarify that? I know that Postgres can eliminate some self-joins and
outer joins, and might determine that a WHERE clause is always false,
etc. However, these cases seem to be rare, especially when users refine
their queries. Additionally, AFAICS, this is not an issue for partition
pruning.

Generally, I believe these optimisations should have a positive impact.
So, I think "not actively participate" might mean something different.

I must say that I appreciate Tom's idea and see significant benefits in
making the parse tree a read-only structure. In complex queries, it can
be frustrating to make copies of the parse tree, leading to complaints
from users about insufficient memory allocation. This is why, in our
enterprise fork, we support a specific option to avoid copying the parse
tree multiple times.

Therefore, it would be better to find a way to refactor the
`preprocess_relation_rtes` function to gather table statistics lazily
into the hash table when they are needed. For example, we could do this
at the moment of creating the `RelOptInfo` or before a subquery pull-up,
without modifying the RTE at all.

--
regards, Andrei Lepikhov

#37

Richard Guo

guofenglinux@gmail.com

6 months ago

In reply to: Andrei Lepikhov (#36)

Re: Reduce "Var IS [NOT] NULL" quals during constant folding

On Wed, Jul 2, 2025 at 4:32 PM Andrei Lepikhov <lepihov@gmail.com> wrote:

I must say that I appreciate Tom's idea and see significant benefits in
making the parse tree a read-only structure. In complex queries, it can
be frustrating to make copies of the parse tree, leading to complaints
from users about insufficient memory allocation. This is why, in our
enterprise fork, we support a specific option to avoid copying the parse
tree multiple times.

I don't see how the changes in this patchset violate Tom's proposal
regarding keeping the parse tree read-only. The only potential issue
I can see is that we may clear the rte->inh flag in some cases -- but
that behavior has existed for a long time, not starting from this
patchset.

Therefore, it would be better to find a way to refactor the
`preprocess_relation_rtes` function to gather table statistics lazily
into the hash table when they are needed. For example, we could do this
at the moment of creating the `RelOptInfo` or before a subquery pull-up,
without modifying the RTE at all.

All the catalog information collected in preprocess_relation_rtes() is
needed very early in the planner. I don't see how we could move that
logic to a later stage, such as at the moment of creating RelOptInfos
as you mentioned.

Thanks
Richard

#38

Andrei Lepikhov

lepihov@gmail.com

6 months ago

In reply to: Richard Guo (#37)

Re: Reduce "Var IS [NOT] NULL" quals during constant folding

On 2/7/2025 11:14, Richard Guo wrote:

On Wed, Jul 2, 2025 at 4:32 PM Andrei Lepikhov <lepihov@gmail.com> wrote:

I must say that I appreciate Tom's idea and see significant benefits in
making the parse tree a read-only structure. In complex queries, it can
be frustrating to make copies of the parse tree, leading to complaints
from users about insufficient memory allocation. This is why, in our
enterprise fork, we support a specific option to avoid copying the parse
tree multiple times.

I don't see how the changes in this patchset violate Tom's proposal
regarding keeping the parse tree read-only. The only potential issue
I can see is that we may clear the rte->inh flag in some cases -- but
that behavior has existed for a long time, not starting from this
patchset.

I think the 1e4351a solution was a little too hasty, and it changes the
parse tree inside the planner. To achieve a read-only parse tree, we
will need to redesign it.

Therefore, it would be better to find a way to refactor the
`preprocess_relation_rtes` function to gather table statistics lazily
into the hash table when they are needed. For example, we could do this
at the moment of creating the `RelOptInfo` or before a subquery pull-up,
without modifying the RTE at all.

All the catalog information collected in preprocess_relation_rtes() is
needed very early in the planner. I don't see how we could move that
logic to a later stage, such as at the moment of creating RelOptInfos
as you mentioned.

I apologise for the confusion in my previous message. I am not
suggesting that we postpone this. Instead, I would like an explanation
of why you believe that accessing the table statistics earlier could
negatively impact planner performance. As I mentioned before, I have
only envisioned rare instances where join eliminations may reduce the
number of relations and clause evaluations resulting in a constant.

--
regards, Andrei Lepikhov

#39

Richard Guo

guofenglinux@gmail.com

6 months ago

In reply to: Andrei Lepikhov (#38)

Re: Reduce "Var IS [NOT] NULL" quals during constant folding

On Wed, Jul 2, 2025 at 6:44 PM Andrei Lepikhov <lepihov@gmail.com> wrote:

On 2/7/2025 11:14, Richard Guo wrote:

On Wed, Jul 2, 2025 at 4:32 PM Andrei Lepikhov <lepihov@gmail.com> wrote:

Therefore, it would be better to find a way to refactor the
`preprocess_relation_rtes` function to gather table statistics lazily
into the hash table when they are needed. For example, we could do this
at the moment of creating the `RelOptInfo` or before a subquery pull-up,
without modifying the RTE at all.

All the catalog information collected in preprocess_relation_rtes() is
needed very early in the planner. I don't see how we could move that
logic to a later stage, such as at the moment of creating RelOptInfos
as you mentioned.

I apologise for the confusion in my previous message. I am not
suggesting that we postpone this. Instead, I would like an explanation
of why you believe that accessing the table statistics earlier could
negatively impact planner performance. As I mentioned before, I have
only envisioned rare instances where join eliminations may reduce the
number of relations and clause evaluations resulting in a constant.

I wonder how you arrived at the conclusion that these cases are rare.
If they truly are, then why have we invested so much effort in
optimizing for them?

I also wonder why you think we should collect all catalog information
at the very early stage of the planner, given that most of it is only
used much later -- after RelOptInfos have been created. If the goal
is to avoid redundant catalog retrieval for the same relation in
get_relation_info(), perhaps adding a caching mechanism within that
function would be a more targeted solution. I don't see a strong
reason for moving get_relation_info() to the very beginning of the
planner.

Thanks
Richard
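
To make the "caching mechanism" idea concrete, here is a rough sketch of memoizing one piece of per-relation catalog data by relation OID, in the spirit of the get_relation_notnullatts() / find_relation_notnullatts() pair from the patch. Everything named Toy* or toy_* is hypothetical: a small fixed-size open-addressing table stands in for dynahash, a 64-bit mask stands in for the Bitmapset, and toy_demo_loader merely pretends to read the catalog.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_CACHE_SLOTS 64

typedef uint32_t ToyOid;

typedef struct ToyNotnullEntry
{
    bool        used;
    ToyOid      relid;          /* key: OID of the relation */
    uint64_t    notnullattnums; /* payload: mask of NOT NULL attnums */
} ToyNotnullEntry;

typedef struct ToyNotnullCache
{
    ToyNotnullEntry slots[TOY_CACHE_SLOTS];
    int         catalog_reads;  /* how many times the loader actually ran */
} ToyNotnullCache;

/* Hypothetical loader signature: computes the mask for one relation. */
typedef uint64_t (*ToyLoader) (ToyOid relid);

/* Demo loader: pretends that column 1 of every relation is NOT NULL. */
static uint64_t
toy_demo_loader(ToyOid relid)
{
    (void) relid;
    return UINT64_C(1) << 1;
}

/*
 * Linear probing.  Returns the slot holding relid, or the first free slot
 * on its probe path, or -1 if the table is full.  Entries are never
 * deleted, so this simple scheme stays correct.
 */
static int
toy_probe(const ToyNotnullCache *cache, ToyOid relid)
{
    for (uint32_t i = 0; i < TOY_CACHE_SLOTS; i++)
    {
        uint32_t    slot = (relid + i) % TOY_CACHE_SLOTS;
        const ToyNotnullEntry *e = &cache->slots[slot];

        if (!e->used || e->relid == relid)
            return (int) slot;
    }
    return -1;
}

/* "Enter" step: run the loader only the first time a relid is seen. */
static void
toy_remember_notnullatts(ToyNotnullCache *cache, ToyOid relid, ToyLoader load)
{
    int         slot = toy_probe(cache, relid);

    if (slot < 0 || cache->slots[slot].used)
        return;                 /* table full, or already cached */

    cache->slots[slot].used = true;
    cache->slots[slot].relid = relid;
    cache->slots[slot].notnullattnums = load(relid);
    cache->catalog_reads++;
}

/* "Find" step: returns 0 (empty set) when nothing is cached for relid. */
static uint64_t
toy_find_notnullatts(const ToyNotnullCache *cache, ToyOid relid)
{
    int         slot = toy_probe(cache, relid);

    if (slot < 0 || !cache->slots[slot].used)
        return 0;
    return cache->slots[slot].notnullattnums;
}
```

With a zero-initialized ToyNotnullCache, calling toy_remember_notnullatts() twice for the same OID with toy_demo_loader runs the loader once, and later toy_find_notnullatts() calls are pure lookups. Whether a cache like this belongs inside get_relation_info() or earlier in the planner is exactly the open question in this exchange; the sketch takes no position on it.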

#40

Andrei Lepikhov

lepihov@gmail.com

6 months ago

In reply to: Richard Guo (#39)

Re: Reduce "Var IS [NOT] NULL" quals during constant folding

On 3/7/2025 02:30, Richard Guo wrote:

On Wed, Jul 2, 2025 at 6:44 PM Andrei Lepikhov <lepihov@gmail.com> wrote:

I apologise for the confusion in my previous message. I am not
suggesting that we postpone this. Instead, I would like an explanation
of why you believe that accessing the table statistics earlier could
negatively impact planner performance. As I mentioned before, I have
only envisioned rare instances where join eliminations may reduce the
number of relations and clause evaluations resulting in a constant.

I wonder how you arrived at the conclusion that these cases are rare.
If they truly are, then why have we invested so much effort in
optimizing for them?

There is no direct connection between effort and frequency; it primarily
depends on personal desire. As you might find, much of the effort goes
into convincing the community.
These specific cases should be rare from the Postgres perspective: the
planner's code remains simple, based on the assumption that crafting the
appropriate query is the user's responsibility.

I also wonder why you think we should collect all catalog information
at the very early stage of the planner, given that most of it is only
used much later -- after RelOptInfos have been created. If the goal
is to avoid redundant catalog retrieval for the same relation in
get_relation_info(), perhaps adding a caching mechanism within that
function would be a more targeted solution. I don't see a strong
reason for moving get_relation_info() to the very beginning of the
planner.

This indicates that there is still room for further exploration and
discussion. For starters, the 'Redundant NullTest' issue is not the only
concern. Additionally, Postgres performs pull-up transformations blindly,
without considering the cost model. However, each pull-up has its corner
cases, and in practice we often see new complaints arise after a new
pull-up technique is committed. One possible solution I envision could
be to examine indexes and/or make raw initial estimations to avoid
problematic pull-up cases.

--
regards, Andrei Lepikhov

#41

Richard Guo

guofenglinux@gmail.com

6 months ago

In reply to: Richard Guo (#33)

3 attachment(s)

Re: Reduce "Var IS [NOT] NULL" quals during constant folding

On Mon, Jun 30, 2025 at 4:26 PM Richard Guo <guofenglinux@gmail.com> wrote:

On Wed, May 28, 2025 at 6:28 PM Richard Guo <guofenglinux@gmail.com> wrote:

This patchset does not apply anymore due to 2c0ed86d3. Here is a new
rebase.

This patchset does not apply anymore, due to 5069fef1c this time.
Here is a new rebase.

Here is a new rebase. I moved the call to preprocess_relation_rtes to
a later point within convert_EXISTS_sublink_to_join, so we can avoid
the work if it turns out that the EXISTS SubLink cannot be flattened.
Nothing essential has changed.

The NOT-IN pullup work depends on the changes in this patchset (it
also relies on the not-null information), so I'd like to move it
forward.

Hi Tom, Robert -- just to be sure, are you planning to take another
look at it?

Thanks
Richard

Attachments:

v7-0001-Expand-virtual-generated-columns-before-sublink-p.patch (application/octet-stream)

From 7546cf2ff0abfd77f2c4372a89281856947c3095 Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Wed, 23 Apr 2025 10:29:15 +0900
Subject: [PATCH v7 1/3] Expand virtual generated columns before sublink
 pull-up

Currently, we expand virtual generated columns after we have pulled up
any SubLinks within the query's quals.  This ensures that the virtual
generated column references within SubLinks that should be transformed
into joins are correctly expanded.  This approach works well and has
posed no issues.

In an upcoming patch, we plan to centralize the collection of catalog
information needed early in the planner.  This will help avoid
repeated table_open/table_close calls for relations in the rangetable.
Since this information is required during sublink pull-up, we are
moving the expansion of virtual generated columns to occur beforehand.

To achieve this, if any EXISTS SubLinks can be pulled up, their
rangetables are processed just before pulling them up.
---
 src/backend/optimizer/plan/planner.c          | 17 ++++++------
 src/backend/optimizer/plan/subselect.c        | 27 +++++++++++++++++++
 src/backend/optimizer/prep/prepjointree.c     | 20 ++++++--------
 src/include/optimizer/prep.h                  |  2 +-
 .../regress/expected/generated_virtual.out    | 22 +++++++++++++++
 src/test/regress/sql/generated_virtual.sql    |  9 +++++++
 6 files changed, 76 insertions(+), 21 deletions(-)

diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 549aedcfa99..fbbc42f1600 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -720,6 +720,15 @@ subquery_planner(PlannerGlobal *glob, Query *parse, PlannerInfo *parent_root,
 	 */
 	transform_MERGE_to_join(parse);
 
+	/*
+	 * Scan the rangetable for relations with virtual generated columns, and
+	 * replace all Var nodes in the query that reference these columns with
+	 * the generation expressions.  Note that this step does not descend into
+	 * sublinks and subqueries; if we pull up any sublinks or subqueries
+	 * below, their rangetables are processed just before pulling them up.
+	 */
+	parse = root->parse = expand_virtual_generated_columns(root);
+
 	/*
 	 * If the FROM clause is empty, replace it with a dummy RTE_RESULT RTE, so
 	 * that we don't need so many special cases to deal with that situation.
@@ -743,14 +752,6 @@ subquery_planner(PlannerGlobal *glob, Query *parse, PlannerInfo *parent_root,
 	 */
 	preprocess_function_rtes(root);
 
-	/*
-	 * Scan the rangetable for relations with virtual generated columns, and
-	 * replace all Var nodes in the query that reference these columns with
-	 * the generation expressions.  Recursion issues here are handled in the
-	 * same way as for SubLinks.
-	 */
-	parse = root->parse = expand_virtual_generated_columns(root);
-
 	/*
 	 * Check to see if any subqueries in the jointree can be merged into this
 	 * query.
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index e7cb3fede66..575303b294a 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -1454,6 +1454,7 @@ convert_EXISTS_sublink_to_join(PlannerInfo *root, SubLink *sublink,
 	Query	   *parse = root->parse;
 	Query	   *subselect = (Query *) sublink->subselect;
 	Node	   *whereClause;
+	PlannerInfo subroot;
 	int			rtoffset;
 	int			varno;
 	Relids		clause_varnos;
@@ -1515,6 +1516,32 @@ convert_EXISTS_sublink_to_join(PlannerInfo *root, SubLink *sublink,
 	if (contain_volatile_functions(whereClause))
 		return NULL;
 
+	/*
+	 * Scan the rangetable for relations with virtual generated columns, and
+	 * replace all Var nodes in the subquery that reference these columns with
+	 * the generation expressions.
+	 *
+	 * Note: we construct up an entirely dummy PlannerInfo for use here.  This
+	 * is fine because only the "glob" and "parse" links will be used in this
+	 * case.
+	 *
+	 * Note: we temporarily assign back the WHERE clause so that any virtual
+	 * generated column references within it can be expanded.  It should be
+	 * separated out again afterward.
+	 */
+	MemSet(&subroot, 0, sizeof(subroot));
+	subroot.type = T_PlannerInfo;
+	subroot.glob = root->glob;
+	subroot.parse = subselect;
+	subselect->jointree->quals = whereClause;
+	subselect = expand_virtual_generated_columns(&subroot);
+
+	/*
+	 * Now separate out the WHERE clause again.
+	 */
+	whereClause = subselect->jointree->quals;
+	subselect->jointree->quals = NULL;
+
 	/*
 	 * The subquery must have a nonempty jointree, but we can make it so.
 	 */
diff --git a/src/backend/optimizer/prep/prepjointree.c b/src/backend/optimizer/prep/prepjointree.c
index 87dc6f56b57..8140d22de70 100644
--- a/src/backend/optimizer/prep/prepjointree.c
+++ b/src/backend/optimizer/prep/prepjointree.c
@@ -4,10 +4,10 @@
  *	  Planner preprocessing for subqueries and join tree manipulation.
  *
  * NOTE: the intended sequence for invoking these operations is
+ *		expand_virtual_generated_columns
  *		replace_empty_jointree
  *		pull_up_sublinks
  *		preprocess_function_rtes
- *		expand_virtual_generated_columns
  *		pull_up_subqueries
  *		flatten_simple_union_all
  *		do expression preprocessing (including flattening JOIN alias vars)
@@ -958,10 +958,6 @@ preprocess_function_rtes(PlannerInfo *root)
  * generation expressions.  Note that we do not descend into subqueries; that
  * is taken care of when the subqueries are planned.
  *
- * This has to be done after we have pulled up any SubLinks within the query's
- * quals; otherwise any virtual generated column references within the SubLinks
- * that should be transformed into joins wouldn't get expanded.
- *
  * Returns a modified copy of the query tree, if any relations with virtual
  * generated columns are present.
  */
@@ -1333,6 +1329,13 @@ pull_up_simple_subquery(PlannerInfo *root, Node *jtnode, RangeTblEntry *rte,
 	/* No CTEs to worry about */
 	Assert(subquery->cteList == NIL);
 
+	/*
+	 * Scan the rangetable for relations with virtual generated columns, and
+	 * replace all Var nodes in the subquery that reference these columns with
+	 * the generation expressions.
+	 */
+	subquery = subroot->parse = expand_virtual_generated_columns(subroot);
+
 	/*
 	 * If the FROM clause is empty, replace it with a dummy RTE_RESULT RTE, so
 	 * that we don't need so many special cases to deal with that situation.
@@ -1352,13 +1355,6 @@ pull_up_simple_subquery(PlannerInfo *root, Node *jtnode, RangeTblEntry *rte,
 	 */
 	preprocess_function_rtes(subroot);
 
-	/*
-	 * Scan the rangetable for relations with virtual generated columns, and
-	 * replace all Var nodes in the query that reference these columns with
-	 * the generation expressions.
-	 */
-	subquery = subroot->parse = expand_virtual_generated_columns(subroot);
-
 	/*
 	 * Recursively pull up the subquery's subqueries, so that
 	 * pull_up_subqueries' processing is complete for its jointree and
diff --git a/src/include/optimizer/prep.h b/src/include/optimizer/prep.h
index df56202777c..ceb731bcf5e 100644
--- a/src/include/optimizer/prep.h
+++ b/src/include/optimizer/prep.h
@@ -22,10 +22,10 @@
  * prototypes for prepjointree.c
  */
 extern void transform_MERGE_to_join(Query *parse);
+extern Query *expand_virtual_generated_columns(PlannerInfo *root);
 extern void replace_empty_jointree(Query *parse);
 extern void pull_up_sublinks(PlannerInfo *root);
 extern void preprocess_function_rtes(PlannerInfo *root);
-extern Query *expand_virtual_generated_columns(PlannerInfo *root);
 extern void pull_up_subqueries(PlannerInfo *root);
 extern void flatten_simple_union_all(PlannerInfo *root);
 extern void reduce_outer_joins(PlannerInfo *root);
diff --git a/src/test/regress/expected/generated_virtual.out b/src/test/regress/expected/generated_virtual.out
index 3b40e15a95a..a635cb1e776 100644
--- a/src/test/regress/expected/generated_virtual.out
+++ b/src/test/regress/expected/generated_virtual.out
@@ -1613,4 +1613,26 @@ select * from gtest32 t group by grouping sets (a, b, c, d, e) having c = 20;
 
 -- Ensure that the virtual generated columns in ALTER COLUMN TYPE USING expression are expanded
 alter table gtest32 alter column e type bigint using b;
+-- Ensure that virtual generated column references within SubLinks that should
+-- be transformed into joins can get expanded
+explain (costs off)
+select 1 from gtest32 t1 where exists
+  (select 1 from gtest32 t2 where t1.a > t2.a and t2.b = 2);
+             QUERY PLAN              
+-------------------------------------
+ Nested Loop Semi Join
+   Join Filter: (t1.a > t2.a)
+   ->  Seq Scan on gtest32 t1
+   ->  Materialize
+         ->  Seq Scan on gtest32 t2
+               Filter: ((a * 2) = 2)
+(6 rows)
+
+select 1 from gtest32 t1 where exists
+  (select 1 from gtest32 t2 where t1.a > t2.a and t2.b = 2);
+ ?column? 
+----------
+        1
+(1 row)
+
 drop table gtest32;
diff --git a/src/test/regress/sql/generated_virtual.sql b/src/test/regress/sql/generated_virtual.sql
index e2b31853e01..ba19bc4c701 100644
--- a/src/test/regress/sql/generated_virtual.sql
+++ b/src/test/regress/sql/generated_virtual.sql
@@ -858,4 +858,13 @@ select * from gtest32 t group by grouping sets (a, b, c, d, e) having c = 20;
 -- Ensure that the virtual generated columns in ALTER COLUMN TYPE USING expression are expanded
 alter table gtest32 alter column e type bigint using b;
 
+-- Ensure that virtual generated column references within SubLinks that should
+-- be transformed into joins can get expanded
+explain (costs off)
+select 1 from gtest32 t1 where exists
+  (select 1 from gtest32 t2 where t1.a > t2.a and t2.b = 2);
+
+select 1 from gtest32 t1 where exists
+  (select 1 from gtest32 t2 where t1.a > t2.a and t2.b = 2);
+
 drop table gtest32;
-- 
2.43.0

v7-0002-Centralize-collection-of-catalog-info-needed-earl.patch (application/octet-stream)

From 52a351b3daccdeb8a54a3fbd0cdc31f2e320c6d1 Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Thu, 24 Apr 2025 14:58:03 +0900
Subject: [PATCH v7 2/3] Centralize collection of catalog info needed early in
 the planner

There are several pieces of catalog information that need to be
retrieved for a relation during the early stage of planning.  These
include relhassubclass, which is used to clear the inh flag if the
relation has no children, as well as a column's attgenerated and
default value, which are needed to expand virtual generated columns.
More such information may be required in the future.

Currently, these pieces of catalog data are collected in multiple
places, resulting in repeated table_open/table_close calls for each
relation in the rangetable.  This patch centralizes the collection of
all required early-stage catalog information into a single loop over
the rangetable, allowing each relation to be opened and closed only
once.
---
 src/backend/optimizer/plan/planner.c      |  31 +--
 src/backend/optimizer/plan/subselect.c    |   9 +-
 src/backend/optimizer/prep/prepjointree.c | 299 +++++++++++++---------
 src/include/optimizer/prep.h              |   2 +-
 4 files changed, 190 insertions(+), 151 deletions(-)

diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index fbbc42f1600..fc13d921d0c 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -721,13 +721,15 @@ subquery_planner(PlannerGlobal *glob, Query *parse, PlannerInfo *parent_root,
 	transform_MERGE_to_join(parse);
 
 	/*
-	 * Scan the rangetable for relations with virtual generated columns, and
-	 * replace all Var nodes in the query that reference these columns with
-	 * the generation expressions.  Note that this step does not descend into
-	 * sublinks and subqueries; if we pull up any sublinks or subqueries
-	 * below, their rangetables are processed just before pulling them up.
+	 * Scan the rangetable for relation RTEs and retrieve the necessary
+	 * catalog information for each relation.  Using this information, clear
+	 * the inh flag for any relation that has no children, and expand virtual
+	 * generated columns for any relation that contains them.  Note that this
+	 * step does not descend into sublinks and subqueries; if we pull up any
+	 * sublinks or subqueries below, their relation RTEs are processed just
+	 * before pulling them up.
 	 */
-	parse = root->parse = expand_virtual_generated_columns(root);
+	parse = root->parse = preprocess_relation_rtes(root);
 
 	/*
 	 * If the FROM clause is empty, replace it with a dummy RTE_RESULT RTE, so
@@ -788,23 +790,6 @@ subquery_planner(PlannerGlobal *glob, Query *parse, PlannerInfo *parent_root,
 
 		switch (rte->rtekind)
 		{
-			case RTE_RELATION:
-				if (rte->inh)
-				{
-					/*
-					 * Check to see if the relation actually has any children;
-					 * if not, clear the inh flag so we can treat it as a
-					 * plain base relation.
-					 *
-					 * Note: this could give a false-positive result, if the
-					 * rel once had children but no longer does.  We used to
-					 * be able to clear rte->inh later on when we discovered
-					 * that, but no more; we have to handle such cases as
-					 * full-fledged inheritance.
-					 */
-					rte->inh = has_subclass(rte->relid);
-				}
-				break;
 			case RTE_JOIN:
 				root->hasJoinRTEs = true;
 				if (IS_OUTER_JOIN(rte->jointype))
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index 575303b294a..4bdca59df64 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -1517,9 +1517,10 @@ convert_EXISTS_sublink_to_join(PlannerInfo *root, SubLink *sublink,
 		return NULL;
 
 	/*
-	 * Scan the rangetable for relations with virtual generated columns, and
-	 * replace all Var nodes in the subquery that reference these columns with
-	 * the generation expressions.
+	 * Scan the rangetable for relation RTEs and retrieve the necessary
+	 * catalog information for each relation.  Using this information, clear
+	 * the inh flag for any relation that has no children, and expand virtual
+	 * generated columns for any relation that contains them.
 	 *
 	 * Note: we construct up an entirely dummy PlannerInfo for use here.  This
 	 * is fine because only the "glob" and "parse" links will be used in this
@@ -1534,7 +1535,7 @@ convert_EXISTS_sublink_to_join(PlannerInfo *root, SubLink *sublink,
 	subroot.glob = root->glob;
 	subroot.parse = subselect;
 	subselect->jointree->quals = whereClause;
-	subselect = expand_virtual_generated_columns(&subroot);
+	subselect = preprocess_relation_rtes(&subroot);
 
 	/*
 	 * Now separate out the WHERE clause again.
diff --git a/src/backend/optimizer/prep/prepjointree.c b/src/backend/optimizer/prep/prepjointree.c
index 8140d22de70..4b38851bd42 100644
--- a/src/backend/optimizer/prep/prepjointree.c
+++ b/src/backend/optimizer/prep/prepjointree.c
@@ -4,7 +4,7 @@
  *	  Planner preprocessing for subqueries and join tree manipulation.
  *
  * NOTE: the intended sequence for invoking these operations is
- *		expand_virtual_generated_columns
+ *		preprocess_relation_rtes
  *		replace_empty_jointree
  *		pull_up_sublinks
  *		preprocess_function_rtes
@@ -102,6 +102,9 @@ typedef struct reduce_outer_joins_partial_state
 	Relids		unreduced_side; /* relids in its still-nullable side */
 } reduce_outer_joins_partial_state;
 
+static Query *expand_virtual_generated_columns(PlannerInfo *root, Query *parse,
+											   RangeTblEntry *rte, int rt_index,
+											   Relation relation);
 static Node *pull_up_sublinks_jointree_recurse(PlannerInfo *root, Node *jtnode,
 											   Relids *relids);
 static Node *pull_up_sublinks_qual_recurse(PlannerInfo *root, Node *node,
@@ -392,6 +395,173 @@ transform_MERGE_to_join(Query *parse)
 		parse->mergeJoinCondition = NULL;	/* join condition not needed */
 }
 
+/*
+ * preprocess_relation_rtes
+ *		Do the preprocessing work for any relation RTEs in the FROM clause.
+ *
+ * This scans the rangetable for relation RTEs and retrieves the necessary
+ * catalog information for each relation.  Using this information, it clears
+ * the inh flag for any relation that has no children, and expands virtual
+ * generated columns for any relation that contains them.
+ *
+ * Note that expanding virtual generated columns may cause the query tree to
+ * have new copies of rangetable entries.  Therefore, we have to use list_nth
+ * instead of foreach when iterating over the query's rangetable.
+ *
+ * Returns a modified copy of the query tree, if any relations with virtual
+ * generated columns are present.
+ */
+Query *
+preprocess_relation_rtes(PlannerInfo *root)
+{
+	Query	   *parse = root->parse;
+	int			rtable_size;
+	int			rt_index;
+
+	rtable_size = list_length(parse->rtable);
+
+	for (rt_index = 0; rt_index < rtable_size; rt_index++)
+	{
+		RangeTblEntry *rte = rt_fetch(rt_index + 1, parse->rtable);
+		Relation	relation;
+
+		/* We only care about relation RTEs. */
+		if (rte->rtekind != RTE_RELATION)
+			continue;
+
+		/*
+		 * We need not lock the relation since it was already locked by the
+		 * rewriter.
+		 */
+		relation = table_open(rte->relid, NoLock);
+
+		/*
+		 * Check to see if the relation actually has any children; if not,
+		 * clear the inh flag so we can treat it as a plain base relation.
+		 *
+		 * Note: this could give a false-positive result, if the rel once had
+		 * children but no longer does.  We used to be able to clear rte->inh
+		 * later on when we discovered that, but no more; we have to handle
+		 * such cases as full-fledged inheritance.
+		 */
+		if (rte->inh)
+			rte->inh = relation->rd_rel->relhassubclass;
+
+		/*
+		 * Check to see if the relation has any virtual generated columns; if
+		 * so, replace all Var nodes in the query that reference these columns
+		 * with the generation expressions.
+		 */
+		parse = expand_virtual_generated_columns(root, parse,
+												 rte, rt_index + 1,
+												 relation);
+
+		table_close(relation, NoLock);
+	}
+
+	return parse;
+}
+
+/*
+ * expand_virtual_generated_columns
+ *		Expand virtual generated columns for the given relation.
+ *
+ * This checks whether the given relation has any virtual generated columns,
+ * and if so, replaces all Var nodes in the query that reference those columns
+ * with their generation expressions.
+ *
+ * Returns a modified copy of the query tree if the relation contains virtual
+ * generated columns.
+ */
+static Query *
+expand_virtual_generated_columns(PlannerInfo *root, Query *parse,
+								 RangeTblEntry *rte, int rt_index,
+								 Relation relation)
+{
+	TupleDesc	tupdesc;
+
+	/* Only normal relations can have virtual generated columns */
+	Assert(rte->rtekind == RTE_RELATION);
+
+	tupdesc = RelationGetDescr(relation);
+	if (tupdesc->constr && tupdesc->constr->has_generated_virtual)
+	{
+		List	   *tlist = NIL;
+		pullup_replace_vars_context rvcontext;
+
+		for (int i = 0; i < tupdesc->natts; i++)
+		{
+			Form_pg_attribute attr = TupleDescAttr(tupdesc, i);
+			TargetEntry *tle;
+
+			if (attr->attgenerated == ATTRIBUTE_GENERATED_VIRTUAL)
+			{
+				Node	   *defexpr;
+
+				defexpr = build_generation_expression(relation, i + 1);
+				ChangeVarNodes(defexpr, 1, rt_index, 0);
+
+				tle = makeTargetEntry((Expr *) defexpr, i + 1, 0, false);
+				tlist = lappend(tlist, tle);
+			}
+			else
+			{
+				Var		   *var;
+
+				var = makeVar(rt_index,
+							  i + 1,
+							  attr->atttypid,
+							  attr->atttypmod,
+							  attr->attcollation,
+							  0);
+
+				tle = makeTargetEntry((Expr *) var, i + 1, 0, false);
+				tlist = lappend(tlist, tle);
+			}
+		}
+
+		Assert(list_length(tlist) > 0);
+		Assert(!rte->lateral);
+
+		/*
+		 * The relation's targetlist items are now in the appropriate form to
+		 * insert into the query, except that we may need to wrap them in
+		 * PlaceHolderVars.  Set up required context data for
+		 * pullup_replace_vars.
+		 */
+		rvcontext.root = root;
+		rvcontext.targetlist = tlist;
+		rvcontext.target_rte = rte;
+		rvcontext.result_relation = parse->resultRelation;
+		/* won't need these values */
+		rvcontext.relids = NULL;
+		rvcontext.nullinfo = NULL;
+		/* pass NULL for outer_hasSubLinks */
+		rvcontext.outer_hasSubLinks = NULL;
+		rvcontext.varno = rt_index;
+		/* this flag will be set below, if needed */
+		rvcontext.wrap_option = REPLACE_WRAP_NONE;
+		/* initialize cache array with indexes 0 .. length(tlist) */
+		rvcontext.rv_cache = palloc0((list_length(tlist) + 1) *
+									 sizeof(Node *));
+
+		/*
+		 * If the query uses grouping sets, we need a PlaceHolderVar for each
+		 * expression of the relation's targetlist items.  (See comments in
+		 * pull_up_simple_subquery().)
+		 */
+		if (parse->groupingSets)
+			rvcontext.wrap_option = REPLACE_WRAP_ALL;
+
+		/*
+		 * Apply pullup variable replacement throughout the query tree.
+		 */
+		parse = (Query *) pullup_replace_vars((Node *) parse, &rvcontext);
+	}
+
+	return parse;
+}
+
 /*
  * replace_empty_jointree
  *		If the Query's jointree is empty, replace it with a dummy RTE_RESULT
@@ -949,124 +1119,6 @@ preprocess_function_rtes(PlannerInfo *root)
 	}
 }
 
-/*
- * expand_virtual_generated_columns
- *		Expand all virtual generated column references in a query.
- *
- * This scans the rangetable for relations with virtual generated columns, and
- * replaces all Var nodes in the query that reference these columns with the
- * generation expressions.  Note that we do not descend into subqueries; that
- * is taken care of when the subqueries are planned.
- *
- * Returns a modified copy of the query tree, if any relations with virtual
- * generated columns are present.
- */
-Query *
-expand_virtual_generated_columns(PlannerInfo *root)
-{
-	Query	   *parse = root->parse;
-	int			rt_index;
-	ListCell   *lc;
-
-	rt_index = 0;
-	foreach(lc, parse->rtable)
-	{
-		RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc);
-		Relation	rel;
-		TupleDesc	tupdesc;
-
-		++rt_index;
-
-		/*
-		 * Only normal relations can have virtual generated columns.
-		 */
-		if (rte->rtekind != RTE_RELATION)
-			continue;
-
-		rel = table_open(rte->relid, NoLock);
-
-		tupdesc = RelationGetDescr(rel);
-		if (tupdesc->constr && tupdesc->constr->has_generated_virtual)
-		{
-			List	   *tlist = NIL;
-			pullup_replace_vars_context rvcontext;
-
-			for (int i = 0; i < tupdesc->natts; i++)
-			{
-				Form_pg_attribute attr = TupleDescAttr(tupdesc, i);
-				TargetEntry *tle;
-
-				if (attr->attgenerated == ATTRIBUTE_GENERATED_VIRTUAL)
-				{
-					Node	   *defexpr;
-
-					defexpr = build_generation_expression(rel, i + 1);
-					ChangeVarNodes(defexpr, 1, rt_index, 0);
-
-					tle = makeTargetEntry((Expr *) defexpr, i + 1, 0, false);
-					tlist = lappend(tlist, tle);
-				}
-				else
-				{
-					Var		   *var;
-
-					var = makeVar(rt_index,
-								  i + 1,
-								  attr->atttypid,
-								  attr->atttypmod,
-								  attr->attcollation,
-								  0);
-
-					tle = makeTargetEntry((Expr *) var, i + 1, 0, false);
-					tlist = lappend(tlist, tle);
-				}
-			}
-
-			Assert(list_length(tlist) > 0);
-			Assert(!rte->lateral);
-
-			/*
-			 * The relation's targetlist items are now in the appropriate form
-			 * to insert into the query, except that we may need to wrap them
-			 * in PlaceHolderVars.  Set up required context data for
-			 * pullup_replace_vars.
-			 */
-			rvcontext.root = root;
-			rvcontext.targetlist = tlist;
-			rvcontext.target_rte = rte;
-			rvcontext.result_relation = parse->resultRelation;
-			/* won't need these values */
-			rvcontext.relids = NULL;
-			rvcontext.nullinfo = NULL;
-			/* pass NULL for outer_hasSubLinks */
-			rvcontext.outer_hasSubLinks = NULL;
-			rvcontext.varno = rt_index;
-			/* this flag will be set below, if needed */
-			rvcontext.wrap_option = REPLACE_WRAP_NONE;
-			/* initialize cache array with indexes 0 .. length(tlist) */
-			rvcontext.rv_cache = palloc0((list_length(tlist) + 1) *
-										 sizeof(Node *));
-
-			/*
-			 * If the query uses grouping sets, we need a PlaceHolderVar for
-			 * each expression of the relation's targetlist items.  (See
-			 * comments in pull_up_simple_subquery().)
-			 */
-			if (parse->groupingSets)
-				rvcontext.wrap_option = REPLACE_WRAP_ALL;
-
-			/*
-			 * Apply pullup variable replacement throughout the query tree.
-			 */
-			parse = (Query *) pullup_replace_vars((Node *) parse, &rvcontext);
-		}
-
-		table_close(rel, NoLock);
-	}
-
-	return parse;
-}
-
 /*
  * pull_up_subqueries
  *		Look for subqueries in the rangetable that can be pulled up into
@@ -1330,11 +1382,12 @@ pull_up_simple_subquery(PlannerInfo *root, Node *jtnode, RangeTblEntry *rte,
 	Assert(subquery->cteList == NIL);
 
 	/*
-	 * Scan the rangetable for relations with virtual generated columns, and
-	 * replace all Var nodes in the subquery that reference these columns with
-	 * the generation expressions.
+	 * Scan the rangetable for relation RTEs and retrieve the necessary
+	 * catalog information for each relation.  Using this information, clear
+	 * the inh flag for any relation that has no children, and expand virtual
+	 * generated columns for any relation that contains them.
 	 */
-	subquery = subroot->parse = expand_virtual_generated_columns(subroot);
+	subquery = subroot->parse = preprocess_relation_rtes(subroot);
 
 	/*
 	 * If the FROM clause is empty, replace it with a dummy RTE_RESULT RTE, so
diff --git a/src/include/optimizer/prep.h b/src/include/optimizer/prep.h
index ceb731bcf5e..4fbecdb4462 100644
--- a/src/include/optimizer/prep.h
+++ b/src/include/optimizer/prep.h
@@ -22,7 +22,7 @@
  * prototypes for prepjointree.c
  */
 extern void transform_MERGE_to_join(Query *parse);
-extern Query *expand_virtual_generated_columns(PlannerInfo *root);
+extern Query *preprocess_relation_rtes(PlannerInfo *root);
 extern void replace_empty_jointree(Query *parse);
 extern void pull_up_sublinks(PlannerInfo *root);
 extern void preprocess_function_rtes(PlannerInfo *root);
-- 
2.43.0
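
The single-pass idea in the 0002 commit message above can be illustrated outside the planner. What follows is only a toy sketch, not code from the patch: every name in it (ToyRte, ToyCatalogRow, toy_table_open, toy_preprocess_relation_rtes, toy_open_count) is hypothetical, and the "catalog" is a plain array. It assumes that both early-stage facts, relhassubclass and the presence of virtual generated columns, can be read from one opened relation, and it counts opens so the "once per relation" property is visible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are PostgreSQL types. */
typedef enum
{
	TOY_RTE_RELATION,
	TOY_RTE_SUBQUERY
} ToyRteKind;

typedef struct ToyCatalogRow
{
	unsigned	relid;
	bool		has_subclass;		/* models relhassubclass */
	bool		has_virtual_gen;	/* models has_generated_virtual */
} ToyCatalogRow;

typedef struct ToyRte
{
	ToyRteKind	kind;
	unsigned	relid;
	bool		inh;
	bool		needs_expansion;	/* set when virtual columns exist */
} ToyRte;

/* Counts how many times a relation was "opened". */
int			toy_open_count = 0;

/* Look up a relation in the toy catalog; returns NULL if it is unknown. */
const ToyCatalogRow *
toy_table_open(const ToyCatalogRow *catalog, size_t ncat, unsigned relid)
{
	for (size_t i = 0; i < ncat; i++)
	{
		if (catalog[i].relid == relid)
		{
			toy_open_count++;
			return &catalog[i];
		}
	}
	return NULL;
}

/* One loop over the rangetable; each relation entry is opened exactly once. */
void
toy_preprocess_relation_rtes(ToyRte *rtable, size_t nrte,
							 const ToyCatalogRow *catalog, size_t ncat)
{
	assert(rtable != NULL || nrte == 0);

	for (size_t i = 0; i < nrte; i++)
	{
		ToyRte	   *rte = &rtable[i];
		const ToyCatalogRow *row;

		/* Only relation entries carry catalog information. */
		if (rte->kind != TOY_RTE_RELATION)
			continue;

		row = toy_table_open(catalog, ncat, rte->relid);
		if (row == NULL)
			continue;

		/* Task 1: clear inh if the relation has no children. */
		if (rte->inh)
			rte->inh = row->has_subclass;

		/* Task 2: remember whether virtual generated columns need expanding. */
		rte->needs_expansion = row->has_virtual_gen;
	}
}
```

With a rangetable holding two relation entries and one subquery entry, toy_open_count ends at 2 after the call, whereas two separate passes (one per task) would have opened each relation twice.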

v7-0003-Reduce-Var-IS-NOT-NULL-quals-during-constant-fold.patch (application/octet-stream)

From d0779748482614aabc1c7d126031f1551c984049 Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Wed, 30 Apr 2025 18:50:37 +0900
Subject: [PATCH v7 3/3] Reduce "Var IS [NOT] NULL" quals during constant
 folding

In commit b262ad440, we introduced an optimization that reduces an IS
[NOT] NULL qual on a NOT NULL column to constant true or constant
false, provided we can prove that the input expression of the NullTest
is not nullable by any outer joins or grouping sets.  This deduction
happens quite late in the planner, during the distribution of quals to
rels in query_planner.  However, this approach has some drawbacks: we
can't perform any further folding with the constant, and it turns out
to be prone to bugs.

Ideally, this deduction should happen during constant folding.
However, the per-relation information about which columns are defined
as NOT NULL is not available at that point.  This information is
currently collected from catalogs when building RelOptInfos for base
or "other" relations.

This patch moves the collection of NOT NULL attribute information for
relations before pull_up_sublinks, storing it in a hash table keyed by
relation OID.  It then uses this information to perform the NullTest
deduction for Vars during constant folding.  This also makes it
possible to leverage this information to pull up NOT IN subqueries.

Note that this patch does not get rid of restriction_is_always_true
and restriction_is_always_false.  Removing them would prevent us from
reducing some IS [NOT] NULL quals that we were previously able to
reduce, because (a) the self-join elimination may introduce new IS NOT
NULL quals after constant folding, and (b) if some outer joins are
converted to inner joins, previously irreducible NullTest quals may
become reducible.
---
 .../postgres_fdw/expected/postgres_fdw.out    |   8 +-
 contrib/postgres_fdw/sql/postgres_fdw.sql     |   2 +-
 src/backend/optimizer/plan/initsplan.c        |  24 +---
 src/backend/optimizer/plan/planner.c          |  12 +-
 src/backend/optimizer/plan/subselect.c        |  20 ++-
 src/backend/optimizer/prep/prepjointree.c     |  19 ++-
 src/backend/optimizer/util/clauses.c          |  92 ++++++++++++-
 src/backend/optimizer/util/inherit.c          |  10 +-
 src/backend/optimizer/util/plancat.c          | 127 +++++++++++++++---
 src/include/nodes/pathnodes.h                 |  12 +-
 src/include/optimizer/optimizer.h             |   2 +
 src/include/optimizer/plancat.h               |   4 +
 .../regress/expected/generated_virtual.out    |   6 +-
 src/test/regress/expected/join.out            |   6 +-
 src/test/regress/expected/predicate.out       |  54 +++++++-
 src/test/regress/sql/predicate.sql            |  18 +++
 src/tools/pgindent/typedefs.list              |   1 +
 17 files changed, 336 insertions(+), 81 deletions(-)
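
The deduction described in the commit message above can be sketched in isolation. This is a toy model under stated assumptions, not the patch's implementation: ToyNotNullEntry, ToyVar, ToyFold, toy_var_is_nonnullable and toy_fold_nulltest are all hypothetical names, the per-relation not-null information is a small array searched linearly rather than a hash table keyed by relation OID, and a single flag stands in for a non-empty varnullingrels. The three checks follow the logic visible in expr_is_nonnullable: give up if the Var can be nulled, accept system columns, and otherwise consult the relation's not-null attribute set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: not-null attribute set for one relation. */
typedef struct ToyNotNullEntry
{
	unsigned	relid;
	uint64_t	notnull_mask;	/* bit N set => attribute N is NOT NULL */
} ToyNotNullEntry;

/* Hypothetical: the parts of a Var that matter for this deduction. */
typedef struct ToyVar
{
	unsigned	relid;
	int			attno;			/* <0 system column, 0 whole row, >0 user column */
	bool		has_nulling_rels;	/* models a non-empty varnullingrels */
} ToyVar;

typedef enum
{
	TOY_FOLD_NONE,				/* leave the NullTest in place */
	TOY_FOLD_TRUE,				/* replace with constant true */
	TOY_FOLD_FALSE				/* replace with constant false */
} ToyFold;

/* Can we prove that this Var never yields NULL? */
bool
toy_var_is_nonnullable(const ToyNotNullEntry *tab, size_t ntab,
					   const ToyVar *var)
{
	assert(var != NULL);

	/* A Var that may be nulled above the scan proves nothing. */
	if (var->has_nulling_rels)
		return false;

	/* System columns cannot be NULL. */
	if (var->attno < 0)
		return true;

	/* Whole-row references (and attnos beyond the toy mask) are not handled. */
	if (var->attno == 0 || var->attno >= 64)
		return false;

	/* Is the column defined NOT NULL? */
	for (size_t i = 0; i < ntab; i++)
	{
		if (tab[i].relid == var->relid)
			return ((tab[i].notnull_mask >> var->attno) & 1) != 0;
	}
	return false;
}

/* Fold "var IS NOT NULL" or "var IS NULL" to a constant when provable. */
ToyFold
toy_fold_nulltest(const ToyNotNullEntry *tab, size_t ntab,
				  const ToyVar *var, bool is_not_null_test)
{
	if (!toy_var_is_nonnullable(tab, ntab, var))
		return TOY_FOLD_NONE;

	return is_not_null_test ? TOY_FOLD_TRUE : TOY_FOLD_FALSE;
}
```

In this model, "a IS NOT NULL" on a NOT NULL column folds to TOY_FOLD_TRUE and "a IS NULL" to TOY_FOLD_FALSE, while the same tests on a Var with has_nulling_rels set are left alone. That last case is why the commit message keeps restriction_is_always_true and restriction_is_always_false: once outer joins are later reduced to inner joins, a previously irreducible test may become reducible.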

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 2185b42bb4f..e3621a47bf1 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -710,12 +710,12 @@ EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE c1 = -c1;          -- Op
    Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" = (- "C 1")))
 (3 rows)
 
-EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE (c1 IS NOT NULL) IS DISTINCT FROM (c1 IS NOT NULL); -- DistinctExpr
-                                                                 QUERY PLAN                                                                 
---------------------------------------------------------------------------------------------------------------------------------------------
+EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE (c3 IS NOT NULL) IS DISTINCT FROM (c3 IS NOT NULL); -- DistinctExpr
+                                                              QUERY PLAN                                                              
+--------------------------------------------------------------------------------------------------------------------------------------
  Foreign Scan on public.ft1 t1
    Output: c1, c2, c3, c4, c5, c6, c7, c8
-   Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE ((("C 1" IS NOT NULL) IS DISTINCT FROM ("C 1" IS NOT NULL)))
+   Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (((c3 IS NOT NULL) IS DISTINCT FROM (c3 IS NOT NULL)))
 (3 rows)
 
 EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE c1 = ANY(ARRAY[c2, 1, c1 + 0]); -- ScalarArrayOpExpr
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index e534b40de3c..036c8749914 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -352,7 +352,7 @@ EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE c3 IS NULL;        -- Nu
 EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE c3 IS NOT NULL;    -- NullTest
 EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE round(abs(c1), 0) = 1; -- FuncExpr
 EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE c1 = -c1;          -- OpExpr(l)
-EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE (c1 IS NOT NULL) IS DISTINCT FROM (c1 IS NOT NULL); -- DistinctExpr
+EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE (c3 IS NOT NULL) IS DISTINCT FROM (c3 IS NOT NULL); -- DistinctExpr
 EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE c1 = ANY(ARRAY[c2, 1, c1 + 0]); -- ScalarArrayOpExpr
 EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE c1 = (ARRAY[c1,c2,3])[1]; -- SubscriptingRef
 EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE c6 = E'foo''s\\bar';  -- check special chars
diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index 01804b085b3..3e3fec89252 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -3048,36 +3048,16 @@ add_base_clause_to_rel(PlannerInfo *root, Index relid,
  * expr_is_nonnullable
  *	  Check to see if the Expr cannot be NULL
  *
- * If the Expr is a simple Var that is defined NOT NULL and meanwhile is not
- * nulled by any outer joins, then we can know that it cannot be NULL.
+ * Currently we only support simple Vars.
  */
 static bool
 expr_is_nonnullable(PlannerInfo *root, Expr *expr)
 {
-	RelOptInfo *rel;
-	Var		   *var;
-
 	/* For now only check simple Vars */
 	if (!IsA(expr, Var))
 		return false;
 
-	var = (Var *) expr;
-
-	/* could the Var be nulled by any outer joins? */
-	if (!bms_is_empty(var->varnullingrels))
-		return false;
-
-	/* system columns cannot be NULL */
-	if (var->varattno < 0)
-		return true;
-
-	/* is the column defined NOT NULL? */
-	rel = find_base_rel(root, var->varno);
-	if (var->varattno > 0 &&
-		bms_is_member(var->varattno, rel->notnullattnums))
-		return true;
-
-	return false;
+	return var_is_nonnullable(root, (Var *) expr, true);
 }
 
 /*
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index fc13d921d0c..c989e72cac5 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -342,6 +342,7 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
 	glob->transientPlan = false;
 	glob->dependsOnRole = false;
 	glob->partition_directory = NULL;
+	glob->rel_notnullatts_hash = NULL;
 
 	/*
 	 * Assess whether it's feasible to use parallel mode for this query. We
@@ -723,11 +724,12 @@ subquery_planner(PlannerGlobal *glob, Query *parse, PlannerInfo *parent_root,
 	/*
 	 * Scan the rangetable for relation RTEs and retrieve the necessary
 	 * catalog information for each relation.  Using this information, clear
-	 * the inh flag for any relation that has no children, and expand virtual
-	 * generated columns for any relation that contains them.  Note that this
-	 * step does not descend into sublinks and subqueries; if we pull up any
-	 * sublinks or subqueries below, their relation RTEs are processed just
-	 * before pulling them up.
+	 * the inh flag for any relation that has no children, collect not-null
+	 * attribute numbers for any relation that has column not-null
+	 * constraints, and expand virtual generated columns for any relation that
+	 * contains them.  Note that this step does not descend into sublinks and
+	 * subqueries; if we pull up any sublinks or subqueries below, their
+	 * relation RTEs are processed just before pulling them up.
 	 */
 	parse = root->parse = preprocess_relation_rtes(root);
 
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index 4bdca59df64..d71ed958e31 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -1519,8 +1519,10 @@ convert_EXISTS_sublink_to_join(PlannerInfo *root, SubLink *sublink,
 	/*
 	 * Scan the rangetable for relation RTEs and retrieve the necessary
 	 * catalog information for each relation.  Using this information, clear
-	 * the inh flag for any relation that has no children, and expand virtual
-	 * generated columns for any relation that contains them.
+	 * the inh flag for any relation that has no children, collect not-null
+	 * attribute numbers for any relation that has column not-null
+	 * constraints, and expand virtual generated columns for any relation that
+	 * contains them.
 	 *
 	 * Note: we construct up an entirely dummy PlannerInfo for use here.  This
 	 * is fine because only the "glob" and "parse" links will be used in this
@@ -1760,6 +1762,7 @@ convert_EXISTS_to_ANY(PlannerInfo *root, Query *subselect,
 					  Node **testexpr, List **paramIds)
 {
 	Node	   *whereClause;
+	PlannerInfo subroot;
 	List	   *leftargs,
 			   *rightargs,
 			   *opids,
@@ -1819,12 +1822,15 @@ convert_EXISTS_to_ANY(PlannerInfo *root, Query *subselect,
 	 * parent aliases were flattened already, and we're not going to pull any
 	 * child Vars (of any description) into the parent.
 	 *
-	 * Note: passing the parent's root to eval_const_expressions is
-	 * technically wrong, but we can get away with it since only the
-	 * boundParams (if any) are used, and those would be the same in a
-	 * subroot.
+	 * Note: we construct up an entirely dummy PlannerInfo to pass to
+	 * eval_const_expressions.  This is fine because only the "glob" and
+	 * "parse" links are used by eval_const_expressions.
 	 */
-	whereClause = eval_const_expressions(root, whereClause);
+	MemSet(&subroot, 0, sizeof(subroot));
+	subroot.type = T_PlannerInfo;
+	subroot.glob = root->glob;
+	subroot.parse = subselect;
+	whereClause = eval_const_expressions(&subroot, whereClause);
 	whereClause = (Node *) canonicalize_qual((Expr *) whereClause, false);
 	whereClause = (Node *) make_ands_implicit((Expr *) whereClause);
 
diff --git a/src/backend/optimizer/prep/prepjointree.c b/src/backend/optimizer/prep/prepjointree.c
index 4b38851bd42..35e8d3c183b 100644
--- a/src/backend/optimizer/prep/prepjointree.c
+++ b/src/backend/optimizer/prep/prepjointree.c
@@ -36,6 +36,7 @@
 #include "optimizer/clauses.h"
 #include "optimizer/optimizer.h"
 #include "optimizer/placeholder.h"
+#include "optimizer/plancat.h"
 #include "optimizer/prep.h"
 #include "optimizer/subselect.h"
 #include "optimizer/tlist.h"
@@ -401,8 +402,9 @@ transform_MERGE_to_join(Query *parse)
  *
  * This scans the rangetable for relation RTEs and retrieves the necessary
  * catalog information for each relation.  Using this information, it clears
- * the inh flag for any relation that has no children, and expands virtual
- * generated columns for any relation that contains them.
+ * the inh flag for any relation that has no children, collects not-null
+ * attribute numbers for any relation that has column not-null constraints, and
+ * expands virtual generated columns for any relation that contains them.
  *
  * Note that expanding virtual generated columns may cause the query tree to
  * have new copies of rangetable entries.  Therefore, we have to use list_nth
@@ -447,6 +449,13 @@ preprocess_relation_rtes(PlannerInfo *root)
 		if (rte->inh)
 			rte->inh = relation->rd_rel->relhassubclass;
 
+		/*
+		 * Check to see if the relation has any column not-null constraints;
+		 * if so, retrieve the constraint information and store it in a
+		 * relation OID based hash table.
+		 */
+		get_relation_notnullatts(root, relation);
+
 		/*
 		 * Check to see if the relation has any virtual generated columns; if
 		 * so, replace all Var nodes in the query that reference these columns
@@ -1384,8 +1393,10 @@ pull_up_simple_subquery(PlannerInfo *root, Node *jtnode, RangeTblEntry *rte,
 	/*
 	 * Scan the rangetable for relation RTEs and retrieve the necessary
 	 * catalog information for each relation.  Using this information, clear
-	 * the inh flag for any relation that has no children, and expand virtual
-	 * generated columns for any relation that contains them.
+	 * the inh flag for any relation that has no children, collect not-null
+	 * attribute numbers for any relation that has column not-null
+	 * constraints, and expand virtual generated columns for any relation that
+	 * contains them.
 	 */
 	subquery = subroot->parse = preprocess_relation_rtes(subroot);
 
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index f45131c34c5..6f0b338d2cd 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -20,6 +20,7 @@
 #include "postgres.h"
 
 #include "access/htup_details.h"
+#include "catalog/pg_class.h"
 #include "catalog/pg_language.h"
 #include "catalog/pg_operator.h"
 #include "catalog/pg_proc.h"
@@ -36,6 +37,7 @@
 #include "optimizer/clauses.h"
 #include "optimizer/cost.h"
 #include "optimizer/optimizer.h"
+#include "optimizer/pathnode.h"
 #include "optimizer/plancat.h"
 #include "optimizer/planmain.h"
 #include "parser/analyze.h"
@@ -43,6 +45,7 @@
 #include "parser/parse_collate.h"
 #include "parser/parse_func.h"
 #include "parser/parse_oper.h"
+#include "parser/parsetree.h"
 #include "rewrite/rewriteHandler.h"
 #include "rewrite/rewriteManip.h"
 #include "tcop/tcopprot.h"
@@ -2242,7 +2245,8 @@ rowtype_field_matches(Oid rowtypeid, int fieldnum,
  * only operators and functions that are reasonable to try to execute.
  *
  * NOTE: "root" can be passed as NULL if the caller never wants to do any
- * Param substitutions nor receive info about inlined functions.
+ * Param substitutions nor receive info about inlined functions nor reduce
+ * NullTest for Vars to constant true or constant false.
  *
  * NOTE: the planner assumes that this will always flatten nested AND and
  * OR clauses into N-argument form.  See comments in prepqual.c.
@@ -3544,6 +3548,31 @@ eval_const_expressions_mutator(Node *node,
 
 					return makeBoolConst(result, false);
 				}
+				if (!ntest->argisrow && arg && IsA(arg, Var) && context->root)
+				{
+					Var		   *varg = (Var *) arg;
+					bool		result;
+
+					if (var_is_nonnullable(context->root, varg, false))
+					{
+						switch (ntest->nulltesttype)
+						{
+							case IS_NULL:
+								result = false;
+								break;
+							case IS_NOT_NULL:
+								result = true;
+								break;
+							default:
+								elog(ERROR, "unrecognized nulltesttype: %d",
+									 (int) ntest->nulltesttype);
+								result = false; /* keep compiler quiet */
+								break;
+						}
+
+						return makeBoolConst(result, false);
+					}
+				}
 
 				newntest = makeNode(NullTest);
 				newntest->arg = (Expr *) arg;
@@ -4162,6 +4191,67 @@ simplify_function(Oid funcid, Oid result_type, int32 result_typmod,
 	return newexpr;
 }
 
+/*
+ * var_is_nonnullable: check to see if the Var cannot be NULL
+ *
+ * If the Var is defined NOT NULL and meanwhile is not nulled by any outer
+ * joins or grouping sets, then we can know that it cannot be NULL.
+ *
+ * use_rel_info indicates whether the corresponding RelOptInfo is available for
+ * use.
+ */
+bool
+var_is_nonnullable(PlannerInfo *root, Var *var, bool use_rel_info)
+{
+	Relids		notnullattnums = NULL;
+
+	Assert(IsA(var, Var));
+
+	/* skip upper-level Vars */
+	if (var->varlevelsup != 0)
+		return false;
+
+	/* could the Var be nulled by any outer joins or grouping sets? */
+	if (!bms_is_empty(var->varnullingrels))
+		return false;
+
+	/* system columns cannot be NULL */
+	if (var->varattno < 0)
+		return true;
+
+	/*
+	 * Check if the Var is defined as NOT NULL.  We retrieve the column NOT
+	 * NULL constraint information from the corresponding RelOptInfo if it is
+	 * available; otherwise, we search the hash table for this information.
+	 */
+	if (use_rel_info)
+	{
+		RelOptInfo *rel = find_base_rel(root, var->varno);
+
+		notnullattnums = rel->notnullattnums;
+	}
+	else
+	{
+		RangeTblEntry *rte = planner_rt_fetch(var->varno, root);
+
+		/*
+		 * We must skip inheritance parent tables, as some child tables may
+		 * have a NOT NULL constraint for a column while others may not.  This
+		 * cannot happen with partitioned tables, though.
+		 */
+		if (rte->inh && rte->relkind != RELKIND_PARTITIONED_TABLE)
+			return false;
+
+		notnullattnums = find_relation_notnullatts(root, rte->relid);
+	}
+
+	if (var->varattno > 0 &&
+		bms_is_member(var->varattno, notnullattnums))
+		return true;
+
+	return false;
+}
+
 /*
  * expand_function_arguments: convert named-notation args to positional args
  * and/or insert default args, as needed
diff --git a/src/backend/optimizer/util/inherit.c b/src/backend/optimizer/util/inherit.c
index 17e51cd75d7..30d158069e3 100644
--- a/src/backend/optimizer/util/inherit.c
+++ b/src/backend/optimizer/util/inherit.c
@@ -466,8 +466,7 @@ expand_single_inheritance_child(PlannerInfo *root, RangeTblEntry *parentrte,
 								Index *childRTindex_p)
 {
 	Query	   *parse = root->parse;
-	Oid			parentOID PG_USED_FOR_ASSERTS_ONLY =
-		RelationGetRelid(parentrel);
+	Oid			parentOID = RelationGetRelid(parentrel);
 	Oid			childOID = RelationGetRelid(childrel);
 	RangeTblEntry *childrte;
 	Index		childRTindex;
@@ -513,6 +512,13 @@ expand_single_inheritance_child(PlannerInfo *root, RangeTblEntry *parentrte,
 	*childrte_p = childrte;
 	*childRTindex_p = childRTindex;
 
+	/*
+	 * Retrieve column not-null constraint information for the child relation
+	 * if its relation OID is different from the parent's.
+	 */
+	if (childOID != parentOID)
+		get_relation_notnullatts(root, childrel);
+
 	/*
 	 * Build an AppendRelInfo struct for each parent/child pair.
 	 */
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 59233b64730..c6a58afc5e5 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -59,6 +59,12 @@ int			constraint_exclusion = CONSTRAINT_EXCLUSION_PARTITION;
 /* Hook for plugins to get control in get_relation_info() */
 get_relation_info_hook_type get_relation_info_hook = NULL;
 
+typedef struct NotnullHashEntry
+{
+	Oid			relid;			/* OID of the relation */
+	Relids		notnullattnums; /* attnums of NOT NULL columns */
+} NotnullHashEntry;
+
 
 static void get_relation_foreign_keys(PlannerInfo *root, RelOptInfo *rel,
 									  Relation relation, bool inhparent);
@@ -172,27 +178,7 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 	 * RangeTblEntry does get populated.
 	 */
 	if (!inhparent || relation->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
-	{
-		for (int i = 0; i < relation->rd_att->natts; i++)
-		{
-			CompactAttribute *attr = TupleDescCompactAttr(relation->rd_att, i);
-
-			Assert(attr->attnullability != ATTNULLABLE_UNKNOWN);
-
-			if (attr->attnullability == ATTNULLABLE_VALID)
-			{
-				rel->notnullattnums = bms_add_member(rel->notnullattnums,
-													 i + 1);
-
-				/*
-				 * Per RemoveAttributeById(), dropped columns will have their
-				 * attnotnull unset, so we needn't check for dropped columns
-				 * in the above condition.
-				 */
-				Assert(!attr->attisdropped);
-			}
-		}
-	}
+		rel->notnullattnums = find_relation_notnullatts(root, relationObjectId);
 
 	/*
 	 * Estimate relation size --- unless it's an inheritance parent, in which
@@ -683,6 +669,105 @@ get_relation_foreign_keys(PlannerInfo *root, RelOptInfo *rel,
 	}
 }
 
+/*
+ * get_relation_notnullatts -
+ *	  Retrieves column not-null constraint information for a given relation.
+ *
+ * We do this while we have the relcache entry open, and store the column
+ * not-null constraint information in a hash table based on the relation OID.
+ */
+void
+get_relation_notnullatts(PlannerInfo *root, Relation relation)
+{
+	Oid			relid = RelationGetRelid(relation);
+	NotnullHashEntry *hentry;
+	bool		found;
+	Relids		notnullattnums = NULL;
+
+	/* bail out if the relation has no not-null constraints */
+	if (relation->rd_att->constr == NULL ||
+		!relation->rd_att->constr->has_not_null)
+		return;
+
+	/* create the hash table if it hasn't been created yet */
+	if (root->glob->rel_notnullatts_hash == NULL)
+	{
+		HTAB	   *hashtab;
+		HASHCTL		hash_ctl;
+
+		hash_ctl.keysize = sizeof(Oid);
+		hash_ctl.entrysize = sizeof(NotnullHashEntry);
+		hash_ctl.hcxt = CurrentMemoryContext;
+
+		hashtab = hash_create("Relation NOT NULL attnums",
+							  64L,	/* arbitrary initial size */
+							  &hash_ctl,
+							  HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+
+		root->glob->rel_notnullatts_hash = hashtab;
+	}
+
+	/*
+	 * Create a hash entry for this relation OID, if we don't have one
+	 * already.
+	 */
+	hentry = (NotnullHashEntry *) hash_search(root->glob->rel_notnullatts_hash,
+											  &relid,
+											  HASH_ENTER,
+											  &found);
+
+	/* bail out if a hash entry already exists for this relation OID */
+	if (found)
+		return;
+
+	/* collect the column not-null constraint information for this relation */
+	for (int i = 0; i < relation->rd_att->natts; i++)
+	{
+		CompactAttribute *attr = TupleDescCompactAttr(relation->rd_att, i);
+
+		Assert(attr->attnullability != ATTNULLABLE_UNKNOWN);
+
+		if (attr->attnullability == ATTNULLABLE_VALID)
+		{
+			notnullattnums = bms_add_member(notnullattnums, i + 1);
+
+			/*
+			 * Per RemoveAttributeById(), dropped columns will have their
+			 * attnotnull unset, so we needn't check for dropped columns in
+			 * the above condition.
+			 */
+			Assert(!attr->attisdropped);
+		}
+	}
+
+	/* ... and initialize the new hash entry */
+	hentry->notnullattnums = notnullattnums;
+}
+
+/*
+ * find_relation_notnullatts -
+ *	  Searches the hash table and returns the column not-null constraint
+ *	  information for a given relation.
+ */
+Relids
+find_relation_notnullatts(PlannerInfo *root, Oid relid)
+{
+	NotnullHashEntry *hentry;
+	bool		found;
+
+	if (root->glob->rel_notnullatts_hash == NULL)
+		return NULL;
+
+	hentry = (NotnullHashEntry *) hash_search(root->glob->rel_notnullatts_hash,
+											  &relid,
+											  HASH_FIND,
+											  &found);
+	if (!found)
+		return NULL;
+
+	return hentry->notnullattnums;
+}
+
 /*
  * infer_arbiter_indexes -
  *	  Determine the unique indexes used to arbitrate speculative insertion.
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 6567759595d..e5dd15098f6 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -179,6 +179,9 @@ typedef struct PlannerGlobal
 
 	/* partition descriptors */
 	PartitionDirectory partition_directory pg_node_attr(read_write_ignore);
+
+	/* hash table for NOT NULL attnums of relations */
+	struct HTAB *rel_notnullatts_hash pg_node_attr(read_write_ignore);
 } PlannerGlobal;
 
 /* macro for fetching the Plan associated with a SubPlan node */
@@ -719,6 +722,9 @@ typedef struct PartitionSchemeData *PartitionScheme;
  *				the attribute is needed as part of final targetlist
  *		attr_widths - cache space for per-attribute width estimates;
  *					  zero means not computed yet
+ *		notnullattnums - zero-based set containing attnums of NOT NULL
+ *						 columns (not populated for rels corresponding to
+ *						 non-partitioned inh==true RTEs)
  *		nulling_relids - relids of outer joins that can null this rel
  *		lateral_vars - lateral cross-references of rel, if any (list of
  *					   Vars and PlaceHolderVars)
@@ -952,11 +958,7 @@ typedef struct RelOptInfo
 	Relids	   *attr_needed pg_node_attr(read_write_ignore);
 	/* array indexed [min_attr .. max_attr] */
 	int32	   *attr_widths pg_node_attr(read_write_ignore);
-
-	/*
-	 * Zero-based set containing attnums of NOT NULL columns.  Not populated
-	 * for rels corresponding to non-partitioned inh==true RTEs.
-	 */
+	/* zero-based set containing attnums of NOT NULL columns */
 	Bitmapset  *notnullattnums;
 	/* relids of outer joins that can null this baserel */
 	Relids		nulling_relids;
diff --git a/src/include/optimizer/optimizer.h b/src/include/optimizer/optimizer.h
index 546828b54bd..37bc13c2cbd 100644
--- a/src/include/optimizer/optimizer.h
+++ b/src/include/optimizer/optimizer.h
@@ -154,6 +154,8 @@ extern Node *estimate_expression_value(PlannerInfo *root, Node *node);
 extern Expr *evaluate_expr(Expr *expr, Oid result_type, int32 result_typmod,
 						   Oid result_collation);
 
+extern bool var_is_nonnullable(PlannerInfo *root, Var *var, bool use_rel_info);
+
 extern List *expand_function_arguments(List *args, bool include_out_arguments,
 									   Oid result_type,
 									   struct HeapTupleData *func_tuple);
diff --git a/src/include/optimizer/plancat.h b/src/include/optimizer/plancat.h
index cd74e4b1e8b..d6f6f4ad2d7 100644
--- a/src/include/optimizer/plancat.h
+++ b/src/include/optimizer/plancat.h
@@ -28,6 +28,10 @@ extern PGDLLIMPORT get_relation_info_hook_type get_relation_info_hook;
 extern void get_relation_info(PlannerInfo *root, Oid relationObjectId,
 							  bool inhparent, RelOptInfo *rel);
 
+extern void get_relation_notnullatts(PlannerInfo *root, Relation relation);
+
+extern Relids find_relation_notnullatts(PlannerInfo *root, Oid relid);
+
 extern List *infer_arbiter_indexes(PlannerInfo *root);
 
 extern void estimate_rel_size(Relation rel, int32 *attr_widths,
diff --git a/src/test/regress/expected/generated_virtual.out b/src/test/regress/expected/generated_virtual.out
index a635cb1e776..aca6347babe 100644
--- a/src/test/regress/expected/generated_virtual.out
+++ b/src/test/regress/expected/generated_virtual.out
@@ -1550,11 +1550,11 @@ where coalesce(t2.b, 1) = 2;
 explain (costs off)
 select t1.a from gtest32 t1 left join gtest32 t2 on t1.a = t2.a
 where coalesce(t2.b, 1) = 2 or t1.a is null;
-                         QUERY PLAN                          
--------------------------------------------------------------
+               QUERY PLAN                
+-----------------------------------------
  Hash Left Join
    Hash Cond: (t1.a = t2.a)
-   Filter: ((COALESCE((t2.a * 2), 1) = 2) OR (t1.a IS NULL))
+   Filter: (COALESCE((t2.a * 2), 1) = 2)
    ->  Seq Scan on gtest32 t1
    ->  Hash
          ->  Seq Scan on gtest32 t2
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 46ddfa844c5..4d5d35d0727 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -3639,8 +3639,8 @@ from nt3 as nt3
     ) as ss2
     on ss2.id = nt3.nt2_id
 where nt3.id = 1 and ss2.b3;
-                  QUERY PLAN                   
------------------------------------------------
+                  QUERY PLAN                  
+----------------------------------------------
  Nested Loop
    ->  Nested Loop
          ->  Index Scan using nt3_pkey on nt3
@@ -3649,7 +3649,7 @@ where nt3.id = 1 and ss2.b3;
                Index Cond: (id = nt3.nt2_id)
    ->  Index Only Scan using nt1_pkey on nt1
          Index Cond: (id = nt2.nt1_id)
-         Filter: (nt2.b1 AND (id IS NOT NULL))
+         Filter: (nt2.b1 AND true)
 (9 rows)
 
 select nt3.id
diff --git a/src/test/regress/expected/predicate.out b/src/test/regress/expected/predicate.out
index b79037748b7..59bfe33bb1c 100644
--- a/src/test/regress/expected/predicate.out
+++ b/src/test/regress/expected/predicate.out
@@ -84,10 +84,10 @@ SELECT * FROM pred_tab t WHERE t.a IS NULL OR t.c IS NULL;
 -- are provably false
 EXPLAIN (COSTS OFF)
 SELECT * FROM pred_tab t WHERE t.b IS NULL OR t.c IS NULL;
-               QUERY PLAN               
-----------------------------------------
+       QUERY PLAN       
+------------------------
  Seq Scan on pred_tab t
-   Filter: ((b IS NULL) OR (c IS NULL))
+   Filter: (b IS NULL)
 (2 rows)
 
 --
@@ -231,6 +231,54 @@ SELECT * FROM pred_tab t1
          ->  Seq Scan on pred_tab t3
 (9 rows)
 
+--
+-- Tests for NullTest reduction in EXISTS sublink
+--
+-- Ensure the IS_NOT_NULL qual is ignored
+EXPLAIN (COSTS OFF)
+SELECT * FROM pred_tab t1
+    LEFT JOIN pred_tab t2 ON EXISTS
+        (SELECT 1 FROM pred_tab t3, pred_tab t4, pred_tab t5, pred_tab t6
+         WHERE t1.a = t3.a AND t6.a IS NOT NULL);
+                       QUERY PLAN                        
+---------------------------------------------------------
+ Nested Loop Left Join
+   Join Filter: EXISTS(SubPlan 1)
+   ->  Seq Scan on pred_tab t1
+   ->  Materialize
+         ->  Seq Scan on pred_tab t2
+   SubPlan 1
+     ->  Nested Loop
+           ->  Nested Loop
+                 ->  Nested Loop
+                       ->  Seq Scan on pred_tab t4
+                       ->  Materialize
+                             ->  Seq Scan on pred_tab t3
+                                   Filter: (t1.a = a)
+                 ->  Materialize
+                       ->  Seq Scan on pred_tab t5
+           ->  Materialize
+                 ->  Seq Scan on pred_tab t6
+(17 rows)
+
+-- Ensure the IS_NULL qual is reduced to constant-FALSE
+EXPLAIN (COSTS OFF)
+SELECT * FROM pred_tab t1
+    LEFT JOIN pred_tab t2 ON EXISTS
+        (SELECT 1 FROM pred_tab t3, pred_tab t4, pred_tab t5, pred_tab t6
+         WHERE t1.a = t3.a AND t6.a IS NULL);
+             QUERY PLAN              
+-------------------------------------
+ Nested Loop Left Join
+   Join Filter: (InitPlan 1).col1
+   InitPlan 1
+     ->  Result
+           One-Time Filter: false
+   ->  Seq Scan on pred_tab t1
+   ->  Materialize
+         ->  Seq Scan on pred_tab t2
+(8 rows)
+
 DROP TABLE pred_tab;
 -- Validate we handle IS NULL and IS NOT NULL quals correctly with inheritance
 -- parents.
diff --git a/src/test/regress/sql/predicate.sql b/src/test/regress/sql/predicate.sql
index 9dcb81b1bc5..d92277353a0 100644
--- a/src/test/regress/sql/predicate.sql
+++ b/src/test/regress/sql/predicate.sql
@@ -115,6 +115,24 @@ SELECT * FROM pred_tab t1
     LEFT JOIN pred_tab t2 ON t1.a = 1
     LEFT JOIN pred_tab t3 ON t2.a IS NULL OR t2.c IS NULL;
 
+--
+-- Tests for NullTest reduction in EXISTS sublink
+--
+
+-- Ensure the IS_NOT_NULL qual is ignored
+EXPLAIN (COSTS OFF)
+SELECT * FROM pred_tab t1
+    LEFT JOIN pred_tab t2 ON EXISTS
+        (SELECT 1 FROM pred_tab t3, pred_tab t4, pred_tab t5, pred_tab t6
+         WHERE t1.a = t3.a AND t6.a IS NOT NULL);
+
+-- Ensure the IS_NULL qual is reduced to constant-FALSE
+EXPLAIN (COSTS OFF)
+SELECT * FROM pred_tab t1
+    LEFT JOIN pred_tab t2 ON EXISTS
+        (SELECT 1 FROM pred_tab t3, pred_tab t4, pred_tab t5, pred_tab t6
+         WHERE t1.a = t3.a AND t6.a IS NULL);
+
 DROP TABLE pred_tab;
 
 -- Validate we handle IS NULL and IS NOT NULL quals correctly with inheritance
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 83192038571..508d450c668 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1762,6 +1762,7 @@ NonEmptyRange
 Notification
 NotificationList
 NotifyStmt
+NotnullHashEntry
 Nsrt
 NtDllRoutine
 NtFlushBuffersFileEx_t
-- 
2.43.0

#42

Richard Guo

guofenglinux@gmail.com

6 months ago

In reply to: Richard Guo (#41)

Re: Reduce "Var IS [NOT] NULL" quals during constant folding

On Wed, Jul 9, 2025 at 3:32 PM Richard Guo <guofenglinux@gmail.com> wrote:

On Mon, Jun 30, 2025 at 4:26 PM Richard Guo <guofenglinux@gmail.com> wrote:

On Wed, May 28, 2025 at 6:28 PM Richard Guo <guofenglinux@gmail.com> wrote:

This patchset does not apply anymore due to 2c0ed86d3. Here is a new
rebase.

This patchset does not apply anymore, due to 5069fef1c this time.
Here is a new rebase.

Here is a new rebase. I moved the call to preprocess_relation_rtes to
a later point within convert_EXISTS_sublink_to_join, so we can avoid
the work if it turns out that the EXISTS SubLink cannot be flattened.
Nothing essential has changed.

The NOT-IN pullup work depends on the changes in this patchset (it
also relies on the not-null information), so I'd like to move it
forward.

Hi Tom, Robert -- just to be sure, are you planning to take another
look at it?

I'm aiming to push this patchset next week, barring any objections.

Thanks
Richard

#43

Richard Guo

guofenglinux@gmail.com

6 months ago

In reply to: Richard Guo (#42)

Re: Reduce "Var IS [NOT] NULL" quals during constant folding

On Wed, Jul 16, 2025 at 10:57 AM Richard Guo <guofenglinux@gmail.com> wrote:

On Wed, Jul 9, 2025 at 3:32 PM Richard Guo <guofenglinux@gmail.com> wrote:

Here is a new rebase. I moved the call to preprocess_relation_rtes to
a later point within convert_EXISTS_sublink_to_join, so we can avoid
the work if it turns out that the EXISTS SubLink cannot be flattened.
Nothing essential has changed.

The NOT-IN pullup work depends on the changes in this patchset (it
also relies on the not-null information), so I'd like to move it
forward.

Hi Tom, Robert -- just to be sure, are you planning to take another
look at it?

I'm aiming to push this patchset next week, barring any objections.

Hearing nothing, I've gone ahead and pushed the patchset. Thanks for
all the reviews and discussion.

Thanks
Richard

#44

Tomas Vondra

tomas@vondra.me

6 months ago

In reply to: Richard Guo (#43)

Re: Reduce "Var IS [NOT] NULL" quals during constant folding

On 7/22/25 04:55, Richard Guo wrote:

On Wed, Jul 16, 2025 at 10:57 AM Richard Guo <guofenglinux@gmail.com> wrote:

On Wed, Jul 9, 2025 at 3:32 PM Richard Guo <guofenglinux@gmail.com> wrote:

Here is a new rebase. I moved the call to preprocess_relation_rtes to
a later point within convert_EXISTS_sublink_to_join, so we can avoid
the work if it turns out that the EXISTS SubLink cannot be flattened.
Nothing essential has changed.

The NOT-IN pullup work depends on the changes in this patchset (it
also relies on the not-null information), so I'd like to move it
forward.

Hi Tom, Robert -- just to be sure, are you planning to take another
look at it?

I'm aiming to push this patchset next week, barring any objections.

Hearing nothing, I've gone ahead and pushed the patchset. Thanks for
all the reviews and discussion.

Hi Richard,

Does this mean we can close the PG18 open item tracking this?

* virtual generated columns and planning speed
Owner: Peter Eisentraut

If I understand correctly, the commits went only to master, which means
PG18 still does the table_open/table_close calls Tom complained about in
[1] and [2].

I think it'd be perfectly fine if it only affected cases with virtual
generated columns, but AFAICS we do the open/close call for every
relation. Has anyone tried to measure if the impact is measurable? I
suspect it's negligible, we already hold a lock on the rel anyway
(right?). But has anyone tried to measure if that's true?

If it turns out to be expensive, that might be an argument to backpatch
the changes after all - the commits seem fairly small/non-invasive.

regards

[1]: /messages/by-id/602561.1744314879@sss.pgh.pa.us

[2]: /messages/by-id/1514756.1747925490@sss.pgh.pa.us

--
Tomas Vondra

#45

Richard Guo

guofenglinux@gmail.com

6 months ago

In reply to: Tomas Vondra (#44)

Re: Reduce "Var IS [NOT] NULL" quals during constant folding

On Wed, Jul 30, 2025 at 3:45 AM Tomas Vondra <tomas@vondra.me> wrote:

Does this mean we can close the PG18 open item tracking this?

* virtual generated columns and planning speed
Owner: Peter Eisentraut

If I understand correctly, the commits went only to master, which means
PG18 still does the table_open/table_close calls Tom complained about in
[1] and [2].

You're right. This patchset is intended only for master, so it
doesn't address that open item for v18.

I think it'd be perfectly fine if it only affected cases with virtual
generated columns, but AFAICS we do the open/close call for every
relation. Has anyone tried to measure if the impact is measurable? I
suspect it's negligible, we already hold a lock on the rel anyway
(right?). But has anyone tried to measure if that's true?

I ran a naive test on v18: selecting from 10 tables, comparing the
unmodified v18 with a hacked version where the call to
expand_virtual_generated_columns() was removed, 3 times each. Here
are the planning times I observed.

create table t (a int, b int, c int);

explain (costs off)
select * from t t1
natural join t t2
natural join t t3
natural join t t4
natural join t t5
natural join t t6
natural join t t7
natural join t t8
natural join t t9
natural join t t10
;

-- unmodified v18
Time: 133.244 ms
Time: 132.831 ms
Time: 132.345 ms

-- the hacked version
Time: 132.756 ms
Time: 132.745 ms
Time: 133.728 ms

I didn't observe measurable impact, but perhaps others can run more
representative tests and demonstrate otherwise.

(I recall Peter E. mentioned he might run some tests to measure the
impact. Not sure if he's had the time to do that yet.)

If it turns out to be expensive, that might be an argument to backpatch
the changes after all - the commits seem fairly small/non-invasive.

The main goal of this patchset is to collect NOT NULL information
early in the planning phase and use it to reduce NullTest quals during
constant folding.

It doesn't eliminate the added table_open call, but it does centralize
the collection of all required early-stage catalog information into a
single table_open/table_close call, which may help justify the added
overhead. However, I think Tom's proposal is to move the expansion
of virtual generated columns to the rewriter, so I'm not sure whether
backpatching this patchset to v18 would fully address his concerns.

(I had previously proposed including 0001 and 0002 in v18, but I
dropped the idea due to a lack of support.)

Thanks
Richard

#46

Richard Guo

guofenglinux@gmail.com

6 months ago

In reply to: Richard Guo (#45)

Re: Reduce "Var IS [NOT] NULL" quals during constant folding

On Wed, Jul 30, 2025 at 3:17 PM Richard Guo <guofenglinux@gmail.com> wrote:

create table t (a int, b int, c int);

explain (costs off)
select * from t t1
natural join t t2
natural join t t3
natural join t t4
natural join t t5
natural join t t6
natural join t t7
natural join t t8
natural join t t9
natural join t t10
;

FWIW, for this query, I've observed that table_open/table_close are
also called for each RTE_RELATION in build_physical_tlist(). Not sure
if we should also be concerned about those calls.

It's not clear to me how much performance impact an extra table_open
might have, especially when the lock is already held, and the relation
is likely present in the relcache.

Thanks
Richard

#47

Nathan Bossart

nathandbossart@gmail.com

5 months ago

In reply to: Richard Guo (#45)

Re: Reduce "Var IS [NOT] NULL" quals during constant folding

On Wed, Jul 30, 2025 at 03:17:38PM +0900, Richard Guo wrote:

On Wed, Jul 30, 2025 at 3:45 AM Tomas Vondra <tomas@vondra.me> wrote:

Does this mean we can close the PG18 open item tracking this?

* virtual generated columns and planning speed
Owner: Peter Eisentraut

If I understand correctly, the commits went only to master, which means
PG18 still does the table_open/table_close calls Tom complained about in
[1] and [2].

You're right. This patchset is intended only for master, so it
doesn't address that open item for v18.

I think it'd be perfectly fine if it only affected cases with virtual
generated columns, but AFAICS we do the open/close call for every
relation. Has anyone tried to measure if the impact is measurable? I
suspect it's negligible, we already hold a lock on the rel anyway
(right?). But has anyone tried to measure if that's true?

I ran a naive test on v18: selecting from 10 tables, comparing the
unmodified v18 with a hacked version where the call to
expand_virtual_generated_columns() was removed, 3 times each. Here
are the planning times I observed.

[...]

I didn't observe measurable impact, but perhaps others can run more
representative tests and demonstrate otherwise.

(I recall Peter E. mentioned he might run some tests to measure the
impact. Not sure if he's had the time to do that yet.)

There is still an open item for this one, but it's not clear whether we are
planning to do anything about this for v18, especially since nobody has
shown measurable performance impact. Does anyone want to argue for
addressing this for v18, or shall we close the open item as "Won't Fix"?

--
nathan

#48

Richard Guo

guofenglinux@gmail.com

5 months ago

In reply to: Nathan Bossart (#47)

Re: Reduce "Var IS [NOT] NULL" quals during constant folding

On Wed, Aug 20, 2025 at 2:38 AM Nathan Bossart <nathandbossart@gmail.com> wrote:

On Wed, Jul 30, 2025 at 03:17:38PM +0900, Richard Guo wrote:

I didn't observe measurable impact, but perhaps others can run more
representative tests and demonstrate otherwise.

(I recall Peter E. mentioned he might run some tests to measure the
impact. Not sure if he's had the time to do that yet.)

There is still an open item for this one, but it's not clear whether we are
planning to do anything about this for v18, especially since nobody has
shown measurable performance impact. Does anyone want to argue for
addressing this for v18, or shall we close the open item as "Won't Fix"?

I don't think we're likely to do anything about this for v18.
Actually, I still doubt that the extra table_open call brings any
measurable performance impact, especially since the lock is already
held and the relation is likely already present in the relcache.

Also, I still don't think moving the expansion of virtual generated
columns to the rewriter (as Tom proposed) is a better idea. It turned
out to have several problems that need to be fixed with the help of
PHVs, which is why we moved the expansion into the planner.

Thanks
Richard

#49

Nathan Bossart

nathandbossart@gmail.com

5 months ago

In reply to: Richard Guo (#48)

Re: Reduce "Var IS [NOT] NULL" quals during constant folding

On Wed, Aug 20, 2025 at 10:29:03AM +0900, Richard Guo wrote:

On Wed, Aug 20, 2025 at 2:38 AM Nathan Bossart <nathandbossart@gmail.com> wrote:

There is still an open item for this one, but it's not clear whether we are
planning to do anything about this for v18, especially since nobody has
shown measurable performance impact. Does anyone want to argue for
addressing this for v18, or shall we close the open item as "Won't Fix"?

I don't think we're likely to do anything about this for v18.
Actually, I still doubt that the extra table_open call brings any
measurable performance impact, especially since the lock is already
held and the relation is likely already present in the relcache.

Also, I still don't think moving the expansion of virtual generated
columns to the rewriter (as Tom proposed) is a better idea. It turned
out to have several problems that need to be fixed with the help of
PHVs, which is why we moved the expansion into the planner.

Okay. I have marked the v18 open item as "Won't Fix".

--
nathan

#50

Richard Guo

guofenglinux@gmail.com

5 months ago

In reply to: Nathan Bossart (#49)

Re: Reduce "Var IS [NOT] NULL" quals during constant folding

On Wed, Aug 20, 2025 at 11:11 PM Nathan Bossart
<nathandbossart@gmail.com> wrote:

On Wed, Aug 20, 2025 at 10:29:03AM +0900, Richard Guo wrote:

On Wed, Aug 20, 2025 at 2:38 AM Nathan Bossart <nathandbossart@gmail.com> wrote:

There is still an open item for this one, but it's not clear whether we are
planning to do anything about this for v18, especially since nobody has
shown measurable performance impact. Does anyone want to argue for
addressing this for v18, or shall we close the open item as "Won't Fix"?

I don't think we're likely to do anything about this for v18.
Actually, I still doubt that the extra table_open call brings any
measurable performance impact, especially since the lock is already
held and the relation is likely already present in the relcache.

Also, I still don't think moving the expansion of virtual generated
columns to the rewriter (as Tom proposed) is a better idea. It turned
out to have several problems that need to be fixed with the help of
PHVs, which is why we moved the expansion into the planner.

Okay. I have marked the v18 open item as "Won't Fix".

Thank you for helping with this.

Thanks
Richard

#51

Junwang Zhao

zhjwpku@gmail.com

4 months ago

In reply to: Richard Guo (#50)

1 attachment(s)

Re: Reduce "Var IS [NOT] NULL" quals during constant folding

Hi,

On Thu, Aug 21, 2025 at 9:07 AM Richard Guo <guofenglinux@gmail.com> wrote:

On Wed, Aug 20, 2025 at 11:11 PM Nathan Bossart
<nathandbossart@gmail.com> wrote:

On Wed, Aug 20, 2025 at 10:29:03AM +0900, Richard Guo wrote:

On Wed, Aug 20, 2025 at 2:38 AM Nathan Bossart <nathandbossart@gmail.com> wrote:

There is still an open item for this one, but it's not clear whether we are
planning to do anything about this for v18, especially since nobody has
shown measurable performance impact. Does anyone want to argue for
addressing this for v18, or shall we close the open item as "Won't Fix"?

I don't think we're likely to do anything about this for v18.
Actually, I still doubt that the extra table_open call brings any
measurable performance impact, especially since the lock is already
held and the relation is likely already present in the relcache.

Also, I still don't think moving the expansion of virtual generated
columns to the rewriter (as Tom proposed) is a better idea. It turned
out to have several problems that need to be fixed with the help of
PHVs, which is why we moved the expansion into the planner.

Okay. I have marked the v18 open item as "Won't Fix".

Thank you for helping with this.

Thanks
Richard

While reading this thread, I noticed that the code uses *Relids* to collect
NOT NULL attribute numbers. I think this might be an oversight, since ISTM
that Relids is used to represent range table indexes of relations.

I searched the code base and found no other place that uses Relids to
represent attribute numbers, and there is already a *notnullattnums* field
in RelOptInfo:

/* zero-based set containing attnums of NOT NULL columns */
Bitmapset *notnullattnums;

So I think it would be better to be consistent. I have attached a trivial
patch, in case the community agrees with me.

--
Regards
Junwang Zhao
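
As a rough illustration of the point raised in this message (not code from the patch): PostgreSQL declares Relids as `typedef Bitmapset *Relids;`, so both spellings are the same type to the compiler and the proposed change affects readability only. The sketch below uses a toy 64-bit stand-in for Bitmapset. The names toy_add_member, toy_is_member, attnum_is_notnull and attnum_is_notnull_relids are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for PostgreSQL's Bitmapset; it can hold members 0..63 only. */
typedef struct Bitmapset
{
	uint64_t	bits;
} Bitmapset;

/* Same shape as the PostgreSQL typedef: Relids is another name for Bitmapset *. */
typedef Bitmapset *Relids;

/* Hypothetical helper: add one member to the toy set. */
void
toy_add_member(Bitmapset *set, int member)
{
	assert(member >= 0 && member < 64);
	set->bits |= (UINT64_C(1) << member);
}

/* Hypothetical helper: test membership in the toy set. */
bool
toy_is_member(const Bitmapset *set, int member)
{
	if (member < 0 || member >= 64)
		return false;
	return ((set->bits >> member) & UINT64_C(1)) != 0;
}

/* Declared with Bitmapset *: reads as "a set of attribute numbers". */
bool
attnum_is_notnull(const Bitmapset *notnullattnums, int attnum)
{
	return toy_is_member(notnullattnums, attnum);
}

/*
 * Declared with Relids: behaves identically, but the type name suggests
 * "a set of range-table indexes", which is misleading for attnums.
 */
bool
attnum_is_notnull_relids(Relids notnullattnums, int attnum)
{
	return toy_is_member(notnullattnums, attnum);
}
```

Because the two declarations are interchangeable without a cast, a mix-up like this compiles cleanly and can only be caught by review.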

Attachments:

v1-0001-use-Bitmapset-to-represent-not-null-attr-nums.patch (application/octet-stream)

From 4f5a9c226481de3e53d87cde3e215e63e89e64a4 Mon Sep 17 00:00:00 2001
From: Junwang Zhao <zhjwpku@gmail.com>
Date: Sun, 7 Sep 2025 19:10:18 +0800
Subject: [PATCH v1] use Bitmapset* to represent not null attr nums

---
 src/backend/optimizer/util/clauses.c | 2 +-
 src/backend/optimizer/util/plancat.c | 4 ++--
 src/include/optimizer/plancat.h      | 2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index 6f0b338d2cd..967341af18f 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -4203,7 +4203,7 @@ simplify_function(Oid funcid, Oid result_type, int32 result_typmod,
 bool
 var_is_nonnullable(PlannerInfo *root, Var *var, bool use_rel_info)
 {
-	Relids		notnullattnums = NULL;
+	Bitmapset	   *notnullattnums = NULL;
 
 	Assert(IsA(var, Var));
 
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 4536bdd6cb4..aca9d861582 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -62,7 +62,7 @@ get_relation_info_hook_type get_relation_info_hook = NULL;
 typedef struct NotnullHashEntry
 {
 	Oid			relid;			/* OID of the relation */
-	Relids		notnullattnums; /* attnums of NOT NULL columns */
+	Bitmapset  *notnullattnums; /* attnums of NOT NULL columns */
 } NotnullHashEntry;
 
 
@@ -750,7 +750,7 @@ get_relation_notnullatts(PlannerInfo *root, Relation relation)
  *	  Searches the hash table and returns the column not-null constraint
  *	  information for a given relation.
  */
-Relids
+Bitmapset *
 find_relation_notnullatts(PlannerInfo *root, Oid relid)
 {
 	NotnullHashEntry *hentry;
diff --git a/src/include/optimizer/plancat.h b/src/include/optimizer/plancat.h
index dd8f2cd157f..96107076832 100644
--- a/src/include/optimizer/plancat.h
+++ b/src/include/optimizer/plancat.h
@@ -30,7 +30,7 @@ extern void get_relation_info(PlannerInfo *root, Oid relationObjectId,
 
 extern void get_relation_notnullatts(PlannerInfo *root, Relation relation);
 
-extern Relids find_relation_notnullatts(PlannerInfo *root, Oid relid);
+extern Bitmapset *find_relation_notnullatts(PlannerInfo *root, Oid relid);
 
 extern List *infer_arbiter_indexes(PlannerInfo *root);
 
-- 
2.41.0

#52

Tender Wang

tndrwang@gmail.com

4 months ago

In reply to: Junwang Zhao (#51)

Re: Reduce "Var IS [NOT] NULL" quals during constant folding

Junwang Zhao <zhjwpku@gmail.com> wrote on Sun, Sep 7, 2025 at 19:12:

Hi,

On Thu, Aug 21, 2025 at 9:07 AM Richard Guo <guofenglinux@gmail.com> wrote:

On Wed, Aug 20, 2025 at 11:11 PM Nathan Bossart
<nathandbossart@gmail.com> wrote:

On Wed, Aug 20, 2025 at 10:29:03AM +0900, Richard Guo wrote:

On Wed, Aug 20, 2025 at 2:38 AM Nathan Bossart <nathandbossart@gmail.com> wrote:

There is still an open item for this one, but it's not clear whether we are
planning to do anything about this for v18, especially since nobody has
shown measurable performance impact. Does anyone want to argue for
addressing this for v18, or shall we close the open item as "Won't Fix"?

I don't think we're likely to do anything about this for v18.
Actually, I still doubt that the extra table_open call brings any
measurable performance impact, especially since the lock is already
held and the relation is likely already present in the relcache.

Also, I still don't think moving the expansion of virtual generated
columns to the rewriter (as Tom proposed) is a better idea. It turned
out to have several problems that need to be fixed with the help of
PHVs, which is why we moved the expansion into the planner.

Okay. I have marked the v18 open item as "Won't Fix".

Thank you for helping with this.

Thanks
Richard

While reading this thread, I noticed that the code uses *Relids* to collect
NOT NULL attribute numbers. I think this might be an oversight, since ISTM
that Relids is used to represent range table indexes of relations.

I searched the code base and found no other place that uses Relids to
represent attribute numbers, and there is already a *notnullattnums* field
in RelOptInfo:

/* zero-based set containing attnums of NOT NULL columns */
Bitmapset *notnullattnums;

So I think it would be better to be consistent. I have attached a trivial
patch, in case the community agrees with me.

--
Regards
Junwang Zhao

+1
From the code readability perspective, Bitmapset* seems better.
--
Thanks,
Tender Wang

#53

Richard Guo

guofenglinux@gmail.com

4 months ago

In reply to: Junwang Zhao (#51)

Re: Reduce "Var IS [NOT] NULL" quals during constant folding

On Sun, Sep 7, 2025 at 8:12 PM Junwang Zhao <zhjwpku@gmail.com> wrote:

While reading this thread, I noticed that the code uses *Relids* to collect
NOT NULL attribute numbers. I think this might be an oversight, since ISTM
that Relids is used to represent range table indexes of relations.

Nice catch; it's better to use Bitmapset * rather than Relids in this
scenario. That was my oversight; will fix it.

So I think it would be better to be consistent. I have attached a trivial
patch, in case the community agrees with me.

Your patch misses one spot: the notnullattnums in
get_relation_notnullatts() should also be fixed. Otherwise it LGTM.

- Richard

#54

Junwang Zhao

zhjwpku@gmail.com

4 months ago

In reply to: Richard Guo (#53)

1 attachment(s)

Re: Reduce "Var IS [NOT] NULL" quals during constant folding

On Mon, Sep 8, 2025 at 4:21 PM Richard Guo <guofenglinux@gmail.com> wrote:

On Sun, Sep 7, 2025 at 8:12 PM Junwang Zhao <zhjwpku@gmail.com> wrote:

While reading this thread, I noticed that the code uses *Relids* to collect
NOT NULL attribute numbers. I think this might be an oversight, since ISTM
that Relids is used to represent range table indexes of relations.

Nice catch; it's better to use Bitmapset * rather than Relids in this
scenario. That was my oversight; will fix it.

So I think it would be better to be consistent. I have attached a trivial
patch, in case the community agrees with me.

Your patch misses one spot: the notnullattnums in
get_relation_notnullatts() should also be fixed. Otherwise it LGTM.

True, attached v2 adds that missing spot, thanks for the review.

- Richard

--
Regards
Junwang Zhao

Attachments:

v2-0001-use-Bitmapset-to-represent-not-null-attr-nums.patch (application/octet-stream)

From ab9475020cdfa5f32cb3756c974a10f90d325e34 Mon Sep 17 00:00:00 2001
From: Junwang Zhao <zhjwpku@gmail.com>
Date: Sun, 7 Sep 2025 19:10:18 +0800
Subject: [PATCH v2] use Bitmapset* to represent not null attr nums

---
 src/backend/optimizer/util/clauses.c | 2 +-
 src/backend/optimizer/util/plancat.c | 6 +++---
 src/include/optimizer/plancat.h      | 2 +-
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index 6f0b338d2cd..967341af18f 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -4203,7 +4203,7 @@ simplify_function(Oid funcid, Oid result_type, int32 result_typmod,
 bool
 var_is_nonnullable(PlannerInfo *root, Var *var, bool use_rel_info)
 {
-	Relids		notnullattnums = NULL;
+	Bitmapset	   *notnullattnums = NULL;
 
 	Assert(IsA(var, Var));
 
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 4536bdd6cb4..03845094bc9 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -62,7 +62,7 @@ get_relation_info_hook_type get_relation_info_hook = NULL;
 typedef struct NotnullHashEntry
 {
 	Oid			relid;			/* OID of the relation */
-	Relids		notnullattnums; /* attnums of NOT NULL columns */
+	Bitmapset  *notnullattnums; /* attnums of NOT NULL columns */
 } NotnullHashEntry;
 
 
@@ -683,7 +683,7 @@ get_relation_notnullatts(PlannerInfo *root, Relation relation)
 	Oid			relid = RelationGetRelid(relation);
 	NotnullHashEntry *hentry;
 	bool		found;
-	Relids		notnullattnums = NULL;
+	Bitmapset   *notnullattnums = NULL;
 
 	/* bail out if the relation has no not-null constraints */
 	if (relation->rd_att->constr == NULL ||
@@ -750,7 +750,7 @@ get_relation_notnullatts(PlannerInfo *root, Relation relation)
  *	  Searches the hash table and returns the column not-null constraint
  *	  information for a given relation.
  */
-Relids
+Bitmapset *
 find_relation_notnullatts(PlannerInfo *root, Oid relid)
 {
 	NotnullHashEntry *hentry;
diff --git a/src/include/optimizer/plancat.h b/src/include/optimizer/plancat.h
index dd8f2cd157f..96107076832 100644
--- a/src/include/optimizer/plancat.h
+++ b/src/include/optimizer/plancat.h
@@ -30,7 +30,7 @@ extern void get_relation_info(PlannerInfo *root, Oid relationObjectId,
 
 extern void get_relation_notnullatts(PlannerInfo *root, Relation relation);
 
-extern Relids find_relation_notnullatts(PlannerInfo *root, Oid relid);
+extern Bitmapset *find_relation_notnullatts(PlannerInfo *root, Oid relid);
 
 extern List *infer_arbiter_indexes(PlannerInfo *root);
 
-- 
2.41.0

#55

Richard Guo

guofenglinux@gmail.com

4 months ago

In reply to: Junwang Zhao (#54)

Re: Reduce "Var IS [NOT] NULL" quals during constant folding

On Mon, Sep 8, 2025 at 10:08 PM Junwang Zhao <zhjwpku@gmail.com> wrote:

On Mon, Sep 8, 2025 at 4:21 PM Richard Guo <guofenglinux@gmail.com> wrote:

Your patch misses one spot: the notnullattnums in
get_relation_notnullatts() should also be fixed. Otherwise it LGTM.

True, attached v2 adds that missing spot, thanks for the review.

Pushed. Thanks for the report and fix.

- Richard

#56

BharatDB

bharatdbpg@gmail.com

4 months ago

In reply to: Richard Guo (#55)

Re: Reduce "Var IS [NOT] NULL" quals during constant folding

Dear Team,

This is a follow-up to my previous mail
(CAAh00ETEMEXntw1gxp=xP+4sqrz80tK1R4VEhTpqH9CJpxs-wA) regarding the
optimization in PostgreSQL 18 that simplifies query plans by folding away
Var IS [NOT] NULL checks on columns declared NOT NULL. I experimented with
two approaches, but both ran into significant problems:

*1. PlannerInfo-level hash table (HTAB *rel_notnull_info)*

- The idea was to collect NOT NULL constraint info early and use it for
constant folding.
- gen_node_support.pl cannot handle non-serializable HTAB* fields when
generating node serialization code, leading to compilation errors (“could
not handle type HTAB*”).
- Workarounds (e.g., /* nonserialized */ comments) fail due to comment
stripping, and marking the whole PlannerInfo with
pg_node_attr(no_copy_equal, no_read_write) risks breaking features like
parallel query execution or plan caching.
- Other limitations include potential ABI stability issues from
modifying node structs, increased memory usage from hash tables in nodes,
and the preference for per-relation data structures (e.g., in RelOptInfo)
over global ones.
- A global hash table is a viable alternative but complicates subquery
handling.

*2. Planner-level relattrinfo_htab for column nullability*

- This avoids touching node serialization, but still suffers from
practical issues.
- It crashes during initdb because catalog state is unavailable in
bootstrap mode, requires fragile lifecycle management to avoid memory leaks
or stale entries, and largely duplicates the existing var_is_nonnullable()
logic.
- In practice, it yields minimal performance benefit, since constant
folding and nullability inference are already largely handled in core.

I’d appreciate feedback on whether pursuing either direction makes sense,
or whether improvements should instead focus on extending the existing
var_is_nonnullable() framework.

Sincerely,
Soumya
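
A minimal sketch of the idea common to both approaches described above, assuming a per-relation table that maps a relation OID to the set of its NOT NULL attribute numbers and is consulted when folding a NullTest. This is a toy model, not PostgreSQL code: ToyOid, ToyNotnullEntry, ToyFoldResult, toy_find_notnullatts and toy_fold_nulltest are hypothetical names, the "hash table" is a linear array, and the attnum set is a plain 64-bit mask. In the tree discussed in this thread, the corresponding check lives in var_is_nonnullable().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef unsigned int ToyOid;	/* hypothetical stand-in for a relation OID */

/* One entry per relation: which attnums (1..63) carry a NOT NULL constraint. */
typedef struct ToyNotnullEntry
{
	ToyOid		relid;
	uint64_t	notnullattnums;
} ToyNotnullEntry;

typedef enum ToyFoldResult
{
	TOY_FOLD_UNKNOWN,			/* cannot simplify; keep the NullTest */
	TOY_FOLD_TRUE,				/* qual reduces to constant true */
	TOY_FOLD_FALSE				/* qual reduces to constant false */
} ToyFoldResult;

/* Linear lookup standing in for a hash-table probe keyed by relation OID. */
const ToyNotnullEntry *
toy_find_notnullatts(const ToyNotnullEntry *table, size_t ntable, ToyOid relid)
{
	assert(table != NULL || ntable == 0);
	for (size_t i = 0; i < ntable; i++)
	{
		if (table[i].relid == relid)
			return &table[i];
	}
	return NULL;
}

/*
 * Decide whether "col IS NOT NULL" (is_not_null_test = true) or
 * "col IS NULL" (is_not_null_test = false) can be folded to a constant.
 */
ToyFoldResult
toy_fold_nulltest(const ToyNotnullEntry *table, size_t ntable,
				  ToyOid relid, int attnum,
				  bool nullable_by_outer_join, bool is_not_null_test)
{
	const ToyNotnullEntry *entry;

	/* A column nulled by an outer join can be NULL despite its constraint. */
	if (nullable_by_outer_join)
		return TOY_FOLD_UNKNOWN;

	/* Only ordinary user columns that fit in the toy mask are tracked. */
	if (attnum <= 0 || attnum >= 64)
		return TOY_FOLD_UNKNOWN;

	entry = toy_find_notnullatts(table, ntable, relid);
	if (entry == NULL)
		return TOY_FOLD_UNKNOWN;

	if (((entry->notnullattnums >> attnum) & UINT64_C(1)) == 0)
		return TOY_FOLD_UNKNOWN;

	return is_not_null_test ? TOY_FOLD_TRUE : TOY_FOLD_FALSE;
}
```

The sketch returns "unknown" whenever the information is missing instead of guessing, which is the conservative behaviour a planner needs: an unfolded qual is merely slower, while a wrongly folded one changes query results.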

On Fri, Sep 12, 2025 at 7:51 AM Richard Guo <guofenglinux@gmail.com> wrote:

On Mon, Sep 8, 2025 at 10:08 PM Junwang Zhao <zhjwpku@gmail.com> wrote:

On Mon, Sep 8, 2025 at 4:21 PM Richard Guo <guofenglinux@gmail.com> wrote:

Your patch misses one spot: the notnullattnums in
get_relation_notnullatts() should also be fixed. Otherwise it LGTM.

True, attached v2 adds that missing spot, thanks for the review.

Pushed. Thanks for the report and fix.

- Richard