Optimize IS DISTINCT FROM with non-nullable inputs

Started by Richard Guoabout 12 hours ago1 messages

guofenglinux@gmail.com

about 12 hours ago

1 attachment(s)

Unlike ordinary comparison operators, the IS [NOT] DISTINCT FROM
predicate treats NULL as a normal data value rather than "unknown".
For non-null inputs, its semantics are identical to standard
operators: IS DISTINCT FROM is equivalent to <>, and IS NOT DISTINCT
FROM is equivalent to =.

Currently, the planner simplifies DistinctExpr only if all inputs are
constants. I'm thinking that maybe we can optimize cases where inputs
are non-constant but proven to be non-nullable, by converting "x IS
DISTINCT FROM y" to "x <> y". This representation exposes the
comparison to the planner as a standard operator. If the clause is
negated (e.g. IS NOT DISTINCT FROM), the resulting "=" operator can
allow the planner to use index scans, merge joins, hash joins, and
EC-based qual deductions.

Attached is a draft patch for this optimization.

I'm kind of concerned about whether there are edge cases where this
transformation is not safe, specifically regarding "rowtype" inputs.
Any feedback would be appreciated.

- Richard

Attachments:

v1-0001-Optimize-IS-DISTINCT-FROM-with-non-nullable-input.patchapplication/octet-stream; name=v1-0001-Optimize-IS-DISTINCT-FROM-with-non-nullable-input.patchDownload

From 94c0d93d13d9428c5f987fb67a03365fcc3a9fea Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Fri, 23 Jan 2026 12:11:42 +0900
Subject: [PATCH v1] Optimize IS DISTINCT FROM with non-nullable inputs

The IS DISTINCT FROM construct compares values acting as though NULL
were a normal data value, rather than "unknown".  Semantically, "x IS
DISTINCT FROM y" yields true if the values differ or if exactly one is
NULL, and false if they are equal or both NULL.  Unlike ordinary
comparison operators, it never returns NULL.

Previously, the planner only simplified this construct if all inputs
were constants, folding it to a constant boolean result.  This patch
extends the optimization to cases where inputs are non-constant but
proven to be non-nullable.  Specifically, "x IS DISTINCT FROM NULL"
folds to constant TRUE if "x" is known to be non-nullable.  For cases
where both inputs are guaranteed not to be NULL, the expression
becomes semantically equivalent to "x <> y", and the DistinctExpr is
converted into an inequality OpExpr.

This transformation provides several benefits.  It exposes the
comparison to the planner as a standard operator, allowing the use of
partial indexes and constraint exclusion.  Furthermore, if the clause
is negated (e.g. "IS NOT DISTINCT FROM"), it simplifies to an equality
operator.  This enables the planner to generate better plans using
index scans, merge joins, hash joins, and EC-based qual deductions.
---
 .../postgres_fdw/expected/postgres_fdw.out    |  8 +--
 contrib/postgres_fdw/sql/postgres_fdw.sql     |  2 +-
 src/backend/optimizer/util/clauses.c          | 60 ++++++++++++++++++-
 3 files changed, 63 insertions(+), 7 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 6066510c7c0..7cad5e67d09 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -698,12 +698,12 @@ EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE c1 = -c1;          -- Op
    Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" = (- "C 1")))
 (3 rows)
 
-EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE (c3 IS NOT NULL) IS DISTINCT FROM (c3 IS NOT NULL); -- DistinctExpr
-                                                              QUERY PLAN                                                              
---------------------------------------------------------------------------------------------------------------------------------------
+EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE c3 IS DISTINCT FROM c3; -- DistinctExpr
+                                                QUERY PLAN                                                
+----------------------------------------------------------------------------------------------------------
  Foreign Scan on public.ft1 t1
    Output: c1, c2, c3, c4, c5, c6, c7, c8
-   Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (((c3 IS NOT NULL) IS DISTINCT FROM (c3 IS NOT NULL)))
+   Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE ((c3 IS DISTINCT FROM c3))
 (3 rows)
 
 EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE c1 = ANY(ARRAY[c2, 1, c1 + 0]); -- ScalarArrayOpExpr
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 4f7ab2ed0ac..eff25bd2baa 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -340,7 +340,7 @@ EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE c3 IS NULL;        -- Nu
 EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE c3 IS NOT NULL;    -- NullTest
 EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE round(abs(c1), 0) = 1; -- FuncExpr
 EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE c1 = -c1;          -- OpExpr(l)
-EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE (c3 IS NOT NULL) IS DISTINCT FROM (c3 IS NOT NULL); -- DistinctExpr
+EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE c3 IS DISTINCT FROM c3; -- DistinctExpr
 EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE c1 = ANY(ARRAY[c2, 1, c1 + 0]); -- ScalarArrayOpExpr
 EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE c1 = (ARRAY[c1,c2,3])[1]; -- SubscriptingRef
 EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE c6 = E'foo''s\\bar';  -- check special chars
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index 39d35827c35..4d58a164d3a 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -2708,6 +2708,7 @@ eval_const_expressions_mutator(Node *node,
 				bool		has_null_input = false;
 				bool		all_null_input = true;
 				bool		has_nonconst_input = false;
+				bool		has_nullable_nonconst = false;
 				Expr	   *simple;
 				DistinctExpr *newexpr;
 
@@ -2724,7 +2725,8 @@ eval_const_expressions_mutator(Node *node,
 				/*
 				 * We must do our own check for NULLs because DistinctExpr has
 				 * different results for NULL input than the underlying
-				 * operator does.
+				 * operator does.  We also check if any non-constant input is
+				 * potentially nullable.
 				 */
 				foreach(arg, args)
 				{
@@ -2734,12 +2736,24 @@ eval_const_expressions_mutator(Node *node,
 						all_null_input &= ((Const *) lfirst(arg))->constisnull;
 					}
 					else
+					{
 						has_nonconst_input = true;
+						all_null_input = false;
+
+						if (!has_nullable_nonconst &&
+							!expr_is_nonnullable(context->root,
+												 (Expr *) lfirst(arg), false))
+							has_nullable_nonconst = true;
+					}
 				}
 
-				/* all constants? then can optimize this out */
 				if (!has_nonconst_input)
 				{
+					/*
+					 * All inputs are constants.  We can optimize this out
+					 * completely.
+					 */
+
 					/* all nulls? then not distinct */
 					if (all_null_input)
 						return makeBoolConst(false, false);
@@ -2784,6 +2798,48 @@ eval_const_expressions_mutator(Node *node,
 						return (Node *) csimple;
 					}
 				}
+				else if (!has_nullable_nonconst)
+				{
+					/*
+					 * There are non-constant inputs, but since all of them
+					 * are proven non-nullable, "IS DISTINCT FROM" semantics
+					 * are much simpler.
+					 */
+
+					OpExpr	   *eqexpr;
+
+					/*
+					 * If one input is an explicit NULL constant, and the
+					 * other is a non-nullable expression, the result is
+					 * always TRUE.
+					 */
+					if (has_null_input)
+						return makeBoolConst(true, false);
+
+					/*
+					 * Otherwise, both inputs are known non-nullable.  In this
+					 * case, "IS DISTINCT FROM" is equivalent to the standard
+					 * inequality operator (usually "<>").  We convert this to
+					 * an OpExpr, which is a more efficient representation for
+					 * the planner.  It can enable the use of partial indexes
+					 * and constraint exclusion.  Furthermore, if the clause
+					 * is negated (e.g. "IS NOT DISTINCT FROM"), the resulting
+					 * "=" operator can allow the planner to use index scans,
+					 * merge joins, hash joins, and EC-based qual deductions.
+					 */
+					eqexpr = makeNode(OpExpr);
+					eqexpr->opno = expr->opno;
+					eqexpr->opfuncid = expr->opfuncid;
+					eqexpr->opresulttype = BOOLOID;
+					eqexpr->opretset = expr->opretset;
+					eqexpr->opcollid = expr->opcollid;
+					eqexpr->inputcollid = expr->inputcollid;
+					eqexpr->args = args;
+					eqexpr->location = expr->location;
+
+					return eval_const_expressions_mutator(negate_clause((Node *) eqexpr),
+														  context);
+				}
 
 				/*
 				 * The expression cannot be simplified any further, so build
-- 
2.39.5 (Apple Git-154)