Optimize IS DISTINCT FROM with non-nullable inputs

Started by Richard Guo2 months ago9 messageshackers
Jump to latest
#1Richard Guo
guofenglinux@gmail.com

Unlike ordinary comparison operators, the IS [NOT] DISTINCT FROM
predicate treats NULL as a normal data value rather than "unknown".
For non-null inputs, its semantics are identical to standard
operators: IS DISTINCT FROM is equivalent to <>, and IS NOT DISTINCT
FROM is equivalent to =.

Currently, the planner simplifies DistinctExpr only if all inputs are
constants. I'm thinking that maybe we can optimize cases where inputs
are non-constant but proven to be non-nullable, by converting "x IS
DISTINCT FROM y" to "x <> y". This representation exposes the
comparison to the planner as a standard operator. If the clause is
negated (e.g. IS NOT DISTINCT FROM), the resulting "=" operator can
allow the planner to use index scans, merge joins, hash joins, and
EC-based qual deductions.

Attached is a draft patch for this optimization.

I'm kind of concerned about whether there are edge cases where this
transformation is not safe, specifically regarding "rowtype" inputs.
Any feedback would be appreciated.

- Richard

Attachments:

v1-0001-Optimize-IS-DISTINCT-FROM-with-non-nullable-input.patchapplication/octet-stream; name=v1-0001-Optimize-IS-DISTINCT-FROM-with-non-nullable-input.patchDownload+63-7
#2Richard Guo
guofenglinux@gmail.com
In reply to: Richard Guo (#1)
Re: Optimize IS DISTINCT FROM with non-nullable inputs

On Fri, Jan 23, 2026 at 10:40 PM Richard Guo <guofenglinux@gmail.com> wrote:

I'm kind of concerned about whether there are edge cases where this
transformation is not safe, specifically regarding "rowtype" inputs.

After a second thought, I think this should be safe even for "rowtype"
inputs. The executor (ExecInterpExpr) evaluates a DistinctExpr with
non-null inputs by simply applying the underlying equality function
and then negating the result. Also, record_eq explicitly treats two
NULLs as equal. Therefore, in this case, converting IS DISTINCT FROM
to the inequality operator in the planner produces the exact same
behavior as the executor.

Also, I think we can play this same trick with BooleanTest (IS [NOT]
TRUE/FALSE/UNKNOWN). A BooleanTest treats a NULL input as the logical
value "unknown". If we know that the input can not be NULL, we can
simplify it directly to a boolean expression or a constant. Please
see 0002.

- Richard

Attachments:

v2-0001-Optimize-IS-DISTINCT-FROM-with-non-nullable-input.patchapplication/octet-stream; name=v2-0001-Optimize-IS-DISTINCT-FROM-with-non-nullable-input.patchDownload+260-7
v2-0002-Optimize-BooleanTest-with-non-nullable-input.patchapplication/octet-stream; name=v2-0002-Optimize-BooleanTest-with-non-nullable-input.patchDownload+204-1
#3Tender Wang
tndrwang@gmail.com
In reply to: Richard Guo (#2)
Re: Optimize IS DISTINCT FROM with non-nullable inputs

Hi,

Richard Guo <guofenglinux@gmail.com> 于2026年1月26日周一 16:00写道:

On Fri, Jan 23, 2026 at 10:40 PM Richard Guo <guofenglinux@gmail.com> wrote:

I'm kind of concerned about whether there are edge cases where this
transformation is not safe, specifically regarding "rowtype" inputs.

After a second thought, I think this should be safe even for "rowtype"
inputs. The executor (ExecInterpExpr) evaluates a DistinctExpr with
non-null inputs by simply applying the underlying equality function
and then negating the result. Also, record_eq explicitly treats two
NULLs as equal. Therefore, in this case, converting IS DISTINCT FROM
to the inequality operator in the planner produces the exact same
behavior as the executor.

Also, I think we can play this same trick with BooleanTest (IS [NOT]
TRUE/FALSE/UNKNOWN). A BooleanTest treats a NULL input as the logical
value "unknown". If we know that the input can not be NULL, we can
simplify it directly to a boolean expression or a constant. Please
see 0002.

- Richard

Interesting optimization. I look through the v2-0001 patch. In the
commit message, it says:
...
This patch extends the optimization to cases where inputs are non-constant but
proven to be non-nullable. Specifically, "x IS DISTINCT FROM NULL"
folds to constant TRUE if "x" is known to be non-nullable.
...

But I found that the case "x IS DISTINCT FROM NULL"  is converted to
NullTest in transformAExprDistinct().
It will be optimized in the "case T_NullTest:" not by this patch.
The test case can't cover the following codes:
+                                       /*
+                                        * If one input is an explicit
NULL constant, and the
+                                        * other is a non-nullable
expression, the result is
+                                        * always TRUE.
+                                        */
+                                       if (has_null_input)
+                                               return
makeBoolConst(true, false);

And I adjusted a little above code to check if all regression tests
can enter above if,
if (has_null_input)
{
assert(0);
return makeBoolConst(true, false);
}
All tests pass; no crashes occurred.

--
Thanks,
Tender Wang

#4Richard Guo
guofenglinux@gmail.com
In reply to: Tender Wang (#3)
Re: Optimize IS DISTINCT FROM with non-nullable inputs

On Tue, Jan 27, 2026 at 11:32 AM Tender Wang <tndrwang@gmail.com> wrote:

Interesting optimization. I look through the v2-0001 patch. In the
commit message, it says:
...
This patch extends the optimization to cases where inputs are non-constant but
proven to be non-nullable. Specifically, "x IS DISTINCT FROM NULL"
folds to constant TRUE if "x" is known to be non-nullable.
...

But I found that the case "x IS DISTINCT FROM NULL" is converted to
NullTest in transformAExprDistinct().
It will be optimized in the "case T_NullTest:" not by this patch.

Well, while it's true that the parser would do this transformation for
"literal" NULLs, here we are talking more about "calculated" NULLs.
Consider "not_null_col IS DISTINCT FROM (1 + NULL)".

Another case is custom plans for prepared statements (e.g., WHERE
not_null_col IS DISTINCT FROM $1), which will inject NULL constants
directly into the DistinctExpr during planning, and the parser does
not have a chance to transform it into a NullTest.

The test case can't cover the following codes:
+                                       /*
+                                        * If one input is an explicit
NULL constant, and the
+                                        * other is a non-nullable
expression, the result is
+                                        * always TRUE.
+                                        */
+                                       if (has_null_input)
+                                               return
makeBoolConst(true, false);

Right. I'll need to adjust the test cases to cover this.

- Richard

#5Tender Wang
tndrwang@gmail.com
In reply to: Richard Guo (#4)
Re: Optimize IS DISTINCT FROM with non-nullable inputs

Richard Guo <guofenglinux@gmail.com> 于2026年1月27日周二 15:10写道:

On Tue, Jan 27, 2026 at 11:32 AM Tender Wang <tndrwang@gmail.com> wrote:

Interesting optimization. I look through the v2-0001 patch. In the
commit message, it says:
...
This patch extends the optimization to cases where inputs are non-constant but
proven to be non-nullable. Specifically, "x IS DISTINCT FROM NULL"
folds to constant TRUE if "x" is known to be non-nullable.
...

But I found that the case "x IS DISTINCT FROM NULL" is converted to
NullTest in transformAExprDistinct().
It will be optimized in the "case T_NullTest:" not by this patch.

Well, while it's true that the parser would do this transformation for
"literal" NULLs, here we are talking more about "calculated" NULLs.
Consider "not_null_col IS DISTINCT FROM (1 + NULL)".

Got it.

I looked through the v2-0002 patch. LGTM.

--
Thanks,
Tender Wang

#6Richard Guo
guofenglinux@gmail.com
In reply to: Richard Guo (#4)
Re: Optimize IS DISTINCT FROM with non-nullable inputs

On Tue, Jan 27, 2026 at 4:10 PM Richard Guo <guofenglinux@gmail.com> wrote:

On Tue, Jan 27, 2026 at 11:32 AM Tender Wang <tndrwang@gmail.com> wrote:

But I found that the case "x IS DISTINCT FROM NULL" is converted to
NullTest in transformAExprDistinct().
It will be optimized in the "case T_NullTest:" not by this patch.

Well, while it's true that the parser would do this transformation for
"literal" NULLs, here we are talking more about "calculated" NULLs.
Consider "not_null_col IS DISTINCT FROM (1 + NULL)".

BTW, this reminds me that we can teach const-folding to always
transform "x IS [NOT] DISTINCT FROM NULL" to a NullTest, even when x
cannot be proven non-nullable. (The parser have already done that for
literal NULLs.)

This is safe because we know that NullTest with !argisrow is fully
equivalent to SQL's IS [NOT] DISTINCT FROM NULL, even for rowtypes.
It is also beneficial because NullTest is much more amenable to
optimization than DistinctExpr. For example, the planner can deduce
forced-null Vars from a NullTest clause (which can be used to reduce
outer join strength), whereas it lacks such insight for a DistinctExpr.
As an example, consider:

explain (costs off)
select * from t t1 left join t t2 on t1.b = t2.b
where t2.b is not distinct from null;
QUERY PLAN
------------------------------
Hash Anti Join
Hash Cond: (t1.b = t2.b)
-> Seq Scan on t t1
-> Hash
-> Seq Scan on t t2
(5 rows)

explain (costs off)
select * from t t1 left join t t2 on t1.b = t2.b
where t2.b is not distinct from null::int;
QUERY PLAN
-------------------------------------------------------
Hash Left Join
Hash Cond: (t1.b = t2.b)
Filter: (NOT (t2.b IS DISTINCT FROM NULL::integer))
-> Seq Scan on t t1
-> Hash
-> Seq Scan on t t2
(6 rows)

Please see 0003 for the details of this transformation.

- Richard

Attachments:

v3-0001-Optimize-IS-DISTINCT-FROM-with-non-nullable-input.patchapplication/octet-stream; name=v3-0001-Optimize-IS-DISTINCT-FROM-with-non-nullable-input.patchDownload+260-7
v3-0002-Optimize-BooleanTest-with-non-nullable-input.patchapplication/octet-stream; name=v3-0002-Optimize-BooleanTest-with-non-nullable-input.patchDownload+204-1
v3-0003-Teach-planner-to-transform-x-IS-NOT-DISTINCT-FROM.patchapplication/octet-stream; name=v3-0003-Teach-planner-to-transform-x-IS-NOT-DISTINCT-FROM.patchDownload+118-1
#7Tender Wang
tndrwang@gmail.com
In reply to: Richard Guo (#6)
Re: Optimize IS DISTINCT FROM with non-nullable inputs

Richard Guo <guofenglinux@gmail.com> 于2026年1月28日周三 14:42写道:

On Tue, Jan 27, 2026 at 4:10 PM Richard Guo <guofenglinux@gmail.com> wrote:

On Tue, Jan 27, 2026 at 11:32 AM Tender Wang <tndrwang@gmail.com> wrote:

But I found that the case "x IS DISTINCT FROM NULL" is converted to
NullTest in transformAExprDistinct().
It will be optimized in the "case T_NullTest:" not by this patch.

Well, while it's true that the parser would do this transformation for
"literal" NULLs, here we are talking more about "calculated" NULLs.
Consider "not_null_col IS DISTINCT FROM (1 + NULL)".

BTW, this reminds me that we can teach const-folding to always
transform "x IS [NOT] DISTINCT FROM NULL" to a NullTest, even when x
cannot be proven non-nullable. (The parser have already done that for
literal NULLs.)

Yeah, this transformation can enable the planner to reduce outer joins
to plain joins.

Please see 0003 for the details of this transformation.

The v3 patches LGTM.

--
Thanks,
Tender Wang

#8Richard Guo
guofenglinux@gmail.com
In reply to: Tender Wang (#7)
Re: Optimize IS DISTINCT FROM with non-nullable inputs

On Thu, Jan 29, 2026 at 5:48 PM Tender Wang <tndrwang@gmail.com> wrote:

The v3 patches LGTM.

Thanks for the review.

I plan to push the v3 patchset soon, barring any objections.

- Richard

#9Richard Guo
guofenglinux@gmail.com
In reply to: Richard Guo (#8)
Re: Optimize IS DISTINCT FROM with non-nullable inputs

On Mon, Feb 9, 2026 at 4:08 PM Richard Guo <guofenglinux@gmail.com> wrote:

Thanks for the review.

I plan to push the v3 patchset soon, barring any objections.

... and done.

- Richard