Optimize IS DISTINCT FROM with non-nullable inputs
Unlike ordinary comparison operators, the IS [NOT] DISTINCT FROM
predicate treats NULL as a normal data value rather than "unknown".
For non-null inputs, its semantics are identical to standard
operators: IS DISTINCT FROM is equivalent to <>, and IS NOT DISTINCT
FROM is equivalent to =.
Currently, the planner simplifies DistinctExpr only if all inputs are
constants. I'm thinking that maybe we can optimize cases where inputs
are non-constant but proven to be non-nullable, by converting "x IS
DISTINCT FROM y" to "x <> y". This representation exposes the
comparison to the planner as a standard operator. If the clause is
negated (e.g. IS NOT DISTINCT FROM), the resulting "=" operator can
allow the planner to use index scans, merge joins, hash joins, and
EC-based qual deductions.
Attached is a draft patch for this optimization.
I'm kind of concerned about whether there are edge cases where this
transformation is not safe, specifically regarding "rowtype" inputs.
Any feedback would be appreciated.
- Richard
Attachments:
v1-0001-Optimize-IS-DISTINCT-FROM-with-non-nullable-input.patchapplication/octet-stream; name=v1-0001-Optimize-IS-DISTINCT-FROM-with-non-nullable-input.patchDownload+63-7
On Fri, Jan 23, 2026 at 10:40 PM Richard Guo <guofenglinux@gmail.com> wrote:
I'm kind of concerned about whether there are edge cases where this
transformation is not safe, specifically regarding "rowtype" inputs.
After a second thought, I think this should be safe even for "rowtype"
inputs. The executor (ExecInterpExpr) evaluates a DistinctExpr with
non-null inputs by simply applying the underlying equality function
and then negating the result. Also, record_eq explicitly treats two
NULLs as equal. Therefore, in this case, converting IS DISTINCT FROM
to the inequality operator in the planner produces the exact same
behavior as the executor.
Also, I think we can play this same trick with BooleanTest (IS [NOT]
TRUE/FALSE/UNKNOWN). A BooleanTest treats a NULL input as the logical
value "unknown". If we know that the input can not be NULL, we can
simplify it directly to a boolean expression or a constant. Please
see 0002.
- Richard
Attachments:
v2-0001-Optimize-IS-DISTINCT-FROM-with-non-nullable-input.patchapplication/octet-stream; name=v2-0001-Optimize-IS-DISTINCT-FROM-with-non-nullable-input.patchDownload+260-7
v2-0002-Optimize-BooleanTest-with-non-nullable-input.patchapplication/octet-stream; name=v2-0002-Optimize-BooleanTest-with-non-nullable-input.patchDownload+204-1
Hi,
Richard Guo <guofenglinux@gmail.com> 于2026年1月26日周一 16:00写道:
On Fri, Jan 23, 2026 at 10:40 PM Richard Guo <guofenglinux@gmail.com> wrote:
I'm kind of concerned about whether there are edge cases where this
transformation is not safe, specifically regarding "rowtype" inputs.After a second thought, I think this should be safe even for "rowtype"
inputs. The executor (ExecInterpExpr) evaluates a DistinctExpr with
non-null inputs by simply applying the underlying equality function
and then negating the result. Also, record_eq explicitly treats two
NULLs as equal. Therefore, in this case, converting IS DISTINCT FROM
to the inequality operator in the planner produces the exact same
behavior as the executor.Also, I think we can play this same trick with BooleanTest (IS [NOT]
TRUE/FALSE/UNKNOWN). A BooleanTest treats a NULL input as the logical
value "unknown". If we know that the input can not be NULL, we can
simplify it directly to a boolean expression or a constant. Please
see 0002.- Richard
Interesting optimization. I look through the v2-0001 patch. In the
commit message, it says:
...
This patch extends the optimization to cases where inputs are non-constant but
proven to be non-nullable. Specifically, "x IS DISTINCT FROM NULL"
folds to constant TRUE if "x" is known to be non-nullable.
...
But I found that the case "x IS DISTINCT FROM NULL" is converted to
NullTest in transformAExprDistinct().
It will be optimized in the "case T_NullTest:" not by this patch.
The test case can't cover the following codes:
+ /*
+ * If one input is an explicit
NULL constant, and the
+ * other is a non-nullable
expression, the result is
+ * always TRUE.
+ */
+ if (has_null_input)
+ return
makeBoolConst(true, false);
And I adjusted a little above code to check if all regression tests
can enter above if,
if (has_null_input)
{
assert(0);
return makeBoolConst(true, false);
}
All tests pass; no crashes occurred.
--
Thanks,
Tender Wang
On Tue, Jan 27, 2026 at 11:32 AM Tender Wang <tndrwang@gmail.com> wrote:
Interesting optimization. I look through the v2-0001 patch. In the
commit message, it says:
...
This patch extends the optimization to cases where inputs are non-constant but
proven to be non-nullable. Specifically, "x IS DISTINCT FROM NULL"
folds to constant TRUE if "x" is known to be non-nullable.
...But I found that the case "x IS DISTINCT FROM NULL" is converted to
NullTest in transformAExprDistinct().
It will be optimized in the "case T_NullTest:" not by this patch.
Well, while it's true that the parser would do this transformation for
"literal" NULLs, here we are talking more about "calculated" NULLs.
Consider "not_null_col IS DISTINCT FROM (1 + NULL)".
Another case is custom plans for prepared statements (e.g., WHERE
not_null_col IS DISTINCT FROM $1), which will inject NULL constants
directly into the DistinctExpr during planning, and the parser does
not have a chance to transform it into a NullTest.
The test case can't cover the following codes: + /* + * If one input is an explicit NULL constant, and the + * other is a non-nullable expression, the result is + * always TRUE. + */ + if (has_null_input) + return makeBoolConst(true, false);
Right. I'll need to adjust the test cases to cover this.
- Richard
Richard Guo <guofenglinux@gmail.com> 于2026年1月27日周二 15:10写道:
On Tue, Jan 27, 2026 at 11:32 AM Tender Wang <tndrwang@gmail.com> wrote:
Interesting optimization. I look through the v2-0001 patch. In the
commit message, it says:
...
This patch extends the optimization to cases where inputs are non-constant but
proven to be non-nullable. Specifically, "x IS DISTINCT FROM NULL"
folds to constant TRUE if "x" is known to be non-nullable.
...But I found that the case "x IS DISTINCT FROM NULL" is converted to
NullTest in transformAExprDistinct().
It will be optimized in the "case T_NullTest:" not by this patch.Well, while it's true that the parser would do this transformation for
"literal" NULLs, here we are talking more about "calculated" NULLs.
Consider "not_null_col IS DISTINCT FROM (1 + NULL)".
Got it.
I looked through the v2-0002 patch. LGTM.
--
Thanks,
Tender Wang
On Tue, Jan 27, 2026 at 4:10 PM Richard Guo <guofenglinux@gmail.com> wrote:
On Tue, Jan 27, 2026 at 11:32 AM Tender Wang <tndrwang@gmail.com> wrote:
But I found that the case "x IS DISTINCT FROM NULL" is converted to
NullTest in transformAExprDistinct().
It will be optimized in the "case T_NullTest:" not by this patch.
Well, while it's true that the parser would do this transformation for
"literal" NULLs, here we are talking more about "calculated" NULLs.
Consider "not_null_col IS DISTINCT FROM (1 + NULL)".
BTW, this reminds me that we can teach const-folding to always
transform "x IS [NOT] DISTINCT FROM NULL" to a NullTest, even when x
cannot be proven non-nullable. (The parser have already done that for
literal NULLs.)
This is safe because we know that NullTest with !argisrow is fully
equivalent to SQL's IS [NOT] DISTINCT FROM NULL, even for rowtypes.
It is also beneficial because NullTest is much more amenable to
optimization than DistinctExpr. For example, the planner can deduce
forced-null Vars from a NullTest clause (which can be used to reduce
outer join strength), whereas it lacks such insight for a DistinctExpr.
As an example, consider:
explain (costs off)
select * from t t1 left join t t2 on t1.b = t2.b
where t2.b is not distinct from null;
QUERY PLAN
------------------------------
Hash Anti Join
Hash Cond: (t1.b = t2.b)
-> Seq Scan on t t1
-> Hash
-> Seq Scan on t t2
(5 rows)
explain (costs off)
select * from t t1 left join t t2 on t1.b = t2.b
where t2.b is not distinct from null::int;
QUERY PLAN
-------------------------------------------------------
Hash Left Join
Hash Cond: (t1.b = t2.b)
Filter: (NOT (t2.b IS DISTINCT FROM NULL::integer))
-> Seq Scan on t t1
-> Hash
-> Seq Scan on t t2
(6 rows)
Please see 0003 for the details of this transformation.
- Richard
Attachments:
v3-0001-Optimize-IS-DISTINCT-FROM-with-non-nullable-input.patchapplication/octet-stream; name=v3-0001-Optimize-IS-DISTINCT-FROM-with-non-nullable-input.patchDownload+260-7
v3-0002-Optimize-BooleanTest-with-non-nullable-input.patchapplication/octet-stream; name=v3-0002-Optimize-BooleanTest-with-non-nullable-input.patchDownload+204-1
v3-0003-Teach-planner-to-transform-x-IS-NOT-DISTINCT-FROM.patchapplication/octet-stream; name=v3-0003-Teach-planner-to-transform-x-IS-NOT-DISTINCT-FROM.patchDownload+118-1
Richard Guo <guofenglinux@gmail.com> 于2026年1月28日周三 14:42写道:
On Tue, Jan 27, 2026 at 4:10 PM Richard Guo <guofenglinux@gmail.com> wrote:
On Tue, Jan 27, 2026 at 11:32 AM Tender Wang <tndrwang@gmail.com> wrote:
But I found that the case "x IS DISTINCT FROM NULL" is converted to
NullTest in transformAExprDistinct().
It will be optimized in the "case T_NullTest:" not by this patch.Well, while it's true that the parser would do this transformation for
"literal" NULLs, here we are talking more about "calculated" NULLs.
Consider "not_null_col IS DISTINCT FROM (1 + NULL)".BTW, this reminds me that we can teach const-folding to always
transform "x IS [NOT] DISTINCT FROM NULL" to a NullTest, even when x
cannot be proven non-nullable. (The parser have already done that for
literal NULLs.)
Yeah, this transformation can enable the planner to reduce outer joins
to plain joins.
Please see 0003 for the details of this transformation.
The v3 patches LGTM.
--
Thanks,
Tender Wang
On Thu, Jan 29, 2026 at 5:48 PM Tender Wang <tndrwang@gmail.com> wrote:
The v3 patches LGTM.
Thanks for the review.
I plan to push the v3 patchset soon, barring any objections.
- Richard