[PATCH] Optimize IS DISTINCT FROM NULL => IS NOT NULL
Hi list,
This patch enables a simple optimization in
eval_const_expressions_mutator. If we know that one argument to
DistinctExpr is NULL then we can optimize it to a NullTest, which can
be an indexable expression.
For example the query:
EXPLAIN (costs off) SELECT * FROM foo WHERE j IS NOT DISTINCT FROM NULL;
Old behavior:
Seq Scan on foo
Filter: (NOT (j IS DISTINCT FROM NULL::integer))
New behavior:
Index Scan using foo_j_idx on foo
Index Cond: (j IS NULL)
Regards,
Marti
Attachments:
distinct-to-nulltest.patchtext/x-patch; charset=US-ASCII; name=distinct-to-nulltest.patchDownload
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index cd3da46..d9568ca 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -2436,7 +2436,7 @@ eval_const_expressions_mutator(Node *node,
ListCell *arg;
bool has_null_input = false;
bool all_null_input = true;
- bool has_nonconst_input = false;
+ Expr *nonconst_expr = NULL;
Expr *simple;
DistinctExpr *newexpr;
@@ -2463,11 +2463,11 @@ eval_const_expressions_mutator(Node *node,
all_null_input &= ((Const *) lfirst(arg))->constisnull;
}
else
- has_nonconst_input = true;
+ nonconst_expr = lfirst(arg);
}
/* all constants? then can optimize this out */
- if (!has_nonconst_input)
+ if (nonconst_expr == NULL)
{
/* all nulls? then not distinct */
if (all_null_input)
@@ -2512,6 +2512,24 @@ eval_const_expressions_mutator(Node *node,
return (Node *) csimple;
}
}
+ else if (has_null_input)
+ {
+ /*
+ * We can optimize: (expr) IS DISTINCT FROM NULL
+ * into: (expr) IS NOT NULL
+ */
+ NullTest *newntest;
+
+ newntest = makeNode(NullTest);
+ newntest->nulltesttype = IS_NOT_NULL;
+ newntest->arg = (Expr *) nonconst_expr;
+
+ /* make_row_distinct_op already flattens row comparisons */
+ Assert(! type_is_rowtype(exprType((Node *) nonconst_expr)));
+ newntest->argisrow = false;
+
+ return (Node *) newntest;
+ }
/*
* The expression cannot be simplified any further, so build
diff --git a/src/test/regress/expected/select_distinct.out b/src/test/regress/expected/select_distinct.out
index 38107a0..e8ddc49 100644
--- a/src/test/regress/expected/select_distinct.out
+++ b/src/test/regress/expected/select_distinct.out
@@ -195,6 +195,13 @@ SELECT null IS DISTINCT FROM null as "no";
f
(1 row)
+EXPLAIN (costs off) SELECT * FROM disttable WHERE f1 IS DISTINCT FROM NULL;
+ QUERY PLAN
+----------------------------
+ Seq Scan on disttable
+ Filter: (f1 IS NOT NULL)
+(2 rows)
+
-- negated form
SELECT 1 IS NOT DISTINCT FROM 2 as "no";
no
@@ -220,3 +227,10 @@ SELECT null IS NOT DISTINCT FROM null as "yes";
t
(1 row)
+EXPLAIN (costs off) SELECT * FROM disttable WHERE f1 IS NOT DISTINCT FROM NULL;
+ QUERY PLAN
+------------------------
+ Seq Scan on disttable
+ Filter: (f1 IS NULL)
+(2 rows)
+
diff --git a/src/test/regress/sql/select_distinct.sql b/src/test/regress/sql/select_distinct.sql
index 328ba51..9767eed 100644
--- a/src/test/regress/sql/select_distinct.sql
+++ b/src/test/regress/sql/select_distinct.sql
@@ -56,9 +56,11 @@ SELECT 1 IS DISTINCT FROM 2 as "yes";
SELECT 2 IS DISTINCT FROM 2 as "no";
SELECT 2 IS DISTINCT FROM null as "yes";
SELECT null IS DISTINCT FROM null as "no";
+EXPLAIN (costs off) SELECT * FROM disttable WHERE f1 IS DISTINCT FROM NULL;
-- negated form
SELECT 1 IS NOT DISTINCT FROM 2 as "no";
SELECT 2 IS NOT DISTINCT FROM 2 as "yes";
SELECT 2 IS NOT DISTINCT FROM null as "no";
SELECT null IS NOT DISTINCT FROM null as "yes";
+EXPLAIN (costs off) SELECT * FROM disttable WHERE f1 IS NOT DISTINCT FROM NULL;
Marti Raudsepp <marti@juffo.org> writes:
This patch enables a simple optimization in
eval_const_expressions_mutator. If we know that one argument to
DistinctExpr is NULL then we can optimize it to a NullTest, which can
be an indexable expression.
Uh ... how much do we care about that? I can't say that I've heard many
people complain about the fact that IS [NOT] DISTINCT FROM is poorly
optimized -- which it is, in general, and this patch chips away at that
only a tiny bit, not enough to make it recommendable. If we really
wanted to make that a first-class operation we would need far more work
than this. Plus I don't see why anyone would write the specific case
"IS [NOT] DISTINCT FROM NULL" when they could write half as much.
regards, tom lane
On Thu, Mar 8, 2012 at 19:35, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Uh ... how much do we care about that? I can't say that I've heard many
people complain about the fact that IS [NOT] DISTINCT FROM is poorly
optimized -- which it is, in general, and this patch chips away at that
only a tiny bit, not enough to make it recommendable.
Agreed, but it was very simple to code, so I figured why not.
Plus I don't see why anyone would write the specific case
"IS [NOT] DISTINCT FROM NULL" when they could write half as much.
Well I can see how it might be useful in generated queries, when
comparing a column to a parameter. If they're using IS DISTINCT FROM
then it's reasonable to expect that the parameter could be NULL
sometimes.
But I don't feel strongly about this, maybe it's not worth
complicating this big function further. :)
Regards,
Marti
Marti Raudsepp <marti@juffo.org> writes:
But I don't feel strongly about this, maybe it's not worth
complicating this big function further. :)
Yeah, that was kind of what I felt about it. If this patch were part of
a grand plan to make IS DISTINCT FROM smarter, that would be one thing.
But if we were to embark on that, likely as not it would involve a
redesign that would invalidate this code anyway. So I'd just as soon
keep it simple for now.
regards, tom lane