missing optimization - column <> column

Started by Pavel Stehuleabout 9 years ago11 messages
#1Pavel Stehule
pavel.stehule@gmail.com

Hi

I found some crazy queries in one customer application. These queries are
stupid, but it was surprise for me so there are not some simple optimization

create table foo(a int);
insert into foo select generate_series(1,100000);
analyze foo;
explain select * from foo where a <> a;

It does full scan of foo, although it should be replaced by false in
planner time.

Same issue is a expression a = a .. can be replaced by true

Regards

Pavel

#2Stephen Frost
sfrost@snowman.net
In reply to: Pavel Stehule (#1)
Re: missing optimization - column <> column

Pavel,

* Pavel Stehule (pavel.stehule@gmail.com) wrote:

I found some crazy queries in one customer application. These queries are
stupid, but it was surprise for me so there are not some simple optimization

create table foo(a int);
insert into foo select generate_series(1,100000);
analyze foo;
explain select * from foo where a <> a;

It does full scan of foo, although it should be replaced by false in
planner time.

a <> a could go to NULL. Obviously, that'll be false for such a simple
case, but it might not work out that way in a more complicated WHERE
clause.

Same issue is a expression a = a .. can be replaced by true

a = a can't be replaced unless you know that 'a' can't be NULL.

In short, fix the application.

Thanks!

Stephen

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Pavel Stehule (#1)
Re: missing optimization - column <> column

Pavel Stehule <pavel.stehule@gmail.com> writes:

I found some crazy queries in one customer application. These queries are
stupid, but it was surprise for me so there are not some simple optimization

create table foo(a int);
insert into foo select generate_series(1,100000);
analyze foo;
explain select * from foo where a <> a;

It does full scan of foo, although it should be replaced by false in
planner time.

Same issue is a expression a = a .. can be replaced by true

Wrong; those expressions yield NULL for NULL input. You could perhaps
optimize them slightly into some form of is-null test, but it hardly
seems worth the planner cycles to check for.

If you write something like "1 <> 1", it will be folded.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Pavel Stehule
pavel.stehule@gmail.com
In reply to: Tom Lane (#3)
Re: missing optimization - column <> column

2016-12-05 16:24 GMT+01:00 Tom Lane <tgl@sss.pgh.pa.us>:

Pavel Stehule <pavel.stehule@gmail.com> writes:

I found some crazy queries in one customer application. These queries are
stupid, but it was surprise for me so there are not some simple

optimization

create table foo(a int);
insert into foo select generate_series(1,100000);
analyze foo;
explain select * from foo where a <> a;

It does full scan of foo, although it should be replaced by false in
planner time.

Same issue is a expression a = a .. can be replaced by true

Wrong; those expressions yield NULL for NULL input. You could perhaps
optimize them slightly into some form of is-null test, but it hardly
seems worth the planner cycles to check for.

understand

If you write something like "1 <> 1", it will be folded.

it works, but a <> a not

Regards

Pavel

Show quoted text

regards, tom lane

#5Pavel Stehule
pavel.stehule@gmail.com
In reply to: Stephen Frost (#2)
Re: missing optimization - column <> column

2016-12-05 16:23 GMT+01:00 Stephen Frost <sfrost@snowman.net>:

Pavel,

* Pavel Stehule (pavel.stehule@gmail.com) wrote:

I found some crazy queries in one customer application. These queries are
stupid, but it was surprise for me so there are not some simple

optimization

create table foo(a int);
insert into foo select generate_series(1,100000);
analyze foo;
explain select * from foo where a <> a;

It does full scan of foo, although it should be replaced by false in
planner time.

a <> a could go to NULL. Obviously, that'll be false for such a simple
case, but it might not work out that way in a more complicated WHERE
clause.

it should be false everywhere

I don't defend a design of the application - it is exactly wrong. But
sometimes, it can be generated by some tool or it can be a human error. And
bad performance can be a big problems on systems, where you cannot to
deploy fix simply.

It is hard to say what should be good design - because these queries are
slow, I know so these queries are wrong, but these queries does significant
IO utilization - and I cannot to fix the application, because I am not a
author and the fix will not be available in next week.

Regards

Pavel

Show quoted text

Same issue is a expression a = a .. can be replaced by true

a = a can't be replaced unless you know that 'a' can't be NULL.

In short, fix the application.

Thanks!

Stephen

#6Stephen Frost
sfrost@snowman.net
In reply to: Pavel Stehule (#5)
Re: missing optimization - column <> column

Pavel,

* Pavel Stehule (pavel.stehule@gmail.com) wrote:

2016-12-05 16:23 GMT+01:00 Stephen Frost <sfrost@snowman.net>:

* Pavel Stehule (pavel.stehule@gmail.com) wrote:

I found some crazy queries in one customer application. These queries are
stupid, but it was surprise for me so there are not some simple

optimization

create table foo(a int);
insert into foo select generate_series(1,100000);
analyze foo;
explain select * from foo where a <> a;

It does full scan of foo, although it should be replaced by false in
planner time.

a <> a could go to NULL. Obviously, that'll be false for such a simple
case, but it might not work out that way in a more complicated WHERE
clause.

it should be false everywhere

No, it's NULL, not false, if 'a' is NULL:

=# SELECT 1 WHERE (NULL <> NULL) IS NULL;
?column?
----------
1
(1 row)

=*# SELECT 1 WHERE (FALSE) IS NULL;
?column?
----------
(0 rows)

You can not make the assumption that 'a <> a' is always false.

Thanks!

Stephen

#7Pavel Stehule
pavel.stehule@gmail.com
In reply to: Stephen Frost (#6)
Re: missing optimization - column <> column

2016-12-05 16:41 GMT+01:00 Stephen Frost <sfrost@snowman.net>:

Pavel,

* Pavel Stehule (pavel.stehule@gmail.com) wrote:

2016-12-05 16:23 GMT+01:00 Stephen Frost <sfrost@snowman.net>:

* Pavel Stehule (pavel.stehule@gmail.com) wrote:

I found some crazy queries in one customer application. These

queries are

stupid, but it was surprise for me so there are not some simple

optimization

create table foo(a int);
insert into foo select generate_series(1,100000);
analyze foo;
explain select * from foo where a <> a;

It does full scan of foo, although it should be replaced by false in
planner time.

a <> a could go to NULL. Obviously, that'll be false for such a simple
case, but it might not work out that way in a more complicated WHERE
clause.

it should be false everywhere

No, it's NULL, not false, if 'a' is NULL:

=# SELECT 1 WHERE (NULL <> NULL) IS NULL;
?column?
----------
1
(1 row)

=*# SELECT 1 WHERE (FALSE) IS NULL;
?column?
----------
(0 rows)

You can not make the assumption that 'a <> a' is always false.

ok - when the expression is tested on NULL, then a <> a should not be
reduced. But when there are not this test, then I can replace this
expression by false.

Regards

Pavel

Show quoted text

Thanks!

Stephen

#8Pantelis Theodosiou
ypercube@gmail.com
In reply to: Tom Lane (#3)
Re: missing optimization - column <> column

On Mon, Dec 5, 2016 at 3:24 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Pavel Stehule <pavel.stehule@gmail.com> writes:

I found some crazy queries in one customer application. These queries are
stupid, but it was surprise for me so there are not some simple

optimization

create table foo(a int);
insert into foo select generate_series(1,100000);
analyze foo;
explain select * from foo where a <> a;

It does full scan of foo, although it should be replaced by false in
planner time.

Same issue is a expression a = a .. can be replaced by true

Wrong; those expressions yield NULL for NULL input. You could perhaps
optimize them slightly into some form of is-null test, but it hardly
seems worth the planner cycles to check for.

If you write something like "1 <> 1", it will be folded.

regards, tom lane

Would it be worth replacing the condition with the equivalent?
I mean would that help optimizing better some queries when it knows that a
is (not) nullable or when "a" is more complicated expression?

a <> a : (a IS NULL) AND NULL
a = a : (a IS NOT NULL) OR NULL

Pantelis Theodosiou

#9Corey Huinker
corey.huinker@gmail.com
In reply to: Pantelis Theodosiou (#8)
Re: missing optimization - column <> column

Would it be worth replacing the condition with the equivalent?
I mean would that help optimizing better some queries when it knows that a
is (not) nullable or when "a" is more complicated expression?

a <> a : (a IS NULL) AND NULL
a = a : (a IS NOT NULL) OR NULL

I think you're looking for

a IS DISTINCT FROM a

And that will work for cases where a might be null.

I have no opinion about whether adding such a test to the planner is worth
it.

#10Serge Rielau
serge@rielau.com
In reply to: Pavel Stehule (#4)
Re: missing optimization - column <> column

Actually there are lots of things that can be done with this sort of theorem proving.
And NULL is a plenty good answer for a filter, just not for a check constraint.
Amongst them INSERT through UNION ALL for symmetric views which can be handy for FDW partitioned tables.

One such implementation an be found here:
https://www.google.com/patents/US6728952 (apparently expired)

Cheers
Serge

Salesforce.com

Show quoted text

On Dec 5, 2016, at 7:28 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:

2016-12-05 16:24 GMT+01:00 Tom Lane <tgl@sss.pgh.pa.us <mailto:tgl@sss.pgh.pa.us>>:
Pavel Stehule <pavel.stehule@gmail.com <mailto:pavel.stehule@gmail.com>> writes:

I found some crazy queries in one customer application. These queries are
stupid, but it was surprise for me so there are not some simple optimization

create table foo(a int);
insert into foo select generate_series(1,100000);
analyze foo;
explain select * from foo where a <> a;

It does full scan of foo, although it should be replaced by false in
planner time.

Same issue is a expression a = a .. can be replaced by true

Wrong; those expressions yield NULL for NULL input. You could perhaps
optimize them slightly into some form of is-null test, but it hardly
seems worth the planner cycles to check for.

understand

If you write something like "1 <> 1", it will be folded.

it works, but a <> a not

Regards

Pavel

regards, tom lane

#11Pantelis Theodosiou
ypercube@gmail.com
In reply to: Corey Huinker (#9)
Re: missing optimization - column <> column

On Mon, Dec 5, 2016 at 7:02 PM, Corey Huinker <corey.huinker@gmail.com>
wrote:

Would it be worth replacing the condition with the equivalent?
I mean would that help optimizing better some queries when it knows that
a is (not) nullable or when "a" is more complicated expression?

a <> a : (a IS NULL) AND NULL
a = a : (a IS NOT NULL) OR NULL

I think you're looking for

a IS DISTINCT FROM a

And that will work for cases where a might be null.

I have no opinion about whether adding such a test to the planner is worth
it.

No, (a IS DISTINCT FROM a) will evaluate to FALSE when a is NULL. The
other conditions (a <> a) , ((a IS NULL) AND NULL) will evaluate to NULL.