Huge speed penalty using <>TRUE instead of =FALSE

Started by Jan-Ivar Mellingenalmost 17 years ago11 messagesbugs

jan-ivar.mellingen@alreg.no

almost 17 years ago

One of our customers discovered that by replacing <>TRUE with =FALSE in
a query of a table containing 750.000 records reduced the query time
from about 12 seconds to about 60 milliseconds!

The problematic query looks like this:
SELECT * FROM AlarmLogg WHERE Logg_avsluttet <> TRUE AND
Alarm_status='X' ORDER BY ID DESC

If it is changed to this it works as expected:
SELECT * FROM AlarmLogg WHERE Logg_avsluttet = FALSE AND
Alarm_status='X' ORDER BY ID DESC

After investigation (on a smaller dataset on my own database) I found
that the query was resulting in a sequential scan:

"explain analyze SELECT * FROM AlarmLogg WHERE Logg_avsluttet <> TRUE
AND Alarm_status='X' ORDER BY ID DESC"
"Sort (cost=49936.96..49936.96 rows=1 width=405) (actual
time=837.793..837.793 rows=0 loops=1)"
" Sort Key: id"
" Sort Method: quicksort Memory: 17kB"
" -> Seq Scan on alarmlogg (cost=0.00..49936.95 rows=1 width=405)
(actual time=837.782..837.782 rows=0 loops=1)"
" Filter: ((logg_avsluttet <> true) AND ((alarm_status)::text =
'X'::text))"
"Total runtime: 837.896 ms"

The modified query gave this result:
"explain analyze SELECT * FROM AlarmLogg WHERE Logg_avsluttet = FALSE
AND Alarm_status='X' ORDER BY ID DESC"
"Sort (cost=8.36..8.37 rows=1 width=405) (actual time=0.032..0.032
rows=0 loops=1)"
" Sort Key: id"
" Sort Method: quicksort Memory: 17kB"
" -> Index Scan using i_alarmlogg_logg_avsluttet on alarmlogg
(cost=0.00..8.35 rows=1 width=405) (actual time=0.024..0.024 rows=0
loops=1)"
" Index Cond: (logg_avsluttet = false)"
" Filter: ((NOT logg_avsluttet) AND ((alarm_status)::text =
'X'::text))"
"Total runtime: 0.123 ms"

This is a dramatical difference, but I cannot understand why. In my head
"<>TRUE" should behave exactly the same as "=FALSE". This looks like a
bug to me, or am I overlooking something?

This was verified on PostgreSQL 8.3.7, both on Windows Xp and Ubuntu 8.10.

Some relevant details from the table definition:
CREATE TABLE alarmlogg
(
id serial NOT NULL,
alarm_status character varying(1) DEFAULT ''::character varying,
logg_avsluttet boolean DEFAULT false,
...
CONSTRAINT alarmlogg_pkey PRIMARY KEY (id)
)

CREATE INDEX i_alarmlogg_alarm_status
ON alarmlogg
USING btree
(alarm_status);

CREATE INDEX i_alarmlogg_logg_avsluttet
ON alarmlogg
USING btree
(logg_avsluttet);

Regards,
Jan-Ivar Mellingen
Securinet AS

Mikael Krantz

mk@zigamorph.se

almost 17 years ago

In reply to: Jan-Ivar Mellingen (#1)

Re: Huge speed penalty using <>TRUE instead of =FALSE

It might be that your column may be NULL as well as TRUE or FALSE. I
am no expert in this matter though.

On Fri, Jul 17, 2009 at 10:12 AM, Jan-Ivar
Mellingen<jan-ivar.mellingen@alreg.no> wrote:

Show quoted text

One of our customers discovered that by replacing <>TRUE with =FALSE in
a query of a table containing 750.000 records reduced the query time
from about 12 seconds to about 60 milliseconds!

The problematic query looks like this:
SELECT * FROM AlarmLogg WHERE Logg_avsluttet <> TRUE AND
Alarm_status='X' ORDER BY ID DESC

If it is changed to this it works as expected:
SELECT * FROM AlarmLogg WHERE Logg_avsluttet = FALSE AND
Alarm_status='X' ORDER BY ID DESC

After investigation (on a smaller dataset on my own database) I found
that the query was resulting in a sequential scan:

"explain analyze SELECT * FROM AlarmLogg WHERE Logg_avsluttet <> TRUE
AND Alarm_status='X' ORDER BY ID DESC"
"Sort (cost=49936.96..49936.96 rows=1 width=405) (actual
time=837.793..837.793 rows=0 loops=1)"
" Sort Key: id"
" Sort Method: quicksort Memory: 17kB"
" -> Seq Scan on alarmlogg (cost=0.00..49936.95 rows=1 width=405)
(actual time=837.782..837.782 rows=0 loops=1)"
" Filter: ((logg_avsluttet <> true) AND ((alarm_status)::text =
'X'::text))"
"Total runtime: 837.896 ms"

The modified query gave this result:
"explain analyze SELECT * FROM AlarmLogg WHERE Logg_avsluttet = FALSE
AND Alarm_status='X' ORDER BY ID DESC"
"Sort (cost=8.36..8.37 rows=1 width=405) (actual time=0.032..0.032
rows=0 loops=1)"
" Sort Key: id"
" Sort Method: quicksort Memory: 17kB"
" -> Index Scan using i_alarmlogg_logg_avsluttet on alarmlogg
(cost=0.00..8.35 rows=1 width=405) (actual time=0.024..0.024 rows=0
loops=1)"
" Index Cond: (logg_avsluttet = false)"
" Filter: ((NOT logg_avsluttet) AND ((alarm_status)::text =
'X'::text))"
"Total runtime: 0.123 ms"

This is a dramatical difference, but I cannot understand why. In my head
"<>TRUE" should behave exactly the same as "=FALSE". This looks like a
bug to me, or am I overlooking something?

This was verified on PostgreSQL 8.3.7, both on Windows Xp and Ubuntu 8.10.

Some relevant details from the table definition:
CREATE TABLE alarmlogg
(
id serial NOT NULL,
alarm_status character varying(1) DEFAULT ''::character varying,
logg_avsluttet boolean DEFAULT false,
...
CONSTRAINT alarmlogg_pkey PRIMARY KEY (id)
)

CREATE INDEX i_alarmlogg_alarm_status
ON alarmlogg
USING btree
(alarm_status);

CREATE INDEX i_alarmlogg_logg_avsluttet
ON alarmlogg
USING btree
(logg_avsluttet);

Regards,
Jan-Ivar Mellingen
Securinet AS

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Peter Eisentraut

peter_e@gmx.net

almost 17 years ago

In reply to: Jan-Ivar Mellingen (#1)

Re: Huge speed penalty using <>TRUE instead of =FALSE

On Friday 17 July 2009 11:12:41 Jan-Ivar Mellingen wrote:

One of our customers discovered that by replacing <>TRUE with =FALSE in
a query of a table containing 750.000 records reduced the query time
from about 12 seconds to about 60 milliseconds!

This is a dramatical difference, but I cannot understand why. In my head
"<>TRUE" should behave exactly the same as "=FALSE". This looks like a
bug to me, or am I overlooking something?

The planner just isn't that smart. The boolean type is a special case where
<> some_value implies = some_other_value, but this doesn't generalize well to
other data types. And the planner doesn't have a whole lot of data type
specific knowledge.

I think a better index definition might actually be on alarm_status, with a
partial index predicate on logg_avsluttet = false.

Peter Eisentraut

peter_e@gmx.net

almost 17 years ago

In reply to: Mikael Krantz (#2)

Re: Huge speed penalty using <>TRUE instead of =FALSE

On Friday 17 July 2009 12:45:47 Mikael Krantz wrote:

It might be that your column may be NULL as well as TRUE or FALSE. I
am no expert in this matter though.

Nulls also need to be considered when attempting to substitute purportedly
equivalent clauses. But in this case it wouldn't actually matter, because

WHERE foo <> TRUE

and

WHERE foo = false

would both omit the row if foo is null. Both expressions only return true if
foo has the value "false". But again, this is data type specific knowledge.

Tom Lane

tgl@sss.pgh.pa.us

almost 17 years ago

In reply to: Jan-Ivar Mellingen (#1)

Re: Huge speed penalty using <>TRUE instead of =FALSE

Jan-Ivar Mellingen <jan-ivar.mellingen@alreg.no> writes:

One of our customers discovered that by replacing <>TRUE with =FALSE in
a query of a table containing 750.000 records reduced the query time
from about 12 seconds to about 60 milliseconds!

This is not a bug. The set of operators that are indexable is well
documented, and <> is not one of them.

regards, tom lane

Tom Lane

tgl@sss.pgh.pa.us

almost 17 years ago

In reply to: Peter Eisentraut (#4)

Re: Huge speed penalty using <>TRUE instead of =FALSE

Peter Eisentraut <peter_e@gmx.net> writes:

... But again, this is data type specific knowledge.

Actually, now that I think about it, the planner already has
datatype-specific knowledge about boolean equality (see
simplify_boolean_equality). It would take just a few more lines of code
there to recognize "x <> true" and "x <> false" as additional variant
spellings of the generic "x" or "NOT x" constructs. Not sure if it's
worth the trouble though; how many people really write such things?

If you really wanted to take it to extremes, you could also reduce
cases like "x > false", but that's starting to get a bit silly.

regards, tom lane

Bruce Momjian

bruce@momjian.us

almost 17 years ago

In reply to: Tom Lane (#6)

Re: Huge speed penalty using <>TRUE instead of =FALSE

I don't think it even has to be so specific. We should just always
rewrite bool <> bool into bool = NOT bool.

Hmm. That only has a 50/50 chance of creating an indexable clause.
Perhaps we could even rewrite it as "a = NOT b AND NOT a = b".

--
Greg

On 2009-07-17, at 3:21 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Show quoted text

Peter Eisentraut <peter_e@gmx.net> writes:

... But again, this is data type specific knowledge.

Actually, now that I think about it, the planner already has
datatype-specific knowledge about boolean equality (see
simplify_boolean_equality). It would take just a few more lines of
code
there to recognize "x <> true" and "x <> false" as additional variant
spellings of the generic "x" or "NOT x" constructs. Not sure if it's
worth the trouble though; how many people really write such things?

If you really wanted to take it to extremes, you could also reduce
cases like "x > false", but that's starting to get a bit silly.

regards, tom lane

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Jan-Ivar Mellingen

jan-ivar.mellingen@alreg.no

almost 17 years ago

In reply to: Jan-Ivar Mellingen (#1)

Re: Huge speed penalty using <>TRUE instead of =FALSE

Jan-Ivar Mellingen skrev:

One of our customers discovered that by replacing <>TRUE with =FALSE in
a query of a table containing 750.000 records reduced the query time
from about 12 seconds to about 60 milliseconds!

The problematic query looks like this:
SELECT * FROM AlarmLogg WHERE Logg_avsluttet <> TRUE AND
Alarm_status='X' ORDER BY ID DESC

If it is changed to this it works as expected:
SELECT * FROM AlarmLogg WHERE Logg_avsluttet = FALSE AND
Alarm_status='X' ORDER BY ID DESC

After investigation (on a smaller dataset on my own database) I found
that the query was resulting in a sequential scan:

"explain analyze SELECT * FROM AlarmLogg WHERE Logg_avsluttet <> TRUE
AND Alarm_status='X' ORDER BY ID DESC"
"Sort (cost=49936.96..49936.96 rows=1 width=405) (actual
time=837.793..837.793 rows=0 loops=1)"
" Sort Key: id"
" Sort Method: quicksort Memory: 17kB"
" -> Seq Scan on alarmlogg (cost=0.00..49936.95 rows=1 width=405)
(actual time=837.782..837.782 rows=0 loops=1)"
" Filter: ((logg_avsluttet <> true) AND ((alarm_status)::text =
'X'::text))"
"Total runtime: 837.896 ms"

The modified query gave this result:
"explain analyze SELECT * FROM AlarmLogg WHERE Logg_avsluttet = FALSE
AND Alarm_status='X' ORDER BY ID DESC"
"Sort (cost=8.36..8.37 rows=1 width=405) (actual time=0.032..0.032
rows=0 loops=1)"
" Sort Key: id"
" Sort Method: quicksort Memory: 17kB"
" -> Index Scan using i_alarmlogg_logg_avsluttet on alarmlogg
(cost=0.00..8.35 rows=1 width=405) (actual time=0.024..0.024 rows=0
loops=1)"
" Index Cond: (logg_avsluttet = false)"
" Filter: ((NOT logg_avsluttet) AND ((alarm_status)::text =
'X'::text))"
"Total runtime: 0.123 ms"

This is a dramatical difference, but I cannot understand why. In my head
"<>TRUE" should behave exactly the same as "=FALSE". This looks like a
bug to me, or am I overlooking something?

This was verified on PostgreSQL 8.3.7, both on Windows Xp and Ubuntu 8.10.

Some relevant details from the table definition:
CREATE TABLE alarmlogg
(
id serial NOT NULL,
alarm_status character varying(1) DEFAULT ''::character varying,
logg_avsluttet boolean DEFAULT false,
...
CONSTRAINT alarmlogg_pkey PRIMARY KEY (id)
)

CREATE INDEX i_alarmlogg_alarm_status
ON alarmlogg
USING btree
(alarm_status);

CREATE INDEX i_alarmlogg_logg_avsluttet
ON alarmlogg
USING btree
(logg_avsluttet);

Regards,
Jan-Ivar Mellingen
Securinet AS

Thanks to all who answered my question and helped me discover that I
forgot to consider the NULL values.
After a cup of coffee and a little bit of thinking it became clear that
<>TRUE is not the same as FALSE, and the NULLS are not in the index.
PostgreSQL is a great database, but it does not hurt to think a little
when using it...
Thank You all!

/Jan-Ivar

Robert Haas

robertmhaas@gmail.com

almost 17 years ago

In reply to: Tom Lane (#6)

Re: Huge speed penalty using <>TRUE instead of =FALSE

On Fri, Jul 17, 2009 at 10:21 AM, Tom Lane<tgl@sss.pgh.pa.us> wrote:

Peter Eisentraut <peter_e@gmx.net> writes:

... But again, this is data type specific knowledge.

Actually, now that I think about it, the planner already has
datatype-specific knowledge about boolean equality (see
simplify_boolean_equality). It would take just a few more lines of code
there to recognize "x <> true" and "x <> false" as additional variant
spellings of the generic "x" or "NOT x" constructs. Not sure if it's
worth the trouble though; how many people really write such things?

I don't know, but there's probably somebody. I probably did it myself
a few times, when I was just starting out. If it's easy, it seems
worth doing. The problem with these things is that no matter how lame
it seems to do whatever-it-is, the pain when someone does is really
large... so adding a little bit of code to avoid that seems
worthwhile, at least to me.

If you really wanted to take it to extremes, you could also reduce
cases like "x > false", but that's starting to get a bit silly.

Probably that one is beyond even my tolerance.

...Robert

#10

Tom Lane

tgl@sss.pgh.pa.us

almost 17 years ago

In reply to: Robert Haas (#9)

Re: Huge speed penalty using <>TRUE instead of =FALSE

Robert Haas <robertmhaas@gmail.com> writes:

On Fri, Jul 17, 2009 at 10:21 AM, Tom Lane<tgl@sss.pgh.pa.us> wrote:

Actually, now that I think about it, the planner already has
datatype-specific knowledge about boolean equality (see
simplify_boolean_equality). �It would take just a few more lines of code
there to recognize "x <> true" and "x <> false" as additional variant
spellings of the generic "x" or "NOT x" constructs. �Not sure if it's
worth the trouble though; how many people really write such things?

I don't know, but there's probably somebody. I probably did it myself
a few times, when I was just starting out. If it's easy, it seems
worth doing.

http://archives.postgresql.org/pgsql-committers/2009-07/msg00164.php

regards, tom lane

#11

Robert Haas

robertmhaas@gmail.com

almost 17 years ago

In reply to: Tom Lane (#10)

Re: Huge speed penalty using <>TRUE instead of =FALSE

On Mon, Aug 10, 2009 at 11:10 AM, Tom Lane<tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Fri, Jul 17, 2009 at 10:21 AM, Tom Lane<tgl@sss.pgh.pa.us> wrote:

Actually, now that I think about it, the planner already has
datatype-specific knowledge about boolean equality (see
simplify_boolean_equality). It would take just a few more lines of code
there to recognize "x <> true" and "x <> false" as additional variant
spellings of the generic "x" or "NOT x" constructs. Not sure if it's
worth the trouble though; how many people really write such things?

I don't know, but there's probably somebody. I probably did it myself
a few times, when I was just starting out. If it's easy, it seems
worth doing.

http://archives.postgresql.org/pgsql-committers/2009-07/msg00164.php

regards, tom lane

Oh, cool. Sorry, I missed the fact that that email was almost a month old.

...Robert