Huge speed penalty using <>TRUE instead of =FALSE

Started by Jan-Ivar Mellingenover 16 years ago11 messagesbugs
Jump to latest
#1Jan-Ivar Mellingen
jan-ivar.mellingen@alreg.no

One of our customers discovered that by replacing <>TRUE with =FALSE in
a query of a table containing 750.000 records reduced the query time
from about 12 seconds to about 60 milliseconds!

The problematic query looks like this:
SELECT * FROM AlarmLogg WHERE Logg_avsluttet <> TRUE AND
Alarm_status='X' ORDER BY ID DESC

If it is changed to this it works as expected:
SELECT * FROM AlarmLogg WHERE Logg_avsluttet = FALSE AND
Alarm_status='X' ORDER BY ID DESC

After investigation (on a smaller dataset on my own database) I found
that the query was resulting in a sequential scan:

"explain analyze SELECT * FROM AlarmLogg WHERE Logg_avsluttet <> TRUE
AND Alarm_status='X' ORDER BY ID DESC"
"Sort (cost=49936.96..49936.96 rows=1 width=405) (actual
time=837.793..837.793 rows=0 loops=1)"
" Sort Key: id"
" Sort Method: quicksort Memory: 17kB"
" -> Seq Scan on alarmlogg (cost=0.00..49936.95 rows=1 width=405)
(actual time=837.782..837.782 rows=0 loops=1)"
" Filter: ((logg_avsluttet <> true) AND ((alarm_status)::text =
'X'::text))"
"Total runtime: 837.896 ms"

The modified query gave this result:
"explain analyze SELECT * FROM AlarmLogg WHERE Logg_avsluttet = FALSE
AND Alarm_status='X' ORDER BY ID DESC"
"Sort (cost=8.36..8.37 rows=1 width=405) (actual time=0.032..0.032
rows=0 loops=1)"
" Sort Key: id"
" Sort Method: quicksort Memory: 17kB"
" -> Index Scan using i_alarmlogg_logg_avsluttet on alarmlogg
(cost=0.00..8.35 rows=1 width=405) (actual time=0.024..0.024 rows=0
loops=1)"
" Index Cond: (logg_avsluttet = false)"
" Filter: ((NOT logg_avsluttet) AND ((alarm_status)::text =
'X'::text))"
"Total runtime: 0.123 ms"

This is a dramatical difference, but I cannot understand why. In my head
"<>TRUE" should behave exactly the same as "=FALSE". This looks like a
bug to me, or am I overlooking something?

This was verified on PostgreSQL 8.3.7, both on Windows Xp and Ubuntu 8.10.

Some relevant details from the table definition:
CREATE TABLE alarmlogg
(
id serial NOT NULL,
alarm_status character varying(1) DEFAULT ''::character varying,
logg_avsluttet boolean DEFAULT false,
...
CONSTRAINT alarmlogg_pkey PRIMARY KEY (id)
)

CREATE INDEX i_alarmlogg_alarm_status
ON alarmlogg
USING btree
(alarm_status);

CREATE INDEX i_alarmlogg_logg_avsluttet
ON alarmlogg
USING btree
(logg_avsluttet);

Regards,
Jan-Ivar Mellingen
Securinet AS

#2Mikael Krantz
mk@zigamorph.se
In reply to: Jan-Ivar Mellingen (#1)
Re: Huge speed penalty using <>TRUE instead of =FALSE

It might be that your column may be NULL as well as TRUE or FALSE. I
am no expert in this matter though.

/M

On Fri, Jul 17, 2009 at 10:12 AM, Jan-Ivar
Mellingen<jan-ivar.mellingen@alreg.no> wrote:

Show quoted text

One of our customers discovered that by replacing <>TRUE with =FALSE in
a query of a table containing 750.000 records reduced the query time
from about 12 seconds to about 60 milliseconds!

The problematic query looks like this:
SELECT * FROM AlarmLogg WHERE Logg_avsluttet <> TRUE AND
Alarm_status='X' ORDER BY ID DESC

If it is changed to this it works as expected:
SELECT * FROM AlarmLogg WHERE Logg_avsluttet = FALSE AND
Alarm_status='X' ORDER BY ID DESC

After investigation (on a smaller dataset on my own database) I found
that the query was resulting in a sequential scan:

"explain analyze SELECT * FROM AlarmLogg WHERE Logg_avsluttet <> TRUE
AND Alarm_status='X' ORDER BY ID DESC"
"Sort  (cost=49936.96..49936.96 rows=1 width=405) (actual
time=837.793..837.793 rows=0 loops=1)"
"  Sort Key: id"
"  Sort Method:  quicksort  Memory: 17kB"
"  ->  Seq Scan on alarmlogg  (cost=0.00..49936.95 rows=1 width=405)
(actual time=837.782..837.782 rows=0 loops=1)"
"        Filter: ((logg_avsluttet <> true) AND ((alarm_status)::text =
'X'::text))"
"Total runtime: 837.896 ms"

The modified query gave this result:
"explain analyze SELECT * FROM AlarmLogg WHERE Logg_avsluttet = FALSE
AND Alarm_status='X' ORDER BY ID DESC"
"Sort  (cost=8.36..8.37 rows=1 width=405) (actual time=0.032..0.032
rows=0 loops=1)"
"  Sort Key: id"
"  Sort Method:  quicksort  Memory: 17kB"
"  ->  Index Scan using i_alarmlogg_logg_avsluttet on alarmlogg
(cost=0.00..8.35 rows=1 width=405) (actual time=0.024..0.024 rows=0
loops=1)"
"        Index Cond: (logg_avsluttet = false)"
"        Filter: ((NOT logg_avsluttet) AND ((alarm_status)::text =
'X'::text))"
"Total runtime: 0.123 ms"

This is a dramatical difference, but I cannot understand why. In my head
"<>TRUE" should behave exactly the same as "=FALSE". This looks like a
bug to me, or am I overlooking something?

This was verified on PostgreSQL 8.3.7, both on Windows Xp and Ubuntu 8.10.

Some relevant details from the table definition:
CREATE TABLE alarmlogg
(
  id serial NOT NULL,
  alarm_status character varying(1) DEFAULT ''::character varying,
  logg_avsluttet boolean DEFAULT false,
  ...
  CONSTRAINT alarmlogg_pkey PRIMARY KEY (id)
)

CREATE INDEX i_alarmlogg_alarm_status
 ON alarmlogg
 USING btree
 (alarm_status);

CREATE INDEX i_alarmlogg_logg_avsluttet
 ON alarmlogg
 USING btree
 (logg_avsluttet);

Regards,
Jan-Ivar Mellingen
Securinet AS

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#3Peter Eisentraut
peter_e@gmx.net
In reply to: Jan-Ivar Mellingen (#1)
Re: Huge speed penalty using <>TRUE instead of =FALSE

On Friday 17 July 2009 11:12:41 Jan-Ivar Mellingen wrote:

One of our customers discovered that by replacing <>TRUE with =FALSE in
a query of a table containing 750.000 records reduced the query time
from about 12 seconds to about 60 milliseconds!

This is a dramatical difference, but I cannot understand why. In my head
"<>TRUE" should behave exactly the same as "=FALSE". This looks like a
bug to me, or am I overlooking something?

The planner just isn't that smart. The boolean type is a special case where
<> some_value implies = some_other_value, but this doesn't generalize well to
other data types. And the planner doesn't have a whole lot of data type
specific knowledge.

I think a better index definition might actually be on alarm_status, with a
partial index predicate on logg_avsluttet = false.

#4Peter Eisentraut
peter_e@gmx.net
In reply to: Mikael Krantz (#2)
Re: Huge speed penalty using <>TRUE instead of =FALSE

On Friday 17 July 2009 12:45:47 Mikael Krantz wrote:

It might be that your column may be NULL as well as TRUE or FALSE. I
am no expert in this matter though.

Nulls also need to be considered when attempting to substitute purportedly
equivalent clauses. But in this case it wouldn't actually matter, because

WHERE foo <> TRUE

and

WHERE foo = false

would both omit the row if foo is null. Both expressions only return true if
foo has the value "false". But again, this is data type specific knowledge.

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jan-Ivar Mellingen (#1)
Re: Huge speed penalty using <>TRUE instead of =FALSE

Jan-Ivar Mellingen <jan-ivar.mellingen@alreg.no> writes:

One of our customers discovered that by replacing <>TRUE with =FALSE in
a query of a table containing 750.000 records reduced the query time
from about 12 seconds to about 60 milliseconds!

This is not a bug. The set of operators that are indexable is well
documented, and <> is not one of them.

regards, tom lane

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#4)
Re: Huge speed penalty using <>TRUE instead of =FALSE

Peter Eisentraut <peter_e@gmx.net> writes:

... But again, this is data type specific knowledge.

Actually, now that I think about it, the planner already has
datatype-specific knowledge about boolean equality (see
simplify_boolean_equality). It would take just a few more lines of code
there to recognize "x <> true" and "x <> false" as additional variant
spellings of the generic "x" or "NOT x" constructs. Not sure if it's
worth the trouble though; how many people really write such things?

If you really wanted to take it to extremes, you could also reduce
cases like "x > false", but that's starting to get a bit silly.

regards, tom lane

#7Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#6)
Re: Huge speed penalty using <>TRUE instead of =FALSE

I don't think it even has to be so specific. We should just always
rewrite bool <> bool into bool = NOT bool.

Hmm. That only has a 50/50 chance of creating an indexable clause.
Perhaps we could even rewrite it as "a = NOT b AND NOT a = b".

--
Greg

On 2009-07-17, at 3:21 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Show quoted text

Peter Eisentraut <peter_e@gmx.net> writes:

... But again, this is data type specific knowledge.

Actually, now that I think about it, the planner already has
datatype-specific knowledge about boolean equality (see
simplify_boolean_equality). It would take just a few more lines of
code
there to recognize "x <> true" and "x <> false" as additional variant
spellings of the generic "x" or "NOT x" constructs. Not sure if it's
worth the trouble though; how many people really write such things?

If you really wanted to take it to extremes, you could also reduce
cases like "x > false", but that's starting to get a bit silly.

regards, tom lane

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#8Jan-Ivar Mellingen
jan-ivar.mellingen@alreg.no
In reply to: Jan-Ivar Mellingen (#1)
Re: Huge speed penalty using <>TRUE instead of =FALSE

Jan-Ivar Mellingen skrev:

One of our customers discovered that by replacing <>TRUE with =FALSE in
a query of a table containing 750.000 records reduced the query time
from about 12 seconds to about 60 milliseconds!

The problematic query looks like this:
SELECT * FROM AlarmLogg WHERE Logg_avsluttet <> TRUE AND
Alarm_status='X' ORDER BY ID DESC

If it is changed to this it works as expected:
SELECT * FROM AlarmLogg WHERE Logg_avsluttet = FALSE AND
Alarm_status='X' ORDER BY ID DESC

After investigation (on a smaller dataset on my own database) I found
that the query was resulting in a sequential scan:

"explain analyze SELECT * FROM AlarmLogg WHERE Logg_avsluttet <> TRUE
AND Alarm_status='X' ORDER BY ID DESC"
"Sort (cost=49936.96..49936.96 rows=1 width=405) (actual
time=837.793..837.793 rows=0 loops=1)"
" Sort Key: id"
" Sort Method: quicksort Memory: 17kB"
" -> Seq Scan on alarmlogg (cost=0.00..49936.95 rows=1 width=405)
(actual time=837.782..837.782 rows=0 loops=1)"
" Filter: ((logg_avsluttet <> true) AND ((alarm_status)::text =
'X'::text))"
"Total runtime: 837.896 ms"

The modified query gave this result:
"explain analyze SELECT * FROM AlarmLogg WHERE Logg_avsluttet = FALSE
AND Alarm_status='X' ORDER BY ID DESC"
"Sort (cost=8.36..8.37 rows=1 width=405) (actual time=0.032..0.032
rows=0 loops=1)"
" Sort Key: id"
" Sort Method: quicksort Memory: 17kB"
" -> Index Scan using i_alarmlogg_logg_avsluttet on alarmlogg
(cost=0.00..8.35 rows=1 width=405) (actual time=0.024..0.024 rows=0
loops=1)"
" Index Cond: (logg_avsluttet = false)"
" Filter: ((NOT logg_avsluttet) AND ((alarm_status)::text =
'X'::text))"
"Total runtime: 0.123 ms"

This is a dramatical difference, but I cannot understand why. In my head
"<>TRUE" should behave exactly the same as "=FALSE". This looks like a
bug to me, or am I overlooking something?

This was verified on PostgreSQL 8.3.7, both on Windows Xp and Ubuntu 8.10.

Some relevant details from the table definition:
CREATE TABLE alarmlogg
(
id serial NOT NULL,
alarm_status character varying(1) DEFAULT ''::character varying,
logg_avsluttet boolean DEFAULT false,
...
CONSTRAINT alarmlogg_pkey PRIMARY KEY (id)
)

CREATE INDEX i_alarmlogg_alarm_status
ON alarmlogg
USING btree
(alarm_status);

CREATE INDEX i_alarmlogg_logg_avsluttet
ON alarmlogg
USING btree
(logg_avsluttet);

Regards,
Jan-Ivar Mellingen
Securinet AS

Thanks to all who answered my question and helped me discover that I
forgot to consider the NULL values.
After a cup of coffee and a little bit of thinking it became clear that
<>TRUE is not the same as FALSE, and the NULLS are not in the index.
PostgreSQL is a great database, but it does not hurt to think a little
when using it...
Thank You all!

/Jan-Ivar

#9Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#6)
Re: Huge speed penalty using <>TRUE instead of =FALSE

On Fri, Jul 17, 2009 at 10:21 AM, Tom Lane<tgl@sss.pgh.pa.us> wrote:

Peter Eisentraut <peter_e@gmx.net> writes:

... But again, this is data type specific knowledge.

Actually, now that I think about it, the planner already has
datatype-specific knowledge about boolean equality (see
simplify_boolean_equality).  It would take just a few more lines of code
there to recognize "x <> true" and "x <> false" as additional variant
spellings of the generic "x" or "NOT x" constructs.  Not sure if it's
worth the trouble though; how many people really write such things?

I don't know, but there's probably somebody. I probably did it myself
a few times, when I was just starting out. If it's easy, it seems
worth doing. The problem with these things is that no matter how lame
it seems to do whatever-it-is, the pain when someone does is really
large... so adding a little bit of code to avoid that seems
worthwhile, at least to me.

If you really wanted to take it to extremes, you could also reduce
cases like "x > false", but that's starting to get a bit silly.

Probably that one is beyond even my tolerance.

...Robert

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#9)
Re: Huge speed penalty using <>TRUE instead of =FALSE

Robert Haas <robertmhaas@gmail.com> writes:

On Fri, Jul 17, 2009 at 10:21 AM, Tom Lane<tgl@sss.pgh.pa.us> wrote:

Actually, now that I think about it, the planner already has
datatype-specific knowledge about boolean equality (see
simplify_boolean_equality). �It would take just a few more lines of code
there to recognize "x <> true" and "x <> false" as additional variant
spellings of the generic "x" or "NOT x" constructs. �Not sure if it's
worth the trouble though; how many people really write such things?

I don't know, but there's probably somebody. I probably did it myself
a few times, when I was just starting out. If it's easy, it seems
worth doing.

http://archives.postgresql.org/pgsql-committers/2009-07/msg00164.php

regards, tom lane

#11Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#10)
Re: Huge speed penalty using <>TRUE instead of =FALSE

On Mon, Aug 10, 2009 at 11:10 AM, Tom Lane<tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Fri, Jul 17, 2009 at 10:21 AM, Tom Lane<tgl@sss.pgh.pa.us> wrote:

Actually, now that I think about it, the planner already has
datatype-specific knowledge about boolean equality (see
simplify_boolean_equality).  It would take just a few more lines of code
there to recognize "x <> true" and "x <> false" as additional variant
spellings of the generic "x" or "NOT x" constructs.  Not sure if it's
worth the trouble though; how many people really write such things?

I don't know, but there's probably somebody.  I probably did it myself
a few times, when I was just starting out.  If it's easy, it seems
worth doing.

http://archives.postgresql.org/pgsql-committers/2009-07/msg00164.php

                       regards, tom lane

Oh, cool. Sorry, I missed the fact that that email was almost a month old.

...Robert