Optimizer & boolean syntax

Started by Daniele Orlandiover 23 years ago16 messageshackers

Jump to latest

Daniele Orlandi

daniele@orlandi.com

over 23 years ago

Are those two syntaxes eqivalent ?

select * from users where monitored;
select * from users where monitored=true;

If the answer is yes, the optimimer probably doesn't agree with you :)

Tested on RC1:

template1=# create table a (a boolean, b text);
CREATE TABLE

.... inserted ~18000 rows with just one true (just to make an index scan
meaningful)....

template1=# vacuum analyze a;
VACUUM
template1=# explain select * from a where a;
QUERY PLAN
----------------------------------------------------
Seq Scan on a (cost=0.00..802.64 rows=1 width=11)
Filter: a
(2 rows)

template1=# explain select * from a where a=true;
QUERY PLAN
--------------------------------------------------------------
Index Scan using a_a on a (cost=0.00..2.01 rows=1 width=11)
Index Cond: (a = true)
(2 rows)

Bye!

--
Daniele Orlandi
Planet Srl

Robert Treat

xzilla@users.sourceforge.net

over 23 years ago

In reply to: Daniele Orlandi (#1)

Re: Optimizer & boolean syntax

Using the famous WAG tech, in your first query the optimizer has to
evaluate monitored for each record to determine its value.

Robert Treat

Show quoted text

On Thu, 2002-11-21 at 13:39, Daniele Orlandi wrote:

Are those two syntaxes eqivalent ?

select * from users where monitored;
select * from users where monitored=true;

If the answer is yes, the optimimer probably doesn't agree with you :)

Tested on RC1:

template1=# create table a (a boolean, b text);
CREATE TABLE

.... inserted ~18000 rows with just one true (just to make an index scan
meaningful)....

template1=# vacuum analyze a;
VACUUM
template1=# explain select * from a where a;
QUERY PLAN
----------------------------------------------------
Seq Scan on a (cost=0.00..802.64 rows=1 width=11)
Filter: a
(2 rows)

template1=# explain select * from a where a=true;
QUERY PLAN
--------------------------------------------------------------
Index Scan using a_a on a (cost=0.00..2.01 rows=1 width=11)
Index Cond: (a = true)
(2 rows)

Bye!

--
Daniele Orlandi
Planet Srl

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html

Stephan Szabo

sszabo@megazone23.bigpanda.com

over 23 years ago

In reply to: Daniele Orlandi (#1)

Re: Optimizer & boolean syntax

On Thu, 21 Nov 2002, Daniele Orlandi wrote:

Are those two syntaxes eqivalent ?

select * from users where monitored;
select * from users where monitored=true;

If the answer is yes, the optimimer probably doesn't agree with you :)

That depends on the definition of equivalent. They presumably give the
same answer (I'm assuming monitored is a boolean), but the latter has
something that's considered an indexable condition and I believe the
former does not (even with enable_seqscan=off the former syntax
appears to give a sequence scan, usually a good sign it's not considered
indexable).

Christopher Kings-Lynne

chriskl@familyhealth.com.au

over 23 years ago

In reply to: Stephan Szabo (#3)

Re: Optimizer & boolean syntax

Are those two syntaxes eqivalent ?

select * from users where monitored;
select * from users where monitored=true;

If the answer is yes, the optimimer probably doesn't agree with you :)

That depends on the definition of equivalent. They presumably give the
same answer (I'm assuming monitored is a boolean), but the latter has
something that's considered an indexable condition and I believe the
former does not (even with enable_seqscan=off the former syntax
appears to give a sequence scan, usually a good sign it's not considered
indexable).

I think his point is that they _should_ be equivalent. Surely there's
something in the optimiser that discards '=true' stuff, like 'a=a' should be
discarded?

Chris

Stephan Szabo

sszabo@megazone23.bigpanda.com

over 23 years ago

In reply to: Christopher Kings-Lynne (#4)

Re: Optimizer & boolean syntax

On Thu, 21 Nov 2002, Christopher Kings-Lynne wrote:

Are those two syntaxes eqivalent ?

select * from users where monitored;
select * from users where monitored=true;

If the answer is yes, the optimimer probably doesn't agree with you :)

That depends on the definition of equivalent. They presumably give the
same answer (I'm assuming monitored is a boolean), but the latter has
something that's considered an indexable condition and I believe the
former does not (even with enable_seqscan=off the former syntax
appears to give a sequence scan, usually a good sign it's not considered
indexable).

I think his point is that they _should_ be equivalent. Surely there's
something in the optimiser that discards '=true' stuff, like 'a=a' should be
discarded?

I figure that's what he meant, but it isn't what was said. ;)

"col" isn't of the general form "indexkey op constant" or "constant op
indexkey" which I presume it's looking for given the comments in
indxpath.c. I'm not sure what the best way to make it work would be given
that presumably we'd want to make col IS TRUE/FALSE use an index at the
same time (since that appears to not do so as well).

Christopher Kings-Lynne

chriskl@familyhealth.com.au

over 23 years ago

In reply to: Stephan Szabo (#5)

Re: Optimizer & boolean syntax

I think his point is that they _should_ be equivalent. Surely there's
something in the optimiser that discards '=true' stuff, like 'a=a'

should be

discarded?

I figure that's what he meant, but it isn't what was said. ;)

"col" isn't of the general form "indexkey op constant" or "constant op
indexkey" which I presume it's looking for given the comments in
indxpath.c. I'm not sure what the best way to make it work would be given
that presumably we'd want to make col IS TRUE/FALSE use an index at the
same time (since that appears to not do so as well).

Not that I see the point of indexing booleans, but hey :)

Chris

Alvaro Herrera

alvherre@dcc.uchile.cl

over 23 years ago

In reply to: Christopher Kings-Lynne (#6)

Re: Optimizer & boolean syntax

On Thu, Nov 21, 2002 at 02:45:34PM -0800, Christopher Kings-Lynne wrote:

I think his point is that they _should_ be equivalent. Surely there's
something in the optimiser that discards '=true' stuff, like 'a=a'

should be

discarded?

Not that I see the point of indexing booleans, but hey :)

If one of the values is much more infrequent than the other, you can
probably get a substantial win using a partial index, can't you?

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"Everybody understands Mickey Mouse. Few understand Hermann Hesse.
Hardly anybody understands Einstein. And nobody understands Emperor Norton."

scott.marlowe

scott.marlowe@ihs.com

over 23 years ago

In reply to: Christopher Kings-Lynne (#6)

Re: Optimizer & boolean syntax

On Thu, 21 Nov 2002, Christopher Kings-Lynne wrote:

I think his point is that they _should_ be equivalent. Surely there's
something in the optimiser that discards '=true' stuff, like 'a=a'

should be

discarded?

I figure that's what he meant, but it isn't what was said. ;)

"col" isn't of the general form "indexkey op constant" or "constant op
indexkey" which I presume it's looking for given the comments in
indxpath.c. I'm not sure what the best way to make it work would be given
that presumably we'd want to make col IS TRUE/FALSE use an index at the
same time (since that appears to not do so as well).

Not that I see the point of indexing booleans, but hey :)

While full indexes do seem futile, partial indexes can be quite useful.

select articles from forum where approved is false

if 99.9% of all articles are approved would be quite common.

Christopher Kings-Lynne

chriskl@familyhealth.com.au

over 23 years ago

In reply to: Stephan Szabo (#5)

Re: Optimizer & boolean syntax

Not that I see the point of indexing booleans, but hey :)

If one of the values is much more infrequent than the other, you can
probably get a substantial win using a partial index, can't you?

Yes, I thought of the partial index after I wrote that email :)

Chris

#10

scott.marlowe

scott.marlowe@ihs.com

over 23 years ago

In reply to: Christopher Kings-Lynne (#6)

Re: Optimizer & boolean syntax

On Thu, 21 Nov 2002, Christopher Kings-Lynne wrote:

I think his point is that they _should_ be equivalent. Surely there's
something in the optimiser that discards '=true' stuff, like 'a=a'

should be

discarded?

I figure that's what he meant, but it isn't what was said. ;)

"col" isn't of the general form "indexkey op constant" or "constant op
indexkey" which I presume it's looking for given the comments in
indxpath.c. I'm not sure what the best way to make it work would be given
that presumably we'd want to make col IS TRUE/FALSE use an index at the
same time (since that appears to not do so as well).

Not that I see the point of indexing booleans, but hey :)

also, in reference to my last message, even if the % was 50/50, if the
table was such that the bool was in a table next to a text field with 20k
or text in it, an index on the bool would be much faster to go through
than to seq scan the table.

#11

Christopher Kings-Lynne

chriskl@familyhealth.com.au

over 23 years ago

In reply to: scott.marlowe (#10)

Re: Optimizer & boolean syntax

"col" isn't of the general form "indexkey op constant" or "constant op
indexkey" which I presume it's looking for given the comments in
indxpath.c. I'm not sure what the best way to make it work would be

given

that presumably we'd want to make col IS TRUE/FALSE use an index at

the

same time (since that appears to not do so as well).

Not that I see the point of indexing booleans, but hey :)

also, in reference to my last message, even if the % was 50/50, if the
table was such that the bool was in a table next to a text field with 20k
or text in it, an index on the bool would be much faster to go through
than to seq scan the table.

Hmmm...I'm not sure about that. Postgres's storage strategry with text will
be to keep it in a side table (or you can use ALTER TABLE/SET STORAGE) and
it will only be retrieved if it's in the select parameters.

Chris

#12

Stephan Szabo

sszabo@megazone23.bigpanda.com

over 23 years ago

In reply to: Christopher Kings-Lynne (#11)

Re: Optimizer & boolean syntax

On Thu, 21 Nov 2002, Christopher Kings-Lynne wrote:

"col" isn't of the general form "indexkey op constant" or "constant op
indexkey" which I presume it's looking for given the comments in
indxpath.c. I'm not sure what the best way to make it work would be

given

that presumably we'd want to make col IS TRUE/FALSE use an index at

the

same time (since that appears to not do so as well).

Not that I see the point of indexing booleans, but hey :)

also, in reference to my last message, even if the % was 50/50, if the
table was such that the bool was in a table next to a text field with 20k
or text in it, an index on the bool would be much faster to go through
than to seq scan the table.

Hmmm...I'm not sure about that. Postgres's storage strategry with text will
be to keep it in a side table (or you can use ALTER TABLE/SET STORAGE) and
it will only be retrieved if it's in the select parameters.

True, but replace that text with 1500 integers. :)

The only problem with the partial index solution is that it seems to still
only work for the same method of asking for the result, so if you make an
index where col=true, using col IS TRUE or col in a query doesn't seem to
use it.

#13

Daniele Orlandi

daniele@orlandi.com

over 23 years ago

In reply to: Stephan Szabo (#3)

Re: Optimizer & boolean syntax

Stephan Szabo wrote:

On Thu, 21 Nov 2002, Daniele Orlandi wrote:

Are those two syntaxes eqivalent ?

select * from users where monitored;
select * from users where monitored=true;

If the answer is yes, the optimimer probably doesn't agree with you :)

That depends on the definition of equivalent.

By equivalent I mean "means the same thing so, behaves in the same way".

I consider the former syntax to be cleaner and I would tend to use it
most of times.

For what concerns partial indexes, I agree, it's a better approach for
this kind of indexing and I did some test:

-------------------------
ctonet=# create index users_monitored on users (monitored) where monitored;
CREATE
ctonet=# explain select * from users where monitored;
NOTICE: QUERY PLAN:

Index Scan using users_monitored on users (cost=0.00..9.44 rows=6
width=186)

EXPLAIN

Nice, it appears to use the index, but:

ctonet=# explain select * from users where monitored=true;
NOTICE: QUERY PLAN:

Seq Scan on users (cost=0.00..8298.84 rows=59 width=186)

EXPLAIN
-------------------------

The problem is the opposite... so, effectively, seems that the optimizer
considers "monitored" and "monitored=true" as two different expressions...

The viceversa is analog and we also can see that the syntax "monitored
is true" is considered different from the other two syntaxes:

-----------------------
ctonet=# drop index users_monitored;
DROP
ctonet=# create index users_monitored on users (monitored) where
monitored=true;
CREATE
ctonet=# explain select * from users where monitored=true;
NOTICE: QUERY PLAN:

Index Scan using users_monitored on users (cost=0.00..9.45 rows=6
width=186)

EXPLAIN
ctonet=# explain select * from users where monitored;
NOTICE: QUERY PLAN:

Seq Scan on users (cost=0.00..8077.07 rows=59 width=186)

EXPLAIN

ctonet=# create index users_monitored on users (monitored) where
monitored=true;
CREATE
ctonet=# explain select * from users where monitored is true;
NOTICE: QUERY PLAN:

Seq Scan on users (cost=0.00..8077.07 rows=59 width=186)

EXPLAIN
-------------------------

What I propose is that all those syntaxes are made equivalent (by, for
example, rewriting boolean comparisons to a common form) in order to
have a more consistent index usage.

Bye!

--
Daniele Orlandi
Planet Srl

#14

Tom Lane

tgl@sss.pgh.pa.us

over 23 years ago

In reply to: Daniele Orlandi (#13)

Re: Optimizer & boolean syntax

Daniele Orlandi <daniele@orlandi.com> writes:

The problem is the opposite... so, effectively, seems that the optimizer
considers "monitored" and "monitored=true" as two different expressions...

Check.

The viceversa is analog and we also can see that the syntax "monitored
is true" is considered different from the other two syntaxes:

As it should be.

What I propose is that all those syntaxes are made equivalent

Only two of them are logically equivalent. Consider NULL.

Even for the first two, assuming equivalence requires hard-wiring an
assumption about the behavior of the "bool = bool" operator; which is
a user-redefinable operator. I'm not totally comfortable with the idea.

regards, tom lane

#15

scott.marlowe

scott.marlowe@ihs.com

over 23 years ago

In reply to: Stephan Szabo (#12)

Re: Optimizer & boolean syntax

On Thu, 21 Nov 2002, Stephan Szabo wrote:

On Thu, 21 Nov 2002, Christopher Kings-Lynne wrote:

"col" isn't of the general form "indexkey op constant" or "constant op
indexkey" which I presume it's looking for given the comments in
indxpath.c. I'm not sure what the best way to make it work would be

given

that presumably we'd want to make col IS TRUE/FALSE use an index at

the

same time (since that appears to not do so as well).

Not that I see the point of indexing booleans, but hey :)

also, in reference to my last message, even if the % was 50/50, if the
table was such that the bool was in a table next to a text field with 20k
or text in it, an index on the bool would be much faster to go through
than to seq scan the table.

Hmmm...I'm not sure about that. Postgres's storage strategry with text will
be to keep it in a side table (or you can use ALTER TABLE/SET STORAGE) and
it will only be retrieved if it's in the select parameters.

True, but replace that text with 1500 integers. :)

The only problem with the partial index solution is that it seems to still
only work for the same method of asking for the result, so if you make an
index where col=true, using col IS TRUE or col in a query doesn't seem to
use it.

True. I always use the syntax:

select * from table where field IS TRUE

OR IS FALSE for consistency.

#16

Daniele Orlandi

daniele@orlandi.com

over 23 years ago

In reply to: Stephan Szabo (#3)

Re: Optimizer & boolean syntax

Tom Lane wrote:

Only two of them are logically equivalent. Consider NULL.

Ohhh IS NOT TRUE or IS NOT FALSE also match NULL, I never knew this :)

Even for the first two, assuming equivalence requires hard-wiring an
assumption about the behavior of the "bool = bool" operator; which is
a user-redefinable operator.
I'm not totally comfortable with the idea.

Ok, I see your point and the problems that may arise, but I hope wou
will agree with me that from the point of view of the user, both clauses
have the same meaning and the index usage should be consistant with it.

Unfortunatelly I don't know very well PostgreSQL internals, so I may be
saying a load of bullshits, but wouldn't be possible to consider any
evaluation of a bool expression in the form of bool=bool with true as
the second 'bool'[1]Eventually including the "var IS TRUE" and "var IS FALSE" (not var IS NOT ...) which already are special syntax cases if I am not wrong. ? At least as a TODO item ?

Thanks!
Bye!

[1]: Eventually including the "var IS TRUE" and "var IS FALSE" (not var IS NOT ...) which already are special syntax cases if I am not wrong.
IS NOT ...) which already are special syntax cases if I am not wrong.

--
Daniele Orlandi
Planet Srl