indexing with lower(...) -> queries are not optimised very well - Please Help

Started by Martin Hampl over 22 years ago · 8 messages · general
#1Martin Hampl
Martin.Hampl@gmx.de

Hi,

I am using PostgreSQL 7.4, but I did have the same problem with the
last version.

I indexed the column word (defined as varchar(64)) using lower(word).
If I use the following query, everything is fine, the index is used and
the query is executed very quickly:

select * from token where lower(word) = 'saxophone';

However, with EXPLAIN you get the following:

QUERY PLAN
----------------------------------------------------------------------------------------
Index Scan using word_lower_idx on token  (cost=0.00..98814.08 rows=25382 width=16)
  Index Cond: (lower((word)::text) = 'saxophone'::text)

I indexed the same column without the use of lower(...). Now

explain select * from token where word = 'saxophone';

results in:
QUERY PLAN
-----------------------------------------------------------------------------
Index Scan using word_idx on token  (cost=0.00..6579.99 rows=1676 width=16)
  Index Cond: ((word)::text = 'saxophone'::text)

Please note the difference in the estimated cost! Why is there such a
huge difference? Both queries need almost exactly the same time to
execute (all instances of 'saxophone' in the table happen to be
lower-case, which is a coincidence).

The problem is that if I use this query as part of a more complicated
query, the optimiser chooses a *very* bad query plan.

Please help me. What am I doing wrong? I would appreciate any help on
this very much.

Regards,
Martin.
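For reference, the scenario described above can be reproduced with
something like the following sketch. The table definition is assumed:
only the table name, the column word, and the two index names actually
appear in the thread.

```sql
-- Hypothetical setup matching the plans quoted above.
CREATE TABLE token (
    id   serial PRIMARY KEY,
    word varchar(64)
);

-- Plain index: the planner has per-column statistics for "word".
CREATE INDEX word_idx ON token (word);

-- Functional index: on 7.4 the planner keeps no statistics
-- for the expression lower(word).
CREATE INDEX word_lower_idx ON token (lower(word));

-- Uses word_lower_idx and runs quickly, but the row estimate
-- is a default guess rather than a measured value:
EXPLAIN SELECT * FROM token WHERE lower(word) = 'saxophone';
```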

#2CoL
col@mportal.hu
In reply to: Martin Hampl (#1)
Re: indexing with lower(...) -> queries are not optimised very well

hi,

Martin Hampl wrote, On 11/18/2003 7:24 PM:

Hi,

I am using PostgreSQL 7.4 [...] Please note the difference in the
estimated cost! Why is there such a huge difference?

And after analyze token; ?

C.

#3Martin Hampl
Martin.Hampl@gmx.de
In reply to: CoL (#2)
Re: indexing with lower(...) -> queries are not optimised very well

Hi,

hi,

Martin Hampl wrote, On 11/18/2003 7:24 PM:

I am using PostgreSQL 7.4 [...] Please note the difference in the
estimated cost! Why is there such a huge difference?

And after analyze token; ?

No, that doesn't work (I tried it, of course). But this might be the
problem: how does one analyse properly so that an index on lower(...)
is used well?

Thanks for the answer,
Martin.


C.


#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Martin Hampl (#1)
Re: indexing with lower(...) -> queries are not optimised very well - Please Help

Martin Hampl <Martin.Hampl@gmx.de> writes:

Index Scan using word_lower_idx on token (cost=0.00..98814.08
rows=25382 width=16)
Index Cond: (lower((word)::text) = 'saxophone'::text)

The rows estimate (and therefore also the cost estimate) is a complete
guess in this situation, because the system keeps no statistics about
the values of lower(word). Improving this situation is on the TODO list.

regards, tom lane

#5Martin Hampl
Martin.Hampl@gmx.de
In reply to: Tom Lane (#4)
Re: indexing with lower(...) -> queries are not optimised very well - Please Help

On 21.11.2003 at 06:54, Tom Lane wrote:

Martin Hampl <Martin.Hampl@gmx.de> writes:

Index Scan using word_lower_idx on token (cost=0.00..98814.08
rows=25382 width=16)
Index Cond: (lower((word)::text) = 'saxophone'::text)

The rows estimate (and therefore also the cost estimate) is a complete
guess in this situation, because the system keeps no statistics about
the values of lower(word). Improving this situation is on the TODO
list.

Thanks a lot for your answer.

Any idea when this situation will be improved? Until then I have to
find a workaround... any suggestions?

Regards,
Martin.
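One common workaround on releases without expression statistics is to
store the lowercased value in a real column, which ANALYZE does track,
and index that column instead. A sketch (column, function, and trigger
names are invented for illustration; plpgsql must be installed in the
database):

```sql
-- Materialise lower(word) so the planner has real statistics.
ALTER TABLE token ADD COLUMN word_lower varchar(64);
UPDATE token SET word_lower = lower(word);
CREATE INDEX word_lower_col_idx ON token (word_lower);

-- Keep the column in sync on INSERT/UPDATE.
CREATE FUNCTION token_sync_lower() RETURNS trigger AS '
BEGIN
    NEW.word_lower := lower(NEW.word);
    RETURN NEW;
END;
' LANGUAGE plpgsql;

CREATE TRIGGER token_lower_trg BEFORE INSERT OR UPDATE ON token
    FOR EACH ROW EXECUTE PROCEDURE token_sync_lower();

ANALYZE token;  -- now gathers statistics for word_lower

-- Query the stored column instead of the expression:
SELECT * FROM token WHERE word_lower = 'saxophone';
```

The cost is extra storage and write overhead, but the row estimate for
the equality predicate comes from genuine column statistics.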

#6Martin Hampl
Martin.Hampl@gmx.de
In reply to: Tom Lane (#4)
Re: indexing with lower(...) -> queries are not optimised very well - Please Help

Hi,

On 21.11.2003 at 06:54, Tom Lane wrote:

Martin Hampl <Martin.Hampl@gmx.de> writes:

Index Scan using word_lower_idx on token (cost=0.00..98814.08
rows=25382 width=16)
Index Cond: (lower((word)::text) = 'saxophone'::text)

The rows estimate (and therefore also the cost estimate) is a complete
guess in this situation, because the system keeps no statistics about
the values of lower(word). Improving this situation is on the TODO
list.

Any ideas when this will work? Is it difficult to implement?

(For those who don't recall the context: I asked about indexing the
lower-cased values of a varchar column ("create index xy_idx on
table(lower(column));") and how the query planner uses this index.)

Regards,
Martin.

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Martin Hampl (#6)
Re: indexing with lower(...) -> queries are not optimised very well - Please Help

Martin Hampl <Martin.Hampl@gmx.de> writes:

On 21.11.2003 at 06:54, Tom Lane wrote:

[ bad plan for use of a functional index ]

The rows estimate (and therefore also the cost estimate) is a complete
guess in this situation, because the system keeps no statistics about
the values of lower(word). Improving this situation is on the TODO
list.

Any ideas when this will work? Is it difficult to implement?

It strikes me as a small-but-not-trivial project. Possibly someone will
get it done for 7.5. You can find some discussion in the pghackers
archives, IIRC (look for threads about keeping statistics on functional
indexes).

This brings up a thought for Mark Cave-Ayland's project of breaking out
the datatype dependencies in ANALYZE: it would be wise to ensure that
the API for examine_attribute doesn't depend too much on the assumption
that the value(s) being analyzed are part of the relation proper. They
might be coming from a functional index, or even more likely being
computed on-the-fly based on the definition of a functional index.
Not sure what we'd want to change exactly, but it's something to think
about before the API gets set in stone.

regards, tom lane
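For readers coming to this thread later: the TODO item discussed here
was eventually resolved, and newer releases (8.0 and later, to the best
of my recollection) collect statistics on expression indexes. On such a
release the original query needs no workaround:

```sql
-- On releases with expression-index statistics, ANALYZE also
-- samples the indexed expression lower(word):
ANALYZE token;

-- The row estimate for this predicate now comes from real
-- statistics rather than a default guess:
EXPLAIN SELECT * FROM token WHERE lower(word) = 'saxophone';
```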

#8Mark Cave-Ayland
m.cave-ayland@webbased.co.uk
In reply to: Tom Lane (#7)
Re: indexing with lower(...) -> queries are not optimised very well - Please Help

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: 16 January 2004 21:50
To: Martin Hampl
Cc: pgsql-general@postgresql.org; Mark Cave-Ayland
Subject: Re: [GENERAL] indexing with lower(...) -> queries
are not optimised very well - Please Help

Martin Hampl <Martin.Hampl@gmx.de> writes:

[...]

This brings up a thought for Mark Cave-Ayland's project of
breaking out the datatype dependencies in ANALYZE: it would
be wise to ensure that the API for examine_attribute doesn't
depend too much on the assumption that the value(s) being
analyzed are part of the relation proper. They might be
coming from a functional index, or even more likely being
computed on-the-fly based on the definition of a functional
index. Not sure what we'd want to change exactly, but it's
something to think about before the API gets set in stone.

regards, tom lane

Hi Tom/Martin,

I'm currently about 2/3rds of the way through writing my patch so things
are very nearly where I want them to be. In its current form it is just
the reorganisation and extension of what is already there, so I haven't
looked at the implications of functional indexes. It would be great if
the completed and checked patch can be applied so at least it gets out
there - then other developers can look at how this works for things like
functional indexes and put in the API changes required before the 7.5
feature freeze.

Cheers,

Mark.

---

Mark Cave-Ayland
Webbased Ltd.
Tamar Science Park
Derriford
Plymouth
PL6 8BX
England

Tel: +44 (0)1752 764445
Fax: +44 (0)1752 764446
