Another seq scan instead of index question

Started by Nicholas Piperover 24 years ago5 messagesgeneral
Jump to latest
#1Nicholas Piper
nick@nickpiper.co.uk

Hi everyone,

I've a table:
Table "cdsongs"
Attribute | Type | Modifier

----------------+-------------------+-------------------------------------
songid | integer | default
nextval('seq_songid'::text)
cdid | integer |
track | integer |
song | character varying |
extt | text |
fk_products_id | integer |
Indices: cdsongs_cdid,
cdsongs_songid,
idx_cdsongs_song

wich an index :
CREATE INDEX idx_cdsongs_song ON cdsongs (lower(song));

But postgresql refuses to use it. The vacuum output is :

depos=# VACUUM VERBOSE ANALYZE cdsongs;
NOTICE: --Relation cdsongs--
NOTICE: Pages 41232: Changed 0, reaped 0, Empty 0, New 0; Tup
4210874: Vac 0, Keep/VTL 0/0, Crash 0, UnUsed 0, MinLen 52, MaxLen
2025; Re-using: Free/Avail. Space 0/0; EndEmpty/Avail. Pages 0/0. CPU
0.64s/0.72u sec.
NOTICE: Index cdsongs_cdid: Pages 9223; Tuples 4210874. CPU
0.47s/3.80u sec.
NOTICE: Index cdsongs_songid: Pages 9223; Tuples 4210874. CPU
0.46s/3.68u sec.
NOTICE: Index idx_cdsongs_song: Pages 21888; Tuples 4210874. CPU
1.24s/3.93u sec.
NOTICE: --Relation pg_toast_61094022--
NOTICE: Pages 5: Changed 0, reaped 0, Empty 0, New 0; Tup 28: Vac 0,
Keep/VTL 0/0, Crash 0, UnUsed 0, MinLen 63, MaxLen 2034; Re-using:
Free/Avail. Space 0/0; EndEmpty/Avail. Pages 0/0. CPU 0.00s/0.00u sec.
NOTICE: Index pg_toast_61094022_idx: Pages 2; Tuples 28. CPU
0.00s/0.00u sec.
NOTICE: Analyzing...

There are 4210874 rows, which is a lot compared to the expected rows
returned, so why does it still use seq scan ? (I'd expect seq scan if
it was returning loads of rows!).

depos=# explain select * from cdsongs where lower(song) like
'mushroom festival in hell';
NOTICE: QUERY PLAN:

Seq Scan on cdsongs (cost=0.00..104395.11 rows=42109 width=40)

EXPLAIN
depos=# set enable_seqscan = off;
SET VARIABLE
depos=# explain select * from cdsongs where lower(song) like
'mushroom festival in hell';
NOTICE: QUERY PLAN:

Index Scan using idx_cdsongs_song on cdsongs (cost=0.00..115549.17
rows=42109 width=40)

EXPLAIN

And indead, if I force seqscan off and perform the query, it's dead
fast. Otherwise it uses seqscan and takes a long long time.

Cheers for any help!

Nick

--
Part 3 MEng Cybernetics; Reading, UK http://www.nickpiper.co.uk/
Change PGP actions of mailer or fetch key see website 1024D/3ED8B27F
Choose life. Be Vegan :-) Please reduce needless cruelty + suffering !

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Nicholas Piper (#1)
Re: Another seq scan instead of index question

Nicholas Piper <nick@nickpiper.co.uk> writes:

There are 4210874 rows, which is a lot compared to the expected rows
returned, so why does it still use seq scan ?

Well, no, it isn't "a lot". The row estimate is just about 1% of the
total rows, which suggests strongly that you're getting a default
selectivity estimate rather than anything real. Note also that you have
about 100 rows per disk page (4210874/41232). So it's estimating that
it will need to fetch about one row out of every page, on which basis
the indexscan looks pretty unattractive --- it can't save any I/O.

Your real problem is the bogus selectivity estimate. What version
are you running? If 7.0, see contrib/likeplanning/. If 7.1, I'd
be interested to see what you get from

select attname,attdispersion,s.*
from pg_statistic s, pg_attribute a, pg_class c
where starelid = c.oid and attrelid = c.oid and staattnum = attnum
and relname = 'cdsongs';

regards, tom lane

#3Nicholas Piper
nick@nickpiper.co.uk
In reply to: Tom Lane (#2)
Re: Another seq scan instead of index question

On Tue, 07 Aug 2001, Tom Lane wrote:

Your real problem is the bogus selectivity estimate. What version
are you running? If 7.0, see contrib/likeplanning/. If 7.1, I'd
be interested to see what you get from

I'm on 7.1
(PostgreSQL 7.1.2 on i686-pc-linux-gnu, compiled by GCC 2.95.4)

select attname,attdispersion,s.*
from pg_statistic s, pg_attribute a, pg_class c
where starelid = c.oid and attrelid = c.oid and staattnum = attnum
and relname = 'cdsongs';

That doesn't look good to me as it contains dodgy bytes...

attname | attdispersion | starelid | staattnum | staop | stanullfrac | stacommonfrac | stacommonval | staloval | stahival
---------+---------------+----------+-----------+-------+-------------+---------------+---------------+----------+--------------------------------------------------------------
songid | -1 | 61094022 | 1 | 97 | 0 | 2.3748e-07 | 1 | 1 | 4210874
cdid | 4.65505e-06 | 61094022 | 2 | 97 | 0 | 2.32731e-05 | 677 | 1 | 364477
track | 0.0231229 | 61094022 | 3 | 97 | 0 | 0.086018 | 0 | 0 | 97
song | 0.000647958 | 61094022 | 4 | 1066 | 0 | 0.00319886 | | | ��f�r amca
extt | 0.890813 | 61094022 | 5 | 664 | 0.937611 | 2.9685e-05 | The Residents | | ��� ������� - ���� �� ������,
� �, ��� � ����, �� �����.

The data came from freedb, translated to postgresql by a small C
program which I didn't write.

Nick

--
Part 3 MEng Cybernetics; Reading, UK http://www.nickpiper.co.uk/
Change PGP actions of mailer or fetch key see website 1024D/3ED8B27F
Choose life. Be Vegan :-) Please reduce needless cruelty + suffering !

#4Nicholas Piper
nick@nickpiper.co.uk
In reply to: Nicholas Piper (#3)
Re: Another seq scan instead of index question

On Tue, 07 Aug 2001, Nicholas Piper wrote:

Replying to myself with some more information.

On Tue, 07 Aug 2001, Tom Lane wrote:

select attname,attdispersion,s.*
from pg_statistic s, pg_attribute a, pg_class c
where starelid = c.oid and attrelid = c.oid and staattnum = attnum
and relname = 'cdsongs';

That doesn't look good to me as it contains dodgy bytes...

I've removed (hopefully) the nasty entries in my tables by deleting
where string is > 'zzzzzz'.

The new table which Tom said would be useful is ;

attname | attdispersion | starelid | staattnum | staop | stanullfrac | stacommonfrac | stacommonval | staloval | stahival
---------+---------------+----------+-----------+-------+-------------+---------------+---------------+----------+---------------------------
songid | -1 | 61094022 | 1 | 97 | 0 | 2.38796e-07 | 1 | 1 | 4210874
cdid | 4.68083e-06 | 61094022 | 2 | 97 | 0 | 2.3402e-05 | 677 | 1 | 364477
track | 0.0231398 | 61094022 | 3 | 97 | 0 | 0.0860681 | 0 | 0 | 97
song | 0.000651592 | 61094022 | 4 | 1066 | 0 | 0.00321658 | | | zzo, Moderato con Allegro
extt | 0.890565 | 61094022 | 5 | 664 | 0.937465 | 2.98494e-05 | The Residents | | zur Ruh gebracht

Unfortunately I'm still getting the same results:

depos=# explain select * from cdsongs where lower(song) like 'mushroom festival in hell';
NOTICE: QUERY PLAN:

Seq Scan on cdsongs (cost=0.00..103835.24 rows=41877 width=40)

EXPLAIN
depos=# set enable_seqscan = off;
SET VARIABLE
depos=# explain select * from cdsongs where lower(song) like 'mushroom festival in hell';
NOTICE: QUERY PLAN:

Index Scan using idx_cdsongs_song on cdsongs (cost=0.00..114921.91 rows=41877 width=40)

EXPLAIN

--
Part 3 MEng Cybernetics; Reading, UK http://www.nickpiper.co.uk/
Change PGP actions of mailer or fetch key see website 1024D/3ED8B27F
Choose life. Be Vegan :-) Please reduce needless cruelty + suffering !

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Nicholas Piper (#3)
Re: Another seq scan instead of index question

Nicholas Piper <nick@nickpiper.co.uk> writes:

On Tue, 07 Aug 2001, Tom Lane wrote:

Your real problem is the bogus selectivity estimate. What version
are you running? If 7.0, see contrib/likeplanning/. If 7.1, I'd
be interested to see what you get from

I'm on 7.1
(PostgreSQL 7.1.2 on i686-pc-linux-gnu, compiled by GCC 2.95.4)

Okay ... [ looks at code ... ] oh, I see the problem. The pattern-match
selectivity code doesn't try to deal with "lower(foo) LIKE ...":

/*
* If expression is not var op constant for
* a simple var of a real relation (no subqueries, for now),
* then punt and return a default estimate.
*/

This could probably be improved, but I don't have time to think about it
now. In the meantime you could try knocking DEFAULT_MATCH_SEL down a
little bit. (It's already been reduced to 0.005 in current sources,
in fact.) See src/backend/utils/adt/selfuncs.c.

regards, tom lane