Why does it not use the index?

Started by Philip Greerover 22 years ago11 messagesgeneral
Jump to latest
#1Philip Greer
philip@tildesoftware.com

This one is irritating. Here's some psql output:

================================================================================
dumps=# \d fal_profdel
Table "fal_profdel"
Attribute | Type | Modifier
-----------+--------------------------+----------
sid | character(4) | not null
card_num | character(19) | not null
date_del | timestamp with time zone |
filename | character varying(30) |
Indices: fal_prfdel_cn,
fal_prfdel_date,
fal_prfdel_pk

dumps=# \d fal_prfdel_cn
Index "fal_prfdel_cn"
Attribute | Type
-----------+---------------
card_num | character(19)
unique btree

dumps=# explain select card_num from fal_profdel where card_num = 'removed_for_privacy';
NOTICE: QUERY PLAN:

Seq Scan on fal_profdel (cost=0.00..120546.39 rows=46649 width=12)

EXPLAIN
================================================================================

Now, why the heck is the select query not using the index? I've tried it by having an exact 19 character card_num as well - still explains as a 'Seq Scan' (tablespace scan) - and each query takes up to 37 seconds (thus confirming that it is indeed doing scans and not using the index).

I've tried dropping and re-creating the indexes, still it explains as tablespace scans.

I am running postgresql 7.1.3 - a bit old, I know, but I have had no reason to upgrade just yet.

By the way, the fal_profdel table has 4,664,867 rows in it currently - thus I really don't want full table scans!

--
-----------------------------------------------------------------------------
PG.. philip@tildesoftware.com
Law of probable dispersal: Whatever it is that hits the fan will not be
evenly distributed.

#2Stephan Szabo
sszabo@megazone23.bigpanda.com
In reply to: Philip Greer (#1)
Re: Why does it not use the index?

On Mon, 21 Jul 2003, Philip Greer wrote:

dumps=# \d fal_profdel
Table "fal_profdel"
Attribute | Type | Modifier
-----------+--------------------------+----------
sid | character(4) | not null
card_num | character(19) | not null
date_del | timestamp with time zone |
filename | character varying(30) |
Indices: fal_prfdel_cn,
fal_prfdel_date,
fal_prfdel_pk

dumps=# \d fal_prfdel_cn
Index "fal_prfdel_cn"
Attribute | Type
-----------+---------------
card_num | character(19)
unique btree

dumps=# explain select card_num from fal_profdel where card_num = 'removed_for_privacy';
NOTICE: QUERY PLAN:

Seq Scan on fal_profdel (cost=0.00..120546.39 rows=46649 width=12)

EXPLAIN
================================================================================

Now, why the heck is the select query not using the index? I've tried
it by having an exact 19 character card_num as well - still explains
as a 'Seq Scan' (tablespace scan) - and each query takes up to 37
seconds (thus confirming that it is indeed doing scans and not using
the index).

Have you vacuum analyzed the table recently? What does explain show if you
do set enable_seqscan=off; before the explain and then how long does the
query actually take to run with seqscan disabled.

#3Philip Greer
philip@tildesoftware.com
In reply to: Stephan Szabo (#2)
Re: Why does it not use the index?

Thanks for the response:

I took a look at the table with 'vacuum verbose analyze', here's the results:

dumps=# vacuum verbose analyze fal_profdel;
NOTICE: --Relation fal_profdel--
NOTICE: Pages 62232: Changed 0, reaped 205, Empty 0, New 0; Tup 4664867: Vac 0, Keep/VTL 0/0, Crash 0, UnUsed 447, MinLen 103, MaxLen 106; Re-using: Free/Avail. Space 14896/0; EndEmpty/Avail. Pages 0/0. CPU 2.53s/0.58u sec.
NOTICE: Index fal_prfdel_date: Pages 25318; Tuples 4664867: Deleted 0. CPU 0.90s/4.39u sec.
NOTICE: Index fal_prfdel_cn: Pages 23128; Tuples 4664867: Deleted 0. CPU 0.95s/4.63u sec.
NOTICE: Index fal_prfdel_pk: Pages 28323; Tuples 4664867: Deleted 0. CPU 1.21s/4.40u sec.
NOTICE: Analyzing...
VACUUM

Then - afterwards, I ran the explain again:

dumps=# explain select card_num from fal_profdel where card_num = '4828820006970';
NOTICE: QUERY PLAN:

Index Scan using fal_prfdel_cn on fal_profdel (cost=0.00..4.95 rows=1 width=12)

EXPLAIN

WTF? Why would a vacuum be necessary in order for it to start using the index?

By the way, the actual query takes subseconds to return now. MUCH better.

So - let me know why one would have to use vacuum in order for the scans to cease and index use begin. Is it a continual thing? Or does vacuum need to be done after a 'create index' in order for it to begin using the index?

Thanks for your assistance, much apprecaited!

On Mon, Jul 21, 2003 at 11:00:56AM -0700, Stephan Szabo filled up my inbox with the following:

On Mon, 21 Jul 2003, Philip Greer wrote:

dumps=# \d fal_profdel
Table "fal_profdel"
Attribute | Type | Modifier
-----------+--------------------------+----------
sid | character(4) | not null
card_num | character(19) | not null
date_del | timestamp with time zone |
filename | character varying(30) |
Indices: fal_prfdel_cn,
fal_prfdel_date,
fal_prfdel_pk

dumps=# \d fal_prfdel_cn
Index "fal_prfdel_cn"
Attribute | Type
-----------+---------------
card_num | character(19)
unique btree

dumps=# explain select card_num from fal_profdel where card_num = 'removed_for_privacy';
NOTICE: QUERY PLAN:

Seq Scan on fal_profdel (cost=0.00..120546.39 rows=46649 width=12)

EXPLAIN
================================================================================

Now, why the heck is the select query not using the index? I've tried
it by having an exact 19 character card_num as well - still explains
as a 'Seq Scan' (tablespace scan) - and each query takes up to 37
seconds (thus confirming that it is indeed doing scans and not using
the index).

Have you vacuum analyzed the table recently? What does explain show if you
do set enable_seqscan=off; before the explain and then how long does the
query actually take to run with seqscan disabled.

--
-----------------------------------------------------------------------------
PG.. philip@tildesoftware.com
Law of probable dispersal: Whatever it is that hits the fan will not be
evenly distributed.

#4Richard Huxton
dev@archonet.com
In reply to: Philip Greer (#3)
Re: Why does it not use the index?

On Monday 21 July 2003 19:51, Philip Greer wrote:

Thanks for the response:

I took a look at the table with 'vacuum verbose analyze', here's the
results:

dumps=# vacuum verbose analyze fal_profdel;

[snip]

Then - afterwards, I ran the explain again:

dumps=# explain select card_num from fal_profdel where card_num =
'4828820006970'; NOTICE: QUERY PLAN:

Index Scan using fal_prfdel_cn on fal_profdel (cost=0.00..4.95 rows=1
width=12)

[snip]

WTF? Why would a vacuum be necessary in order for it to start using the
index?

It's not the vacuum - it's the analyse. That builds up statistics on the table
in question so the planner knows how many rows there are, what the most
common values are etc. That way it can make a "best guess" as to whether
scanning the whole table or using the index will be faster.

So - let me know why one would have to use vacuum in order for the scans to
cease and index use begin. Is it a continual thing? Or does vacuum need to
be done after a 'create index' in order for it to begin using the index?

You should vacuum to reclaim "deleted" space. You should analyse to update
statistics on the table. They both tend to depend on the amount of activity
you have.

--
Richard Huxton
Archonet Ltd

#5Doug McNaught
doug@mcnaught.org
In reply to: Philip Greer (#1)
Re: Why does it not use the index?

Philip Greer <philip@tildesoftware.com> writes:

WTF? Why would a vacuum be necessary in order for it to start using the index?

It's not the VACUUM that's necessary; it's the ANALYZE. The query
planner uses table statistics to make its decisions, and ANALYZE is
what collects those statistics. Without an ANALYZE the planner will
make default assumptions that are rarely correct. :)

VACUUM should also be run regularly, of course.

-Doug

#6Stephan Szabo
sszabo@megazone23.bigpanda.com
In reply to: Philip Greer (#3)
Re: Why does it not use the index?

On Mon, 21 Jul 2003, Philip Greer wrote:

Thanks for the response:

I took a look at the table with 'vacuum verbose analyze', here's the results:

dumps=# vacuum verbose analyze fal_profdel;
NOTICE: --Relation fal_profdel--
NOTICE: Pages 62232: Changed 0, reaped 205, Empty 0, New 0; Tup
4664867: Vac 0, Keep/VTL 0/0, Crash 0, UnUsed 447, MinLen 103, MaxLen
106; Re-using: Free/Avail. Space 14896/0; EndEmpty/Avail. Pages 0/0.
CPU 2.53s/0.58u sec.
NOTICE: Index fal_prfdel_date: Pages 25318; Tuples 4664867: Deleted 0. CPU 0.90s/4.39u sec.
NOTICE: Index fal_prfdel_cn: Pages 23128; Tuples 4664867: Deleted 0. CPU 0.95s/4.63u sec.
NOTICE: Index fal_prfdel_pk: Pages 28323; Tuples 4664867: Deleted 0. CPU 1.21s/4.40u sec.
NOTICE: Analyzing...
VACUUM

Then - afterwards, I ran the explain again:

dumps=# explain select card_num from fal_profdel where card_num = '4828820006970';
NOTICE: QUERY PLAN:

Index Scan using fal_prfdel_cn on fal_profdel (cost=0.00..4.95 rows=1 width=12)

EXPLAIN

WTF? Why would a vacuum be necessary in order for it to start using the index?

It was the analyze that was important for this probably. If you compare
the explain above with the one from before, you'll notice that before it
was estimating that around 46000 rows were going to be returned. If that
were true (and there weren't clustering effects going on) it's possible
that the sequence scan would actually have been faster than scanning the
index. Analyze gave it hopefully more reasonable data for the estimate
and so it's now guessing that 1 row is returned which is certainly better
for the index scan.

So - let me know why one would have to use vacuum in order for the
scans to cease and index use begin. Is it a continual thing? Or does
vacuum need to be done after a 'create index' in order for it to begin
using the index?

Well, if you do updates/deletes, vacuum is necessary to reclaim space, so
you should probably do it on some scheduled basis for that purpose. A
good reason to upgrade is that in 7.1 vacuum gets an exclusive lock
whereas in recent versions it doesn't by default and you can analyze
without a vacuum. Analyze is necessary to keep the statistics up to date
and should also be run on a periodic basis (daily isn't bad).

#7Mat
psql-mail@freeuk.com
In reply to: Philip Greer (#3)
Re: Why does it not use the index?

Philip,
In order for psql to work out the most effective way to run queries it
relies on statistics about the size of your tables (amounst other
things).
Running VACUUM ANALYZE does two things:
The vacuum removes any tuples you have deleted from the database (before
this i think they are just marked as deleted)
The analzye bit then rebuilds the statistics which the query planner
uses - making your queries faster.

If you are only ever adding tuples to the database then you don't need
to Vacuum as often, although i think you still need to vacuum every X
million queries.

I made the same mistake last week...

Show quoted text

On Mon, 2003-07-21 at 19:51, Philip Greer wrote:

Thanks for the response:

I took a look at the table with 'vacuum verbose analyze', here's the results:

dumps=# vacuum verbose analyze fal_profdel;
NOTICE: --Relation fal_profdel--
NOTICE: Pages 62232: Changed 0, reaped 205, Empty 0, New 0; Tup 4664867: Vac 0, Keep/VTL 0/0, Crash 0, UnUsed 447, MinLen 103, MaxLen 106; Re-using: Free/Avail. Space 14896/0; EndEmpty/Avail. Pages 0/0. CPU 2.53s/0.58u sec.
NOTICE: Index fal_prfdel_date: Pages 25318; Tuples 4664867: Deleted 0. CPU 0.90s/4.39u sec.
NOTICE: Index fal_prfdel_cn: Pages 23128; Tuples 4664867: Deleted 0. CPU 0.95s/4.63u sec.
NOTICE: Index fal_prfdel_pk: Pages 28323; Tuples 4664867: Deleted 0. CPU 1.21s/4.40u sec.
NOTICE: Analyzing...
VACUUM

Then - afterwards, I ran the explain again:

dumps=# explain select card_num from fal_profdel where card_num = '4828820006970';
NOTICE: QUERY PLAN:

Index Scan using fal_prfdel_cn on fal_profdel (cost=0.00..4.95 rows=1 width=12)

EXPLAIN

WTF? Why would a vacuum be necessary in order for it to start using the index?

By the way, the actual query takes subseconds to return now. MUCH better.

So - let me know why one would have to use vacuum in order for the scans to cease and index use begin. Is it a continual thing? Or does vacuum need to be done after a 'create index' in order for it to begin using the index?

Thanks for your assistance, much apprecaited!

On Mon, Jul 21, 2003 at 11:00:56AM -0700, Stephan Szabo filled up my inbox with the following:

On Mon, 21 Jul 2003, Philip Greer wrote:

dumps=# \d fal_profdel
Table "fal_profdel"
Attribute | Type | Modifier
-----------+--------------------------+----------
sid | character(4) | not null
card_num | character(19) | not null
date_del | timestamp with time zone |
filename | character varying(30) |
Indices: fal_prfdel_cn,
fal_prfdel_date,
fal_prfdel_pk

dumps=# \d fal_prfdel_cn
Index "fal_prfdel_cn"
Attribute | Type
-----------+---------------
card_num | character(19)
unique btree

dumps=# explain select card_num from fal_profdel where card_num = 'removed_for_privacy';
NOTICE: QUERY PLAN:

Seq Scan on fal_profdel (cost=0.00..120546.39 rows=46649 width=12)

EXPLAIN
================================================================================

Now, why the heck is the select query not using the index? I've tried
it by having an exact 19 character card_num as well - still explains
as a 'Seq Scan' (tablespace scan) - and each query takes up to 37
seconds (thus confirming that it is indeed doing scans and not using
the index).

Have you vacuum analyzed the table recently? What does explain show if you
do set enable_seqscan=off; before the explain and then how long does the
query actually take to run with seqscan disabled.

#8Andrew Ayers
aayers@eldocomp.com
In reply to: Mat (#7)
Re: Why does it not use the index?

Doug McNaught wrote:

It's not the VACUUM that's necessary; it's the ANALYZE. The query
planner uses table statistics to make its decisions, and ANALYZE is
what collects those statistics. Without an ANALYZE the planner will
make default assumptions that are rarely correct. :)

I am not the original poster, but I am a PG newbie, so:

So - are you saying that if you have a table, and you create an index on
that table, you need to perform an ANALYZE in order for PG to use the
index. Otherwise, the index goes unused (or used improperly)?

I am just trying to understand this for a database I have set up, where
I set up a variety of indexes, assumming (wrongly?) that the indexes
would be used once they were created.

Andrew L. Ayers
Phoenix, Arizona

-- CONFIDENTIALITY NOTICE --

This message is intended for the sole use of the individual and entity to whom it is addressed, and may contain information that is privileged, confidential and exempt from disclosure under applicable law. If you are not the intended addressee, nor authorized to receive for the intended addressee, you are hereby notified that you may not use, copy, disclose or distribute to anyone the message or any information contained in the message. If you have received this message in error, please immediately advise the sender by reply email, and delete the message. Thank you.

#9Doug McNaught
doug@mcnaught.org
In reply to: Andrew Ayers (#8)
Re: Why does it not use the index?

Andrew Ayers <aayers@eldocomp.com> writes:

Doug McNaught wrote:

It's not the VACUUM that's necessary; it's the ANALYZE. The query
planner uses table statistics to make its decisions, and ANALYZE is
what collects those statistics. Without an ANALYZE the planner will
make default assumptions that are rarely correct. :)

I am not the original poster, but I am a PG newbie, so:

So - are you saying that if you have a table, and you create an index on
that table, you need to perform an ANALYZE in order for PG to use the
index. Otherwise, the index goes unused (or used improperly)?

Actually, the time to ANALYZE is after you make significant changes in
the table (bulk load a bunch of data, delete a bunch of rows etc). If
you create an index on a table with correct statistics, the planner
should start using it without necessarily needing ANALYZE to be run.

So run it by hand after making major data changes, and run it
periodically depending on how fast your data changes in regular use.

Clear?

-Doug

#10Philip Greer
philip@tildesoftware.com
In reply to: Mat (#7)
Re: Why does it not use the index?

Well duh. Crap. I remember that NOW that I've read everyones comments (thanks by the way!). I do remember reading that, and have placed the 'vacumedb -z' command in crontabs on databases in the past. Just forgot all about that - I guess it is because most of the databases I've set up on postgresql get to the point where they just simply run, and run, and run, and run. A nice thing about postgresql - wrap it in some automated maintenance and it becomes so stable it is nearly forgotten about! Vacuum analyze is one of those items where if you don't use it (manually) often, you lose it (and forget the reason why to use it).

Thanks again!

--
-----------------------------------------------------------------------------
PG.. philip@tildesoftware.com
Law of probable dispersal: Whatever it is that hits the fan will not be
evenly distributed.

#11nolan
nolan@celery.tssi.com
In reply to: Andrew Ayers (#8)
Re: Why does it not use the index?

So - are you saying that if you have a table, and you create an index on
that table, you need to perform an ANALYZE in order for PG to use the
index. Otherwise, the index goes unused (or used improperly)?

it is easy enough to demonstrate that creating an index will result
in immediate improvements in query times. The internals wizards would
have to answer the question as to whether 'create index' also creates
the initial stats on that index, though. (If it doesn't, maybe that
should be a configuration option.)

However, you do need the stats to take best advantage of the index over
time and a wide range of queries, so regularly scheduled 'vacuum analyze's
are desirable.

What I don't know is whether there is a way pass along hints to the
optimizer or to write a query to FORCE it to use an index on some part
of a query despite what the optimizer decides.
--
Mike Nolan