index speed and failed expectations?

Started by rihad over 17 years ago · 6 messages · general
#1 rihad
rihad@mail.ru

sol=> \d stats;
Table "public.stats"
Column | Type | Modifiers
--------------+--------------------------------+-----------
id | integer | not null
start_time | timestamp(0) without time zone | not null
...
Indexes:
"stats_start_time_idx" btree (start_time)
"stats_id_key" btree (id)

There are roughly half a million rows.

This query from the console:

select * from stats order by start_time;

takes 8 seconds before starting its output. Am I wrong in assuming that
the index on start_time should make ORDER BY orders of magnitude faster?
Or is this already fast enough? Or should I max up some memory (buffer)
setting to achieve greater speeds? Not that the speed is crucial, just
curious.

TIA.

#2 Adam Rich
adam.r@sbcglobal.net
In reply to: rihad (#1)
Re: index speed and failed expectations?

> This query from the console:
>
> select * from stats order by start_time;
>
> takes 8 seconds before starting its output. Am I wrong in assuming that
> the index on start_time should make ORDER BY orders of magnitude faster?
> Or is this already fast enough? Or should I max up some memory (buffer)
> setting to achieve greater speeds? Not that the speed is crucial, just
> curious.

Postgresql won't use the index for queries like this. Due to the
MVCC implementation, the index does not contain all necessary information
and would therefore be slower than using the table data alone.

(What postgresql lacks is a first_row/all_rows hint like oracle)

However, if you limit the number of rows enough, you might force it
to use an index:

select * from stats order by start_time limit 1000;
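One way to check whether the LIMIT actually changes the plan is to compare EXPLAIN output with and without it. This is a sketch assuming the `stats` table and `stats_start_time_idx` index from the first message; the plan chosen depends on the planner's cost estimates and the table's statistics, so no particular output is claimed here.

```sql
-- Refresh planner statistics so the cost estimates reflect the table.
analyze stats;

-- Full ordered scan: the planner may choose Seq Scan + Sort or an Index Scan.
explain select * from stats order by start_time;

-- With a small LIMIT only the first rows are needed, which makes an
-- Index Scan on stats_start_time_idx relatively cheaper (not guaranteed).
explain select * from stats order by start_time limit 1000;
```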

#3 rihad
rihad@mail.ru
In reply to: Adam Rich (#2)
Re: index speed and failed expectations?

Adam Rich wrote:

>> This query from the console:
>>
>> select * from stats order by start_time;
>>
>> takes 8 seconds before starting its output. Am I wrong in assuming that
>> the index on start_time should make ORDER BY orders of magnitude faster?
>> Or is this already fast enough? Or should I max up some memory (buffer)
>> setting to achieve greater speeds? Not that the speed is crucial, just
>> curious.
>
> Postgresql won't use the index for queries like this. Due to the
> MVCC implementation, the index does not contain all necessary information
> and would therefore be slower than using the table data alone.
>
> (What postgresql lacks is a first_row/all_rows hint like oracle)
>
> However, if you limit the number of rows enough, you might force it
> to use an index:
>
> select * from stats order by start_time limit 1000;

Thanks! Since LIMIT/OFFSET is the typical usage pattern for a paginated
data set accessed from the Web (which is my case), it immediately
becomes a non-issue.
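A paginated query of the kind described might look like the sketch below; the page size of 50 and the page number are illustrative assumptions. Two caveats from the PostgreSQL documentation on LIMIT and OFFSET apply: the rows skipped by OFFSET still have to be computed by the server, so later pages get progressively more expensive, and because start_time is not unique, rows with equal timestamps are not guaranteed to come back in the same order from one query to the next.

```sql
-- Page 3 of the result set at 50 rows per page: offset = (3 - 1) * 50.
-- The 100 skipped rows are still produced and then discarded,
-- so deep pages cost more than early ones.
select *
from stats
order by start_time
limit 50 offset 100;
```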

#4 Michael Fuhr
mike@fuhr.org
In reply to: Adam Rich (#2)
Re: index speed and failed expectations?

On Mon, Aug 04, 2008 at 08:35:28AM -0500, Adam Rich wrote:

>> This query from the console:
>>
>> select * from stats order by start_time;
>>
>> takes 8 seconds before starting its output. Am I wrong in assuming that
>> the index on start_time should make ORDER BY orders of magnitude faster?
>
> Postgresql won't use the index for queries like this. Due to the
> MVCC implementation, the index does not contain all necessary information
> and would therefore be slower than using the table data alone.

Not necessarily true. Despite the index not having enough information,
the planner might still decide that using the index would be faster
than executing a sort.

create table stats (
id serial primary key,
start_time timestamp with time zone not null
);

insert into stats (start_time)
select now() - random() * '1 year'::interval
from generate_series(1, 100000);

create index stats_start_time_idx on stats (start_time);

analyze stats;

explain analyze select * from stats order by start_time;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using stats_start_time_idx on stats (cost=0.00..4767.83 rows=100000 width=12) (actual time=0.146..994.674 rows=100000 loops=1)
Total runtime: 1419.943 ms
(2 rows)

set enable_indexscan to off;
explain analyze select * from stats order by start_time;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
Sort (cost=9845.82..10095.82 rows=100000 width=12) (actual time=3240.976..3800.038 rows=100000 loops=1)
Sort Key: start_time
-> Seq Scan on stats (cost=0.00..1541.00 rows=100000 width=12) (actual time=0.091..500.853 rows=100000 loops=1)
Total runtime: 4226.870 ms
(4 rows)

--
Michael Fuhr

#5 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Adam Rich (#2)
Re: index speed and failed expectations?

"Adam Rich" <adam.r@sbcglobal.net> writes:

>> This query from the console:
>> select * from stats order by start_time;
>> takes 8 seconds before starting its output. Am I wrong in assuming that
>> the index on start_time should make ORDER BY orders of magnitude faster?
>
> Postgresql won't use the index for queries like this.

"won't" -> "might not". It all depends on the relative cost estimates
for indexscan vs seqscan + sort. For a large table it's quite likely
that the latter will be cheaper, because it has a better-localized
access pattern.

> (What postgresql lacks is a first_row/all_rows hint like oracle)

That's spelled "LIMIT" ;-). Also, you can bias the choice in favor
of a fast-start plan if you use a cursor rather than a plain SELECT.
In that case the planner makes some allowance for the idea that
you might not want all the rows, or might be more interested in
getting the first ones quickly than minimizing the total time to
fetch all the rows.
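A minimal sketch of the cursor approach, assuming the same `stats` table; the cursor name `stats_cur` and the batch size of 1000 are made up for illustration. A cursor declared without WITH HOLD can only be used inside a transaction block, hence the BEGIN/COMMIT.

```sql
begin;

-- Declaring a cursor lets the planner assume the rows may be consumed
-- incrementally, which biases it toward a fast-start plan.
declare stats_cur cursor for
    select * from stats order by start_time;

-- Fetch the first batch; issue further FETCHes for later batches.
fetch 1000 from stats_cur;

close stats_cur;
commit;
```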

regards, tom lane

#6 Glyn Astill
glynastill@yahoo.co.uk
In reply to: rihad (#3)
Re: index speed and failed expectations?

>> However, if you limit the number of rows enough, you might force it
>> to use an index:
>>
>> select * from stats order by start_time limit 1000;
>
> Thanks! Since LIMIT/OFFSET is the typical usage pattern for a paginated
> data set accessed from the Web (which is my case), it immediately
> becomes a non-issue.

We do a lot of queries with ORDER BY ... LIMIT n, and in my experience setting enable_sort to off on the database also makes a massive difference.

http://www.postgresql.org/docs/8.3/static/indexes-ordering.html
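A sketch of applying that setting to a single transaction rather than the whole database, assuming the same `stats` table. `SET LOCAL` reverts automatically when the transaction ends, so other queries keep the default. Per the PostgreSQL documentation, turning enable_sort off does not forbid explicit sorts entirely; it only discourages the planner from choosing one when another plan is available.

```sql
begin;

-- Discourage explicit Sort nodes for this transaction only;
-- the setting reverts at COMMIT or ROLLBACK.
set local enable_sort to off;

select * from stats order by start_time limit 1000;

commit;
```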
