question on scan of clustered index

Started by Nonamealmost 19 years ago3 messagesgeneral

pwing@student.umass.edu

almost 19 years ago

Hello:

I am running the following query:
SELECT COUNT(*) FROM orders WHERE o_orderdate < date('1995-03-15');

Here are some stats for the orders relation:
select relname, relpages, reltuples from pg_class where relname = 'orders';
"orders";29278;1.49935e+06

For my query above, the reduction factor is about 50%; 700,000 out of 1.5
millions tuples satisfy the WHERE clause.

I have a clustered index defined as follows:
CREATE INDEX orders_orderdate_idx
ON orders
USING btree
(o_orderdate);
ALTER TABLE orders CLUSTER ON orders_orderdate_idx;

I am running three ways: sequential scan, bitmap index scan and index scan.
The I/O cost for the index scan is 24+ times more than the other two. I do not
understand why this happens. If I am using a clustered index, it is my
understanding that there should be no need to retrieve a single page multiple
times, as tuples are sorted. Am I misunderstanding something?

Paula

Here are the results of explain analyze, and I/O results from
pg_statio_user_tables:

Aggregate (cost=1470905.79..1470905.80 rows=1 width=0) (actual
time=9040.320..9040.321 rows=1 loops=1)
-> Index Scan using orders_orderdate_idx on orders (cost=0.00..1469101.03
rows=721902 width=0) (actual time=0.098..8222.234 rows=727305 loops=1)
Index Cond: (o_orderdate < '1995-03-15'::date)
Total runtime: 9040.375 ms

I/O cost:
Heap Blocks Read: 649966 (from disk)
Heap Blocks Hit: 70070 (from buffer)
Index Blocks Read: 1591
__________________________________

Aggregate (cost=52033.65..52033.66 rows=1 width=0) (actual
time=2364.470..2364.471 rows=1 loops=1)
-> Bitmap Heap Scan on orders (cost=11927.12..50228.90 rows=721902 width=0)
(actual time=338.547..1609.118 rows=727305 loops=1)
Recheck Cond: (o_orderdate < '1995-03-15'::date)
-> Bitmap Index Scan on orders_orderdate_idx (cost=0.00..11746.65
rows=721902 width=0) (actual time=329.249..329.249 rows=727305 loops=1)
Index Cond: (o_orderdate < '1995-03-15'::date)
Total runtime: 2364.697 ms

I/O cost:
Heap Blocks Read: 29278
Index Blocks Read: 1591
__________________________________

Aggregate (cost=49832.76..49832.77 rows=1 width=0) (actual
time=2215.752..2215.753 rows=1 loops=1)
-> Seq Scan on orders (cost=0.00..48028.00 rows=721902 width=0) (actual
time=0.042..1458.734 rows=727305 loops=1)
Filter: (o_orderdate < '1995-03-15'::date)
Total runtime: 2215.801 ms

I/O cost:
Heap Blocks Read: 29278

Stephen Frost

sfrost@snowman.net

almost 19 years ago

In reply to: Noname (#1)

Re: question on scan of clustered index

* pwing@student.umass.edu (pwing@student.umass.edu) wrote:

I am running three ways: sequential scan, bitmap index scan and index scan.
The I/O cost for the index scan is 24+ times more than the other two. I do not
understand why this happens. If I am using a clustered index, it is my
understanding that there should be no need to retrieve a single page multiple
times, as tuples are sorted. Am I misunderstanding something?

That does seem kind of bad (24+ times is quite a bit). At the same time
though, you are having to go through the index when you're doing an
index scan whereas you don't with the seq scan, so you're certainly
pulling in more data of the disk.

I'm a bit mystified why you'd think an index scan to pull half the data
from a table is going to be faster than a seq scan anyway though... If
you're trying to speed up the query to pull half the records you might
want to look into partitioned tables instead, though I doubt it'll get
much faster...

Thanks,

Stephen

Tom Lane

tgl@sss.pgh.pa.us

almost 19 years ago

In reply to: Noname (#1)

Re: question on scan of clustered index

pwing@student.umass.edu writes:

I am running three ways: sequential scan, bitmap index scan and index scan.
The I/O cost for the index scan is 24+ times more than the other two. I do not
understand why this happens. If I am using a clustered index, it is my
understanding that there should be no need to retrieve a single page multiple
times, as tuples are sorted. Am I misunderstanding something?

How long ago did you cluster the table? It sounds like it has degraded
far away from a fully-ordered state.

regards, tom lane