index on large table

Started by Kacper Nowickiabout 24 years ago4 messagesgeneral
Jump to latest
#1Kacper Nowicki
kacper.nowicki@wp.pl

I have a large table, with >1 million rows, which I would like to see page
by page. It is being displayed in the oid order.

Without any index a cost for such query is huge, since it is sequential
scan of about 300MB of data.

"explain select * from events order by oid limit 10 offset 0"
NOTICE: QUERY PLAN:
Limit (cost=69.83..69.83 rows=10 width=130)
-> Sort (cost=69.83..69.83 rows=1000 width=130)
-> Seq Scan on events (cost=0.00..20.00 rows=1000 width=130)

The cost is really higher, but the important fact is, that sequential scan
is required. Do the obvious: create an index:

"create index events_oid on events(oid)"

Now the same query is instantaneous, since index is used:

"explain select * from events order by oid limit 10 offset 0"
NOTICE: QUERY PLAN:
Limit (cost=0.00..38.56 rows=10 width=130)
-> Index Scan using events_oid on events (cost=0.00..3953085.63
rows=1025245 width=130)

so, let's see further down the table, the offset is shifted to 1M, we still
want to see just 10 entries.

"explain select * from events order by oid limit 10 offset 1000000"
NOTICE: QUERY PLAN:
Limit (cost=424863.54..424863.54 rows=10 width=130)
-> Sort (cost=424863.54..424863.54 rows=1025245 width=130)
-> Seq Scan on events (cost=0.00..35645.45 rows=1025245 width=130)

Bummer. This is very slow again, sequential scan again. Why the index is
not used for this query? Use of index would make it very fast!

Does anybody know what is going on?

Thanks,
Kacper

#2Stephan Szabo
sszabo@megazone23.bigpanda.com
In reply to: Kacper Nowicki (#1)
Re: index on large table

On Tue, 12 Mar 2002, Kacper Nowicki wrote:

so, let's see further down the table, the offset is shifted to 1M, we still
want to see just 10 entries.

"explain select * from events order by oid limit 10 offset 1000000"
NOTICE: QUERY PLAN:
Limit (cost=424863.54..424863.54 rows=10 width=130)
-> Sort (cost=424863.54..424863.54 rows=1025245 width=130)
-> Seq Scan on events (cost=0.00..35645.45 rows=1025245 width=130)

Bummer. This is very slow again, sequential scan again. Why the index is
not used for this query? Use of index would make it very fast!

What gets shown for explain with set enable_seqscan=off? If you're using
7.2, try explain analyze both ways as well.

The row grabbing with the index would be slower than the sequence scan,
but most of the cost seems to be going into the sort. Another thing to
try would be raising sort_mem I guess.

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Stephan Szabo (#2)
Re: index on large table

"explain select * from events order by oid limit 10 offset 1000000"
NOTICE: QUERY PLAN:
Limit (cost=424863.54..424863.54 rows=10 width=130)
-> Sort (cost=424863.54..424863.54 rows=1025245 width=130)
-> Seq Scan on events (cost=0.00..35645.45 rows=1025245 width=130)

Bummer. This is very slow again, sequential scan again. Why the index is
not used for this query? Use of index would make it very fast!

Not necessarily. Using the index for this would require fetching
1000000+10 values in the indexscan (and throwing away all but 10).

The planner is counting on its fingers and guessing that the sort
is faster. It might or might not be right about that (have you
compared timings?) but certainly the index method won't be
instantaneous.

regards, tom lane

#4Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#3)
Re: index on large table

Tom Lane wrote:

"explain select * from events order by oid limit 10 offset 1000000"
NOTICE: QUERY PLAN:
Limit (cost=424863.54..424863.54 rows=10 width=130)
-> Sort (cost=424863.54..424863.54 rows=1025245 width=130)
-> Seq Scan on events (cost=0.00..35645.45 rows=1025245 width=130)

Bummer. This is very slow again, sequential scan again. Why the index is
not used for this query? Use of index would make it very fast!

Not necessarily. Using the index for this would require fetching
1000000+10 values in the indexscan (and throwing away all but 10).

The planner is counting on its fingers and guessing that the sort
is faster. It might or might not be right about that (have you
compared timings?) but certainly the index method won't be
instantaneous.

This question is being asked a lot. I hope my new FAQ item 4.8 wording
helps, but it will take time for people to read the new version. I will
add it to 7.2.X CVS.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026