BUG #4819: Ordering big tables by indexed columns is very slow.

Started by Alexeialmost 17 years ago2 messagesbugs

alexei_novakov@yahoo.com

almost 17 years ago

The following bug has been logged online:

Bug reference: 4819
Logged by: Alexei
Email address: alexei_novakov@yahoo.com
PostgreSQL version: 8.3.7
Operating system: Open SuSE 11.1 AMD Athlon 64 X2
Description: Ordering big tables by indexed columns is very slow.
Details:

Hello.

I have very simple query, which runs very long when has "order by" clause,
even though all columns in "order by" are indexed. Here is the simplified
testcase.

1) Table:
create table tmp1
(
field1 bigint not null,
field2 integer not null
)

2) Data:
I generated some test data for this table: field2 is always 2; field1 starts
from 1242865824484 and every next one is incremented by 1. I generated 3
million records.

3) Index:
create index tmp1_idx on tmp1 (field1, field2)

4) Query:
select field1, field2 from tmp1 order by 1, 2

The query plan for this query is:
Sort (cost=522779.47..530279.47 rows=3000000 width=12)
Sort Key: field1, field2
-> Seq Scan on tmp1 (cost=0.00..46217.00 rows=3000000 width=12)

Index is not used for the sorting here. But if I add "limit 1000" in the end
I get the following:
Limit (cost=0.00..75.33 rows=100 width=12)
-> Index Scan using tmp1_idx on tmp1 (cost=0.00..2259857.96 rows=3000000
width=12)

If I increase limit to 700000 index is not used again and the difference in
execution time is very noticeable:
1 millisecond for "limit 600000"; and 6 seconds for "limit 700000"

Is there anything what can be configured to make it use the index for the
ordering?

Best Regards,
Alexei Novakov.

Valentine Gogichashvili

valgog@gmail.com

almost 17 years ago

In reply to: Alexei (#1)

Re: BUG #4819: Ordering big tables by indexed columns is very slow.

On May 21, 3:04 am, alexei_nova...@yahoo.com ("Alexei") wrote:

The following bug has been logged online:

Bug reference: 4819
Logged by: Alexei
Email address: alexei_nova...@yahoo.com
PostgreSQL version: 8.3.7
Operating system: Open SuSE 11.1 AMD Athlon 64 X2
Description: Ordering big tables by indexed columns is very slow.
Details:

Hello.

I have very simple query, which runs very long when has "order by" clause,
even though all columns in "order by" are indexed. Here is the simplified
testcase.

1) Table:
create table tmp1
(
field1 bigint not null,
field2 integer not null
)

2) Data:
I generated some test data for this table: field2 is always 2; field1 starts
from 1242865824484 and every next one is incremented by 1. I generated 3
million records.

3) Index:
create index tmp1_idx on tmp1 (field1, field2)

4) Query:
select field1, field2 from tmp1 order by 1, 2

The query plan for this query is:
Sort (cost=522779.47..530279.47 rows=3000000 width=12)
Sort Key: field1, field2
-> Seq Scan on tmp1 (cost=0.00..46217.00 rows=3000000 width=12)

Index is not used for the sorting here. But if I add "limit 1000" in the end
I get the following:
Limit (cost=0.00..75.33 rows=100 width=12)
-> Index Scan using tmp1_idx on tmp1 (cost=0.00..2259857.96 rows=3000000
width=12)

If I increase limit to 700000 index is not used again and the difference in
execution time is very noticeable:
1 millisecond for "limit 600000"; and 6 seconds for "limit 700000"

Is there anything what can be configured to make it use the index for the
ordering?

Best Regards,
Alexei Novakov.

The planner cannot estimate the speed of your disks and thinks, that
doing a seqscan will be faster for so many columns, then scanning the
index, and then do a random lookup on the table.

Have a look on the documentation for the following planner
configuration parameters: effective_cache_size, seq_page_cost and
random_page_cost

With best regards,

-- Valentine Gogichashvili