The planner chooses seqscan+sort when there is an index on the sort column

Started by Csaba Nagyalmost 20 years ago15 messagesgeneral
Jump to latest
#1Csaba Nagy
nagy@ecircle-ag.com

Hi all,

I wonder why this happens:

- postgres: 8.1.3
- the table has ~200 million rows;
- there is a primary key on (col_1, col_2);
- the table was ANALYZEd;
- the planner chooses seqscan+sort for the following query even with
enable_seqscan=off:

select * from table order by col_1;

Isn't it supposed to choose the index scan at least when
enable_seqscan=off ? Even if it is indeed not faster to do the index
scan than seqscan+sort.

The actual plan looks like (names changed):

db=# set enable_seqscan = off;
SET
db=# explain select * from table order by col_1;
QUERY PLAN
-----------------------------------------------------------------------------------------
Sort (cost=165585198.70..166102386.06 rows=206874944 width=60)
Sort Key: col_1
-> Seq Scan on table (cost=100000000.00..104552091.44
rows=206874944 width=60)
(3 rows)

db=# \d table
Table "public.table"
Column | Type |
Modifiers
-----------------+-----------------------------+----------------------------------------------------
col_1 | bigint | not null
col_2 | bigint | not null
...
Indexes:
"pk_table" PRIMARY KEY, btree (col_1, col_2)
...

Cheers,
Csaba.

#2John D. Burger
john@mitre.org
In reply to: Csaba Nagy (#1)
Re: The planner chooses seqscan+sort when there is an index on the sort column

Csaba Nagy wrote:

select * from table order by col_1;

Isn't it supposed to choose the index scan at least when
enable_seqscan=off ? Even if it is indeed not faster to do the index
scan than seqscan+sort.

I think because you've asked for every row, it's going to have to scan
the whole table anyway, to determine MVCC "liveness" of the rows
(sorry, dunno what the correct word is).

- John Burger
MITRE

#3Andreas Kretschmer
akretschmer@spamfence.net
In reply to: Csaba Nagy (#1)
Re: The planner chooses seqscan+sort when there is an index on the sort column

Csaba Nagy <nagy@ecircle-ag.com> schrieb:

select * from table order by col_1;

Without a WHERE you get the whole table.

A Index-Scan is, in this case, expensive.

HTH, Andreas
--
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect. (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly." (unknow)
Kaufbach, Saxony, Germany, Europe. N 51.05082�, E 13.56889�

#4Csaba Nagy
nagy@ecircle-ag.com
In reply to: John D. Burger (#2)
Re: The planner chooses seqscan+sort when there is an

On Wed, 2006-05-03 at 17:48, John D. Burger wrote:

Csaba Nagy wrote:

select * from table order by col_1;

Isn't it supposed to choose the index scan at least when
enable_seqscan=off ? Even if it is indeed not faster to do the index
scan than seqscan+sort.

I think because you've asked for every row, it's going to have to scan
the whole table anyway, to determine MVCC "liveness" of the rows
(sorry, dunno what the correct word is).

But I also asked for _ordered_ results, which the seq scan is not
covering, but the index does... and I specifically disabled sequential
scan. That means the planner is not even considering the primary key
index, and I would like to know why...

Actually this is a problem for me in a more complex query, which also
needs this table ordered by that column, and it results in the same plan
fragment with sequential scan + sort.

Thanks,
Csaba.

#5Csaba Nagy
nagy@ecircle-ag.com
In reply to: Andreas Kretschmer (#3)
Re: The planner chooses seqscan+sort when there is an

Without a WHERE you get the whole table.

A Index-Scan is, in this case, expensive.

I know that, and I would agree if I wouldn't have asked for _ordered_
result, and wouldn't have turned enable_seqscan=off.

In these conditions I would have expected an index scan, even if it is
more expensive than the sequential scan + sort. Actually I wanted to see
how expensive it thinks it is, but I can't get to that plan at all.

Thanks,
Csaba.

#6John D. Burger
john@mitre.org
In reply to: Csaba Nagy (#5)
Re: The planner chooses seqscan+sort when there is an

But I also asked for _ordered_ results, which the seq scan is not
covering, but the index does... and I specifically disabled sequential
scan.

Docs say:

Enables or disables the query planner's use of sequential scan plan
types. It's not possible to suppress sequential scans entirely, but
turning this variable off discourages the planner from using one if
there are other methods available.

Note the second sentence. Again, I think it will have to scan the
whole table anyway, because that's what you've asked for, and given
that, enable_seqscan=off doesn't apply.

- John Burger
MITRE

#7Csaba Nagy
nagy@ecircle-ag.com
In reply to: John D. Burger (#6)
Re: The planner chooses seqscan+sort when there is an

Docs say:

Enables or disables the query planner's use of sequential scan plan
types. It's not possible to suppress sequential scans entirely, but
turning this variable off discourages the planner from using one if
there are other methods available.

Note the second sentence. Again, I think it will have to scan the
whole table anyway, because that's what you've asked for, and given
that, enable_seqscan=off doesn't apply.

OK, maybe that's the point... the "cost bust" given to the sequential
scan by enable_seqscan=off is not enough in this case to exceed the cost
of the index scan ? The table is quite big, might be possible. I still
wonder why would be seqscan+sort faster than index scan... the sort will
for sure have to write to disk too given the size of the table...

Cheers,
Csaba.

#8Martijn van Oosterhout
kleptog@svana.org
In reply to: Csaba Nagy (#7)
Re: The planner chooses seqscan+sort when there is an

On Wed, May 03, 2006 at 06:42:00PM +0200, Csaba Nagy wrote:

OK, maybe that's the point... the "cost bust" given to the sequential
scan by enable_seqscan=off is not enough in this case to exceed the cost
of the index scan ? The table is quite big, might be possible. I still
wonder why would be seqscan+sort faster than index scan... the sort will
for sure have to write to disk too given the size of the table...

Have you tuned the values of effective_cache_size and random_page_cost?
These have significant effects on index scans.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

Show quoted text

From each according to his ability. To each according to his ability to litigate.

#9Florian Pflug
fgp@phlo.org
In reply to: Csaba Nagy (#7)
Re: The planner chooses seqscan+sort when there is an

Csaba Nagy wrote:

Docs say:

Enables or disables the query planner's use of sequential scan plan
types. It's not possible to suppress sequential scans entirely, but
turning this variable off discourages the planner from using one if
there are other methods available.

Note the second sentence. Again, I think it will have to scan the
whole table anyway, because that's what you've asked for, and given
that, enable_seqscan=off doesn't apply.

OK, maybe that's the point... the "cost bust" given to the sequential
scan by enable_seqscan=off is not enough in this case to exceed the cost
of the index scan ? The table is quite big, might be possible. I still
wonder why would be seqscan+sort faster than index scan... the sort will
for sure have to write to disk too given the size of the table...

When using an indexscan, postgres will read the actual rows in
index-order, rathen then in the order they appear on-disk.
For 200 million rows this means doing at least 200 million
disk seeks. Assuming that each seek takes just 1ms, thats
still amount to 200.000 seconds.

greetings, Florian Pflug

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Csaba Nagy (#7)
Re: The planner chooses seqscan+sort when there is an

Csaba Nagy <nagy@ecircle-ag.com> writes:

OK, maybe that's the point... the "cost bust" given to the sequential
scan by enable_seqscan=off is not enough in this case to exceed the cost
of the index scan ?

Looks that way to me. You could try setting enable_sort off as well,
which will penalize the seqscan+sort plan another 100million cost units.
And maybe try reducing random_page_cost to make the indexscan look
cheaper. However, if there's a 100million delta between the two plans,
I suspect you really really don't want the indexscan anyway ;-)

regards, tom lane

#11Scott Marlowe
smarlowe@g2switchworks.com
In reply to: Tom Lane (#10)
Re: The planner chooses seqscan+sort when there is an

On Wed, 2006-05-03 at 13:34, Tom Lane wrote:

Csaba Nagy <nagy@ecircle-ag.com> writes:

OK, maybe that's the point... the "cost bust" given to the sequential
scan by enable_seqscan=off is not enough in this case to exceed the cost
of the index scan ?

Looks that way to me. You could try setting enable_sort off as well,
which will penalize the seqscan+sort plan another 100million cost units.
And maybe try reducing random_page_cost to make the indexscan look
cheaper. However, if there's a 100million delta between the two plans,
I suspect you really really don't want the indexscan anyway ;-)

I imagine the followup post:

So, I've had this query running for six weeks now, and...

#12A. Kretschmer
andreas.kretschmer@schollglas.com
In reply to: Florian Pflug (#9)
Re: The planner chooses seqscan+sort when there is an

am 03.05.2006, um 20:20:55 +0200 mailte Florian G. Pflug folgendes:

of the index scan ? The table is quite big, might be possible. I still
wonder why would be seqscan+sort faster than index scan... the sort will
for sure have to write to disk too given the size of the table...

When using an indexscan, postgres will read the actual rows in index-order,
rathen then in the order they appear on-disk.
For 200 million rows this means doing at least 200 million
disk seeks. Assuming that each seek takes just 1ms, thats
still amount to 200.000 seconds.

Yepp, it is much cheaper to read the table seq and order later.

Andreas
--
Andreas Kretschmer (Kontakt: siehe Header)
Heynitz: 035242/47215, D1: 0160/7141639
GnuPG-ID 0x3FFF606C http://wwwkeys.de.pgp.net
=== Schollglas Unternehmensgruppe ===

#13Csaba Nagy
nagy@ecircle-ag.com
In reply to: Scott Marlowe (#11)
Re: The planner chooses seqscan+sort when there is an

Looks that way to me. You could try setting enable_sort off as well,
which will penalize the seqscan+sort plan another 100million cost units.
And maybe try reducing random_page_cost to make the indexscan look
cheaper. However, if there's a 100million delta between the two plans,
I suspect you really really don't want the indexscan anyway ;-)

I imagine the followup post:

So, I've had this query running for six weeks now, and...

Well, I guess that's it then... I will let the query run with the
seqscan+sort. It will still run 1-2 days, yesterday I stopped it after 6
hours ;-) Actually it would be nice to have some kind of feedback on
what is it doing so I can estimate how long it will still take... cause
I'm not sure the seqscan+sort won't run itself for 6 weeks...

Thanks,
Csaba.

#14Martijn van Oosterhout
kleptog@svana.org
In reply to: Csaba Nagy (#13)
Re: The planner chooses seqscan+sort when there is an

On Thu, May 04, 2006 at 10:09:33AM +0200, Csaba Nagy wrote:

Well, I guess that's it then... I will let the query run with the
seqscan+sort. It will still run 1-2 days, yesterday I stopped it after 6
hours ;-) Actually it would be nice to have some kind of feedback on
what is it doing so I can estimate how long it will still take... cause
I'm not sure the seqscan+sort won't run itself for 6 weeks...

There are a number of ways you can (indirectly) see how far it has got.
If it's in the first phase (the seq scan), by looking at which file it
has open you should be able to see how far along the table it is. Once
it's in the sort stage you should be able to see from the tape files
approimately how far it is. As for the number of sort stages required,
that depends on the size of the data to be sorted, but you should be
able to estimate that...

At least, you should be able to tell the difference between a 6 hour and
6 day query.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

Show quoted text

From each according to his ability. To each according to his ability to litigate.

#15Csaba Nagy
nagy@ecircle-ag.com
In reply to: Martijn van Oosterhout (#14)
Re: The planner chooses seqscan+sort when there is an

There are a number of ways you can (indirectly) see how far it has got.
If it's in the first phase (the seq scan), by looking at which file it
has open you should be able to see how far along the table it is. Once
it's in the sort stage you should be able to see from the tape files
approimately how far it is. As for the number of sort stages required,
that depends on the size of the data to be sorted, but you should be
able to estimate that...

At least, you should be able to tell the difference between a 6 hour and
6 day query.

Thanks for the hints, I'll try to look up on the details...

Cheers,
Csaba.