Seq Scan

Started by Tyler Durdenalmost 19 years ago10 messagesgeneral
Jump to latest
#1Tyler Durden
tylersticky@gmail.com

Hi,
I'm having some problems in performance in a simple select count(id)
from.... I have 700 000 records in one table, and when I do:

# explain select (id) from table_name;
-[ RECORD 1 ]----------------------------------------------------------------
QUERY PLAN | Seq Scan on table_name (cost=0.00..8601.30 rows=266730 width=4)

I had created an index for id(btree), but still shows "Seq Scan".
What I'm doing wrong?

Thanks,
Tyler

#2Joshua D. Drake
jd@commandprompt.com
In reply to: Tyler Durden (#1)
Re: Seq Scan

Tyler Durden wrote:

Hi,
I'm having some problems in performance in a simple select count(id)
from.... I have 700 000 records in one table, and when I do:

# explain select (id) from table_name;
-[ RECORD 1
]----------------------------------------------------------------
QUERY PLAN | Seq Scan on table_name (cost=0.00..8601.30 rows=266730
width=4)

I had created an index for id(btree), but still shows "Seq Scan".
What I'm doing wrong?

Nothing. You have to scan the table because you aren't giving postgresql
anything to use the index by.

Joshua D. Drake

Thanks,
Tyler

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

--

=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/

#3John D. Burger
john@mitre.org
In reply to: Tyler Durden (#1)
Re: Seq Scan

Tyler Durden wrote:

I'm having some problems in performance in a simple select count(id)
from.... I have 700 000 records in one table, and when I do:

# explain select (id) from table_name;
-[ RECORD
1 ]----------------------------------------------------------------
QUERY PLAN | Seq Scan on table_name (cost=0.00..8601.30
rows=266730 width=4)

I had created an index for id(btree), but still shows "Seq Scan".
What I'm doing wrong?

You mention SELECT COUNT(ID), but your example shows SELECT ID. In
either case, the planner is choosing the correct plan. Indexes exist
to save the engine from visiting every row in the table, but both of
these queries require every row to be visited anyway.

Perhaps you think that these queries can be satisfied without
visiting the actual table rows at all, using only the index. This is
incorrect - PG doesn't work that way.

- John Burger
MITRE

#4Tyler Durden
tylersticky@gmail.com
In reply to: John D. Burger (#3)
Re: Seq Scan

Yes, either case happens the same.
I'm come recently from MySQL and it works in a different way.
I find strange that a simple SELECT COUNT(...) is so slow with only
700 000 records.
Has been a nightmare optimizing this tables/queries.
Sorry about this silly question, but I'm new to Posgresql.

Thanks,
Tyler

Show quoted text

On 6/1/07, John D. Burger <john@mitre.org> wrote:

You mention SELECT COUNT(ID), but your example shows SELECT ID. In
either case, the planner is choosing the correct plan. Indexes exist
to save the engine from visiting every row in the table, but both of
these queries require every row to be visited anyway.

Perhaps you think that these queries can be satisfied without
visiting the actual table rows at all, using only the index. This is
incorrect - PG doesn't work that way.

- John Burger
MITRE

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

#5Tyler Durden
tylersticky@gmail.com
In reply to: Joshua D. Drake (#2)
Re: Seq Scan

Ok, my bad.
But why this happens:
# explain ANALYZE select id from table_name where id>200000;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using table_name_pkey on table_name (cost=0.00..2618.96
rows=68960 width=4) (actual time=220.543..1479.495 rows=66730 loops=1)
Index Cond: (id > 200000)
Total runtime: 1504.839 ms
(3 rows)

dun=# explain ANALYZE select id from table_name where id>10;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
Seq Scan on table_name (cost=0.00..9268.12 rows=266703 width=4)
(actual time=107.935..2733.592 rows=266720 loops=1)
Filter: (id > 10)
Total runtime: 2833.744 ms
(3 rows)

It uses Index Scan for id>200000 and Seq Scan for id>10?!

On 6/1/07, Joshua D. Drake <jd@commandprompt.com> wrote:

Show quoted text

Nothing. You have to scan the table because you aren't giving postgresql
anything to use the index by.

Joshua D. Drake

#6Michael Glaesemann
grzm@seespotcode.net
In reply to: Tyler Durden (#5)
Re: Seq Scan

On Jun 1, 2007, at 12:24 , Tyler Durden wrote:

On 6/1/07, Joshua D. Drake <jd@commandprompt.com> wrote:

Nothing. You have to scan the table because you aren't giving
postgresql
anything to use the index by.

# explain ANALYZE select id from table_name where id>200000;

QUERY PLAN
----------------------------------------------------------------------
----------------------------------------------------------------------
Index Scan using table_name_pkey on table_name (cost=0.00..2618.96
rows=68960 width=4) (actual time=220.543..1479.495 rows=66730 loops=1)
Index Cond: (id > 200000)
Total runtime: 1504.839 ms
(3 rows)

dun=# explain ANALYZE select id from table_name where id>10;
QUERY PLAN
----------------------------------------------------------------------
------------------------------------------------
Seq Scan on table_name (cost=0.00..9268.12 rows=266703 width=4)
(actual time=107.935..2733.592 rows=266720 loops=1)
Filter: (id > 10)
Total runtime: 2833.744 ms
(3 rows)

It uses Index Scan for id>200000 and Seq Scan for id>10?!

[Please don't top-post. It makes discussions harder to follow]

Because the planner estimates that it will be faster for it to scan
the entire table than to use the index in the latter case. Note that
only about 70,000 rows need to be visited for id > 200000, while
nearly 270,000 rows need to be visited when id > 10.

Michael Glaesemann
grzm seespotcode net

#7Reece Hart
reece@harts.net
In reply to: Tyler Durden (#5)
Re: Seq Scan

On Fri, 2007-06-01 at 18:24 +0100, Tyler Durden wrote:

It uses Index Scan for id>200000 and Seq Scan for id>10?!

Based on the statistics pg has for your table, and on the cost of using
the index, the cost based optimizer decided that it's more efficient to
seq scan all of the rows than to incur the index overhead. These
decisions are not always correct, but they're usually quite good.

If you don't believe it, try:
# set enable_seqscan=off;
# explain analyze ...
# set enable_seqscan=on;
# explain analyze ...

-Reece

--
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0

#8Kevin Hunter
hunteke@earlham.edu
In reply to: Tyler Durden (#4)
Re: Seq Scan

of the conversation.
breaks the flow
because it
toppost
Please don't

At 1:17p -0400 on 01 Jun 2007, Tyler Durden wrote:

Yes, either case happens the same.
I'm come recently from MySQL and it works in a different way.
I find strange that a simple SELECT COUNT(...) is so slow with only
700 000 records.
Has been a nightmare optimizing this tables/queries.
Sorry about this silly question, but I'm new to Posgresql.

The much more knowledgable will correct me, but the abbr. version is
that it is for data integrity and correctness reasons and the
inherent way in which the MVCC model works. If it's any condolence,
I believe that Oracle also suffers with this particular query, but I
haven't seen any benchmarks to prove that (I think because Oracle
expressly forbids benchmarks and comparisons in their license, but
don't quote me on that).

For a complete discussion, go this page and look for the thread with
the subject 'Performance of count(*)'

http://archives.postgresql.org/pgsql-performance/2007-03/threads.php

I believe that may help.

Kevin

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Kevin Hunter (#8)
Re: Seq Scan

Kevin Hunter <hunteke@earlham.edu> writes:

At 1:17p -0400 on 01 Jun 2007, Tyler Durden wrote:

I find strange that a simple SELECT COUNT(...) is so slow with only
700 000 records.

The much more knowledgable will correct me, but the abbr. version is
that it is for data integrity and correctness reasons and the
inherent way in which the MVCC model works.

The bottom line is that a "correct" implementation (ie, one that fully
respects MVCC behavior) would create enormous overhead, as well as bad
contention bottlenecks for concurrent updates. It doesn't seem worth it.

If you want a cheap approximate answer, there are a couple of ways to
get one, but SELECT COUNT(*) is not that.

regards, tom lane

#10Tyler Durden
tylersticky@gmail.com
In reply to: Tom Lane (#9)
Re: Seq Scan

Ok, Thank all for the clarification.