table full scan or index full scan?

Started by peixubin · over 16 years ago · 4 messages · general
#1peixubin
peixubin@yahoo.com.cn

I have a table with 30,000,000 records, and counting the rows takes about 40 seconds.
The table has a primary key on column id.

perf=# explain select count(*) from test;
...
-----------------------------------------
Aggregate (cost=603702.80..603702.81 rows=1 width=0)
  -> Seq Scan on test (cost=0.00..527681.04 rows=30408704 width=0)
...
perf=# select count(*) from test;
count
------------
30408704

perf=#

PostgreSQL uses a full table scan, but Oracle runs similar SQL with an index full scan, which is much faster.

Does PostgreSQL's optimizer need to be adjusted here?

PostgreSQL version: 8.3.7

OS: Ubuntu 9

Kernel: 2.6.28-15-generic x86_64


#2Peter Hunsberger
peter.hunsberger@gmail.com
In reply to: peixubin (#1)
Re: table full scan or index full scan?

2009/10/11 Scott Marlowe <scott.marlowe@gmail.com>:

PostgreSQL uses a full table scan, but Oracle runs similar SQL with an index full scan, which is much faster.

Yep, PostgreSQL isn't Oracle. It's a trade-off. In pgsql, indexes
don't contain visibility info, so all index lookups have to eventually
hit the table itself. So you either do index lookup -> table lookup,
repeated as many times as you have index lookups, or you just hit the
table, since you gotta go there anyway.

On the bright side, this makes updates faster, since you don't have to
lock both the table and the index and write to both at the same time.

Does PostgreSQL's optimizer need to be adjusted here?

Sorry, it's an architectural difference. Are you testing in a
realistic scenario, including both reads and writes to the database, to
see if PostgreSQL is faster overall and to identify any problem areas
that pop up there?

This is interesting; I just ran into a similar issue the other day.
Clearly there is a wide range of read/write scenarios that Postgres
should be able to cover. These days I have a lot of designs leaning
more toward the data-warehouse side of the operational spectrum, as
opposed to the high-transaction scenario, and I specifically design DB
management strategies with the knowledge that writes will happen far
less often than reads in our applications. Is this an area where
optimizations are considered hard in Postgres, or hopefully just
something that is on the todo list that no one has gotten around
to yet? Similarly, are accurate table summary stats possible someday,
or are they considered close to impossible without introducing race
conditions and lock contention?

--
Peter Hunsberger

#3Martijn van Oosterhout
kleptog@svana.org
In reply to: Peter Hunsberger (#2)
Re: table full scan or index full scan?

On Sun, Oct 11, 2009 at 10:01:52PM -0500, Peter Hunsberger wrote:

This is interesting; I just ran into a similar issue the other day.
Clearly there is a wide range of read/write scenarios that Postgres
should be able to cover. These days I have a lot of designs leaning
more toward the data-warehouse side of the operational spectrum, as
opposed to the high-transaction scenario, and I specifically design DB
management strategies with the knowledge that writes will happen far
less often than reads in our applications. Is this an area where
optimizations are considered hard in Postgres, or hopefully just
something that is on the todo list that no one has gotten around
to yet?

We consider any optimisation that is feasible. Unfortunately, "the
number of rows in a table" is a fairly hard number to get in the
general case, because it depends on who is asking (different
transactions may get different answers).

Similarly, are accurate table summary stats possible someday
or are they considered close to impossible in order to eliminate race
conditions and lock contention scenarios?

It is possible; it's just not cheap in the general case. The usual
approach is to keep a table that tracks the number of rows. By using
deltas you can make it lock-free. These are, however, costs most
applications don't need. If you know that in your case the data never
changes, just cache the result somewhere. That will be infinitely more
efficient than any other method.
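A minimal sketch of that delta approach, using a trigger (table and function names here are made up for illustration, and the side table needs periodic collapsing by a maintenance job to stay small):

```sql
-- Side table holding count deltas; summing it gives the row count.
-- Appending a delta row per change avoids contention on a single hot row.
CREATE TABLE test_rowcount (delta bigint NOT NULL);
INSERT INTO test_rowcount VALUES (0);

CREATE OR REPLACE FUNCTION test_count_trig() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        INSERT INTO test_rowcount VALUES (1);
    ELSIF TG_OP = 'DELETE' THEN
        INSERT INTO test_rowcount VALUES (-1);
    END IF;
    RETURN NULL;  -- AFTER trigger; return value is ignored
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER test_count AFTER INSERT OR DELETE ON test
    FOR EACH ROW EXECUTE PROCEDURE test_count_trig();

-- Fast count: sum the deltas instead of scanning the big table.
SELECT sum(delta) FROM test_rowcount;
```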

If you are happy with estimates, they are there and are kept reasonably
up to date.
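For example, assuming an approximate figure is acceptable, the planner's estimate can be read straight from the catalogs and returns instantly; it is only as fresh as the last ANALYZE (or autovacuum run) of the table:

```sql
-- Approximate row count from planner statistics; instant, not exact.
SELECT reltuples::bigint AS approx_rows
FROM pg_class
WHERE relname = 'test';
```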

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/


Please line up in a tree and maintain the heap invariant while
boarding. Thank you for flying nlogn airlines.

#4peixubin
peixubin@yahoo.com.cn
In reply to: Martijn van Oosterhout (#3)
Re: table full scan or index full scan?
I understand now, thanks.
