constant time count(*) ?

Started by Mark Harrisonover 22 years ago5 messagesgeneral
Jump to latest
#1Mark Harrison
mh@pixar.com

We're looking into moving some data from mysql to postgresql, and
notice that count(*) does not seem to be a constant-time function
as it seems to be in mysql.

planb=# explain select count(*) from assets;
QUERY PLAN
----------------------------------------------------------------
Aggregate (cost=22.50..22.50 rows=1 width=0)
-> Seq Scan on assets (cost=0.00..20.00 rows=1000 width=0)
(2 rows)

Is there a way to optimize count(*) such that it does not have
to do a sequential scan? We use this on some big tables and it
is slowing down processing quite a lot.

Thanks!
Mark

--
Mark Harrison
Pixar Animation Studios

#2Peter Eisentraut
peter_e@gmx.net
In reply to: Mark Harrison (#1)
Re: constant time count(*) ?

Mark Harrison writes:

Is there a way to optimize count(*) such that it does not have
to do a sequential scan?

No. If you need to count a lot, you need to store the information
separately.

--
Peter Eisentraut peter_e@gmx.net

#3Dann Corbit
DCorbit@connx.com
In reply to: Peter Eisentraut (#2)
Re: constant time count(*) ?

This should definitely be a FAQ.

The semantics of MVCC (multi-version concurrency control) means that you
can't just store a number somewhere in the header of the table like some
other database systems do.

Try a count(*) on Oracle and you will see similar behavior. They use
MVCC also.

Show quoted text

-----Original Message-----
From: Mark Harrison [mailto:mh@pixar.com]
Sent: Wednesday, October 15, 2003 11:00 AM
To: pgsql-general@postgresql.org
Subject: [GENERAL] constant time count(*) ?

We're looking into moving some data from mysql to postgresql,
and notice that count(*) does not seem to be a constant-time
function as it seems to be in mysql.

planb=# explain select count(*) from assets;
QUERY PLAN
----------------------------------------------------------------
Aggregate (cost=22.50..22.50 rows=1 width=0)
-> Seq Scan on assets (cost=0.00..20.00 rows=1000
width=0) (2 rows)

Is there a way to optimize count(*) such that it does not
have to do a sequential scan? We use this on some big tables
and it is slowing down processing quite a lot.

Thanks!
Mark

--
Mark Harrison
Pixar Animation Studios

---------------------------(end of
broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to
majordomo@postgresql.org)

#4Andrew Sullivan
andrew@libertyrms.info
In reply to: Mark Harrison (#1)
Re: constant time count(*) ?

On Wed, Oct 15, 2003 at 11:00:10AM -0700, Mark Harrison wrote:

Is there a way to optimize count(*) such that it does not have
to do a sequential scan? We use this on some big tables and it
is slowing down processing quite a lot.

No. There's a busload of discussion on this topic in the archives.
If you need an approximate value, you can get it from the system
tables.

A

-- 
----
Andrew Sullivan                         204-4141 Yonge Street
Afilias Canada                        Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110
#5Tino Wildenhain
tino@wildenhain.de
In reply to: Mark Harrison (#1)
Re: constant time count(*) ?

Hi Mark,

Mark Harrison wrote:

We're looking into moving some data from mysql to postgresql, and
notice that count(*) does not seem to be a constant-time function
as it seems to be in mysql.

planb=# explain select count(*) from assets;
QUERY PLAN
----------------------------------------------------------------
Aggregate (cost=22.50..22.50 rows=1 width=0)
-> Seq Scan on assets (cost=0.00..20.00 rows=1000 width=0)
(2 rows)

Is there a way to optimize count(*) such that it does not have
to do a sequential scan? We use this on some big tables and it
is slowing down processing quite a lot.

How do you need an unqualified
select count(*) on a table so often
it is making a problem?

Regards
Tino