count(*) slow on large tables
Hi,
I have a somewhat large table, 3 million rows, 1 Gig on disk, and growing. Doing a
count(*) takes around 40 seconds.
Looks like the count(*) fetches the table from disk and goes through it.
Made me wonder why the optimizer doesn't just choose the smallest index,
which in my case is around 60 Megs, and go through that instead, which it
could do in a fraction of the time.
Dror
--
Dror Matalon
Zapatec Inc
1700 MLK Way
Berkeley, CA 94709
http://www.zapatec.com
Just like other aggregate functions, count(*) won't use indexes when
counting the whole table.
Regards,
Tomasz Myrta
On Thu, Oct 02, 2003 at 12:15:47 -0700,
Dror Matalon <dror@zapatec.com> wrote:
Because it can't tell from the index if a tuple is visible to the current
transaction and would still have to hit the table to check this. So that
performance would be a lot worse instead of better.
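For illustration, here is a minimal sketch (table and column names are hypothetical) of what this means in practice: even with an index available, an unqualified count has to visit the heap to check each tuple's visibility, so the planner picks a sequential scan.

```sql
-- Hypothetical table; names are illustrative only.
CREATE TABLE items (id integer PRIMARY KEY, payload text);

-- The count must check tuple visibility in the heap anyway,
-- so scanning the (smaller) index would only add work:
EXPLAIN SELECT count(*) FROM items;
-- expect an Aggregate over a Seq Scan on items, not an Index Scan
```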
On Thu, Oct 02, 2003 at 12:46:45 -0700,
Dror Matalon <dror@zapatec.com> wrote:
Please keep replies copied to the list.
When would it happen that a tuple be invisible to the current
transaction? Are we talking about permissions?
They could be tuples that were changed by a transaction that hasn't committed
or in the case of serializable isolation, a transaction that committed after
the current transaction started.
That's one of the drawbacks of MVCC.
I once suggested that the transaction number and other housekeeping
info be included in the index, but was told to forget it...
It would solve once and for all the issue of seq_scan vs index_scan.
It would simplify the aggregate problem.
jllachan@nsd.ca (Jean-Luc Lachance) writes:
That's one of the drawbacks of MVCC.
I once suggested that the transaction number and other housekeeping
info be included in the index, but was told to forget it...
It would solve once and for all the issue of seq_scan vs index_scan.
It would simplify the aggregate problem.
It would only simplify _one_ case, namely the case where someone cares
about the cardinality of a relation, and it would do that at
_considerable_ cost.
A while back I outlined how this would have to be done, and for it to
be done efficiently, it would be anything BUT simple.
It would be very hairy to implement it correctly, and all this would
cover is the single case of "SELECT COUNT(*) FROM SOME_TABLE;"
If you had a single WHERE clause attached, you would have to revert to
walking through the tuples looking for the ones that are live and
committed, which is true for any DBMS.
And it still begs the same question, of why the result of this query
would be particularly meaningful to anyone. I don't see the
usefulness; I don't see the value of going to the considerable effort
of "fixing" this purported problem.
--
let name="cbbrowne" and tld="libertyrms.info" in String.concat "@" [name;tld];;
<http://dev6.int.libertyrms.com/>
Christopher Browne
(416) 646 3304 x124 (land)
I don't have an opinion on how hard it would be to implement the
tracking in the indexes, but "select count(*) from some table" is, in my
experience, a query that people tend to run quite often.
One of the databases that I've used, I believe it was Informix, had that
info cached so that it always knew how many rows there were in any
table. It was quite useful.
The world rejoiced as dror@zapatec.com (Dror Matalon) wrote:
I don't have an opinion on how hard it would be to implement the
tracking in the indexes, but "select count(*) from some table" is, in my
experience, a query that people tend to run quite often.
One of the databases that I've used, I believe it was Informix, had that
info cached so that it always knew how many rows there were in any
table. It was quite useful.
I can't imagine why the raw number of tuples in a relation would be
expected to necessarily be terribly useful.
I'm involved with managing Internet domains, and it's only when people
are being pretty clueless that anyone imagines that "select count(*)
from domains;" would be of any use to anyone. There are enough "test
domains" and "inactive domains" and other such things that the raw
number of "things in the table" isn't really of much use.
- I _do_ care how many pages a table occupies, to some extent, as that
determines whether it will fit in my disk space or not, but that's not
COUNT(*).
- I might care about auditing the exact numbers of records in order to
be assured that a data conversion process was done correctly. But in
that case, I want to do something a whole *lot* more detailed than
mere COUNT(*).
I'm playing "devil's advocate" here, to some extent. But
realistically, there is good reason to be skeptical of the merits of
using SELECT COUNT(*) FROM TABLE for much of anything.
Furthermore, the relation that you query mightn't be a physical
"table." It might be a more virtual VIEW, and if that's the case,
bets are even MORE off. If you go with the common dictum of "good
design" that users don't directly access tables, but go through VIEWs,
users may have no way to get at SELECT COUNT(*) FROM TABLE.
--
output = reverse("ac.notelrac.teneerf" "@" "454aa")
http://www.ntlug.org/~cbbrowne/finances.html
Rules of the Evil Overlord #74. "When I create a multimedia
presentation of my plan designed so that my five-year-old advisor can
easily understand the details, I will not label the disk "Project
Overlord" and leave it lying on top of my desk."
<http://www.eviloverlord.com/>
I smell a religious war in the air :-).
Can you go several days in a row without doing select count(*) on any
of your tables?
I suspect that this is somewhat a domain specific issue. In some areas
you don't need to know the total number of rows in your tables, in
others you do.
I also suspect that you're right, that end user applications don't use
this information as often as DBAs would. On the other hand, it seems
whenever you want to optimize your app (something relevant to this list),
one of the things you do need to know is the number of rows in your
table.
Dror
Christopher Browne <cbbrowne@libertyrms.info> writes:
It would be very hairy to implement it correctly, and all this would
cover is the single case of "SELECT COUNT(*) FROM SOME_TABLE;"
If you had a single WHERE clause attached, you would have to revert to
walking through the tuples looking for the ones that are live and
committed, which is true for any DBMS.
Well it would be handy for a few other cases as well.
1 It would be useful for the case where you have a partial index with a
matching where clause. The optimizer already considers using such indexes
but it has to pay the cost of the tuple lookup, which is substantial.
2 It would be useful for the very common queries of the form
WHERE x IN (select id from foo where some_indexed_expression)
(Or the various equivalent forms including outer joins that test to see if
the matching record was found and don't retrieve any other columns in the
select list.)
3 It would be useful for many-many relationships where the intermediate table
has only the two primary key columns being joined. If you create a
multi-column index on the two columns it shouldn't need to look up the
tuple. This would effectively be nearly equivalent to an "index organized
table".
4 It would be useful for just about all the referential integrity queries...
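Case 3 above might look like the following sketch (schema hypothetical); with visibility info in the index, joins in either direction could in principle be answered from the indexes alone, never touching the link table's heap:

```sql
-- Hypothetical many-to-many link table holding only the two keys.
CREATE TABLE book_author (
    book_id   integer NOT NULL,
    author_id integer NOT NULL,
    PRIMARY KEY (book_id, author_id)
);
-- Reverse-direction index so lookups by author are also index-friendly.
CREATE INDEX book_author_author_idx ON book_author (author_id, book_id);

-- The join touches only columns present in the indexes:
SELECT b.title
  FROM books b
  JOIN book_author ba ON ba.book_id = b.id
 WHERE ba.author_id = 42;
```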
I don't mean to say this is definitely a good thing. The tradeoff in
complexity and time to maintain the index pages would be large. But don't
dismiss it as purely a count(*) optimization hack.
I know Oracle is capable of it, and it can speed up your query a lot when you
remove that last unnecessary column from a join table, allowing Oracle to skip
the step of reading the table.
--
greg
Dror Matalon wrote:
I smell a religious war in the air :-).
Can you go several days in a row without doing select count(*) on any
of your tables?
I suspect that this is somewhat a domain specific issue. In some areas
you don't need to know the total number of rows in your tables, in
others you do.
If I were you, I would have an autovacuum daemon running and, rather than doing
select count(*), I would look at the stats generated by vacuums. They give an
approximate number of tuples, and it should be good enough if it is accurate to
within a percent.
Just another approach to achieving the same thing. Don't be religious about
running a query from the SQL prompt. That's it.
Shridhar
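The vacuum-maintained estimate Shridhar mentions can be read directly from the system catalog, along the lines of (table name is a placeholder):

```sql
-- Approximate row count and page count as of the last VACUUM/ANALYZE;
-- cheap to read, but only as fresh as the last vacuum.
SELECT relname, reltuples, relpages
  FROM pg_class
 WHERE relname = 'mytable';
```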
Oops! dror@zapatec.com (Dror Matalon) was seen spray-painting on a wall:
I smell a religious war in the air :-).
Can you go several days in a row without doing select count(*) on any
of your tables?
I would be more likely, personally, to run "VACUUM VERBOSE ANALYZE",
which has useful side-effects :-).
I suspect that this is somewhat a domain specific issue. In some
areas you don't need to know the total number of rows in your
tables, in others you do.
"Relationship tables," which don't contain data in their own right,
but which, instead, link together records in other tables, are likely
to have particularly useless COUNT(*)'s.
I also suspect that you're right, that end user applications don't
use this information as often as DBAs would. On the other hand, it
seems whenever you want to optimize your app (something relevant to
this list), one of the things you do need to know is the number of
rows in your table.
Ah, but in the case of optimization, there's little need for
"transactionally safe, MVCC-managed, known-to-be-exact" values.
Approximations are plenty good enough to get the right plan.
Furthermore, it's not the number of rows that is most important when
optimizing queries; the number of pages is more relevant to the
matter, as that's what the database is slinging around.
--
(reverse (concatenate 'string "ac.notelrac.teneerf" "@" "454aa"))
http://www3.sympatico.ca/cbbrowne/multiplexor.html
Rules of the Evil Overlord #134. "If I am escaping in a large truck
and the hero is pursuing me in a small Italian sports car, I will not
wait for the hero to pull up along side of me and try to force him off
the road as he attempts to climb aboard. Instead I will slam on the
brakes when he's directly behind me. (A rudimentary knowledge of
physics can prove quite useful.)" <http://www.eviloverlord.com/>
On Thu, 2 Oct 2003, Christopher Browne wrote:
I can't imagine why the raw number of tuples in a relation would be
expected to necessarily be terribly useful.
We use stuff like that for reporting queries.
example:
On our message boards each post is a row. The powers that be like to know
how many posts there are in total (in addition to 'today') -
select count(*) from posts is how it has been
done on our Informix db. With our port to PG I instead select reltuples
from pg_class.
I know when I log in to a new db (or a db unknown to me) the first thing I do
is look at the tables and see what sort of data there is... but in code I'd
rarely do that.
I know some monitoring things around here also do a select count(*) on
sometable to ensure it is growing, but like you said, this is easily done
with the number of pages as well.
Yes. Informix caches this data. I believe Oracle does too.
Mysql with InnoDB does the same thing PG does. (MyISAM caches it)
--
Jeff Trout <jeff@jefftrout.com>
http://www.jefftrout.com/
http://www.stuarthamm.net/
Well I can think of many more cases where it would be useful:
SELECT COUNT(DISTINCT x) FROM ...
SELECT COUNT(*) FROM ... WHERE x = ?
Also having transaction number (visibility) would tip the balance more
toward index_scan than seq_scan because you do not have to look up
visibility in the data file. We all know this has been an issue many
times.
Having a different index file structure when the index is not UNIQUE
would help too.
The last page of a non unique index could hold more stats.
In the last exciting episode, jllachan@nsd.ca (Jean-Luc Lachance) wrote:
Well I can think of many more cases where it would be useful:
SELECT COUNT(DISTINCT x) FROM ...
SELECT COUNT(*) FROM ... WHERE x = ?
Those are precisely the cases that the "other databases" ALSO fall
down on.
Maintaining those sorts of statistics would lead [in _ANY_ database;
PostgreSQL has no disadvantage in this] to needing for each and every
update to update a whole host of statistic values.
It would be fairly reasonable to have a trigger, in PostgreSQL, to
manage this sort of information. It would not be outrageously
difficult to substantially improve performance of queries, at the
considerable cost that each and every update would have to update a
statistics table.
If you're doing a whole lot of these sorts of queries, then it is a
reasonable idea to create appropriate triggers for the (probably very
few) tables where you are doing these counts.
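A sketch of such a trigger-maintained counter (all names hypothetical, using the era's quoted-body plpgsql syntax); note that every insert and delete now pays for an extra update of the statistics row:

```sql
-- One row per counted table.
CREATE TABLE row_counts (tablename text PRIMARY KEY, n bigint NOT NULL);
INSERT INTO row_counts VALUES ('mytable', 0);

CREATE FUNCTION count_trig() RETURNS trigger AS '
BEGIN
    IF TG_OP = ''INSERT'' THEN
        UPDATE row_counts SET n = n + 1 WHERE tablename = TG_RELNAME;
        RETURN NEW;
    ELSE
        UPDATE row_counts SET n = n - 1 WHERE tablename = TG_RELNAME;
        RETURN OLD;
    END IF;
END;' LANGUAGE plpgsql;

CREATE TRIGGER mytable_count AFTER INSERT OR DELETE ON mytable
    FOR EACH ROW EXECUTE PROCEDURE count_trig();

-- The count is now a single-row lookup:
SELECT n FROM row_counts WHERE tablename = 'mytable';
```

One caveat: all concurrent writers to mytable now serialize on the single row_counts row, which is exactly the update cost being warned about here.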
But the notion that this should automatically be applied to all tables
always is a dangerous one. It would make update performance Suck
Badly, because the extra statistical updates would be quite expensive.
--
wm(X,Y):-write(X),write('@'),write(Y). wm('cbbrowne','cbbrowne.com').
http://www3.sympatico.ca/cbbrowne/multiplexor.html
I'm sorry Dave, I can't let you do that.
Why don't you lie down and take a stress pill?
Christopher Browne wrote on Fri, 03.10.2003 at 00:57:
jllachan@nsd.ca (Jean-Luc Lachance) writes:
That's one of the drawbacks of MVCC.
I once suggested that the transaction number and other housekeeping
info be included in the index, but was told to forget it...
It would solve once and for all the issue of seq_scan vs index_scan.
It would simplify the aggregate problem.
It would only simplify _one_ case, namely the case where someone cares
about the cardinality of a relation, and it would do that at
_considerable_ cost.
A while back I outlined how this would have to be done, and for it to
be done efficiently, it would be anything BUT simple.
Could this be made a TODO item, perhaps with your attack plan?
Of course, as a strictly optional feature useful only for special situations
(see below).
I cross-post this to [HACKERS] as it seems relevant to a problem recently
discussed there.
It would be very hairy to implement it correctly, and all this would
cover is the single case of "SELECT COUNT(*) FROM SOME_TABLE;"
Not really. Just yesterday there was a discussion on [HACKERS] about
implementing btree-organized tables, which would be much less needed if
the visibility info were kept in indexes.
If you had a single WHERE clause attached, you would have to revert to
walking through the tuples looking for the ones that are live and
committed, which is true for any DBMS.
If the WHERE clause could use the same index (or any index with
visibility info) then there would be no need for "walking through the
tuples" in data relation.
The typical use case cited on [HACKERS] was time series data, where
inserts are roughly in (timestamp, id) order but queries in (id, timestamp)
order. Now if the index included all relevant fields
(id, timestamp, data1, data2, ..., dataN) then the query could run on the
index only, touching just a few pages and thus vastly improving performance.
I agree that this is not something everybody needs, but when it is needed
it is needed badly.
And it still begs the same question, of why the result of this query
would be particularly meaningful to anyone. I don't see the
usefulness; I don't see the value of going to the considerable effort
of "fixing" this purported problem.
Being able to do fast count(*) is just a side benefit.
----------------
Hannu
On our message boards each post is a row. The powers that be like to know
how many posts there are in total (in addition to 'today') -
select count(*) from posts is how it has been
done on our Informix db. With our port to PG I instead select reltuples
from pg_class.
We have exactly the same situation, except we just added a 'num_replies'
field to each thread and a 'num_posts' field to each forum, so that
getting that information out is a very fast operation. Because, of
course, there are hundreds of times more reads of that information than
writes...
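That denormalization might look like the following sketch (schema hypothetical), with the counters kept in step inside the same transaction as the insert so readers never see them drift:

```sql
BEGIN;
INSERT INTO posts (thread_id, body) VALUES (123, '...');
-- Keep the precomputed counters consistent with the new row:
UPDATE threads SET num_replies = num_replies + 1 WHERE id = 123;
UPDATE forums  SET num_posts   = num_posts   + 1
 WHERE id = (SELECT forum_id FROM threads WHERE id = 123);
COMMIT;
```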
Chris
We do have a TODO item:
* Consider using MVCC to cache count(*) queries with no WHERE clause
The idea is to cache a recent count of the table, then have
insert/delete add +/- records to the count. A COUNT(*) would get the
main cached record plus any visible +/- records. This would allow the
count to return the proper value depending on the visibility of the
requesting transaction, and it would require _no_ heap or index scan.
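A rough SQL-level sketch of that idea (the real TODO item would live inside the backend; all names here are hypothetical):

```sql
-- Cached base count plus per-change delta records.
CREATE TABLE count_cache  (tablename text PRIMARY KEY, base bigint);
CREATE TABLE count_deltas (tablename text, delta integer);

-- Inserts and deletes append +1/-1 delta rows instead of
-- locking a shared counter:
--   INSERT INTO count_deltas VALUES ('mytable', +1);

-- A reader sums the base plus whatever delta rows are visible
-- to its own snapshot, so MVCC rules yield a per-transaction answer:
SELECT base + coalesce((SELECT sum(delta) FROM count_deltas
                         WHERE tablename = 'mytable'), 0)
  FROM count_cache
 WHERE tablename = 'mytable';
```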
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
Hannu Krosing <hannu@tm.ee> writes:
Christopher Browne wrote on Fri, 03.10.2003 at 00:57:
A while back I outlined how this would have to be done, and for it to
be done efficiently, it would be anything BUT simple.
Could this be made a TODO item, perhaps with your attack plan.
If I recall that discussion correctly, no one including Christopher
thought the attack plan was actually reasonable.
What this keeps coming down to is that an optimization that helps only
COUNT(*)-of-one-table-with-no-WHERE-clause would be too expensive in
development and maintenance effort to justify its existence.
At least if you insist on an exact, MVCC-correct answer. So far as I've
seen, the actual use cases for unqualified COUNT(*) could be handled
equally well by an approximate answer. What we should be doing rather
than wasting large amounts of time trying to devise exact solutions is
telling people to look at pg_class.reltuples for approximate answers.
We could also be looking at beefing up support for that approach ---
maybe provide some syntactic sugar for the lookup, maybe see if we can
update reltuples in more places than we do now, make sure that the
autovacuum daemon includes "keep reltuples accurate" as one of its
design goals, etc etc.
regards, tom lane
Bruce Momjian <pgman@candle.pha.pa.us> writes:
We do have a TODO item:
* Consider using MVCC to cache count(*) queries with no WHERE clause
The idea is to cache a recent count of the table, then have
insert/delete add +/- records to the count. A COUNT(*) would get the
main cached record plus any visible +/- records. This would allow the
count to return the proper value depending on the visibility of the
requesting transaction, and it would require _no_ heap or index scan.
... and it would give the wrong answers. Unless the cache is somehow
snapshot-aware, so that it can know which other transactions should be
included in your count.
regards, tom lane