Group by, count, order by and limit
My 3rd attempt to post ...
Consider this query on a large table with lots of different IDs:
SELECT id FROM my_table GROUP BY id ORDER BY count(id) LIMIT 10;
It has an index on id. Obviously, the index helps to evaluate count(id)
for a given value of id, but the count()s for all the ids have to be
evaluated, so the sort will take most of the time.
Is there a way to improve the performance of this query? If not, please
give some indication of how to do a workaround in the source itself, so perhaps
I may be able to come up with a patch.
Thanks in advance.
Anuradha
--
Debian GNU/Linux (kernel 2.4.21-pre4)
There are three ways to get something done:
(1) Do it yourself.
(2) Hire someone to do it for you.
(3) Forbid your kids to do it.
On Tuesday 18 Feb 2003 9:56 am, you wrote:
> My 3rd attempt to post ...
> Consider this query on a large table with lots of different IDs:
> SELECT id FROM my_table GROUP BY id ORDER BY count(id) LIMIT 10;
> It has an index on id. Obviously, the index helps to evaluate count(id)
> for a given value of id, but the count()s for all the ids have to be
> evaluated, so the sort will take most of the time.
First, what does EXPLAIN ANALYZE say?
Second, a wild shot: how much difference does it make with different sort_mem
settings?
Shridhar
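A minimal sketch of the two checks suggested above, as they might be run in a psql session on a 7.x server. The sort_mem value is an arbitrary illustration, not a recommendation; it is given in kilobytes, and SET changes it for the current session only.

```sql
-- Show the chosen plan together with actual run times for the query
-- from the original post.
EXPLAIN ANALYZE
SELECT id FROM my_table GROUP BY id ORDER BY count(id) LIMIT 10;

-- Raise the memory available to each sort for this session only.
-- 65536 (kilobytes) is an assumed figure chosen purely for illustration.
SET sort_mem = 65536;

-- Run the same query again and compare the time reported for the Sort step.
EXPLAIN ANALYZE
SELECT id FROM my_table GROUP BY id ORDER BY count(id) LIMIT 10;

-- Go back to the server default.
RESET sort_mem;
```

Comparing the two outputs should indicate whether the sort really dominates the run time and whether more sort memory makes a noticeable difference.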
> Consider this query on a large table with lots of different IDs:
> SELECT id FROM my_table GROUP BY id ORDER BY count(id) LIMIT 10;
> It has an index on id. Obviously, the index helps to evaluate count(id)
> for a given value of id, but the count()s for all the ids have to be
> evaluated, so the sort will take most of the time.
> Is there a way to improve the performance of this query? If not, please
> give some indication of how to do a workaround in the source itself, so perhaps
> I may be able to come up with a patch.
Is there a difference in performance if you re-write it as
SELECT id, count(id) FROM my_table GROUP BY id ORDER BY 2 LIMIT 10 ;
?
Regards, Christoph
On Tue, Feb 18, 2003 at 10:26:46 +0600,
Anuradha Ratnaweera <anuradha@lklug.pdn.ac.lk> wrote:
> My 3rd attempt to post ...
> Consider this query on a large table with lots of different IDs:
> SELECT id FROM my_table GROUP BY id ORDER BY count(id) LIMIT 10;
> It has an index on id. Obviously, the index helps to evaluate count(id)
> for a given value of id, but the count()s for all the ids have to be
> evaluated, so the sort will take most of the time.
> Is there a way to improve the performance of this query? If not, please
> give some indication of how to do a workaround in the source itself, so perhaps
> I may be able to come up with a patch.
In 7.4 there is a hash method that can be used for aggregates. This
may help a lot in your case if there aren't a lot of distinct IDs.
7.4 is a long way from even a beta, but you still might want to play with
it to see if it will solve your problem down the road.
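One way to see whether the planner picks the hash method for this query is sketched below, assuming a 7.4 development build with the table already loaded. The enable_hashagg setting and the HashAggregate / GroupAggregate node names are the ones used by 7.4; the comparison itself is only an illustration.

```sql
-- Plan with the planner defaults. A HashAggregate node means the counts
-- are accumulated in a hash table keyed by id.
EXPLAIN
SELECT id FROM my_table GROUP BY id ORDER BY count(id) LIMIT 10;

-- Turn hash aggregation off for this session to see the alternative
-- plan (GroupAggregate) for comparison.
SET enable_hashagg = off;

EXPLAIN
SELECT id FROM my_table GROUP BY id ORDER BY count(id) LIMIT 10;

-- Restore the default planner behaviour.
RESET enable_hashagg;
```

In either plan the ORDER BY count(id) step still has to sort one row per distinct id, so the gain from hashing would come from the grouping step rather than from the final sort.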