BUG #14011: select count(distinct column) does not utilizes indices to improve performance

Started by Glen Takahashiabout 10 years ago2 messagesbugs
Jump to latest
#1Glen Takahashi
gtakahashi@palantir.com

The following bug has been logged on the website:

Bug reference: 14011
Logged by: Glen Takahashi
Email address: gtakahashi@palantir.com
PostgreSQL version: 9.3.6
Operating system: Red Hat Enterprise Linux 6.7
Description:

Doing a select count(distinct column) does not utilize indices and takes a
very long time to compute.
For example

Create table test_table (number integer);
Create index on test_table (number);
Insert into test_table (number) select generate_series(1,100000); (I did
this ~500 times to replicate the scale of data we have)
Select count(distinct number) from test_table;
Count 100000 Time: 63419.600ms
Select count(1) from (select distinct number from test_table) a;
Count 100000 Time: 10743.186 ms

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Glen Takahashi (#1)
Re: BUG #14011: select count(distinct column) does not utilizes indices to improve performance

gtakahashi@palantir.com writes:

Doing a select count(distinct column) does not utilize indices

Nope, it does not. Don't hold your breath waiting for that to change;
it might sometime, but nobody is working on it that I know of.

You'd have better luck with a "SELECT count(*) GROUP BY column"
type of query, I expect. (Most likely that won't use an index either,
BTW, but it'll be smarter than count(distinct).)

regards, tom lane

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs