Counting unique rows as an aggregate.

Started by r_mustaover 17 years ago9 messagesgeneral

Jump to latest

r_musta

zepolen@gmail.com

over 17 years ago

My current solution is to issue a bunch of queries:

SELECT make, count(*) FROM table WHERE >criteria< GROUP BY make ORDER BY count(*) DESC LIMIT 3;

make count
----------+---------
audi | 50
bmw | 40
vw | 30

SELECT color, count(*) FROM table WHERE >criteria< GROUP BY color
ORDER BY count(*) DESC LIMIT 3;

color count
-----------+------
red | 400
blue | 200
green | 100

Which will give me the top 3 counts of each column im interested in
for the >criteria< specified.

However, this is starting to become too slow (as there are about 10 of
these queries), and therefore I need to write an aggregate function
which lets me do:

SELECT count_unique(make), count_unique(color) from table WHERE >criteria<;

After reading about aggregate functions, this should be possible, as
long as I can use a dictionary/hashmap type for the state<STYPE>
argument.

Is there such a type in postgresql that can be used in an aggregate
function, and if there isn't, how would it be possible to make one?

Also, do you think I'm going about this the wrong way, and there is a
much better solution that's I've neglected?

Scott Marlowe

scott.marlowe@gmail.com

over 17 years ago

In reply to: r_musta (#1)

Re: Counting unique rows as an aggregate.

On Mon, Sep 29, 2008 at 12:12 PM, r_musta <zepolen@gmail.com> wrote:

However, this is starting to become too slow (as there are about 10 of
these queries), and therefore I need to write an aggregate function
which lets me do:

SELECT count_unique(make), count_unique(color) from table WHERE >criteria<;

After reading about aggregate functions, this should be possible, as
long as I can use a dictionary/hashmap type for the state<STYPE>
argument.

This might be a nice fit for materialized views. While they're not
built in, PostgreSQL's extensibility allows you to build them prettily
easily.

http://jonathangardner.net/tech/w/PostgreSQL/Materialized_Views

Jonathan Gardner's web page on it is fantastic.

Tom Lane

tgl@sss.pgh.pa.us

over 17 years ago

In reply to: r_musta (#1)

Re: Counting unique rows as an aggregate.

r_musta <zepolen@gmail.com> writes:

However, this is starting to become too slow (as there are about 10 of
these queries), and therefore I need to write an aggregate function
which lets me do:

SELECT count_unique(make), count_unique(color) from table WHERE >criteria<;

I must be missing something, because I don't see why you couldn't do

SELECT count(distinct make), count(distinct color) from table WHERE >criteria<;

regards, tom lane

Richard Broersma

richard.broersma@gmail.com

over 17 years ago

In reply to: Tom Lane (#3)

Re: Counting unique rows as an aggregate.

On Mon, Sep 29, 2008 at 4:36 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

SELECT count(distinct make), count(distinct color) from table WHERE >criteria<;

Is this in the SQL spec? I didn't know Agg functions could do this?

--
Regards,
Richard Broersma Jr.

Visit the Los Angeles PostgreSQL Users Group (LAPUG)
http://pugs.postgresql.org/lapug

Klint Gore

kgore4@une.edu.au

over 17 years ago

In reply to: Richard Broersma (#4)

Re: Counting unique rows as an aggregate.

Richard Broersma wrote:

On Mon, Sep 29, 2008 at 4:36 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

SELECT count(distinct make), count(distinct color) from table WHERE >criteria<;

Is this in the SQL spec? I didn't know Agg functions could do this?

Yes. SQL92 6.5

<set function specification> ::=
COUNT <left paren> <asterisk> <right paren>
| <general set function>

<set function type> ::=
AVG | MAX | MIN | SUM | COUNT

<set quantifier> ::= DISTINCT | ALL

I never realised that you could use it for more than count though. In
8.3.3, it worked for sum/avg/stddev/variance.

klint.

--
Klint Gore
Database Manager
Sheep CRC
A.G.B.U.
University of New England
Armidale NSW 2350

Ph: 02 6773 3789
Fax: 02 6773 3266
EMail: kgore4@une.edu.au

r_musta

zepolen@gmail.com

over 17 years ago

In reply to: r_musta (#1)

Re: Counting unique rows as an aggregate.

On Sep 30, 2:36 am, t...@sss.pgh.pa.us (Tom Lane) wrote:

SELECT count_unique(make), count_unique(color) from table WHERE >criteria<;

I must be missing something, because I don't see why you couldn't do
SELECT count(distinct make), count(distinct color) from table WHERE >criteria<;

I didn't explain well, I want the count of each distinct value in a
column, eg, if the color column has 50 rows, 20x'red', 10x'green',
20x'blue' - it will give me those results.

SELECT count(distinct color) would return 3 - which is the count of
distinct values, which is not what I want.

r_musta

zepolen@gmail.com

over 17 years ago

In reply to: r_musta (#1)

Re: Counting unique rows as an aggregate.

On Sep 29, 11:25 pm, scott.marl...@gmail.com ("Scott Marlowe") wrote:

However, this is starting to become too slow (as there are about 10 of
these queries), and therefore I need to write an aggregate function
which lets me do:

SELECT count_unique(make), count_unique(color) from table WHERE >criteria<;

After reading about aggregate functions, this should be possible, as
long as I can use a dictionary/hashmap type for the state<STYPE>
argument.

This might be a nice fit for materialized views. While they're not
built in, PostgreSQL's extensibility allows you to build them prettily
easily.

http://jonathangardner.net/tech/w/PostgreSQL/Materialized_Views

Materialized views won't work as the criteria used to narrow the
search is very arbritrary and rarely repeated, and as the count's
depend on the current result set, they would offer a miniscule
increase in speed, only as the row width would be a little smaller.

Jan Otto

asche@me.com

over 17 years ago

In reply to: r_musta (#6)

Re: Counting unique rows as an aggregate.

I must be missing something, because I don't see why you couldn't do
SELECT count(distinct make), count(distinct color) from table WHERE

criteria<;

I didn't explain well, I want the count of each distinct value in a
column, eg, if the color column has 50 rows, 20x'red', 10x'green',
20x'blue' - it will give me those results.

SELECT count(distinct color) would return 3 - which is the count of
distinct values, which is not what I want.

maybe this simple example helps you:

# create table colors (color text);
CREATE TABLE
# insert into colors select 'red';
INSERT 0 1
# insert into colors select 'red';
INSERT 0 1
# insert into colors select 'green';
INSERT 0 1
# insert into colors select 'yellow';
INSERT 0 1
# insert into colors select 'yellow';
INSERT 0 1
# insert into colors select 'yellow';
INSERT 0 1
# select count(1), color from colors group by color;
count | color
-------+--------
2 | red
3 | yellow
1 | green
(3 rows)

Jan

Lennin Caro

lennin.caro@yahoo.com

over 17 years ago

In reply to: r_musta (#6)

Re: Counting unique rows as an aggregate.

--- On Tue, 9/30/08, r_musta <zepolen@gmail.com> wrote:

From: r_musta <zepolen@gmail.com>
Subject: Re: [GENERAL] Counting unique rows as an aggregate.
To: pgsql-general@postgresql.org
Date: Tuesday, September 30, 2008, 6:55 AM
On Sep 30, 2:36 am, t...@sss.pgh.pa.us (Tom Lane) wrote:

SELECT count_unique(make), count_unique(color)

from table WHERE >criteria<;

I must be missing something, because I don't see

why you couldn't do

SELECT count(distinct make), count(distinct color)

from table WHERE >criteria<;

I didn't explain well, I want the count of each
distinct value in a
column, eg, if the color column has 50 rows,
20x'red', 10x'green',
20x'blue' - it will give me those results.

SELECT count(distinct color) would return 3 - which is the
count of
distinct values, which is not what I want.

SELECT count(color),color from table group by color