count (DISTINCT expression [ , ... ] ) and documentation

Started by Ivan Sergio Borgonovoover 17 years ago13 messagesgeneral
Jump to latest
#1Ivan Sergio Borgonovo
mail@webthatworks.it

I noticed that starting from 8.2 the documentation at
http://www.postgresql.org/docs/8.2/interactive/sql-expressions.html
say that multiple distinct expressions are supported

aggregate_name (DISTINCT expression [, expression] )

While previous docs just listed one:

aggregate_name (DISTINCT expression)

Still I'm using 8.3 and

select count(distinct c1, c2) from table1;

report:

No function matches the given name and argument types. You might
need to add explicit type casts.

What should I write in spite of?

select count(distinct c1, c2) from table1;

--
Ivan Sergio Borgonovo
http://www.webthatworks.it

#2Pavel Stehule
pavel.stehule@gmail.com
In reply to: Ivan Sergio Borgonovo (#1)
Re: count (DISTINCT expression [ , ... ] ) and documentation

Hello

count has only one argument,

try:

postgres=# select * from fooa;
a | b
----+----
10 | 20
(1 row)

postgres=# select count(distinct a,b) from fooa;
ERROR: function count(integer, integer) does not exist
LINE 1: select count(distinct a,b) from fooa;
^
HINT: No function matches the given name and argument types. You
might need to add explicit type casts.
postgres=# select count(distinct (a,b)) from fooa;
count
-------
1
(1 row)

regards
Pavel Stehule

2008/12/26 Ivan Sergio Borgonovo <mail@webthatworks.it>:

Show quoted text

I noticed that starting from 8.2 the documentation at
http://www.postgresql.org/docs/8.2/interactive/sql-expressions.html
say that multiple distinct expressions are supported

aggregate_name (DISTINCT expression [, expression] )

While previous docs just listed one:

aggregate_name (DISTINCT expression)

Still I'm using 8.3 and

select count(distinct c1, c2) from table1;

report:

No function matches the given name and argument types. You might
need to add explicit type casts.

What should I write in spite of?

select count(distinct c1, c2) from table1;

--
Ivan Sergio Borgonovo
http://www.webthatworks.it

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#3Ivan Sergio Borgonovo
mail@webthatworks.it
In reply to: Pavel Stehule (#2)
Re: count (DISTINCT expression [ , ... ] ) and documentation

On Fri, 26 Dec 2008 15:46:48 +0100
"Pavel Stehule" <pavel.stehule@gmail.com> wrote:

count has only one argument,

then what was changed between 8.1 and 8.2 to change the docs?
None of the functions listed in:
http://www.postgresql.org/docs/8.2/static/functions-aggregate.html
seems to support
aggregate(distinct exp [,exp])

Does the change reflect the change in the possibility to write user
defined aggregates that support more then one distinct expression?

The first thing that comes to my mind to emulate
count(distinct a,b)
would be to

create table test.dist (a int, b int);
insert into test.dist values(1,0);
insert into test.dist values(1,0);
insert into test.dist values(1,1);
insert into test.dist values(0,0);
select count(*) from (select distinct a,b from test.dist ) a;

but still I can't think of anything that would work with
aggregate(distinct a,b)
not just count.

--
Ivan Sergio Borgonovo
http://www.webthatworks.it

#4Pavel Stehule
pavel.stehule@gmail.com
In reply to: Ivan Sergio Borgonovo (#3)
Re: count (DISTINCT expression [ , ... ] ) and documentation

2008/12/26 Ivan Sergio Borgonovo <mail@webthatworks.it>:

On Fri, 26 Dec 2008 15:46:48 +0100
"Pavel Stehule" <pavel.stehule@gmail.com> wrote:

count has only one argument,

then what was changed between 8.1 and 8.2 to change the docs?
None of the functions listed in:
http://www.postgresql.org/docs/8.2/static/functions-aggregate.html
seems to support
aggregate(distinct exp [,exp])

http://www.postgresql.org/docs/8.2/interactive/functions-aggregate.html

http://www.postgresql.org/docs/8.2/interactive/functions-aggregate.html#FUNCTIONS-AGGREGATE-STATISTICS-TABLE

Does the change reflect the change in the possibility to write user
defined aggregates that support more then one distinct expression?

CREATE AGGREGATE name ( input_data_type [ , ... ] ) (
SFUNC = sfunc,
STYPE = state_data_type
[ , FINALFUNC = ffunc ]
[ , INITCOND = initial_condition ]
[ , SORTOP = sort_operator ]
)

you are able to write multi param aggregates.

regards
Pavel Stehule

Show quoted text

The first thing that comes to my mind to emulate
count(distinct a,b)
would be to

create table test.dist (a int, b int);
insert into test.dist values(1,0);
insert into test.dist values(1,0);
insert into test.dist values(1,1);
insert into test.dist values(0,0);
select count(*) from (select distinct a,b from test.dist ) a;

but still I can't think of anything that would work with
aggregate(distinct a,b)
not just count.

--
Ivan Sergio Borgonovo
http://www.webthatworks.it

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#5Ivan Sergio Borgonovo
mail@webthatworks.it
In reply to: Pavel Stehule (#4)
Re: count (DISTINCT expression [ , ... ] ) and documentation

On Fri, 26 Dec 2008 16:23:52 +0100
"Pavel Stehule" <pavel.stehule@gmail.com> wrote:

2008/12/26 Ivan Sergio Borgonovo <mail@webthatworks.it>:

On Fri, 26 Dec 2008 15:46:48 +0100
"Pavel Stehule" <pavel.stehule@gmail.com> wrote:

count has only one argument,

then what was changed between 8.1 and 8.2 to change the docs?
None of the functions listed in:
http://www.postgresql.org/docs/8.2/static/functions-aggregate.html
seems to support
aggregate(distinct exp [,exp])

http://www.postgresql.org/docs/8.2/interactive/functions-aggregate.html

http://www.postgresql.org/docs/8.2/interactive/functions-aggregate.html#FUNCTIONS-AGGREGATE-STATISTICS-TABLE

yeah but no function seems to support

aggregate(distinct x, y)

CREATE AGGREGATE name ( input_data_type [ , ... ] ) (
SFUNC = sfunc,
STYPE = state_data_type
[ , FINALFUNC = ffunc ]
[ , INITCOND = initial_condition ]
[ , SORTOP = sort_operator ]
)

OK... but how am I going to implement an user defined aggregate that
support without resorting to C?

myaggfunc(distinct x, y)?

Otherwise to what is it referring

http://www.postgresql.org/docs/8.2/interactive/sql-expressions.html

aggregate_name (DISTINCT expression [ , ... ] )

--
Ivan Sergio Borgonovo
http://www.webthatworks.it

#6David Fetter
david@fetter.org
In reply to: Ivan Sergio Borgonovo (#1)
Re: count (DISTINCT expression [ , ... ] ) and documentation

On Fri, Dec 26, 2008 at 03:34:33PM +0100, Ivan Sergio Borgonovo wrote:

I noticed that starting from 8.2 the documentation at
http://www.postgresql.org/docs/8.2/interactive/sql-expressions.html
say that multiple distinct expressions are supported

aggregate_name (DISTINCT expression [, expression] )

In 8.4, you'll be able to do:

WITH d AS (
SELECT DISTINCT c1, c2 FROM table1
)
SELECT count(*) FROM d;

and very likely an OLAP version. :)

Cheers,
David.

While previous docs just listed one:

aggregate_name (DISTINCT expression)

Still I'm using 8.3 and

select count(distinct c1, c2) from table1;

report:

No function matches the given name and argument types. You might
need to add explicit type casts.

What should I write in spite of?

select count(distinct c1, c2) from table1;

--
Ivan Sergio Borgonovo
http://www.webthatworks.it

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

#7Pavel Stehule
pavel.stehule@gmail.com
In reply to: Ivan Sergio Borgonovo (#5)
Re: count (DISTINCT expression [ , ... ] ) and documentation

2008/12/26 Ivan Sergio Borgonovo <mail@webthatworks.it>:

On Fri, 26 Dec 2008 16:23:52 +0100
"Pavel Stehule" <pavel.stehule@gmail.com> wrote:

2008/12/26 Ivan Sergio Borgonovo <mail@webthatworks.it>:

On Fri, 26 Dec 2008 15:46:48 +0100
"Pavel Stehule" <pavel.stehule@gmail.com> wrote:

count has only one argument,

then what was changed between 8.1 and 8.2 to change the docs?
None of the functions listed in:
http://www.postgresql.org/docs/8.2/static/functions-aggregate.html
seems to support
aggregate(distinct exp [,exp])

http://www.postgresql.org/docs/8.2/interactive/functions-aggregate.html

http://www.postgresql.org/docs/8.2/interactive/functions-aggregate.html#FUNCTIONS-AGGREGATE-STATISTICS-TABLE

yeah but no function seems to support

aggregate(distinct x, y)

CREATE AGGREGATE name ( input_data_type [ , ... ] ) (
SFUNC = sfunc,
STYPE = state_data_type
[ , FINALFUNC = ffunc ]
[ , INITCOND = initial_condition ]
[ , SORTOP = sort_operator ]
)

OK... but how am I going to implement an user defined aggregate that
support without resorting to C?

myaggfunc(distinct x, y)?

Otherwise to what is it referring

http://www.postgresql.org/docs/8.2/interactive/sql-expressions.html

aggregate_name (DISTINCT expression [ , ... ] )

ok, I tested and it isn't supported yet. This is documentation bug.
DISTINCT is allowed only for single argument aggregate.

Regards
Pavel Stehule

Show quoted text

--
Ivan Sergio Borgonovo
http://www.webthatworks.it

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#8Ivan Sergio Borgonovo
mail@webthatworks.it
In reply to: David Fetter (#6)
Re: count (DISTINCT expression [ , ... ] ) and documentation

On Fri, 26 Dec 2008 10:43:25 -0800
David Fetter <david@fetter.org> wrote:

On Fri, Dec 26, 2008 at 03:34:33PM +0100, Ivan Sergio Borgonovo
wrote:

I noticed that starting from 8.2 the documentation at
http://www.postgresql.org/docs/8.2/interactive/sql-expressions.html
say that multiple distinct expressions are supported

aggregate_name (DISTINCT expression [, expression] )

In 8.4, you'll be able to do:

WITH d AS (
SELECT DISTINCT c1, c2 FROM table1
)
SELECT count(*) FROM d;

Nice, but what will be the difference from
select count(*) from (select distinct c1, c2 from t);
?
Optimisation?

Furthermore... I was actually looking at docs because I needed to
find a way supported by both postgresql and mysql and I've heard
that mysql is not that good at subselect and I doubt it supports
WITH AS. (OK not really a postgresql problem...).

Meanwhile what would you suggest as a general approach to stuff like

select count(distinct c1, c2) from t;

regardless of mysql support?
and considering mysql support?

I was thinking to find some way to exploit group by, but I didn't
come to anything useful yet.

and very likely an OLAP version. :)

What's "an OLAP version" of WITH d AS...

--
Ivan Sergio Borgonovo
http://www.webthatworks.it

#9David Fetter
david@fetter.org
In reply to: Ivan Sergio Borgonovo (#8)
Re: count (DISTINCT expression [ , ... ] ) and documentation

On Fri, Dec 26, 2008 at 08:03:30PM +0100, Ivan Sergio Borgonovo wrote:

On Fri, 26 Dec 2008 10:43:25 -0800
David Fetter <david@fetter.org> wrote:

On Fri, Dec 26, 2008 at 03:34:33PM +0100, Ivan Sergio Borgonovo
wrote:

I noticed that starting from 8.2 the documentation at
http://www.postgresql.org/docs/8.2/interactive/sql-expressions.html
say that multiple distinct expressions are supported

aggregate_name (DISTINCT expression [, expression] )

In 8.4, you'll be able to do:

WITH d AS (
SELECT DISTINCT c1, c2 FROM table1
)
SELECT count(*) FROM d;

Nice, but what will be the difference from
select count(*) from (select distinct c1, c2 from t);
?
Optimisation?

None especially.

Furthermore... I was actually looking at docs because I needed to
find a way supported by both postgresql and mysql

Generally, it's *not* a good idea to try to support more than one
back-end. You wind up maintaining several disparate code bases, all
of which must do exactly the same thing, or you create your own RDBMS
in your client code, or worst of all, some of each.

Unless the most important attribute of the software, i.e. you can
jettison any other feature to support it, is to support more than one
RDBMS back-end, don't even try. Examples of software which needs to
support multiple RDBMS back-ends include, and are pretty much limited
to, ERD generators and migration tools.

and very likely an OLAP version. :)

What's "an OLAP version" of WITH d AS...

OLAP includes clauses like WINDOW() and OVER(), but since it's not
committed yet, I don't want to get too far into it :)

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

#10Ivan Sergio Borgonovo
mail@webthatworks.it
In reply to: David Fetter (#9)
WITH AS vs subselect was: count (DISTINCT expression [ , ... ] ) and documentation

On Fri, 26 Dec 2008 12:04:48 -0800
David Fetter <david@fetter.org> wrote:

On Fri, Dec 26, 2008 at 08:03:30PM +0100, Ivan Sergio Borgonovo
wrote:

aggregate_name (DISTINCT expression [, expression] )

In 8.4, you'll be able to do:

WITH d AS (
SELECT DISTINCT c1, c2 FROM table1
)
SELECT count(*) FROM d;

Nice, but what will be the difference from
select count(*) from (select distinct c1, c2 from t);
?
Optimisation?

None especially.

So what would be the advantage compared to subselect?

Furthermore... I was actually looking at docs because I needed to
find a way supported by both postgresql and mysql

Generally, it's *not* a good idea to try to support more than one
back-end. You wind up maintaining several disparate code bases,

Not really my main target... I was just investigating if it could
come for free ;)

--
Ivan Sergio Borgonovo
http://www.webthatworks.it

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Ivan Sergio Borgonovo (#10)
Re: WITH AS vs subselect was: count (DISTINCT expression [ , ... ] ) and documentation

Ivan Sergio Borgonovo <mail@webthatworks.it> writes:

David Fetter <david@fetter.org> wrote:

In 8.4, you'll be able to do:

WITH d AS (
SELECT DISTINCT c1, c2 FROM table1
)
SELECT count(*) FROM d;

Nice, but what will be the difference from
select count(*) from (select distinct c1, c2 from t);
?
Optimisation?

None especially.

So what would be the advantage compared to subselect?

None, David just has WITH on the brain ;-)

The subselect syntax certainly seems like the one most likely to work
across different SQL implementations. WITH is a pretty recent addition
to the SQL spec, and DISTINCT with multiple aggregate arguments isn't
in the spec at all. The COUNT(DISTINCT ROW(x,y)) hack is a cute idea
but I'm dubious that that's portable either (it certainly doesn't work
in pre-8.4 PG).

regards, tom lane

#12David Fetter
david@fetter.org
In reply to: Tom Lane (#11)
Re: WITH AS vs subselect was: count (DISTINCT expression [ , ... ] ) and documentation

On Fri, Dec 26, 2008 at 07:13:48PM -0500, Tom Lane wrote:

Ivan Sergio Borgonovo <mail@webthatworks.it> writes:

David Fetter <david@fetter.org> wrote:

In 8.4, you'll be able to do:

WITH d AS (
SELECT DISTINCT c1, c2 FROM table1
)
SELECT count(*) FROM d;

Nice, but what will be the difference from
select count(*) from (select distinct c1, c2 from t);
?
Optimisation?

None especially.

So what would be the advantage compared to subselect?

None, David just has WITH on the brain ;-)

LOL!

You're only saying that because it's true ;)

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

#13Ivan Sergio Borgonovo
mail@webthatworks.it
In reply to: Tom Lane (#11)
subselect and count (DISTINCT expression [ , ... ] ) performances

On Fri, 26 Dec 2008 19:13:48 -0500
Tom Lane <tgl@sss.pgh.pa.us> wrote:

The subselect syntax certainly seems like the one most likely to
work across different SQL implementations. WITH is a pretty

subselects actually works on mysql too but on a 1M table with about
300K unique columns it performs more than 4 times slower than
select (distinct a,b) from table

18sec vs. 4sec

Times were similar for innodb and myisam.

Postgresql needs 17sec with subselect.

I didn't try to see how both db could perform with indexes.

mysql performance is impressive. I thought that most of the time
would be spent on "distinct" where postgresql shouldn't suffer from
its "count" implementation. But well still 300K rows to count on 1M
aren't few.

--
Ivan Sergio Borgonovo
http://www.webthatworks.it