Aggregate Function to return most common value for a column
Hi Experts,
I want to use an aggregate function that will return the most commonly
occurring value in a column.
The column consists of VARCHAR(32) values.
Is it possible to construct such an aggregate using PL/PgSql ?
If I was trying to do something like this in Perl I would use a hash
table to store the values and the number of times each was seen as the
table was iterated, but PL/PgSql does not seem to have an appropriate
data type for that?
I don't want to use PL/Perl to avoid the overhead of starting a perl
interpreter for that.
Do I have to write the function in C maybe?
Thanks in advance,
Regards
Mike Harris
Hello
I thing, so the function in C is the best solution. And I thing, so
you can use PostgreSQL functionality inside. Pg support hash arrays
and hashing function too.
regards
Pavel Stehule
2009/5/22 Michael Harris <michael.harris@ericsson.com>:
Show quoted text
Hi Experts,
I want to use an aggregate function that will return the most commonly
occurring value in a column.The column consists of VARCHAR(32) values.
Is it possible to construct such an aggregate using PL/PgSql ?
If I was trying to do something like this in Perl I would use a hash
table to store the values and the number of times each was seen as the
table was iterated, but PL/PgSql does not seem to have an appropriate
data type for that?I don't want to use PL/Perl to avoid the overhead of starting a perl
interpreter for that.Do I have to write the function in C maybe?
Thanks in advance,
Regards
Mike Harris--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On Fri, 22 May 2009 17:48:44 +1000
"Michael Harris" <michael.harris@ericsson.com> wrote:
Hi Experts,
I want to use an aggregate function that will return the most
commonly occurring value in a column.The column consists of VARCHAR(32) values.
Is it possible to construct such an aggregate using PL/PgSql ?
If I was trying to do something like this in Perl I would use a
hash table to store the values and the number of times each was
seen as the table was iterated, but PL/PgSql does not seem to have
an appropriate data type for that?I don't want to use PL/Perl to avoid the overhead of starting a
perl interpreter for that.Do I have to write the function in C maybe?
Isn't it a job for group by?
select count(*), myvalue from table group by myvalue order by
count(*) desc limit 1;
--
Ivan Sergio Borgonovo
http://www.webthatworks.it
I want to use an aggregate function that will return the most commonly
occurring value in a column.
It's actually dead simple in Postgres. No C either. You just need to create an aggregate function.
I wrote a most() aggregate a while back that does exactly what you are asking for. Here, I'll add it to my blog...
http://scottrbailey.wordpress.com/2009/05/22/postgres-adding-custom-aggregates-most/
Scott Bailey
On Fri, May 22, 2009 at 03:23:07PM +0000, artacus@comcast.net wrote:
I want to use an aggregate function that will return the most
commonly occurring value in a column.It's actually dead simple in Postgres. No C either. You just need to
create an aggregate function. I wrote a most() aggregate a while
back that does exactly what you are asking for. Here, I'll add it to
my blog...http://scottrbailey.wordpress.com/2009/05/22/postgres-adding-custom-aggregates-most/
Such an aggregate should probably be called, "mode," that being the
probability/statistics name for the concept.
Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
On Fri, May 22, 2009 at 03:23:07PM +0000, artacus@comcast.net wrote:
I want to use an aggregate function that will return the most
commonly occurring value in a column.It's actually dead simple in Postgres. No C either. You just need to
create an aggregate function. I wrote a most() aggregate a while
back that does exactly what you are asking for. Here, I'll add it to
my blog...http://scottrbailey.wordpress.com/2009/05/22/postgres-adding-custom-aggregates-most/
Such an aggregate should probably be called, "mode," that being the
probability/statistics name for the concept.
Excellent observation Dave. Sometimes I can't see outside of the box I'm in. And at the time I was focusing on text so statistics was in another box. I've update post with final functions for mode(), median() and range().
Scott
On Fri, May 22, 2009 at 05:23:47PM +0000, artacus@comcast.net wrote:
On Fri, May 22, 2009 at 03:23:07PM +0000, artacus@comcast.net wrote:
I want to use an aggregate function that will return the most
commonly occurring value in a column.It's actually dead simple in Postgres. No C either. You just need to
create an aggregate function. I wrote a most() aggregate a while
back that does exactly what you are asking for. Here, I'll add it to
my blog...http://scottrbailey.wordpress.com/2009/05/22/postgres-adding-custom-aggregates-most/
Such an aggregate should probably be called, "mode," that being the
probability/statistics name for the concept.Excellent observation Dave. Sometimes I can't see outside of the box I'm in. And at the time I was focusing on text so statistics was in another box. I've update post with final functions for mode(), median() and range().
Thanks! :)
Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
artacus@comcast.net wrote:
Excellent observation Dave. Sometimes I can't see outside of the box I'm in.
And at the time I was focusing on text so statistics was in another box. I've
update post with final functions for mode(), median() and range().
Hey, if you want to add your functions to http://wiki.postgresql.org/wiki/Snippets ,
that would be great.
--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
On Friday 22. May 2009, Alvaro Herrera wrote:
Hey, if you want to add your functions to
http://wiki.postgresql.org/wiki/Snippets , that would be great.
+1
--
Leif Biberg Kristensen | Registered Linux User #338009
Me And My Database: http://solumslekt.org/blog/
Hi Scott,
Brilliant, that's exactly what I wanted.
I guess the only thing that worries me is if the table being aggregated
is very large, I assume this solution will use a lot of memory - since
it creates an array containing all of the values in the target
expression - but I suspect in my application that won't be a problem.
Thanks again,
Regards // Mike
-----Original Message-----
From: artacus@comcast.net [mailto:artacus@comcast.net]
Sent: Saturday, 23 May 2009 1:23 AM
To: Michael Harris
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Aggregate Function to return most common value
for a column
I want to use an aggregate function that will return the most commonly
occurring value in a column.
It's actually dead simple in Postgres. No C either. You just need to
create an aggregate function.
I wrote a most() aggregate a while back that does exactly what you are
asking for. Here, I'll add it to my blog...
http://scottrbailey.wordpress.com/2009/05/22/postgres-adding-custom-aggr
egates-most/
Scott Bailey
I've used this same concept in subqueries for a very long time. Doing this
allows me to "dive in" and get other values from the joined table, rather than
just the thing that we're getting the most of.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
"I kept looking for somebody to solve the problem.
Then I realized... I am somebody!"
-- Author Unknown
--
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.