Aggregate Function to return most common value for a column

Started by Michael Harrisalmost 17 years ago11 messagesgeneral
Jump to latest
#1Michael Harris
michael.harris@ericsson.com

Hi Experts,

I want to use an aggregate function that will return the most commonly
occurring value in a column.

The column consists of VARCHAR(32) values.

Is it possible to construct such an aggregate using PL/PgSql ?

If I was trying to do something like this in Perl I would use a hash
table to store the values and the number of times each was seen as the
table was iterated, but PL/PgSql does not seem to have an appropriate
data type for that?

I don't want to use PL/Perl to avoid the overhead of starting a perl
interpreter for that.

Do I have to write the function in C maybe?

Thanks in advance,
Regards
Mike Harris

#2Pavel Stehule
pavel.stehule@gmail.com
In reply to: Michael Harris (#1)
Re: Aggregate Function to return most common value for a column

Hello

I thing, so the function in C is the best solution. And I thing, so
you can use PostgreSQL functionality inside. Pg support hash arrays
and hashing function too.

regards
Pavel Stehule

2009/5/22 Michael Harris <michael.harris@ericsson.com>:

Show quoted text

Hi Experts,

I want to use an aggregate function that will return the most commonly
occurring value in a column.

The column consists of VARCHAR(32) values.

Is it possible to construct such an aggregate using PL/PgSql ?

If I was trying to do something like this in Perl I would use a hash
table to store the values and the number of times each was seen as the
table was iterated, but PL/PgSql does not seem to have an appropriate
data type for that?

I don't want to use PL/Perl to avoid the overhead of starting a perl
interpreter for that.

Do I have to write the function in C maybe?

Thanks in advance,
Regards
Mike Harris

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#3Ivan Sergio Borgonovo
mail@webthatworks.it
In reply to: Michael Harris (#1)
Re: Aggregate Function to return most common value for a column

On Fri, 22 May 2009 17:48:44 +1000
"Michael Harris" <michael.harris@ericsson.com> wrote:

Hi Experts,

I want to use an aggregate function that will return the most
commonly occurring value in a column.

The column consists of VARCHAR(32) values.

Is it possible to construct such an aggregate using PL/PgSql ?

If I was trying to do something like this in Perl I would use a
hash table to store the values and the number of times each was
seen as the table was iterated, but PL/PgSql does not seem to have
an appropriate data type for that?

I don't want to use PL/Perl to avoid the overhead of starting a
perl interpreter for that.

Do I have to write the function in C maybe?

Isn't it a job for group by?

select count(*), myvalue from table group by myvalue order by
count(*) desc limit 1;

--
Ivan Sergio Borgonovo
http://www.webthatworks.it

#4Scott Bailey
artacus@comcast.net
In reply to: Michael Harris (#1)
Re: Aggregate Function to return most common value for a column

I want to use an aggregate function that will return the most commonly
occurring value in a column.

It's actually dead simple in Postgres. No C either. You just need to create an aggregate function.
I wrote a most() aggregate a while back that does exactly what you are asking for. Here, I'll add it to my blog...

http://scottrbailey.wordpress.com/2009/05/22/postgres-adding-custom-aggregates-most/

Scott Bailey

#5David Fetter
david@fetter.org
In reply to: Scott Bailey (#4)
Re: Aggregate Function to return most common value for a column

On Fri, May 22, 2009 at 03:23:07PM +0000, artacus@comcast.net wrote:

I want to use an aggregate function that will return the most
commonly occurring value in a column.

It's actually dead simple in Postgres. No C either. You just need to
create an aggregate function. I wrote a most() aggregate a while
back that does exactly what you are asking for. Here, I'll add it to
my blog...

http://scottrbailey.wordpress.com/2009/05/22/postgres-adding-custom-aggregates-most/

Such an aggregate should probably be called, "mode," that being the
probability/statistics name for the concept.

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

#6Scott Bailey
artacus@comcast.net
In reply to: David Fetter (#5)
Re: Aggregate Function to return most common value for a column

On Fri, May 22, 2009 at 03:23:07PM +0000, artacus@comcast.net wrote:

I want to use an aggregate function that will return the most
commonly occurring value in a column.

It's actually dead simple in Postgres. No C either. You just need to
create an aggregate function. I wrote a most() aggregate a while
back that does exactly what you are asking for. Here, I'll add it to
my blog...

http://scottrbailey.wordpress.com/2009/05/22/postgres-adding-custom-aggregates-most/

Such an aggregate should probably be called, "mode," that being the
probability/statistics name for the concept.

Excellent observation Dave. Sometimes I can't see outside of the box I'm in. And at the time I was focusing on text so statistics was in another box. I've update post with final functions for mode(), median() and range().

Scott

#7David Fetter
david@fetter.org
In reply to: Scott Bailey (#6)
Re: Aggregate Function to return most common value for a column

On Fri, May 22, 2009 at 05:23:47PM +0000, artacus@comcast.net wrote:

On Fri, May 22, 2009 at 03:23:07PM +0000, artacus@comcast.net wrote:

I want to use an aggregate function that will return the most
commonly occurring value in a column.

It's actually dead simple in Postgres. No C either. You just need to
create an aggregate function. I wrote a most() aggregate a while
back that does exactly what you are asking for. Here, I'll add it to
my blog...

http://scottrbailey.wordpress.com/2009/05/22/postgres-adding-custom-aggregates-most/

Such an aggregate should probably be called, "mode," that being the
probability/statistics name for the concept.

Excellent observation Dave. Sometimes I can't see outside of the box I'm in. And at the time I was focusing on text so statistics was in another box. I've update post with final functions for mode(), median() and range().

Thanks! :)

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

#8Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Scott Bailey (#6)
Re: Aggregate Function to return most common value for a column

artacus@comcast.net wrote:

Excellent observation Dave. Sometimes I can't see outside of the box I'm in.
And at the time I was focusing on text so statistics was in another box. I've
update post with final functions for mode(), median() and range().

Hey, if you want to add your functions to http://wiki.postgresql.org/wiki/Snippets ,
that would be great.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

#9Leif B. Kristensen
leif@solumslekt.org
In reply to: Alvaro Herrera (#8)
Re: Aggregate Function to return most common value for a column

On Friday 22. May 2009, Alvaro Herrera wrote:

Hey, if you want to add your functions to
http://wiki.postgresql.org/wiki/Snippets , that would be great.

+1
-- 
Leif Biberg Kristensen | Registered Linux User #338009
Me And My Database: http://solumslekt.org/blog/
#10Michael Harris
michael.harris@ericsson.com
In reply to: Scott Bailey (#4)
Re: Aggregate Function to return most common value for a column

Hi Scott,

Brilliant, that's exactly what I wanted.

I guess the only thing that worries me is if the table being aggregated
is very large, I assume this solution will use a lot of memory - since
it creates an array containing all of the values in the target
expression - but I suspect in my application that won't be a problem.

Thanks again,

Regards // Mike

-----Original Message-----
From: artacus@comcast.net [mailto:artacus@comcast.net]
Sent: Saturday, 23 May 2009 1:23 AM
To: Michael Harris
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Aggregate Function to return most common value
for a column

I want to use an aggregate function that will return the most commonly

occurring value in a column.

It's actually dead simple in Postgres. No C either. You just need to
create an aggregate function.
I wrote a most() aggregate a while back that does exactly what you are
asking for. Here, I'll add it to my blog...

http://scottrbailey.wordpress.com/2009/05/22/postgres-adding-custom-aggr
egates-most/

Scott Bailey

#11Benjamin Smith
ben@schoolpathways.com
In reply to: Michael Harris (#10)
Re: Aggregate Function to return most common value for a column

I've used this same concept in subqueries for a very long time. Doing this
allows me to "dive in" and get other values from the joined table, rather than
just the thing that we're getting the most of.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
"I kept looking for somebody to solve the problem.
Then I realized... I am somebody!"

-- Author Unknown

--
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.