user defined aggregate for percentile calculations

Started by Kashmirabout 17 years ago5 messagesgeneral

kashmir_us_1999@yahoo.com

about 17 years ago

cant seem to find anything about this,
would it possible at all to create a percentile-aggregate in pgres?
any pointers?
tia
-k

A. Kretschmer

andreas.kretschmer@schollglas.com

about 17 years ago

In reply to: Kashmir (#1)

Re: user defined aggregate for percentile calculations

In response to Kashmir :

cant seem to find anything about this,
would it possible at all to create a percentile-aggregate in pgres?
any pointers?
tia
-k

I'm not sure if i understand your problem, but how about:

test=*# select * from percentile ;
id | value
----+-------
1 | 10
2 | 20
3 | 30
4 | 50
(4 rows)

test=*# select a.id, a.value, (a.value*100/foo.summe)::numeric(10,2)
from percentile a, (select sum(value) as summe from percentile ) as foo
group by a.id, a.value, foo.summe order by id;
id | value | numeric
----+-------+---------
1 | 10 | 9.00
2 | 20 | 18.00
3 | 30 | 27.00
4 | 50 | 45.00
(4 rows)

HTH, Andreas
--
Andreas Kretschmer
Kontakt: Heynitz: 035242/47150, D1: 0160/7141639 (mehr: -> Header)
GnuPG-ID: 0x3FFF606C, privat 0x7F4584DA http://wwwkeys.de.pgp.net

Kashmir

kashmir_us_1999@yahoo.com

about 17 years ago

In reply to: A. Kretschmer (#2)

Re: user defined aggregate for percentile calculations

thx for anwering!,

i was more looking for a percentile calculation as in http://en.wikipedia.org/wiki/Percentile
usually i do this in perl with the 'Statistics::Descriptive' module,
but it would help me alot if i could solve this within the sql query,
and would not have to push all the data into a perl arrays first..

so i could e.g. simply say:
select 95percentile(column with many many values) from sometable
actually what i really need is something like:
select 95p(col), 98p(col), max(col), min(col), avg(col) from thetable...
that would save me alot of cycles i'm takin in perl at the moment...
guess my sql is not smart enough, but was thinking a user defined function would be of great use :-)

any more suggestions?

tia!
-k

----- Original Message ----
From: A. Kretschmer <andreas.kretschmer@schollglas.com>
To: pgsql-general@postgresql.org
Sent: Monday, April 6, 2009 4:53:28 AM
Subject: Re: [GENERAL] user defined aggregate for percentile calculations

In response to Kashmir :

cant seem to find anything about this,
would it possible at all to create a percentile-aggregate in pgres?
any pointers?
tia
-k

I'm not sure if i understand your problem, but how about:

test=*# select * from percentile ;
id | value
----+-------
1 | 10
2 | 20
3 | 30
4 | 50
(4 rows)

HTH, Andreas
--
Andreas Kretschmer
Kontakt: Heynitz: 035242/47150, D1: 0160/7141639 (mehr: -> Header)
GnuPG-ID: 0x3FFF606C, privat 0x7F4584DA http://wwwkeys.de.pgp.net

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Alvaro Herrera

alvherre@2ndquadrant.com

about 17 years ago

In reply to: Kashmir (#3)

Re: user defined aggregate for percentile calculations

Kashmir wrote:

thx for anwering!,

i was more looking for a percentile calculation as in http://en.wikipedia.org/wiki/Percentile
usually i do this in perl with the 'Statistics::Descriptive' module,
but it would help me alot if i could solve this within the sql query,
and would not have to push all the data into a perl arrays first..

Have you considered writing a function in the PL/R language? It has
percentiles (actually quantiles) and all sorts of useful math stuff.

See http://www.joeconway.com/plr/

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Greg Smith

gsmith@gregsmith.com

about 17 years ago

In reply to: Kashmir (#1)

Re: user defined aggregate for percentile calculations

On Sun, 5 Apr 2009, Kashmir wrote:

would it possible at all to create a percentile-aggregate in pgres?

I normally just do this right in the database without specifically
accelerating it with an aggregate. Not very efficient but it works fine
for reasonably sized data sets that fit into the database cache. Here's
an example that computes some statistics about temporary data in a table
named "timing" into a summary statistics table named tests:

update tests set trans=(select count(*) from timing);
update tests set
avg_latency=(select avg(latency) from timing),
max_latency=(select max(latency) from timing),
percentile_90_latency=
(select latency from timing
order by latency offset (round(0.90*trans)) limit 1);

Even if that's not efficient enough for your final app, you might use that
sort of thing as a prototype until you get a better implementation, rather
than dropping into Perl.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD