avg,first,last,median in one query

Started by Konstantin Knizhnikabout 10 years ago2 messageshackers
Jump to latest
#1Konstantin Knizhnik
k.knizhnik@postgrespro.ru

Hi, hackers.

I need advice from SQL experts: is there any way in PostgreSQL to
calculate avg,first,last,median aggregates in one query?
Assume that we have the following table:

create table Securities ("Symbol" varchar, "Date" date, "Time" time,
"Price" real);

We can simulate median using percentile_disc:

select "Symbol","Date",
avg("Price"),
percentile_disc(0.5) within group (order by "Price")
from Securities
group by "Symbol","Date";

And all other aggregates can be calculated using windows functions:

select distinct "Symbol","Date",
first_value("Price") over (partition by "Symbol","Date" order by
"Time" rows between unbounded preceding and unbounded following),
last_value("Price") over (partition by "Symbol","Date" order by
"Time" rows between unbounded preceding and unbounded following),
avg("Price") over (partition by "Symbol","Date" rows between
unbounded preceding and unbounded following)
from Securities;

I wonder is there are any simpler/efficient alternative to the query above?

But unfortunately it is not possible to calculate median is such way
because percentile_disc is not compatible with OVER:

ERROR: OVER is not supported for ordered-set aggregate percentile_disc

So is there any chance to calculate all this four aggregates in one
query without writing some supplementary functions?

Additional question: what is the most efficient way of calculating
MEDIAN in PostgreSQL?
I found three different approaches:

1. Using CTE:

https://www.periscopedata.com/blog/medians-in-sql.html

2. Using user-defined aggregate function which uses array_appendand so
materialize all values in memory:

https://wiki.postgresql.org/wiki/Aggregate_Median

3. Using percentile aggregate:

http://blog.jooq.org/2015/01/06/how-to-emulate-the-median-aggregate-function-using-inverse-distribution-functions/

Thanks in advance,

--

Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#2Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Konstantin Knizhnik (#1)
Re: avg,first,last,median in one query

On 3/24/16 9:00 AM, Konstantin Knizhnik wrote:

But unfortunately it is not possible to calculate median is such way
because percentile_disc is not compatible with OVER:

I don't know if you could use cume_dist()[1]http://www.postgresql.org/docs/9.5/static/functions-window.html -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com to do this, but even if you
can't it probably wouldn't be hard to modify it to do what you need.

[1]: http://www.postgresql.org/docs/9.5/static/functions-window.html -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers