median for postgresql 8.3

Started by maartenover 15 years ago6 messagesgeneral
Jump to latest
#1maarten
maarten.foque@edchq.com

Hello everyone,

I was doing some analysis of data to find average delays between some
timestamp values etc...
When the number of rows the average is computed over is small, this can
give distorted values. So I've obviously added a count column to see if
the average represents much data.
However, I would also like to add the median value to give me a pretty
good idea of whats happening even for smaller counts.

I couldn't find such an aggregate function in the manual (version 8.3)
and some websearching didn't uncover it either.

I was thinking about
SELECT max(id) FROM test ORDER BY id ASC LIMIT
(SELECT count(*)/2 FROM test)

But two things are wrong with that:
Limit can't use subqueries :(
And ORDER BY gives me the error: 'must be used in aggregate function
etc...) but I can probably work around this by using an ordered subquery
in stead of the table directly.

Furthermore, I need the median for a timestamp column, which would
probably complicate things more than when it is a number column.

I'd like to be able to do this using only the database. (So no
programming functions, special addons etc...)

Any ideas anyone?

regards,
Maarten

#2Pavel Stehule
pavel.stehule@gmail.com
In reply to: maarten (#1)
Re: median for postgresql 8.3

Hello

see http://okbob.blogspot.com/2009/11/aggregate-function-median-in-postgresql.html

Regards

Pavel Stehule

2010/11/16 maarten <maarten.foque@edchq.com>:

Show quoted text

Hello everyone,

I was doing some analysis of data to find average delays between some
timestamp values etc...
When the number of rows the average is computed over is small, this can
give distorted values.  So I've obviously added a count column to see if
the average represents much data.
However, I would also like to add the median value to give me a pretty
good idea of whats happening even for smaller counts.

I couldn't find such an aggregate function in the manual (version 8.3)
and some websearching didn't uncover it either.

I was thinking about
SELECT max(id) FROM test ORDER BY id ASC LIMIT
       (SELECT count(*)/2 FROM test)

But two things are wrong with that:
Limit can't use subqueries :(
And ORDER BY gives me the error: 'must be used in aggregate function
etc...) but I can probably work around this by using an ordered subquery
in stead of the table directly.

Furthermore, I need the median for a timestamp column, which would
probably complicate things more than when it is a number column.

I'd like to be able to do this using only the database. (So no
programming functions, special addons etc...)

Any ideas anyone?

regards,
Maarten

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#3Brent Wood
b.wood@niwa.co.nz
In reply to: Pavel Stehule (#2)
Re: median for postgresql 8.3

Hi Maarten,

The best way I know of to do this is not to do statistical queries "in" the DB at all, but use a stats capability embedded in your database, so they still appear to the user to be done in the db. I don't see how you can easily get the functionality you want without user defined functions or addons, While PL/R is a "special addon", and you created a custom median function to do this, there are very good instructions to follow to do this.

I think it might be worth your while if you are looking to retrieve stats from SQL queries.

See PL/R, and the median how-to at:
http://www.bostongis.com/PrinterFriendly.aspx?content_name=postgresql_plr_tut01

HTH,

Brent Wood

Brent Wood
DBA/GIS consultant
NIWA, Wellington
New Zealand

maarten 11/17/10 9:15 AM >>>

Hello everyone,

I was doing some analysis of data to find average delays between some
timestamp values etc...
When the number of rows the average is computed over is small, this can
give distorted values. So I've obviously added a count column to see if
the average represents much data.
However, I would also like to add the median value to give me a pretty
good idea of whats happening even for smaller counts.

I couldn't find such an aggregate function in the manual (version 8.3)
and some websearching didn't uncover it either.

I was thinking about
SELECT max(id) FROM test ORDER BY id ASC LIMIT
(SELECT count(*)/2 FROM test)

But two things are wrong with that:
Limit can't use subqueries :(
And ORDER BY gives me the error: 'must be used in aggregate function
etc...) but I can probably work around this by using an ordered subquery
in stead of the table directly.

Furthermore, I need the median for a timestamp column, which would
probably complicate things more than when it is a number column.

I'd like to be able to do this using only the database. (So no
programming functions, special addons etc...)

Any ideas anyone?

regards,
Maarten

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Please consider the environment before printing this email.

NIWA is the trading name of the National Institute of Water & Atmospheric Research Ltd.

#4maarten
maarten.foque@edchq.com
In reply to: Brent Wood (#3)
Re: median for postgresql 8.3

Hello Brent,

well I said "using only the db" because this is a one time need and just
wanted to avoid crufting around with calc, and doing 'manual' work.

But that seems to be the fastest approach now.

When it becomes a frequent need I'll probably end up doing what you
suggest, or upgrade to 9.x, for which I've found simple examples due to
dynamic limit and offset.

thanks and regards,
Maarten

Show quoted text

On Wed, 2010-11-17 at 09:53 +1300, Brent Wood wrote:

Hi Maarten,

The best way I know of to do this is not to do statistical queries
"in" the DB at all, but use a stats capability embedded in your
database, so they still appear to the user to be done in the db. I
don't see how you can easily get the functionality you want without
user defined functions or addons, While PL/R is a "special addon", and
you created a custom median function to do this, there are very good
instructions to follow to do this.

I think it might be worth your while if you are looking to retrieve
stats from SQL queries.

See PL/R, and the median how-to at:
http://www.bostongis.com/PrinterFriendly.aspx?content_name=postgresql_plr_tut01

HTH,

Brent Wood

Brent Wood
DBA/GIS consultant
NIWA, Wellington
New Zealand

maarten <maarten.foque@edchq.com> 11/17/10 9:15 AM >>>

Hello everyone,

I was doing some analysis of data to find average delays between some
timestamp values etc...
When the number of rows the average is computed over is small, this
can
give distorted values. So I've obviously added a count column to see
if
the average represents much data.
However, I would also like to add the median value to give me a pretty
good idea of whats happening even for smaller counts.

I couldn't find such an aggregate function in the manual (version 8.3)
and some websearching didn't uncover it either.

I was thinking about
SELECT max(id) FROM test ORDER BY id ASC LIMIT
(SELECT count(*)/2 FROM test)

But two things are wrong with that:
Limit can't use subqueries :(
And ORDER BY gives me the error: 'must be used in aggregate function
etc...) but I can probably work around this by using an ordered
subquery
in stead of the table directly.

Furthermore, I need the median for a timestamp column, which would
probably complicate things more than when it is a number column.

I'd like to be able to do this using only the database. (So no
programming functions, special addons etc...)

Any ideas anyone?

regards,
Maarten

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Please consider the environment before printing this email.

NIWA is the trading name of the National Institute of Water &
Atmospheric Research Ltd.

#5Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: Pavel Stehule (#2)
Re: median for postgresql 8.3

On 16 November 2010 17:37, Pavel Stehule <pavel.stehule@gmail.com> wrote:

Hello

see http://okbob.blogspot.com/2009/11/aggregate-function-median-in-postgresql.html

An 8.3-compatible way of doing it is:

SELECT CASE WHEN c % 2 = 0 AND c > 1 THEN (a[1]+a[2])/2 ELSE a[1] END
FROM
(
SELECT ARRAY(SELECT a FROM milrows ORDER BY a OFFSET (c-1)/2 LIMIT 2) AS a, c
FROM (SELECT count(*) AS c FROM milrows) AS count
OFFSET 0
)
AS midrows;

In my tests this is faster than the analytic and array-based methods,
but not by a huge amount.

Regards,
Dean

Show quoted text

Regards

Pavel Stehule

2010/11/16 maarten <maarten.foque@edchq.com>:

Hello everyone,

I was doing some analysis of data to find average delays between some
timestamp values etc...
When the number of rows the average is computed over is small, this can
give distorted values.  So I've obviously added a count column to see if
the average represents much data.
However, I would also like to add the median value to give me a pretty
good idea of whats happening even for smaller counts.

I couldn't find such an aggregate function in the manual (version 8.3)
and some websearching didn't uncover it either.

I was thinking about
SELECT max(id) FROM test ORDER BY id ASC LIMIT
       (SELECT count(*)/2 FROM test)

But two things are wrong with that:
Limit can't use subqueries :(
And ORDER BY gives me the error: 'must be used in aggregate function
etc...) but I can probably work around this by using an ordered subquery
in stead of the table directly.

Furthermore, I need the median for a timestamp column, which would
probably complicate things more than when it is a number column.

I'd like to be able to do this using only the database. (So no
programming functions, special addons etc...)

Any ideas anyone?

regards,
Maarten

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#6Pavel Stehule
pavel.stehule@gmail.com
In reply to: Dean Rasheed (#5)
Re: median for postgresql 8.3

2010/11/17 Dean Rasheed <dean.a.rasheed@gmail.com>:

On 16 November 2010 17:37, Pavel Stehule <pavel.stehule@gmail.com> wrote:

Hello

see http://okbob.blogspot.com/2009/11/aggregate-function-median-in-postgresql.html

An 8.3-compatible way of doing it is:

SELECT CASE WHEN c % 2 = 0 AND c > 1 THEN (a[1]+a[2])/2 ELSE a[1] END
FROM
(
 SELECT ARRAY(SELECT a FROM milrows ORDER BY a OFFSET (c-1)/2 LIMIT 2) AS a, c
  FROM (SELECT count(*) AS c FROM milrows) AS count
 OFFSET 0
)
AS midrows;

nice :)

Pavel

Show quoted text

In my tests this is faster than the analytic and array-based methods,
but not by a huge amount.

Regards,
Dean

Regards

Pavel Stehule

2010/11/16 maarten <maarten.foque@edchq.com>:

Hello everyone,

I was doing some analysis of data to find average delays between some
timestamp values etc...
When the number of rows the average is computed over is small, this can
give distorted values.  So I've obviously added a count column to see if
the average represents much data.
However, I would also like to add the median value to give me a pretty
good idea of whats happening even for smaller counts.

I couldn't find such an aggregate function in the manual (version 8.3)
and some websearching didn't uncover it either.

I was thinking about
SELECT max(id) FROM test ORDER BY id ASC LIMIT
       (SELECT count(*)/2 FROM test)

But two things are wrong with that:
Limit can't use subqueries :(
And ORDER BY gives me the error: 'must be used in aggregate function
etc...) but I can probably work around this by using an ordered subquery
in stead of the table directly.

Furthermore, I need the median for a timestamp column, which would
probably complicate things more than when it is a number column.

I'd like to be able to do this using only the database. (So no
programming functions, special addons etc...)

Any ideas anyone?

regards,
Maarten

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general