Picking the first of an order in an aggregate query

Started by Robert Jamesover 13 years ago7 messagesgeneral

srobertjames@gmail.com

over 13 years ago

I have a query

SELECT grouping_field, MIN(field_a), MIN(field_b)
FROM ...
GROUP BY grouping_field

But, instead of picking the MIN field_a and MIN field_b, I'd like to
pick field_a and field_b from the first record, according to an order
I'll specify. In pseudo-SQL, it would be something like this:

SELECT grouping_field, FIRST(field_a), FIRST(field_b)
FROM ...
ORDER BY field_c DESC, field_d ASC, myfunc(field_e) ASC
GROUP BY grouping_field

How can I do that with Postgres?

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Jack Christensen

jack@jackchristensen.com

over 13 years ago

In reply to: Robert James (#1)

Re: Picking the first of an order in an aggregate query

On 12/31/2012 8:33 AM, Robert James wrote:

I have a query

SELECT grouping_field, MIN(field_a), MIN(field_b)
FROM ...
GROUP BY grouping_field

But, instead of picking the MIN field_a and MIN field_b, I'd like to
pick field_a and field_b from the first record, according to an order
I'll specify. In pseudo-SQL, it would be something like this:

SELECT grouping_field, FIRST(field_a), FIRST(field_b)
FROM ...
ORDER BY field_c DESC, field_d ASC, myfunc(field_e) ASC
GROUP BY grouping_field

How can I do that with Postgres?

select distinct on (grouping_field), field_a, field_b
from ...
order by grouping_field, field_a asc, field_b asc

http://www.postgresql.org/docs/9.2/static/sql-select.html#SQL-DISTINCT

--
Jack Christensen
http://jackchristensen.com/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Tom Lane

tgl@sss.pgh.pa.us

over 13 years ago

In reply to: Jack Christensen (#2)

Re: Picking the first of an order in an aggregate query

Jack Christensen <jack@jackchristensen.com> writes:

On 12/31/2012 8:33 AM, Robert James wrote:

SELECT grouping_field, FIRST(field_a), FIRST(field_b)
FROM ...
ORDER BY field_c DESC, field_d ASC, myfunc(field_e) ASC
GROUP BY grouping_field

How can I do that with Postgres?

select distinct on (grouping_field), field_a, field_b
from ...
order by grouping_field, field_a asc, field_b asc

Another possibility, if you're using PG 8.4 or newer, is to use window
functions. It'd go something like

select grouping_field, first_value(field_a) over (partition by grouping_field order by field_a), ...

if memory serves (I'm not quite sure whether you need the PARTITION BY
bit if there's a global GROUP BY in the query).

The DISTINCT ON syntax is a Postgres-ism, while window functions are
SQL-standard so have at least some chance of being portable, if that
matters to you.

regards, tom lane

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Robert James

srobertjames@gmail.com

over 13 years ago

In reply to: Jack Christensen (#2)

Re: Picking the first of an order in an aggregate query

DISTINCT is a very simple solution!
But I have one problem: In addition to the FIRST fields, I also do
want some aggregate functions. More accurately, it would be:

SELECT grouping_field, FIRST(field_a), FIRST(field_b), SUM(field_x),
MAX(field_y)
...

How should I do that? Should I do two queries with a join on the
grouping field? Or is there a more direct way?

On 12/31/12, Jack Christensen <jack@jackchristensen.com> wrote:

On 12/31/2012 8:33 AM, Robert James wrote:

I have a query

SELECT grouping_field, MIN(field_a), MIN(field_b)
FROM ...
GROUP BY grouping_field

But, instead of picking the MIN field_a and MIN field_b, I'd like to
pick field_a and field_b from the first record, according to an order
I'll specify. In pseudo-SQL, it would be something like this:

SELECT grouping_field, FIRST(field_a), FIRST(field_b)
FROM ...
ORDER BY field_c DESC, field_d ASC, myfunc(field_e) ASC
GROUP BY grouping_field

How can I do that with Postgres?

select distinct on (grouping_field), field_a, field_b
from ...
order by grouping_field, field_a asc, field_b asc

http://www.postgresql.org/docs/9.2/static/sql-select.html#SQL-DISTINCT

--
Jack Christensen
http://jackchristensen.com/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

François Beausoleil

francois@teksol.info

over 13 years ago

In reply to: Robert James (#4)

Re: Picking the first of an order in an aggregate query

Le 2012-12-31 à 15:38, Robert James a écrit :

DISTINCT is a very simple solution!
But I have one problem: In addition to the FIRST fields, I also do
want some aggregate functions. More accurately, it would be:

SELECT grouping_field, FIRST(field_a), FIRST(field_b), SUM(field_x),
MAX(field_y)
...

How should I do that? Should I do two queries with a join on the
grouping field? Or is there a more direct way?

WINDOW functions can help you:

SELECT
grouping_field
, first_value(field_a) OVER (ORDER BY ...)
, first_value(field_b) OVER (ORDER BY ...)
, sum(field_x) OVER ()
, max(field_y) OVER ()
FROM ...

The empty OVER clauses will make the sum / max work over the full result set, and not a subset. I really recommend reading the window functions section on the site.

Bye!
François

NOTE: Please do not top-post. This list is bottom post.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Robert James

srobertjames@gmail.com

over 13 years ago

In reply to: François Beausoleil (#5)

Re: Picking the first of an order in an aggregate query

On 12/31/12, François Beausoleil <francois@teksol.info> wrote:

Le 2012-12-31 à 15:38, Robert James a écrit :

DISTINCT is a very simple solution!
But I have one problem: In addition to the FIRST fields, I also do
want some aggregate functions. More accurately, it would be:

SELECT grouping_field, FIRST(field_a), FIRST(field_b), SUM(field_x),
MAX(field_y)
...

How should I do that? Should I do two queries with a join on the
grouping field? Or is there a more direct way?

WINDOW functions can help you:

SELECT
grouping_field
, first_value(field_a) OVER (ORDER BY ...)
, first_value(field_b) OVER (ORDER BY ...)
, sum(field_x) OVER ()
, max(field_y) OVER ()
FROM ...

The empty OVER clauses will make the sum / max work over the full result
set, and not a subset. I really recommend reading the window functions
section on the site.

I see. Will the optimizer know enough to not repeat the work for each
first_value I do? Or am I better off using a JOIN of some sort?

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Vincent Veyron

vv.lists@wanadoo.fr

over 13 years ago

In reply to: Robert James (#6)

Re: Picking the first of an order in an aggregate query

Le lundi 31 dï¿œcembre 2012 ï¿œ 20:55 -0500, Robert James a ï¿œcrit :

On 12/31/12, Franï¿œois Beausoleil <francois@teksol.info> wrote:

Le 2012-12-31 ï¿œ 15:38, Robert James a ï¿œcrit :

DISTINCT is a very simple solution!
But I have one problem: In addition to the FIRST fields, I also do
want some aggregate functions. More accurately, it would be:

SELECT grouping_field, FIRST(field_a), FIRST(field_b), SUM(field_x),
MAX(field_y)
...

How should I do that? Should I do two queries with a join on the
grouping field? Or is there a more direct way?

WINDOW functions can help you:

SELECT
grouping_field
, first_value(field_a) OVER (ORDER BY ...)
, first_value(field_b) OVER (ORDER BY ...)
, sum(field_x) OVER ()
, max(field_y) OVER ()
FROM ...

The empty OVER clauses will make the sum / max work over the full result
set, and not a subset. I really recommend reading the window functions
section on the site.

I see. Will the optimizer know enough to not repeat the work for each
first_value I do? Or am I better off using a JOIN of some sort?

You probably can check with explain analyze; if not, a CTE (common table
expression) might help; something like :

with t1 as (
select grouping_field, sum(field_x) as sum_x, max(field_y) as max_y
FROM ...
group by grouping_field
)
SELECT grouping_field,
first_value(field_a) OVER (Partition by grouping_field ORDER BY ...),
first_value(field_b) OVER (Partition by grouping_field ORDER BY ...),
t1.sum_x,
t1.max_y
FROM ... INNER JOIN t1 using (grouping_field)

'Partition by grouping_field' may or may not be necessary in your case,
depending on what you want; see :

http://www.postgresql.org/docs/current/static/tutorial-window.html

--
Vincent Veyron
http://marica.fr
Logiciel pour dï¿œpartement juridique

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general