Single Table Select With Aggregate Function

Started by Rich Shepardover 14 years ago6 messagesgeneral
Jump to latest
#1Rich Shepard
rshepard@appl-ecosys.com

I'm probably not seeing the obvious so I keep making the same mistake. The
table holds water chemistry data from multiple streams, sites within each
stream, sampling dates, and many chemical constituents.

What I need to do are three things:

1.) Find the date and site for the maximum value of a specified constituent
on a named stream.

2.) Find the values of that same constituent at other sites on the named
stream on that same date.

3.) Find the date of the maximum value of a constituent for all sites on
the named stream.

Thinking the last the easiest to do, I submitted this query:

select max(quant), site, sampdate from chemistry where stream = 'SheepCrk'
and param = 'TDS' group by site, sampdate;

but this gives me the value of each site and date, not the maximum for all
dates at a specific site. Postgres tells me that both site and sampdate must
be in the 'group by' clause.

Obviously the syntax is more complex than I thought it would be and I
would like a pointer to the correct way to write these queries.

Rich

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Rich Shepard (#1)
Re: Single Table Select With Aggregate Function

Rich Shepard <rshepard@appl-ecosys.com> writes:

What I need to do are three things:

1.) Find the date and site for the maximum value of a specified constituent
on a named stream.

2.) Find the values of that same constituent at other sites on the named
stream on that same date.

3.) Find the date of the maximum value of a constituent for all sites on
the named stream.

You can do that type of thing using subqueries, eg

select ... from mytab
where col = (select max(col) from mytab where ...)

or if you don't mind a nonstandard construct, consider SELECT DISTINCT ON.
There's an example of how to use that in the SELECT reference page
(look for the weather-reports example). In any case you're going to
have to consider what you want to have happen when there are multiple
occurrences of the maximum value.

regards, tom lane

#3Ondrej Ivanič
ondrej.ivanic@gmail.com
In reply to: Rich Shepard (#1)
Re: Single Table Select With Aggregate Function

Hi,

On 4 January 2012 10:26, Rich Shepard <rshepard@appl-ecosys.com> wrote:

select max(quant), site, sampdate from chemistry where stream = 'SheepCrk'
and param = 'TDS' group by site, sampdate;

but this gives me the value of each site and date, not the maximum for all
dates at a specific site. Postgres tells me that both site and sampdate must
be in the 'group by' clause.

 Obviously the syntax is more complex than I thought it would be and I
would like a pointer to the correct way to write these queries.

window functions might be helpful:
http://www.postgresql.org/docs/9.1/static/tutorial-window.html

p4=> select * from public.chemistry order by sampdate, site, quant desc;
quant | site | sampdate
-------+------+------------
100 | 1 | 2012-01-03
80 | 1 | 2012-01-03
30 | 1 | 2012-01-03
400 | 2 | 2012-01-03
100 | 2 | 2012-01-03
30 | 2 | 2012-01-03
100 | 3 | 2012-01-03
80 | 3 | 2012-01-03
30 | 3 | 2012-01-03
80 | 4 | 2012-01-03
(10 rows)

p4=> select distinct first_value(quant) over (partition by sampdate,
site order by quant desc), sampdate, site from public.chemistry order
by site;
first_value | sampdate | site
-------------+------------+------
100 | 2012-01-03 | 1
400 | 2012-01-03 | 2
100 | 2012-01-03 | 3
80 | 2012-01-03 | 4
(4 rows)

--
Ondrej

#4David G. Johnston
david.g.johnston@gmail.com
In reply to: Rich Shepard (#1)
Re: Single Table Select With Aggregate Function

-----Original Message-----
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org] On Behalf Of Rich Shepard
Sent: Tuesday, January 03, 2012 6:27 PM
To: pgsql-general@postgresql.org
Subject: [GENERAL] Single Table Select With Aggregate Function

I'm probably not seeing the obvious so I keep making the same mistake.
The table holds water chemistry data from multiple streams, sites within
each stream, sampling dates, and many chemical constituents.

What I need to do are three things:

1.) Find the date and site for the maximum value of a specified
constituent on a named stream.

2.) Find the values of that same constituent at other sites on the named
stream on that same date.

3.) Find the date of the maximum value of a constituent for all sites on
the named stream.

Thinking the last the easiest to do, I submitted this query:

select max(quant), site, sampdate from chemistry where stream = 'SheepCrk'
and param = 'TDS' group by site, sampdate;

but this gives me the value of each site and date, not the maximum for all
dates at a specific site. Postgres tells me that both site and sampdate must
be in the 'group by' clause.

Obviously the syntax is more complex than I thought it would be and I
would like a pointer to the correct way to write these queries.

Rich

---------------------------------------------------------------

-- Base Query; most recent date for the maximum quant on a given site/stream
(not tested)
WITH
get_max_quant_for_each_site AS (
SELCET max(quant) AS site_maximum, site, stream, param FROM
chemistry WHERE stream = 'SheepCrk' AND param = 'TDS' GROUP BY site, stream,
param
)
, locate_all_entries_with_the_max AS (
SELECT site, sampledate, quant, stream
, ROW_NUMBER() OVER (PARTITION BY site, stream, param ORDER
BY sampledate DESC) AS sample_row_number
FROM (SELECT site, quant, sampledate, stream FROM chemistry)
JOIN get_max_quant_for_each_site USING (site, stream, param)
WHERE quant = site_maximum
)
, limit_to_one_per_site AS (
SELECT site, sampledate, quant, stream, param
FROM locate_all_entries_with_the_max
WHERE sample_row_number = 1
)
SELECT * FROM limit_to_one_per_site;

-- The other queries should be able to manipulate the results of this query
to directly answer your question or use these results JOINed against the
chemistry table to get at what you need
-- My naming was done before deciding that you want more than just "site" in
joining/output but also "stream" and "param"
-- You can convert the "WITH" clauses into sub-queries to possibly improve
performance...though WITH is probably easier to follow along with during
initial development

David J.

#5Rich Shepard
rshepard@appl-ecosys.com
In reply to: Tom Lane (#2)
Re: Single Table Select With Aggregate Function

On Tue, 3 Jan 2012, Tom Lane wrote:

You can do that type of thing using subqueries, eg

select ... from mytab
where col = (select max(col) from mytab where ...)

Thanks, Tom. That's what I thought I needed.

or if you don't mind a nonstandard construct, consider SELECT DISTINCT ON.
There's an example of how to use that in the SELECT reference page
(look for the weather-reports example). In any case you're going to
have to consider what you want to have happen when there are multiple
occurrences of the maximum value.

I'll look at that, too.

Much appreciated,

Rich

#6Rich Shepard
rshepard@appl-ecosys.com
In reply to: Ondrej Ivanič (#3)
Re: Single Table Select With Aggregate Function

On Wed, 4 Jan 2012, Ondrej Ivanič wrote:

window functions might be helpful:
http://www.postgresql.org/docs/9.1/static/tutorial-window.html

Thanks. I'll carefully read this.

Much appreciated,

Rich