Cumulative count (running total) window fn

Started by Oliver Kohll - Mailing Listsalmost 16 years ago8 messagesgeneral
Jump to latest
#1Oliver Kohll - Mailing Lists
oliver.lists@gtwm.co.uk

Hello,

Many thanks to andreas.kretschmer for this helpful reply about how to set up a window function to perform a running total:
http://archives.postgresql.org/pgsql-general/2010-03/msg01122.php

It works perfectly with the simple test data but I've just got back to work, tried implementing it on my live data and the results are slightly different. My query is almost exactly the same - I've simplified by grouping by year only rather than year and month:

select extract(year from signup_date),
count(email_address),
sum(count(email_address)) over (rows unbounded preceding)
from email_list group by 1 order by 1;

date_part | count | sum
-----------+-------+------
2007 | 501 | 1374
2008 | 491 | 491
2009 | 382 | 873
2010 | 66 | 1440
(4 rows)

What I'm looking for is
date_part | count | sum
-----------+-------+------
2007 | 501 | 501
2008 | 491 | 992
2009 | 382 | 1374
2010 | 66 | 1440

It seems to be adding up the counts but not in the right order.

I've also tried an explicit ORDER BY inside the partition with no difference:

select extract(year from signup_date),
count(email_address),
sum(count(email_address)) over (partition by 1 order by 1 asc rows unbounded preceding)
from email_list group by 1 order by 1;

Does anyone have any other ideas?

Regards
Oliver Kohll

oliver@agilebase.co.uk / +44(0)7814 828608 / skype:okohll
www.agilebase.co.uk - software
www.gtwm.co.uk - company

#2Magnus Hagander
magnus@hagander.net
In reply to: Oliver Kohll - Mailing Lists (#1)
Re: Cumulative count (running total) window fn

On Thu, Apr 29, 2010 at 10:52, Oliver Kohll - Mailing Lists
<oliver.lists@gtwm.co.uk> wrote:

Hello,

Many thanks to andreas.kretschmer for this helpful reply about how to set up a window function to perform a running total:
http://archives.postgresql.org/pgsql-general/2010-03/msg01122.php

It works perfectly with the simple test data but I've just got back to work, tried implementing it on my live data and the results are slightly different. My query is almost exactly the same - I've simplified by grouping by year only rather than year and month:

select extract(year from signup_date),
 count(email_address),
 sum(count(email_address)) over (rows unbounded preceding)
from email_list group by 1 order by 1;

 date_part | count | sum
-----------+-------+------
     2007 |   501 | 1374
     2008 |   491 |  491
     2009 |   382 |  873
     2010 |    66 | 1440
(4 rows)

What I'm looking for is
 date_part | count | sum
-----------+-------+------
     2007 |   501 | 501
     2008 |   491 |  992
     2009 |   382 |  1374
     2010 |    66 | 1440

It seems to be adding up the counts but not in the right order.

I've also tried an explicit ORDER BY inside the partition with no difference:

select extract(year from signup_date),
 count(email_address),
 sum(count(email_address)) over (partition by 1 order by 1 asc rows unbounded preceding)
from email_list group by 1 order by 1;

Does anyone have any other ideas?

Aren't you looking for something along the line of:

SELECT year, sum(c) over (order by year)
FROM (
SELECT extract(year from signup_date) AS year, count(email_address) AS c
FROM email_list
GROUP BY extract(year from signup_date)
)

(adjust for typos, I didn't test it)

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

#3Oliver Kohll - Mailing Lists
oliver.lists@gtwm.co.uk
In reply to: Magnus Hagander (#2)
Re: Cumulative count (running total) window fn

On 29 Apr 2010, at 10:01, Magnus Hagander wrote:

select extract(year from signup_date),
count(email_address),
sum(count(email_address)) over (partition by 1 order by 1 asc rows unbounded preceding)
from email_list group by 1 order by 1;

Does anyone have any other ideas?

Aren't you looking for something along the line of:

SELECT year, sum(c) over (order by year)
FROM (
SELECT extract(year from signup_date) AS year, count(email_address) AS c
FROM email_list
GROUP BY extract(year from signup_date)
)

(adjust for typos, I didn't test it)

Yes that does work thanks, if you give the subquery a name. I'd still like to know if it's possible to do with a window function rather than a subquery.

Oliver Kohll

#4Thom Brown
thombrown@gmail.com
In reply to: Oliver Kohll - Mailing Lists (#3)
Re: Cumulative count (running total) window fn

On 29 April 2010 11:39, Oliver Kohll - Mailing Lists <
oliver.lists@gtwm.co.uk> wrote:

On 29 Apr 2010, at 10:01, Magnus Hagander wrote:

select extract(year from signup_date),

count(email_address),

sum(count(email_address)) over (partition by 1 order by 1 asc rows
unbounded preceding)

from email_list group by 1 order by 1;

Does anyone have any other ideas?

Aren't you looking for something along the line of:

SELECT year, sum(c) over (order by year)
FROM (
SELECT extract(year from signup_date) AS year, count(email_address) AS c
FROM email_list
GROUP BY extract(year from signup_date)
)

(adjust for typos, I didn't test it)

Yes that does work thanks, if you give the subquery a name. I'd still like
to know if it's possible to do with a window function rather than a
subquery.

Oliver Kohll

Like this?:

SELECT extract(year from signup_date), count(email_address),
sum(count(email_address)) OVER (ORDER BY count(email_address)) FROM
email_list GROUP BY 1 ORDER BY 1;

Thom

#5Oliver Kohll - Mailing Lists
oliver.lists@gtwm.co.uk
In reply to: Thom Brown (#4)
Re: Cumulative count (running total) window fn

Aren't you looking for something along the line of:

SELECT year, sum(c) over (order by year)
FROM (
SELECT extract(year from signup_date) AS year, count(email_address) AS c
FROM email_list
GROUP BY extract(year from signup_date)
)

(adjust for typos, I didn't test it)

Yes that does work thanks, if you give the subquery a name. I'd still like to know if it's possible to do with a window function rather than a subquery.

Oliver Kohll

Like this?:

SELECT extract(year from signup_date), count(email_address), sum(count(email_address)) OVER (ORDER BY count(email_address)) FROM email_list GROUP BY 1 ORDER BY 1;

Thom

Almost, but put me on the right track! This one is exactly what I'm looking for:

SELECT extract(year from signup_date), count(email_address), sum(count(email_address)) OVER (ORDER BY extract(year from signup_date)) FROM email_list GROUP BY 1 ORDER BY 1;

The ORDER BY count(email_address) did give the same results for my data but only because the count values just happen to give the same ordering as the years - I tested by changing some dates.

Many thanks all.
Oliver

#6Magnus Hagander
magnus@hagander.net
In reply to: Oliver Kohll - Mailing Lists (#5)
Re: Cumulative count (running total) window fn

On Thu, Apr 29, 2010 at 13:43, Oliver Kohll - Mailing Lists
<oliver.lists@gtwm.co.uk> wrote:

Aren't you looking for something along the line of:

SELECT year, sum(c) over (order by year)
FROM (
 SELECT extract(year from signup_date) AS year, count(email_address) AS c
 FROM email_list
 GROUP BY extract(year from signup_date)
)

(adjust for typos, I didn't test it)

Yes that does work thanks, if you give the subquery a name. I'd still like
to know if it's possible to do with a window function rather than a
subquery.
Oliver Kohll

Like this?:

SELECT extract(year from signup_date), count(email_address),
sum(count(email_address)) OVER (ORDER BY count(email_address)) FROM
email_list GROUP BY 1 ORDER BY 1;

Thom

Almost, but put me on the right track! This one is exactly what I'm looking
for:
SELECT extract(year from signup_date), count(email_address),
sum(count(email_address)) OVER (ORDER BY extract(year from signup_date))
FROM email_list GROUP BY 1 ORDER BY 1;
The ORDER BY count(email_address) did give the same results for my data but
only because the count values just happen to give the same ordering as the
years - I tested by changing some dates.
Many thanks all.

Curious note - how does the non-subselect version and the subselect
version compare performance-wise?

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

#7Oliver Kohll - Mailing Lists
oliver.lists@gtwm.co.uk
In reply to: Magnus Hagander (#6)
Re: Cumulative count (running total) window fn

Curious note - how does the non-subselect version and the subselect
version compare performance-wise?

Magnus,

On a test table with 12,000 rows there's not much in it, the subselect has a simpler plan but they both take practically the same time.

The two plans (note I've been rewriting the field names for readability until now but haven't here):

explain analyze SELECT year, sum(c) over (order by year)
FROM (
SELECT extract(year from a56b7a8d6de03f67b) AS year, count(a10e4ab8863c199f1) AS c
FROM a2e9a7e9e257153de
GROUP BY extract(year from a56b7a8d6de03f67b)
) as subq;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
WindowAgg (cost=851.49..874.06 rows=1290 width=16) (actual time=43.369..43.394 rows=5 loops=1)
-> Sort (cost=851.49..854.71 rows=1290 width=16) (actual time=43.340..43.342 rows=5 loops=1)
Sort Key: (date_part('year'::text, a2e9a7e9e257153de.a56b7a8d6de03f67b))
Sort Method: quicksort Memory: 25kB
-> HashAggregate (cost=752.59..771.94 rows=1290 width=26) (actual time=43.300..43.317 rows=5 loops=1)
-> Seq Scan on a2e9a7e9e257153de (cost=0.00..689.56 rows=12605 width=26) (actual time=0.031..26.723 rows=12605 loops=1)
Total runtime: 43.549 ms

explain analyze SELECT extract(year from a56b7a8d6de03f67b), count(a10e4ab8863c199f1), sum(count(a10e4ab8863c199f1)) OVER (ORDER BY count(a10e4ab8863c199f1)) FROM a2e9a7e9e257153de GROUP BY 1 ORDER BY 1;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=1382.39..1388.52 rows=2451 width=32) (actual time=44.229..44.230 rows=5 loops=1)
Sort Key: (date_part('year'::text, a56b7a8d6de03f67b))
Sort Method: quicksort Memory: 25kB
-> WindowAgg (cost=1195.39..1244.41 rows=2451 width=32) (actual time=44.171..44.208 rows=5 loops=1)
-> Sort (cost=1195.39..1201.52 rows=2451 width=32) (actual time=44.125..44.127 rows=5 loops=1)
Sort Key: (count(a10e4ab8863c199f1))
Sort Method: quicksort Memory: 25kB
-> HashAggregate (cost=1014.52..1057.41 rows=2451 width=32) (actual time=44.071..44.099 rows=5 loops=1)
-> Seq Scan on a2e9a7e9e257153de (cost=0.00..833.58 rows=24126 width=32) (actual time=0.032..26.683 rows=12605 loops=1)
Total runtime: 44.396 ms

Regards
Oliver Kohll

oliver@agilebase.co.uk / +44(0)7814 828608 / skype:okohll
www.agilebase.co.uk - software
www.gtwm.co.uk - company

#8Alban Hertroys
dalroi@solfertje.student.utwente.nl
In reply to: Oliver Kohll - Mailing Lists (#7)
Re: Cumulative count (running total) window fn

On 29 Apr 2010, at 19:21, Oliver Kohll - Mailing Lists wrote:

The two plans (note I've been rewriting the field names for readability until now but haven't here):

explain analyze SELECT year, sum(c) over (order by year)
FROM (
SELECT extract(year from a56b7a8d6de03f67b) AS year, count(a10e4ab8863c199f1) AS c
FROM a2e9a7e9e257153de
GROUP BY extract(year from a56b7a8d6de03f67b)
) as subq;

Oh my, how can you work with such column and table names? You and any colleagues you may have will probably appreciate having a few views over those tables that translate that gibberish to human readable stuff. You could go further and make those views updatable (by means of a few rules), but then you run the risk that colleagues start to hug you...

Alban Hertroys

--
Screwing up is an excellent way to attach something to the ceiling.

!DSPAM:737,4bdaabce10411378620886!