how to use aggregate functions in this case

Started by Janek Sendrowskiover 12 years ago10 messagesgeneral

janek12@web.de

over 12 years ago

Hi,

Thats my code snipped:

SELECT v_rec1.user,
   sum(CASE WHEN v_rec_fts.lev BETWEEN 0 AND 25 THEN 1 ELSE 0 END) as "0 to 25",
   sum(CASE WHEN v_rec_fts.lev BETWEEN 25 AND 50 THEN 1 ELSE 0 END) as "25 to 50",
sum(CASE WHEN v_rec_fts.lev BETWEEN 50 AND 100 THEN 1 ELSE 0 END) as "50 to 100"
   INTO v_rec2
   GROUP BY user;

Now I want to summuarize the "0 to 25" values and the others in the same query.
Somehow like this: count("0 to 25")
But I want to do it with every single user and I don't know how to do that

A result should look like this:

user percentage count
smith "0 to 25" 5
smith "25 to 50" 7
smith "50 to 75" 2
jones "0 to 25" 11
jones "25 to 50" 1
jones "50 to 75" 3

Hope someone who can help me

Janek Sendrowski

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

David G. Johnston

david.g.johnston@gmail.com

over 12 years ago

In reply to: Janek Sendrowski (#1)

Re: how to use aggregate functions in this case

Janek Sendrowski wrote

Hi,

Thats my code snipped:

SELECT v_rec1.user,
   sum(CASE WHEN v_rec_fts.lev BETWEEN 0 AND 25 THEN 1 ELSE 0 END) as
"0 to 25",
   sum(CASE WHEN v_rec_fts.lev BETWEEN 25 AND 50 THEN 1 ELSE 0 END)
as "25 to 50",
sum(CASE WHEN v_rec_fts.lev BETWEEN 50 AND 100 THEN 1 ELSE 0 END)
as "50 to 100"
   INTO v_rec2
   GROUP BY user;

Now I want to summuarize the "0 to 25" values and the others in the same
query.
Somehow like this: count("0 to 25")
But I want to do it with every single user and I don't know how to do that

A result should look like this:

user percentage count
smith "0 to 25" 5
smith "25 to 50" 7
smith "50 to 75" 2
jones "0 to 25" 11
jones "25 to 50" 1
jones "50 to 75" 3

Hope someone who can help me

Janek Sendrowski

--
Sent via pgsql-general mailing list (

pgsql-general@

)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

You have to write three queries (select statements) and then "UNION ALL"
them together. Each sub-query has a where clause matching the range you
wish to aggregate for.

David J.

--
View this message in context: http://postgresql.1045698.n5.nabble.com/how-to-use-aggregate-functions-in-this-case-tp5768522p5768523.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Janek Sendrowski

janek12@web.de

over 12 years ago

In reply to: David G. Johnston (#2)

Re: how to use aggregate functions in this case

<html><head></head><body><div style="font-family: Verdana;font-size: 12.0px;"><div>
<div>Sorry, I formulated it wrong.</div>

<div>My problem is, that I want to count the ranges for every user, but if I use count(range), it counts the ranges of all users.</div>

<div>Janek Sendrowski</div>
</div></div></body></html>

David G. Johnston

david.g.johnston@gmail.com

over 12 years ago

In reply to: Janek Sendrowski (#3)

Re: how to use aggregate functions in this case

Janek Sendrowski wrote

Sorry, I formulated it wrong.

My problem is, that I want to count the ranges for every user, but if I
use count(range), it counts the ranges of all users.

Assuming your example output is indeed what you desire:

SELECT user, '0 to 25'::varchar AS percentage, count(*) FROM tbl WHERE val
BETWEEN 0 AND 25 GROUP BY user

UNION ALL

SELECT user, '25 to 50'::varchar, count(*) FROM tbl WHERE val BETWEEN 25 AND
50 GROUP BY user

...and so forth

Note that BETWEEN has inclusive end-points so anything with exactly 25 (for
example) is being double-counted.

Your other option is something like:

SELECT user, percentage_range_category, count(*) FROM (
SELECT recordid, user, CASE .... END::varchar AS
percentage_range_category
) categorize
GROUP BY user, percentage_range_category;

In short for every record you simply categorize the value then add that
category to your group-by.

Both are equally valid and the second one is probably easier to comprehend;
the first option just happened to occur to me first. I have no idea which
one would perform better in theory nor specifically with your data.

David J.

--
View this message in context: http://postgresql.1045698.n5.nabble.com/how-to-use-aggregate-functions-in-this-case-tp5768522p5768525.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

BladeOfLight16

bladeoflight16@gmail.com

over 12 years ago

In reply to: Janek Sendrowski (#1)

Re: how to use aggregate functions in this case

On Sun, Aug 25, 2013 at 5:59 PM, Janek Sendrowski <janek12@web.de> wrote:

SELECT v_rec1.user,
sum(CASE WHEN v_rec_fts.lev BETWEEN 0 AND 25 THEN 1 ELSE 0 END) as
"0 to 25",
sum(CASE WHEN v_rec_fts.lev BETWEEN 25 AND 50 THEN 1 ELSE 0 END)
as "25 to 50",
sum(CASE WHEN v_rec_fts.lev BETWEEN 50 AND 100 THEN 1 ELSE 0 END)
as "50 to 100"
INTO v_rec2
GROUP BY user;

Now I want to summuarize the "0 to 25" values and the others in the same
query.
Somehow like this: count("0 to 25")
But I want to do it with every single user and I don't know how to do that

A result should look like this:

user percentage count
smith "0 to 25" 5
smith "25 to 50" 7
smith "50 to 75" 2
jones "0 to 25" 11
jones "25 to 50" 1
jones "50 to 75" 3

This appears to be some kind of equal interval problem.

SELECT v_rec1.user,
WIDTH_BUCKET(v_rec_fts.lev, 0, 100, 4) AS bucket
COUNT(*) as count,
FROM v_rec2
GROUP BY user, bucket;

(Untested, but this should be the gist.)

Bucket 1 would be 0 to 25, bucket 2 is 25 to 50, 3 is 50 to 75, 4 is 75 to
100. If you really need to change the bucket number to some kind of text,
you can probably nest this query inside another that uses a CASE to pick
the text based on on the bucket number.

Good luck.

BladeOfLight16

bladeoflight16@gmail.com

over 12 years ago

In reply to: BladeOfLight16 (#5)

Re: how to use aggregate functions in this case

On Sun, Aug 25, 2013 at 8:36 PM, BladeOfLight16 <bladeoflight16@gmail.com>wrote:

This appears to be some kind of equal interval problem.

SELECT v_rec1.user,
WIDTH_BUCKET(v_rec_fts.lev, 0, 100, 4) AS bucket
COUNT(*) as count,
FROM v_rec2
GROUP BY user, bucket;

(Untested, but this should be the gist.)

Bucket 1 would be 0 to 25, bucket 2 is 25 to 50, 3 is 50 to 75, 4 is 75 to
100. If you really need to change the bucket number to some kind of text,
you can probably nest this query inside another that uses a CASE to pick
the text based on on the bucket number.

Good luck.

Then again, I guess you don't need a nested query.

SELECT v_rec1.user,
CASE WIDTH_BUCKET(v_rec_fts.lev, 0, 100, 4)
WHEN 1 THEN '0 to 25'
WHEN 2 THEN '25 to 50'
WHEN 3 THEN '50 to 75'
WHEN 4 THEN '75 to 100'
ELSE 'But how?'
END CASE AS quarter_percentage
COUNT(*) as count,
FROM v_rec2
GROUP BY user, quarter_percentage;

David G. Johnston

david.g.johnston@gmail.com

over 12 years ago

In reply to: BladeOfLight16 (#6)

Re: how to use aggregate functions in this case

BladeOfLight16 wrote

Then again, I guess you don't need a nested query.

SELECT v_rec1.user,
CASE WIDTH_BUCKET(v_rec_fts.lev, 0, 100, 4)
WHEN 1 THEN '0 to 25'
WHEN 2 THEN '25 to 50'
WHEN 3 THEN '50 to 75'
WHEN 4 THEN '75 to 100'
ELSE 'But how?'
END CASE AS quarter_percentage
COUNT(*) as count,
FROM v_rec2
GROUP BY user, quarter_percentage;

This is clean but requires the use of equal intervals.

Another option, though I am unfamiliar with the exact syntax, is to use the
contains operator and an "intrange" (range type, for integer or whatever
actual type is needed).

SELECT CASE WHEN val @> '(0, 25]' -- '(25,50]', etc...

This allows for uneven bucket sizes and avoid the double-inclusive endpoints
problem that results from using BETWEEN.

Requires 9.2

David J.

--
View this message in context: http://postgresql.1045698.n5.nabble.com/how-to-use-aggregate-functions-in-this-case-tp5768522p5768573.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Janek Sendrowski

janek12@web.de

over 12 years ago

In reply to: David G. Johnston (#7)

Re: how to use aggregate functions in this case

<div>thanks for all your answers.</div>

<div>I'll have a try with the contains operator and the intrange, but before I'd like to know if this would work:</div>

<div>There wouldn't be a double endpoint. I just have to decide which range the endpoint includes.</div>

<div>Janek Sendrowski</div>
</div></div></body></html>

David G. Johnston

david.g.johnston@gmail.com

over 12 years ago

In reply to: Janek Sendrowski (#8)

Re: how to use aggregate functions in this case

Janek Sendrowski wrote

Hi,

 

thanks for all your answers.

I'll have a try with the contains operator and the intrange, but
before I'd like to know if this would work:

 

CASE WHEN a >= 0 AND a < 25

CASE WHEN a >= 25 AND a < 50

 

There wouldn't be a double endpoint. I just have to decide which range
the endpoint includes.

 

Janek Sendrowski

Yes, using explicit comparison operators with "AND" will work just fine; its
just a little more verbose so the other options, if available, are preferred
from a readability standpoint. I do not know whether the different options
may have different performance characteristics.

David J.

--
View this message in context: http://postgresql.1045698.n5.nabble.com/how-to-use-aggregate-functions-in-this-case-tp5768522p5768636.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#10

David G. Johnston

david.g.johnston@gmail.com

over 12 years ago

In reply to: David G. Johnston (#9)

Re: how to use aggregate functions in this case

David Johnston wrote

Janek Sendrowski wrote

Hi,

 

thanks for all your answers.

I'll have a try with the contains operator and the intrange, but
before I'd like to know if this would work:

 

CASE WHEN a >= 0 AND a < 25

CASE WHEN a >= 25 AND a < 50

 

There wouldn't be a double endpoint. I just have to decide which
range the endpoint includes.

 

Janek Sendrowski

Yes, using explicit comparison operators with "AND" will work just fine;
its just a little more verbose so the other options, if available, are
preferred from a readability standpoint. I do not know whether the
different options may have different performance characteristics.

David J.

And just for completeness:

WITH range_def (low, high, label) AS (
VALUES (0, 25, '0 to 25'::varchar), (25, 50, '25 to 50')
)
SELECT ...
FROM source_data
JOIN range_def ON (val >= low AND val < high)
[the rest of the query]

David J.

--
View this message in context: http://postgresql.1045698.n5.nabble.com/how-to-use-aggregate-functions-in-this-case-tp5768522p5768721.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general