BUG #12228: Primary use-case of PERCENT_RANK not supported

Started by Jonathon Lachlan-Hachéover 11 years ago3 messagesbugs
Jump to latest
#1Jonathon Lachlan-Haché
jonlachlan@gmail.com

The following bug has been logged on the website:

Bug reference: 12228
Logged by: Jonathon Lachlan-Hache
Email address: jonlachlan@gmail.com
PostgreSQL version: 9.4beta2
Operating system: MacOSX
Description:

I'm using the 9.4 RC1 and trying out the WITHIN GROUP functions. I used a
percentile_disc pattern successfully, however I think I came across a
problem with using percent_rank(). Here is my SQL:

SELECT percent_rank(datavalue) WITHIN GROUP (ORDER BY datavalue) as
pct_rank, measureid
FROM measuredata
WHERE surveyyear=2013
GROUP BY measureid;

I want to be able to run a query that programmatically displays the
'pct_rank' of datavalue. My table is organized into four columns:
organizationid
measureid
surveyyear
datavalue

The 'datavalue' column is the measure, whereas all the other columns are
attributes. When I run percent_rank() it needs to determine what percentile
is the 'datavalue', but within the scope of the same 'measureid' and
'surveyyear'.

The percent_rank() function does not appear to support this. The above query
returns the following:

ERROR: column "measuredata.datavalue" must appear in the GROUP BY clause or
be used in an aggregate function
LINE 1: SELECT percent_rank(datavalue) WITHIN GROUP (ORDER BY datava...
^
DETAIL: Direct arguments of an ordered-set aggregate must use only grouped
columns.

********** Error **********

ERROR: column "measuredata.datavalue" must appear in the GROUP BY clause or
be used in an aggregate function
SQL state: 42803
Detail: Direct arguments of an ordered-set aggregate must use only grouped
columns.
Character: 21

This doesn't make sense, because if I include 'datavalue' as a GROUP BY
clause, then my group N-size is exactly 1.

Basically, because the argument of percent_rank() must also be present in
the GROUP BY clause, there is no way to use percent_rank() to
programatically determine the percent-rank of a value within a set. And this
is likely to be the primary use-case of any kind of percent-rank function,
so it would make sense to include it in the new implementation.

Thanks so much,
Jon

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jonathon Lachlan-Haché (#1)
Re: BUG #12228: Primary use-case of PERCENT_RANK not supported

jonlachlan@gmail.com writes:

I'm using the 9.4 RC1 and trying out the WITHIN GROUP functions. I used a
percentile_disc pattern successfully, however I think I came across a
problem with using percent_rank(). Here is my SQL:

SELECT percent_rank(datavalue) WITHIN GROUP (ORDER BY datavalue) as
pct_rank, measureid
FROM measuredata
WHERE surveyyear=2013
GROUP BY measureid;

That query isn't very sensible: the direct argument of percent_rank() has
to be a constant over any one aggregation group, else the percentile
calculation is meaningless.

I want to be able to run a query that programmatically displays the
'pct_rank' of datavalue.

I'm not 100% sure what you mean by that, but I suspect you are looking for
something closer to the basic percent_rank() window function, not the
hypothetical-set function. Hypothetical-set functions are for computing
the measure that would be attributed to a row that's not actually present
in the data. Moreover, since they're aggregates, they produce only one
output per GROUP BY group, and I don't understand what you mean by
"pct_rank of datavalue" at a group level. Perhaps what you want is

SELECT percent_rank() OVER (PARTITION BY measureid ORDER BY datavalue) as
pct_rank, measureid
FROM measuredata
WHERE surveyyear=2013;

It could also be that what you're after will require doing a window
function like that in a sub-SELECT, and then grouping and/or aggregating
in the outer query.

regards, tom lane

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#3Jonathon Lachlan-Haché
jonlachlan@gmail.com
In reply to: Tom Lane (#2)
Re: BUG #12228: Primary use-case of PERCENT_RANK not supported

Yes, you are right that I just needed the basic window function. I must
have been confused because I was using percentile_disc() and assumed I
would be using the same WITHIN GROUP syntax.

I was not clear before about the hypothetical set functions and their
purpose. I have looked at the documentation at
http://www.postgresql.org/docs/9.4/static/functions-aggregate.html and now
see the difference.

Might it be worthwhile to include a suggestion in the error message and/or
documentation "Use of percent_rank() WITHIN GROUP is a hypothetical-set
aggregate function. If you do not intend to use a hypothetical-set
aggregate function, try using a window function with PARTITION BY instead
of WITHIN GROUP"? Not sure if that aligns with your style guidelines, but
just a thought.

Thanks so much for your previous response,
Jon