Re: WITHIN GROUP patch
2013/10/9 Pavel Stehule <pavel.stehule@gmail.com>
Show quoted text
Hello
I checked a conformance with ANSI SQL - and I didn't find any issue.
I found so following error message is not too friendly (mainly because
this functionality will be new)postgres=# select dense_rank(3,3,2) within group (order by num desc, odd)
from test4;
ERROR: Incorrect number of arguments for hypothetical set function
LINE 1: select dense_rank(3,3,2) within group (order by num desc, od...
^
postgres=# select dense_rank(3,3,2) within group (order by num desc) from
test4;
ERROR: Incorrect number of arguments for hypothetical set function
LINE 1: select dense_rank(3,3,2) within group (order by num desc) fr...
^
postgres=# select dense_rank(3,3) within group (order by num desc) from
test4;
ERROR: Incorrect number of arguments for hypothetical set function
LINE 1: select dense_rank(3,3) within group (order by num desc) from...
^
postgres=# select dense_rank(3,3) within group (order by num desc, num)
from test4;
dense_rank
------------
3
(1 row)Probably some hint should be there?
Regards
Pavel
2013/10/2 Vik Fearing <vik.fearing@dalibo.com>
On 09/30/2013 06:34 PM, Pavel Stehule wrote:
I looked on this patch - it is one from long patches - so I propose to
divide review to a few parts:a) a conformance with ANSI SQL
b) check of new aggregates - semantic, implementation
c) source code checking - usual patch reviewNow I would to work on @a
I had an unexpected emergency come up, sorry about that. I plan on
doing B and C starting on Thursday (October 3).I am grateful to have Pavel's help, this is a big patch.
--
Vik
Import Notes
Reply to msg id not found: CAFj8pRB1dQ-jAxDpy+Me6UBUYUFKh_4wr-jhxVdO5h847Azw@mail.gmail.comReference msg id not found: CAESHdJrh3Eh-2PcR_KzLMU5h-JUZxoYjf-mGX5M7StArT-Jg@mail.gmail.comReference msg id not found: CAFj8pRD63ybsbBeeHU0iKLj+a3fSnUWqLUywZz_jTqLAay9wWg@mail.gmail.comReference msg id not found: 524BE81F.3030601@dalibo.comReference msg id not found: CAFj8pRB1dQ-jAxDpy+Me6UBUYUFKh_4wr-jhxVdO5h847Azw@mail.gmail.com
"Pavel" == Pavel Stehule <pavel.stehule@gmail.com> writes:
I found so following error message is not too friendly (mainly because
this functionality will be new)postgres=# select dense_rank(3,3,2) within group (order by num desc, odd)
from test4;
ERROR: Incorrect number of arguments for hypothetical set function
LINE 1: select dense_rank(3,3,2) within group (order by num desc, od...
^Probably some hint should be there?
Hm, yeah.
Anyone have ideas for better wording for the original error message,
or a DETAIL or HINT addition? The key point here is that the number of
_hypothetical_ arguments and the number of ordered columns must match,
but we don't currently exclude the possibility that a user-defined
hypothetical set function might have additional non-hypothetical args.
The first alternative that springs to mind is:
ERROR: Incorrect number of arguments for hypothetical set function
DETAIL: Number of hypothetical arguments (3) must equal number of ordered columns (2)
--
Andrew (irc:RhodiumToad)
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
2013/10/11 Andrew Gierth <andrew@tao11.riddles.org.uk>
"Pavel" == Pavel Stehule <pavel.stehule@gmail.com> writes:
I found so following error message is not too friendly (mainly because
this functionality will be new)postgres=# select dense_rank(3,3,2) within group (order by num desc,
odd)
from test4;
ERROR: Incorrect number of arguments for hypothetical set function
LINE 1: select dense_rank(3,3,2) within group (order by num desc, od...
^Probably some hint should be there?
Hm, yeah.
Anyone have ideas for better wording for the original error message,
or a DETAIL or HINT addition? The key point here is that the number of
_hypothetical_ arguments and the number of ordered columns must match,
but we don't currently exclude the possibility that a user-defined
hypothetical set function might have additional non-hypothetical args.The first alternative that springs to mind is:
ERROR: Incorrect number of arguments for hypothetical set function
DETAIL: Number of hypothetical arguments (3) must equal number of ordered
columns (2)+1
Pavel
Show quoted text
--
Andrew (irc:RhodiumToad)
On Thu, Oct 10, 2013 at 10:35 PM, Andrew Gierth
<andrew@tao11.riddles.org.uk> wrote:
The first alternative that springs to mind is:
ERROR: Incorrect number of arguments for hypothetical set function
DETAIL: Number of hypothetical arguments (3) must equal number of ordered columns (2)
I'd suggest trying to collapse that down into a single message; the
DETAIL largely recapitulates the original error message. Also, I'd
suggest identifying the name of the function, since people may not be
able to identify that easily based just on the fact that it's a
hypothetical set function.
Here's one idea, with two variants depending on the direction of the inequality:
ERROR: function "%s" has %d arguments but only %d ordering columns
ERROR: function "%s" has %d ordering columns but only %d arguments
Or else leave out the actual numbers and just state the principle, but
identifying the exact function at fault, e.g.
ERROR: number of arguments to function "%s" does not match number of
ordering columns
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 10/09/2013 04:19 PM, Pavel Stehule wrote:
I checked a conformance with ANSI SQL - and I didn't find any issue.
I found so following error message is not too friendly (mainly
because this functionality will be new)postgres=# select dense_rank(3,3,2) within group (order by num
desc, odd) from test4;
ERROR: Incorrect number of arguments for hypothetical set function
LINE 1: select dense_rank(3,3,2) within group (order by num desc,
od...
^
postgres=# select dense_rank(3,3,2) within group (order by num
desc) from test4;
ERROR: Incorrect number of arguments for hypothetical set function
LINE 1: select dense_rank(3,3,2) within group (order by num desc)
fr...
^
postgres=# select dense_rank(3,3) within group (order by num desc)
from test4;
ERROR: Incorrect number of arguments for hypothetical set function
LINE 1: select dense_rank(3,3) within group (order by num desc)
from...
^
postgres=# select dense_rank(3,3) within group (order by num desc,
num) from test4;
dense_rank
------------
3
(1 row)Probably some hint should be there?
In addition to Pavel's review, I have finally finished reading the
patch. Here are some notes, mainly on style:
First of all, it no longer compiles on HEAD because commit
4d212bac1752e1bad6f3aa6242061c393ae93a0a stole oid 3968. I modified
that locally to be able to continue my review.
Some of the error messages do not comply with project style. That is,
they begin with a capital letter.
Ordered set functions cannot have transition functions
Ordered set functions must have final functions
Invalid argument types for hypothetical set function
Invalid argument types for ordered set function
Incompatible change to aggregate definition
Too many arguments to ordered set function
Ordered set finalfns must not be strict
Cannot have multiple ORDER BY clauses with WITHIN GROUP
Cannot have DISTINCT and WITHIN GROUP together
Incorrect number of arguments for hypothetical set function
Incorrect number of direct arguments to ordered set function %s
And in pg_aggregate.c I found a comment with a similar problem that
doesn't match its surrounding code:
Oid transsortop = InvalidOid; /* Can be omitted */
I didn't find any more examples like that, but I did see several block
comments that weren't complete sentences whereas I think they should
be. Also a lot of the code comments say "I" and I don't recall seeing
that elsewhere. I may be wrong, but I would prefer if they were more
neutral.
The documentation has a number of issues.
collateindex.pl complains of duplicated index entries for PERCENTILE
CONTINUOUS and PERCENTILE DISCRETE. This is because the index markup is
used for both overloaded versions. This is the same mistake Bruce made
and then corrected in commit 5dcc48c2c76cf4b2b17c8e14fe3e588ae0c8eff3.
"if there are multiple equally good result" should have an s on the end
in func.sgml.
Table 9-49 has an extra empty column. That should either be removed, or
filled in with some kind of comment text like other similar tables.
Apart from that, it looks good. There is some mismatched coding styles
in there but the next run of pgindent should catch them so it's no big deal.
I haven't yet exercised the actual functionality of the new functions,
nor have I tried to create my own. Andrew alerted me to a division by
zero bug in one of them, so I'll be looking forward to catching that.
So, more review to come.
--
Vik
On Tue, Oct 15, 2013 at 4:29 PM, Vik Fearing <vik.fearing@dalibo.com> wrote:
On 10/09/2013 04:19 PM, Pavel Stehule wrote:
I checked a conformance with ANSI SQL - and I didn't find any issue.
I found so following error message is not too friendly (mainly because
this functionality will be new)postgres=# select dense_rank(3,3,2) within group (order by num desc, odd)
from test4;
ERROR: Incorrect number of arguments for hypothetical set function
LINE 1: select dense_rank(3,3,2) within group (order by num desc, od...
^
postgres=# select dense_rank(3,3,2) within group (order by num desc) from
test4;
ERROR: Incorrect number of arguments for hypothetical set function
LINE 1: select dense_rank(3,3,2) within group (order by num desc) fr...
^
postgres=# select dense_rank(3,3) within group (order by num desc) from
test4;
ERROR: Incorrect number of arguments for hypothetical set function
LINE 1: select dense_rank(3,3) within group (order by num desc) from...
^
postgres=# select dense_rank(3,3) within group (order by num desc, num)
from test4;
dense_rank
------------
3
(1 row)Probably some hint should be there?
In addition to Pavel's review, I have finally finished reading the patch.
Here are some notes, mainly on style:First of all, it no longer compiles on HEAD because commit
4d212bac1752e1bad6f3aa6242061c393ae93a0a stole oid 3968. I modified that
locally to be able to continue my review.Some of the error messages do not comply with project style. That is, they
begin with a capital letter.Ordered set functions cannot have transition functions
Ordered set functions must have final functions
Invalid argument types for hypothetical set function
Invalid argument types for ordered set function
Incompatible change to aggregate definition
Too many arguments to ordered set function
Ordered set finalfns must not be strict
Cannot have multiple ORDER BY clauses with WITHIN GROUP
Cannot have DISTINCT and WITHIN GROUP togetherIncorrect number of arguments for hypothetical set function
Incorrect number of direct arguments to ordered set function %sAnd in pg_aggregate.c I found a comment with a similar problem that doesn't
match its surrounding code:
Oid transsortop = InvalidOid; /* Can be omitted */I didn't find any more examples like that, but I did see several block
comments that weren't complete sentences whereas I think they should be.
Also a lot of the code comments say "I" and I don't recall seeing that
elsewhere. I may be wrong, but I would prefer if they were more neutral.The documentation has a number of issues.
collateindex.pl complains of duplicated index entries for PERCENTILE
CONTINUOUS and PERCENTILE DISCRETE. This is because the index markup is
used for both overloaded versions. This is the same mistake Bruce made and
then corrected in commit 5dcc48c2c76cf4b2b17c8e14fe3e588ae0c8eff3."if there are multiple equally good result" should have an s on the end in
func.sgml.Table 9-49 has an extra empty column. That should either be removed, or
filled in with some kind of comment text like other similar tables.Apart from that, it looks good. There is some mismatched coding styles in
there but the next run of pgindent should catch them so it's no big deal.I haven't yet exercised the actual functionality of the new functions, nor
have I tried to create my own. Andrew alerted me to a division by zero bug
in one of them, so I'll be looking forward to catching that.So, more review to come.
--
Vik
Hi All,
Please find attached our latest version of the patch. This version
fixes the issues pointed out by the reviewers.
Regards,
Atri
--
Regards,
Atri
l'apprenant
Attachments:
withingrouppatch41113.patchtext/x-diff; charset=US-ASCII; name=withingrouppatch41113.patchDownload+3950-641
On 11/04/2013 08:43 AM, Atri Sharma wrote:
Please find attached our latest version of the patch. This version
fixes the issues pointed out by the reviewers.
No, it doesn't. The documentation still contains formatting and
grammatical errors, and the code comments still do not match the their
surroundings. (The use of "I" in the code comments is a point I have
conceded on IRC, but I stand by my other remarks.)
Don't bother submitting a new patch until I've posted my technical
review, but please fix these issues on your local copy.
--
Vik
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Hi all,
Please find the latest version of the patch. This version fixes the
issues pointed out by the reviewer and the divide by zero bug in
percent_rank function. This version also adds a regression test for
the divide by zero case in percent_rank.
Regards,
Atri
Attachments:
withingroup14112013.patchtext/x-diff; charset=US-ASCII; name=withingroup14112013.patchDownload+3958-643
On Fri, 2013-11-15 at 00:05 +0530, Atri Sharma wrote:
Please find the latest version of the patch. This version fixes the
issues pointed out by the reviewer and the divide by zero bug in
percent_rank function. This version also adds a regression test for
the divide by zero case in percent_rank.
This patch doesn't apply.
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, Nov 18, 2013 at 9:26 AM, Peter Eisentraut <peter_e@gmx.net> wrote:
On Fri, 2013-11-15 at 00:05 +0530, Atri Sharma wrote:
Please find the latest version of the patch. This version fixes the
issues pointed out by the reviewer and the divide by zero bug in
percent_rank function. This version also adds a regression test for
the divide by zero case in percent_rank.This patch doesn't apply.
Hi all,
Please find attached the latest patch for WITHIN GROUP. This patch is
after fixing the merge conflicts.
Regards,
Atri
--
Regards,
Atri
l'apprenant
Attachments:
withingrouppatch211113context.patchtext/x-diff; charset=US-ASCII; name=withingrouppatch211113context.patchDownload+4534-1706
On 11/21/2013 11:04 AM, Atri Sharma wrote:
Please find attached the latest patch for WITHIN GROUP. This patch is
after fixing the merge conflicts.
I have spent quite some time on this and the previous versions. Here is
my review, following the questions on the wiki.
This patch is in context diff format and applies cleanly to today's
master. It contains several regression tests and for the most part,
good documentation. I would like to see at least one example of using
each of the two types of function (hypothetical and inverted
distribution) in section 4.2.8.
This patch implements what it says it does. We don't already have a way
to get these results without this patch that I know of, and I think we
do want it. I certainly want it. I do not have a copy of the SQL
standard, but I have full faith in the Andrew Gierth's claim that this
is part of it. Even if not, implementation details were brought up
during design and agreed upon by this list[1]/messages/by-id/2b8b55b8ba82f83ef4e6070b95fb0cd0@news-out.riddles.org.uk. I don't see how anything
here could be dangerous. The custom ordered set functions I made
correctly passed a round-trip through dump/restore.
The code compiles without warning. All of the clean tests I did worked
as expected, and all of the dirty tests failed elegantly.
I did not find any corner cases, and I looked in as many corners as I
could think of. I didn't manage to trigger any assertion failures and I
didn't crash it.
I found no noticeable issues with performance, either directly or as
side effects.
I am not the most competent with code review so I'll be relying on
further review by another reviewer or the final committer. The patch
fixed the project comments/messages style issues I raised in my previous
review. I found the code comments lacking in some places (like
inversedistribution.c:mode_final for example) but I can't say if the
really is too terse, or if it's just me. On the other hand, I thought
the explanation blocks in the code comments were adequately descriptive.
There is some room for improvement in future versions. The query select
mode() within group (order by x) over (partition by y) from ... should
be valid but isn't. This was explained by Andrew on IRC as being
non-trivial: "specifically, we implemented WITHIN GROUP by repurposing
the infrastructure already present for agg(distinct ...) and agg(x order
by y) which are also not yet supported for
aggregate-as-window-function". I assume then that evolution on one side
will benefit the other.
All in all, I believe this is ready for committer.
[1]: /messages/by-id/2b8b55b8ba82f83ef4e6070b95fb0cd0@news-out.riddles.org.uk
/messages/by-id/2b8b55b8ba82f83ef4e6070b95fb0cd0@news-out.riddles.org.uk
--
Vik
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
"Vik" == Vik Fearing <vik.fearing@dalibo.com> writes:
Vik> I certainly want it. I do not have a copy of the SQL standard,
Vik> but I have full faith in the Andrew Gierth's claim that this is
Vik> part of it.
For reference, this is how I believe it matches up against the spec
(I'm working from the 2008 final):
10.9 <aggregate function>:
<hypothetical set function> is intended to be implemented in this
patch exactly as per spec.
<inverse distribution function>: the spec defines two of these,
PERCENTILE_CONT and PERCENTILE_DISC:
PERCENTILE_CONT is defined in the spec for numeric types, in which
case it returns an approximate numeric result, and for interval, in
which case it returns interval. Our patch defines percentile_cont
functions for float8 and interval input types, relying on implicit
casting to float8 to handle other numeric input types.
As an extension to the spec, we define a percentile_cont function
that returns an array of percentile values in one call, as suggested
by some users.
PERCENTILE_DISC is defined in the spec for the same types as _CONT.
Our version on the other hand accepts any type which can be sorted,
and returns the same type. This does mean that our version may return
an exact numeric type rather than an approximate one, so this is a
possible slight deviation from the spec.
i.e. our percentile_disc(float8) within group (order by bigint)
returns a bigint, while the spec seems to imply it should return an
approximate numeric type (i.e. float*).
Again, we additionally provide an array version which is not in the
spec.
mode() is not in the spec, we just threw it in because it was easy.
6.10 <window function>
The spec says that <hypothetical set function> is not allowed as a
window function.
The spec does not forbid other <ordered set function>s in a window
function call, but we have NOT attempted to implement this (largely
for the same reasons that DISTINCT and ORDER BY are not implemented
for aggregates as window functions).
Conformance: all the relevant features are parts of T612, "Advanced
OLAP Operations", which we already list in the docs on the unsupported
list with the comment "some forms supported". Maybe that could be
changed now to "most forms supported", but that's a subjective call
(and one we didn't really consider doing in this patch).
--
Andrew (irc:RhodiumToad)
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Atri Sharma <atri.jiit@gmail.com> writes:
Please find attached the latest patch for WITHIN GROUP. This patch is
after fixing the merge conflicts.
I've started to look at this patch now. I have a couple of immediate
reactions to the catalog changes:
1. I really hate the way you've overloaded the transvalue to do something
that has approximately nothing to do with transition state (and haven't
updated catalogs.sgml to explain that, either). Seems like it'd be
cleaner to just hardwire a bool column that distinguishes regular and
hypothetical input rows. And why do you need aggtranssortop for that?
I fail to see the point of sorting on the flag column.
2 I also don't care for the definition of aggordnargs, which is the number
of direct arguments to an ordered set function, except when it isn't.
Rather than overloading it to be both a count and a couple of flags,
I wonder whether we shouldn't expand aggisordsetfunc to be a three-way
"aggregate kind" field (ordinary agg, ordered set, or hypothetical set
function).
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
"Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
Tom> 1. I really hate the way you've overloaded the transvalue to do
Tom> something that has approximately nothing to do with transition
Tom> state (and haven't updated catalogs.sgml to explain that,
Tom> either). Seems like it'd be cleaner to just hardwire a bool
Tom> column that distinguishes regular and hypothetical input rows.
The intention here is that while the provided functions all fit the
spec's idea of how inverse distribution or hypothetical set functions
work, the actual implementation mechanisms are more generally
applicable than that and a user-supplied function could well find
something else useful to do with them. Accordingly, hardcoding stuff
seemed inappropriate.
Tom> And why do you need aggtranssortop for that? I fail to see the
Tom> point of sorting on the flag column.
It is convenient to the implementation to be able to rely on
encountering the hypothetical row deterministically before (or in some
cases after, as in cume_dist) its peers in the remainder of the sort
order. Removing that sort would make the results of the functions
incorrect.
There should probably be some comments about that. oops.
Tom> 2 I also don't care for the definition of aggordnargs, which is
Tom> the number of direct arguments to an ordered set function,
Tom> except when it isn't. Rather than overloading it to be both a
Tom> count and a couple of flags, I wonder whether we shouldn't
Tom> expand aggisordsetfunc to be a three-way "aggregate kind" field
Tom> (ordinary agg, ordered set, or hypothetical set function).
It would still be overloaded in some sense because a non-hypothetical
ordered set function could still take an arbitrary number of args
(using variadic "any") - there aren't any provided, but there's no
good reason to disallow user-defined functions doing that - so you'd
still need a special value like -1 for aggordnargs to handle that.
--
Andrew (irc:RhodiumToad)
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
"Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
Tom> 1. I really hate the way you've overloaded the transvalue to do
Tom> something that has approximately nothing to do with transition
Tom> state (and haven't updated catalogs.sgml to explain that,
Tom> either). Seems like it'd be cleaner to just hardwire a bool
Tom> column that distinguishes regular and hypothetical input rows.
The intention here is that while the provided functions all fit the
spec's idea of how inverse distribution or hypothetical set functions
work, the actual implementation mechanisms are more generally
applicable than that and a user-supplied function could well find
something else useful to do with them. Accordingly, hardcoding stuff
seemed inappropriate.
Tom> And why do you need aggtranssortop for that? I fail to see the
Tom> point of sorting on the flag column.
It is convenient to the implementation to be able to rely on
encountering the hypothetical row deterministically before (or in some
cases after, as in cume_dist) its peers in the remainder of the sort
order. Removing that sort would make the results of the functions
incorrect.
Well, okay, but you've not said anything that wouldn't be handled just as
well by some logic that adds a fixed integer-constant-zero flag column to
the rows going into the tuplesort. The function-specific code could then
inject hypothetical row(s) with other values, eg 1 or -1, to get the
results you describe. If there's any flexibility improvement from
allowing a different constant value than zero, or a different datatype
than integer, I don't see what it'd be.
On the other side of the coin, repurposing the transition-value catalog
infrastructure to do this totally different thing is confusing. And what
if someday somebody has a use for a regular transition value along with
this stuff? What you've done is unorthogonal for no very good reason.
(Actually, I'm wondering whether it wouldn't be better if the tuplesort
object *were* the transition value, and were managed primarily by the
aggregate function's code; in particular expecting the agg's transition
function to insert rows into the tuplesort object. We could provide
helper functions to largely negate any duplication-of-code objections,
I'd think. In this view the WITHIN GROUP columns aren't so different from
regular aggregate arguments, but there'd need to be some way for the
aggregate function to get hold of the sorting-semantics decoration on
them.)
Tom> 2 I also don't care for the definition of aggordnargs, which is
Tom> the number of direct arguments to an ordered set function,
Tom> except when it isn't. Rather than overloading it to be both a
Tom> count and a couple of flags, I wonder whether we shouldn't
Tom> expand aggisordsetfunc to be a three-way "aggregate kind" field
Tom> (ordinary agg, ordered set, or hypothetical set function).
It would still be overloaded in some sense because a non-hypothetical
ordered set function could still take an arbitrary number of args
(using variadic "any") - there aren't any provided, but there's no
good reason to disallow user-defined functions doing that - so you'd
still need a special value like -1 for aggordnargs to handle that.
Sure. But a -1 to indicate "not applicable" doesn't seem like it's
too much of a stretch. It's the -2 business that's bothering me.
Again, that seems unnecessarily non-orthogonal --- who's to say which
functions would want to constrain the number of direct arguments and
which wouldn't? (I wonder whether having this info in the catalogs
isn't the wrong thing anyhow, as opposed to expecting the functions
themselves to check the argument count at runtime.)
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
"Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
Tom> Well, okay, but you've not said anything that wouldn't be
Tom> handled just as well by some logic that adds a fixed
Tom> integer-constant-zero flag column to the rows going into the
Tom> tuplesort.
Adding such a column unconditionally even for non-hypothetical
functions would break the optimization for sorting a single column
(which is a big deal, something like 3x speed difference, for by-value
types).
Adding it only for hypothetical set functions is making a distinction
in how functions are executed that I don't think is warranted -
imagine for example a function that calculates some measure over a
frequency distribution by adding a known set of boundary values to the
sort; this would not be a hypothetical set function in terms of
argument processing, but it would still benefit from the extra sort
column. I did not want to unnecessarily restrict such possibilities.
It would still be overloaded in some sense because a non-hypothetical
ordered set function could still take an arbitrary number of args
(using variadic "any") - there aren't any provided, but there's no
good reason to disallow user-defined functions doing that - so you'd
still need a special value like -1 for aggordnargs to handle that.
Tom> Sure. But a -1 to indicate "not applicable" doesn't seem like it's
Tom> too much of a stretch. It's the -2 business that's bothering me.
Tom> Again, that seems unnecessarily non-orthogonal --- who's to say which
Tom> functions would want to constrain the number of direct arguments and
Tom> which wouldn't? (I wonder whether having this info in the catalogs
Tom> isn't the wrong thing anyhow, as opposed to expecting the functions
Tom> themselves to check the argument count at runtime.)
Not checking the number of arguments to a function until runtime seems
a bit on the perverse side. Having a fixed number of direct args is
the "normal" case (as seen from the fact that all the non-hypothetical
ordered set functions in the spec and in our patch have fixed argument
counts).
--
Andrew (irc:RhodiumToad)
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
"Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
Tom> Well, okay, but you've not said anything that wouldn't be
Tom> handled just as well by some logic that adds a fixed
Tom> integer-constant-zero flag column to the rows going into the
Tom> tuplesort.
Adding such a column unconditionally even for non-hypothetical
functions would break the optimization for sorting a single column
(which is a big deal, something like 3x speed difference, for by-value
types).
Well, sure, but I was only suggesting adding it when the aggregate asks
for it, probably via a new flag column in pg_aggregate. The question
you're evading is what additional functionality could be had if the
aggregate could demand a different datatype or constant value for the
flag column.
Adding it only for hypothetical set functions is making a distinction
in how functions are executed that I don't think is warranted -
That seems like rather a curious argument from someone who's willing to
give up the ability to specify a regular transition value concurrently
with the flag column.
But anyway, what I'm thinking right now is that these questions would all
go away if the aggregate transfunction were receiving the rows and
sticking them into the tuplestore. It could add whatever columns it felt
like.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
"Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
Tom> Well, sure, but I was only suggesting adding it when the
Tom> aggregate asks for it, probably via a new flag column in
Tom> pg_aggregate.
Sure, I was only pointing out the necessity.
Tom> The question you're evading is what additional functionality
Tom> could be had if the aggregate could demand a different datatype
Tom> or constant value for the flag column.
I don't really see a question there to answer - I simply chose to
provide a general mechanism rather than make assumptions about what
future users of the code would desire. I have no specific application
in mind that would require some other type.
Adding it only for hypothetical set functions is making a
distinction in how functions are executed that I don't think is
warranted -
Tom> That seems like rather a curious argument from someone who's
Tom> willing to give up the ability to specify a regular transition
Tom> value concurrently with the flag column.
In the current patch the idea of also specifying a regular transition
value is meaningless since there is no transition function.
Tom> But anyway, what I'm thinking right now is that these questions
Tom> would all go away if the aggregate transfunction were receiving
Tom> the rows and sticking them into the tuplestore. It could add
Tom> whatever columns it felt like.
True, but this ends up duplicating the sorting functionality of
nodeAgg that we are leveraging off in the first place. I think this
will be somewhat more intrusive and likely slower.
--
Andrew (irc:RhodiumToad)
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
"Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
Tom> But anyway, what I'm thinking right now is that these questions
Tom> would all go away if the aggregate transfunction were receiving
Tom> the rows and sticking them into the tuplestore. It could add
Tom> whatever columns it felt like.
True, but this ends up duplicating the sorting functionality of
nodeAgg that we are leveraging off in the first place. I think this
will be somewhat more intrusive and likely slower.
Hm, it's just a refactoring of the same code we'd have to have anyway,
so I'm not seeing a reason to assume it'd be slower. If anything,
this approach would open more opportunities for function-specific
optimizations, which in the long run could be faster. (I'm not
claiming that any such optimizations would be in the first version.)
In hindsight I wonder if it wasn't a mistake to embed ordered-aggregate
support in nodeAgg.c the way we did. We could have dumped that
responsibility into some sort of wrapper around specific aggregates,
with an option for some aggregates to skip the wrapper and handle it
themselves. A trivial, and perhaps not very useful, example is that
non-order-sensitive aggregates like MIN/MAX/COUNT could have been coded
to simply ignore any ordering request. I can't immediately think of any
examples that are compelling enough to justify such a refactoring now ---
unless it turns out to make WITHIN GROUP easier.
Anyway, I'm going to go off and look at the WITHIN GROUP patch with these
ideas in mind.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Further questions about WITHIN GROUP:
I believe that the spec requires that the "direct" arguments of an inverse
or hypothetical-set aggregate must not contain any Vars of the current
query level. They don't manage to say that in plain English, of course,
but in the <hypothetical set function> case the behavior is defined in
terms of this sub-select:
( SELECT 0, SK1, ..., SKK
FROM TNAME UNION ALL
VALUES (1, VE1, ..., VEK) )
where SKn are the sort key values, TNAME is an alias for the input table,
and VEn are the direct arguments. Any input-table Vars the aggregate
might refer to are thus in scope only for the SKn not the VEn. (This
definition also makes it clear that the VEn are to be evaluated once,
not once per row.) In the <inverse distribution function> case, they
resort to claiming that the value of the <inverse distribution function
argument> can be replaced by a numeric literal, which again makes it clear
that it's supposed to be evaluated just once.
So that's all fine, but the patch seems a bit confused about it. If you
try to cheat, you get an error message that's less than apropos:
regression=# select cume_dist(f1) within group(order by f1) from text_tbl ;
ERROR: column "text_tbl.f1" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: select cume_dist(f1) within group(order by f1) from text_tbl...
^
I'd have hoped for a message along the line of "fixed arguments of
cume_dist() must not contain variables", similar to the phrasing we
use when you try to put a variable in LIMIT.
Also, there are some comments and code changes in nodeAgg.c that seem
to be on the wrong page here:
+ /*
+ * Use the representative input tuple for any references to
+ * non-aggregated input columns in the qual and tlist. (If we are not
+ * grouping, and there are no input rows at all, we will come here
+ * with an empty firstSlot ... but if not grouping, there can't be any
+ * references to non-aggregated input columns, so no problem.)
+ * We do this before finalizing because for ordered set functions,
+ * finalize_aggregates can evaluate arguments referencing the tuple.
+ */
+ econtext->ecxt_outertuple = firstSlot;
+
The last two lines of that comment are new, all the rest was moved from
someplace else. Those last two lines are wrong according to the above
reasoning, and if they were correct the argument made in the pre-existing
part of the comment would be all wet, meaning the code could in fact crash
when trying to evaluate a Var reference in finalize_aggregates.
So I'm inclined to undo this part of the patch (and anything else that's
rearranging logic with an eye to Var references in finalize_aggregates),
and to try to fix parse_agg.c so it gives a more specific error message
for this case.
After looking at this I'm a bit less enthused about the notion of hybrid
aggregates than I was before. I now see that the spec intends that when
the WITHIN GROUP notation is used, *all* the arguments to the left of
WITHIN GROUP are meant to be fixed evaluate-once arguments. While we
could possibly define aggregates for which some of those argument
positions are treated as evaluate-once-per-row arguments, I'm realizing
that that'd likely be pretty darn confusing for users.
What I now think we should do about the added pg_aggregate columns is
to have:
aggnfixedargs int number of fixed arguments, excluding any
hypothetical ones (always 0 for normal aggregates)
aggkind char 'n' for normal aggregate, 'o' for ordered set function,
'h' for hypothetical-set function
with the parsing rules that given
agg( n fixed arguments ) WITHIN GROUP ( ORDER BY k sort specifications )
1. WITHIN GROUP is disallowed for normal aggregates.
2. For an ordered set function, n must equal aggnfixedargs. We treat all
n fixed arguments as contributing to the aggregate's result collation,
but ignore the sort arguments.
3. For a hypothetical-set function, n must equal aggnfixedargs plus k,
and we match up type and collation info of the last k fixed arguments
with the corresponding sort arguments. The first n-k fixed arguments
contribute to the aggregate's result collation, the rest not.
Reading back over this email, I see I've gone back and forth between
using the terms "direct args" and "fixed args" for the evaluate-once
stuff to the left of WITHIN GROUP. I guess I'm not really sold on
either terminology, but we need something we can use consistently
in the code and docs. The spec is no help, it has no generic term
at all for these args. Does anybody else have a preference, or
maybe another suggestion entirely?
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers