Window function docs

Started by Bruce Momjianabout 9 years ago11 messagesdocs
Jump to latest
#1Bruce Momjian
bruce@momjian.us

As part of writing a talk about window functions, I have done some
cleanups of the window function docs, attached, that I would like to
apply to head.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +

Attachments:

window.difftext/x-diff; charset=us-asciiDownload+40-34
#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#1)
Re: Window function docs

Bruce Momjian <bruce@momjian.us> writes:

As part of writing a talk about window functions, I have done some
cleanups of the window function docs, attached, that I would like to
apply to head.

I think this change is just wrong:

     aggregate function, but the <literal>OVER</> clause causes it to be
-    treated as a window function and computed across an appropriate set of
-    rows.)
+    treated as a window function and computed across the entire partition.)
    </para>

avg() is going to be evaluated across the current frame, not the whole
partition. I don't mind trying to be more precise than "appropriate set",
but it doesn't help to be apparently precise but wrong.

Not really convinced by the s/regular/normal/g changes. That doesn't seem
to clarify much either. Maybe it would help to define "normal" near here.
There's a definition in func.sgml (that they are not ordered-set or
hypothetical-set aggregates) but that's too far away.

In this bit:

-   definition.  Rows that are not distinct in the <literal>ORDER BY</>
-   ordering are said to be <firstterm>peers</>; the four ranking functions
-   are defined so that they give the same answer for any two peer rows.
+   definition.  Rows that are not distinct based on <literal>ORDER BY</>
+   ordering are said to be <firstterm>peers</>.  The four ranking
+   functions (includes <function>cume_dist</>) are defined so that they
+   give the same answer for any two peer rows.

"based on" seems little better than the previous wording. Maybe it'd be
better to say "Rows that are not distinct when considering only the
<literal>ORDER BY</> columns are said to be ..."

    sometimes also <function>nth_value</>.  You can redefine the frame by
-   adding a suitable frame specification (<literal>RANGE</> or
-   <literal>ROWS</>) to the <literal>OVER</> clause.
-   See <xref linkend="syntax-window-functions"> for more information
-   about frame specifications.
+   modifying the frame specification;  see <xref
+   linkend="syntax-window-functions"> for more information about frame
+   specifications.
   </para>

Don't really like this change; it makes it vaguer for no apparent benefit.

+  <para>
+   <function>cume_dist</> computes the ratio of partition rows that
+   are less than or equal to the current row and its peers, while
+   <function>percent_rank</> computes the ratio of lesser partition rows,
+   assuming the current row is does not exist in the partition.
+  </para>

The first one probably ought to be "computes the fraction of partition
rows". The second definition seems nonsensical --- ratio to what? Or
probably better, fraction of what? (And "is does not" needs help too.)

     the user.  Also, any built-in or user-defined normal aggregate function
-    can be used as a window function.  Ordered-set aggregates presently
-    cannot be used as window functions, however.
+    can be used as a window function.  However, ordered-set aggregate
+    cannot presently be used as window functions.
    </para>

Grammar (don't lose the "s" in ordered-set aggregates).

regards, tom lane

--
Sent via pgsql-docs mailing list (pgsql-docs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-docs

#3David G. Johnston
david.g.johnston@gmail.com
In reply to: Bruce Momjian (#1)
Re: Window function docs

On Thu, Apr 6, 2017 at 2:49 PM, Bruce Momjian <bruce@momjian.us> wrote:

As part of writing a talk about window functions, I have done some
cleanups of the window function docs, attached, that I would like to
apply to head.

+1 Overall. Some minor technical items noted below. The only conceptual
item is whether (and, if so, how) to deal with partitions vs. frames.

​- But unlike regular aggregate functions, use of a window function does
not
+ But unlike normal aggregate functions, use of a window function does
not

How broadly did you look for usage of normal versus regular? The window
function doc page uses "normal" once, vis-a-vis "ordered-set", I haven't
looked further. Note on that page the intro paragraph doesn't mention
hypothetical-set aggregates as being distinct from ordered-set aggregates
yet the tables and the docs here do.

https://www.postgresql.org/docs/current/static/functions-aggregate.html

+ treated as a window function and computed across the entire partition.)

Would introducing the concept of frame here trade precision for
comprehension?

-    <firstterm>window frame</>.  Many (but not all) window functions act
only
+    <firstterm>window frame</>.  Some window functions act only

This is a semantic change as to whether the dominate behavior is to act on
frames or partitions. IIUC very few window functions, when presented with
a ORDER BY'd OVER and thus a limited frame, will still act on the entire
partition instead of the just the frame they are provided. In particular
don't all normal aggregates (and user-defined ones) operate on frames? The
cume_dist function noted later is one of the few exceptions since it
effectively operates on both - the frame for the numerator and the
partition for the denominator.

+ aggregate function (i.e. not ordered-set or hypothetical-set aggregates)

I, and others, subscribe to the believe that both i.e., and e.g., should
have trailing commas...

http://english.stackexchange.com/questions/6894/should-there-be-a-comma-after-i-e

+ functions (includes <function>cume_dist</>) are defined so that they

s/b "including <function>cume_dist</>"

+ give the same answer for any two peer rows.

for all peer rows?

+ assuming the current row is does not exist in the partition.

remove "is"

#4Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#2)
Re: Window function docs

On Thu, Apr 6, 2017 at 06:13:44PM -0400, Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

As part of writing a talk about window functions, I have done some
cleanups of the window function docs, attached, that I would like to
apply to head.

I think this change is just wrong:

aggregate function, but the <literal>OVER</> clause causes it to be
-    treated as a window function and computed across an appropriate set of
-    rows.)
+    treated as a window function and computed across the entire partition.)
</para>

avg() is going to be evaluated across the current frame, not the whole
partition. I don't mind trying to be more precise than "appropriate set",
but it doesn't help to be apparently precise but wrong.

Good point. I have adjusted the text to say "window frame".

Not really convinced by the s/regular/normal/g changes. That doesn't seem
to clarify much either. Maybe it would help to define "normal" near here.
There's a definition in func.sgml (that they are not ordered-set or
hypothetical-set aggregates) but that's too far away.

Yeah, I went back and forth on that, and it seems like a mine-field.
The problem is that SUM() is both a non-window aggregate, and a window
function, and there isn't a clear terminology for non-window aggregates,
so changed the patch to just call them that. Hopefully the dual nature
of this is now clearer.

In this bit:

-   definition.  Rows that are not distinct in the <literal>ORDER BY</>
-   ordering are said to be <firstterm>peers</>; the four ranking functions
-   are defined so that they give the same answer for any two peer rows.
+   definition.  Rows that are not distinct based on <literal>ORDER BY</>
+   ordering are said to be <firstterm>peers</>.  The four ranking
+   functions (includes <function>cume_dist</>) are defined so that they
+   give the same answer for any two peer rows.

"based on" seems little better than the previous wording. Maybe it'd be
better to say "Rows that are not distinct when considering only the
<literal>ORDER BY</> columns are said to be ..."

OK, I used your wording.

sometimes also <function>nth_value</>.  You can redefine the frame by
-   adding a suitable frame specification (<literal>RANGE</> or
-   <literal>ROWS</>) to the <literal>OVER</> clause.
-   See <xref linkend="syntax-window-functions"> for more information
-   about frame specifications.
+   modifying the frame specification;  see <xref
+   linkend="syntax-window-functions"> for more information about frame
+   specifications.
</para>

Don't really like this change; it makes it vaguer for no apparent benefit.

Reverted. I had changed it because it is more the frame definition that
we want to highlight here, not ROW vs. RANGE, but there is no clearer
way to specify that clause.

+  <para>
+   <function>cume_dist</> computes the ratio of partition rows that
+   are less than or equal to the current row and its peers, while
+   <function>percent_rank</> computes the ratio of lesser partition rows,
+   assuming the current row is does not exist in the partition.
+  </para>

The first one probably ought to be "computes the fraction of partition
rows". The second definition seems nonsensical --- ratio to what? Or
probably better, fraction of what? (And "is does not" needs help too.)

Updated.

the user.  Also, any built-in or user-defined normal aggregate function
-    can be used as a window function.  Ordered-set aggregates presently
-    cannot be used as window functions, however.
+    can be used as a window function.  However, ordered-set aggregate
+    cannot presently be used as window functions.
</para>

Grammar (don't lose the "s" in ordered-set aggregates).

Thanks for the review. Updated patch attached.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +

Attachments:

window.difftext/x-diff; charset=us-asciiDownload+40-31
#5Bruce Momjian
bruce@momjian.us
In reply to: David G. Johnston (#3)
Re: Window function docs

On Thu, Apr 6, 2017 at 03:53:24PM -0700, David G. Johnston wrote:

On Thu, Apr 6, 2017 at 2:49 PM, Bruce Momjian <bruce@momjian.us> wrote:

As part of writing a talk about window functions, I have done some
cleanups of the window function docs, attached, that I would like to
apply to head.

+1 Overall.  Some minor technical items noted below.  The only conceptual item
is whether (and, if so, how) to deal with partitions vs. frames.

​-    But unlike regular aggregate functions, use of a window function does not
+    But unlike normal aggregate functions, use of a window function does not

How broadly did you look for usage of normal versus regular?  The window
function doc page uses "normal" once, vis-a-vis "ordered-set", I haven't looked
further.  Note on that page the intro paragraph doesn't mention

I did a grep looking for 'regular.*agg'.

hypothetical-set aggregates as being distinct from ordered-set aggregates yet
the tables and the docs here do.

Do you have any suggested text for that?

https://www.postgresql.org/docs/current/static/functions-aggregate.html

+    treated as a window function and computed across the entire partition.)

Would introducing the concept of frame here trade precision for comprehension?

I modified this to say "window frame" as part of the patch for Tom.

-    <firstterm>window frame</>.  Many (but not all) window functions act only
+    <firstterm>window frame</>.  Some window functions act only

This is a semantic change as to whether the dominate behavior is to act on
frames or partitions.  IIUC very few window functions, when presented with a
ORDER BY'd OVER and thus a limited frame, will still act on the entire
partition instead of the just the frame they are provided.  In particular don't

If you look at slide 79 here you can get a good overview of what
operates on frames vs. partitions:

http://momjian.us/main/writings/pgsql/window.pdf

all normal aggregates (and user-defined ones) operate on frames?  The cume_dist
function noted later is one of the few exceptions since it effectively operates
on both - the frame for the numerator and the partition for the denominator.

Actually, cume_dist does not operate on frames because if you specify
ORDER BY x ROWS CURRENT ROW, cume_dist still operates on the peer and
lesser rows. So, the frame clause is ignored by cume_dist --- in a way
it has RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW wired into the
window function.

+   aggregate function (i.e. not ordered-set or hypothetical-set aggregates)

I, and others, subscribe to the believe that both i.e., and e.g., should have
trailing commas...

http://english.stackexchange.com/questions/6894/
should-there-be-a-comma-after-i-e

Done.

+   functions (includes <function>cume_dist</>) are defined so that they

s/b "including <function>cume_dist</>"

Done.

+   give the same answer for any two peer rows.

for all peer rows?

Much better, done.

+   assuming the current row is does not exist in the partition.

remove "is"

Fixed in Tom's patch.

Updated patch attached. Thanks for the review.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +

Attachments:

window.difftext/x-diff; charset=us-asciiDownload+40-31
#6David G. Johnston
david.g.johnston@gmail.com
In reply to: Bruce Momjian (#5)
Re: Window function docs

On Thu, Apr 6, 2017 at 6:14 PM, Bruce Momjian <bruce@momjian.us> wrote:

On Thu, Apr 6, 2017 at 03:53:24PM -0700, David G. Johnston wrote:

hypothetical-set aggregates as being distinct from ordered-set

aggregates yet

the tables and the docs here do.

Do you have any suggested text for that?

- ​The built-in ordered-set aggregate functions are listed in Table 9-53
and Table 9-54.
+ The built-in ordered-set aggregate functions are listed in Table 9-53
<https://www.postgresql.org/docs/9.6/static/functions-aggregate.html#FUNCTIONS-ORDEREDSET-TABLE>
while
the built-in hypothetical-set ones are in Table 9-54
<https://www.postgresql.org/docs/9.6/static/functions-aggregate.html#FUNCTIONS-HYPOTHETICAL-TABLE>
.

That's a direct translation. Reading some more, though, I think that:

+ The built-in within-group aggregate functions are listed in Table 9-53
(ordered-set) and Table 9-54 (hypothetical-set).

So now we'd have:

*grouping-aggregates:*
normal aggregates (no suffix modifiers, though can embed ORDER BY in the
aggregate function argument area)

*within-group aggregates (requires WITHIN GROUP suffix modifier)*
ordered-set aggregates -> given parameters determine a record value,
possibly imputed (within the defined group)
hypothetical-set aggregates -> given a (virtual) record determine a
parameter (within the defined group)

*non-grouping aggregates:*
window aggregates (requires OVER suffix modifier)

The CREATE AGGREGATE only names those three specific types in the notes
section.

David J.

#7Bruce Momjian
bruce@momjian.us
In reply to: David G. Johnston (#6)
Re: Window function docs

On Thu, Apr 6, 2017 at 06:43:38PM -0700, David G. Johnston wrote:

On Thu, Apr 6, 2017 at 6:14 PM, Bruce Momjian <bruce@momjian.us> wrote:

On Thu, Apr  6, 2017 at 03:53:24PM -0700, David G. Johnston wrote:

hypothetical-set aggregates as being distinct from ordered-set aggregates

yet

the tables and the docs here do.

Do you have any suggested text for that?

- ​The built-in ordered-set aggregate functions are listed in Table 9-53 and
Table 9-54.
+ The built-in ordered-set aggregate functions are listed in Table 9-53 while
the built-in hypothetical-set ones are in Table 9-54.

That's a direct translation.  Reading some more, though, I think that:

+ The built-in within-group aggregate functions are listed in Table 9-53
(ordered-set) and Table 9-54 (hypothetical-set).

I went with this, for simplicity:

The built-in within-group ordered-set aggregate functions
are listed in <xref linkend="functions-orderedset-table">
while the built-in within-group hypothetical-set ones are in <xref
linkend="functions-hypothetical-table">.

So now we'd have:

grouping-aggregates:
normal aggregates (no suffix modifiers, though can embed ORDER BY in the
aggregate function argument area)

within-group aggregates (requires WITHIN GROUP suffix modifier)
ordered-set aggregates -> given parameters determine a record value, possibly
imputed (within the defined group)
hypothetical-set aggregates -> given a (virtual) record determine a parameter
(within the defined group)

non-grouping aggregates:
window aggregates (requires OVER suffix modifier)

The CREATE AGGREGATE only names those three specific types in the notes
section.

Yes, very complicated.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +

--
Sent via pgsql-docs mailing list (pgsql-docs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-docs

#8Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#2)
Re: Window function docs

On Thu, Apr 6, 2017 at 06:13:44PM -0400, Tom Lane wrote:

avg() is going to be evaluated across the current frame, not the whole
partition. I don't mind trying to be more precise than "appropriate set",
but it doesn't help to be apparently precise but wrong.

Not really convinced by the s/regular/normal/g changes. That doesn't seem
to clarify much either. Maybe it would help to define "normal" near here.
There's a definition in func.sgml (that they are not ordered-set or
hypothetical-set aggregates) but that's too far away.

This change is not on my patch yes but what do you think about changing
"regular" and "normal" aggregate designations to "generic"? That seems
clearer because the aggregates work in non-window and window contexts.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +

--
Sent via pgsql-docs mailing list (pgsql-docs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-docs

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#8)
Re: Window function docs

Bruce Momjian <bruce@momjian.us> writes:

On Thu, Apr 6, 2017 at 06:13:44PM -0400, Tom Lane wrote:

Not really convinced by the s/regular/normal/g changes. That doesn't seem
to clarify much either. Maybe it would help to define "normal" near here.
There's a definition in func.sgml (that they are not ordered-set or
hypothetical-set aggregates) but that's too far away.

This change is not on my patch yes but what do you think about changing
"regular" and "normal" aggregate designations to "generic"? That seems
clearer because the aggregates work in non-window and window contexts.

I don't think "generic" is le mot juste here. What's more generic about
avg() than about mode(), for instance?

My dictionary says "generic" means "characteristic of a class or group
of things", which doesn't seem to have anything to do with this.

I actually think "regular" might be a better choice than "normal" for this
purpose, mainly because somebody looking for statistical aggregates might
think that "normal" has something to do with normal distributions. So
maybe we should leave that one group of uses alone and s/normal/regular/
in section 9.20. But in any case it would help to define the term
explicitly in both sections.

regards, tom lane

--
Sent via pgsql-docs mailing list (pgsql-docs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-docs

#10Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#9)
Re: Window function docs

On Fri, Apr 7, 2017 at 01:06:53PM -0400, Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

This change is not on my patch yes but what do you think about changing
"regular" and "normal" aggregate designations to "generic"? That seems
clearer because the aggregates work in non-window and window contexts.

I don't think "generic" is le mot juste here. What's more generic about
avg() than about mode(), for instance?

My dictionary says "generic" means "characteristic of a class or group
of things", which doesn't seem to have anything to do with this.

I actually think "regular" might be a better choice than "normal" for this
purpose, mainly because somebody looking for statistical aggregates might
think that "normal" has something to do with normal distributions. So
maybe we should leave that one group of uses alone and s/normal/regular/
in section 9.20. But in any case it would help to define the term
explicitly in both sections.

I don't think adding another defined term in this dense text is going to
help so I just specified "general-purpose and statistical aggregates"
where needed; patch attached.

I didn't touch any mention of "normal" in reference to
pg_aggregate.aggfnoid where normal/"n" has a specific meaning.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +

Attachments:

window.difftext/x-diff; charset=us-asciiDownload+102-94
#11Bruce Momjian
bruce@momjian.us
In reply to: Bruce Momjian (#10)
Re: Window function docs

On Mon, Apr 10, 2017 at 03:15:58PM -0400, Bruce Momjian wrote:

On Fri, Apr 7, 2017 at 01:06:53PM -0400, Tom Lane wrote:

I actually think "regular" might be a better choice than "normal" for this
purpose, mainly because somebody looking for statistical aggregates might
think that "normal" has something to do with normal distributions. So
maybe we should leave that one group of uses alone and s/normal/regular/
in section 9.20. But in any case it would help to define the term
explicitly in both sections.

I don't think adding another defined term in this dense text is going to
help so I just specified "general-purpose and statistical aggregates"
where needed; patch attached.

I didn't touch any mention of "normal" in reference to
pg_aggregate.aggfnoid where normal/"n" has a specific meaning.

Patch applied.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +

--
Sent via pgsql-docs mailing list (pgsql-docs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-docs