Typo on tutorial window page

Started by PG Bug reporting formabout 1 year ago6 messagesdocs

noreply@postgresql.org

about 1 year ago

The following documentation comment has been logged on the website:

Page: https://www.postgresql.org/docs/17/tutorial-window.html
Description:

EXPECTED:
As shown here, the rank function produces a numerical rank for each distinct
PARTITION BY value in the current row's partition, using the order defined
by the ORDER BY clause. rank needs no explicit parameter, because its
behavior is entirely determined by the OVER clause.

ACTUAL:
As shown here, the rank function produces a numerical rank for each distinct
ORDER BY value in the current row's partition, using the order defined by
the ORDER BY clause. rank needs no explicit parameter, because its behavior
is entirely determined by the OVER clause.

Tom Lane

tgl@sss.pgh.pa.us

about 1 year ago

In reply to: PG Bug reporting form (#1)

Re: Typo on tutorial window page

PG Doc comments form <noreply@postgresql.org> writes:

EXPECTED:
As shown here, the rank function produces a numerical rank for each distinct
PARTITION BY value in the current row's partition, using the order defined
by the ORDER BY clause. rank needs no explicit parameter, because its
behavior is entirely determined by the OVER clause.

ACTUAL:
As shown here, the rank function produces a numerical rank for each distinct
ORDER BY value in the current row's partition, using the order defined by
the ORDER BY clause. rank needs no explicit parameter, because its behavior
is entirely determined by the OVER clause.

Hmm, I think the current text is correct, or at least more nearly
correct than what you suggest. Look at the example:

<programlisting>
SELECT depname, empno, salary,
rank() OVER (PARTITION BY depname ORDER BY salary DESC)
FROM empsalary;
</programlisting>

<screen>
depname | empno | salary | rank
-----------+-------+--------+------
develop | 8 | 6000 | 1
develop | 10 | 5200 | 2
develop | 11 | 5200 | 2
develop | 9 | 4500 | 4
develop | 7 | 4200 | 5
personnel | 2 | 3900 | 1
personnel | 5 | 3500 | 2
sales | 1 | 5000 | 1
sales | 4 | 4800 | 2
sales | 3 | 4800 | 2
(10 rows)
</screen>

The ranks are separate within each partition (depname), and
within a partition rows having the same salary get the
same rank.

regards, tom lane

David G. Johnston

david.g.johnston@gmail.com

about 1 year ago

In reply to: Tom Lane (#2)

Re: Typo on tutorial window page

On Tue, Jan 21, 2025 at 10:39 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Show quoted text

PG Doc comments form <noreply@postgresql.org> writes:

EXPECTED:
As shown here, the rank function produces a numerical rank for each

distinct

PARTITION BY value in the current row's partition, using the order

defined

by the ORDER BY clause. rank needs no explicit parameter, because its
behavior is entirely determined by the OVER clause.

ACTUAL:
As shown here, the rank function produces a numerical rank for each

distinct

ORDER BY value in the current row's partition, using the order defined by
the ORDER BY clause. rank needs no explicit parameter, because its

behavior

is entirely determined by the OVER clause.

Hmm, I think the current text is correct, or at least more nearly
correct than what you suggest.

Look at the example:

<programlisting>
SELECT depname, empno, salary,
rank() OVER (PARTITION BY depname ORDER BY salary DESC)
FROM empsalary;
</programlisting>

<screen>
depname | empno | salary | rank
-----------+-------+--------+------
develop | 8 | 6000 | 1
develop | 10 | 5200 | 2
develop | 11 | 5200 | 2
develop | 9 | 4500 | 4
develop | 7 | 4200 | 5
personnel | 2 | 3900 | 1
personnel | 5 | 3500 | 2
sales | 1 | 5000 | 1
sales | 4 | 4800 | 2
sales | 3 | 4800 | 2
(10 rows)
</screen>

The ranks are separate within each partition (depname), and
within a partition rows having the same salary get the
same rank.

regards, tom lane

Олег Семенов

oselemg@gmail.com

about 1 year ago

In reply to: David G. Johnston (#3)

Re: Typo on tutorial window page

sorry, my bad, thanks for the clarification!

вт, 21 січ. 2025 р., 18:40 користувач David G. Johnston <
david.g.johnston@gmail.com> пише:

Show quoted text

On Tue, Jan 21, 2025 at 10:39 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

PG Doc comments form <noreply@postgresql.org> writes:

EXPECTED:
As shown here, the rank function produces a numerical rank for each

distinct

PARTITION BY value in the current row's partition, using the order

defined

by the ORDER BY clause. rank needs no explicit parameter, because its
behavior is entirely determined by the OVER clause.

ACTUAL:
As shown here, the rank function produces a numerical rank for each

distinct

ORDER BY value in the current row's partition, using the order defined

by

the ORDER BY clause. rank needs no explicit parameter, because its

behavior

is entirely determined by the OVER clause.

Hmm, I think the current text is correct, or at least more nearly
correct than what you suggest.

Look at the example:

<programlisting>
SELECT depname, empno, salary,
rank() OVER (PARTITION BY depname ORDER BY salary DESC)
FROM empsalary;
</programlisting>

<screen>
depname | empno | salary | rank
-----------+-------+--------+------
develop | 8 | 6000 | 1
develop | 10 | 5200 | 2
develop | 11 | 5200 | 2
develop | 9 | 4500 | 4
develop | 7 | 4200 | 5
personnel | 2 | 3900 | 1
personnel | 5 | 3500 | 2
sales | 1 | 5000 | 1
sales | 4 | 4800 | 2
sales | 3 | 4800 | 2
(10 rows)
</screen>

The ranks are separate within each partition (depname), and
within a partition rows having the same salary get the
same rank.

regards, tom lane

David G. Johnston

david.g.johnston@gmail.com

about 1 year ago

In reply to: Tom Lane (#2)

Re: Typo on tutorial window page

On Tue, Jan 21, 2025 at 10:39 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

PG Doc comments form <noreply@postgresql.org> writes:

EXPECTED:
As shown here, the rank function produces a numerical rank for each

distinct

PARTITION BY value in the current row's partition, using the order

defined

by the ORDER BY clause. rank needs no explicit parameter, because its
behavior is entirely determined by the OVER clause.

Hmm, I think the current text is correct, or at least more nearly
correct than what you suggest.

The ranks are separate within each partition (depname), and
within a partition rows having the same salary get the
same rank.

Sorry for the empty email - finger spasm hit send.

I was going to write basically that but something feels off to me. Maybe
something like this:

"As shown here, the rank function produces a numerical ranking within each
partition, using the order defined by the ORDER BY clause. Ranking assigns
the same rank to all rows that tie according to the order by criteria,
while still incrementing the rank counter. Thus there are gaps in the
serial numbering. These can be removed by instead using the dense_rank
window function. Ties can instead be given their own unique value by using
the row_number window function. In all these cases, as the window function
is effectively just counting rows, the function itself has no input
parameter."

If we don't want to get into that level of nuance in the tutorial I suggest
we use the row_number() window function instead of rank, and just say
because we count rows no parameter is needed.

David J.

Tom Lane

tgl@sss.pgh.pa.us

about 1 year ago

In reply to: David G. Johnston (#5)

Re: Typo on tutorial window page

"David G. Johnston" <david.g.johnston@gmail.com> writes:

I was going to write basically that but something feels off to me. Maybe
something like this:

"As shown here, the rank function produces a numerical ranking within each
partition, using the order defined by the ORDER BY clause. Ranking assigns
the same rank to all rows that tie according to the order by criteria,
while still incrementing the rank counter. Thus there are gaps in the
serial numbering. These can be removed by instead using the dense_rank
window function. Ties can instead be given their own unique value by using
the row_number window function. In all these cases, as the window function
is effectively just counting rows, the function itself has no input
parameter."

If we don't want to get into that level of nuance in the tutorial I suggest
we use the row_number() window function instead of rank, and just say
because we count rows no parameter is needed.

Yeah, I was wondering if it'd be worth bringing up dense_rank,
but decided "probably not". I like your idea of switching the
example to use row_number to simplify things. What would the
text be then? Perhaps

As shown here, the <function>row_number</function> function
assigns sequential numbers to the rows within each partition,
in the order defined by the <literal>ORDER BY</literal> clause
(with tied rows numbered in an unspecified order).
<function>row_number</function> needs no explicit parameter,
because its behavior
is entirely determined by the <literal>OVER</literal> clause.

regards, tom lane