Better documentation for row_number() combined with set returning functions

Started by Erik Tewsabout 6 years ago2 messagesdocs
Jump to latest
#1Erik Tews
erik@datenzone.de

Hi

The current documentation of Postgresql sounds like row_number() over
() can be used to number the rows returned by postgres. However, that
doesn't work when the query also uses set returning functions such as
json_array_elements. In this case, row_number() will be the same for
every element of the set.

I suggest to add a paragraph to the description of row_number() that
states that this is the behaviour and maybe also reference the "with
ordinality" feature that can be used instead.

Erik

#2Bruce Momjian
bruce@momjian.us
In reply to: Erik Tews (#1)
Re: Better documentation for row_number() combined with set returning functions

On Tue, Feb 25, 2020 at 08:24:13PM +0100, Erik Tews wrote:

Hi

The current documentation of Postgresql sounds like row_number() over
() can be used to number the rows returned by postgres. However, that

Yes, row_number() is a member of the set of window functions.

doesn't work when the query also uses set returning functions such as
json_array_elements. In this case, row_number() will be the same for
every element of the set.

Yes, the issue is that a set-returning function in the target list
generates multiple rows while the other target list values are
duplicated, e.g.:

CREATE TABLE test (x int);

INSERT INTO test VALUES (1), (2), (3);

SELECT x, generate_series(4,6) AS y, row_number() OVER () AS z FROM
test;
x | y | z
---+---+---
1 | 4 | 1
1 | 5 | 1
1 | 6 | 1
2 | 4 | 2
2 | 5 | 2
2 | 6 | 2
3 | 4 | 3
3 | 5 | 3
3 | 6 | 3

Notice the x=1 value is duplicated for the y values of 4,5,6, and the
row_number is duplicated too.

One way to avoid that is to do the expansion of the set-returning
function in a WITH query, and then apply row_number():

WITH z AS (select x, generate_series(4,6) as y from test)
select x, y, row_number() OVER () FROM z;
x | y | row_number
---+---+------------
1 | 4 | 1
1 | 5 | 2
1 | 6 | 3
2 | 4 | 4
2 | 5 | 5
2 | 6 | 6
3 | 4 | 7
3 | 5 | 8
3 | 6 | 9

I suggest to add a paragraph to the description of row_number() that
states that this is the behavior and maybe also reference the "with
ordinality" feature that can be used instead.

I can't see how ordinality could be used:

SELECT x, z, ord FROM test, generate_series(4,6) WITH ORDINALITY AS a(z, ord);
x | z | ord
---+---+-----
1 | 4 | 1
2 | 4 | 1
3 | 4 | 1
1 | 5 | 2
2 | 5 | 2
3 | 5 | 2
1 | 6 | 3
2 | 6 | 3
3 | 6 | 3

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EnterpriseDB https://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +