lead() with arrays - strange behaviour

Started by Thomas Kellererover 6 years ago5 messagesgeneral
Jump to latest
#1Thomas Kellerer
spam_eater@gmx.net

Consider the following dummy table (this is a simplified example from a bigger query):

create table sample_data (id int, id_list int[]);
insert into sample_data (id, id_list)
values
(1, array[1,2,3]),
(2, array[2,3,4]),
(3, array[4,5,6]);

The following statement tries to find the overlapping values in id_list between the current row and the next row:

select id,
id_list,
lead(id_list) over (order by id) as next_list,
array(select unnest(id_list) intersect select unnest(lead(id_list) over (order by id))) as common_ids
from sample_data;

The above returns:

id | id_list | next_list | common_ids
---+---------+-----------+-----------
1 | {1,2,3} | {2,3,4} | {}
2 | {2,3,4} | {4,5,6} | {}
3 | {4,5,6} | | {}

The empty array for "common_ids" is obviously incorrect.

However, when the evaluation of the "next_list" is put into a derived table, then this works as expected:

select id, id_list, next_list,
array(select unnest(id_list) intersect select unnest(next_list)) as common_ids
from (
select id,
id_list,
lead(id_list) over (order by id) as next_list
from sample_data
) t

returns:

id | id_list | next_list | common_ids
---+---------+-----------+-----------
1 | {1,2,3} | {2,3,4} | {2,3}
2 | {2,3,4} | {4,5,6} | {4}
3 | {4,5,6} | | {}

This is with Postgres 11.4

Is this a bug or simply not supported?

It does work correctly with intarray's "intersect" operator:

select id,
id_list,
id_list & lead(id_list) over (order by id) as next_list
from sample_data;

However, the actual data uses a bigint, so intarray isn't an option.

Thomas

#2David Rowley
dgrowleyml@gmail.com
In reply to: Thomas Kellerer (#1)
Re: lead() with arrays - strange behaviour

On Thu, 8 Aug 2019 at 21:06, Thomas Kellerer <spam_eater@gmx.net> wrote:

The following statement tries to find the overlapping values in id_list between the current row and the next row:

select id,
id_list,
lead(id_list) over (order by id) as next_list,
array(select unnest(id_list) intersect select unnest(lead(id_list) over (order by id))) as common_ids
from sample_data;

The above returns:

id | id_list | next_list | common_ids
---+---------+-----------+-----------
1 | {1,2,3} | {2,3,4} | {}
2 | {2,3,4} | {4,5,6} | {}
3 | {4,5,6} | | {}

The empty array for "common_ids" is obviously incorrect.

I think you're confused with what the SELECT with the empty FROM
clause does here. In your subquery "id_list" is just a parameter from
the outer query. LEAD(id_list) OVER (ORDER BY id) is never going to
return anything since those are both just effectively scalar values,
to which there is no "next" value.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#3Thomas Kellerer
spam_eater@gmx.net
In reply to: David Rowley (#2)
Re: lead() with arrays - strange behaviour

David Rowley schrieb am 08.08.2019 um 13:03:

The following statement tries to find the overlapping values in id_list between the current row and the next row:

select id,
id_list,
lead(id_list) over (order by id) as next_list,
array(select unnest(id_list) intersect select unnest(lead(id_list) over (order by id))) as common_ids
from sample_data;

The above returns:

id | id_list | next_list | common_ids
---+---------+-----------+-----------
1 | {1,2,3} | {2,3,4} | {}
2 | {2,3,4} | {4,5,6} | {}
3 | {4,5,6} | | {}

The empty array for "common_ids" is obviously incorrect.

I think you're confused with what the SELECT with the empty FROM
clause does here. In your subquery "id_list" is just a parameter from
the outer query. LEAD(id_list) OVER (ORDER BY id) is never going to
return anything since those are both just effectively scalar values,
to which there is no "next" value.

id_list is a column in the table and as you can see in the output
lead(id_list) most definitely returns the array from the next row.

and "select unnest(some_array)" works just fine as you can see
when "next_list" is taken from the derived table.

Thomas

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Thomas Kellerer (#3)
Re: lead() with arrays - strange behaviour

Thomas Kellerer <spam_eater@gmx.net> writes:

David Rowley schrieb am 08.08.2019 um 13:03:

I think you're confused with what the SELECT with the empty FROM
clause does here. In your subquery "id_list" is just a parameter from
the outer query. LEAD(id_list) OVER (ORDER BY id) is never going to
return anything since those are both just effectively scalar values,
to which there is no "next" value.

id_list is a column in the table and as you can see in the output
lead(id_list) most definitely returns the array from the next row.
and "select unnest(some_array)" works just fine as you can see
when "next_list" is taken from the derived table.

David's point is that the two occurrences of lead() don't mean the
same thing. A window function is directly tied to the SELECT that
it is in the select-list of, and its notion of next and previous
rows is concerned with the set of rows that that SELECT's FROM-clause
generates. In this example, the inner SELECT has an empty FROM that
returns one row, so the lead() in that SELECT doesn't do anything
useful.

You could probably get where you want to go with something along
the lines of

select id,
id_list,
next_list,
array(select unnest(id_list) intersect select unnest(next_list)) as common_ids
from (
select id,
id_list,
lead(id_list) over (order by id) as next_list
from sample_data
) ss;

regards, tom lane

#5Thomas Kellerer
spam_eater@gmx.net
In reply to: Tom Lane (#4)
Re: lead() with arrays - strange behaviour

Tom Lane schrieb am 08.08.2019 um 16:10:

David's point is that the two occurrences of lead() don't mean the
same thing. A window function is directly tied to the SELECT that
it is in the select-list of, and its notion of next and previous
rows is concerned with the set of rows that that SELECT's FROM-clause
generates. In this example, the inner SELECT has an empty FROM that
returns one row, so the lead() in that SELECT doesn't do anything
useful.

Ah! Now I get it ;)

Thanks for clearing that up.

You could probably get where you want to go with something along
the lines of

Yes, that's what I did in the end (see my initial post)