Understanding behavior of SELECT with multiple unnested columns

Started by Ken Tanzerabout 13 years ago11 messagesgeneral

ken.tanzer@gmail.com

about 13 years ago

I've been working on some queries involving multiple unnested columns. At
first, I expected the number of rows returned would be the product of the
array lengths, so that this query would return 4 rows:

SELECT unnest2(array['a','b']),unnest2(array['1','2']);

when in fact it returns 2:

unnest2 | unnest2
---------+---------
a | 1
b | 2

Which is all well and good. (Better, in fact, for my purposes.) But then
this query returns 6 rows:

SELECT unnest2(array['a','b','c']),unnest2(array['1','2']);
unnest2 | unnest2
---------+---------
a | 1
b | 2
c | 1
a | 2
b | 1
c | 2

Throw an unnested null column in and you get zero rows, which I also didn't
expect:

SELECT
unnest2(array['a','b','c']),unnest2(array['1','2']),unnest(NULL::varchar[]);
unnest2 | unnest2 | unnest
---------+---------+--------
(0 rows)

After some head scratching, I think I understand what to expect from these
unnests, but I'm unclear of the logic behind what is going on. I'm hoping
someone can explain it a bit. Also, on a practical level, would anyone
know how to query so that SELECT
unnest2(array['a','b','c']),unnest2(array['1','2']) would return three rows
instead of six, like so:

a 1
b 2
c (NULL)

As that would be perfect for my purposes. Thanks in advance!

Ken

--
AGENCY Software
A data system that puts you in control
100% Free Software
*http://agency-software.org/*
ken.tanzer@agency-software.org
(253) 245-3801

Subscribe to the mailing
list<agency-general-request@lists.sourceforge.net?body=subscribe>
to
learn more about AGENCY or
follow the discussion.

Ian Lawrence Barwick

barwick@gmail.com

about 13 years ago

In reply to: Ken Tanzer (#1)

Re: Understanding behavior of SELECT with multiple unnested columns

2013/3/27 Ken Tanzer <ken.tanzer@gmail.com>

I've been working on some queries involving multiple unnested columns. At first, I expected the number of rows returned would be the product of the array lengths, so that this query would return 4 rows:

SELECT unnest2(array['a','b']),unnest2(array['1','2']);

when in fact it returns 2:

unnest2 | unnest2
---------+---------
a | 1
b | 2

Which is all well and good. (Better, in fact, for my purposes.) But then this query returns 6 rows:

SELECT unnest2(array['a','b','c']),unnest2(array['1','2']);
unnest2 | unnest2
---------+---------
a | 1
b | 2
c | 1
a | 2
b | 1
c | 2

Throw an unnested null column in and you get zero rows, which I also didn't expect:

SELECT unnest2(array['a','b','c']),unnest2(array['1','2']),unnest(NULL::varchar[]);
unnest2 | unnest2 | unnest
---------+---------+--------
(0 rows)

After some head scratching, I think I understand what to expect from these unnests, but I'm unclear of the logic behind what is going on. I'm hoping someone can explain it a bit.

Basically you are getting Cartesian joins on the row output of
unnest() (and presumably
unnest2() - I guess this is a function you defined yourself?)

Effectively you are doing this:

CREATE TABLE t1 (val INT);
INSERT INTO t1 VALUES (1),(2);

CREATE TABLE t2 (val CHAR(1));
INSERT INTO t2 VALUES ('a'),('b'),('c');

CREATE TABLE t3 (val INT);

testdb=# SELECT * from t1, t2;
val | val
-----+-----
1 | a
1 | b
1 | c
2 | a
2 | b
2 | c
(6 rows)

testdb=# DELETE FROM t2 where val='c';
DELETE 1
testdb=# SELECT * from t1, t2;
val | val
-----+-----
1 | a
1 | b
2 | a
2 | b
(4 rows)

testdb=# SELECT * from t1, t2, t3;
val | val | val
-----+-----+-----
(0 rows)

HTH

Ian Barwick

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Ken Tanzer

ken.tanzer@gmail.com

about 13 years ago

In reply to: Ian Lawrence Barwick (#2)

Re: Understanding behavior of SELECT with multiple unnested columns

Basically you are getting Cartesian joins on the row output of
unnest()

Well that's what I expected too. Except look at this example, after you
delete c:

testdb=# DELETE FROM t2 where val='c';
DELETE 1
testdb=# SELECT * from t1, t2;
val | val
-----+-----
1 | a
1 | b
2 | a
2 | b
(4 rows)

And compare to:

SELECT unnest(array[1,2]),unnest(array['a','b']);
unnest | unnest
--------+--------
1 | a
2 | b
(2 rows)

You can see they are not the same! Or this, which does not return the 12
rows we might both expect:

SELECT unnest(array[1,2]),unnest(array['a','b']),unnest(array[4,5,6]);
unnest | unnest | unnest
--------+--------+--------
1 | a | 4
2 | b | 5
1 | a | 6
2 | b | 4
1 | a | 5
2 | b | 6
(6 rows)

Add another element onto the third array, so they "match up" better, and
you get only 4 rows:

SELECT unnest(array[1,2]),unnest(array['a','b']),unnest(array[4,5,6,7]);
unnest | unnest | unnest
--------+--------+--------
1 | a | 4
2 | b | 5
1 | a | 6
2 | b | 7
(4 rows)

(and presumably

unnest2() - I guess this is a function you defined yourself?)

Sorry for causing confusion--I meant to remove the unnest2. There was
source code for the unnest function for earlier versions, which I defined
as unnest2 to try to understand what was going on. It should yield the
same behavior as unnest itself.

Cheers,
Ken

On Tue, Mar 26, 2013 at 11:55 PM, Ian Lawrence Barwick <barwick@gmail.com>wrote:

2013/3/27 Ken Tanzer <ken.tanzer@gmail.com>

I've been working on some queries involving multiple unnested columns.

At first, I expected the number of rows returned would be the product of
the array lengths, so that this query would return 4 rows:

SELECT unnest2(array['a','b']),unnest2(array['1','2']);

when in fact it returns 2:

unnest2 | unnest2
---------+---------
a | 1
b | 2

Which is all well and good. (Better, in fact, for my purposes.) But

then this query returns 6 rows:

SELECT unnest2(array['a','b','c']),unnest2(array['1','2']);
unnest2 | unnest2
---------+---------
a | 1
b | 2
c | 1
a | 2
b | 1
c | 2

Throw an unnested null column in and you get zero rows, which I also

didn't expect:

SELECT

unnest2(array['a','b','c']),unnest2(array['1','2']),unnest(NULL::varchar[]);

unnest2 | unnest2 | unnest
---------+---------+--------
(0 rows)

After some head scratching, I think I understand what to expect from

these unnests, but I'm unclear of the logic behind what is going on. I'm
hoping someone can explain it a bit.

Basically you are getting Cartesian joins on the row output of
unnest() (and presumably
unnest2() - I guess this is a function you defined yourself?)

Effectively you are doing this:

CREATE TABLE t1 (val INT);
INSERT INTO t1 VALUES (1),(2);

CREATE TABLE t2 (val CHAR(1));
INSERT INTO t2 VALUES ('a'),('b'),('c');

CREATE TABLE t3 (val INT);

testdb=# SELECT * from t1, t2;
val | val
-----+-----
1 | a
1 | b
1 | c
2 | a
2 | b
2 | c
(6 rows)

testdb=# DELETE FROM t2 where val='c';
DELETE 1
testdb=# SELECT * from t1, t2;
val | val
-----+-----
1 | a
1 | b
2 | a
2 | b
(4 rows)

testdb=# SELECT * from t1, t2, t3;
val | val | val
-----+-----+-----
(0 rows)

HTH

Ian Barwick

--
AGENCY Software
A data system that puts you in control
100% Free Software
*http://agency-software.org/*
ken.tanzer@agency-software.org
(253) 245-3801

Subscribe to the mailing
list<agency-general-request@lists.sourceforge.net?body=subscribe>
to
learn more about AGENCY or
follow the discussion.

Ian Lawrence Barwick

barwick@gmail.com

about 13 years ago

In reply to: Ken Tanzer (#3)

Re: Understanding behavior of SELECT with multiple unnested columns

2013/3/27 Ken Tanzer <ken.tanzer@gmail.com>

Basically you are getting Cartesian joins on the row output of

unnest()

Well that's what I expected too. Except look at this example, after you
delete c:

testdb=# DELETE FROM t2 where val='c';
DELETE 1
testdb=# SELECT * from t1, t2;
val | val
-----+-----
1 | a
1 | b
2 | a
2 | b
(4 rows)

And compare to:

SELECT unnest(array[1,2]),unnest(array['a','b']);
unnest | unnest
--------+--------
1 | a
2 | b
(2 rows)

You can see they are not the same!

Ah yes, what I suggested is actually the equivalent to
SELECT * FROM unnest(array[1,2]) u1,unnest(array['a','b']) u2;

I seem to recall seeing the explanation for this before, although I'll be
darned if I can remember what it is.

FWIW this happens with other functions returning SETOF:

testdb=# SELECT
testdb-# generate_series(1,2) x,
testdb-# generate_series(1,2) y;
x | y
---+---
1 | 1
2 | 2
(2 rows)

testdb=# SELECT
testdb-# generate_series(1,2) x,
testdb-# generate_series(1,3) y;
x | y
---+---
1 | 1
2 | 2
1 | 3
2 | 1
1 | 2
2 | 3
(6 rows)

Regards

Ian Barwick

Gavin Flower

GavinFlower@archidevsys.co.nz

about 13 years ago

In reply to: Ian Lawrence Barwick (#4)

Re: Understanding behavior of SELECT with multiple unnested columns

On 27/03/13 20:36, Ian Lawrence Barwick wrote:

2013/3/27 Ken Tanzer <ken.tanzer@gmail.com <mailto:ken.tanzer@gmail.com>>

Basically you are getting Cartesian joins on the row output of
unnest()

Well that's what I expected too. Except look at this example,
after you delete c:

testdb=# DELETE FROM t2 where val='c';
DELETE 1
testdb=# SELECT * from t1, t2;
val | val
-----+-----
1 | a
1 | b
2 | a
2 | b
(4 rows)
And compare to:

SELECT unnest(array[1,2]),unnest(array['a','b']);
unnest | unnest
--------+--------
1 | a
2 | b
(2 rows)

You can see they are not the same!

Ah yes, what I suggested is actually the equivalent to
SELECT * FROM unnest(array[1,2]) u1,unnest(array['a','b']) u2;

I seem to recall seeing the explanation for this before, although I'll be
darned if I can remember what it is.

FWIW this happens with other functions returning SETOF:

testdb=# SELECT
testdb-# generate_series(1,2) x,
testdb-# generate_series(1,2) y;
x | y
---+---
1 | 1
2 | 2
(2 rows)

testdb=# SELECT
testdb-# generate_series(1,2) x,
testdb-# generate_series(1,3) y;
x | y
---+---
1 | 1
2 | 2
1 | 3
2 | 1
1 | 2
2 | 3
(6 rows)

Regards

Ian Barwick

The rule appears to be,
where N_x & N_y are the number of entries returned for x & y:
N_result = is the smallest positive integer that has N_x & N_y as factors.

Cheers,
Gavin

Misa Simic

misa.simic@gmail.com

about 13 years ago

In reply to: Ken Tanzer (#1)

Re: Understanding behavior of SELECT with multiple unnested columns

Hi,

You can try:

SELECT c1, c2 FROM
(
WITH a AS
(
SELECT row_number() OVER(),* FROM unnest(array['a','b', 'c', 'd']) c1
),
b AS
(
SELECT row_number() OVER(),* FROM unnest(array['1','2', '3']) c2
)
SELECT * FROM a LEFT JOIN b USING (row_number)
UNION
SELECT * FROM a RIGHT JOIN b USING (row_number)
ORDER BY row_number
) t

To simplify this you can wrap it in function what accepts two array
parameters...

Kind Regards,

Misa

2013/3/27 Ken Tanzer <ken.tanzer@gmail.com>

Show quoted text

I've been working on some queries involving multiple unnested columns. At
first, I expected the number of rows returned would be the product of the
array lengths, so that this query would return 4 rows:

SELECT unnest2(array['a','b']),unnest2(array['1','2']);

when in fact it returns 2:

unnest2 | unnest2
---------+---------
a | 1
b | 2

Which is all well and good. (Better, in fact, for my purposes.) But
then this query returns 6 rows:

SELECT unnest2(array['a','b','c']),unnest2(array['1','2']);
unnest2 | unnest2
---------+---------
a | 1
b | 2
c | 1
a | 2
b | 1
c | 2

Throw an unnested null column in and you get zero rows, which I also
didn't expect:

SELECT
unnest2(array['a','b','c']),unnest2(array['1','2']),unnest(NULL::varchar[]);
unnest2 | unnest2 | unnest
---------+---------+--------
(0 rows)

After some head scratching, I think I understand what to expect from these
unnests, but I'm unclear of the logic behind what is going on. I'm hoping
someone can explain it a bit. Also, on a practical level, would anyone
know how to query so that SELECT
unnest2(array['a','b','c']),unnest2(array['1','2']) would return three rows
instead of six, like so:

a 1
b 2
c (NULL)

As that would be perfect for my purposes. Thanks in advance!

Ken

--
AGENCY Software
A data system that puts you in control
100% Free Software
*http://agency-software.org/*
ken.tanzer@agency-software.org
(253) 245-3801

Subscribe to the mailing list<agency-general-request@lists.sourceforge.net?body=subscribe>
to
learn more about AGENCY or
follow the discussion.

Tom Lane

tgl@sss.pgh.pa.us

about 13 years ago

In reply to: Gavin Flower (#5)

Re: Understanding behavior of SELECT with multiple unnested columns

Gavin Flower <GavinFlower@archidevsys.co.nz> writes:

The rule appears to be,
where N_x & N_y are the number of entries returned for x & y:
N_result = is the smallest positive integer that has N_x & N_y as factors.

Right: if there are multiple set-returning functions in a SELECT list,
the number of rows you get is the least common multiple of their
periods. (See the logic in ExecTargetList that cycles the SRFs until
they all report "done" at the same time.) I guess there's some value
in this for the case where they all have the same period, but otherwise
it's kind of bizarre. It's been like that since Berkeley days though,
so I doubt we'll consider changing it now. Rather, it'll just be
quietly deprecated in favor of putting SRFs into FROM (with LATERAL
where needed).

regards, tom lane

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Ian Lawrence Barwick

barwick@gmail.com

about 13 years ago

In reply to: Tom Lane (#7)

Re: Understanding behavior of SELECT with multiple unnested columns

2013/3/27 Tom Lane <tgl@sss.pgh.pa.us>:

Gavin Flower <GavinFlower@archidevsys.co.nz> writes:

The rule appears to be,
where N_x & N_y are the number of entries returned for x & y:
N_result = is the smallest positive integer that has N_x & N_y as factors.

Right: if there are multiple set-returning functions in a SELECT list,
the number of rows you get is the least common multiple of their
periods. (See the logic in ExecTargetList that cycles the SRFs until
they all report "done" at the same time.) I guess there's some value
in this for the case where they all have the same period, but otherwise
it's kind of bizarre. It's been like that since Berkeley days though,
so I doubt we'll consider changing it now. Rather, it'll just be
quietly deprecated in favor of putting SRFs into FROM (with LATERAL
where needed).

Thanks for the clarification, I was half-worried there was some fundamental
set theory or something which had passed me by.

Regards

Ian Barwick

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Gavin Flower

GavinFlower@archidevsys.co.nz

about 13 years ago

In reply to: Tom Lane (#7)

Re: Understanding behavior of SELECT with multiple unnested columns

On 28/03/13 03:03, Tom Lane wrote:

Gavin Flower <GavinFlower@archidevsys.co.nz> writes:

The rule appears to be,
where N_x & N_y are the number of entries returned for x & y:
N_result = is the smallest positive integer that has N_x & N_y as factors.

Right: if there are multiple set-returning functions in a SELECT list,
the number of rows you get is the least common multiple of their
periods. (See the logic in ExecTargetList that cycles the SRFs until
they all report "done" at the same time.) I guess there's some value
in this for the case where they all have the same period, but otherwise
it's kind of bizarre. It's been like that since Berkeley days though,
so I doubt we'll consider changing it now. Rather, it'll just be
quietly deprecated in favor of putting SRFs into FROM (with LATERAL
where needed).

regards, tom lane

It surprised me when I first came across the behaviour, but I can't say
it is wrong!

I have used the current behaviour to create test data.

It might be useful to have a means of supplying the starting indices
list and the number of rows to generate (which would not only allow the
number to be less than the natural period, but also greater). I think
this would be very useful to create test data. Best if the facility
could take any source of values, including SELECT statements. Probably
not 'Politically Correct', but making it an SQL construct might be less
confusing than a function. Also, either, but not both of the indices
list, or the limit, could be omitted to get their default values.

Cheers,
Gavin

#10

Jasen Betts

jasen@xnet.co.nz

about 13 years ago

In reply to: Ken Tanzer (#1)

Re: Understanding behavior of SELECT with multiple unnested columns

On 2013-03-27, Ken Tanzer <ken.tanzer@gmail.com> wrote:

--047d7b5da657ecd54004d8e23a90
Content-Type: text/plain; charset=ISO-8859-1

I've been working on some queries involving multiple unnested columns. At
first, I expected the number of rows returned would be the product of the
array lengths, so that this query would return 4 rows:

SELECT unnest2(array['a','b']),unnest2(array['1','2']);

when in fact it returns 2:

unnest2 | unnest2
---------+---------
a | 1
b | 2

Which is all well and good. (Better, in fact, for my purposes.) But then
this query returns 6 rows:

SELECT unnest2(array['a','b','c']),unnest2(array['1','2']);
unnest2 | unnest2
---------+---------
a | 1
b | 2
c | 1
a | 2
b | 1
c | 2

check out the documentation for generate_series, it behaves similarly.

I think basically the problem is caused by using SRFs between SELECT
and FROM

to see that it's not the cartesion product try it with sets of length
4 and 6

SELECT unnest(array['a','b','c','d']),unnest(array['1','2','3','4','5','6']);

what you want is possible, but it's not pretty - you have to number
the results and join explicitly.

with
a as ( select u,row_number() over ()
from (select unnest(array['a','b']) as u ) as x),
b as ( select u,row_number() over ()
from (select unnest(array['1','2','3']) as u ) as x)
select a.u as a, b.u as b
from a full outer join b on a.row_number=b.row_number;

⚂⚃ 100% natural

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#11

Merlin Moncure

mmoncure@gmail.com

about 13 years ago

In reply to: Tom Lane (#7)

Re: Understanding behavior of SELECT with multiple unnested columns

On Wed, Mar 27, 2013 at 9:03 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Gavin Flower <GavinFlower@archidevsys.co.nz> writes:

The rule appears to be,
where N_x & N_y are the number of entries returned for x & y:
N_result = is the smallest positive integer that has N_x & N_y as factors.

Right: if there are multiple set-returning functions in a SELECT list,
the number of rows you get is the least common multiple of their
periods. (See the logic in ExecTargetList that cycles the SRFs until
they all report "done" at the same time.) I guess there's some value
in this for the case where they all have the same period, but otherwise
it's kind of bizarre. It's been like that since Berkeley days though,
so I doubt we'll consider changing it now. Rather, it'll just be
quietly deprecated in favor of putting SRFs into FROM (with LATERAL
where needed).

It's a neat way to make a query that doesn't terminate (which AFAIK is
impossible in vanilla SQL):

create sequence s;
select generate_series(1,nextval('s')), generate_series(1,nextval('s'));

merlin

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general