ARRAY(subquery) volatility

Started by Michael Fuhrover 20 years ago3 messages

mike@fuhr.org

over 20 years ago

Why does the first query below return the same value for each row
while the second query returns random values? Planner optimization?

test=> SELECT ARRAY(SELECT random()) FROM generate_series(1, 5);
?column?
---------------------
{0.269273371561092}
{0.269273371561092}
{0.269273371561092}
{0.269273371561092}
{0.269273371561092}
(5 rows)

test=> SELECT ARRAY(SELECT random() + x * 0) FROM generate_series(1, 5) AS g(x);
?column?
---------------------
{0.826863945846848}
{0.42534113182935}
{0.36419924318986}
{0.258920902972538}
{0.843205466327819}
(5 rows)

--
Michael Fuhr

Joe Conway

mail@joeconway.com

over 20 years ago

In reply to: Michael Fuhr (#1)

Re: ARRAY(subquery) volatility

Michael Fuhr wrote:

Why does the first query below return the same value for each row
while the second query returns random values? Planner optimization?

I assume it is due to some kind of flattening in the planner, but it is
totally unrelated to ARRAY(subquery):

regression=# SELECT (SELECT random()) as f FROM generate_series(1, 5);
f
-------------------
0.752416231088534
0.752416231088534
0.752416231088534
0.752416231088534
0.752416231088534
(5 rows)

regression=# SELECT (SELECT random() + 0 * f) as f FROM
generate_series(1, 5) as t(f);
f
-------------------
0.176055165555354
0.608546747178094
0.55303416240636
0.127355110425202
0.21671894063089
(5 rows)

Here's another example:

regression=# create table t1(f text);
CREATE TABLE
regression=# insert into t1 values('0');
INSERT 17366 1
regression=# insert into t1 values('0');
INSERT 17367 1
regression=# insert into t1 values('0');
INSERT 17368 1
regression=# SELECT (SELECT timeofday()::timestamp + f::interval) as f
FROM t1;
f
----------------------------
2005-08-16 16:38:12.738215
2005-08-16 16:38:12.738292
2005-08-16 16:38:12.738315
(3 rows)

regression=# SELECT (SELECT timeofday()::timestamp) as f FROM t1;
f
----------------------------
2005-08-16 16:35:33.100791
2005-08-16 16:35:33.100791
2005-08-16 16:35:33.100791
(3 rows)

This shows that it isn't related to use of an SRF either.

Joe

Tom Lane

tgl@sss.pgh.pa.us

over 20 years ago

In reply to: Michael Fuhr (#1)

Re: ARRAY(subquery) volatility

Michael Fuhr <mike@fuhr.org> writes:

Why does the first query below return the same value for each row
while the second query returns random values? Planner optimization?

test=> SELECT ARRAY(SELECT random()) FROM generate_series(1, 5);
test=> SELECT ARRAY(SELECT random() + x * 0) FROM generate_series(1, 5) AS g(x);

The sub-SELECT in the first one is considered an uncorrelated subquery,
so you get a plan that evaluates the subquery just once:

Function Scan on generate_series (cost=0.01..12.51 rows=1000 width=0)
InitPlan
-> Result (cost=0.00..0.01 rows=1 width=0)

In the second case, x is an outer parameter to the subquery, so it has
to be re-evaluated for each row of the outer query:

Function Scan on generate_series g (cost=0.00..32.50 rows=1000 width=4)
SubPlan
-> Result (cost=0.00..0.02 rows=1 width=0)

Note the "InitPlan" vs "SubPlan" labels --- they look similar, but the
evaluation rules are totally different.

The fact that there's a volatile function in the subquery isn't
considered while making this decision. I'm not sure if it should be.

regards, tom lane