ARRAY(subquery) volatility
Why does the first query below return the same value for each row
while the second query returns random values? Planner optimization?
test=> SELECT ARRAY(SELECT random()) FROM generate_series(1, 5);
?column?
---------------------
{0.269273371561092}
{0.269273371561092}
{0.269273371561092}
{0.269273371561092}
{0.269273371561092}
(5 rows)
test=> SELECT ARRAY(SELECT random() + x * 0) FROM generate_series(1, 5) AS g(x);
?column?
---------------------
{0.826863945846848}
{0.42534113182935}
{0.36419924318986}
{0.258920902972538}
{0.843205466327819}
(5 rows)
--
Michael Fuhr
Michael Fuhr wrote:
Why does the first query below return the same value for each row
while the second query returns random values? Planner optimization?
I assume it is due to some kind of flattening in the planner, but it is
totally unrelated to ARRAY(subquery):
regression=# SELECT (SELECT random()) as f FROM generate_series(1, 5);
f
-------------------
0.752416231088534
0.752416231088534
0.752416231088534
0.752416231088534
0.752416231088534
(5 rows)
regression=# SELECT (SELECT random() + 0 * f) as f FROM
generate_series(1, 5) as t(f);
f
-------------------
0.176055165555354
0.608546747178094
0.55303416240636
0.127355110425202
0.21671894063089
(5 rows)
Here's another example:
regression=# create table t1(f text);
CREATE TABLE
regression=# insert into t1 values('0');
INSERT 17366 1
regression=# insert into t1 values('0');
INSERT 17367 1
regression=# insert into t1 values('0');
INSERT 17368 1
regression=# SELECT (SELECT timeofday()::timestamp + f::interval) as f
FROM t1;
f
----------------------------
2005-08-16 16:38:12.738215
2005-08-16 16:38:12.738292
2005-08-16 16:38:12.738315
(3 rows)
regression=# SELECT (SELECT timeofday()::timestamp) as f FROM t1;
f
----------------------------
2005-08-16 16:35:33.100791
2005-08-16 16:35:33.100791
2005-08-16 16:35:33.100791
(3 rows)
This shows that it isn't related to use of an SRF either.
Joe
Michael Fuhr <mike@fuhr.org> writes:
Why does the first query below return the same value for each row
while the second query returns random values? Planner optimization?
test=> SELECT ARRAY(SELECT random()) FROM generate_series(1, 5);
test=> SELECT ARRAY(SELECT random() + x * 0) FROM generate_series(1, 5) AS g(x);
The sub-SELECT in the first one is considered an uncorrelated subquery,
so you get a plan that evaluates the subquery just once:
Function Scan on generate_series (cost=0.01..12.51 rows=1000 width=0)
InitPlan
-> Result (cost=0.00..0.01 rows=1 width=0)
In the second case, x is an outer parameter to the subquery, so it has
to be re-evaluated for each row of the outer query:
Function Scan on generate_series g (cost=0.00..32.50 rows=1000 width=4)
SubPlan
-> Result (cost=0.00..0.02 rows=1 width=0)
Note the "InitPlan" vs "SubPlan" labels --- they look similar, but the
evaluation rules are totally different.
The fact that there's a volatile function in the subquery isn't
considered while making this decision. I'm not sure if it should be.
regards, tom lane