Scalar subquery
Hi,
Apparently scalar subquery when used as a part of SELECT statement and when
it does not depend on outer query columns
is executed only once per statement, e.g.:
postgres=# select i, (select random()) rand from generate_series(1, 3) i;
i | rand
---+-------------------
1 | 0.992319826036692
2 | 0.992319826036692
3 | 0.992319826036692
(Though term "depend" is subtle, compare these:
postgres=# select i, (select random() + case when false then i else 0 end )
rand from generate_series(1, 3) i;
i | rand
---+-------------------
1 | 0.806265413761139
2 | 0.806265413761139
3 | 0.806265413761139
(3 rows)
postgres=# select i, (select random() where i=i ) rand from
generate_series(1, 3) i;
i | rand
---+-------------------
1 | 0.426443862728775
2 | 0.133071997668594
3 | 0.751982506364584
(3 rows)
postgres=# select i, (select random() where i=i or i is null ) rand from
generate_series(1, 3) i;
i | rand
---+-------------------
1 | 0.320982406847179
2 | 0.996762252878398
3 | 0.076554249972105
(3 rows)
Looks like dependence is not there anymore if PG is smart enough to simplify
boolean expressions)
Anyway, as some older PG versions and Oracle behave similarly I suppose this
result is expected and desired (correct?),
but unfortunately not well-documented (did I miss it mentioned?).
Can anyone shed some light on this and/or probably update docs?
P.S.
I got bitten by a statement like this:
select (select nextval('someseq') * a + b from somefunc()), col1, ....
with a and b being OUT parameters of somefunc().
I just got my hands on mysql (5.0.something) and it does not cache the
scalar subquery result.
So... now I'm completely puzzled whether this is a bug, a desired result or
just a loosely standardized thing.
Help anyone?
On Fri, Aug 27, 2010 at 5:41 PM, Vyacheslav Kalinin <vka@mgcp.com> wrote:
Show quoted text
Hi,
Apparently scalar subquery when used as a part of SELECT statement and when
it does not depend on outer query columns
is executed only once per statement, e.g.:postgres=# select i, (select random()) rand from generate_series(1, 3) i;
i | rand
---+-------------------
1 | 0.992319826036692
2 | 0.992319826036692
3 | 0.992319826036692(Though term "depend" is subtle, compare these:
postgres=# select i, (select random() + case when false then i else 0 end )
rand from generate_series(1, 3) i;
i | rand
---+-------------------
1 | 0.806265413761139
2 | 0.806265413761139
3 | 0.806265413761139
(3 rows)postgres=# select i, (select random() where i=i ) rand from
generate_series(1, 3) i;
i | rand
---+-------------------
1 | 0.426443862728775
2 | 0.133071997668594
3 | 0.751982506364584
(3 rows)postgres=# select i, (select random() where i=i or i is null ) rand from
generate_series(1, 3) i;
i | rand
---+-------------------
1 | 0.320982406847179
2 | 0.996762252878398
3 | 0.076554249972105
(3 rows)Looks like dependence is not there anymore if PG is smart enough to
simplify boolean expressions)Anyway, as some older PG versions and Oracle behave similarly I suppose
this result is expected and desired (correct?),
but unfortunately not well-documented (did I miss it mentioned?).
Can anyone shed some light on this and/or probably update docs?P.S.
I got bitten by a statement like this:
select (select nextval('someseq') * a + b from somefunc()), col1, ....
with a and b being OUT parameters of somefunc().
Vyacheslav Kalinin <vka@mgcp.com> writes:
I just got my hands on mysql (5.0.something) and it does not cache the
scalar subquery result.
So... now I'm completely puzzled whether this is a bug, a desired result or
just a loosely standardized thing.
It's loosely standardized.
AFAICS, the spec doesn't address the detailed semantics of subqueries at
all, except in wording to this effect:
Each <subquery> in the <search condition> is effectively
executed for each row of T and the results used in the ap-
plication of the <search condition> to the given row of T.
If any executed <subquery> contains an outer reference to a
column of T, the reference is to the value of that column in
the given row of T.
There is wording like this for subqueries in WHERE and HAVING, but I
haven't found anything at all that mentions the behavior for subqueries
in the SELECT targetlist. In any case, the fact that they said
"effectively executed" and not simply "executed" seems to be meant to
leave implementors a lot of wiggle room.
In particular, there isn't any wording that I can find suggesting
that the presence of volatile (or in the spec's classification,
nondeterministic) functions ought to affect the behavior.
PG's interpretation is that if there is no outer reference in a
subquery, it's okay to implement it as an initplan, meaning it gets
evaluated at most once per call of the containing query. We don't
pay attention to whether there are volatile functions in there.
regards, tom lane
Thanks, Tom
Can this be clarified in docs? It is stated there now that scalar subquery
is one of the kinds of expressions
and it is somewhat counter-intuitive that an expression may sometimes not
respect its own degree of volatility.
On Wed, Sep 1, 2010 at 2:16 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Show quoted text
Vyacheslav Kalinin <vka@mgcp.com> writes:
I just got my hands on mysql (5.0.something) and it does not cache the
scalar subquery result.
So... now I'm completely puzzled whether this is a bug, a desired resultor
just a loosely standardized thing.
It's loosely standardized.
AFAICS, the spec doesn't address the detailed semantics of subqueries at
all, except in wording to this effect:Each <subquery> in the <search condition> is effectively
executed for each row of T and the results used in the ap-
plication of the <search condition> to the given row of T.
If any executed <subquery> contains an outer reference to a
column of T, the reference is to the value of that column in
the given row of T.There is wording like this for subqueries in WHERE and HAVING, but I
haven't found anything at all that mentions the behavior for subqueries
in the SELECT targetlist. In any case, the fact that they said
"effectively executed" and not simply "executed" seems to be meant to
leave implementors a lot of wiggle room.In particular, there isn't any wording that I can find suggesting
that the presence of volatile (or in the spec's classification,
nondeterministic) functions ought to affect the behavior.PG's interpretation is that if there is no outer reference in a
subquery, it's okay to implement it as an initplan, meaning it gets
evaluated at most once per call of the containing query. We don't
pay attention to whether there are volatile functions in there.regards, tom lane