Bug? Function with side effects not evaluated in CTE

Started by Moshe Jacobsonover 12 years ago29 messagesgeneral
Jump to latest
#1Moshe Jacobson
moshe@neadwerx.com

In a new database test, I create the following table and function:

create table tb_item ( item integer );

create or replace function fn_new_item( in_item integer )
returns integer language plpgsql as
$_$
begin
insert into tb_item values( in_item );
return in_item;
end
$_$;

Then, I execute the following query and see DELETE 0 as you would expect:

test=# with tt_created as
(
select fn_new_item(1) as item
)
delete from tb_item
where item not in (select item from tt_created);
DELETE 0

However, if everything is correct, we should see one row in the table,
where item = 1. But we do not:

test=# table tb_item;
item
------
(0 rows)

I tried to outsmart Postgres with the following query, but it failed in the
same way:

with tt_created as
(
select fn_new_item(1) as item
),
tt_to_delete as
(
select i.item
from tb_item i
left join tt_created c
on i.item = c.item
where c.item is null
)
delete from tb_item i
using tt_to_delete td
where td.item = i.item;

However, It behaves as one would expect if the first CTE is built with INSERT
... RETURNING.
It also behaves as one would expect if the table starts out non-empty.

This seems like a bug. Would someone please look into this?

Thanks.

Moshe Jacobson
Manager of Systems Engineering, Nead Werx Inc. <http://www.neadwerx.com&gt;
2323 Cumberland Parkway · Suite 201 · Atlanta, GA 30339

“Quality is not an act, it is a habit.” — Aristotle

#2Moshe Jacobson
moshe@neadwerx.com
In reply to: Moshe Jacobson (#1)
Re: Bug? Function with side effects not evaluated in CTE

By the way, I am running Postres 9.1.9.

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Moshe Jacobson (#1)
Re: Bug? Function with side effects not evaluated in CTE

Moshe Jacobson <moshe@neadwerx.com> writes:

However, It behaves as one would expect if the first CTE is built with INSERT
... RETURNING.

CTEs containing INSERT/UPDATE/DELETE are guaranteed to be executed exactly
once. CTEs containing SELECTs are guaranteed to be executed at most once
(the documentation phrases that as "execution can stop early if the outer
query doesn't read all the rows" --- in this case, it read none of them,
since the outer query never had to evaluate the NOT IN).

I see no bug here.

regards, tom lane

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#4Merlin Moncure
mmoncure@gmail.com
In reply to: Tom Lane (#3)
Re: Bug? Function with side effects not evaluated in CTE

On Wed, Oct 16, 2013 at 4:26 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Moshe Jacobson <moshe@neadwerx.com> writes:

However, It behaves as one would expect if the first CTE is built with INSERT
... RETURNING.

CTEs containing INSERT/UPDATE/DELETE are guaranteed to be executed exactly
once. CTEs containing SELECTs are guaranteed to be executed at most once
(the documentation phrases that as "execution can stop early if the outer
query doesn't read all the rows" --- in this case, it read none of them,
since the outer query never had to evaluate the NOT IN).

I see no bug here.

You can force the CTE to be read a couple of different ways; it isn't
very difficult to do: just tweak the final query so that it *must*
touch the SELECT result somehow (e.g. AND (SELECT COUNT(*) FROM
tt_created) > 0).

That being said, I do think it might be better behavior (and still
technically correct per the documentation) if volatile query
expressions were force-evaluated.

merlin

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#5Rowan Collins
rowan.collins@gmail.com
In reply to: Merlin Moncure (#4)
Re: Bug? Function with side effects not evaluated in CTE

On 17/10/2013 00:06, Merlin Moncure wrote:

That being said, I do think it might be better behavior (and still
technically correct per the documentation) if volatile query
expressions were force-evaluated.

This sounds reasonable for a "yes or no" case like this, but wouldn't it
raise the question of /how many times/ the function should be evaluated?

What if the query looked more like this:

with tt_created as
(
select fn_new_item(foo) as item
from some_huge_table
)
select item
from tt_created
limit 10

Should the CTE be calculated in its entirety, running the function for
every row in some_huge_table? Or should it run at most 10 times?

Which is desired would depend on the situation, but there's no real way
to indicate in the syntax.

--
Rowan Collins
[IMSoP]

#6Adam Jelinek
ajelinek@gmail.com
In reply to: Rowan Collins (#5)
Re: Bug? Function with side effects not evaluated in CTE

I thought this was interesting, and wanted to make sure I understood what
is going on, but the more tests I run the more confused I get.

if I take the exact set up outlined by Mosche I get the same results in 9.3
(as expected) , but if I insert one row before I run the sql the CTE is
executed and I get a new row in the table. I was hoping that I would see a
difference in the explain, but the explain with an empty table where the
CTE is *not* executed is identical to the explain where there is one row in
the table already and the CTE *is* executed resulting in a new row. I
thought maybe Postgres was not executing the CTE because it knows that
there are no rows in the table for it to delete, however if I change the
CTE to be an insert returning instead of a function I get different
results. Even when the table is empty I get new row created.

I would really like to know "why' it is working like this so something
similar does not come back and bite me in the future.

Thanks

On Wed, Oct 16, 2013 at 7:14 PM, Rowan Collins <rowan.collins@gmail.com>wrote:

Show quoted text

On 17/10/2013 00:06, Merlin Moncure wrote:

That being said, I do think it might be better behavior (and still
technically correct per the documentation) if volatile query
expressions were force-evaluated.

This sounds reasonable for a "yes or no" case like this, but wouldn't it
raise the question of *how many times* the function should be evaluated?

What if the query looked more like this:

with tt_created as
(
select fn_new_item(foo) as item
from some_huge_table
)
select item
from tt_created
limit 10

Should the CTE be calculated in its entirety, running the function for
every row in some_huge_table? Or should it run at most 10 times?

Which is desired would depend on the situation, but there's no real way to
indicate in the syntax.

--
Rowan Collins
[IMSoP]

#7David G. Johnston
david.g.johnston@gmail.com
In reply to: Adam Jelinek (#6)
Re: Bug? Function with side effects not evaluated in CTE

ajelinek@gmail.com wrote

but if I insert one row before I run the sql the CTE is
executed and I get a new row in the table. I was hoping that I would see
a
difference in the explain, but the explain with an empty table where the
CTE is *not* executed is identical to the explain where there is one row
in
the table already and the CTE *is* executed resulting in a new row.

Would help to include the explain(s). Did you ANALYZE after the insert; if
not the planner probably still thought the table was empty (thus the
matching explain) but upon execution realized it had records and thus needed
to run the CTE.

Since the executor cannot fully trust the statistics, and a full scan of an
empty table would be very fast, scanning the table to delete would be a
necessary first step before running the CTE for the secondary conditions
(where clause). An implicit first-condition/result is that a DELETE on an
empty table is effectively a No-Op. The only reason to override that no-op
would be if a CTE needs to be run by policy as Tom noted.

I thought maybe Postgres was not executing the CTE because it knows that
there are no rows in the table for it to delete, however if I change the
CTE to be an insert returning instead of a function I get different
results. Even when the table is empty I get new row created.

Like Tom said, if you don't hide the INSERT inside a function the CTE will
always be executed.

David J.

--
View this message in context: http://postgresql.1045698.n5.nabble.com/Bug-Function-with-side-effects-not-evaluated-in-CTE-tp5774792p5775095.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#8Merlin Moncure
mmoncure@gmail.com
In reply to: Rowan Collins (#5)
Re: Bug? Function with side effects not evaluated in CTE

On Wed, Oct 16, 2013 at 7:14 PM, Rowan Collins <rowan.collins@gmail.com> wrote:

On 17/10/2013 00:06, Merlin Moncure wrote:

That being said, I do think it might be better behavior (and still
technically correct per the documentation) if volatile query
expressions were force-evaluated.

This sounds reasonable for a "yes or no" case like this, but wouldn't it
raise the question of how many times the function should be evaluated?

What if the query looked more like this:

with tt_created as
(
select fn_new_item(foo) as item
from some_huge_table
)
select item
from tt_created
limit 10

Should the CTE be calculated in its entirety, running the function for every
row in some_huge_table? Or should it run at most 10 times?

Which is desired would depend on the situation, but there's no real way to
indicate in the syntax.

ISTM the answer is clearly "in its entirety". The premise is that the
optimization of non-evaluation of CTE queries is not dependent on
mechanics further down the chain if the CTE has volatile expressions.

If you wanted to structure the query so that the function was run only
10 times, that could be done trivially by moving the limit inside the
CTE.

merlin

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#9David G. Johnston
david.g.johnston@gmail.com
In reply to: Merlin Moncure (#8)
Re: Bug? Function with side effects not evaluated in CTE

Merlin Moncure-2 wrote

If you wanted to structure the query so that the function was run only
10 times, that could be done trivially by moving the limit inside the
CTE.

It is not trivial if you want to wrap the CTE expression into a VIEW and the
caller of the view only wishes to see/evaluate a subset of the data - even
if just to check the structure with some sample contents? A plain LIMIT is
limited though and usually an ORDER BY would be added for some kind of
top/bottom-N listing. In such a case the presence of the ORDER BY would
trigger the full scan and operate before the LIMIT.

And why is volatile so special here? A stable function seems just as good a
candidate for this behavior and even an immutable one. You are taking the
presence of "volatile" to mean that the function may be wrapping an
INSERT/UPDATE/DELETE and so should be executed in case that is true. The
examples are focused on avoiding expensive and needless SELECT evaluations.
Since INSERT/UPDATE/DELETE cannot occur in a non-volatile function, and
expensive table-processing/generating functions can be implemented as
stable, this would seem to work at the cost of further leveraging the
volatility modifier on the function - which defaults to VOLATILE.

Not sure if this is a good or bad decision to make but this thread hasn't
really provided a compelling use-case for making a change: the example
provided is too contrived. And while the current situation is somewhat
difficult to learn attaching yet more logic to the function volatility is
going to be difficult to learn too.

Wrapping INSERT/UPDATE/DELETE inside a non-procedural function is not that
common and in those cases where it is used it is most often a standalone API
and not something that is going to be reasonably used in a CTE. Putting it
into a CTE is an attempt to avoid procedural logic and/or SQL scripting; but
those tools are there to solve just this kind of problem. The separation of
data insertion from data querying is just as desirable as separating data
from presentation.

David J.

--
View this message in context: http://postgresql.1045698.n5.nabble.com/Bug-Function-with-side-effects-not-evaluated-in-CTE-tp5774792p5775098.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#10Merlin Moncure
mmoncure@gmail.com
In reply to: David G. Johnston (#9)
Re: Bug? Function with side effects not evaluated in CTE

On Fri, Oct 18, 2013 at 4:08 PM, David Johnston <polobo@yahoo.com> wrote:

And why is volatile so special here? A stable function seems just as good a
candidate for this behavior and even an immutable one.

Absolutely disagree with this. Stable operations do not have side
effects and volatile operations do (or at least they can). Point
being: after the query runs the database is in the same state
regardless if a stable function is run or not. So, for stable
operations (which covers most queries), the current behavior is good.
But, it's a completely different story in the case of volatile
expression and that's why I believe the optimization to skip is
plainly a POLA violation. OP is not the first to bring it up by the
way.

Wrapping INSERT/UPDATE/DELETE inside a non-procedural function is not that
common and in those cases where it is used it is most often a standalone API

I'm not buying this either. It may not be 'common' but people are
clearly exercising the behavior and 'data modifying with' is IMNSHO a
much better coding paradigm than functions wrapping same.
Regardless, the point at hand is whether specific plan semantics down
the chain can control whether or not volatile expressions should run.
Clearly, at least to me, they should not.

merlin

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#11Adam Jelinek
ajelinek@gmail.com
In reply to: David G. Johnston (#7)
Re: Bug? Function with side effects not evaluated in CTE

Would help to include the explain(s). Did you ANALYZE after the insert; if
not the planner probably still thought the table was empty (thus the
matching explain) but upon execution realized it had records and thus

needed

to run the CTE.

I did not do an ANALYZE after the insert, I think the plan would still be
the same either way. I did what I should have done to start with and ran
explain analyze on the query which showed that it found (or did not find)
matching rows. After reading the remaining emails I think I understand
Thanks for explaining.

Here is my two cents (take it for what it is worth). I agree with Merlin on
this. I work as a developer at a large corporation, and in my experience
very few of the developers can write "good" SQL/data access, and then only
a fraction of them even try to understand the planner. Although the
behavior makes sense (from what was explained above) I does not do what one
would expect (the same thing every time). Then when you read the manual
on VOLATILE it states the optimizer makes no assumptions on such functions,
resulting in people asking why is this happening.

On Fri, Oct 18, 2013 at 3:39 PM, David Johnston <polobo@yahoo.com> wrote:

Show quoted text

ajelinek@gmail.com wrote

but if I insert one row before I run the sql the CTE is
executed and I get a new row in the table. I was hoping that I would see
a
difference in the explain, but the explain with an empty table where the
CTE is *not* executed is identical to the explain where there is one row
in
the table already and the CTE *is* executed resulting in a new row.

Would help to include the explain(s). Did you ANALYZE after the insert; if
not the planner probably still thought the table was empty (thus the
matching explain) but upon execution realized it had records and thus
needed
to run the CTE.

Since the executor cannot fully trust the statistics, and a full scan of an
empty table would be very fast, scanning the table to delete would be a
necessary first step before running the CTE for the secondary conditions
(where clause). An implicit first-condition/result is that a DELETE on an
empty table is effectively a No-Op. The only reason to override that no-op
would be if a CTE needs to be run by policy as Tom noted.

I thought maybe Postgres was not executing the CTE because it knows that
there are no rows in the table for it to delete, however if I change the
CTE to be an insert returning instead of a function I get different
results. Even when the table is empty I get new row created.

Like Tom said, if you don't hide the INSERT inside a function the CTE will
always be executed.

David J.

--
View this message in context:
http://postgresql.1045698.n5.nabble.com/Bug-Function-with-side-effects-not-evaluated-in-CTE-tp5774792p5775095.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#12David G. Johnston
david.g.johnston@gmail.com
In reply to: Merlin Moncure (#10)
Re: Bug? Function with side effects not evaluated in CTE

Merlin Moncure-2 wrote

Regardless, the point at hand is whether specific plan semantics down
the chain can control whether or not volatile expressions should run.
Clearly, at least to me, they should not.

I don't personally see any solid reason to reject the always evaluate CTEs
with volatile expressions option but at the same time am then curious that
if it keeps coming up and constitutes a POLA violation why it hasn't been
done (e.g., insufficient demand for the cost or are there technical
concerns). How severe is the POLA violation and are there compelling
use-cases for and/or against implementing this solution?

Put differently ideally this should be put either on the todo list or the
"we do not want" list and further inquiries can then go to building up
enough popular demand to convince someone to implement it. Maybe there
should be a "convince us" list? Not a todo yet but something still be
considered.

Keeping in mind that there are likely volatile queries relying on the
current optimization that would become non-optimized and possibly very
poorly performing. Since the "fix" for these would be a simple alter
function to make them stable the cost benefit seems worthwhile - since those
functions are probably mis-identified in the first place due to the use of
the default.

David J.

--
View this message in context: http://postgresql.1045698.n5.nabble.com/Bug-Function-with-side-effects-not-evaluated-in-CTE-tp5774792p5775125.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#13Moshe Jacobson
moshe@neadwerx.com
In reply to: David G. Johnston (#9)
Re: Bug? Function with side effects not evaluated in CTE

On Fri, Oct 18, 2013 at 5:08 PM, David Johnston <polobo@yahoo.com> wrote:

this thread hasn't really provided a compelling use-case for making a
change: the example
provided is too contrived.

It seems contrived because I distilled it down from what it originally was.
There is an actual use case in which this was failing:
We use stored procs to provide us the functionality of an UPSERT, which
PostgreSQL lacks.
We are using this in the first CTE to create new entries in a table, and we
are using the DELETE to delete the entries that already existed that we
didn't just create.
I am of the belief that if the function in a CTE is volatile, that it
should be executed unconditionally.

Moshe Jacobson
Manager of Systems Engineering, Nead Werx Inc. <http://www.neadwerx.com&gt;
2323 Cumberland Parkway · Suite 201 · Atlanta, GA 30339

"Quality is not an act, it is a habit." -- Aristotle

#14Merlin Moncure
mmoncure@gmail.com
In reply to: David G. Johnston (#12)
Re: Bug? Function with side effects not evaluated in CTE

On Fri, Oct 18, 2013 at 8:37 PM, David Johnston <polobo@yahoo.com> wrote:

Merlin Moncure-2 wrote

Regardless, the point at hand is whether specific plan semantics down
the chain can control whether or not volatile expressions should run.
Clearly, at least to me, they should not.

Put differently ideally this should be put either on the todo list or the
"we do not want" list and further inquiries can then go to building up
enough popular demand to convince someone to implement it. Maybe there
should be a "convince us" list? Not a todo yet but something still be
considered.

I would vote it as todo.

Keeping in mind that there are likely volatile queries relying on the
current optimization that would become non-optimized and possibly very
poorly performing. Since the "fix" for these would be a simple alter
function to make them stable the cost benefit seems worthwhile - since those
functions are probably mis-identified in the first place due to the use of
the default.

Any reliance on that behavior would be wrong because it's expressly
contraindicated by the documentation. TBH, If it were not
specifically documented that way, I would be considering this to be
bugged behavior. As for function mis-identification, fair point, but
that's a general problem and not specifically related to CTE
evaluation.

merlin

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#15David G. Johnston
david.g.johnston@gmail.com
In reply to: Merlin Moncure (#14)
Re: Bug? Function with side effects not evaluated in CTE

Merlin Moncure-2 wrote

Any reliance on that behavior would be wrong because it's expressly
contraindicated by the documentation.

That makes no practical difference since the decision to make the function
volatile is not conscious due to it being the default; and the current
behavior hides the fact that what they are doing is unsupported since they
do nothing special to invoke the optimization. The people likely to be hit
by this are those with the least experience and so in making the change,
which I do support, communication of the behavior difference needs to be
done in such a way as to reasonably reach and gain understanding from these
people.

That aside, I'm not coming up with any standard idioms that would benefit
from this optimization so the scope of the problem may very well be small
enough to just bite the bullet and deal with regression complaints as they
are voiced.

David J.

--
View this message in context: http://postgresql.1045698.n5.nabble.com/Bug-Function-with-side-effects-not-evaluated-in-CTE-tp5774792p5775292.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#16Tom Lane
tgl@sss.pgh.pa.us
In reply to: Moshe Jacobson (#13)
Re: Bug? Function with side effects not evaluated in CTE

Moshe Jacobson <moshe@neadwerx.com> writes:

I am of the belief that if the function in a CTE is volatile, that it
should be executed unconditionally.

[ shrug... ] Consider

SELECT volatile_function(i) FROM generate_series(1, 10) i LIMIT 1;

How many times should the volatile function get executed? If your answer
is not "10", how is this different from the CTE case? This LIMIT clause
is restricting the number of times the function executes in pretty much
the same way that our definition of CTE evaluation does, AFAICS.

You could of course argue that our definition of LIMIT is wrong too,
but that's going to raise the bar for convincing people even higher,
because of the number of existing applications that such a redefinition
would break.

regards, tom lane

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#17Moshe Jacobson
moshe@neadwerx.com
In reply to: Tom Lane (#16)
Re: Bug? Function with side effects not evaluated in CTE

On Mon, Oct 21, 2013 at 2:53 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Consider

SELECT volatile_function(i) FROM generate_series(1, 10) i LIMIT 1;

How many times should the volatile function get executed? If your answer
is not "10", how is this different from the CTE case? This LIMIT clause
is restricting the number of times the function executes in pretty much
the same way that our definition of CTE evaluation does, AFAICS.

I don't think your example above is analogous, because in your example, you
are asking *how many times* to execute the function, whereas in my example,
the question is *whether* to execute the query at all.
If the outer statement of the CTE doesn't need to use the contents of the
CTE, and there is no volatile function in there, then I agree that it's
fine not to execute it.
But if there is a volatile function, then the CTE query should always be
evaluated just like CREATE TEMP TABLE.
There is no question as to how many times to evaluate it here. It is just
once.

Moshe Jacobson
Manager of Systems Engineering, Nead Werx Inc. <http://www.neadwerx.com&gt;
2323 Cumberland Parkway · Suite 201 · Atlanta, GA 30339

"Quality is not an act, it is a habit." -- Aristotle

#18David G. Johnston
david.g.johnston@gmail.com
In reply to: Tom Lane (#16)
Re: Bug? Function with side effects not evaluated in CTE

Tom Lane-2 wrote

Moshe Jacobson &lt;

moshe@

&gt; writes:

I am of the belief that if the function in a CTE is volatile, that it
should be executed unconditionally.

[ shrug... ] Consider

SELECT volatile_function(i) FROM generate_series(1, 10) i LIMIT 1;

How many times should the volatile function get executed? If your answer
is not "10", how is this different from the CTE case? This LIMIT clause
is restricting the number of times the function executes in pretty much
the same way that our definition of CTE evaluation does, AFAICS.

You could of course argue that our definition of LIMIT is wrong too,
but that's going to raise the bar for convincing people even higher,
because of the number of existing applications that such a redefinition
would break.

The CTE would functionally replace the generate_series() call as opposed to
the select-list evaluation. Since the CTE establishes an optimization
boundary the parts of the query below the main (pulling) FROM clause should
not (or need not) influence the evaluation of the CTE.

The two comparable queries are:

A) WITH vf ( SELECT volatile_function(x) FROM generate_series(1,10) )
SELECT * FROM vf LIMIT 1

B) SELECT volatile_function(x) FROM generate_series(1,10) gs (x) LIMIT 1

In (A) the relation "vf" - which is a 10-row table with the result of
volatile_function as the only column - is limited to a single record and
that whole row is output as-is (because of the "*")

In (B) the relation "gs" - which is 10 rows having the result of
generate_series as the only column - is limited to a single row and then the
select-list project occurs against that single row (the volatile_function)

This is my naive, not technically informed, opinion of how these two
constructs differ.

This makes the optimization boundary characteristic of CTEs much stronger so
making such a boundary dependent upon whether the CTE contains any
volatile_functions seems desirable. That way if "volatile_function" is
instead made stable then its evaluation 10-times can be avoided; though in
this case that would depend on whether pushing down the LIMIT is even valid.

As commented by Moshe the "number of times" is less and issue than "yes/no"
determination but I guess that any simple implementation would have to
handle both cases identically so its impossible to ignore the implication on
the "number of times" queries in solving the "yes/no" problem.

David J.

--
View this message in context: http://postgresql.1045698.n5.nabble.com/Bug-Function-with-side-effects-not-evaluated-in-CTE-tp5774792p5775321.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#19Tom Lane
tgl@sss.pgh.pa.us
In reply to: Moshe Jacobson (#17)
Re: Bug? Function with side effects not evaluated in CTE

Moshe Jacobson <moshe@neadwerx.com> writes:

On Mon, Oct 21, 2013 at 2:53 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Consider

SELECT volatile_function(i) FROM generate_series(1, 10) i LIMIT 1;

How many times should the volatile function get executed? If your answer
is not "10", how is this different from the CTE case? This LIMIT clause
is restricting the number of times the function executes in pretty much
the same way that our definition of CTE evaluation does, AFAICS.

I don't think your example above is analogous, because in your example, you
are asking *how many times* to execute the function, whereas in my example,
the question is *whether* to execute the query at all.

No, it's about how many times to execute it, ie, how many rows to pull
from the CTE. In particular, the optimization you're complaining about
is an early-termination rule that's not fundamentally different IMO from
what LIMIT does.

More generally, what you're arguing for is that the executor's behavior
should change depending on whether a query contains a volatile function.
That's a direction I think we shouldn't go in. Up to now, the presence of
volatile functions has been something that can disable particular planner
optimizations, but it's not of concern to the executor.

regards, tom lane

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#20Tom Lane
tgl@sss.pgh.pa.us
In reply to: David G. Johnston (#18)
Re: Bug? Function with side effects not evaluated in CTE

David Johnston <polobo@yahoo.com> writes:

The two comparable queries are:

A) WITH vf ( SELECT volatile_function(x) FROM generate_series(1,10) )
SELECT * FROM vf LIMIT 1

B) SELECT volatile_function(x) FROM generate_series(1,10) gs (x) LIMIT 1

In (A) the relation "vf" - which is a 10-row table with the result of
volatile_function as the only column - is limited to a single record and
that whole row is output as-is (because of the "*")

In (B) the relation "gs" - which is 10 rows having the result of
generate_series as the only column - is limited to a single row and then the
select-list project occurs against that single row (the volatile_function)

Just for the record, your interpretation of (B) is wrong. LIMIT acts
after select-list evaluation --- try a set-returning function in the
select list to see that this is true.

regards, tom lane

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#21Merlin Moncure
mmoncure@gmail.com
In reply to: Tom Lane (#19)
#22David G. Johnston
david.g.johnston@gmail.com
In reply to: Tom Lane (#20)
#23Rowan Collins
rowan.collins@gmail.com
In reply to: Moshe Jacobson (#17)
#24BladeOfLight16
bladeoflight16@gmail.com
In reply to: Rowan Collins (#23)
#25John R Pierce
pierce@hogranch.com
In reply to: BladeOfLight16 (#24)
#26BladeOfLight16
bladeoflight16@gmail.com
In reply to: BladeOfLight16 (#24)
#27David G. Johnston
david.g.johnston@gmail.com
In reply to: BladeOfLight16 (#26)
#28Moshe Jacobson
moshe@neadwerx.com
In reply to: David G. Johnston (#27)
#29BladeOfLight16
bladeoflight16@gmail.com
In reply to: Moshe Jacobson (#28)