Making CASE error handling less surprising

Started by Tom Lanealmost 6 years ago30 messageshackers

tgl@sss.pgh.pa.us

almost 6 years ago

Every so often we get a complaint like [1]/messages/by-id/16549-4991fbf36fcec234@postgresql.org about how a CASE should have
prevented a run-time error and didn't, because constant-folding tried
to evaluate a subexpression that would not have been entered at run-time.

It struck me that it would not be hard to improve this situation a great
deal. If, within a CASE subexpression that isn't certain to be executed
at runtime, we refuse to pre-evaluate *any* function (essentially, treat
them all as volatile), then we should largely get the semantics that
users expect. There's some potential for query slowdown if a CASE
contains a constant subexpression that we formerly reduced at plan time
and now do not, but that doesn't seem to me to be a very big deal.

Attached is a draft patch that handles CASE and COALESCE this way.

This is not a complete fix, because if you write a sub-SELECT the
contents of the sub-SELECT are not processed by the outer query's
eval_const_expressions pass; instead, we look at it within the
sub-SELECT itself, and in that context there's no apparent reason
to avoid const-folding. So
CASE WHEN x < 0 THEN (SELECT 1/0) END
fails even if x is never less than zero. I don't see any great way
to avoid that, and I'm not particularly concerned about it anyhow;
usually the point of a sub-SELECT like this is to be decoupled from
outer query evaluation, so that the behavior should not be that
surprising.

One interesting point is that the join regression test contains a
number of uses of "coalesce(int8-variable, int4-constant)" which is
treated a little differently than before: we no longer constant-fold
the int4 constant to int8. That causes the run-time cost of the
expression to be estimated slightly higher, which changes plans in
a couple of these tests; and in any case the EXPLAIN output looks
different since it shows the runtime coercion explicitly. To avoid
those changes I made all these examples quote the constants, so that
the parser resolves them as int8 out of the gate. (Perhaps it'd be
okay to just accept the changes, but I didn't feel like trying to
analyze in detail what each test case had been meant to prove.)

Also, I didn't touch the docs yet. Sections 4.2.14 and 9.18.1
contain some weasel wording that could be backed off, but in light
of the sub-SELECT exception we can't just remove the issue
altogether I think. Not quite sure how to word it.

Thoughts?

regards, tom lane

[1]: /messages/by-id/16549-4991fbf36fcec234@postgresql.org

Dagfinn Ilmari Mannsåker

ilmari@ilmari.org

almost 6 years ago

In reply to: Tom Lane (#1)

Re: Making CASE error handling less surprising

Tom Lane <tgl@sss.pgh.pa.us> writes:

Every so often we get a complaint like [1] about how a CASE should have
prevented a run-time error and didn't, because constant-folding tried
to evaluate a subexpression that would not have been entered at run-time.

It struck me that it would not be hard to improve this situation a great
deal. If, within a CASE subexpression that isn't certain to be executed
at runtime, we refuse to pre-evaluate *any* function (essentially, treat
them all as volatile), then we should largely get the semantics that
users expect. There's some potential for query slowdown if a CASE
contains a constant subexpression that we formerly reduced at plan time
and now do not, but that doesn't seem to me to be a very big deal.

[…]

Thoughts?

Would it be feasible to set up an exception handler when constant-
folding cases that might not be reached, and leave the expression
unfolded only if an error was thrown, or does that have too much
overhead to be worthwhile?

- ilmari
--
"A disappointingly low fraction of the human race is,
at any given time, on fire." - Stig Sandbeck Mathisen

Andres Freund

andres@anarazel.de

almost 6 years ago

In reply to: Dagfinn Ilmari Mannsåker (#2)

Re: Making CASE error handling less surprising

Hi,

On 2020-07-23 18:50:32 +0100, Dagfinn Ilmari Mannsåker wrote:

Tom Lane <tgl@sss.pgh.pa.us> writes:

Every so often we get a complaint like [1] about how a CASE should have
prevented a run-time error and didn't, because constant-folding tried
to evaluate a subexpression that would not have been entered at run-time.

It struck me that it would not be hard to improve this situation a great
deal. If, within a CASE subexpression that isn't certain to be executed
at runtime, we refuse to pre-evaluate *any* function (essentially, treat
them all as volatile), then we should largely get the semantics that
users expect. There's some potential for query slowdown if a CASE
contains a constant subexpression that we formerly reduced at plan time
and now do not, but that doesn't seem to me to be a very big deal.

[…]

Thoughts?

Would it be feasible to set up an exception handler when constant-
folding cases that might not be reached, and leave the expression
unfolded only if an error was thrown, or does that have too much
overhead to be worthwhile?

That'd require using a subtransaction for expression
simplification. That'd be way too high overhead.

Given how often we've had a need to call functions while handling
errors, I do wonder if it'd be worthwhile and feasible to mark functions
as being safe to call without subtransactions, or mark them as not
erroring out (e.g. comparators would usually be safe).

Greetings,

Andres Freund

Tom Lane

tgl@sss.pgh.pa.us

almost 6 years ago

In reply to: Andres Freund (#3)

Re: Making CASE error handling less surprising

Andres Freund <andres@anarazel.de> writes:

On 2020-07-23 18:50:32 +0100, Dagfinn Ilmari Mannsåker wrote:

Would it be feasible to set up an exception handler when constant-
folding cases that might not be reached, and leave the expression
unfolded only if an error was thrown, or does that have too much
overhead to be worthwhile?

That'd require using a subtransaction for expression
simplification. That'd be way too high overhead.

That's my opinion as well. It'd be a subtransaction for *each*
operator/function call we need to simplify, which seems completely
disastrous.

Given how often we've had a need to call functions while handling
errors, I do wonder if it'd be worthwhile and feasible to mark functions
as being safe to call without subtransactions, or mark them as not
erroring out (e.g. comparators would usually be safe).

Yeah. I was wondering whether the existing "leakproof" marking would
be adequate for this purpose. It's a little stronger than what we
need, but the pain-in-the-rear factor for adding YA function property
is high enough that I'm inclined to just use it anyway.

We do have to assume that "leakproof" includes "cannot throw any
input-dependent error", but it seems to me that that's true.

regards, tom lane

Pavel Stehule

pavel.stehule@gmail.com

almost 6 years ago

In reply to: Tom Lane (#4)

Re: Making CASE error handling less surprising

čt 23. 7. 2020 v 21:43 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:

Andres Freund <andres@anarazel.de> writes:

On 2020-07-23 18:50:32 +0100, Dagfinn Ilmari Mannsåker wrote:

Would it be feasible to set up an exception handler when constant-
folding cases that might not be reached, and leave the expression
unfolded only if an error was thrown, or does that have too much
overhead to be worthwhile?

That'd require using a subtransaction for expression
simplification. That'd be way too high overhead.

That's my opinion as well. It'd be a subtransaction for *each*
operator/function call we need to simplify, which seems completely
disastrous.

Given how often we've had a need to call functions while handling
errors, I do wonder if it'd be worthwhile and feasible to mark functions
as being safe to call without subtransactions, or mark them as not
erroring out (e.g. comparators would usually be safe).

Yeah. I was wondering whether the existing "leakproof" marking would
be adequate for this purpose. It's a little stronger than what we
need, but the pain-in-the-rear factor for adding YA function property
is high enough that I'm inclined to just use it anyway.

We do have to assume that "leakproof" includes "cannot throw any
input-dependent error", but it seems to me that that's true.

I am afraid of a performance impact.

lot of people expects constant folding everywhere now and I can imagine
query like

SELECT CASE col1 WHEN 1 THEN upper('hello') ELSE upper('bye') END FROM ...

Now, it is optimized well, but with the proposed patch, this query can be
slow.

We should introduce planner safe functions for some usual functions, or
maybe better explain the behaviour, the costs, and benefits. I don't think
this issue is too common.

Regards

Pavel

Show quoted text

regards, tom lane

Andres Freund

andres@anarazel.de

almost 6 years ago

In reply to: Tom Lane (#4)

Re: Making CASE error handling less surprising

Hi,

On 2020-07-23 15:43:44 -0400, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

On 2020-07-23 18:50:32 +0100, Dagfinn Ilmari Mannsï¿½ker wrote:

Would it be feasible to set up an exception handler when constant-
folding cases that might not be reached, and leave the expression
unfolded only if an error was thrown, or does that have too much
overhead to be worthwhile?

That'd require using a subtransaction for expression
simplification. That'd be way too high overhead.

That's my opinion as well. It'd be a subtransaction for *each*
operator/function call we need to simplify, which seems completely
disastrous.

I guess we could optimize it to be one subtransaction by having error
recovery be to redo simplification with a parameter that prevents doing
simplification within CASE etc. Still too unattractive performancewise
to consider imo.

Given how often we've had a need to call functions while handling
errors, I do wonder if it'd be worthwhile and feasible to mark functions
as being safe to call without subtransactions, or mark them as not
erroring out (e.g. comparators would usually be safe).

Yeah. I was wondering whether the existing "leakproof" marking would
be adequate for this purpose. It's a little stronger than what we
need, but the pain-in-the-rear factor for adding YA function property
is high enough that I'm inclined to just use it anyway.

Hm, I didn't consider that. Good idea.

We do have to assume that "leakproof" includes "cannot throw any
input-dependent error", but it seems to me that that's true.

A quick look through the list seems to confirm that. There's errors like
in text_starts_with:

if (mylocale && !mylocale->deterministic)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("nondeterministic collations are not supported for substring searches")));

but that's not a content dependent error, so I don't think it's problem.

So the idea would be to continue to do simplification like we do right
now for things outside a CASE but to only call leakproof functions
within a case?

Is there any concern about having to do additional lookups for
leakproofness? It doesn't seem likely to me since we already need to do
lookups for the FmgrInfo?

Greetings,

Andres Freund

Pavel Stehule

pavel.stehule@gmail.com

almost 6 years ago

In reply to: Pavel Stehule (#5)

Re: Making CASE error handling less surprising

čt 23. 7. 2020 v 21:56 odesílatel Pavel Stehule <pavel.stehule@gmail.com>
napsal:

čt 23. 7. 2020 v 21:43 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:

Andres Freund <andres@anarazel.de> writes:

On 2020-07-23 18:50:32 +0100, Dagfinn Ilmari Mannsåker wrote:

Would it be feasible to set up an exception handler when constant-
folding cases that might not be reached, and leave the expression
unfolded only if an error was thrown, or does that have too much
overhead to be worthwhile?

That'd require using a subtransaction for expression
simplification. That'd be way too high overhead.

That's my opinion as well. It'd be a subtransaction for *each*
operator/function call we need to simplify, which seems completely
disastrous.

Given how often we've had a need to call functions while handling
errors, I do wonder if it'd be worthwhile and feasible to mark functions
as being safe to call without subtransactions, or mark them as not
erroring out (e.g. comparators would usually be safe).

Yeah. I was wondering whether the existing "leakproof" marking would
be adequate for this purpose. It's a little stronger than what we
need, but the pain-in-the-rear factor for adding YA function property
is high enough that I'm inclined to just use it anyway.

We do have to assume that "leakproof" includes "cannot throw any
input-dependent error", but it seems to me that that's true.

I am afraid of a performance impact.

lot of people expects constant folding everywhere now and I can imagine
query like

SELECT CASE col1 WHEN 1 THEN upper('hello') ELSE upper('bye') END FROM ...

Now, it is optimized well, but with the proposed patch, this query can be
slow.

We should introduce planner safe functions for some usual functions, or
maybe better explain the behaviour, the costs, and benefits. I don't think
this issue is too common.

what about different access. We can introduce function

create or replace function volatile_expr(anyelement) returns anyelement as
$$ begin return $1; end $$ language plpgsql;

and this can be used as a constant folding optimization fence.

select case col when 1 then volatile_expr(1/$1) else $1 end;

I don't think so people have a problem with this behaviour - the problem is
unexpected behaviour change between major releases without really
illustrative explanation in documentation.

Regards

Pavel

Show quoted text

Regards

Pavel

regards, tom lane

Andres Freund

andres@anarazel.de

almost 6 years ago

In reply to: Pavel Stehule (#5)

Re: Making CASE error handling less surprising

Hi,

On 2020-07-23 21:56:26 +0200, Pavel Stehule wrote:

I am afraid of a performance impact.

lot of people expects constant folding everywhere now and I can imagine
query like

SELECT CASE col1 WHEN 1 THEN upper('hello') ELSE upper('bye') END FROM ...

Now, it is optimized well, but with the proposed patch, this query can be
slow.

I'd be more concerned about thinks like conditional expressions that
involve both columns and non-comparison operations on constants. Where
right now we'd simplify the constant part of the expression, but
wouldn't at all anymore after this.

Is there an argument to continue simplifying expressions within case
when only involving "true" constants even with not leakproof functions,
but only simplify "pseudo" constants like parameters with leakproof
functions? I.e CASE WHEN ... THEN 1 / 0 would still raise an error
during simplification but CASE WHEN ... THEN 1 / $1 wouldn't, because $1
is not a real constant (even if PARAM_FLAG_CONST).

It doesn't seem like it'd be too hard to implement that, but that it'd
probably be fairly bulky because we'd need to track more state across
recursive expression_tree_mutator() calls.

Greetings,

Andres Freund

Tom Lane

tgl@sss.pgh.pa.us

almost 6 years ago

In reply to: Andres Freund (#6)

Re: Making CASE error handling less surprising

Andres Freund <andres@anarazel.de> writes:

Is there any concern about having to do additional lookups for
leakproofness? It doesn't seem likely to me since we already need to do
lookups for the FmgrInfo?

No, we could easily fix it so that one syscache lookup gets both
the provolatile and proleakproof markings.

regards, tom lane

#10

Tom Lane

tgl@sss.pgh.pa.us

almost 6 years ago

In reply to: Andres Freund (#8)

Re: Making CASE error handling less surprising

Andres Freund <andres@anarazel.de> writes:

Is there an argument to continue simplifying expressions within case
when only involving "true" constants even with not leakproof functions,
but only simplify "pseudo" constants like parameters with leakproof
functions? I.e CASE WHEN ... THEN 1 / 0 would still raise an error
during simplification but CASE WHEN ... THEN 1 / $1 wouldn't, because $1
is not a real constant (even if PARAM_FLAG_CONST).

Hmm, interesting idea. That might fix all the practical cases in plpgsql,
but it wouldn't do anything to make the behavior more explainable. Not
sure if we care about that though.

If we go this way I'd be inclined to do this instead of, not in addition
to, what I originally proposed. Not sure if that was how you envisioned
it, but I think this is probably sufficient for its purpose and we would
not need any additional lobotomization of const-simplification.

It doesn't seem like it'd be too hard to implement that, but that it'd
probably be fairly bulky because we'd need to track more state across
recursive expression_tree_mutator() calls.

It wouldn't be any harder than what I posted upthread; it would
just be a different flag getting passed down in the context struct
and getting tested in a different place.

regards, tom lane

#11

Andres Freund

andres@anarazel.de

almost 6 years ago

In reply to: Tom Lane (#10)

Re: Making CASE error handling less surprising

Hi,

On 2020-07-23 16:34:25 -0400, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

Is there an argument to continue simplifying expressions within case
when only involving "true" constants even with not leakproof functions,
but only simplify "pseudo" constants like parameters with leakproof
functions? I.e CASE WHEN ... THEN 1 / 0 would still raise an error
during simplification but CASE WHEN ... THEN 1 / $1 wouldn't, because $1
is not a real constant (even if PARAM_FLAG_CONST).

Hmm, interesting idea. That might fix all the practical cases in plpgsql,
but it wouldn't do anything to make the behavior more explainable. Not
sure if we care about that though.

I've probably done too much compiler stuff, but to me it doesn't seem
too hard to understand that purely constant expressions may get
evaluated unconditionally even when inside a CASE, but everything else
won't. The fact that we sometimes optimize params to be essentially
constants isn't really exposed to users, so shouldn't be confusing.

If we go this way I'd be inclined to do this instead of, not in addition
to, what I originally proposed. Not sure if that was how you envisioned
it, but I think this is probably sufficient for its purpose and we would
not need any additional lobotomization of const-simplification.

Yea, I would assume that we'd not need anything else. I've not thought
about the subquery case yet, so perhaps it'd be desirable to do
something additional there.

It doesn't seem like it'd be too hard to implement that, but that it'd
probably be fairly bulky because we'd need to track more state across
recursive expression_tree_mutator() calls.

It wouldn't be any harder than what I posted upthread; it would
just be a different flag getting passed down in the context struct
and getting tested in a different place.

Cool.

Greetings,

Andres Freund

#12

Andres Freund

andres@anarazel.de

almost 6 years ago

In reply to: Andres Freund (#11)

Re: Making CASE error handling less surprising

Hi,

On 2020-07-23 13:42:08 -0700, Andres Freund wrote:

On 2020-07-23 16:34:25 -0400, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

It doesn't seem like it'd be too hard to implement that, but that it'd
probably be fairly bulky because we'd need to track more state across
recursive expression_tree_mutator() calls.

It wouldn't be any harder than what I posted upthread; it would
just be a different flag getting passed down in the context struct
and getting tested in a different place.

Cool.

Hm. Would SQL function inlining be a problem? It looks like that just
substitutes parameters. Before calling
eval_const_expressions_mutator(). So we'd not know not to evaluate such
"pseudo constants". And that'd probably be confusing, especially
because it's not exactly obvious when inlining happens.

Greetings,

Andres Freund

#13

Tom Lane

tgl@sss.pgh.pa.us

almost 6 years ago

In reply to: Andres Freund (#12)

Re: Making CASE error handling less surprising

Andres Freund <andres@anarazel.de> writes:

Hm. Would SQL function inlining be a problem? It looks like that just
substitutes parameters. Before calling
eval_const_expressions_mutator(). So we'd not know not to evaluate such
"pseudo constants". And that'd probably be confusing, especially
because it's not exactly obvious when inlining happens.

Hm, interesting question. I think it might be all right without any
further hacking, because the parameters we care about substituting
would have been handled (or not) before inlining. But the interactions
would be ticklish, and surely worthy of a test case or three.

regards, tom lane

#14

Andres Freund

andres@anarazel.de

almost 6 years ago

In reply to: Tom Lane (#13)

Re: Making CASE error handling less surprising

Hi,

On 2020-07-23 16:56:44 -0400, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

Hm. Would SQL function inlining be a problem? It looks like that just
substitutes parameters. Before calling
eval_const_expressions_mutator(). So we'd not know not to evaluate such
"pseudo constants". And that'd probably be confusing, especially
because it's not exactly obvious when inlining happens.

Hm, interesting question. I think it might be all right without any
further hacking, because the parameters we care about substituting
would have been handled (or not) before inlining. But the interactions
would be ticklish, and surely worthy of a test case or three.

I'm a bit worried about a case like:

SELECT foo(17);
CREATE FUNCTION yell(int, int)
RETURNS int
IMMUTABLE
LANGUAGE SQL AS $$
SELECT CASE WHEN $1 != 0 THEN 17 / $2 ELSE NULL END
$$;

EXPLAIN SELECT yell(g.i, 0) FROM generate_series(1, 10) g(i);

I don't think the parameters here would have been handled before
inlining, right?

Greetings,

Andres Freund

#15

Tom Lane

tgl@sss.pgh.pa.us

almost 6 years ago

In reply to: Andres Freund (#14)

Re: Making CASE error handling less surprising

Andres Freund <andres@anarazel.de> writes:

I'm a bit worried about a case like:

CREATE FUNCTION yell(int, int)
RETURNS int
IMMUTABLE
LANGUAGE SQL AS $$
SELECT CASE WHEN $1 != 0 THEN 17 / $2 ELSE NULL END
$$;

EXPLAIN SELECT yell(g.i, 0) FROM generate_series(1, 10) g(i);

I don't think the parameters here would have been handled before
inlining, right?

Ah, I see what you mean. Yeah, that throws an error today, and it
still would with the patch I was envisioning (attached), because
inlining does Param substitution in a different way. I'm not
sure that we could realistically fix the inlining case with this
sort of approach.

I think this bears out the comment I made before that this approach
still leaves us with a very complicated behavior. Maybe we should
stick with the previous approach, possibly supplemented with a
leakproofness exception.

regards, tom lane

#16

Chris Travers

chris.travers@adjust.com

almost 6 years ago

In reply to: Tom Lane (#15)

Re: Making CASE error handling less surprising

On Fri, Jul 24, 2020 at 4:35 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Andres Freund <andres@anarazel.de> writes:

I'm a bit worried about a case like:

CREATE FUNCTION yell(int, int)
RETURNS int
IMMUTABLE
LANGUAGE SQL AS $$
SELECT CASE WHEN $1 != 0 THEN 17 / $2 ELSE NULL END
$$;

EXPLAIN SELECT yell(g.i, 0) FROM generate_series(1, 10) g(i);

I don't think the parameters here would have been handled before
inlining, right?

Ah, I see what you mean. Yeah, that throws an error today, and it
still would with the patch I was envisioning (attached), because
inlining does Param substitution in a different way. I'm not
sure that we could realistically fix the inlining case with this
sort of approach.

I think this bears out the comment I made before that this approach
still leaves us with a very complicated behavior. Maybe we should
stick with the previous approach, possibly supplemented with a
leakproofness exception.

I am actually not so sure this is a good idea. Here are two doubts I have.

1. The problem of when a given SQL expression is evaluated crops up in a
wide variety of different contexts and, worst case, causes far more damage
than queries which always error. Removing the lower hanging fruit while
leaving cases like:

select lock_foo(id), * from foo where somefield > 100; -- which rows does
lock_foo(id) run on? Does it matter?

is going to legitimize these complaints in a way which will be very hard to
do unless we also want to eventually be able to specify when volatile
functions may be run. The two cases don't look the same but they are
manifestations of the same problem which is that when you execute a SQL
query you have no control over when expressions are actually run.

2. The refusal to fold immutables within case statements here mean either
we do more tricks to get around the planner if we hit a pathological cases
in performance. I am not convinced this is a net win.

If we go this route, would it be too much to ask to allow a GUC variable to
preserve the old behavior?

regards, tom lane

--
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin

#17

Tom Lane

tgl@sss.pgh.pa.us

almost 6 years ago

In reply to: Tom Lane (#15)

Re: Making CASE error handling less surprising

I wrote:

Ah, I see what you mean. Yeah, that throws an error today, and it
still would with the patch I was envisioning (attached), because
inlining does Param substitution in a different way. I'm not
sure that we could realistically fix the inlining case with this
sort of approach.

Here's another example that we can't possibly fix with Param substitution
hacking, because there are no Params involved in the first place:

select f1, case when f1 = 42 then 1/i else null end
from (select f1, 0 as i from int4_tbl) ss;

Pulling up the subquery results in "1/0", so this fails today,
even though "f1 = 42" is never true.

Attached is a v3 patch that incorporates the leakproofness idea.
As shown in the new case.sql tests, this does fix both the SQL
function and subquery-pullup cases.

To keep the join regression test results the same, I marked int48()
as leakproof, which is surely safe enough. Probably we should make
a push to mark all unconditionally-safe implicit coercions as
leakproof, but that's a separate matter.

regards, tom lane

#18

Robert Haas

robertmhaas@gmail.com

almost 6 years ago

In reply to: Tom Lane (#1)

Re: Making CASE error handling less surprising

On Thu, Jul 23, 2020 at 12:57 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Every so often we get a complaint like [1] about how a CASE should have
prevented a run-time error and didn't, because constant-folding tried
to evaluate a subexpression that would not have been entered at run-time.

Yes, I've heard such complaints from other sources as well.

It struck me that it would not be hard to improve this situation a great
deal. If, within a CASE subexpression that isn't certain to be executed
at runtime, we refuse to pre-evaluate *any* function (essentially, treat
them all as volatile), then we should largely get the semantics that
users expect. There's some potential for query slowdown if a CASE
contains a constant subexpression that we formerly reduced at plan time
and now do not, but that doesn't seem to me to be a very big deal.

Like Pavel, and I think implicitly Dagfinn and Andres, I'm not sure I
believe this. Pavel's example is a good one. The leakproof exception
helps, but it doesn't cover everything. Users I've encountered throw
things like date_trunc() and lpad() into SQL code and expect them to
behave (from a performance point of view) like constants, but they
also expect 1/0 not to get evaluated too early when e.g. CASE is used.
It's difficult to meet both sets of expectations at the same time and
we're probably never going to have a perfect solution, but I think
you're minimizing the concern too much here.

This is not a complete fix, because if you write a sub-SELECT the
contents of the sub-SELECT are not processed by the outer query's
eval_const_expressions pass; instead, we look at it within the
sub-SELECT itself, and in that context there's no apparent reason
to avoid const-folding. So
CASE WHEN x < 0 THEN (SELECT 1/0) END
fails even if x is never less than zero. I don't see any great way
to avoid that, and I'm not particularly concerned about it anyhow;
usually the point of a sub-SELECT like this is to be decoupled from
outer query evaluation, so that the behavior should not be that
surprising.

I don't think I believe this either. I don't think an average user is
going to expect <expression> to behave differently from (SELECT
<expression>). This one actually bothers me more than the previous
one. How would we even document it? Sometimes things get inlined,
sometimes they don't. Sometimes subqueries get pulled up, sometimes
not. The current behavior isn't great, but at least it handles these
cases consistently. Getting the easy cases "right" while making the
behavior in more complex cases harder to understand is not necessarily
a win.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#19

Andres Freund

andres@anarazel.de

almost 6 years ago

In reply to: Robert Haas (#18)

Re: Making CASE error handling less surprising

Hi,

On 2020-07-24 12:31:05 -0400, Robert Haas wrote:

On Thu, Jul 23, 2020 at 12:57 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Every so often we get a complaint like [1] about how a CASE should have
prevented a run-time error and didn't, because constant-folding tried
to evaluate a subexpression that would not have been entered at run-time.

Yes, I've heard such complaints from other sources as well.

It struck me that it would not be hard to improve this situation a great
deal. If, within a CASE subexpression that isn't certain to be executed
at runtime, we refuse to pre-evaluate *any* function (essentially, treat
them all as volatile), then we should largely get the semantics that
users expect. There's some potential for query slowdown if a CASE
contains a constant subexpression that we formerly reduced at plan time
and now do not, but that doesn't seem to me to be a very big deal.

Like Pavel, and I think implicitly Dagfinn and Andres, I'm not sure I
believe this. Pavel's example is a good one. The leakproof exception
helps, but it doesn't cover everything. Users I've encountered throw
things like date_trunc() and lpad() into SQL code and expect them to
behave (from a performance point of view) like constants, but they
also expect 1/0 not to get evaluated too early when e.g. CASE is used.
It's difficult to meet both sets of expectations at the same time and
we're probably never going to have a perfect solution, but I think
you're minimizing the concern too much here.

Wouldn't the rule that I proposed earlier, namely that sub-expressions
that involve only "proper" constants continue to get evaluated even
within CASE, largely address that?

I don't think I believe this either. I don't think an average user is
going to expect <expression> to behave differently from (SELECT
<expression>). This one actually bothers me more than the previous
one. How would we even document it? Sometimes things get inlined,
sometimes they don't. Sometimes subqueries get pulled up, sometimes
not. The current behavior isn't great, but at least it handles these
cases consistently. Getting the easy cases "right" while making the
behavior in more complex cases harder to understand is not necessarily
a win.

Well, if we formalize the desired behaviour it's probably a lot easier
to work towards implementing it in additional cases (like
subselects). It doesn't seem to hard to keep track of whether a specific
subquery can be evaluate constants in a certain way, if that's what we
need.

Greetings,

Andres Freund

#20

Pavel Stehule

pavel.stehule@gmail.com

almost 6 years ago

In reply to: Andres Freund (#19)

Re: Making CASE error handling less surprising

pá 24. 7. 2020 v 18:49 odesílatel Andres Freund <andres@anarazel.de> napsal:

Hi,

On 2020-07-24 12:31:05 -0400, Robert Haas wrote:

On Thu, Jul 23, 2020 at 12:57 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Every so often we get a complaint like [1] about how a CASE should have
prevented a run-time error and didn't, because constant-folding tried
to evaluate a subexpression that would not have been entered at

run-time.

Yes, I've heard such complaints from other sources as well.

It struck me that it would not be hard to improve this situation a

great

deal. If, within a CASE subexpression that isn't certain to be

executed

at runtime, we refuse to pre-evaluate *any* function (essentially,

treat

them all as volatile), then we should largely get the semantics that
users expect. There's some potential for query slowdown if a CASE
contains a constant subexpression that we formerly reduced at plan time
and now do not, but that doesn't seem to me to be a very big deal.

Like Pavel, and I think implicitly Dagfinn and Andres, I'm not sure I
believe this. Pavel's example is a good one. The leakproof exception
helps, but it doesn't cover everything. Users I've encountered throw
things like date_trunc() and lpad() into SQL code and expect them to
behave (from a performance point of view) like constants, but they
also expect 1/0 not to get evaluated too early when e.g. CASE is used.
It's difficult to meet both sets of expectations at the same time and
we're probably never going to have a perfect solution, but I think
you're minimizing the concern too much here.

Wouldn't the rule that I proposed earlier, namely that sub-expressions
that involve only "proper" constants continue to get evaluated even
within CASE, largely address that?

It doesn't solve a possible performance problem with one shot (EXECUTE stmt
plpgsql) queries, or with parameterized queries

Show quoted text

I don't think I believe this either. I don't think an average user is
going to expect <expression> to behave differently from (SELECT
<expression>). This one actually bothers me more than the previous
one. How would we even document it? Sometimes things get inlined,
sometimes they don't. Sometimes subqueries get pulled up, sometimes
not. The current behavior isn't great, but at least it handles these
cases consistently. Getting the easy cases "right" while making the
behavior in more complex cases harder to understand is not necessarily
a win.

Well, if we formalize the desired behaviour it's probably a lot easier
to work towards implementing it in additional cases (like
subselects). It doesn't seem to hard to keep track of whether a specific
subquery can be evaluate constants in a certain way, if that's what we
need.

Greetings,

Andres Freund