Possible bug in CASE evaluation
I want to draw attention to this thread on -general:
CAMQ5dGq4SuJPbhT2-9XLAPAsvKNUL2-bb0cPyci2Fp+pfSfc3g@mail.gmail.com
Would you concur that this is a bug?
The fine manual says about CASE:
If the condition's result is true, the value of the CASE expression
is the result that follows the condition, and the remainder of the
CASE expression is not processed.
But:
test=> CREATE FUNCTION zero() RETURNS integer IMMUTABLE LANGUAGE SQL AS 'SELECT 0';
CREATE FUNCTION
test=> SELECT CASE WHEN (SELECT zero()) = 0 THEN -1 ELSE 1/zero() END;
ERROR: division by zero
Yours,
Laurenz Albe
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Hi,
On 2013-06-21 08:16:22 +0000, Albe Laurenz wrote:
I want to draw attention to this thread on -general:
CAMQ5dGq4SuJPbhT2-9XLAPAsvKNUL2-bb0cPyci2Fp+pfSfc3g@mail.gmail.com
There's also a bug reported for it:
#8237: E1Uovmc-0007FT-R4@wrigleys.postgresql.org
Would you concur that this is a bug?
Yes, I think it's pretty clearly a bug - Tom doesn't seem think so
though. If we can agree it is, the fix outlined over on -bugs seems to
be easily enough implemented...
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Andres Freund wrote:
On 2013-06-21 08:16:22 +0000, Albe Laurenz wrote:
I want to draw attention to this thread on -general:
CAMQ5dGq4SuJPbhT2-9XLAPAsvKNUL2-bb0cPyci2Fp+pfSfc3g@mail.gmail.comThere's also a bug reported for it:
#8237: E1Uovmc-0007FT-R4@wrigleys.postgresql.orgWould you concur that this is a bug?
Yes, I think it's pretty clearly a bug - Tom doesn't seem think so
though. If we can agree it is, the fix outlined over on -bugs seems to
be easily enough implemented...
I think that it is surprising behaviour.
If fixing the behaviour is undesirable, at least the documentation
should be fixed.
Yours,
Laurenz Albe
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
2013/6/21 Andres Freund <andres@2ndquadrant.com>:
Hi,
On 2013-06-21 08:16:22 +0000, Albe Laurenz wrote:
I want to draw attention to this thread on -general:
CAMQ5dGq4SuJPbhT2-9XLAPAsvKNUL2-bb0cPyci2Fp+pfSfc3g@mail.gmail.comThere's also a bug reported for it:
#8237: E1Uovmc-0007FT-R4@wrigleys.postgresql.orgWould you concur that this is a bug?
Yes, I think it's pretty clearly a bug - Tom doesn't seem think so
though. If we can agree it is, the fix outlined over on -bugs seems to
be easily enough implemented...
fix is not easy :-( you should to catch any possible exception, so it
means, so you should to do some optimalization under subtransactions -
or you should not do this kind of optimizations.
Pavel
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Jun 21, 2013 at 09:20:21AM +0000, Albe Laurenz wrote:
Andres Freund wrote:
Yes, I think it's pretty clearly a bug - Tom doesn't seem think so
though. If we can agree it is, the fix outlined over on -bugs seems to
be easily enough implemented...
If you refer to this:
On Tue, Jun 18, 2013 at 03:31:32PM +0200, Andres Freund wrote:
So it seems we need to stop processing after finding a single WHEN
that's not const? Does anybody have a better idea?
eval_const_expressions() is not just an optimization: it performs mandatory
transformations such as the conversion of named-argument function calls to
positional function calls. Even if you could skip it, queries with expensive
constant expressions would notice the performance loss. The situations helped
by a change like this are too marginal to accept that cost.
Would it work to run eval_const_expressions() lazily on THEN clauses? The
plan-time eval_const_expressions() would not descend to them. The first
ExecEvalCase() to use a given THEN clause would run eval_const_expressions()
before proceeding.
I think that it is surprising behaviour.
That's about how I characterize it, too.
I question whether real applications care. It's important to have CASE usable
for avoiding data-dependent errors, but what's the use of including in your
query an expression that can do nothing but throw an error? Does anyone have
a real-world example? Perhaps some generated-query scenario.
That being said, if we discover a simple-enough fix that performs well, we may
as well incorporate it.
If fixing the behaviour is undesirable, at least the documentation
should be fixed.
A brief documentation mention sounds fine. Perhaps add a paragraph on
constant folding in general and reference that from the CASE page.
Thanks,
nm
--
Noah Misch
EnterpriseDB http://www.enterprisedb.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2013-06-21 09:51:05 -0400, Noah Misch wrote:
On Fri, Jun 21, 2013 at 09:20:21AM +0000, Albe Laurenz wrote:
Andres Freund wrote:
Yes, I think it's pretty clearly a bug - Tom doesn't seem think so
though. If we can agree it is, the fix outlined over on -bugs seems to
be easily enough implemented...If you refer to this:
On Tue, Jun 18, 2013 at 03:31:32PM +0200, Andres Freund wrote:
So it seems we need to stop processing after finding a single WHEN
that's not const? Does anybody have a better idea?eval_const_expressions() is not just an optimization: it performs mandatory
transformations such as the conversion of named-argument function calls to
positional function calls.
Ah yes. Forgot about that... Scrap that. Although it surely isn't nice
that all that is done in a function calleed eval_const_expressions()...
Even if you could skip it, queries with expensive
constant expressions would notice the performance loss. The situations helped
by a change like this are too marginal to accept that cost.
I have to say, that argument doesn't excite me mu8ch. It's not like we
don't want to do the constant expression evaluation at all anymore. Just
not inside CASE WHEN blocks which already are barring some optimizations
anyway...
Would it work to run eval_const_expressions() lazily on THEN clauses? The
plan-time eval_const_expressions() would not descend to them. The first
ExecEvalCase() to use a given THEN clause would run eval_const_expressions()
before proceeding.
Ugh. Doesn't sound nice. I don't have any better ideas than to actually
split eval_const_expressions into one function that does the necessary
things like canonicalization of AND/OR and one that actually evaluates
expressions inside though.
So maybe that's the way to go :/
I think that it is surprising behaviour.
That's about how I characterize it, too.
I question whether real applications care. It's important to have CASE usable
for avoiding data-dependent errors, but what's the use of including in your
query an expression that can do nothing but throw an error? Does anyone have
a real-world example? Perhaps some generated-query scenario.
It doesn't need to be an actual constant. Something that evaluates to
the value at plan time is enough:
PREPARE foo AS SELECT CASE WHEN (SELECT $1::int)=0 THEN 0 ELSE 1/$1::int END;
EXECUTE foo(0);
That example will most likely only crashes in 9.2+ because it will
replan it with the acutal parameter values in place. But you could have
the same in earlier versions e.g. using PQExecParams(), but that's
harder to demonstrate.
Now, that example only crashes because one place uses (SELECT $1) and
the other doesn't, but...
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2013-06-21 09:51:05 -0400, Noah Misch wrote:
That being said, if we discover a simple-enough fix that performs well, we may
as well incorporate it.
What about passing another parameter down eval_const_expressions_mutator
(which is static, so changing the API isn't a problem) that basically
tells us whether we actually should evaluate expressions or just perform
the transformation?
There's seems to be basically a couple of places where we call dangerous
things:
* simplify_function (via ->evaluate_function->evaluate_expr) which is
called in a bunch of places
* evaluate_expr which is directly called in T_ArrayExpr
T_ArrayCoerceExpr
All places I've inspected so far need to deal with simplify_function
returning NULL anyway, so that seems like a viable fix.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
2013/6/21 Andres Freund <andres@2ndquadrant.com>:
On 2013-06-21 09:51:05 -0400, Noah Misch wrote:
On Fri, Jun 21, 2013 at 09:20:21AM +0000, Albe Laurenz wrote:
Andres Freund wrote:
Yes, I think it's pretty clearly a bug - Tom doesn't seem think so
though. If we can agree it is, the fix outlined over on -bugs seems to
be easily enough implemented...If you refer to this:
On Tue, Jun 18, 2013 at 03:31:32PM +0200, Andres Freund wrote:
So it seems we need to stop processing after finding a single WHEN
that's not const? Does anybody have a better idea?eval_const_expressions() is not just an optimization: it performs mandatory
transformations such as the conversion of named-argument function calls to
positional function calls.Ah yes. Forgot about that... Scrap that. Although it surely isn't nice
that all that is done in a function calleed eval_const_expressions()...Even if you could skip it, queries with expensive
constant expressions would notice the performance loss. The situations helped
by a change like this are too marginal to accept that cost.
yes, I dislike it too - then we can have inconsistent behave of
constant between CASE and other statements.
We should to do without any performance lost, if we do some changes in
this area.
Regards
Pavel
I have to say, that argument doesn't excite me mu8ch. It's not like we
don't want to do the constant expression evaluation at all anymore. Just
not inside CASE WHEN blocks which already are barring some optimizations
anyway...Would it work to run eval_const_expressions() lazily on THEN clauses? The
plan-time eval_const_expressions() would not descend to them. The first
ExecEvalCase() to use a given THEN clause would run eval_const_expressions()
before proceeding.Ugh. Doesn't sound nice. I don't have any better ideas than to actually
split eval_const_expressions into one function that does the necessary
things like canonicalization of AND/OR and one that actually evaluates
expressions inside though.
So maybe that's the way to go :/I think that it is surprising behaviour.
That's about how I characterize it, too.
I question whether real applications care. It's important to have CASE usable
for avoiding data-dependent errors, but what's the use of including in your
query an expression that can do nothing but throw an error? Does anyone have
a real-world example? Perhaps some generated-query scenario.It doesn't need to be an actual constant. Something that evaluates to
the value at plan time is enough:
PREPARE foo AS SELECT CASE WHEN (SELECT $1::int)=0 THEN 0 ELSE 1/$1::int END;
EXECUTE foo(0);That example will most likely only crashes in 9.2+ because it will
replan it with the acutal parameter values in place. But you could have
the same in earlier versions e.g. using PQExecParams(), but that's
harder to demonstrate.Now, that example only crashes because one place uses (SELECT $1) and
the other doesn't, but...Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Jun 21, 2013 at 04:12:32PM +0200, Andres Freund wrote:
On 2013-06-21 09:51:05 -0400, Noah Misch wrote:
On Fri, Jun 21, 2013 at 09:20:21AM +0000, Albe Laurenz wrote:
Even if you could skip it, queries with expensive
constant expressions would notice the performance loss. The situations helped
by a change like this are too marginal to accept that cost.I have to say, that argument doesn't excite me mu8ch. It's not like we
don't want to do the constant expression evaluation at all anymore. Just
not inside CASE WHEN blocks which already are barring some optimizations
anyway...
Sure, it's a narrow loss. Before introducing a new narrow loss to fix an
existing one, we should consider which loss hurts more. Offhand, I sympathize
with the folks who would lose performance more than with the folks who want to
write the sorts of expressions under consideration.
Would it work to run eval_const_expressions() lazily on THEN clauses? The
plan-time eval_const_expressions() would not descend to them. The first
ExecEvalCase() to use a given THEN clause would run eval_const_expressions()
before proceeding.Ugh. Doesn't sound nice.
Would you elaborate?
I question whether real applications care. It's important to have CASE usable
for avoiding data-dependent errors, but what's the use of including in your
query an expression that can do nothing but throw an error? Does anyone have
a real-world example? Perhaps some generated-query scenario.It doesn't need to be an actual constant. Something that evaluates to
the value at plan time is enough:
PREPARE foo AS SELECT CASE WHEN (SELECT $1::int)=0 THEN 0 ELSE 1/$1::int END;
EXECUTE foo(0);
Now, that example only crashes because one place uses (SELECT $1) and
the other doesn't, but...
Not the "real-world" I was hoping for, but fair enough.
"Crash" in this context means "raise an error", right?
--
Noah Misch
EnterpriseDB http://www.enterprisedb.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2013-06-21 16:45:28 +0200, Andres Freund wrote:
On 2013-06-21 09:51:05 -0400, Noah Misch wrote:
That being said, if we discover a simple-enough fix that performs well, we may
as well incorporate it.What about passing another parameter down eval_const_expressions_mutator
(which is static, so changing the API isn't a problem) that basically
tells us whether we actually should evaluate expressions or just perform
the transformation?
There's seems to be basically a couple of places where we call dangerous
things:
* simplify_function (via ->evaluate_function->evaluate_expr) which is
called in a bunch of places
* evaluate_expr which is directly called in T_ArrayExpr
T_ArrayCoerceExprAll places I've inspected so far need to deal with simplify_function
returning NULL anyway, so that seems like a viable fix.
*Prototype* patch - that seems simple enough - attached. Opinions?
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
0001-Don-t-evaluate-potentially-unreachable-parts-of-CASE.patchtext/x-patch; charset=us-asciiDownload
>From 1349f8e4a8133eebbf66753b1aa3787f88ee3e33 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Sat, 22 Jun 2013 16:53:39 +0200
Subject: [PATCH] Don't evaluate potentially unreachable parts of CASE during
planning
---
src/backend/optimizer/util/clauses.c | 31 +++++++++++++++++++++++++++++--
src/test/regress/expected/case.out | 13 ++++++++++---
src/test/regress/sql/case.sql | 4 ++--
3 files changed, 41 insertions(+), 7 deletions(-)
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index 6d5b204..94b52f6 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -63,6 +63,7 @@ typedef struct
List *active_fns;
Node *case_val;
bool estimate;
+ bool doevil;
} eval_const_expressions_context;
typedef struct
@@ -2202,6 +2203,7 @@ eval_const_expressions(PlannerInfo *root, Node *node)
context.active_fns = NIL; /* nothing being recursively simplified */
context.case_val = NULL; /* no CASE being examined */
context.estimate = false; /* safe transformations only */
+ context.doevil = true; /* allowed to evaluate const expressions */
return eval_const_expressions_mutator(node, &context);
}
@@ -2233,6 +2235,7 @@ estimate_expression_value(PlannerInfo *root, Node *node)
context.active_fns = NIL; /* nothing being recursively simplified */
context.case_val = NULL; /* no CASE being examined */
context.estimate = true; /* unsafe transformations OK */
+ context.doevil = true; /* allowed to evaluate const expressions */
return eval_const_expressions_mutator(node, &context);
}
@@ -2751,7 +2754,7 @@ eval_const_expressions_mutator(Node *node,
* If constant argument and it's a binary-coercible or
* immutable conversion, we can simplify it to a constant.
*/
- if (arg && IsA(arg, Const) &&
+ if (context->doevil && arg && IsA(arg, Const) &&
(!OidIsValid(newexpr->elemfuncid) ||
func_volatile(newexpr->elemfuncid) == PROVOLATILE_IMMUTABLE))
return (Node *) evaluate_expr((Expr *) newexpr,
@@ -2843,16 +2846,20 @@ eval_const_expressions_mutator(Node *node,
CaseExpr *caseexpr = (CaseExpr *) node;
CaseExpr *newcase;
Node *save_case_val;
+ bool save_doevil;
Node *newarg;
List *newargs;
bool const_true_cond;
Node *defresult = NULL;
ListCell *arg;
+ save_doevil = context->doevil;
+
/* Simplify the test expression, if any */
newarg = eval_const_expressions_mutator((Node *) caseexpr->arg,
context);
+
/* Set up for contained CaseTestExpr nodes */
save_case_val = context->case_val;
if (newarg && IsA(newarg, Const))
@@ -2894,6 +2901,17 @@ eval_const_expressions_mutator(Node *node,
const_true_cond = true;
}
+ /*
+ * We can only evaluate expressions as long we are sure the
+ * the expression will be reached at runtime - otherwise we
+ * might cause spurious errors to be thrown. The first
+ * eval_const_expression above can always evaluate
+ * expressions (unless we are in a nested CaseExpr) since
+ * it will always be reached at runtime.
+ */
+ if (!const_true_cond)
+ context->doevil = false;
+
/* Simplify this alternative's result value */
caseresult = eval_const_expressions_mutator((Node *) oldcasewhen->result,
context);
@@ -2926,6 +2944,8 @@ eval_const_expressions_mutator(Node *node,
context->case_val = save_case_val;
+ context->doevil = save_doevil;
+
/*
* If no non-FALSE alternatives, CASE reduces to the default
* result
@@ -2982,7 +3002,7 @@ eval_const_expressions_mutator(Node *node,
newarray->multidims = arrayexpr->multidims;
newarray->location = arrayexpr->location;
- if (all_const)
+ if (all_const && context->doevil)
return (Node *) evaluate_expr((Expr *) newarray,
newarray->array_typeid,
exprTypmod(node),
@@ -3940,6 +3960,13 @@ evaluate_function(Oid funcid, Oid result_type, int32 result_typmod,
result_collid);
/*
+ * don't evaluate if we aren't allowed to - only check here because it's
+ * legal to optimize cases like the above strict optimization.
+ */
+ if (!context->doevil)
+ return NULL;
+
+ /*
* Otherwise, can simplify only if all inputs are constants. (For a
* non-strict function, constant NULL inputs are treated the same as
* constant non-NULL inputs.)
diff --git a/src/test/regress/expected/case.out b/src/test/regress/expected/case.out
index c564eed..84046a8 100644
--- a/src/test/regress/expected/case.out
+++ b/src/test/regress/expected/case.out
@@ -85,10 +85,17 @@ SELECT CASE 1 WHEN 0 THEN 1/0 WHEN 1 THEN 1 ELSE 2/0 END;
1
(1 row)
--- However we do not currently suppress folding of potentially
--- reachable subexpressions
+-- Constant-expression folding shouldn't evaluate potentially reachable
+-- subexpressions
SELECT CASE WHEN i > 100 THEN 1/0 ELSE 0 END FROM case_tbl;
-ERROR: division by zero
+ case
+------
+ 0
+ 0
+ 0
+ 0
+(4 rows)
+
-- Test for cases involving untyped literals in test expression
SELECT CASE 'a' WHEN 'a' THEN 1 ELSE 2 END;
case
diff --git a/src/test/regress/sql/case.sql b/src/test/regress/sql/case.sql
index 5f41753..179c3cb 100644
--- a/src/test/regress/sql/case.sql
+++ b/src/test/regress/sql/case.sql
@@ -62,8 +62,8 @@ SELECT '6' AS "One",
SELECT CASE WHEN 1=0 THEN 1/0 WHEN 1=1 THEN 1 ELSE 2/0 END;
SELECT CASE 1 WHEN 0 THEN 1/0 WHEN 1 THEN 1 ELSE 2/0 END;
--- However we do not currently suppress folding of potentially
--- reachable subexpressions
+-- Constant-expression folding shouldn't evaluate potentially reachable
+-- subexpressions
SELECT CASE WHEN i > 100 THEN 1/0 ELSE 0 END FROM case_tbl;
-- Test for cases involving untyped literals in test expression
--
1.8.3.1
Noah Misch wrote:
If fixing the behaviour is undesirable, at least the documentation
should be fixed.A brief documentation mention sounds fine. Perhaps add a paragraph on
constant folding in general and reference that from the CASE page.
How about the attached?
Yours,
Laurenz Albe
Attachments:
case-doc.patchapplication/octet-stream; name=case-doc.patchDownload
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
new file mode 100644
index 4c5af4b..8cf5385
*** a/doc/src/sgml/func.sgml
--- b/doc/src/sgml/func.sgml
*************** SELECT a,
*** 10563,10568 ****
--- 10563,10578 ----
SELECT ... WHERE CASE WHEN x <> 0 THEN y/x > 1.5 ELSE false END;
</programlisting>
</para>
+
+ <note>
+ <para>
+ As described in <xref linkend="xfunc-volatility">, functions and
+ operators marked <literal>IMMUTABLE</literal> can be evaluated when
+ the query is planned rather than when it is executed. This means
+ that constant parts of a subexpression that is not evaluated during
+ query execution might still be evaluated during query planning.
+ </para>
+ </note>
</sect2>
<sect2 id="functions-coalesce-nvl-ifnull">
On Sat, Jun 22, 2013 at 04:54:50PM +0200, Andres Freund wrote:
On 2013-06-21 16:45:28 +0200, Andres Freund wrote:
On 2013-06-21 09:51:05 -0400, Noah Misch wrote:
That being said, if we discover a simple-enough fix that performs well, we may
as well incorporate it.What about passing another parameter down eval_const_expressions_mutator
(which is static, so changing the API isn't a problem) that basically
tells us whether we actually should evaluate expressions or just perform
the transformation?
There's seems to be basically a couple of places where we call dangerous
things:
* simplify_function (via ->evaluate_function->evaluate_expr) which is
called in a bunch of places
* evaluate_expr which is directly called in T_ArrayExpr
T_ArrayCoerceExprAll places I've inspected so far need to deal with simplify_function
returning NULL anyway, so that seems like a viable fix.*Prototype* patch - that seems simple enough - attached. Opinions?
Simple enough, yes. The other point still stands.
--
Noah Misch
EnterpriseDB http://www.enterprisedb.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2013-06-24 21:35:53 -0400, Noah Misch wrote:
On Sat, Jun 22, 2013 at 04:54:50PM +0200, Andres Freund wrote:
On 2013-06-21 16:45:28 +0200, Andres Freund wrote:
On 2013-06-21 09:51:05 -0400, Noah Misch wrote:
That being said, if we discover a simple-enough fix that performs well, we may
as well incorporate it.What about passing another parameter down eval_const_expressions_mutator
(which is static, so changing the API isn't a problem) that basically
tells us whether we actually should evaluate expressions or just perform
the transformation?
There's seems to be basically a couple of places where we call dangerous
things:
* simplify_function (via ->evaluate_function->evaluate_expr) which is
called in a bunch of places
* evaluate_expr which is directly called in T_ArrayExpr
T_ArrayCoerceExprAll places I've inspected so far need to deal with simplify_function
returning NULL anyway, so that seems like a viable fix.*Prototype* patch - that seems simple enough - attached. Opinions?
Simple enough, yes. The other point still stands.
You mean performance? Primarily I still think we should first worry
about correctness first and then about performance. And CASE is the
documented (and really only, without writing procedual code) solution to
use for the cases where evaluation order actually *is* important.
But anyway, the question is to find realistic cases to measure the
performance of. Obviously you can just make arbitrarily expensive
expressions that can be computed full during constant folding. Which I
don't find very interesting, do you?
So, what I've done is to measure the performance difference when doing
full table queries of some CASE containing system views.
best of 5 everytime:
SELECT * FROM pg_stats;
master: 28.287 patched: 28.565
SELECT * FROM information_schema.applicable_roles;
master: 0.757 patched: 0.755
regression=# SELECT * FROM information_schema.attributes:
master: 8.392 patched: 8.555
SELECT * FROM information_schema.column_privileges;
master: 90.853 patched: 88.551
SELECT * FROM information_schema.columns;
master: 259.436 patched: 274.145
SELECT * FROM information_schema.constraint_column_usage ;
master: 14.736 patched 15.005
SELECT * FROM information_schema.parameters;
master: 76.173 patched: 79.850
SELECT * FROM information_schema.routines;
master: 45.102 patched: 46.517 ms
...
So, on those queries there's some difference (I've left out the ones
which are too short), but it's not big.
Now, for the other extreme, the following completely random query I just
typed out:
SELECT f FROM (SELECT (CASE g.i WHEN -1 THEN 0 WHEN 1 THEN 3.0/1 WHEN g.i THEN 2.0/3 END) f FROM generate_series(1, 1000000) g(i)) s WHERE f = 0;
master: 491.931 patched: 943.629
suffers way much worse because the division is so expensive...
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
2013/6/25 Andres Freund <andres@2ndquadrant.com>:
On 2013-06-24 21:35:53 -0400, Noah Misch wrote:
On Sat, Jun 22, 2013 at 04:54:50PM +0200, Andres Freund wrote:
On 2013-06-21 16:45:28 +0200, Andres Freund wrote:
On 2013-06-21 09:51:05 -0400, Noah Misch wrote:
That being said, if we discover a simple-enough fix that performs well, we may
as well incorporate it.What about passing another parameter down eval_const_expressions_mutator
(which is static, so changing the API isn't a problem) that basically
tells us whether we actually should evaluate expressions or just perform
the transformation?
There's seems to be basically a couple of places where we call dangerous
things:
* simplify_function (via ->evaluate_function->evaluate_expr) which is
called in a bunch of places
* evaluate_expr which is directly called in T_ArrayExpr
T_ArrayCoerceExprAll places I've inspected so far need to deal with simplify_function
returning NULL anyway, so that seems like a viable fix.*Prototype* patch - that seems simple enough - attached. Opinions?
Simple enough, yes. The other point still stands.
You mean performance? Primarily I still think we should first worry
about correctness first and then about performance. And CASE is the
documented (and really only, without writing procedual code) solution to
use for the cases where evaluation order actually *is* important.But anyway, the question is to find realistic cases to measure the
performance of. Obviously you can just make arbitrarily expensive
expressions that can be computed full during constant folding. Which I
don't find very interesting, do you?So, what I've done is to measure the performance difference when doing
full table queries of some CASE containing system views.best of 5 everytime:
SELECT * FROM pg_stats;
master: 28.287 patched: 28.565SELECT * FROM information_schema.applicable_roles;
master: 0.757 patched: 0.755regression=# SELECT * FROM information_schema.attributes:
master: 8.392 patched: 8.555SELECT * FROM information_schema.column_privileges;
master: 90.853 patched: 88.551SELECT * FROM information_schema.columns;
master: 259.436 patched: 274.145SELECT * FROM information_schema.constraint_column_usage ;
master: 14.736 patched 15.005SELECT * FROM information_schema.parameters;
master: 76.173 patched: 79.850SELECT * FROM information_schema.routines;
master: 45.102 patched: 46.517 ms...
So, on those queries there's some difference (I've left out the ones
which are too short), but it's not big.Now, for the other extreme, the following completely random query I just
typed out:
SELECT f FROM (SELECT (CASE g.i WHEN -1 THEN 0 WHEN 1 THEN 3.0/1 WHEN g.i THEN 2.0/3 END) f FROM generate_series(1, 1000000) g(i)) s WHERE f = 0;
master: 491.931 patched: 943.629suffers way much worse because the division is so expensive...
:-(
it is too high price
Pavel
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, Jun 25, 2013 at 03:01:52PM +0200, Andres Freund wrote:
On 2013-06-24 21:35:53 -0400, Noah Misch wrote:
Simple enough, yes. The other point still stands.
You mean performance? Primarily I still think we should first worry
about correctness first and then about performance. And CASE is the
documented (and really only, without writing procedual code) solution to
use for the cases where evaluation order actually *is* important.
I largely share that sentiment, but it's tempered here by the incorrect
behavior's long tenure, the difficulty of encountering a problem without
constructing a test case for the purpose of doing so, the availability of
workarounds, and the open-ended negative performance implications of your
proposed correction.
But anyway, the question is to find realistic cases to measure the
performance of. Obviously you can just make arbitrarily expensive
expressions that can be computed full during constant folding. Which I
don't find very interesting, do you?
Now, for the other extreme, the following completely random query I just
typed out:
SELECT f FROM (SELECT (CASE g.i WHEN -1 THEN 0 WHEN 1 THEN 3.0/1 WHEN g.i THEN 2.0/3 END) f FROM generate_series(1, 1000000) g(i)) s WHERE f = 0;
master: 491.931 patched: 943.629suffers way much worse because the division is so expensive...
That's a clear indicator for this strategy being a dead end. It's not far
from that to a realistic use case; e.g. log(10,2)/g.i or g.i*(2.0/3).
I'm still interested in your answer to my first question here:
/messages/by-id/20130621150554.GC740984@tornado.leadboat.com
nm
--
Noah Misch
EnterpriseDB http://www.enterprisedb.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Hi Noah,
On 2013-06-25 19:05:15 -0400, Noah Misch wrote:
On Tue, Jun 25, 2013 at 03:01:52PM +0200, Andres Freund wrote:
On 2013-06-24 21:35:53 -0400, Noah Misch wrote:
Simple enough, yes. The other point still stands.
You mean performance? Primarily I still think we should first worry
about correctness first and then about performance. And CASE is the
documented (and really only, without writing procedual code) solution to
use for the cases where evaluation order actually *is* important.I largely share that sentiment, but it's tempered here by the incorrect
behavior's long tenure, the difficulty of encountering a problem without
constructing a test case for the purpose of doing so, the availability of
workarounds, and the open-ended negative performance implications of your
proposed correction.
The workaround being that you need to know which version of postgres can
inline/evaluate which parts of a query?
But anyway, the question is to find realistic cases to measure the
performance of. Obviously you can just make arbitrarily expensive
expressions that can be computed full during constant folding. Which I
don't find very interesting, do you?Now, for the other extreme, the following completely random query I just
typed out:
SELECT f FROM (SELECT (CASE g.i WHEN -1 THEN 0 WHEN 1 THEN 3.0/1 WHEN g.i THEN 2.0/3 END) f FROM generate_series(1, 1000000) g(i)) s WHERE f = 0;
master: 491.931 patched: 943.629suffers way much worse because the division is so expensive...
That's a clear indicator for this strategy being a dead end. It's not far
from that to a realistic use case; e.g. log(10,2)/g.i or g.i*(2.0/3).
Sure. But in most realistic scenarios the actual overhead of the query
will be larger - the above example is that expensive because basically
nothing but those computation are happening.
I'm still interested in your answer to my first question here:
/messages/by-id/20130621150554.GC740984@tornado.leadboat.com
I guess you're referring to:
Would it work to run eval_const_expressions() lazily on THEN clauses? The
plan-time eval_const_expressions() would not descend to them. The first
ExecEvalCase() to use a given THEN clause would run eval_const_expressions()
before proceeding.Ugh. Doesn't sound nice.
Would you elaborate?
Several things:
1) As you reminded me of, other parts of the planner rely on a good part
of the transformations in eval_const_expressions() having already been
performed. So we cannot simply not descend there.
So what we would have to do is to apply something akin to the proposed
and then remember for various nodes (at least the individual CaseWhen's
expr and result, potentially CaseTestExpr?) whether we already did
the full constant folding.
2) modifying the expression tree during execution seems unclean and a
layering violation and requires doing type lookups and all that during
runtime. That's rather subjective, I agree.
3) lists are cool and I don't remember the third thing I had in mind.
But I guess given the objections on performance the combined approach is
the way to go?
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Andres Freund <andres@2ndquadrant.com> writes:
But I guess given the objections on performance the combined approach is
the way to go?
I think the documentation approach is the way to go.
It was pointed out in the pgsql-general thread about this that a naive
user might expect that, say, syntax or datatype violations in a CASE arm
would not get raised if the CASE doesn't select that arm. We, who know
that all such errors are thrown in the parser, might say that's silly
--- but why? I think it's not that far from acknowledging that some
errors will be thrown pre-execution to acknowledging that errors
produced by constant-folding can be thrown pre-execution, even if
they're inside a CASE. Where is the bright line that says we can
complain about 42+'bogus' in that context but not about 1/0?
If there were realistic use-cases for this sort of thing then I'd have
more sympathy for contorting the planner code to support them, but I'm
not seeing that. The examples shown so far are all pretty artificial,
unless your intent is to throw an error when the CASE fails; and in that
case why not do it with a volatile function containing a RAISE? Not
only will the planner handle that as you want, but it lets you report
an actually-useful message.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, Jun 24, 2013 at 01:41:51PM +0000, Albe Laurenz wrote:
Noah Misch wrote:
If fixing the behaviour is undesirable, at least the documentation
should be fixed.A brief documentation mention sounds fine. Perhaps add a paragraph on
constant folding in general and reference that from the CASE page.How about the attached?
Works for me; committed. Thanks.
--
Noah Misch
EnterpriseDB http://www.enterprisedb.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers