performance issue: logical operators are slow inside SQL function: missing optimalization?
Hello
I testing a simple use case and I was surprised with very slow
execution of SQL functions
create or replace function empty_string1(text)
returns bool as $$
select $1 is NULL or $1 = ''
$$ language sql;
postgres=# select count(empty_string1(CASE WHEN random() < 0.5 THEN
NULL ELSE 'x' END)) FROM generate_series(1,100000);
count
--------
100000
(1 row)
Time: 448.616 ms
little bit updated function is much faster
create or replace function empty_string2(text)
returns bool as $$
select coalesce($1,'') = ''
$$ language sql;
postgres=# select count(empty_string2(CASE WHEN random() < 0.5 THEN
NULL ELSE 'x' END)) FROM generate_series(1,100000);
count
--------
100000
(1 row)
Time: 64.437 ms
just null test function is fast too (or just empty str function)
postgres=# create or replace function empty_string1(text) returns bool
as $$select $1 is NULL $$ language sql;
CREATE FUNCTION
Time: 21.929 ms
postgres=# select count(empty_string1(CASE WHEN random() < 0.5 THEN
NULL ELSE 'x' END)) FROM generate_series(1,100000);
count
--------
100000
(1 row)
Time: 48.554 ms
Is strange - so slow function can be replaced by plpgsql function and
it's faster
postgres=# create or replace function empty_string1(text) returns bool
as $$begin return $1 is null or $1 = ''; end$$ language plpgsql
immutable;
CREATE FUNCTION
Time: 70.359 ms
postgres=# select count(empty_string1(CASE WHEN random() < 0.5 THEN
NULL ELSE 'x' END)) FROM generate_series(1,100000);
count
--------
100000
(1 row)
Time: 220.131 ms
Tested on 9.1 without assertions
Regards
Pavel Stehule
On 29 Aug 2010, at 13:20, Pavel Stehule <pavel.stehule@gmail.com> wrote:
Is strange - so slow function can be replaced by plpgsql function and
it's faster
All your SQL language functions were VOLATILE.
Regards,
Marko Tiikkaja
Hello
2010/8/29 Marko Tiikkaja <marko.tiikkaja@cs.helsinki.fi>:
On 29 Aug 2010, at 13:20, Pavel Stehule <pavel.stehule@gmail.com> wrote:
Is strange - so slow function can be replaced by plpgsql function and
it's fasterAll your SQL language functions were VOLATILE.
It's not a problem - planner see inside SQL function - so you don't
need set a flags.
Regards
Pavel Stehule
Show quoted text
Regards,
Marko Tiikkaja
Pavel Stehule <pavel.stehule@gmail.com> writes:
I testing a simple use case and I was surprised with very slow
execution of SQL functions
The one case is inline-able and the other not (because it would result
in double evaluation of the volatile function random()).
See EXPLAIN VERBOSE.
regards, tom lane
2010/8/29 Tom Lane <tgl@sss.pgh.pa.us>:
Pavel Stehule <pavel.stehule@gmail.com> writes:
I testing a simple use case and I was surprised with very slow
execution of SQL functionsThe one case is inline-able and the other not (because it would result
in double evaluation of the volatile function random()).
See EXPLAIN VERBOSE.
I understand now. So it means general advice - don't use a boolean
operators in SQL function? This issue should be documented somewhere?
Regards
Pavel Stehule
Show quoted text
regards, tom lane
Pavel Stehule <pavel.stehule@gmail.com> writes:
2010/8/29 Tom Lane <tgl@sss.pgh.pa.us>:
The one case is inline-able and the other not (because it would result
in double evaluation of the volatile function random()).
See EXPLAIN VERBOSE.
I understand now. So it means general advice - don't use a boolean
operators in SQL function? This issue should be documented somewhere?
It has nothing to do with boolean operators, just double evaluation.
regards, tom lane
2010/8/29 Tom Lane <tgl@sss.pgh.pa.us>:
Pavel Stehule <pavel.stehule@gmail.com> writes:
2010/8/29 Tom Lane <tgl@sss.pgh.pa.us>:
The one case is inline-able and the other not (because it would result
in double evaluation of the volatile function random()).
See EXPLAIN VERBOSE.I understand now. So it means general advice - don't use a boolean
operators in SQL function? This issue should be documented somewhere?It has nothing to do with boolean operators, just double evaluation.
sure. I was blind. I have a question. It is possible do following
optimalisation?
I can write a function
CREATE OR REPLACE FUNCTION estring(text)
RETURNS bool AS $$
SELECT x IS NULL || x = ''
FROM (VALUES($1)) g(x)
$$ LANGUAGE sql;
Now this function isn't inlined, because optimaliser doesn't know a
VALUES clause. But with this knowleade, this can be a protection
before double evaluation. Or different way - generate_subplan with
parameters - it is still faster, than plpgsql or not inlined sql.
p.s. this query is badly planed
postgres=# select sum((select x is null or x = '' from (values(CASE
WHEN random() < 0.5 THEN NULL ELSE 'x' END)) g(x) )::int) FROM
generate_series(1,100000);
sum
--------
100000
(1 row)
for corect behave a had to append a second variable
postgres=# select sum((select x is null or x = '' and i = i from
(values(CASE WHEN random() < 0.5 THEN NULL ELSE 'x' END)) g(x) )::int)
FROM generate_series(1,100000) x(i);
sum
-------
50036
(1 row)
Regards
Pavel Stehule
Show quoted text
regards, tom lane
On Sun, Aug 29, 2010 at 11:23:29AM -0400, Tom Lane wrote:
Pavel Stehule <pavel.stehule@gmail.com> writes:
I understand now. So it means general advice - don't use a boolean
operators in SQL function? This issue should be documented somewhere?It has nothing to do with boolean operators, just double evaluation.
I was wondering, wouldn't it be possible to avoid the double evaluation
by simply creating an extra slot for the intermediate value. So you
get:
$1 = CASE WHEN random() < 0.5 THEN NULL ELSE 'x' END
$2 = $1 IS NULL or $1 = ''
Sort of the way WITH works, but then for parts of expressions.
I don't believe currently expressions can refer to Vars at the same
level (it would make projections somewhat messy) but if you could
fix that you could avoid the double evaluation and still have decent
performance, right?
Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
Show quoted text
Patriotism is when love of your own people comes first; nationalism,
when hate for people other than your own comes first.
- Charles de Gaulle
Martijn van Oosterhout <kleptog@svana.org> writes:
On Sun, Aug 29, 2010 at 11:23:29AM -0400, Tom Lane wrote:
It has nothing to do with boolean operators, just double evaluation.
I was wondering, wouldn't it be possible to avoid the double evaluation
by simply creating an extra slot for the intermediate value.
Possibly, but the trick would be to figure out when to evaluate the
values so that it would still behave the same as without inlining.
I don't think the existing Param mechanism could do this without
some additional help.
regards, tom lane
2010/8/29 Tom Lane <tgl@sss.pgh.pa.us>:
Martijn van Oosterhout <kleptog@svana.org> writes:
On Sun, Aug 29, 2010 at 11:23:29AM -0400, Tom Lane wrote:
It has nothing to do with boolean operators, just double evaluation.
I was wondering, wouldn't it be possible to avoid the double evaluation
by simply creating an extra slot for the intermediate value.Possibly, but the trick would be to figure out when to evaluate the
values so that it would still behave the same as without inlining.
I don't think the existing Param mechanism could do this without
some additional help.
maybe subject for ToDo?
Regards
Pavel Stehule
Show quoted text
regards, tom lane