performance issue: logical operators are slow inside SQL function: missing optimalization?

Started by Pavel Stehuleover 15 years ago10 messages

pavel.stehule@gmail.com

over 15 years ago

Hello

I testing a simple use case and I was surprised with very slow
execution of SQL functions

create or replace function empty_string1(text)
returns bool as $$
select $1 is NULL or $1 = ''
$$ language sql;

postgres=# select count(empty_string1(CASE WHEN random() < 0.5 THEN
NULL ELSE 'x' END)) FROM generate_series(1,100000);
count
--------
100000
(1 row)

Time: 448.616 ms

little bit updated function is much faster

create or replace function empty_string2(text)
returns bool as $$
select coalesce($1,'') = ''
$$ language sql;

postgres=# select count(empty_string2(CASE WHEN random() < 0.5 THEN
NULL ELSE 'x' END)) FROM generate_series(1,100000);
count
--------
100000
(1 row)

Time: 64.437 ms

just null test function is fast too (or just empty str function)

postgres=# create or replace function empty_string1(text) returns bool
as $$select $1 is NULL $$ language sql;
CREATE FUNCTION
Time: 21.929 ms
postgres=# select count(empty_string1(CASE WHEN random() < 0.5 THEN
NULL ELSE 'x' END)) FROM generate_series(1,100000);
count
--------
100000
(1 row)

Time: 48.554 ms

Is strange - so slow function can be replaced by plpgsql function and
it's faster

postgres=# create or replace function empty_string1(text) returns bool
as $$begin return $1 is null or $1 = ''; end$$ language plpgsql
immutable;
CREATE FUNCTION
Time: 70.359 ms
postgres=# select count(empty_string1(CASE WHEN random() < 0.5 THEN
NULL ELSE 'x' END)) FROM generate_series(1,100000);
count
--------
100000
(1 row)

Time: 220.131 ms

Tested on 9.1 without assertions

Regards

Pavel Stehule

marko.tiikkaja@cs.helsinki.fi

over 15 years ago

In reply to: Pavel Stehule (#1)

Re: performance issue: logical operators are slow inside SQL function: missing optimalization?

On 29 Aug 2010, at 13:20, Pavel Stehule <pavel.stehule@gmail.com> wrote:

Is strange - so slow function can be replaced by plpgsql function and
it's faster

All your SQL language functions were VOLATILE.

Regards,
Marko Tiikkaja

pavel.stehule@gmail.com

over 15 years ago

In reply to: Marko Tiikkaja (#2)

Re: performance issue: logical operators are slow inside SQL function: missing optimalization?

Hello

2010/8/29 Marko Tiikkaja <marko.tiikkaja@cs.helsinki.fi>:

On 29 Aug 2010, at 13:20, Pavel Stehule <pavel.stehule@gmail.com> wrote:

Is strange - so slow function can be replaced by plpgsql function and
it's faster

All your SQL language functions were VOLATILE.

It's not a problem - planner see inside SQL function - so you don't
need set a flags.

Regards

Pavel Stehule

Show quoted text

Regards,
Marko Tiikkaja

tgl@sss.pgh.pa.us

over 15 years ago

In reply to: Pavel Stehule (#1)

Re: performance issue: logical operators are slow inside SQL function: missing optimalization?

Pavel Stehule <pavel.stehule@gmail.com> writes:

I testing a simple use case and I was surprised with very slow
execution of SQL functions

The one case is inline-able and the other not (because it would result
in double evaluation of the volatile function random()).
See EXPLAIN VERBOSE.

regards, tom lane

pavel.stehule@gmail.com

over 15 years ago

In reply to: Tom Lane (#4)

Re: performance issue: logical operators are slow inside SQL function: missing optimalization?

2010/8/29 Tom Lane <tgl@sss.pgh.pa.us>:

Pavel Stehule <pavel.stehule@gmail.com> writes:

I testing a simple use case and I was surprised with very slow
execution of SQL functions

The one case is inline-able and the other not (because it would result
in double evaluation of the volatile function random()).
See EXPLAIN VERBOSE.

I understand now. So it means general advice - don't use a boolean
operators in SQL function? This issue should be documented somewhere?

Regards

Pavel Stehule

Show quoted text

regards, tom lane

tgl@sss.pgh.pa.us

over 15 years ago

In reply to: Pavel Stehule (#5)

Re: performance issue: logical operators are slow inside SQL function: missing optimalization?

Pavel Stehule <pavel.stehule@gmail.com> writes:

2010/8/29 Tom Lane <tgl@sss.pgh.pa.us>:

The one case is inline-able and the other not (because it would result
in double evaluation of the volatile function random()).
See EXPLAIN VERBOSE.

I understand now. So it means general advice - don't use a boolean
operators in SQL function? This issue should be documented somewhere?

It has nothing to do with boolean operators, just double evaluation.

regards, tom lane

pavel.stehule@gmail.com

over 15 years ago

In reply to: Tom Lane (#6)

Re: performance issue: logical operators are slow inside SQL function: missing optimalization?

2010/8/29 Tom Lane <tgl@sss.pgh.pa.us>:

Pavel Stehule <pavel.stehule@gmail.com> writes:

2010/8/29 Tom Lane <tgl@sss.pgh.pa.us>:

The one case is inline-able and the other not (because it would result
in double evaluation of the volatile function random()).
See EXPLAIN VERBOSE.

I understand now. So it means general advice - don't use a boolean
operators in SQL function? This issue should be documented somewhere?

It has nothing to do with boolean operators, just double evaluation.

sure. I was blind. I have a question. It is possible do following
optimalisation?

I can write a function

CREATE OR REPLACE FUNCTION estring(text)
RETURNS bool AS $$
SELECT x IS NULL || x = ''
FROM (VALUES($1)) g(x)
$$ LANGUAGE sql;

Now this function isn't inlined, because optimaliser doesn't know a
VALUES clause. But with this knowleade, this can be a protection
before double evaluation. Or different way - generate_subplan with
parameters - it is still faster, than plpgsql or not inlined sql.

p.s. this query is badly planed

postgres=# select sum((select x is null or x = '' from (values(CASE
WHEN random() < 0.5 THEN NULL ELSE 'x' END)) g(x) )::int) FROM
generate_series(1,100000);
sum
--------
100000
(1 row)

for corect behave a had to append a second variable
postgres=# select sum((select x is null or x = '' and i = i from
(values(CASE WHEN random() < 0.5 THEN NULL ELSE 'x' END)) g(x) )::int)
FROM generate_series(1,100000) x(i);
sum
-------
50036
(1 row)

Regards

Pavel Stehule

Show quoted text

regards, tom lane

Martijn van Oosterhout

kleptog@svana.org

over 15 years ago

In reply to: Tom Lane (#6)

Re: performance issue: logical operators are slow inside SQL function: missing optimalization?

On Sun, Aug 29, 2010 at 11:23:29AM -0400, Tom Lane wrote:

Pavel Stehule <pavel.stehule@gmail.com> writes:

I understand now. So it means general advice - don't use a boolean
operators in SQL function? This issue should be documented somewhere?

It has nothing to do with boolean operators, just double evaluation.

I was wondering, wouldn't it be possible to avoid the double evaluation
by simply creating an extra slot for the intermediate value. So you
get:

$1 = CASE WHEN random() < 0.5 THEN NULL ELSE 'x' END
$2 = $1 IS NULL or $1 = ''

Sort of the way WITH works, but then for parts of expressions.

I don't believe currently expressions can refer to Vars at the same
level (it would make projections somewhat messy) but if you could
fix that you could avoid the double evaluation and still have decent
performance, right?

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

Show quoted text

Patriotism is when love of your own people comes first; nationalism,
when hate for people other than your own comes first.
- Charles de Gaulle

tgl@sss.pgh.pa.us

over 15 years ago

In reply to: Martijn van Oosterhout (#8)

Re: performance issue: logical operators are slow inside SQL function: missing optimalization?

Martijn van Oosterhout <kleptog@svana.org> writes:

On Sun, Aug 29, 2010 at 11:23:29AM -0400, Tom Lane wrote:

It has nothing to do with boolean operators, just double evaluation.

I was wondering, wouldn't it be possible to avoid the double evaluation
by simply creating an extra slot for the intermediate value.

Possibly, but the trick would be to figure out when to evaluate the
values so that it would still behave the same as without inlining.
I don't think the existing Param mechanism could do this without
some additional help.

regards, tom lane

pavel.stehule@gmail.com

over 15 years ago

In reply to: Tom Lane (#9)

Re: performance issue: logical operators are slow inside SQL function: missing optimalization?

2010/8/29 Tom Lane <tgl@sss.pgh.pa.us>:

Martijn van Oosterhout <kleptog@svana.org> writes:

On Sun, Aug 29, 2010 at 11:23:29AM -0400, Tom Lane wrote:

It has nothing to do with boolean operators, just double evaluation.

I was wondering, wouldn't it be possible to avoid the double evaluation
by simply creating an extra slot for the intermediate value.

Possibly, but the trick would be to figure out when to evaluate the
values so that it would still behave the same as without inlining.
I don't think the existing Param mechanism could do this without
some additional help.

maybe subject for ToDo?

Regards

Pavel Stehule

Show quoted text

regards, tom lane