Difference between speed of 2 functions: SQL+STABLE and PLPGSQL+EXECUTE

Started by Dmitry Koterovalmost 14 years ago4 messagesgeneral

dmitry@koterov.ru

almost 14 years ago

Hello.

For example, I have 2 functions like these:

CREATE OR REPLACE FUNCTION first(a INTEGER, b INTEGER, ...) RETURNS ... AS
$body$
...any SQL which uses $1, $2 etc. arguments, plus LIMIT $3...
$body$
LANGUAGE 'sql'
*STABLE*

and

CREATE OR REPLACE FUNCTION second(a INTEGER, b INTEGER, ...) RETURNS ... AS
$body$
DECLARE
res ...;
BEGIN
EXECUTE '...the same SELECT, ' ||
'but ' || quote_literal(a) || ' args are embedded, plus ' ||
'LIMIT ' || quote_literal($3)
INTO res;
RETURN res;
END;
$body$
LANGUAGE 'plpgsql'
*STABLE*

And then I call

EXPLAIN ANALYZE SELECT * FROM first(...);
EXPLAIN ANALYZE SELECT * FROM second(...);

Should these two queries be executed by the same time usage (i.e. does
PostgreSQL generate same plans for inner queries)?

I always thought that the answer is YES: if a function is STABLE and with
language=SQL, its SQL code is embedded into outer context after all
arguments are expanded into their values (so the plan is built after
argument expansion). But some days ago I detected a case when second()
works about 100 times faster than first(), and the cause is seems that the
planner does not see all of expanded arguments in first() (if I replace
arguments to constants in first(), especially in LIMIT clause, it begins to
work the same speed as second() does). Unfortunately EXPLAIN ANALYZE does
not go into functions and shows only overall time, so I have no real
information about what plan is actually used in first().

Could you please comment this case a bit?..

Laurenz Albe

laurenz.albe@cybertec.at

almost 14 years ago

In reply to: Dmitry Koterov (#1)

Re: Difference between speed of 2 functions: SQL+STABLE and PLPGSQL+EXECUTE

Dmitry Koterov wrote:

For example, I have 2 functions like these:

CREATE OR REPLACE FUNCTION first(a INTEGER, b INTEGER, ...) RETURNS

... AS

$body$
...any SQL which uses $1, $2 etc. arguments, plus LIMIT $3...
$body$
LANGUAGE 'sql'
STABLE

and

CREATE OR REPLACE FUNCTION second(a INTEGER, b INTEGER, ...) RETURNS

... AS

$body$
DECLARE
res ...;
BEGIN
EXECUTE '...the same SELECT, ' ||
'but ' || quote_literal(a) || ' args are embedded, plus ' ||
'LIMIT ' || quote_literal($3)
INTO res;
RETURN res;
END;
$body$
LANGUAGE 'plpgsql'
STABLE

And then I call

EXPLAIN ANALYZE SELECT * FROM first(...);
EXPLAIN ANALYZE SELECT * FROM second(...);

Should these two queries be executed by the same time usage (i.e. does

PostgreSQL generate same plans

for inner queries)?

I always thought that the answer is YES: if a function is STABLE and

with language=SQL, its SQL code

is embedded into outer context after all arguments are expanded into

their values (so the plan is

built after argument expansion). But some days ago I detected a case

when second() works about 100

times faster than first(), and the cause is seems that the planner

does not see all of expanded

arguments in first() (if I replace arguments to constants in first(),

especially in LIMIT clause, it

begins to work the same speed as second() does). Unfortunately EXPLAIN

ANALYZE does not go into

functions and shows only overall time, so I have no real information

about what plan is actually used

in first().

You can get EXPLAIN plans if you use the auto_explain contrib module
with auto_explain.log_nested_statements enabled.

As you suspect, the two functions work differently.
The SQL function will plan a parameterized statement (with $1 etc. in
it)
and execute that statement whenever it is called, while the PL/pgSQL
function will execute an SQL statement with all the constant literals
in it that gets planned and executed when you call the function.

The SQL function will create a statement that cannot benefit from
optimizations that work only for certain constant values (although
there will be improvements in 9.2 for that). On the down side,
the PL/pgSQL function will have to plan the query every time it is
executed, which does not come for free.

To illustrate that, an example:

I create a table "test" as follows:

CREATE TABLE test(id integer PRIMARY KEY, val text NOT NULL);
CREATE INDEX test_val_ind ON test(val);

Then I fill it with 1000 rows, 11 of which have val='test'
and ANALYZE the table.

CREATE OR REPLACE FUNCTION first(text, integer) RETURNS integer
STABLE STRICT LANGUAGE sql AS
'SELECT id FROM test WHERE val=$1 LIMIT $2';

CREATE OR REPLACE FUNCTION second(IN v text, IN l integer, OUT r
integer)
STABLE STRICT LANGUAGE plpgsql AS
$$BEGIN
EXECUTE 'SELECT id FROM test WHERE val=' || quote_literal(v)
|| ' LIMIT ' || CAST(l AS integer) INTO r;
END$$;

Then I execute "SELECT first('test', 1)" and "SELECT SECOND('test', 1)".
The respective plans are:

For the SQL function:

Query Text: SELECT id FROM test WHERE val=$1 LIMIT $2
Limit (cost=0.00..1.75 rows=50 width=4)
-> Seq Scan on test (cost=0.00..17.50 rows=500 width=4)
Filter: (val = $1)

For the PL/pgSQL function:

Query Text: SELECT id FROM test WHERE val='test' LIMIT 1
Limit (cost=0.00..0.95 rows=1 width=4)
-> Index Scan using test_val_ind on test (cost=0.00..10.46 rows=11
width=4)
Index Cond: (val = 'test'::text)

Yours,
Laurenz Albe

Dmitry Koterov

dmitry@koterov.ru

almost 14 years ago

In reply to: Laurenz Albe (#2)

Re: Difference between speed of 2 functions: SQL+STABLE and PLPGSQL+EXECUTE

Albe, thanks for detailed explaination.

But it's quite strange that SQL+STABLE function does not recalculate the
plan each time it is called. Because when I use a bunch of SQL+STABLE
functions in e.g. a sub-select of a complex query, I see in the plan of
this complex queries that function calls are "expanded". It looks like
PostgreSQL uses SQL code defined in SQL+STABLE functions and merges
(injects) it into the main query instead of the function call.

E.g.:

==========================================================================
CREATE TABLE a(i INTEGER, t VARCHAR(5));
INSERT INTO a(i, t) SELECT s, '' FROM generate_series(1, 10000) s;
INSERT INTO a(i, t) SELECT s, 't' FROM generate_series(10001, 10002) s;
CREATE INDEX "a_i_idx" ON "public"."a" USING btree ("i");
CREATE INDEX "a_t_idx" ON "public"."a" USING btree ("t");
ANALYZE a;

CREATE OR REPLACE FUNCTION a_get_t(in_a a) RETURNS TEXT STABLE LANGUAGE sql
AS 'SELECT $1.t';

explain analyze
select * from a
where a_get_t(a) = 't';

QUERY PLAN
Index Scan using a_t_idx on a (cost=0.00..8.29 rows=2 width=5) (actual
time=0.041..0.043 rows=2 loops=1)
Index Cond: ((t)::text = 't'::text)
==========================================================================

You may see that a_get_t() SQL code was merged into the main query plan, so
the result is found without a seqscan with t='t' filtering, but the index
is used.

That was a very simple example, I use much more complex SQL+STABLE
functions in my code and I am practically sure that this SQL
extraction+injection is applied by PostgreSQL in other cases too (e.g.
sometimes planner initiates a hash join with tables which are referred
inside SQL+STABLE functions in sub-queries). If I replace STABLE with
VOLATILE in that complex cases, the effect disappears: no more SQL
extraction performed.

And more: assume we have a function f(x, y, z) VOLATILE with very complex
SQL inside and we call it like:

select * from f(1, 2, 3);

The query may took a long time (during not only the first call, but during
all others too). Then I just replace VOLATILE to STABLE for f(), and the
same query:

select * from f(1, 2, 3);

suddenly becomes very fast (e.g. 1000 times faster or even more). It's a
very common case: I've performed many times. I thought that it was because
of re-planning of STABLE functions on each call according to real passed
values...

If STABLE functions has frozen plans too (independent to its real passed
arguments values), how could we explain so much difference in performance
replacing VOLATILE to STABLE?

On Fri, Apr 27, 2012 at 2:34 PM, Albe Laurenz <laurenz.albe@wien.gv.at>wrote:

Show quoted text

Dmitry Koterov wrote:

For example, I have 2 functions like these:

CREATE OR REPLACE FUNCTION first(a INTEGER, b INTEGER, ...) RETURNS

... AS

$body$
...any SQL which uses $1, $2 etc. arguments, plus LIMIT $3...
$body$
LANGUAGE 'sql'
STABLE

and

CREATE OR REPLACE FUNCTION second(a INTEGER, b INTEGER, ...) RETURNS

... AS

$body$
DECLARE
res ...;
BEGIN
EXECUTE '...the same SELECT, ' ||
'but ' || quote_literal(a) || ' args are embedded, plus ' ||
'LIMIT ' || quote_literal($3)
INTO res;
RETURN res;
END;
$body$
LANGUAGE 'plpgsql'
STABLE

And then I call

EXPLAIN ANALYZE SELECT * FROM first(...);
EXPLAIN ANALYZE SELECT * FROM second(...);

Should these two queries be executed by the same time usage (i.e. does

PostgreSQL generate same plans

for inner queries)?

I always thought that the answer is YES: if a function is STABLE and

with language=SQL, its SQL code

is embedded into outer context after all arguments are expanded into

their values (so the plan is

built after argument expansion). But some days ago I detected a case

when second() works about 100

times faster than first(), and the cause is seems that the planner

does not see all of expanded

arguments in first() (if I replace arguments to constants in first(),

especially in LIMIT clause, it

begins to work the same speed as second() does). Unfortunately EXPLAIN

ANALYZE does not go into

functions and shows only overall time, so I have no real information

about what plan is actually used

in first().

You can get EXPLAIN plans if you use the auto_explain contrib module
with auto_explain.log_nested_statements enabled.

As you suspect, the two functions work differently.
The SQL function will plan a parameterized statement (with $1 etc. in
it)
and execute that statement whenever it is called, while the PL/pgSQL
function will execute an SQL statement with all the constant literals
in it that gets planned and executed when you call the function.

The SQL function will create a statement that cannot benefit from
optimizations that work only for certain constant values (although
there will be improvements in 9.2 for that). On the down side,
the PL/pgSQL function will have to plan the query every time it is
executed, which does not come for free.

To illustrate that, an example:

I create a table "test" as follows:

CREATE TABLE test(id integer PRIMARY KEY, val text NOT NULL);
CREATE INDEX test_val_ind ON test(val);

Then I fill it with 1000 rows, 11 of which have val='test'
and ANALYZE the table.

CREATE OR REPLACE FUNCTION first(text, integer) RETURNS integer
STABLE STRICT LANGUAGE sql AS
'SELECT id FROM test WHERE val=$1 LIMIT $2';

CREATE OR REPLACE FUNCTION second(IN v text, IN l integer, OUT r
integer)
STABLE STRICT LANGUAGE plpgsql AS
$$BEGIN
EXECUTE 'SELECT id FROM test WHERE val=' || quote_literal(v)
|| ' LIMIT ' || CAST(l AS integer) INTO r;
END$$;

Then I execute "SELECT first('test', 1)" and "SELECT SECOND('test', 1)".
The respective plans are:

For the SQL function:

Query Text: SELECT id FROM test WHERE val=$1 LIMIT $2
Limit (cost=0.00..1.75 rows=50 width=4)
-> Seq Scan on test (cost=0.00..17.50 rows=500 width=4)
Filter: (val = $1)

For the PL/pgSQL function:

Query Text: SELECT id FROM test WHERE val='test' LIMIT 1
Limit (cost=0.00..0.95 rows=1 width=4)
-> Index Scan using test_val_ind on test (cost=0.00..10.46 rows=11
width=4)
Index Cond: (val = 'test'::text)

Yours,
Laurenz Albe

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Laurenz Albe

laurenz.albe@cybertec.at

almost 14 years ago

In reply to: Dmitry Koterov (#3)

Re: Difference between speed of 2 functions: SQL+STABLE and PLPGSQL+EXECUTE

Dmitry Koterov wrote:

But it's quite strange that SQL+STABLE function does not recalculate

the plan each time it is called.

Because when I use a bunch of SQL+STABLE functions in e.g. a

sub-select of a complex query, I see in

the plan of this complex queries that function calls are "expanded".

It looks like PostgreSQL uses SQL

code defined in SQL+STABLE functions and merges (injects) it into the

main query instead of the

function call.

E.g.:

========================================================================
==

CREATE TABLE a(i INTEGER, t VARCHAR(5));
INSERT INTO a(i, t) SELECT s, '' FROM generate_series(1, 10000) s;
INSERT INTO a(i, t) SELECT s, 't' FROM generate_series(10001, 10002)

CREATE INDEX "a_i_idx" ON "public"."a" USING btree ("i");
CREATE INDEX "a_t_idx" ON "public"."a" USING btree ("t");
ANALYZE a;

CREATE OR REPLACE FUNCTION a_get_t(in_a a) RETURNS TEXT STABLE

LANGUAGE sql

AS 'SELECT $1.t';

explain analyze
select * from a
where a_get_t(a) = 't';

QUERY PLAN
Index Scan using a_t_idx on a (cost=0.00..8.29 rows=2 width=5)

(actual time=0.041..0.043 rows=2

loops=1)
Index Cond: ((t)::text = 't'::text)

========================================================================
==

You may see that a_get_t() SQL code was merged into the main query

plan, so the result is found

without a seqscan with t='t' filtering, but the index is used.

That was a very simple example, I use much more complex SQL+STABLE

functions in my code and I am

practically sure that this SQL extraction+injection is applied by

PostgreSQL in other cases too (e.g.

sometimes planner initiates a hash join with tables which are referred

inside SQL+STABLE functions in

sub-queries). If I replace STABLE with VOLATILE in that complex cases,

the effect disappears: no more

SQL extraction performed.

And more: assume we have a function f(x, y, z) VOLATILE with very

complex SQL inside and we call it

like:

select * from f(1, 2, 3);

The query may took a long time (during not only the first call, but

during all others too). Then I

just replace VOLATILE to STABLE for f(), and the same query:

select * from f(1, 2, 3);

suddenly becomes very fast (e.g. 1000 times faster or even more). It's

a very common case: I've

performed many times. I thought that it was because of re-planning of

STABLE functions on each call

according to real passed values...

If STABLE functions has frozen plans too (independent to its real

passed arguments values), how could

we explain so much difference in performance replacing VOLATILE to

STABLE?

Simple SQL functions can get "inlined" in queries by the planner,
that's what you see in your first example.

This is IMO unrelated to the problem you describe where a complicated
STABLE SQL function performs much better than the same function declared
VOLATILE.

To understand that difference, one would have to look at the function
definition and EXPLAIN output for both cases, possibly using
auto_analyze.

Yours,
Laurenz Albe