Difference between speed of 2 functions: SQL+STABLE and PLPGSQL+EXECUTE
Hello.
For example, I have 2 functions like these:
CREATE OR REPLACE FUNCTION first(a INTEGER, b INTEGER, ...) RETURNS ... AS
$body$
...any SQL which uses $1, $2 etc. arguments, plus LIMIT $3...
$body$
LANGUAGE 'sql'
*STABLE*
and
CREATE OR REPLACE FUNCTION second(a INTEGER, b INTEGER, ...) RETURNS ... AS
$body$
DECLARE
res ...;
BEGIN
EXECUTE '...the same SELECT, ' ||
'but ' || quote_literal(a) || ' args are embedded, plus ' ||
'LIMIT ' || quote_literal($3)
INTO res;
RETURN res;
END;
$body$
LANGUAGE 'plpgsql'
*STABLE*
And then I call
EXPLAIN ANALYZE SELECT * FROM first(...);
EXPLAIN ANALYZE SELECT * FROM second(...);
Should these two queries be executed by the same time usage (i.e. does
PostgreSQL generate same plans for inner queries)?
I always thought that the answer is YES: if a function is STABLE and with
language=SQL, its SQL code is embedded into outer context after all
arguments are expanded into their values (so the plan is built after
argument expansion). But some days ago I detected a case when second()
works about 100 times faster than first(), and the cause is seems that the
planner does not see all of expanded arguments in first() (if I replace
arguments to constants in first(), especially in LIMIT clause, it begins to
work the same speed as second() does). Unfortunately EXPLAIN ANALYZE does
not go into functions and shows only overall time, so I have no real
information about what plan is actually used in first().
Could you please comment this case a bit?..
Dmitry Koterov wrote:
For example, I have 2 functions like these:
CREATE OR REPLACE FUNCTION first(a INTEGER, b INTEGER, ...) RETURNS
... AS
$body$
...any SQL which uses $1, $2 etc. arguments, plus LIMIT $3...
$body$
LANGUAGE 'sql'
STABLEand
CREATE OR REPLACE FUNCTION second(a INTEGER, b INTEGER, ...) RETURNS
... AS
$body$
DECLARE
res ...;
BEGIN
EXECUTE '...the same SELECT, ' ||
'but ' || quote_literal(a) || ' args are embedded, plus ' ||
'LIMIT ' || quote_literal($3)
INTO res;
RETURN res;
END;
$body$
LANGUAGE 'plpgsql'
STABLEAnd then I call
EXPLAIN ANALYZE SELECT * FROM first(...);
EXPLAIN ANALYZE SELECT * FROM second(...);Should these two queries be executed by the same time usage (i.e. does
PostgreSQL generate same plans
for inner queries)?
I always thought that the answer is YES: if a function is STABLE and
with language=SQL, its SQL code
is embedded into outer context after all arguments are expanded into
their values (so the plan is
built after argument expansion). But some days ago I detected a case
when second() works about 100
times faster than first(), and the cause is seems that the planner
does not see all of expanded
arguments in first() (if I replace arguments to constants in first(),
especially in LIMIT clause, it
begins to work the same speed as second() does). Unfortunately EXPLAIN
ANALYZE does not go into
functions and shows only overall time, so I have no real information
about what plan is actually used
in first().
You can get EXPLAIN plans if you use the auto_explain contrib module
with auto_explain.log_nested_statements enabled.
As you suspect, the two functions work differently.
The SQL function will plan a parameterized statement (with $1 etc. in
it)
and execute that statement whenever it is called, while the PL/pgSQL
function will execute an SQL statement with all the constant literals
in it that gets planned and executed when you call the function.
The SQL function will create a statement that cannot benefit from
optimizations that work only for certain constant values (although
there will be improvements in 9.2 for that). On the down side,
the PL/pgSQL function will have to plan the query every time it is
executed, which does not come for free.
To illustrate that, an example:
I create a table "test" as follows:
CREATE TABLE test(id integer PRIMARY KEY, val text NOT NULL);
CREATE INDEX test_val_ind ON test(val);
Then I fill it with 1000 rows, 11 of which have val='test'
and ANALYZE the table.
CREATE OR REPLACE FUNCTION first(text, integer) RETURNS integer
STABLE STRICT LANGUAGE sql AS
'SELECT id FROM test WHERE val=$1 LIMIT $2';
CREATE OR REPLACE FUNCTION second(IN v text, IN l integer, OUT r
integer)
STABLE STRICT LANGUAGE plpgsql AS
$$BEGIN
EXECUTE 'SELECT id FROM test WHERE val=' || quote_literal(v)
|| ' LIMIT ' || CAST(l AS integer) INTO r;
END$$;
Then I execute "SELECT first('test', 1)" and "SELECT SECOND('test', 1)".
The respective plans are:
For the SQL function:
Query Text: SELECT id FROM test WHERE val=$1 LIMIT $2
Limit (cost=0.00..1.75 rows=50 width=4)
-> Seq Scan on test (cost=0.00..17.50 rows=500 width=4)
Filter: (val = $1)
For the PL/pgSQL function:
Query Text: SELECT id FROM test WHERE val='test' LIMIT 1
Limit (cost=0.00..0.95 rows=1 width=4)
-> Index Scan using test_val_ind on test (cost=0.00..10.46 rows=11
width=4)
Index Cond: (val = 'test'::text)
Yours,
Laurenz Albe
Albe, thanks for detailed explaination.
But it's quite strange that SQL+STABLE function does not recalculate the
plan each time it is called. Because when I use a bunch of SQL+STABLE
functions in e.g. a sub-select of a complex query, I see in the plan of
this complex queries that function calls are "expanded". It looks like
PostgreSQL uses SQL code defined in SQL+STABLE functions and merges
(injects) it into the main query instead of the function call.
E.g.:
==========================================================================
CREATE TABLE a(i INTEGER, t VARCHAR(5));
INSERT INTO a(i, t) SELECT s, '' FROM generate_series(1, 10000) s;
INSERT INTO a(i, t) SELECT s, 't' FROM generate_series(10001, 10002) s;
CREATE INDEX "a_i_idx" ON "public"."a" USING btree ("i");
CREATE INDEX "a_t_idx" ON "public"."a" USING btree ("t");
ANALYZE a;
CREATE OR REPLACE FUNCTION a_get_t(in_a a) RETURNS TEXT STABLE LANGUAGE sql
AS 'SELECT $1.t';
explain analyze
select * from a
where a_get_t(a) = 't';
QUERY PLAN
Index Scan using a_t_idx on a (cost=0.00..8.29 rows=2 width=5) (actual
time=0.041..0.043 rows=2 loops=1)
Index Cond: ((t)::text = 't'::text)
==========================================================================
You may see that a_get_t() SQL code was merged into the main query plan, so
the result is found without a seqscan with t='t' filtering, but the index
is used.
That was a very simple example, I use much more complex SQL+STABLE
functions in my code and I am practically sure that this SQL
extraction+injection is applied by PostgreSQL in other cases too (e.g.
sometimes planner initiates a hash join with tables which are referred
inside SQL+STABLE functions in sub-queries). If I replace STABLE with
VOLATILE in that complex cases, the effect disappears: no more SQL
extraction performed.
And more: assume we have a function f(x, y, z) VOLATILE with very complex
SQL inside and we call it like:
select * from f(1, 2, 3);
The query may took a long time (during not only the first call, but during
all others too). Then I just replace VOLATILE to STABLE for f(), and the
same query:
select * from f(1, 2, 3);
suddenly becomes very fast (e.g. 1000 times faster or even more). It's a
very common case: I've performed many times. I thought that it was because
of re-planning of STABLE functions on each call according to real passed
values...
If STABLE functions has frozen plans too (independent to its real passed
arguments values), how could we explain so much difference in performance
replacing VOLATILE to STABLE?
On Fri, Apr 27, 2012 at 2:34 PM, Albe Laurenz <laurenz.albe@wien.gv.at>wrote:
Show quoted text
Dmitry Koterov wrote:
For example, I have 2 functions like these:
CREATE OR REPLACE FUNCTION first(a INTEGER, b INTEGER, ...) RETURNS
... AS
$body$
...any SQL which uses $1, $2 etc. arguments, plus LIMIT $3...
$body$
LANGUAGE 'sql'
STABLEand
CREATE OR REPLACE FUNCTION second(a INTEGER, b INTEGER, ...) RETURNS
... AS
$body$
DECLARE
res ...;
BEGIN
EXECUTE '...the same SELECT, ' ||
'but ' || quote_literal(a) || ' args are embedded, plus ' ||
'LIMIT ' || quote_literal($3)
INTO res;
RETURN res;
END;
$body$
LANGUAGE 'plpgsql'
STABLEAnd then I call
EXPLAIN ANALYZE SELECT * FROM first(...);
EXPLAIN ANALYZE SELECT * FROM second(...);Should these two queries be executed by the same time usage (i.e. does
PostgreSQL generate same plans
for inner queries)?
I always thought that the answer is YES: if a function is STABLE and
with language=SQL, its SQL code
is embedded into outer context after all arguments are expanded into
their values (so the plan is
built after argument expansion). But some days ago I detected a case
when second() works about 100
times faster than first(), and the cause is seems that the planner
does not see all of expanded
arguments in first() (if I replace arguments to constants in first(),
especially in LIMIT clause, it
begins to work the same speed as second() does). Unfortunately EXPLAIN
ANALYZE does not go into
functions and shows only overall time, so I have no real information
about what plan is actually used
in first().
You can get EXPLAIN plans if you use the auto_explain contrib module
with auto_explain.log_nested_statements enabled.As you suspect, the two functions work differently.
The SQL function will plan a parameterized statement (with $1 etc. in
it)
and execute that statement whenever it is called, while the PL/pgSQL
function will execute an SQL statement with all the constant literals
in it that gets planned and executed when you call the function.The SQL function will create a statement that cannot benefit from
optimizations that work only for certain constant values (although
there will be improvements in 9.2 for that). On the down side,
the PL/pgSQL function will have to plan the query every time it is
executed, which does not come for free.To illustrate that, an example:
I create a table "test" as follows:
CREATE TABLE test(id integer PRIMARY KEY, val text NOT NULL);
CREATE INDEX test_val_ind ON test(val);Then I fill it with 1000 rows, 11 of which have val='test'
and ANALYZE the table.CREATE OR REPLACE FUNCTION first(text, integer) RETURNS integer
STABLE STRICT LANGUAGE sql AS
'SELECT id FROM test WHERE val=$1 LIMIT $2';CREATE OR REPLACE FUNCTION second(IN v text, IN l integer, OUT r
integer)
STABLE STRICT LANGUAGE plpgsql AS
$$BEGIN
EXECUTE 'SELECT id FROM test WHERE val=' || quote_literal(v)
|| ' LIMIT ' || CAST(l AS integer) INTO r;
END$$;Then I execute "SELECT first('test', 1)" and "SELECT SECOND('test', 1)".
The respective plans are:For the SQL function:
Query Text: SELECT id FROM test WHERE val=$1 LIMIT $2
Limit (cost=0.00..1.75 rows=50 width=4)
-> Seq Scan on test (cost=0.00..17.50 rows=500 width=4)
Filter: (val = $1)For the PL/pgSQL function:
Query Text: SELECT id FROM test WHERE val='test' LIMIT 1
Limit (cost=0.00..0.95 rows=1 width=4)
-> Index Scan using test_val_ind on test (cost=0.00..10.46 rows=11
width=4)
Index Cond: (val = 'test'::text)Yours,
Laurenz Albe--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Dmitry Koterov wrote:
But it's quite strange that SQL+STABLE function does not recalculate
the plan each time it is called.
Because when I use a bunch of SQL+STABLE functions in e.g. a
sub-select of a complex query, I see in
the plan of this complex queries that function calls are "expanded".
It looks like PostgreSQL uses SQL
code defined in SQL+STABLE functions and merges (injects) it into the
main query instead of the
function call.
E.g.:
========================================================================
==
CREATE TABLE a(i INTEGER, t VARCHAR(5));
INSERT INTO a(i, t) SELECT s, '' FROM generate_series(1, 10000) s;
INSERT INTO a(i, t) SELECT s, 't' FROM generate_series(10001, 10002)
s;
CREATE INDEX "a_i_idx" ON "public"."a" USING btree ("i");
CREATE INDEX "a_t_idx" ON "public"."a" USING btree ("t");
ANALYZE a;CREATE OR REPLACE FUNCTION a_get_t(in_a a) RETURNS TEXT STABLE
LANGUAGE sql
AS 'SELECT $1.t';
explain analyze
select * from a
where a_get_t(a) = 't';QUERY PLAN
Index Scan using a_t_idx on a (cost=0.00..8.29 rows=2 width=5)
(actual time=0.041..0.043 rows=2
loops=1)
Index Cond: ((t)::text = 't'::text)
========================================================================
==
You may see that a_get_t() SQL code was merged into the main query
plan, so the result is found
without a seqscan with t='t' filtering, but the index is used.
That was a very simple example, I use much more complex SQL+STABLE
functions in my code and I am
practically sure that this SQL extraction+injection is applied by
PostgreSQL in other cases too (e.g.
sometimes planner initiates a hash join with tables which are referred
inside SQL+STABLE functions in
sub-queries). If I replace STABLE with VOLATILE in that complex cases,
the effect disappears: no more
SQL extraction performed.
And more: assume we have a function f(x, y, z) VOLATILE with very
complex SQL inside and we call it
like:
select * from f(1, 2, 3);
The query may took a long time (during not only the first call, but
during all others too). Then I
just replace VOLATILE to STABLE for f(), and the same query:
select * from f(1, 2, 3);
suddenly becomes very fast (e.g. 1000 times faster or even more). It's
a very common case: I've
performed many times. I thought that it was because of re-planning of
STABLE functions on each call
according to real passed values...
If STABLE functions has frozen plans too (independent to its real
passed arguments values), how could
we explain so much difference in performance replacing VOLATILE to
STABLE?
Simple SQL functions can get "inlined" in queries by the planner,
that's what you see in your first example.
This is IMO unrelated to the problem you describe where a complicated
STABLE SQL function performs much better than the same function declared
VOLATILE.
To understand that difference, one would have to look at the function
definition and EXPLAIN output for both cases, possibly using
auto_analyze.
Yours,
Laurenz Albe