Couple of questions on functions
Hello,
Reading the manual recently I came across this: (
http://www.postgresql.org/docs/8.3/interactive/xfunc-volatility.html )
Because of the snapshotting behavior of MVCC (see Chapter 13 <http://www.postgresql.org/docs/8.3/interactive/mvcc.html>),
a function containing only SELECT commands can safely be marked
STABLE, even if it selects from tables that might be undergoing
modifications by concurrent queries. PostgreSQL will execute a STABLE
function using the snapshot established for the calling query, and so it
will see a fixed view of the database throughout that query. Also
note that the current_timestamp family of functions qualify as stable,
since their values do not change within a transaction.
It struck me that it might not be all that safe to mark a SELECT-only
function STABLE vs. VOLATILE (or vice versa). Consider an example:
create table t1(id int);
create or replace function f1() returns void as
$$
declare
i int;
begin
select count(*) into i from t1;
raise notice '%', i;
-- waste some time
for i in 1..700000000 loop
end loop;
select count(*) into i from t1;
raise notice '%', i;
end;
$$
language 'plpgsql';
Now in first connection do:
select f1();
While the execution is in the loop which takes a while do in another
connection:
insert into t1 values (1);
The function returns with the following notices:
NOTICE: 0
NOTICE: 1
If I change the volatility of f1() to STABLE and run the above
again, I get:
NOTICE: 1
NOTICE: 1
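The volatility change itself can be done in place (a sketch; assumes the f1() defined above already exists):

```sql
-- mark the existing function STABLE instead of the default VOLATILE
alter function f1() stable;
-- rerun the experiment while inserting into t1 from another session
select f1();
```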
It looks like at least PL/pgSQL functions use the most recent snapshot on
each call to the SPI manager instead of the calling query's snapshot, so,
since the default transaction isolation level in Postgres is READ COMMITTED,
concurrent transactions may affect the result of a read-only VOLATILE
function. I wonder whether functions in any language (including SQL and C)
behave the same way?
Another thing I've recently discovered is that SQL functions seem to be
unexpectedly slow to call. Example:
create or replace function f2sql(int) returns int as
$$
select case when $1 < 100000 then 1
when 100000 <= $1 and $1 < 500000 then 2
when $1 >= 500000 then 3
end;
$$
language 'sql' immutable;
create or replace function f2plpgsql(int) returns int as
$$
begin
return case when $1 < 100000 then 1
when 100000 <= $1 and $1 < 500000 then 2
when $1 >= 500000 then 3
end;
end;
$$
language 'plpgsql' immutable;
These two functions perform exactly the same calculation on their input and
differ only in the language used. Now I write a query involving them and
wrap it in another function (so that I can use PERFORM to avoid possible
overhead of fetching results to the client, to cache the plan, and to
measure the time more precisely):
create or replace function f3() returns void as
$$
declare
st timestamp;
begin
st := clock_timestamp();
perform f2sql(trunc(1000000*random())::int) +
f2sql(trunc(1000000*random())::int) +
f2sql(trunc(1000000*random())::int) +
f2sql(trunc(1000000*random())::int) +
f2sql(trunc(1000000*random())::int) +
f2sql(trunc(1000000*random())::int) +
f2sql(trunc(1000000*random())::int) +
f2sql(trunc(1000000*random())::int) +
f2sql(trunc(1000000*random())::int) +
f2sql(trunc(1000000*random())::int)
from generate_series(1, 100000);
raise notice '%', clock_timestamp() - st;
end;
$$
language 'plpgsql' ;
create or replace function f4() returns void as
$$
declare
st timestamp;
begin
st := clock_timestamp();
perform f2plpgsql(trunc(1000000*random())::int) +
f2plpgsql(trunc(1000000*random())::int) +
f2plpgsql(trunc(1000000*random())::int) +
f2plpgsql(trunc(1000000*random())::int) +
f2plpgsql(trunc(1000000*random())::int) +
f2plpgsql(trunc(1000000*random())::int) +
f2plpgsql(trunc(1000000*random())::int) +
f2plpgsql(trunc(1000000*random())::int) +
f2plpgsql(trunc(1000000*random())::int) +
f2plpgsql(trunc(1000000*random())::int)
from generate_series(1, 100000);
raise notice '%', clock_timestamp() - st;
end;
$$
language 'plpgsql' ;
Now f4() reports 4.2 sec of runtime on average while f3() reports 10.3 sec,
which is a notable difference, especially considering that the SQL function
is likely to be inlined. Am I missing something?
"Vyacheslav Kalinin" <vka@mgcp.com> writes:
Reading the manual recently I came across this: (
http://www.postgresql.org/docs/8.3/interactive/xfunc-volatility.html )
Because of the snapshotting behavior of MVCC (see Chapter 13 <http://www.postgresql.org/docs/8.3/interactive/mvcc.html>),
a function containing only SELECT commands can safely be marked
STABLE, even if it selects from tables that might be undergoing
modifications by concurrent queries. PostgreSQL will execute a STABLE
function using the snapshot established for the calling query, and so it
will see a fixed view of the database throughout that query.
It struck me that it might not be all that safe to mark a SELECT-only
function STABLE vs. VOLATILE (or vice versa).
What it says is that you *can* mark such a function stable, without
violating the rules for a stable function. It doesn't say that this
choice doesn't affect the results. Feel free to propose better wording...
Another thing I've recently discovered is that SQL functions seem to be
unexpectedly slow to call. Example:
...
perform f2sql(trunc(1000000*random())::int) +
...
Now f4() reports 4.2 sec of runtime on average while f3() reports 10.3 sec,
which is a notable difference, especially considering that the SQL function
is likely to be inlined. Am I missing something?
You might think it's "likely", but in fact no inlining will happen
because you have a volatile argument that's used more than once inside
the function body. Inlining would change the results.
regards, tom lane
You might think it's "likely", but in fact no inlining will happen
because you have a volatile argument that's used more than once inside
the function body. Inlining would change the results.
Yeah, right, apparently I overestimated the chances of inlining, thanks for
the hint, Tom. In fact, in my project the performance problem was caused by
the fact that the analogue of f2sql() was declared SECURITY DEFINER and thus
could not be inlined :-/. Hmm, does anyone else think inlining tips are
worth mentioning in the docs, or is it just me?
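For comparison, here is a sketch (untested) of a call that should be
inlinable: the argument is a plain column reference rather than a volatile
expression, so referencing the parameter more than once inside the function
body does not change results:

```sql
-- g is a simple column reference, not a volatile expression such as
-- random(), so the planner can expand the IMMUTABLE SQL function
-- f2sql() inline into the query
select sum(f2sql(g)) from generate_series(1, 1000000) as g;
```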
On Tue, 13 May 2008 22:51:00 -0400
Tom Lane <tgl@sss.pgh.pa.us> wrote:
"Vyacheslav Kalinin" <vka@mgcp.com> writes:
Reading the manual recently I came across this: (
http://www.postgresql.org/docs/8.3/interactive/xfunc-volatility.html )
Because of the snapshotting behavior of MVCC (see Chapter 13 <http://www.postgresql.org/docs/8.3/interactive/mvcc.html>),
a function containing only SELECT commands can safely be marked
STABLE, even if it selects from tables that might be undergoing
modifications by concurrent queries. PostgreSQL will execute a
STABLE function using the snapshot established for the calling query,
and so it will see a fixed view of the database throughout that query.
It struck me that it might not be all that safe to mark a SELECT-only
function STABLE vs. VOLATILE (or vice versa).
What it says is that you *can* mark such a function stable, without
violating the rules for a stable function. It doesn't say that this
choice doesn't affect the results. Feel free to propose better
wording...
I'm confused...
Actually f1 seems to be stable: the insert is executed "outside" the
function.
My understanding is that IMMUTABLE, STABLE and VOLATILE are hints for
the optimizer.
Results from an immutable function could be cached across the whole
life of the DB if the input parameters are the same.
insert into t (a,b) values(5,fi(3));
insert into t (a,b) values(7,fi(3));
fi *could* be executed just one time.
Results from a stable function could be cached across a statement.
insert into t (a,b) values(fs(3),fs(3));
fs *could* be executed just one time.
Inside *any* function, SELECTs will see the snapshot and the
modifications made inside the function, since functions are executed
inside an implicit transaction.
Now I read:
http://searchwarp.com/swa9860.htm
Read Committed Isolation Level
Read Committed is the default isolation level in PostgreSQL. When a
transaction runs at this isolation level, a SELECT query sees only
data committed before the query began; it never sees either
uncommitted data or changes committed during query execution by
concurrent transactions.
create table t1(a int);
insert into t1 values(1);
create or replace function ft(out a1 int, out a2 int) as
$$
begin
select into a1 a from t1 limit 1;
for i in 1..700000000 loop
end loop;
select into a2 a from t1 limit 1;
return;
end;
$$ language plpgsql;
select * from ft();
update t1 set a=5;
So I'd expect ft() to always return (1,1) or (5,5).
Since
select * from ft();
is one statement... it should see only data that were committed when
the select started.
But actually I can obtain (1,5)
???
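A sketch of what I'd try next, assuming the snapshot behaviour described
earlier in this thread is what's at work here: mark the function STABLE so
that both SELECTs inside it run under the calling query's snapshot:

```sql
-- with STABLE, both SELECTs inside ft() should use the snapshot of the
-- calling query, so the result ought to be (1,1) or (5,5), never (1,5)
alter function ft() stable;
select * from ft();
```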
--
Ivan Sergio Borgonovo
http://www.webthatworks.it