a plpgsql bug
hello!
I found a problem in plpgsql. When there is a large loop in plpgsql, it is found that the change of search_path will cause memory exhaustion and thus disconnect the connection.
The test examples are as follows:
1 create a schema
create schema test_schema;
2 create a function create or replace function test_schema.test_f(id integer) returns integer as $$ declare var2 integer := 1; begin if id % 4 = 1 then return var2 + 2; elseif id % 4 = 2 then return var2 + 3; elseif id % 4 = 3 then return var2 + 4; else return var2; end if; end; $$ language plpgsql;
3 a loop in plpgsql,which wil result disconnection. do $$ declare var1 integer; begin for id in 1 .. 10000000 LOOP set search_path to test_schema; var1 = test_schema.test_f(id); set search_path to public; var1 = test_schema.test_f(id); end loop; end; $$ language plpgsql;
daidewei@highgo.com
Attachments:
On Mon, Sep 18, 2023 at 11:46 PM daidewei@highgo.com <daidewei@highgo.com>
wrote:
I found a problem in plpgsql. When there is a large loop in plpgsql,
it is found that the change of search_path will cause memory exhaustion and
thus disconnect the connection.
It is impossible to prevent queries from exhausting memory so the fact that
this one does isn't an indication of a bug on its own. I'm having trouble
imagining a reasonable use case for this pattern of changing search_path
frequently in a loop within a single execution of a function. That said,
I'm not in a position to judge how easy or difficult an improvement in this
area may be.
David J.
"David G. Johnston" <david.g.johnston@gmail.com> writes:
On Mon, Sep 18, 2023 at 11:46 PM daidewei@highgo.com <daidewei@highgo.com>
wrote:I found a problem in plpgsql. When there is a large loop in plpgsql,
it is found that the change of search_path will cause memory exhaustion and
thus disconnect the connection.
It is impossible to prevent queries from exhausting memory so the fact that
this one does isn't an indication of a bug on its own. I'm having trouble
imagining a reasonable use case for this pattern of changing search_path
frequently in a loop within a single execution of a function. That said,
I'm not in a position to judge how easy or difficult an improvement in this
area may be.
I poked into this a bit with valgrind. It seems that the problem
is that the changes to search_path thrash the "simple expression"
mechanism in plpgsql, such that it has to re-plan the various
expressions in the called function each time through. It's good about
tracking the actual cached plans and not leaking those, but what is
getting leaked into transaction-lifespan memory is the data structures
made by
expr->expr_simple_state =
ExecInitExprWithParams(expr->expr_simple_expr,
econtext->ecxt_param_list_info);
We could conceivably reclaim that data if we were willing to set up
yet another per-expression memory context to hold it. That seems
like rather a high overhead though.
The given test case is obviously a bit artificial, but I think it
may be a simplification of fairly plausible use-cases. The triggering
condition is that the same textual expression in a plpgsql function
gets executed repeatedly with different search_path settings, which
doesn't seem that unreasonable.
Perhaps another approach could be to assume that only a small number
of distinct search_path settings will be used in any one transaction,
and cache a separate plan and estate for each one. That would have
the nice side-effect of avoiding the replanning overhead, but then
we'd have to figure out how to manage the cache and keep it from
blowing out.
regards, tom lane
I find another example which doesn't change search_path,but use function 'SECURITY DEFINER'
1 create a user
./psql -dpostgres
create user test_user;
grant all on database postgres to test_user;
2 use the new user to logon and create function
./psql -dpostgres -U test_user
create schema test_user;
create or replace function test_user.test_m(id integer) returns integer as
$$
declare
var2 integer := 1;
begin
if id % 4 = 1 then
return var2 + 2;
elseif id % 4 = 2 then
return var2 + 3;
elseif id % 4 = 3 then
return var2 + 4;
else
return var2;
end if;
end; $$ language plpgsql;
create or replace function test_user.test_f(id integer) returns integer SECURITY DEFINER as
$$
declare
var1 integer := 1;
begin
var1 := test_user.test_m(23);
return var1;
end; $$ language plpgsql;
3 execute
./psql -dpostgres
do $$
declare
var1 integer;
begin
for id in 1 .. 10000000 LOOP
var1 = test_user.test_m(id);
var1 = test_user.test_f(id);
end loop;
end; $$ language plpgsql;
daidewei@highgo.com
From: Tom Lane
Date: 2023-09-20 00:12
To: David G. Johnston
CC: daidewei@highgo.com; pgsql-bugs@lists.postgresql.org
Subject: Re: a plpgsql bug
"David G. Johnston" <david.g.johnston@gmail.com> writes:
On Mon, Sep 18, 2023 at 11:46 PM daidewei@highgo.com <daidewei@highgo.com>
wrote:I found a problem in plpgsql. When there is a large loop in plpgsql,
it is found that the change of search_path will cause memory exhaustion and
thus disconnect the connection.
It is impossible to prevent queries from exhausting memory so the fact that
this one does isn't an indication of a bug on its own. I'm having trouble
imagining a reasonable use case for this pattern of changing search_path
frequently in a loop within a single execution of a function. That said,
I'm not in a position to judge how easy or difficult an improvement in this
area may be.
I poked into this a bit with valgrind. It seems that the problem
is that the changes to search_path thrash the "simple expression"
mechanism in plpgsql, such that it has to re-plan the various
expressions in the called function each time through. It's good about
tracking the actual cached plans and not leaking those, but what is
getting leaked into transaction-lifespan memory is the data structures
made by
expr->expr_simple_state =
ExecInitExprWithParams(expr->expr_simple_expr,
econtext->ecxt_param_list_info);
We could conceivably reclaim that data if we were willing to set up
yet another per-expression memory context to hold it. That seems
like rather a high overhead though.
The given test case is obviously a bit artificial, but I think it
may be a simplification of fairly plausible use-cases. The triggering
condition is that the same textual expression in a plpgsql function
gets executed repeatedly with different search_path settings, which
doesn't seem that unreasonable.
Perhaps another approach could be to assume that only a small number
of distinct search_path settings will be used in any one transaction,
and cache a separate plan and estate for each one. That would have
the nice side-effect of avoiding the replanning overhead, but then
we'd have to figure out how to manage the cache and keep it from
blowing out.
regards, tom lane
Attachments:
maybe when function compiling,swith to its owner
daidewei@highgo.com
From: daidewei@highgo.com
Date: 2023-09-20 14:34
To: Tom Lane; David G. Johnston
CC: pgsql-bugs@lists.postgresql.org
Subject: Re: Re: a plpgsql bug
I find another example which doesn't change search_path,but use function 'SECURITY DEFINER'
1 create a user
./psql -dpostgres
create user test_user;
grant all on database postgres to test_user;
2 use the new user to logon and create function
./psql -dpostgres -U test_user
create schema test_user;
create or replace function test_user.test_m(id integer) returns integer as
$$
declare
var2 integer := 1;
begin
if id % 4 = 1 then
return var2 + 2;
elseif id % 4 = 2 then
return var2 + 3;
elseif id % 4 = 3 then
return var2 + 4;
else
return var2;
end if;
end; $$ language plpgsql;
create or replace function test_user.test_f(id integer) returns integer SECURITY DEFINER as
$$
declare
var1 integer := 1;
begin
var1 := test_user.test_m(23);
return var1;
end; $$ language plpgsql;
3 execute
./psql -dpostgres
do $$
declare
var1 integer;
begin
for id in 1 .. 10000000 LOOP
var1 = test_user.test_m(id);
var1 = test_user.test_f(id);
end loop;
end; $$ language plpgsql;
daidewei@highgo.com
From: Tom Lane
Date: 2023-09-20 00:12
To: David G. Johnston
CC: daidewei@highgo.com; pgsql-bugs@lists.postgresql.org
Subject: Re: a plpgsql bug
"David G. Johnston" <david.g.johnston@gmail.com> writes:
On Mon, Sep 18, 2023 at 11:46 PM daidewei@highgo.com <daidewei@highgo.com>
wrote:I found a problem in plpgsql. When there is a large loop in plpgsql,
it is found that the change of search_path will cause memory exhaustion and
thus disconnect the connection.
It is impossible to prevent queries from exhausting memory so the fact that
this one does isn't an indication of a bug on its own. I'm having trouble
imagining a reasonable use case for this pattern of changing search_path
frequently in a loop within a single execution of a function. That said,
I'm not in a position to judge how easy or difficult an improvement in this
area may be.
I poked into this a bit with valgrind. It seems that the problem
is that the changes to search_path thrash the "simple expression"
mechanism in plpgsql, such that it has to re-plan the various
expressions in the called function each time through. It's good about
tracking the actual cached plans and not leaking those, but what is
getting leaked into transaction-lifespan memory is the data structures
made by
expr->expr_simple_state =
ExecInitExprWithParams(expr->expr_simple_expr,
econtext->ecxt_param_list_info);
We could conceivably reclaim that data if we were willing to set up
yet another per-expression memory context to hold it. That seems
like rather a high overhead though.
The given test case is obviously a bit artificial, but I think it
may be a simplification of fairly plausible use-cases. The triggering
condition is that the same textual expression in a plpgsql function
gets executed repeatedly with different search_path settings, which
doesn't seem that unreasonable.
Perhaps another approach could be to assume that only a small number
of distinct search_path settings will be used in any one transaction,
and cache a separate plan and estate for each one. That would have
the nice side-effect of avoiding the replanning overhead, but then
we'd have to figure out how to manage the cache and keep it from
blowing out.
regards, tom lane