calling procedures is slow and consumes extra much memory against calling function
Hi
I try to use procedures in Orafce package, and I did some easy performance
tests. I found some hard problems:
1. test case
create or replace procedure p1(inout r int, inout v int) as $$
begin v := random() * r; end
$$ language plpgsql;
This command requires
do $$
declare r int default 100; x int;
begin
for i in 1..300000 loop
call p1(r, x);
end loop;
end;
$$;
about 2.2GB RAM and 10 sec.
When I rewrite same to functions then
create or replace function p1func2(inout r int, inout v int) as $$
begin v := random() * r; end
$$ language plpgsql;
do $$
declare r int default 100; x int; re record;
begin
for i in 1..300000 loop
re := p1func2(r, x);
end loop;
end;
$$;
Then execution is about 1 sec, and memory requirements are +/- zero.
Minimally it looks so CALL statements has a memory issue.
Regards
Pavel
Hi
ne 10. 5. 2020 v 22:20 odesílatel Pavel Stehule <pavel.stehule@gmail.com>
napsal:
Hi
I try to use procedures in Orafce package, and I did some easy performance
tests. I found some hard problems:1. test case
create or replace procedure p1(inout r int, inout v int) as $$
begin v := random() * r; end
$$ language plpgsql;This command requires
do $$
declare r int default 100; x int;
begin
for i in 1..300000 loop
call p1(r, x);
end loop;
end;
$$;about 2.2GB RAM and 10 sec.
When I rewrite same to functions then
create or replace function p1func2(inout r int, inout v int) as $$
begin v := random() * r; end
$$ language plpgsql;do $$
declare r int default 100; x int; re record;
begin
for i in 1..300000 loop
re := p1func2(r, x);
end loop;
end;
$$;Then execution is about 1 sec, and memory requirements are +/- zero.
Minimally it looks so CALL statements has a memory issue.
The problem is in plpgsql implementation of CALL statement
In non atomic case - case of using procedures from DO block, the
expression plan is not cached, and plan is generating any time. This is
reason why it is slow.
Unfortunately, generated plans are not released until SPI_finish. Attached
patch fixed this issue.
Regards
Pavel
Show quoted text
Regards
Pavel
Attachments:
plpgsq-call-fix.patchtext/x-patch; charset=US-ASCII; name=plpgsq-call-fix.patchDownload+6-0
On Sun, May 10, 2020 at 10:20:53PM +0200, Pavel Stehule wrote:
When I rewrite same to functions then
create or replace function p1func2(inout r int, inout v int) as $$
begin v := random() * r; end
$$ language plpgsql;Then execution is about 1 sec, and memory requirements are +/- zero.
Minimally it looks so CALL statements has a memory issue.
Behavior not limited to plpgsql. A plain SQL function shows the same
leak patterns:
create or replace procedure p1_sql(in r int, in v int)
as $$ SELECT r + v; $$ language sql;
And I cannot get valgrind to complain about lost references, so this
looks like some missing memory context handling.
Also, I actually don't quite get why the context created by
CreateExprContext() cannot be freed before the procedure returns. A
short test shows no problems in calling FreeExprContext() at the end
of ExecuteCallStmt(), but that does not address everything. Perhaps a
lack of tests with pass-by-reference expressions and procedures?
Peter?
--
Michael
po 11. 5. 2020 v 7:25 odesílatel Pavel Stehule <pavel.stehule@gmail.com>
napsal:
Hi
ne 10. 5. 2020 v 22:20 odesílatel Pavel Stehule <pavel.stehule@gmail.com>
napsal:Hi
I try to use procedures in Orafce package, and I did some easy
performance tests. I found some hard problems:1. test case
create or replace procedure p1(inout r int, inout v int) as $$
begin v := random() * r; end
$$ language plpgsql;This command requires
do $$
declare r int default 100; x int;
begin
for i in 1..300000 loop
call p1(r, x);
end loop;
end;
$$;about 2.2GB RAM and 10 sec.
When I rewrite same to functions then
create or replace function p1func2(inout r int, inout v int) as $$
begin v := random() * r; end
$$ language plpgsql;do $$
declare r int default 100; x int; re record;
begin
for i in 1..300000 loop
re := p1func2(r, x);
end loop;
end;
$$;Then execution is about 1 sec, and memory requirements are +/- zero.
Minimally it looks so CALL statements has a memory issue.
The problem is in plpgsql implementation of CALL statement
In non atomic case - case of using procedures from DO block, the
expression plan is not cached, and plan is generating any time. This is
reason why it is slow.Unfortunately, generated plans are not released until SPI_finish. Attached
patch fixed this issue.
But now, recursive calling doesn't work :-(. So this patch is not enough
Show quoted text
Regards
Pavel
Regards
Pavel
Hi
The problem is in plpgsql implementation of CALL statement
In non atomic case - case of using procedures from DO block, the
expression plan is not cached, and plan is generating any time. This is
reason why it is slow.Unfortunately, generated plans are not released until SPI_finish.
Attached patch fixed this issue.But now, recursive calling doesn't work :-(. So this patch is not enough
Attached patch is working - all tests passed
It doesn't solve performance, and doesn't solve all memory problems, but
significantly reduce memory requirements from 5007 bytes to 439 bytes per
one CALL
Regards
Pavel
Show quoted text
Regards
Pavel
Regards
Pavel
Attachments:
plpgsql-call-fix.patchtext/x-patch; charset=US-ASCII; name=plpgsql-call-fix.patchDownload+15-0
Em dom., 10 de mai. de 2020 às 17:21, Pavel Stehule <pavel.stehule@gmail.com>
escreveu:
Hi
I try to use procedures in Orafce package, and I did some easy performance
tests. I found some hard problems:1. test case
create or replace procedure p1(inout r int, inout v int) as $$
begin v := random() * r; end
$$ language plpgsql;This command requires
do $$
declare r int default 100; x int;
begin
for i in 1..300000 loop
call p1(r, x);
end loop;
end;
$$;about 2.2GB RAM and 10 sec.
I am having a consistent result of 3 secs, with a modified version
(exec_stmt_call) of your patch.
But my notebook is (Core 5, 8GB and SSD), could it be a difference in the
testing hardware?
regards,
Ranier Vilela
so 16. 5. 2020 v 0:34 odesílatel Ranier Vilela <ranier.vf@gmail.com> napsal:
Em dom., 10 de mai. de 2020 às 17:21, Pavel Stehule <
pavel.stehule@gmail.com> escreveu:Hi
I try to use procedures in Orafce package, and I did some easy
performance tests. I found some hard problems:1. test case
create or replace procedure p1(inout r int, inout v int) as $$
begin v := random() * r; end
$$ language plpgsql;This command requires
do $$
declare r int default 100; x int;
begin
for i in 1..300000 loop
call p1(r, x);
end loop;
end;
$$;about 2.2GB RAM and 10 sec.
I am having a consistent result of 3 secs, with a modified version
(exec_stmt_call) of your patch.
But my notebook is (Core 5, 8GB and SSD), could it be a difference in the
testing hardware?
My notebook is old T520, and more I have a configured Postgres with
--enable-cassert option.
regards
Pavel
Show quoted text
regards,
Ranier Vilela
Em sáb., 16 de mai. de 2020 às 00:07, Pavel Stehule <pavel.stehule@gmail.com>
escreveu:
so 16. 5. 2020 v 0:34 odesílatel Ranier Vilela <ranier.vf@gmail.com>
napsal:Em dom., 10 de mai. de 2020 às 17:21, Pavel Stehule <
pavel.stehule@gmail.com> escreveu:Hi
I try to use procedures in Orafce package, and I did some easy
performance tests. I found some hard problems:1. test case
create or replace procedure p1(inout r int, inout v int) as $$
begin v := random() * r; end
$$ language plpgsql;This command requires
do $$
declare r int default 100; x int;
begin
for i in 1..300000 loop
call p1(r, x);
end loop;
end;
$$;about 2.2GB RAM and 10 sec.
I am having a consistent result of 3 secs, with a modified version
(exec_stmt_call) of your patch.
But my notebook is (Core 5, 8GB and SSD), could it be a difference in the
testing hardware?My notebook is old T520, and more I have a configured Postgres with
--enable-cassert option.
The hardware is definitely making a difference, but if you have time and
don't mind testing it,
I can send you a patch, not that the modifications are a big deal, but
maybe they'll help.
regards,
Ranier Vilela
so 16. 5. 2020 v 5:55 odesílatel Ranier Vilela <ranier.vf@gmail.com> napsal:
Em sáb., 16 de mai. de 2020 às 00:07, Pavel Stehule <
pavel.stehule@gmail.com> escreveu:so 16. 5. 2020 v 0:34 odesílatel Ranier Vilela <ranier.vf@gmail.com>
napsal:Em dom., 10 de mai. de 2020 às 17:21, Pavel Stehule <
pavel.stehule@gmail.com> escreveu:Hi
I try to use procedures in Orafce package, and I did some easy
performance tests. I found some hard problems:1. test case
create or replace procedure p1(inout r int, inout v int) as $$
begin v := random() * r; end
$$ language plpgsql;This command requires
do $$
declare r int default 100; x int;
begin
for i in 1..300000 loop
call p1(r, x);
end loop;
end;
$$;about 2.2GB RAM and 10 sec.
I am having a consistent result of 3 secs, with a modified version
(exec_stmt_call) of your patch.
But my notebook is (Core 5, 8GB and SSD), could it be a difference in
the testing hardware?My notebook is old T520, and more I have a configured Postgres with
--enable-cassert option.The hardware is definitely making a difference, but if you have time and
don't mind testing it,
I can send you a patch, not that the modifications are a big deal, but
maybe they'll help.
send me a patch, please
Pavel
Show quoted text
regards,
Ranier Vilela
Em sáb., 16 de mai. de 2020 às 01:10, Pavel Stehule <pavel.stehule@gmail.com>
escreveu:
so 16. 5. 2020 v 5:55 odesílatel Ranier Vilela <ranier.vf@gmail.com>
napsal:Em sáb., 16 de mai. de 2020 às 00:07, Pavel Stehule <
pavel.stehule@gmail.com> escreveu:so 16. 5. 2020 v 0:34 odesílatel Ranier Vilela <ranier.vf@gmail.com>
napsal:Em dom., 10 de mai. de 2020 às 17:21, Pavel Stehule <
pavel.stehule@gmail.com> escreveu:Hi
I try to use procedures in Orafce package, and I did some easy
performance tests. I found some hard problems:1. test case
create or replace procedure p1(inout r int, inout v int) as $$
begin v := random() * r; end
$$ language plpgsql;This command requires
do $$
declare r int default 100; x int;
begin
for i in 1..300000 loop
call p1(r, x);
end loop;
end;
$$;about 2.2GB RAM and 10 sec.
I am having a consistent result of 3 secs, with a modified version
(exec_stmt_call) of your patch.
But my notebook is (Core 5, 8GB and SSD), could it be a difference in
the testing hardware?My notebook is old T520, and more I have a configured Postgres with
--enable-cassert option.The hardware is definitely making a difference, but if you have time and
don't mind testing it,
I can send you a patch, not that the modifications are a big deal, but
maybe they'll help.With more testing, I found that latency increases response time.
With 3 (secs) the test is with localhost.
With 6 (secs) the test is with tcp (local, not between pcs).
Anyway, I would like to know if we have the number of parameters
previously, why use List instead of Arrays?
It would not be faster to create plpgsql variables.
regards,
Ranier Vilela
Attachments:
pl_exec_test.patchapplication/octet-stream; name=pl_exec_test.patchDownload+52-32
so 16. 5. 2020 v 13:40 odesílatel Ranier Vilela <ranier.vf@gmail.com>
napsal:
Em sáb., 16 de mai. de 2020 às 01:10, Pavel Stehule <
pavel.stehule@gmail.com> escreveu:so 16. 5. 2020 v 5:55 odesílatel Ranier Vilela <ranier.vf@gmail.com>
napsal:Em sáb., 16 de mai. de 2020 às 00:07, Pavel Stehule <
pavel.stehule@gmail.com> escreveu:so 16. 5. 2020 v 0:34 odesílatel Ranier Vilela <ranier.vf@gmail.com>
napsal:Em dom., 10 de mai. de 2020 às 17:21, Pavel Stehule <
pavel.stehule@gmail.com> escreveu:Hi
I try to use procedures in Orafce package, and I did some easy
performance tests. I found some hard problems:1. test case
create or replace procedure p1(inout r int, inout v int) as $$
begin v := random() * r; end
$$ language plpgsql;This command requires
do $$
declare r int default 100; x int;
begin
for i in 1..300000 loop
call p1(r, x);
end loop;
end;
$$;about 2.2GB RAM and 10 sec.
I am having a consistent result of 3 secs, with a modified version
(exec_stmt_call) of your patch.
But my notebook is (Core 5, 8GB and SSD), could it be a difference in
the testing hardware?My notebook is old T520, and more I have a configured Postgres with
--enable-cassert option.The hardware is definitely making a difference, but if you have time and
don't mind testing it,
I can send you a patch, not that the modifications are a big deal, but
maybe they'll help.With more testing, I found that latency increases response time.
With 3 (secs) the test is with localhost.
With 6 (secs) the test is with tcp (local, not between pcs).Anyway, I would like to know if we have the number of parameters
previously, why use List instead of Arrays?
It would not be faster to create plpgsql variables.
Why you check SPI_processed?
+ if (SPI_processed == 1)
+ {
+ if (!stmt->target)
+ elog(ERROR, "DO statement returned a row, query \"%s\"", expr->query);
+ }
+ else if (SPI_processed > 1)
+ elog(ERROR, "Procedure call returned more than one row, query \"%s\"",
expr->query);
CALL cannot to return rows, so these checks has not sense
Show quoted text
regards,
Ranier Vilela
Em sáb., 16 de mai. de 2020 às 09:35, Pavel Stehule <pavel.stehule@gmail.com>
escreveu:
so 16. 5. 2020 v 13:40 odesílatel Ranier Vilela <ranier.vf@gmail.com>
napsal:Em sáb., 16 de mai. de 2020 às 01:10, Pavel Stehule <
pavel.stehule@gmail.com> escreveu:so 16. 5. 2020 v 5:55 odesílatel Ranier Vilela <ranier.vf@gmail.com>
napsal:Em sáb., 16 de mai. de 2020 às 00:07, Pavel Stehule <
pavel.stehule@gmail.com> escreveu:so 16. 5. 2020 v 0:34 odesílatel Ranier Vilela <ranier.vf@gmail.com>
napsal:Em dom., 10 de mai. de 2020 às 17:21, Pavel Stehule <
pavel.stehule@gmail.com> escreveu:Hi
I try to use procedures in Orafce package, and I did some easy
performance tests. I found some hard problems:1. test case
create or replace procedure p1(inout r int, inout v int) as $$
begin v := random() * r; end
$$ language plpgsql;This command requires
do $$
declare r int default 100; x int;
begin
for i in 1..300000 loop
call p1(r, x);
end loop;
end;
$$;about 2.2GB RAM and 10 sec.
I am having a consistent result of 3 secs, with a modified version
(exec_stmt_call) of your patch.
But my notebook is (Core 5, 8GB and SSD), could it be a difference in
the testing hardware?My notebook is old T520, and more I have a configured Postgres with
--enable-cassert option.The hardware is definitely making a difference, but if you have time
and don't mind testing it,
I can send you a patch, not that the modifications are a big deal, but
maybe they'll help.With more testing, I found that latency increases response time.
With 3 (secs) the test is with localhost.
With 6 (secs) the test is with tcp (local, not between pcs).Anyway, I would like to know if we have the number of parameters
previously, why use List instead of Arrays?
It would not be faster to create plpgsql variables.Why you check SPI_processed?
+ if (SPI_processed == 1) + { + if (!stmt->target) + elog(ERROR, "DO statement returned a row, query \"%s\"", expr->query); + } + else if (SPI_processed > 1) + elog(ERROR, "Procedure call returned more than one row, query \"%s\"", expr->query);CALL cannot to return rows, so these checks has not sense
Looking at the original file, this already done, from line 2351,
I just put all the tests together to, if applicable, get out quickly.
regards,
Ranier Vilela
so 16. 5. 2020 v 15:24 odesílatel Ranier Vilela <ranier.vf@gmail.com>
napsal:
Em sáb., 16 de mai. de 2020 às 09:35, Pavel Stehule <
pavel.stehule@gmail.com> escreveu:so 16. 5. 2020 v 13:40 odesílatel Ranier Vilela <ranier.vf@gmail.com>
napsal:Em sáb., 16 de mai. de 2020 às 01:10, Pavel Stehule <
pavel.stehule@gmail.com> escreveu:so 16. 5. 2020 v 5:55 odesílatel Ranier Vilela <ranier.vf@gmail.com>
napsal:Em sáb., 16 de mai. de 2020 às 00:07, Pavel Stehule <
pavel.stehule@gmail.com> escreveu:so 16. 5. 2020 v 0:34 odesílatel Ranier Vilela <ranier.vf@gmail.com>
napsal:Em dom., 10 de mai. de 2020 às 17:21, Pavel Stehule <
pavel.stehule@gmail.com> escreveu:Hi
I try to use procedures in Orafce package, and I did some easy
performance tests. I found some hard problems:1. test case
create or replace procedure p1(inout r int, inout v int) as $$
begin v := random() * r; end
$$ language plpgsql;This command requires
do $$
declare r int default 100; x int;
begin
for i in 1..300000 loop
call p1(r, x);
end loop;
end;
$$;about 2.2GB RAM and 10 sec.
I am having a consistent result of 3 secs, with a modified version
(exec_stmt_call) of your patch.
But my notebook is (Core 5, 8GB and SSD), could it be a difference
in the testing hardware?My notebook is old T520, and more I have a configured Postgres with
--enable-cassert option.The hardware is definitely making a difference, but if you have time
and don't mind testing it,
I can send you a patch, not that the modifications are a big deal, but
maybe they'll help.With more testing, I found that latency increases response time.
With 3 (secs) the test is with localhost.
With 6 (secs) the test is with tcp (local, not between pcs).Anyway, I would like to know if we have the number of parameters
previously, why use List instead of Arrays?
It would not be faster to create plpgsql variables.Why you check SPI_processed?
+ if (SPI_processed == 1) + { + if (!stmt->target) + elog(ERROR, "DO statement returned a row, query \"%s\"", expr->query); + } + else if (SPI_processed > 1) + elog(ERROR, "Procedure call returned more than one row, query \"%s\"", expr->query);CALL cannot to return rows, so these checks has not sense
Looking at the original file, this already done, from line 2351,
I just put all the tests together to, if applicable, get out quickly.
It's little bit messy. Is not good to mix bugfix and refactoring things
together
Show quoted text
regards,
Ranier Vilela
Em sáb., 16 de mai. de 2020 às 11:07, Pavel Stehule <pavel.stehule@gmail.com>
escreveu:
so 16. 5. 2020 v 15:24 odesílatel Ranier Vilela <ranier.vf@gmail.com>
napsal:Em sáb., 16 de mai. de 2020 às 09:35, Pavel Stehule <
pavel.stehule@gmail.com> escreveu:so 16. 5. 2020 v 13:40 odesílatel Ranier Vilela <ranier.vf@gmail.com>
napsal:Em sáb., 16 de mai. de 2020 às 01:10, Pavel Stehule <
pavel.stehule@gmail.com> escreveu:so 16. 5. 2020 v 5:55 odesílatel Ranier Vilela <ranier.vf@gmail.com>
napsal:Em sáb., 16 de mai. de 2020 às 00:07, Pavel Stehule <
pavel.stehule@gmail.com> escreveu:so 16. 5. 2020 v 0:34 odesílatel Ranier Vilela <ranier.vf@gmail.com>
napsal:Em dom., 10 de mai. de 2020 às 17:21, Pavel Stehule <
pavel.stehule@gmail.com> escreveu:Hi
I try to use procedures in Orafce package, and I did some easy
performance tests. I found some hard problems:1. test case
create or replace procedure p1(inout r int, inout v int) as $$
begin v := random() * r; end
$$ language plpgsql;This command requires
do $$
declare r int default 100; x int;
begin
for i in 1..300000 loop
call p1(r, x);
end loop;
end;
$$;about 2.2GB RAM and 10 sec.
I am having a consistent result of 3 secs, with a modified version
(exec_stmt_call) of your patch.
But my notebook is (Core 5, 8GB and SSD), could it be a difference
in the testing hardware?My notebook is old T520, and more I have a configured Postgres with
--enable-cassert option.The hardware is definitely making a difference, but if you have time
and don't mind testing it,
I can send you a patch, not that the modifications are a big deal,
but maybe they'll help.With more testing, I found that latency increases response time.
With 3 (secs) the test is with localhost.
With 6 (secs) the test is with tcp (local, not between pcs).Anyway, I would like to know if we have the number of parameters
previously, why use List instead of Arrays?
It would not be faster to create plpgsql variables.Why you check SPI_processed?
+ if (SPI_processed == 1) + { + if (!stmt->target) + elog(ERROR, "DO statement returned a row, query \"%s\"", expr->query); + } + else if (SPI_processed > 1) + elog(ERROR, "Procedure call returned more than one row, query \"%s\"", expr->query);CALL cannot to return rows, so these checks has not sense
Looking at the original file, this already done, from line 2351,
I just put all the tests together to, if applicable, get out quickly.It's little bit messy. Is not good to mix bugfix and refactoring things
together
Ok, I can understand that.
regards,
Ranier Vilela
On Sat, 16 May 2020 at 00:07, Pavel Stehule <pavel.stehule@gmail.com> wrote:
Hi
The problem is in plpgsql implementation of CALL statement
In non atomic case - case of using procedures from DO block, the expression plan is not cached, and plan is generating any time. This is reason why it is slow.
Unfortunately, generated plans are not released until SPI_finish. Attached patch fixed this issue.
But now, recursive calling doesn't work :-(. So this patch is not enough
Attached patch is working - all tests passed
Could you show an example testcase that tests this recursive scenario,
with which your earlier patch fails the test, and this v2 patch passes
it ? I am trying to understand the recursive scenario and the re-use
of expr->plan.
It doesn't solve performance, and doesn't solve all memory problems, but significantly reduce memory requirements from 5007 bytes to 439 bytes per one CALL
So now this patch's intention is to reduce memory consumption, and it
doesn't target slowness improvement, right ?
--
Thanks,
-Amit Khandekar
Huawei Technologies
st 10. 6. 2020 v 12:26 odesílatel Amit Khandekar <amitdkhan.pg@gmail.com>
napsal:
On Sat, 16 May 2020 at 00:07, Pavel Stehule <pavel.stehule@gmail.com>
wrote:Hi
The problem is in plpgsql implementation of CALL statement
In non atomic case - case of using procedures from DO block, the
expression plan is not cached, and plan is generating any time. This is
reason why it is slow.Unfortunately, generated plans are not released until SPI_finish.
Attached patch fixed this issue.
But now, recursive calling doesn't work :-(. So this patch is not enough
Attached patch is working - all tests passed
Could you show an example testcase that tests this recursive scenario,
with which your earlier patch fails the test, and this v2 patch passes
it ? I am trying to understand the recursive scenario and the re-use
of expr->plan.
it hangs on plpgsql tests. So you can apply first version of patch
and "make check"
It doesn't solve performance, and doesn't solve all memory problems, but
significantly reduce memory requirements from 5007 bytes to 439 bytes per
one CALLSo now this patch's intention is to reduce memory consumption, and it
doesn't target slowness improvement, right ?
yes. There is a problem with planning every execution when the procedure
was called from not top context.
Show quoted text
--
Thanks,
-Amit Khandekar
Huawei Technologies
On Wed, 10 Jun 2020 at 17:12, Pavel Stehule <pavel.stehule@gmail.com> wrote:
st 10. 6. 2020 v 12:26 odesílatel Amit Khandekar <amitdkhan.pg@gmail.com> napsal:
Could you show an example testcase that tests this recursive scenario,
with which your earlier patch fails the test, and this v2 patch passes
it ? I am trying to understand the recursive scenario and the re-use
of expr->plan.it hangs on plpgsql tests. So you can apply first version of patch
and "make check"
I could not reproduce the make check hang with the v1 patch. But I
could see a crash with the below testcase. So I understand the purpose
of the plan_owner variable that you introduced in v2.
Consider this recursive test :
create or replace procedure p1(in r int) as $$
begin
RAISE INFO 'r : % ', r;
if r < 3 then
call p1(r+1);
end if;
end
$$ language plpgsql;
do $$
declare r int default 1;
begin
call p1(r);
end;
$$;
In p1() with r=2, when the stmt "call p1(r+1)" is being executed,
consider this code of exec_stmt_call() with your v2 patch applied:
if (expr->plan && !expr->plan->saved)
{
if (plan_owner)
SPI_freeplan(expr->plan);
expr->plan = NULL;
}
Here, plan_owner is false. So SPI_freeplan() will not be called, and
expr->plan is set to NULL. Now I have observed that the stmt pointer
and expr pointer is shared between the p1() execution at this r=2
level and the p1() execution at r=1 level. So after the above code is
executed at r=2, when the upper level (r=1) exec_stmt_call() lands to
the same above code snippet, it gets the same expr pointer, but it's
expr->plan is already set to NULL without being freed. From this
logic, it looks like the plan won't get freed whenever the expr/stmt
pointers are shared across recursive levels, since expr->plan is set
to NULL at the lowermost level ? Basically, the handle to the plan is
lost so no one at the upper recursion level can explicitly free it
using SPI_freeplan(), right ? This looks the same as the main issue
where the plan does not get freed for non-recursive calls. I haven't
got a chance to check if we can develop a testcase for this, similar
to your testcase where the memory keeps on increasing.
-Amit
st 17. 6. 2020 v 7:52 odesílatel Amit Khandekar <amitdkhan.pg@gmail.com>
napsal:
On Wed, 10 Jun 2020 at 17:12, Pavel Stehule <pavel.stehule@gmail.com>
wrote:st 10. 6. 2020 v 12:26 odesílatel Amit Khandekar <amitdkhan.pg@gmail.com>
napsal:
Could you show an example testcase that tests this recursive scenario,
with which your earlier patch fails the test, and this v2 patch passes
it ? I am trying to understand the recursive scenario and the re-use
of expr->plan.it hangs on plpgsql tests. So you can apply first version of patch
and "make check"
I could not reproduce the make check hang with the v1 patch. But I
could see a crash with the below testcase. So I understand the purpose
of the plan_owner variable that you introduced in v2.Consider this recursive test :
create or replace procedure p1(in r int) as $$
begin
RAISE INFO 'r : % ', r;
if r < 3 then
call p1(r+1);
end if;
end
$$ language plpgsql;do $$
declare r int default 1;
begin
call p1(r);
end;
$$;In p1() with r=2, when the stmt "call p1(r+1)" is being executed,
consider this code of exec_stmt_call() with your v2 patch applied:
if (expr->plan && !expr->plan->saved)
{
if (plan_owner)
SPI_freeplan(expr->plan);
expr->plan = NULL;
}Here, plan_owner is false. So SPI_freeplan() will not be called, and
expr->plan is set to NULL. Now I have observed that the stmt pointer
and expr pointer is shared between the p1() execution at this r=2
level and the p1() execution at r=1 level. So after the above code is
executed at r=2, when the upper level (r=1) exec_stmt_call() lands to
the same above code snippet, it gets the same expr pointer, but it's
expr->plan is already set to NULL without being freed. From this
logic, it looks like the plan won't get freed whenever the expr/stmt
pointers are shared across recursive levels, since expr->plan is set
to NULL at the lowermost level ? Basically, the handle to the plan is
lost so no one at the upper recursion level can explicitly free it
using SPI_freeplan(), right ? This looks the same as the main issue
where the plan does not get freed for non-recursive calls. I haven't
got a chance to check if we can develop a testcase for this, similar
to your testcase where the memory keeps on increasing.
This is a good consideration.
I am sending updated patch
Pavel
Show quoted text
-Amit
Attachments:
plpgsql-call-fix-2.patchtext/x-patch; charset=US-ASCII; name=plpgsql-call-fix-2.patchDownload+23-3
On Wed, 17 Jun 2020 at 13:54, Pavel Stehule <pavel.stehule@gmail.com> wrote:
st 17. 6. 2020 v 7:52 odesílatel Amit Khandekar <amitdkhan.pg@gmail.com> napsal:
On Wed, 10 Jun 2020 at 17:12, Pavel Stehule <pavel.stehule@gmail.com> wrote:
st 10. 6. 2020 v 12:26 odesílatel Amit Khandekar <amitdkhan.pg@gmail.com> napsal:
Could you show an example testcase that tests this recursive scenario,
with which your earlier patch fails the test, and this v2 patch passes
it ? I am trying to understand the recursive scenario and the re-use
of expr->plan.it hangs on plpgsql tests. So you can apply first version of patch
and "make check"
I could not reproduce the make check hang with the v1 patch. But I
could see a crash with the below testcase. So I understand the purpose
of the plan_owner variable that you introduced in v2.Consider this recursive test :
create or replace procedure p1(in r int) as $$
begin
RAISE INFO 'r : % ', r;
if r < 3 then
call p1(r+1);
end if;
end
$$ language plpgsql;do $$
declare r int default 1;
begin
call p1(r);
end;
$$;In p1() with r=2, when the stmt "call p1(r+1)" is being executed,
consider this code of exec_stmt_call() with your v2 patch applied:
if (expr->plan && !expr->plan->saved)
{
if (plan_owner)
SPI_freeplan(expr->plan);
expr->plan = NULL;
}Here, plan_owner is false. So SPI_freeplan() will not be called, and
expr->plan is set to NULL. Now I have observed that the stmt pointer
and expr pointer is shared between the p1() execution at this r=2
level and the p1() execution at r=1 level. So after the above code is
executed at r=2, when the upper level (r=1) exec_stmt_call() lands to
the same above code snippet, it gets the same expr pointer, but it's
expr->plan is already set to NULL without being freed. From this
logic, it looks like the plan won't get freed whenever the expr/stmt
pointers are shared across recursive levels, since expr->plan is set
to NULL at the lowermost level ? Basically, the handle to the plan is
lost so no one at the upper recursion level can explicitly free it
using SPI_freeplan(), right ? This looks the same as the main issue
where the plan does not get freed for non-recursive calls. I haven't
got a chance to check if we can develop a testcase for this, similar
to your testcase where the memory keeps on increasing.This is a good consideration.
I am sending updated patch
Checked the latest patch. Looks like using a local plan rather than
expr->plan pointer for doing the checks does seem to resolve the issue
I raised. That made me think of another scenario :
Now we are checking for plan value and then null'ifying the expr->plan
value. What if expr->plan is different from plan ? Is it possible ? I
was thinking of such scenarios. But couldn't find one. As long as a
plan is always created with saved=true for all levels, or with
saved=false for all levels, we are ok. If we can have a mix of saved
and unsaved plans at different recursion levels, then expr->plan can
be different from the outer local plan because then the expr->plan
will not be set to NULL in the inner level, while the outer level may
have created its own plan. But I think a mix of saved and unsaved
plans are not possible. If you agree, then I think we should at least
have an assert that looks like :
if (plan && !plan->saved)
{
if (plan_owner)
SPI_freeplan(plan);
/* If expr->plan is present, it must be the same plan that we
allocated */
Assert ( !expr->plan || plan == expr->plan) );
expr->plan = NULL;
}
Other than this, I have no other issues. I understand that we have to
do this special handling only for this exec_stmt_call() because it is
only here that we call exec_prepare_plan() with keep_plan = false, so
doing special handling for freeing the plan seems to make sense.
čt 9. 7. 2020 v 8:28 odesílatel Amit Khandekar <amitdkhan.pg@gmail.com>
napsal:
On Wed, 17 Jun 2020 at 13:54, Pavel Stehule <pavel.stehule@gmail.com>
wrote:st 17. 6. 2020 v 7:52 odesílatel Amit Khandekar <amitdkhan.pg@gmail.com>
napsal:
On Wed, 10 Jun 2020 at 17:12, Pavel Stehule <pavel.stehule@gmail.com>
wrote:
st 10. 6. 2020 v 12:26 odesílatel Amit Khandekar <
amitdkhan.pg@gmail.com> napsal:
Could you show an example testcase that tests this recursive
scenario,
with which your earlier patch fails the test, and this v2 patch
passes
it ? I am trying to understand the recursive scenario and the re-use
of expr->plan.it hangs on plpgsql tests. So you can apply first version of patch
and "make check"
I could not reproduce the make check hang with the v1 patch. But I
could see a crash with the below testcase. So I understand the purpose
of the plan_owner variable that you introduced in v2.Consider this recursive test :
create or replace procedure p1(in r int) as $$
begin
RAISE INFO 'r : % ', r;
if r < 3 then
call p1(r+1);
end if;
end
$$ language plpgsql;do $$
declare r int default 1;
begin
call p1(r);
end;
$$;In p1() with r=2, when the stmt "call p1(r+1)" is being executed,
consider this code of exec_stmt_call() with your v2 patch applied:
if (expr->plan && !expr->plan->saved)
{
if (plan_owner)
SPI_freeplan(expr->plan);
expr->plan = NULL;
}Here, plan_owner is false. So SPI_freeplan() will not be called, and
expr->plan is set to NULL. Now I have observed that the stmt pointer
and expr pointer is shared between the p1() execution at this r=2
level and the p1() execution at r=1 level. So after the above code is
executed at r=2, when the upper level (r=1) exec_stmt_call() lands to
the same above code snippet, it gets the same expr pointer, but it's
expr->plan is already set to NULL without being freed. From this
logic, it looks like the plan won't get freed whenever the expr/stmt
pointers are shared across recursive levels, since expr->plan is set
to NULL at the lowermost level ? Basically, the handle to the plan is
lost so no one at the upper recursion level can explicitly free it
using SPI_freeplan(), right ? This looks the same as the main issue
where the plan does not get freed for non-recursive calls. I haven't
got a chance to check if we can develop a testcase for this, similar
to your testcase where the memory keeps on increasing.This is a good consideration.
I am sending updated patch
Checked the latest patch. Looks like using a local plan rather than
expr->plan pointer for doing the checks does seem to resolve the issue
I raised. That made me think of another scenario :Now we are checking for plan value and then null'ifying the expr->plan
value. What if expr->plan is different from plan ? Is it possible ? I
was thinking of such scenarios. But couldn't find one. As long as a
plan is always created with saved=true for all levels, or with
saved=false for all levels, we are ok. If we can have a mix of saved
and unsaved plans at different recursion levels, then expr->plan can
be different from the outer local plan because then the expr->plan
will not be set to NULL in the inner level, while the outer level may
have created its own plan. But I think a mix of saved and unsaved
plans are not possible. If you agree, then I think we should at least
have an assert that looks like :if (plan && !plan->saved)
{
if (plan_owner)
SPI_freeplan(plan);/* If expr->plan is present, it must be the same plan that we
allocated */
Assert ( !expr->plan || plan == expr->plan) );expr->plan = NULL;
}Other than this, I have no other issues. I understand that we have to
do this special handling only for this exec_stmt_call() because it is
only here that we call exec_prepare_plan() with keep_plan = false, so
doing special handling for freeing the plan seems to make sense.
attached patch with assert.
all regress tests passed. I think this short patch can be applied on older
releases as bugfix.
This weekend I'll try to check different strategy - try to save a plan and
release it at the end of the transaction.
Regards
Pavel