calling procedures is slow and consumes extra much memory against calling function

Started by Pavel Stehuleover 5 years ago28 messages

Pavel Stehule

pavel.stehule@gmail.com

over 5 years ago

I try to use procedures in Orafce package, and I did some easy performance
tests. I found some hard problems:

1. test case

create or replace procedure p1(inout r int, inout v int) as $$
begin v := random() * r; end
$$ language plpgsql;

This command requires

do $$
declare r int default 100; x int;
begin
for i in 1..300000 loop
call p1(r, x);
end loop;
end;
$$;

about 2.2GB RAM and 10 sec.

When I rewrite same to functions then

create or replace function p1func2(inout r int, inout v int) as $$
begin v := random() * r; end
$$ language plpgsql;

do $$
declare r int default 100; x int; re record;
begin
for i in 1..300000 loop
re := p1func2(r, x);
end loop;
end;
$$;

Then execution is about 1 sec, and memory requirements are +/- zero.

Minimally it looks so CALL statements has a memory issue.

Regards

Pavel

Pavel Stehule

pavel.stehule@gmail.com

over 5 years ago

In reply to: Pavel Stehule (#1)

1 attachment(s)

Re: calling procedures is slow and consumes extra much memory against calling function

ne 10. 5. 2020 v 22:20 odesílatel Pavel Stehule <pavel.stehule@gmail.com>
napsal:

Hi

I try to use procedures in Orafce package, and I did some easy performance
tests. I found some hard problems:

1. test case

create or replace procedure p1(inout r int, inout v int) as $$
begin v := random() * r; end
$$ language plpgsql;

This command requires

do $$
declare r int default 100; x int;
begin
for i in 1..300000 loop
call p1(r, x);
end loop;
end;
$$;

about 2.2GB RAM and 10 sec.

When I rewrite same to functions then

create or replace function p1func2(inout r int, inout v int) as $$
begin v := random() * r; end
$$ language plpgsql;

do $$
declare r int default 100; x int; re record;
begin
for i in 1..300000 loop
re := p1func2(r, x);
end loop;
end;
$$;

Then execution is about 1 sec, and memory requirements are +/- zero.

Minimally it looks so CALL statements has a memory issue.

The problem is in plpgsql implementation of CALL statement

In non atomic case - case of using procedures from DO block, the
expression plan is not cached, and plan is generating any time. This is
reason why it is slow.

Unfortunately, generated plans are not released until SPI_finish. Attached
patch fixed this issue.

Regards

Pavel

Show quoted text

Regards

Pavel

Attachments:

plpgsq-call-fix.patchtext/x-patch; charset=US-ASCII; name=plpgsq-call-fix.patchDownload

diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c
index aeb6c8fefc..bb08037a94 100644
--- a/src/pl/plpgsql/src/pl_exec.c
+++ b/src/pl/plpgsql/src/pl_exec.c
@@ -2309,13 +2309,19 @@ exec_stmt_call(PLpgSQL_execstate *estate, PLpgSQL_stmt_call *stmt)
 		 * could have been unset already, in case of a recursive call.
 		 */
 		if (expr->plan && !expr->plan->saved)
+		{
+			SPI_freeplan(expr->plan);
 			expr->plan = NULL;
+		}
 		PG_RE_THROW();
 	}
 	PG_END_TRY();
 
 	if (expr->plan && !expr->plan->saved)
+	{
+		SPI_freeplan(expr->plan);
 		expr->plan = NULL;
+	}
 
 	if (rc < 0)
 		elog(ERROR, "SPI_execute_plan_with_paramlist failed executing query \"%s\": %s",

Michael Paquier

michael@paquier.xyz

over 5 years ago

In reply to: Pavel Stehule (#1)

Re: calling procedures is slow and consumes extra much memory against calling function

On Sun, May 10, 2020 at 10:20:53PM +0200, Pavel Stehule wrote:

When I rewrite same to functions then

create or replace function p1func2(inout r int, inout v int) as $$
begin v := random() * r; end
$$ language plpgsql;

Then execution is about 1 sec, and memory requirements are +/- zero.

Minimally it looks so CALL statements has a memory issue.

Behavior not limited to plpgsql. A plain SQL function shows the same
leak patterns:
create or replace procedure p1_sql(in r int, in v int)
as $$ SELECT r + v; $$ language sql;
And I cannot get valgrind to complain about lost references, so this
looks like some missing memory context handling.

Also, I actually don't quite get why the context created by
CreateExprContext() cannot be freed before the procedure returns. A
short test shows no problems in calling FreeExprContext() at the end
of ExecuteCallStmt(), but that does not address everything. Perhaps a
lack of tests with pass-by-reference expressions and procedures?

Peter?
--
Michael

Pavel Stehule

pavel.stehule@gmail.com

over 5 years ago

In reply to: Pavel Stehule (#2)

Re: calling procedures is slow and consumes extra much memory against calling function

po 11. 5. 2020 v 7:25 odesílatel Pavel Stehule <pavel.stehule@gmail.com>
napsal:

Hi

ne 10. 5. 2020 v 22:20 odesílatel Pavel Stehule <pavel.stehule@gmail.com>
napsal:

Hi

I try to use procedures in Orafce package, and I did some easy
performance tests. I found some hard problems:

1. test case

create or replace procedure p1(inout r int, inout v int) as $$
begin v := random() * r; end
$$ language plpgsql;

This command requires

do $$
declare r int default 100; x int;
begin
for i in 1..300000 loop
call p1(r, x);
end loop;
end;
$$;

about 2.2GB RAM and 10 sec.

When I rewrite same to functions then

create or replace function p1func2(inout r int, inout v int) as $$
begin v := random() * r; end
$$ language plpgsql;

do $$
declare r int default 100; x int; re record;
begin
for i in 1..300000 loop
re := p1func2(r, x);
end loop;
end;
$$;

Then execution is about 1 sec, and memory requirements are +/- zero.

Minimally it looks so CALL statements has a memory issue.

The problem is in plpgsql implementation of CALL statement

In non atomic case - case of using procedures from DO block, the
expression plan is not cached, and plan is generating any time. This is
reason why it is slow.

Unfortunately, generated plans are not released until SPI_finish. Attached
patch fixed this issue.

But now, recursive calling doesn't work :-(. So this patch is not enough

Show quoted text

Regards

Pavel

Regards

Pavel

Pavel Stehule

pavel.stehule@gmail.com

over 5 years ago

In reply to: Pavel Stehule (#4)

1 attachment(s)

Re: calling procedures is slow and consumes extra much memory against calling function

The problem is in plpgsql implementation of CALL statement

In non atomic case - case of using procedures from DO block, the
expression plan is not cached, and plan is generating any time. This is
reason why it is slow.

Unfortunately, generated plans are not released until SPI_finish.
Attached patch fixed this issue.

But now, recursive calling doesn't work :-(. So this patch is not enough

Attached patch is working - all tests passed

It doesn't solve performance, and doesn't solve all memory problems, but
significantly reduce memory requirements from 5007 bytes to 439 bytes per
one CALL

Regards

Pavel

Show quoted text

Regards

Pavel

Regards

Pavel

Attachments:

plpgsql-call-fix.patchtext/x-patch; charset=US-ASCII; name=plpgsql-call-fix.patchDownload

diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c
index aeb6c8fefc..755387aab2 100644
--- a/src/pl/plpgsql/src/pl_exec.c
+++ b/src/pl/plpgsql/src/pl_exec.c
@@ -2143,6 +2143,7 @@ exec_stmt_call(PLpgSQL_execstate *estate, PLpgSQL_stmt_call *stmt)
 	LocalTransactionId after_lxid;
 	volatile bool pushed_active_snap = false;
 	volatile int rc;
+	bool plan_owner = false;
 
 	/* PG_TRY to ensure we clear the plan link, if needed, on failure */
 	PG_TRY();
@@ -2162,6 +2163,16 @@ exec_stmt_call(PLpgSQL_execstate *estate, PLpgSQL_stmt_call *stmt)
 			 */
 			exec_prepare_plan(estate, expr, 0, estate->atomic);
 
+			/*
+			 * Not saved plan should be explicitly released. Without it, any
+			 * not recursive usage of CALL statemenst leak plan in SPI memory.
+			 * The created plan can be reused when procedure is called recursively,
+			 * and releasing plan can be done only in recursion root call, when
+			 * expression has not assigned plan. Where a plan was created, then
+			 * there plan can be released.
+			 */
+			plan_owner = true;
+
 			/*
 			 * The procedure call could end transactions, which would upset
 			 * the snapshot management in SPI_execute*, so don't let it do it.
@@ -2315,7 +2326,11 @@ exec_stmt_call(PLpgSQL_execstate *estate, PLpgSQL_stmt_call *stmt)
 	PG_END_TRY();
 
 	if (expr->plan && !expr->plan->saved)
+	{
+		if (plan_owner)
+			SPI_freeplan(expr->plan);
 		expr->plan = NULL;
+	}
 
 	if (rc < 0)
 		elog(ERROR, "SPI_execute_plan_with_paramlist failed executing query \"%s\": %s",

Ranier Vilela

ranier.vf@gmail.com

over 5 years ago

In reply to: Pavel Stehule (#1)

Re: calling procedures is slow and consumes extra much memory against calling function

Em dom., 10 de mai. de 2020 às 17:21, Pavel Stehule <pavel.stehule@gmail.com>
escreveu:

Hi

I try to use procedures in Orafce package, and I did some easy performance
tests. I found some hard problems:

1. test case

create or replace procedure p1(inout r int, inout v int) as $$
begin v := random() * r; end
$$ language plpgsql;

This command requires

do $$
declare r int default 100; x int;
begin
for i in 1..300000 loop
call p1(r, x);
end loop;
end;
$$;

about 2.2GB RAM and 10 sec.

I am having a consistent result of 3 secs, with a modified version
(exec_stmt_call) of your patch.
But my notebook is (Core 5, 8GB and SSD), could it be a difference in the
testing hardware?

regards,
Ranier Vilela

Pavel Stehule

pavel.stehule@gmail.com

over 5 years ago

In reply to: Ranier Vilela (#6)

Re: calling procedures is slow and consumes extra much memory against calling function

so 16. 5. 2020 v 0:34 odesílatel Ranier Vilela <ranier.vf@gmail.com> napsal:

Em dom., 10 de mai. de 2020 às 17:21, Pavel Stehule <
pavel.stehule@gmail.com> escreveu:

Hi

I try to use procedures in Orafce package, and I did some easy
performance tests. I found some hard problems:

1. test case

create or replace procedure p1(inout r int, inout v int) as $$
begin v := random() * r; end
$$ language plpgsql;

This command requires

do $$
declare r int default 100; x int;
begin
for i in 1..300000 loop
call p1(r, x);
end loop;
end;
$$;

about 2.2GB RAM and 10 sec.

I am having a consistent result of 3 secs, with a modified version
(exec_stmt_call) of your patch.
But my notebook is (Core 5, 8GB and SSD), could it be a difference in the
testing hardware?

My notebook is old T520, and more I have a configured Postgres with
--enable-cassert option.

regards

Pavel

Show quoted text

regards,
Ranier Vilela

Ranier Vilela

ranier.vf@gmail.com

over 5 years ago

In reply to: Pavel Stehule (#7)

Re: calling procedures is slow and consumes extra much memory against calling function

Em sáb., 16 de mai. de 2020 às 00:07, Pavel Stehule <pavel.stehule@gmail.com>
escreveu:

so 16. 5. 2020 v 0:34 odesílatel Ranier Vilela <ranier.vf@gmail.com>
napsal:

Em dom., 10 de mai. de 2020 às 17:21, Pavel Stehule <
pavel.stehule@gmail.com> escreveu:

Hi

I try to use procedures in Orafce package, and I did some easy
performance tests. I found some hard problems:

1. test case

create or replace procedure p1(inout r int, inout v int) as $$
begin v := random() * r; end
$$ language plpgsql;

This command requires

do $$
declare r int default 100; x int;
begin
for i in 1..300000 loop
call p1(r, x);
end loop;
end;
$$;

about 2.2GB RAM and 10 sec.

I am having a consistent result of 3 secs, with a modified version
(exec_stmt_call) of your patch.
But my notebook is (Core 5, 8GB and SSD), could it be a difference in the
testing hardware?

My notebook is old T520, and more I have a configured Postgres with
--enable-cassert option.

The hardware is definitely making a difference, but if you have time and
don't mind testing it,
I can send you a patch, not that the modifications are a big deal, but
maybe they'll help.

regards,
Ranier Vilela

Pavel Stehule

pavel.stehule@gmail.com

over 5 years ago

In reply to: Ranier Vilela (#8)

Re: calling procedures is slow and consumes extra much memory against calling function

so 16. 5. 2020 v 5:55 odesílatel Ranier Vilela <ranier.vf@gmail.com> napsal:

Em sáb., 16 de mai. de 2020 às 00:07, Pavel Stehule <
pavel.stehule@gmail.com> escreveu:

so 16. 5. 2020 v 0:34 odesílatel Ranier Vilela <ranier.vf@gmail.com>
napsal:

Em dom., 10 de mai. de 2020 às 17:21, Pavel Stehule <
pavel.stehule@gmail.com> escreveu:

Hi

I try to use procedures in Orafce package, and I did some easy
performance tests. I found some hard problems:

1. test case

create or replace procedure p1(inout r int, inout v int) as $$
begin v := random() * r; end
$$ language plpgsql;

This command requires

do $$
declare r int default 100; x int;
begin
for i in 1..300000 loop
call p1(r, x);
end loop;
end;
$$;

about 2.2GB RAM and 10 sec.

I am having a consistent result of 3 secs, with a modified version
(exec_stmt_call) of your patch.
But my notebook is (Core 5, 8GB and SSD), could it be a difference in
the testing hardware?

My notebook is old T520, and more I have a configured Postgres with
--enable-cassert option.

The hardware is definitely making a difference, but if you have time and
don't mind testing it,
I can send you a patch, not that the modifications are a big deal, but
maybe they'll help.

send me a patch, please

Pavel

Show quoted text

regards,
Ranier Vilela

#10

Ranier Vilela

ranier.vf@gmail.com

over 5 years ago

In reply to: Pavel Stehule (#9)

1 attachment(s)

Re: calling procedures is slow and consumes extra much memory against calling function

Em sáb., 16 de mai. de 2020 às 01:10, Pavel Stehule <pavel.stehule@gmail.com>
escreveu:

so 16. 5. 2020 v 5:55 odesílatel Ranier Vilela <ranier.vf@gmail.com>
napsal:

Em sáb., 16 de mai. de 2020 às 00:07, Pavel Stehule <
pavel.stehule@gmail.com> escreveu:

so 16. 5. 2020 v 0:34 odesílatel Ranier Vilela <ranier.vf@gmail.com>
napsal:

Em dom., 10 de mai. de 2020 às 17:21, Pavel Stehule <
pavel.stehule@gmail.com> escreveu:

Hi

I try to use procedures in Orafce package, and I did some easy
performance tests. I found some hard problems:

1. test case

create or replace procedure p1(inout r int, inout v int) as $$
begin v := random() * r; end
$$ language plpgsql;

This command requires

do $$
declare r int default 100; x int;
begin
for i in 1..300000 loop
call p1(r, x);
end loop;
end;
$$;

about 2.2GB RAM and 10 sec.

I am having a consistent result of 3 secs, with a modified version
(exec_stmt_call) of your patch.
But my notebook is (Core 5, 8GB and SSD), could it be a difference in
the testing hardware?

My notebook is old T520, and more I have a configured Postgres with
--enable-cassert option.

The hardware is definitely making a difference, but if you have time and
don't mind testing it,
I can send you a patch, not that the modifications are a big deal, but
maybe they'll help.

With more testing, I found that latency increases response time.

With 3 (secs) the test is with localhost.
With 6 (secs) the test is with tcp (local, not between pcs).

Anyway, I would like to know if we have the number of parameters
previously, why use List instead of Arrays?
It would not be faster to create plpgsql variables.

regards,
Ranier Vilela

Attachments:

pl_exec_test.patchapplication/octet-stream; name=pl_exec_test.patchDownload

diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c
index aeb6c8fefc..ce34438e0c 100644
--- a/src/pl/plpgsql/src/pl_exec.c
+++ b/src/pl/plpgsql/src/pl_exec.c
@@ -2142,13 +2142,15 @@ exec_stmt_call(PLpgSQL_execstate *estate, PLpgSQL_stmt_call *stmt)
 	volatile LocalTransactionId before_lxid;
 	LocalTransactionId after_lxid;
 	volatile bool pushed_active_snap = false;
-	volatile int rc;
+	volatile int rc = -1;
+	bool plan_owner = false;
 
 	/* PG_TRY to ensure we clear the plan link, if needed, on failure */
 	PG_TRY();
 	{
-		SPIPlanPtr	plan = expr->plan;
+		SPIPlanPtr plan = expr->plan;
 		ParamListInfo paramLI;
+		Node	   *node;
 
 		if (plan == NULL)
 		{
@@ -2162,6 +2164,16 @@ exec_stmt_call(PLpgSQL_execstate *estate, PLpgSQL_stmt_call *stmt)
 			 */
 			exec_prepare_plan(estate, expr, 0, estate->atomic);
 
+			/*
+			 * Not saved plan should be explicitly released. Without it, any
+			 * not recursive usage of CALL statemenst leak plan in SPI memory.
+			 * The created plan can be reused when procedure is called recursively,
+			 * and releasing plan can be done only in recursion root call, when
+			 * expression has not assigned plan. Where a plan was created, then
+			 * there plan can be released.
+			 */
+			plan_owner = true;
+
 			/*
 			 * The procedure call could end transactions, which would upset
 			 * the snapshot management in SPI_execute*, so don't let it do it.
@@ -2184,18 +2196,17 @@ exec_stmt_call(PLpgSQL_execstate *estate, PLpgSQL_stmt_call *stmt)
 		 */
 		if (stmt->is_call && stmt->target == NULL)
 		{
-			Node	   *node;
-			FuncExpr   *funcexpr;
-			HeapTuple	func_tuple;
-			List	   *funcargs;
-			Oid		   *argtypes;
-			char	  **argnames;
-			char	   *argmodes;
-			MemoryContext oldcontext;
-			PLpgSQL_row *row;
-			int			nfields;
-			int			i;
-			ListCell   *lc;
+			HeapTuple	      func_tuple;
+			MemoryContext     oldcontext;
+			const FuncExpr   *funcexpr;
+			const List	     *funcargs;
+			const Oid		 *argtypes;
+			const char	    **argnames;
+			const char	     *argmodes;
+			const ListCell   *lc;
+			PLpgSQL_row      *row;
+			int			      nfields;
+			int			      i;
 
 			/*
 			 * Get the parsed CallStmt, and look up the called procedure
@@ -2203,14 +2214,14 @@ exec_stmt_call(PLpgSQL_execstate *estate, PLpgSQL_stmt_call *stmt)
 			node = linitial_node(Query,
 								 ((CachedPlanSource *) linitial(plan->plancache_list))->query_list)->utilityStmt;
 			if (node == NULL || !IsA(node, CallStmt))
-				elog(ERROR, "query for CALL statement is not a CallStmt");
+				elog(ERROR, "CALL statement is not a CallStmt, query \"%s\"", expr->query);
 
 			funcexpr = ((CallStmt *) node)->funcexpr;
 
 			func_tuple = SearchSysCache1(PROCOID,
 										 ObjectIdGetDatum(funcexpr->funcid));
 			if (!HeapTupleIsValid(func_tuple))
-				elog(ERROR, "cache lookup failed for function %u",
+				elog(ERROR, "Cache lookup failed for function %u",
 					 funcexpr->funcid);
 
 			/*
@@ -2249,15 +2260,15 @@ exec_stmt_call(PLpgSQL_execstate *estate, PLpgSQL_stmt_call *stmt)
 			i = 0;
 			foreach(lc, funcargs)
 			{
-				Node	   *n = lfirst(lc);
+				node = lfirst(lc);
 
 				if (argmodes &&
 					(argmodes[i] == PROARGMODE_INOUT ||
 					 argmodes[i] == PROARGMODE_OUT))
 				{
-					if (IsA(n, Param))
+					if (IsA(node, Param))
 					{
-						Param	   *param = (Param *) n;
+						const Param	*param = (Param *) node;
 
 						/* paramid is offset by 1 (see make_datum_param()) */
 						row->varnos[nfields++] = param->paramid - 1;
@@ -2277,7 +2288,7 @@ exec_stmt_call(PLpgSQL_execstate *estate, PLpgSQL_stmt_call *stmt)
 											i + 1)));
 					}
 				}
-				i++;
+				++i;
 			}
 
 			row->nfields = nfields;
@@ -2314,12 +2325,23 @@ exec_stmt_call(PLpgSQL_execstate *estate, PLpgSQL_stmt_call *stmt)
 	}
 	PG_END_TRY();
 
-	if (expr->plan && !expr->plan->saved)
-		expr->plan = NULL;
-
 	if (rc < 0)
 		elog(ERROR, "SPI_execute_plan_with_paramlist failed executing query \"%s\": %s",
 			 expr->query, SPI_result_code_string(rc));
+	if (SPI_processed == 1)
+	{
+		if (!stmt->target)
+			elog(ERROR, "DO statement returned a row, query \"%s\"", expr->query);
+	}
+	else if (SPI_processed > 1)
+		elog(ERROR, "Procedure call returned more than one row, query \"%s\"", expr->query);
+
+	if (expr->plan && !expr->plan->saved)
+	{
+		if (plan_owner)
+			SPI_freeplan(expr->plan);
+		expr->plan = NULL;
+	}
 
 	after_lxid = MyProc->lxid;
 
@@ -2339,9 +2361,12 @@ exec_stmt_call(PLpgSQL_execstate *estate, PLpgSQL_stmt_call *stmt)
 		 * If we are in a new transaction after the call, we need to build new
 		 * simple-expression infrastructure.
 		 */
-		estate->simple_eval_estate = NULL;
-		estate->simple_eval_resowner = NULL;
-		plpgsql_create_econtext(estate);
+	    if (SPI_processed == 1)
+		{
+	        estate->simple_eval_estate = NULL;
+	        estate->simple_eval_resowner = NULL;
+	        plpgsql_create_econtext(estate);
+	    }
 	}
 
 	/*
@@ -2350,15 +2375,10 @@ exec_stmt_call(PLpgSQL_execstate *estate, PLpgSQL_stmt_call *stmt)
 	 */
 	if (SPI_processed == 1)
 	{
-		SPITupleTable *tuptab = SPI_tuptable;
-
-		if (!stmt->target)
-			elog(ERROR, "DO statement returned a row");
+		const SPITupleTable *tuptab = SPI_tuptable;
 
 		exec_move_row(estate, stmt->target, tuptab->vals[0], tuptab->tupdesc);
 	}
-	else if (SPI_processed > 1)
-		elog(ERROR, "procedure call returned more than one row");
 
 	exec_eval_cleanup(estate);
 	SPI_freetuptable(SPI_tuptable);

#11

Pavel Stehule

pavel.stehule@gmail.com

over 5 years ago

In reply to: Ranier Vilela (#10)

Re: calling procedures is slow and consumes extra much memory against calling function

so 16. 5. 2020 v 13:40 odesílatel Ranier Vilela <ranier.vf@gmail.com>
napsal:

Em sáb., 16 de mai. de 2020 às 01:10, Pavel Stehule <
pavel.stehule@gmail.com> escreveu:

so 16. 5. 2020 v 5:55 odesílatel Ranier Vilela <ranier.vf@gmail.com>
napsal:

Em sáb., 16 de mai. de 2020 às 00:07, Pavel Stehule <
pavel.stehule@gmail.com> escreveu:

so 16. 5. 2020 v 0:34 odesílatel Ranier Vilela <ranier.vf@gmail.com>
napsal:

Em dom., 10 de mai. de 2020 às 17:21, Pavel Stehule <
pavel.stehule@gmail.com> escreveu:

Hi

I try to use procedures in Orafce package, and I did some easy
performance tests. I found some hard problems:

1. test case

create or replace procedure p1(inout r int, inout v int) as $$
begin v := random() * r; end
$$ language plpgsql;

This command requires

do $$
declare r int default 100; x int;
begin
for i in 1..300000 loop
call p1(r, x);
end loop;
end;
$$;

about 2.2GB RAM and 10 sec.

I am having a consistent result of 3 secs, with a modified version
(exec_stmt_call) of your patch.
But my notebook is (Core 5, 8GB and SSD), could it be a difference in
the testing hardware?

My notebook is old T520, and more I have a configured Postgres with
--enable-cassert option.

The hardware is definitely making a difference, but if you have time and
don't mind testing it,
I can send you a patch, not that the modifications are a big deal, but
maybe they'll help.

With more testing, I found that latency increases response time.

With 3 (secs) the test is with localhost.
With 6 (secs) the test is with tcp (local, not between pcs).

Anyway, I would like to know if we have the number of parameters
previously, why use List instead of Arrays?
It would not be faster to create plpgsql variables.

Why you check SPI_processed?

+ if (SPI_processed == 1)
+ {
+ if (!stmt->target)
+ elog(ERROR, "DO statement returned a row, query \"%s\"", expr->query);
+ }
+ else if (SPI_processed > 1)
+ elog(ERROR, "Procedure call returned more than one row, query \"%s\"",
expr->query);

CALL cannot to return rows, so these checks has not sense

Show quoted text

regards,
Ranier Vilela

#12

Ranier Vilela

ranier.vf@gmail.com

over 5 years ago

In reply to: Pavel Stehule (#11)

Re: calling procedures is slow and consumes extra much memory against calling function

Em sáb., 16 de mai. de 2020 às 09:35, Pavel Stehule <pavel.stehule@gmail.com>
escreveu:

so 16. 5. 2020 v 13:40 odesílatel Ranier Vilela <ranier.vf@gmail.com>
napsal:

Em sáb., 16 de mai. de 2020 às 01:10, Pavel Stehule <
pavel.stehule@gmail.com> escreveu:

so 16. 5. 2020 v 5:55 odesílatel Ranier Vilela <ranier.vf@gmail.com>
napsal:

Em sáb., 16 de mai. de 2020 às 00:07, Pavel Stehule <
pavel.stehule@gmail.com> escreveu:

so 16. 5. 2020 v 0:34 odesílatel Ranier Vilela <ranier.vf@gmail.com>
napsal:

Em dom., 10 de mai. de 2020 às 17:21, Pavel Stehule <
pavel.stehule@gmail.com> escreveu:

Hi

I try to use procedures in Orafce package, and I did some easy
performance tests. I found some hard problems:

1. test case

create or replace procedure p1(inout r int, inout v int) as $$
begin v := random() * r; end
$$ language plpgsql;

This command requires

do $$
declare r int default 100; x int;
begin
for i in 1..300000 loop
call p1(r, x);
end loop;
end;
$$;

about 2.2GB RAM and 10 sec.

I am having a consistent result of 3 secs, with a modified version
(exec_stmt_call) of your patch.
But my notebook is (Core 5, 8GB and SSD), could it be a difference in
the testing hardware?

My notebook is old T520, and more I have a configured Postgres with
--enable-cassert option.

The hardware is definitely making a difference, but if you have time
and don't mind testing it,
I can send you a patch, not that the modifications are a big deal, but
maybe they'll help.

With more testing, I found that latency increases response time.

With 3 (secs) the test is with localhost.
With 6 (secs) the test is with tcp (local, not between pcs).

Anyway, I would like to know if we have the number of parameters
previously, why use List instead of Arrays?
It would not be faster to create plpgsql variables.

Why you check SPI_processed?
+ if (SPI_processed == 1)
+ {
+ if (!stmt->target)
+ elog(ERROR, "DO statement returned a row, query \"%s\"", expr->query);
+ }
+ else if (SPI_processed > 1)
+ elog(ERROR, "Procedure call returned more than one row, query \"%s\"",
expr->query);
CALL cannot to return rows, so these checks has not sense

Looking at the original file, this already done, from line 2351,
I just put all the tests together to, if applicable, get out quickly.

regards,
Ranier Vilela

#13

Pavel Stehule

pavel.stehule@gmail.com

over 5 years ago

In reply to: Ranier Vilela (#12)

Re: calling procedures is slow and consumes extra much memory against calling function

so 16. 5. 2020 v 15:24 odesílatel Ranier Vilela <ranier.vf@gmail.com>
napsal:

Em sáb., 16 de mai. de 2020 às 09:35, Pavel Stehule <
pavel.stehule@gmail.com> escreveu:
so 16. 5. 2020 v 13:40 odesílatel Ranier Vilela <ranier.vf@gmail.com>
napsal:

Em sáb., 16 de mai. de 2020 às 01:10, Pavel Stehule <
pavel.stehule@gmail.com> escreveu:

so 16. 5. 2020 v 5:55 odesílatel Ranier Vilela <ranier.vf@gmail.com>
napsal:

Em sáb., 16 de mai. de 2020 às 00:07, Pavel Stehule <
pavel.stehule@gmail.com> escreveu:

so 16. 5. 2020 v 0:34 odesílatel Ranier Vilela <ranier.vf@gmail.com>
napsal:

Em dom., 10 de mai. de 2020 às 17:21, Pavel Stehule <
pavel.stehule@gmail.com> escreveu:

Hi

I try to use procedures in Orafce package, and I did some easy
performance tests. I found some hard problems:

1. test case

create or replace procedure p1(inout r int, inout v int) as $$
begin v := random() * r; end
$$ language plpgsql;

This command requires

do $$
declare r int default 100; x int;
begin
for i in 1..300000 loop
call p1(r, x);
end loop;
end;
$$;

about 2.2GB RAM and 10 sec.

I am having a consistent result of 3 secs, with a modified version
(exec_stmt_call) of your patch.
But my notebook is (Core 5, 8GB and SSD), could it be a difference
in the testing hardware?

My notebook is old T520, and more I have a configured Postgres with
--enable-cassert option.

The hardware is definitely making a difference, but if you have time
and don't mind testing it,
I can send you a patch, not that the modifications are a big deal, but
maybe they'll help.

With more testing, I found that latency increases response time.

With 3 (secs) the test is with localhost.
With 6 (secs) the test is with tcp (local, not between pcs).

Anyway, I would like to know if we have the number of parameters
previously, why use List instead of Arrays?
It would not be faster to create plpgsql variables.

Why you check SPI_processed?
+ if (SPI_processed == 1)
+ {
+ if (!stmt->target)
+ elog(ERROR, "DO statement returned a row, query \"%s\"", expr->query);
+ }
+ else if (SPI_processed > 1)
+ elog(ERROR, "Procedure call returned more than one row, query \"%s\"",
expr->query);
CALL cannot to return rows, so these checks has not sense
Looking at the original file, this already done, from line 2351,
I just put all the tests together to, if applicable, get out quickly.

It's little bit messy. Is not good to mix bugfix and refactoring things
together

Show quoted text

regards,
Ranier Vilela

#14

Ranier Vilela

ranier.vf@gmail.com

over 5 years ago

In reply to: Pavel Stehule (#13)

Re: calling procedures is slow and consumes extra much memory against calling function

Em sáb., 16 de mai. de 2020 às 11:07, Pavel Stehule <pavel.stehule@gmail.com>
escreveu:

so 16. 5. 2020 v 15:24 odesílatel Ranier Vilela <ranier.vf@gmail.com>
napsal:
Em sáb., 16 de mai. de 2020 às 09:35, Pavel Stehule <
pavel.stehule@gmail.com> escreveu:
so 16. 5. 2020 v 13:40 odesílatel Ranier Vilela <ranier.vf@gmail.com>
napsal:

Em sáb., 16 de mai. de 2020 às 01:10, Pavel Stehule <
pavel.stehule@gmail.com> escreveu:

so 16. 5. 2020 v 5:55 odesílatel Ranier Vilela <ranier.vf@gmail.com>
napsal:

Em sáb., 16 de mai. de 2020 às 00:07, Pavel Stehule <
pavel.stehule@gmail.com> escreveu:

so 16. 5. 2020 v 0:34 odesílatel Ranier Vilela <ranier.vf@gmail.com>
napsal:

Em dom., 10 de mai. de 2020 às 17:21, Pavel Stehule <
pavel.stehule@gmail.com> escreveu:

Hi

I try to use procedures in Orafce package, and I did some easy
performance tests. I found some hard problems:

1. test case

create or replace procedure p1(inout r int, inout v int) as $$
begin v := random() * r; end
$$ language plpgsql;

This command requires

do $$
declare r int default 100; x int;
begin
for i in 1..300000 loop
call p1(r, x);
end loop;
end;
$$;

about 2.2GB RAM and 10 sec.

I am having a consistent result of 3 secs, with a modified version
(exec_stmt_call) of your patch.
But my notebook is (Core 5, 8GB and SSD), could it be a difference
in the testing hardware?

My notebook is old T520, and more I have a configured Postgres with
--enable-cassert option.

The hardware is definitely making a difference, but if you have time
and don't mind testing it,
I can send you a patch, not that the modifications are a big deal,
but maybe they'll help.

With more testing, I found that latency increases response time.

With 3 (secs) the test is with localhost.
With 6 (secs) the test is with tcp (local, not between pcs).

Anyway, I would like to know if we have the number of parameters
previously, why use List instead of Arrays?
It would not be faster to create plpgsql variables.

Why you check SPI_processed?
+ if (SPI_processed == 1)
+ {
+ if (!stmt->target)
+ elog(ERROR, "DO statement returned a row, query \"%s\"", expr->query);
+ }
+ else if (SPI_processed > 1)
+ elog(ERROR, "Procedure call returned more than one row, query \"%s\"",
expr->query);
CALL cannot to return rows, so these checks has not sense
Looking at the original file, this already done, from line 2351,
I just put all the tests together to, if applicable, get out quickly.
It's little bit messy. Is not good to mix bugfix and refactoring things
together

Ok, I can understand that.

regards,
Ranier Vilela

#15

Amit Khandekar

amitdkhan.pg@gmail.com

over 5 years ago

In reply to: Pavel Stehule (#5)

Re: calling procedures is slow and consumes extra much memory against calling function

On Sat, 16 May 2020 at 00:07, Pavel Stehule <pavel.stehule@gmail.com> wrote:

Hi

The problem is in plpgsql implementation of CALL statement

In non atomic case - case of using procedures from DO block, the expression plan is not cached, and plan is generating any time. This is reason why it is slow.

Unfortunately, generated plans are not released until SPI_finish. Attached patch fixed this issue.

But now, recursive calling doesn't work :-(. So this patch is not enough

Attached patch is working - all tests passed

Could you show an example testcase that tests this recursive scenario,
with which your earlier patch fails the test, and this v2 patch passes
it ? I am trying to understand the recursive scenario and the re-use
of expr->plan.

It doesn't solve performance, and doesn't solve all memory problems, but significantly reduce memory requirements from 5007 bytes to 439 bytes per one CALL

So now this patch's intention is to reduce memory consumption, and it
doesn't target slowness improvement, right ?

--
Thanks,
-Amit Khandekar
Huawei Technologies

#16

Pavel Stehule

pavel.stehule@gmail.com

over 5 years ago

In reply to: Amit Khandekar (#15)

Re: calling procedures is slow and consumes extra much memory against calling function

st 10. 6. 2020 v 12:26 odesílatel Amit Khandekar <amitdkhan.pg@gmail.com>
napsal:

On Sat, 16 May 2020 at 00:07, Pavel Stehule <pavel.stehule@gmail.com>
wrote:

Hi

The problem is in plpgsql implementation of CALL statement

In non atomic case - case of using procedures from DO block, the

expression plan is not cached, and plan is generating any time. This is
reason why it is slow.

Unfortunately, generated plans are not released until SPI_finish.

Attached patch fixed this issue.

But now, recursive calling doesn't work :-(. So this patch is not enough

Attached patch is working - all tests passed

Could you show an example testcase that tests this recursive scenario,
with which your earlier patch fails the test, and this v2 patch passes
it ? I am trying to understand the recursive scenario and the re-use
of expr->plan.

it hangs on plpgsql tests. So you can apply first version of patch

and "make check"

It doesn't solve performance, and doesn't solve all memory problems, but

significantly reduce memory requirements from 5007 bytes to 439 bytes per
one CALL

So now this patch's intention is to reduce memory consumption, and it
doesn't target slowness improvement, right ?

yes. There is a problem with planning every execution when the procedure
was called from not top context.

Show quoted text

--
Thanks,
-Amit Khandekar
Huawei Technologies

#17

Amit Khandekar

amitdkhan.pg@gmail.com

over 5 years ago

In reply to: Pavel Stehule (#16)

Re: calling procedures is slow and consumes extra much memory against calling function

On Wed, 10 Jun 2020 at 17:12, Pavel Stehule <pavel.stehule@gmail.com> wrote:

st 10. 6. 2020 v 12:26 odesílatel Amit Khandekar <amitdkhan.pg@gmail.com> napsal:

Could you show an example testcase that tests this recursive scenario,
with which your earlier patch fails the test, and this v2 patch passes
it ? I am trying to understand the recursive scenario and the re-use
of expr->plan.

it hangs on plpgsql tests. So you can apply first version of patch

and "make check"

I could not reproduce the make check hang with the v1 patch. But I
could see a crash with the below testcase. So I understand the purpose
of the plan_owner variable that you introduced in v2.

Consider this recursive test :

create or replace procedure p1(in r int) as $$
begin
RAISE INFO 'r : % ', r;
if r < 3 then
call p1(r+1);
end if;
end
$$ language plpgsql;

do $$
declare r int default 1;
begin
call p1(r);
end;
$$;

In p1() with r=2, when the stmt "call p1(r+1)" is being executed,
consider this code of exec_stmt_call() with your v2 patch applied:
if (expr->plan && !expr->plan->saved)
{
if (plan_owner)
SPI_freeplan(expr->plan);
expr->plan = NULL;
}

Here, plan_owner is false. So SPI_freeplan() will not be called, and
expr->plan is set to NULL. Now I have observed that the stmt pointer
and expr pointer is shared between the p1() execution at this r=2
level and the p1() execution at r=1 level. So after the above code is
executed at r=2, when the upper level (r=1) exec_stmt_call() lands to
the same above code snippet, it gets the same expr pointer, but it's
expr->plan is already set to NULL without being freed. From this
logic, it looks like the plan won't get freed whenever the expr/stmt
pointers are shared across recursive levels, since expr->plan is set
to NULL at the lowermost level ? Basically, the handle to the plan is
lost so no one at the upper recursion level can explicitly free it
using SPI_freeplan(), right ? This looks the same as the main issue
where the plan does not get freed for non-recursive calls. I haven't
got a chance to check if we can develop a testcase for this, similar
to your testcase where the memory keeps on increasing.

-Amit

#18

Pavel Stehule

pavel.stehule@gmail.com

over 5 years ago

In reply to: Amit Khandekar (#17)

1 attachment(s)

Re: calling procedures is slow and consumes extra much memory against calling function

st 17. 6. 2020 v 7:52 odesílatel Amit Khandekar <amitdkhan.pg@gmail.com>
napsal:

On Wed, 10 Jun 2020 at 17:12, Pavel Stehule <pavel.stehule@gmail.com>
wrote:

st 10. 6. 2020 v 12:26 odesílatel Amit Khandekar <amitdkhan.pg@gmail.com>

napsal:

Could you show an example testcase that tests this recursive scenario,
with which your earlier patch fails the test, and this v2 patch passes
it ? I am trying to understand the recursive scenario and the re-use
of expr->plan.

it hangs on plpgsql tests. So you can apply first version of patch

and "make check"

I could not reproduce the make check hang with the v1 patch. But I
could see a crash with the below testcase. So I understand the purpose
of the plan_owner variable that you introduced in v2.

Consider this recursive test :

create or replace procedure p1(in r int) as $$
begin
RAISE INFO 'r : % ', r;
if r < 3 then
call p1(r+1);
end if;
end
$$ language plpgsql;

do $$
declare r int default 1;
begin
call p1(r);
end;
$$;

In p1() with r=2, when the stmt "call p1(r+1)" is being executed,
consider this code of exec_stmt_call() with your v2 patch applied:
if (expr->plan && !expr->plan->saved)
{
if (plan_owner)
SPI_freeplan(expr->plan);
expr->plan = NULL;
}

Here, plan_owner is false. So SPI_freeplan() will not be called, and
expr->plan is set to NULL. Now I have observed that the stmt pointer
and expr pointer is shared between the p1() execution at this r=2
level and the p1() execution at r=1 level. So after the above code is
executed at r=2, when the upper level (r=1) exec_stmt_call() lands to
the same above code snippet, it gets the same expr pointer, but it's
expr->plan is already set to NULL without being freed. From this
logic, it looks like the plan won't get freed whenever the expr/stmt
pointers are shared across recursive levels, since expr->plan is set
to NULL at the lowermost level ? Basically, the handle to the plan is
lost so no one at the upper recursion level can explicitly free it
using SPI_freeplan(), right ? This looks the same as the main issue
where the plan does not get freed for non-recursive calls. I haven't
got a chance to check if we can develop a testcase for this, similar
to your testcase where the memory keeps on increasing.

This is a good consideration.

I am sending updated patch

Pavel

Show quoted text

-Amit

Attachments:

plpgsql-call-fix-2.patchtext/x-patch; charset=US-ASCII; name=plpgsql-call-fix-2.patchDownload

diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c
index f41d675d65..88b8c7882a 100644
--- a/src/pl/plpgsql/src/pl_exec.c
+++ b/src/pl/plpgsql/src/pl_exec.c
@@ -2132,15 +2132,16 @@ static int
 exec_stmt_call(PLpgSQL_execstate *estate, PLpgSQL_stmt_call *stmt)
 {
 	PLpgSQL_expr *expr = stmt->expr;
+	volatile SPIPlanPtr	plan = expr->plan;
 	volatile LocalTransactionId before_lxid;
 	LocalTransactionId after_lxid;
 	volatile bool pushed_active_snap = false;
 	volatile int rc;
+	volatile bool plan_owner = false;
 
 	/* PG_TRY to ensure we clear the plan link, if needed, on failure */
 	PG_TRY();
 	{
-		SPIPlanPtr	plan = expr->plan;
 		ParamListInfo paramLI;
 
 		if (plan == NULL)
@@ -2155,6 +2156,16 @@ exec_stmt_call(PLpgSQL_execstate *estate, PLpgSQL_stmt_call *stmt)
 			 */
 			exec_prepare_plan(estate, expr, 0, estate->atomic);
 
+			/*
+			 * Not saved plan should be explicitly released. Without it, any
+			 * not recursive usage of CALL statemenst leak plan in SPI memory.
+			 * The created plan can be reused when procedure is called recursively,
+			 * and releasing plan can be done only in recursion root call, when
+			 * expression has not assigned plan. Where a plan was created, then
+			 * there plan can be released.
+			 */
+			plan_owner = true;
+
 			/*
 			 * The procedure call could end transactions, which would upset
 			 * the snapshot management in SPI_execute*, so don't let it do it.
@@ -2301,14 +2312,23 @@ exec_stmt_call(PLpgSQL_execstate *estate, PLpgSQL_stmt_call *stmt)
 		 * If we aren't saving the plan, unset the pointer.  Note that it
 		 * could have been unset already, in case of a recursive call.
 		 */
-		if (expr->plan && !expr->plan->saved)
+		if (plan && !plan->saved)
+		{
+			if (plan_owner)
+				SPI_freeplan(plan);
+
 			expr->plan = NULL;
+		}
 		PG_RE_THROW();
 	}
 	PG_END_TRY();
 
-	if (expr->plan && !expr->plan->saved)
+	if (plan && !plan->saved)
+	{
+		if (plan_owner)
+			SPI_freeplan(plan);
 		expr->plan = NULL;
+	}
 
 	if (rc < 0)
 		elog(ERROR, "SPI_execute_plan_with_paramlist failed executing query \"%s\": %s",

#19

Amit Khandekar

amitdkhan.pg@gmail.com

over 5 years ago

In reply to: Pavel Stehule (#18)

Re: calling procedures is slow and consumes extra much memory against calling function

On Wed, 17 Jun 2020 at 13:54, Pavel Stehule <pavel.stehule@gmail.com> wrote:

st 17. 6. 2020 v 7:52 odesílatel Amit Khandekar <amitdkhan.pg@gmail.com> napsal:

On Wed, 10 Jun 2020 at 17:12, Pavel Stehule <pavel.stehule@gmail.com> wrote:

st 10. 6. 2020 v 12:26 odesílatel Amit Khandekar <amitdkhan.pg@gmail.com> napsal:

Could you show an example testcase that tests this recursive scenario,
with which your earlier patch fails the test, and this v2 patch passes
it ? I am trying to understand the recursive scenario and the re-use
of expr->plan.

it hangs on plpgsql tests. So you can apply first version of patch

and "make check"

I could not reproduce the make check hang with the v1 patch. But I
could see a crash with the below testcase. So I understand the purpose
of the plan_owner variable that you introduced in v2.

Consider this recursive test :

create or replace procedure p1(in r int) as $$
begin
RAISE INFO 'r : % ', r;
if r < 3 then
call p1(r+1);
end if;
end
$$ language plpgsql;

do $$
declare r int default 1;
begin
call p1(r);
end;
$$;

In p1() with r=2, when the stmt "call p1(r+1)" is being executed,
consider this code of exec_stmt_call() with your v2 patch applied:
if (expr->plan && !expr->plan->saved)
{
if (plan_owner)
SPI_freeplan(expr->plan);
expr->plan = NULL;
}

Here, plan_owner is false. So SPI_freeplan() will not be called, and
expr->plan is set to NULL. Now I have observed that the stmt pointer
and expr pointer is shared between the p1() execution at this r=2
level and the p1() execution at r=1 level. So after the above code is
executed at r=2, when the upper level (r=1) exec_stmt_call() lands to
the same above code snippet, it gets the same expr pointer, but it's
expr->plan is already set to NULL without being freed. From this
logic, it looks like the plan won't get freed whenever the expr/stmt
pointers are shared across recursive levels, since expr->plan is set
to NULL at the lowermost level ? Basically, the handle to the plan is
lost so no one at the upper recursion level can explicitly free it
using SPI_freeplan(), right ? This looks the same as the main issue
where the plan does not get freed for non-recursive calls. I haven't
got a chance to check if we can develop a testcase for this, similar
to your testcase where the memory keeps on increasing.

This is a good consideration.

I am sending updated patch

Checked the latest patch. Looks like using a local plan rather than
expr->plan pointer for doing the checks does seem to resolve the issue
I raised. That made me think of another scenario :

Now we are checking for plan value and then null'ifying the expr->plan
value. What if expr->plan is different from plan ? Is it possible ? I
was thinking of such scenarios. But couldn't find one. As long as a
plan is always created with saved=true for all levels, or with
saved=false for all levels, we are ok. If we can have a mix of saved
and unsaved plans at different recursion levels, then expr->plan can
be different from the outer local plan because then the expr->plan
will not be set to NULL in the inner level, while the outer level may
have created its own plan. But I think a mix of saved and unsaved
plans are not possible. If you agree, then I think we should at least
have an assert that looks like :

if (plan && !plan->saved)
{
if (plan_owner)
SPI_freeplan(plan);

/* If expr->plan is present, it must be the same plan that we
allocated */
Assert ( !expr->plan || plan == expr->plan) );

expr->plan = NULL;
}

Other than this, I have no other issues. I understand that we have to
do this special handling only for this exec_stmt_call() because it is
only here that we call exec_prepare_plan() with keep_plan = false, so
doing special handling for freeing the plan seems to make sense.

#20

Pavel Stehule

pavel.stehule@gmail.com

over 5 years ago

In reply to: Amit Khandekar (#19)

1 attachment(s)

Re: calling procedures is slow and consumes extra much memory against calling function

čt 9. 7. 2020 v 8:28 odesílatel Amit Khandekar <amitdkhan.pg@gmail.com>
napsal:

On Wed, 17 Jun 2020 at 13:54, Pavel Stehule <pavel.stehule@gmail.com>
wrote:

st 17. 6. 2020 v 7:52 odesílatel Amit Khandekar <amitdkhan.pg@gmail.com>

napsal:

On Wed, 10 Jun 2020 at 17:12, Pavel Stehule <pavel.stehule@gmail.com>

wrote:

st 10. 6. 2020 v 12:26 odesílatel Amit Khandekar <

amitdkhan.pg@gmail.com> napsal:

Could you show an example testcase that tests this recursive

scenario,

with which your earlier patch fails the test, and this v2 patch

passes

it ? I am trying to understand the recursive scenario and the re-use
of expr->plan.

it hangs on plpgsql tests. So you can apply first version of patch

and "make check"

I could not reproduce the make check hang with the v1 patch. But I
could see a crash with the below testcase. So I understand the purpose
of the plan_owner variable that you introduced in v2.

Consider this recursive test :

create or replace procedure p1(in r int) as $$
begin
RAISE INFO 'r : % ', r;
if r < 3 then
call p1(r+1);
end if;
end
$$ language plpgsql;

do $$
declare r int default 1;
begin
call p1(r);
end;
$$;

In p1() with r=2, when the stmt "call p1(r+1)" is being executed,
consider this code of exec_stmt_call() with your v2 patch applied:
if (expr->plan && !expr->plan->saved)
{
if (plan_owner)
SPI_freeplan(expr->plan);
expr->plan = NULL;
}

Here, plan_owner is false. So SPI_freeplan() will not be called, and
expr->plan is set to NULL. Now I have observed that the stmt pointer
and expr pointer is shared between the p1() execution at this r=2
level and the p1() execution at r=1 level. So after the above code is
executed at r=2, when the upper level (r=1) exec_stmt_call() lands to
the same above code snippet, it gets the same expr pointer, but it's
expr->plan is already set to NULL without being freed. From this
logic, it looks like the plan won't get freed whenever the expr/stmt
pointers are shared across recursive levels, since expr->plan is set
to NULL at the lowermost level ? Basically, the handle to the plan is
lost so no one at the upper recursion level can explicitly free it
using SPI_freeplan(), right ? This looks the same as the main issue
where the plan does not get freed for non-recursive calls. I haven't
got a chance to check if we can develop a testcase for this, similar
to your testcase where the memory keeps on increasing.

This is a good consideration.

I am sending updated patch

Checked the latest patch. Looks like using a local plan rather than
expr->plan pointer for doing the checks does seem to resolve the issue
I raised. That made me think of another scenario :

Now we are checking for plan value and then null'ifying the expr->plan
value. What if expr->plan is different from plan ? Is it possible ? I
was thinking of such scenarios. But couldn't find one. As long as a
plan is always created with saved=true for all levels, or with
saved=false for all levels, we are ok. If we can have a mix of saved
and unsaved plans at different recursion levels, then expr->plan can
be different from the outer local plan because then the expr->plan
will not be set to NULL in the inner level, while the outer level may
have created its own plan. But I think a mix of saved and unsaved
plans are not possible. If you agree, then I think we should at least
have an assert that looks like :

if (plan && !plan->saved)
{
if (plan_owner)
SPI_freeplan(plan);

/* If expr->plan is present, it must be the same plan that we
allocated */
Assert ( !expr->plan || plan == expr->plan) );

expr->plan = NULL;
}

Other than this, I have no other issues. I understand that we have to
do this special handling only for this exec_stmt_call() because it is
only here that we call exec_prepare_plan() with keep_plan = false, so
doing special handling for freeing the plan seems to make sense.

attached patch with assert.

all regress tests passed. I think this short patch can be applied on older
releases as bugfix.

This weekend I'll try to check different strategy - try to save a plan and
release it at the end of the transaction.

Regards

Pavel

Attachments:

plpgsql-stmt_call_release_plan.patchtext/x-patch; charset=US-ASCII; name=plpgsql-stmt_call_release_plan.patchDownload

diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c
index d4a3d58daa..22c0376795 100644
--- a/src/pl/plpgsql/src/pl_exec.c
+++ b/src/pl/plpgsql/src/pl_exec.c
@@ -2150,15 +2150,16 @@ static int
 exec_stmt_call(PLpgSQL_execstate *estate, PLpgSQL_stmt_call *stmt)
 {
 	PLpgSQL_expr *expr = stmt->expr;
+	volatile SPIPlanPtr	plan = expr->plan;
 	volatile LocalTransactionId before_lxid;
 	LocalTransactionId after_lxid;
 	volatile bool pushed_active_snap = false;
 	volatile int rc;
+	volatile bool plan_owner = false;
 
 	/* PG_TRY to ensure we clear the plan link, if needed, on failure */
 	PG_TRY();
 	{
-		SPIPlanPtr	plan = expr->plan;
 		ParamListInfo paramLI;
 
 		if (plan == NULL)
@@ -2173,6 +2174,16 @@ exec_stmt_call(PLpgSQL_execstate *estate, PLpgSQL_stmt_call *stmt)
 			 */
 			exec_prepare_plan(estate, expr, 0, estate->atomic);
 
+			/*
+			 * Not saved plan should be explicitly released. Without it, any
+			 * not recursive usage of CALL statemenst leak plan in SPI memory.
+			 * The created plan can be reused when procedure is called recursively,
+			 * and releasing plan can be done only in recursion root call, when
+			 * expression has not assigned plan. Where a plan was created, then
+			 * there plan can be released.
+			 */
+			plan_owner = true;
+
 			/*
 			 * The procedure call could end transactions, which would upset
 			 * the snapshot management in SPI_execute*, so don't let it do it.
@@ -2319,14 +2330,30 @@ exec_stmt_call(PLpgSQL_execstate *estate, PLpgSQL_stmt_call *stmt)
 		 * If we aren't saving the plan, unset the pointer.  Note that it
 		 * could have been unset already, in case of a recursive call.
 		 */
-		if (expr->plan && !expr->plan->saved)
+		if (plan && !plan->saved)
+		{
+			if (plan_owner)
+				SPI_freeplan(plan);
+
 			expr->plan = NULL;
+		}
 		PG_RE_THROW();
 	}
 	PG_END_TRY();
 
-	if (expr->plan && !expr->plan->saved)
+	if (plan && !plan->saved)
+	{
+		if (plan_owner)
+			SPI_freeplan(plan);
+
+		/*
+		 * If expr->plan is present, it must be the same plan that we
+		 * allocated
+		 */
+		Assert (!expr->plan || plan == expr->plan);
+
 		expr->plan = NULL;
+	}
 
 	if (rc < 0)
 		elog(ERROR, "SPI_execute_plan_with_paramlist failed executing query \"%s\": %s",

#21

Pavel Stehule

pavel.stehule@gmail.com

over 5 years ago

In reply to: Pavel Stehule (#20)

Re: calling procedures is slow and consumes extra much memory against calling function

so 11. 7. 2020 v 7:38 odesílatel Pavel Stehule <pavel.stehule@gmail.com>
napsal:

čt 9. 7. 2020 v 8:28 odesílatel Amit Khandekar <amitdkhan.pg@gmail.com>
napsal:

On Wed, 17 Jun 2020 at 13:54, Pavel Stehule <pavel.stehule@gmail.com>
wrote:

st 17. 6. 2020 v 7:52 odesílatel Amit Khandekar <amitdkhan.pg@gmail.com>

napsal:

On Wed, 10 Jun 2020 at 17:12, Pavel Stehule <pavel.stehule@gmail.com>

wrote:

st 10. 6. 2020 v 12:26 odesílatel Amit Khandekar <

amitdkhan.pg@gmail.com> napsal:

Could you show an example testcase that tests this recursive

scenario,

with which your earlier patch fails the test, and this v2 patch

passes

it ? I am trying to understand the recursive scenario and the re-use
of expr->plan.

it hangs on plpgsql tests. So you can apply first version of patch

and "make check"

I could not reproduce the make check hang with the v1 patch. But I
could see a crash with the below testcase. So I understand the purpose
of the plan_owner variable that you introduced in v2.

Consider this recursive test :

create or replace procedure p1(in r int) as $$
begin
RAISE INFO 'r : % ', r;
if r < 3 then
call p1(r+1);
end if;
end
$$ language plpgsql;

do $$
declare r int default 1;
begin
call p1(r);
end;
$$;

In p1() with r=2, when the stmt "call p1(r+1)" is being executed,
consider this code of exec_stmt_call() with your v2 patch applied:
if (expr->plan && !expr->plan->saved)
{
if (plan_owner)
SPI_freeplan(expr->plan);
expr->plan = NULL;
}

Here, plan_owner is false. So SPI_freeplan() will not be called, and
expr->plan is set to NULL. Now I have observed that the stmt pointer
and expr pointer is shared between the p1() execution at this r=2
level and the p1() execution at r=1 level. So after the above code is
executed at r=2, when the upper level (r=1) exec_stmt_call() lands to
the same above code snippet, it gets the same expr pointer, but it's
expr->plan is already set to NULL without being freed. From this
logic, it looks like the plan won't get freed whenever the expr/stmt
pointers are shared across recursive levels, since expr->plan is set
to NULL at the lowermost level ? Basically, the handle to the plan is
lost so no one at the upper recursion level can explicitly free it
using SPI_freeplan(), right ? This looks the same as the main issue
where the plan does not get freed for non-recursive calls. I haven't
got a chance to check if we can develop a testcase for this, similar
to your testcase where the memory keeps on increasing.

This is a good consideration.

I am sending updated patch

Checked the latest patch. Looks like using a local plan rather than
expr->plan pointer for doing the checks does seem to resolve the issue
I raised. That made me think of another scenario :

Now we are checking for plan value and then null'ifying the expr->plan
value. What if expr->plan is different from plan ? Is it possible ? I
was thinking of such scenarios. But couldn't find one. As long as a
plan is always created with saved=true for all levels, or with
saved=false for all levels, we are ok. If we can have a mix of saved
and unsaved plans at different recursion levels, then expr->plan can
be different from the outer local plan because then the expr->plan
will not be set to NULL in the inner level, while the outer level may
have created its own plan. But I think a mix of saved and unsaved
plans are not possible. If you agree, then I think we should at least
have an assert that looks like :

if (plan && !plan->saved)
{
if (plan_owner)
SPI_freeplan(plan);

/* If expr->plan is present, it must be the same plan that we
allocated */
Assert ( !expr->plan || plan == expr->plan) );

expr->plan = NULL;
}

Other than this, I have no other issues. I understand that we have to
do this special handling only for this exec_stmt_call() because it is
only here that we call exec_prepare_plan() with keep_plan = false, so
doing special handling for freeing the plan seems to make sense.

attached patch with assert.

all regress tests passed. I think this short patch can be applied on older
releases as bugfix.

This weekend I'll try to check different strategy - try to save a plan and
release it at the end of the transaction.

I check it, and this state of patch is good enough for this moment. Another
fix needs more invasive changes to handling plan cache.

Regards

Pavel

Show quoted text

Regards

Pavel

#22

Pavel Stehule

pavel.stehule@gmail.com

over 5 years ago

In reply to: Pavel Stehule (#21)

2 attachment(s)

Re: calling procedures is slow and consumes extra much memory against calling function

I am sending another patch that tries to allow CachedPlans for CALL
statements. I think this patch is very accurate, but it is not nice,
because it is smudging very precious reference counting for CachedPlans.

Current issue:
==========

I found a problem with repeated CALL statements from DO command. For every
execution of a CALL statement a plan is created that is released at the
time of the end of DO block.

create or replace procedure insert_into_foo(i int)
as $$
begin
insert into foo values(i, i || 'Ahoj');
if i % 1000 = 0 then raise notice '%', i;
--commit;
end if;
end;
$$ language plpgsql;

and DO

do $$
begin
for i in 1..500000
loop
call insert_into_foo(i);
end loop;
end
$$;

Requires about 2.5GB RAM (execution time is 18 sec). The problem is "long
transaction" with 500M iteration of CALL statement.

If I try to remove a comment before COMMIT - then I get 500 transactions.
But still it needs 2.5GB memory.

The reason for this behaviour is disabling plan cache for CALL statements
executed in atomic mode.

So I wrote patch 1, that releases the not saved plan immediately. This
patch is very simple, and fixes memory issues. It is a little bit faster
(14 sec), and Postgres consumes about 200KB.

Patch 1 is simple, clean, nice but execution of CALL statements is slow due
repeated planning.

I tried to fix this issue another way - by little bit different work with
plan cache reference counting. Current design expects only statements
wrapped inside transactions. It is not designed for new possibilities in
CALL statements, when more transactions can be finished inside one
statement. Principally - cached plans should not be reused in different
transactions (simple expressions are an exception). So if we try to use
cached plans for CALL statements, there is no clean responsibility who has
to close a cached plan. It can be SPI (when execution stays in the same
transaction), or resource owner (when transaction is finished inside
execution of SPI).

The Peter wrote a comment about it

<--><--><-->/*
<--><--><--> * Don't save the plan if not in atomic context. Otherwise,
<--><--><--> * transaction ends would cause errors about plancache leaks.
<--><--><--> *

This comment is not fully accurate. If we try to save the plan, then
execution (with finished transaction inside) ends by segfault. Cached plan
is released on transaction end (by resource owner) and related memory
context is released. But next time this structure is accessed. There is
only a warning about unclosed plan cache (it maybe depends on other things).

I wrote a patch 2 that marks CALL statement related plans as "fragile". In
this case the plan is cached every time. There is a special mark "fragile",
that blocks immediately releasing related memory context, and it blocks
warnings and errors because for this case we expect closing plan cache by
resource owner or by SPI statement.

It reduces well CPU and memory overhead - execution time (in one big
transaction is only 8sec) - memory overhead is +/- 0

Patch2 is not too clear, and too readable although I think so it is more
correct. It better fixes SPI behaviour against new state - possibility to
commit, rollback inside procedures (inside SPI call).

All regress tests passed.

Regards

Pavel

Attachments:

plpgsql-stmt_call-fix-2.patchtext/x-patch; charset=US-ASCII; name=plpgsql-stmt_call-fix-2.patchDownload

diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 055ebb77ae..d6bbbc3dbd 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -24,6 +24,7 @@
 #include "executor/executor.h"
 #include "executor/spi_priv.h"
 #include "miscadmin.h"
+#include "storage/proc.h"
 #include "tcop/pquery.h"
 #include "tcop/utility.h"
 #include "utils/builtins.h"
@@ -2223,6 +2224,9 @@ _SPI_execute_plan(SPIPlanPtr plan, ParamListInfo paramLI,
 	CachedPlan *cplan = NULL;
 	ListCell   *lc1;
 
+	LocalTransactionId before_lxid = MyProc->lxid;
+	bool		is_fragile = plan->fragile;
+
 	/*
 	 * Setup error traceback support for ereport()
 	 */
@@ -2326,6 +2330,8 @@ _SPI_execute_plan(SPIPlanPtr plan, ParamListInfo paramLI,
 		 * plan, the refcount must be backed by the CurrentResourceOwner.
 		 */
 		cplan = GetCachedPlan(plansource, paramLI, plan->saved, _SPI_current->queryEnv);
+		cplan->is_fragile = is_fragile;
+
 		stmt_list = cplan->stmt_list;
 
 		/*
@@ -2520,8 +2526,19 @@ _SPI_execute_plan(SPIPlanPtr plan, ParamListInfo paramLI,
 			}
 		}
 
-		/* Done with this plan, so release refcount */
-		ReleaseCachedPlan(cplan, plan->saved);
+		if (is_fragile)
+		{
+			LocalTransactionId after_lxid  = MyProc->lxid;;
+
+			if (before_lxid == after_lxid)
+				ReleaseCachedPlan(cplan, plan->saved);
+
+			FinalReleaseFragileCachedPlan(cplan);
+		}
+		else
+			/* Done with this plan, so release refcount */
+			ReleaseCachedPlan(cplan, plan->saved);
+
 		cplan = NULL;
 
 		/*
@@ -2539,9 +2556,25 @@ fail:
 	if (pushed_active_snap)
 		PopActiveSnapshot();
 
-	/* We no longer need the cached plan refcount, if any */
 	if (cplan)
-		ReleaseCachedPlan(cplan, plan->saved);
+	{
+		if (is_fragile)
+		{
+			LocalTransactionId after_lxid  = MyProc->lxid;;
+
+			/*
+			 * Release plan, when the transaction is same. When
+			 * there are new transaction, then cached plan was
+			 * released by resource owner.
+			 */
+			if (before_lxid == after_lxid)
+				ReleaseCachedPlan(cplan, plan->saved);
+
+			FinalReleaseFragileCachedPlan(cplan);
+		}
+		else
+			ReleaseCachedPlan(cplan, plan->saved);
+	}
 
 	/*
 	 * Pop the error context stack
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 75b475c179..de3ed036f2 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -528,10 +528,14 @@ ReleaseGenericPlan(CachedPlanSource *plansource)
 	if (plansource->gplan)
 	{
 		CachedPlan *plan = plansource->gplan;
+		bool		is_fragile = plan->is_fragile;
 
 		Assert(plan->magic == CACHEDPLAN_MAGIC);
 		plansource->gplan = NULL;
 		ReleaseCachedPlan(plan, false);
+
+		if (is_fragile)
+			FinalReleaseFragileCachedPlan(plan);
 	}
 }
 
@@ -1265,9 +1269,12 @@ ReleaseCachedPlan(CachedPlan *plan, bool useResOwner)
 		Assert(plan->is_saved);
 		ResourceOwnerForgetPlanCacheRef(CurrentResourceOwner, plan);
 	}
+
 	Assert(plan->refcount > 0);
 	plan->refcount--;
-	if (plan->refcount == 0)
+
+	/* We don't want to destroy fragile plans immediately */
+	if (plan->refcount == 0 && !plan->is_fragile)
 	{
 		/* Mark it no longer valid */
 		plan->magic = 0;
@@ -1278,6 +1285,23 @@ ReleaseCachedPlan(CachedPlan *plan, bool useResOwner)
 	}
 }
 
+void
+FinalReleaseFragileCachedPlan(CachedPlan *plan)
+{
+
+	Assert(plan->magic == CACHEDPLAN_MAGIC);
+	Assert(plan->is_fragile);
+
+	if (plan->refcount == 0)
+	{
+		plan->magic = 0;
+
+		/* One-shot plans do not own their context, so we can't free them */
+		if (!plan->is_oneshot)
+			MemoryContextDelete(plan->context);
+	}
+}
+
 /*
  * CachedPlanAllowsSimpleValidityCheck: can we use CachedPlanIsSimplyValid?
  *
diff --git a/src/backend/utils/resowner/resowner.c b/src/backend/utils/resowner/resowner.c
index 8bc2c4e9ea..b328729ebd 100644
--- a/src/backend/utils/resowner/resowner.c
+++ b/src/backend/utils/resowner/resowner.c
@@ -635,7 +635,13 @@ ResourceOwnerReleaseInternal(ResourceOwner owner,
 		{
 			CachedPlan *res = (CachedPlan *) DatumGetPointer(foundres);
 
-			if (isCommit)
+			/*
+			 * The transaction can be ended inside execution of CALL statements.
+			 * In this case, the caller cannot to clean own resources, and then
+			 * cleaning by resource owner is expected. Unfortunatelly a caller
+			 * routine doesn't know if this situation will be or not.
+			 */
+			if (isCommit && !res->is_fragile)
 				PrintPlanCacheLeakWarning(res);
 			ReleaseCachedPlan(res, true);
 		}
@@ -1140,7 +1146,7 @@ ResourceOwnerRememberPlanCacheRef(ResourceOwner owner, CachedPlan *plan)
 void
 ResourceOwnerForgetPlanCacheRef(ResourceOwner owner, CachedPlan *plan)
 {
-	if (!ResourceArrayRemove(&(owner->planrefarr), PointerGetDatum(plan)))
+	if (!ResourceArrayRemove(&(owner->planrefarr), PointerGetDatum(plan)) && !plan->is_fragile)
 		elog(ERROR, "plancache reference %p is not owned by resource owner %s",
 			 plan, owner->name);
 }
diff --git a/src/include/executor/spi_priv.h b/src/include/executor/spi_priv.h
index 6220928bd3..e1a01ecc0b 100644
--- a/src/include/executor/spi_priv.h
+++ b/src/include/executor/spi_priv.h
@@ -93,6 +93,7 @@ typedef struct _SPI_plan
 	bool		saved;			/* saved or unsaved plan? */
 	bool		oneshot;		/* one-shot plan? */
 	bool		no_snapshots;	/* let the caller handle the snapshots */
+	bool		fragile;		/* plancache can be released by resource owner */
 	List	   *plancache_list; /* one CachedPlanSource per parsetree */
 	MemoryContext plancxt;		/* Context containing _SPI_plan and data */
 	int			cursor_options; /* Cursor options used for planning */
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index 522020d763..d657eb0f17 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -150,6 +150,7 @@ typedef struct CachedPlan
 	bool		is_oneshot;		/* is it a "oneshot" plan? */
 	bool		is_saved;		/* is CachedPlan in a long-lived context? */
 	bool		is_valid;		/* is the stmt_list currently valid? */
+	bool		is_fragile;		/* cleaning by resource owner is expected */
 	Oid			planRoleId;		/* Role ID the plan was created for */
 	bool		dependsOnRole;	/* is plan specific to that role? */
 	TransactionId saved_xmin;	/* if valid, replan when TransactionXmin
@@ -222,6 +223,8 @@ extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
 								 QueryEnvironment *queryEnv);
 extern void ReleaseCachedPlan(CachedPlan *plan, bool useResOwner);
 
+extern void FinalReleaseFragileCachedPlan(CachedPlan *plan);
+
 extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
 												CachedPlan *plan,
 												ResourceOwner owner);
diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c
index d4a3d58daa..0f2ba6b0f6 100644
--- a/src/pl/plpgsql/src/pl_exec.c
+++ b/src/pl/plpgsql/src/pl_exec.c
@@ -2171,7 +2171,7 @@ exec_stmt_call(PLpgSQL_execstate *estate, PLpgSQL_stmt_call *stmt)
 			 * XXX This would be fixable with some plancache/resowner surgery
 			 * elsewhere, but for now we'll just work around this here.
 			 */
-			exec_prepare_plan(estate, expr, 0, estate->atomic);
+			exec_prepare_plan(estate, expr, 0, true);
 
 			/*
 			 * The procedure call could end transactions, which would upset
@@ -2181,6 +2181,15 @@ exec_stmt_call(PLpgSQL_execstate *estate, PLpgSQL_stmt_call *stmt)
 			plan = expr->plan;
 			plan->no_snapshots = true;
 
+			/*
+			 * Plancache have to be released on end of tranaction. But for this
+			 * case, transaction can be ended somewhere inside CALL statements
+			 * chain. We cannot to know where and why. So we should to mark
+			 * this plan as fragile. In this case is possible, there are
+			 * more actors are responsible to cleaning plan cache.
+			 */
+			plan->fragile = true;
+
 			/*
 			 * Force target to be recalculated whenever the plan changes, in
 			 * case the procedure's argument list has changed.

plpgsql-stmt_call-fix-1.patchtext/x-patch; charset=US-ASCII; name=plpgsql-stmt_call-fix-1.patchDownload

diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c
index d4a3d58daa..22c0376795 100644
--- a/src/pl/plpgsql/src/pl_exec.c
+++ b/src/pl/plpgsql/src/pl_exec.c
@@ -2150,15 +2150,16 @@ static int
 exec_stmt_call(PLpgSQL_execstate *estate, PLpgSQL_stmt_call *stmt)
 {
 	PLpgSQL_expr *expr = stmt->expr;
+	volatile SPIPlanPtr	plan = expr->plan;
 	volatile LocalTransactionId before_lxid;
 	LocalTransactionId after_lxid;
 	volatile bool pushed_active_snap = false;
 	volatile int rc;
+	volatile bool plan_owner = false;
 
 	/* PG_TRY to ensure we clear the plan link, if needed, on failure */
 	PG_TRY();
 	{
-		SPIPlanPtr	plan = expr->plan;
 		ParamListInfo paramLI;
 
 		if (plan == NULL)
@@ -2173,6 +2174,16 @@ exec_stmt_call(PLpgSQL_execstate *estate, PLpgSQL_stmt_call *stmt)
 			 */
 			exec_prepare_plan(estate, expr, 0, estate->atomic);
 
+			/*
+			 * Not saved plan should be explicitly released. Without it, any
+			 * not recursive usage of CALL statemenst leak plan in SPI memory.
+			 * The created plan can be reused when procedure is called recursively,
+			 * and releasing plan can be done only in recursion root call, when
+			 * expression has not assigned plan. Where a plan was created, then
+			 * there plan can be released.
+			 */
+			plan_owner = true;
+
 			/*
 			 * The procedure call could end transactions, which would upset
 			 * the snapshot management in SPI_execute*, so don't let it do it.
@@ -2319,14 +2330,30 @@ exec_stmt_call(PLpgSQL_execstate *estate, PLpgSQL_stmt_call *stmt)
 		 * If we aren't saving the plan, unset the pointer.  Note that it
 		 * could have been unset already, in case of a recursive call.
 		 */
-		if (expr->plan && !expr->plan->saved)
+		if (plan && !plan->saved)
+		{
+			if (plan_owner)
+				SPI_freeplan(plan);
+
 			expr->plan = NULL;
+		}
 		PG_RE_THROW();
 	}
 	PG_END_TRY();
 
-	if (expr->plan && !expr->plan->saved)
+	if (plan && !plan->saved)
+	{
+		if (plan_owner)
+			SPI_freeplan(plan);
+
+		/*
+		 * If expr->plan is present, it must be the same plan that we
+		 * allocated
+		 */
+		Assert (!expr->plan || plan == expr->plan);
+
 		expr->plan = NULL;
+	}
 
 	if (rc < 0)
 		elog(ERROR, "SPI_execute_plan_with_paramlist failed executing query \"%s\": %s",

#23

Michael Paquier

michael@paquier.xyz

over 5 years ago

In reply to: Pavel Stehule (#22)

Re: calling procedures is slow and consumes extra much memory against calling function

On Thu, Jul 16, 2020 at 09:08:09PM +0200, Pavel Stehule wrote:

I am sending another patch that tries to allow CachedPlans for CALL
statements. I think this patch is very accurate, but it is not nice,
because it is smudging very precious reference counting for CachedPlans.

Amit, you are registered as a reviewer of this patch for two months
now. Are you planning to look at it more? If you are not planning to
do so, that's fine, but it may be better to remove your name as
reviewer then.
--
Michael

#24

Amit Khandekar

amitdkhan.pg@gmail.com

over 5 years ago

In reply to: Michael Paquier (#23)

Re: calling procedures is slow and consumes extra much memory against calling function

On Thu, 17 Sep 2020 at 11:07, Michael Paquier <michael@paquier.xyz> wrote:

On Thu, Jul 16, 2020 at 09:08:09PM +0200, Pavel Stehule wrote:

I am sending another patch that tries to allow CachedPlans for CALL
statements. I think this patch is very accurate, but it is not nice,
because it is smudging very precious reference counting for CachedPlans.

Amit, you are registered as a reviewer of this patch for two months
now. Are you planning to look at it more? If you are not planning to
do so, that's fine, but it may be better to remove your name as
reviewer then.

Thanks Michael for reminding. I *had* actually planned to do some more
review. But I think I might end up not getting bandwidth for this one
during this month. So I have removed my name. But I have kept my name
as reviewer for bitmaps and correlation :
"https://commitfest.postgresql.org/29/2310/ since I do plan to do some
review on that one.

Thanks,
-Amit Khandekar
Huawei Technologies

#25

Tom Lane

tgl@sss.pgh.pa.us

over 5 years ago

In reply to: Pavel Stehule (#22)

1 attachment(s)

Re: calling procedures is slow and consumes extra much memory against calling function

Pavel Stehule <pavel.stehule@gmail.com> writes:

I am sending another patch that tries to allow CachedPlans for CALL
statements. I think this patch is very accurate, but it is not nice,
because it is smudging very precious reference counting for CachedPlans.

I spent some time testing this. Although the #1 patch gets rid of
the major memory leak of cached plans, the original test case still
shows a pretty substantial leak across repeated executions of a CALL.
The reason is that the stanza for rebuilding stmt->target also gets
executed each time through, and that leaks not only the relatively
small PLpgSQL_row datum but also a bunch of catalog lookup cruft
created on the way to building the datum. Basically this code forgot
that plpgsql's outer execution layer can't assume that it's running
in a short-lived context.

I attach a revised #1 that takes care of that problem, and also
cleans up what seems to me to be pretty sloppy thinking in both
the original code and Pavel's #1 patch: we should be restoring
the previous value of expr->plan, not cavalierly assuming that
it was necessarily NULL. I didn't care for looking at the plan's
"saved" field to decide what was happening, either. We really
should have a local flag variable clearly defining which behavior
it is that we're implementing.

With this patch, I see zero memory bloat on Pavel's original example,
even with a much larger repeat count.

I don't like much of anything about plpgsql-stmt_call-fix-2.patch.
It feels confused and poorly documented, possibly because "fragile"
is not a very clear term for whatever property it is you're trying to
attribute to plans. But in any case, I think it's fixing the problem
in the wrong place. I think the right way to fix it probably is to
manage a CALL's saved plan the same as every other plpgsql plan,
but arrange for the transient refcount on that plan to be held by a
ResourceOwner that is not a child of any transaction resowner, but
rather belongs to the procedure's execution and will be released on
the way out of the procedure.

In any case, I doubt we'd risk back-patching either the #2 patch
or any other approach to avoiding the repeat planning. We need a
back-patchable fix that at least tamps down the memory bloat,
and this seems like it'd do.

regards, tom lane

Attachments:

v2-plpgsql-stmt_call-fix-1.patchtext/x-diff; charset=us-ascii; name=v2-plpgsql-stmt_call-fix-1.patchDownload

diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c
index d4a3d58daa..7a2f2dfe91 100644
--- a/src/pl/plpgsql/src/pl_exec.c
+++ b/src/pl/plpgsql/src/pl_exec.c
@@ -2145,40 +2145,60 @@ exec_stmt_perform(PLpgSQL_execstate *estate, PLpgSQL_stmt_perform *stmt)
 
 /*
  * exec_stmt_call
+ *
+ * NOTE: this is used for both CALL and DO statements.
  */
 static int
 exec_stmt_call(PLpgSQL_execstate *estate, PLpgSQL_stmt_call *stmt)
 {
 	PLpgSQL_expr *expr = stmt->expr;
+	SPIPlanPtr	orig_plan = expr->plan;
+	bool		local_plan;
+	PLpgSQL_variable *volatile cur_target = stmt->target;
 	volatile LocalTransactionId before_lxid;
 	LocalTransactionId after_lxid;
 	volatile bool pushed_active_snap = false;
 	volatile int rc;
 
+	/*
+	 * If not in atomic context, we make a local plan that we'll just use for
+	 * this invocation, and will free at the end.  Otherwise, transaction ends
+	 * would cause errors about plancache leaks.
+	 *
+	 * XXX This would be fixable with some plancache/resowner surgery
+	 * elsewhere, but for now we'll just work around this here.
+	 */
+	local_plan = !estate->atomic;
+
 	/* PG_TRY to ensure we clear the plan link, if needed, on failure */
 	PG_TRY();
 	{
 		SPIPlanPtr	plan = expr->plan;
 		ParamListInfo paramLI;
 
-		if (plan == NULL)
+		/*
+		 * Make a plan if we don't have one, or if we need a local one.  Note
+		 * that we'll overwrite expr->plan either way; the PG_TRY block will
+		 * ensure we undo that on the way out, if the plan is local.
+		 */
+		if (plan == NULL || local_plan)
 		{
+			/* Don't let SPI save the plan if it's going to be local */
+			exec_prepare_plan(estate, expr, 0, !local_plan);
+			plan = expr->plan;
 
 			/*
-			 * Don't save the plan if not in atomic context.  Otherwise,
-			 * transaction ends would cause errors about plancache leaks.
-			 *
-			 * XXX This would be fixable with some plancache/resowner surgery
-			 * elsewhere, but for now we'll just work around this here.
+			 * A CALL should never be a simple expression.  (If it could be,
+			 * we'd have to worry about saving/restoring the previous values
+			 * of the related expr fields, not just expr->plan.)
 			 */
-			exec_prepare_plan(estate, expr, 0, estate->atomic);
+			Assert(!expr->expr_simple_expr);
 
 			/*
 			 * The procedure call could end transactions, which would upset
 			 * the snapshot management in SPI_execute*, so don't let it do it.
 			 * Instead, we set the snapshots ourselves below.
 			 */
-			plan = expr->plan;
 			plan->no_snapshots = true;
 
 			/*
@@ -2186,14 +2206,20 @@ exec_stmt_call(PLpgSQL_execstate *estate, PLpgSQL_stmt_call *stmt)
 			 * case the procedure's argument list has changed.
 			 */
 			stmt->target = NULL;
+			cur_target = NULL;
 		}
 
 		/*
 		 * We construct a DTYPE_ROW datum representing the plpgsql variables
 		 * associated with the procedure's output arguments.  Then we can use
-		 * exec_move_row() to do the assignments.
+		 * exec_move_row() to do the assignments.  If we're using a local
+		 * plan, also make a local target; otherwise, since the above code
+		 * will force a new plan each time through, we'd repeatedly leak the
+		 * memory for the target.  (Note: we also leak the target when a plan
+		 * change is forced, but that isn't so likely to cause excessive
+		 * memory leaks.)
 		 */
-		if (stmt->is_call && stmt->target == NULL)
+		if (stmt->is_call && cur_target == NULL)
 		{
 			Node	   *node;
 			FuncExpr   *funcexpr;
@@ -2208,6 +2234,9 @@ exec_stmt_call(PLpgSQL_execstate *estate, PLpgSQL_stmt_call *stmt)
 			int			i;
 			ListCell   *lc;
 
+			/* Use eval_mcontext for any cruft accumulated here */
+			oldcontext = MemoryContextSwitchTo(get_eval_mcontext(estate));
+
 			/*
 			 * Get the parsed CallStmt, and look up the called procedure
 			 */
@@ -2239,9 +2268,11 @@ exec_stmt_call(PLpgSQL_execstate *estate, PLpgSQL_stmt_call *stmt)
 			ReleaseSysCache(func_tuple);
 
 			/*
-			 * Begin constructing row Datum
+			 * Begin constructing row Datum; keep it in fn_cxt if it's to be
+			 * long-lived.
 			 */
-			oldcontext = MemoryContextSwitchTo(estate->func->fn_cxt);
+			if (!local_plan)
+				MemoryContextSwitchTo(estate->func->fn_cxt);
 
 			row = (PLpgSQL_row *) palloc0(sizeof(PLpgSQL_row));
 			row->dtype = PLPGSQL_DTYPE_ROW;
@@ -2249,7 +2280,8 @@ exec_stmt_call(PLpgSQL_execstate *estate, PLpgSQL_stmt_call *stmt)
 			row->lineno = -1;
 			row->varnos = (int *) palloc(sizeof(int) * list_length(funcargs));
 
-			MemoryContextSwitchTo(oldcontext);
+			if (!local_plan)
+				MemoryContextSwitchTo(get_eval_mcontext(estate));
 
 			/*
 			 * Examine procedure's argument list.  Each output arg position
@@ -2293,7 +2325,13 @@ exec_stmt_call(PLpgSQL_execstate *estate, PLpgSQL_stmt_call *stmt)
 
 			row->nfields = nfields;
 
-			stmt->target = (PLpgSQL_variable *) row;
+			cur_target = (PLpgSQL_variable *) row;
+
+			/* We can save and re-use the target datum, if it's not local */
+			if (!local_plan)
+				stmt->target = cur_target;
+
+			MemoryContextSwitchTo(oldcontext);
 		}
 
 		paramLI = setup_param_list(estate, expr);
@@ -2316,17 +2354,27 @@ exec_stmt_call(PLpgSQL_execstate *estate, PLpgSQL_stmt_call *stmt)
 	PG_CATCH();
 	{
 		/*
-		 * If we aren't saving the plan, unset the pointer.  Note that it
-		 * could have been unset already, in case of a recursive call.
+		 * If we are using a local plan, restore the old plan link.
 		 */
-		if (expr->plan && !expr->plan->saved)
-			expr->plan = NULL;
+		if (local_plan)
+			expr->plan = orig_plan;
 		PG_RE_THROW();
 	}
 	PG_END_TRY();
 
-	if (expr->plan && !expr->plan->saved)
-		expr->plan = NULL;
+	/*
+	 * If we are using a local plan, restore the old plan link; then free the
+	 * local plan to avoid memory leaks.  (Note that the error exit path above
+	 * just clears the link without risking calling SPI_freeplan; we expect
+	 * that xact cleanup will take care of the mess in that case.)
+	 */
+	if (local_plan)
+	{
+		SPIPlanPtr	plan = expr->plan;
+
+		expr->plan = orig_plan;
+		SPI_freeplan(plan);
+	}
 
 	if (rc < 0)
 		elog(ERROR, "SPI_execute_plan_with_paramlist failed executing query \"%s\": %s",
@@ -2363,10 +2411,10 @@ exec_stmt_call(PLpgSQL_execstate *estate, PLpgSQL_stmt_call *stmt)
 	{
 		SPITupleTable *tuptab = SPI_tuptable;
 
-		if (!stmt->target)
+		if (!cur_target)
 			elog(ERROR, "DO statement returned a row");
 
-		exec_move_row(estate, stmt->target, tuptab->vals[0], tuptab->tupdesc);
+		exec_move_row(estate, cur_target, tuptab->vals[0], tuptab->tupdesc);
 	}
 	else if (SPI_processed > 1)
 		elog(ERROR, "procedure call returned more than one row");

#26

Pavel Stehule

pavel.stehule@gmail.com

over 5 years ago

In reply to: Tom Lane (#25)

Re: calling procedures is slow and consumes extra much memory against calling function

po 28. 9. 2020 v 3:04 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:

Pavel Stehule <pavel.stehule@gmail.com> writes:

I am sending another patch that tries to allow CachedPlans for CALL
statements. I think this patch is very accurate, but it is not nice,
because it is smudging very precious reference counting for CachedPlans.

I spent some time testing this. Although the #1 patch gets rid of
the major memory leak of cached plans, the original test case still
shows a pretty substantial leak across repeated executions of a CALL.
The reason is that the stanza for rebuilding stmt->target also gets
executed each time through, and that leaks not only the relatively
small PLpgSQL_row datum but also a bunch of catalog lookup cruft
created on the way to building the datum. Basically this code forgot
that plpgsql's outer execution layer can't assume that it's running
in a short-lived context.

I attach a revised #1 that takes care of that problem, and also
cleans up what seems to me to be pretty sloppy thinking in both
the original code and Pavel's #1 patch: we should be restoring
the previous value of expr->plan, not cavalierly assuming that
it was necessarily NULL. I didn't care for looking at the plan's
"saved" field to decide what was happening, either. We really
should have a local flag variable clearly defining which behavior
it is that we're implementing.

With this patch, I see zero memory bloat on Pavel's original example,
even with a much larger repeat count.

I don't like much of anything about plpgsql-stmt_call-fix-2.patch.
It feels confused and poorly documented, possibly because "fragile"
is not a very clear term for whatever property it is you're trying to
attribute to plans. But in any case, I think it's fixing the problem
in the wrong place. I think the right way to fix it probably is to
manage a CALL's saved plan the same as every other plpgsql plan,
but arrange for the transient refcount on that plan to be held by a
ResourceOwner that is not a child of any transaction resowner, but
rather belongs to the procedure's execution and will be released on
the way out of the procedure.

In any case, I doubt we'd risk back-patching either the #2 patch
or any other approach to avoiding the repeat planning. We need a
back-patchable fix that at least tamps down the memory bloat,
and this seems like it'd do.

I agree with these conclusions. I'll try to look if I can do #2 patch
better for pg14. Probably it can fix more issues related to CALL statement,
and I agree so this should not be backapatched.

It can be great to use CALL without memory leaks (and it can be better (in
future) if the performance of CALL statements should be good).

Thank you for enhancing and fixing this patch

Regards

Pavel

Show quoted text

regards, tom lane

#27

Tom Lane

tgl@sss.pgh.pa.us

over 5 years ago

In reply to: Pavel Stehule (#26)

Re: calling procedures is slow and consumes extra much memory against calling function

Pavel Stehule <pavel.stehule@gmail.com> writes:

I agree with these conclusions. I'll try to look if I can do #2 patch
better for pg14. Probably it can fix more issues related to CALL statement,
and I agree so this should not be backapatched.

I've pushed this and marked the CF entry committed. Please start a
new thread and new CF entry whenever you have a more ambitious patch.

regards, tom lane

#28

Pavel Stehule

pavel.stehule@gmail.com

over 5 years ago

In reply to: Tom Lane (#27)

Re: calling procedures is slow and consumes extra much memory against calling function

út 29. 9. 2020 v 17:20 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:

Pavel Stehule <pavel.stehule@gmail.com> writes:

I agree with these conclusions. I'll try to look if I can do #2 patch
better for pg14. Probably it can fix more issues related to CALL

statement,

and I agree so this should not be backapatched.

I've pushed this and marked the CF entry committed. Please start a
new thread and new CF entry whenever you have a more ambitious patch.

Thank you

Pavel

Show quoted text

regards, tom lane