Inlining of couple of functions in pl_exec.c improves performance

Started by Amit Khandekarover 5 years ago19 messages
#1Amit Khandekar
amitdkhan.pg@gmail.com
5 attachment(s)

There are a couple of function call overheads I observed in pl/pgsql
code : exec_stmt() and exec_cast_value(). Removing these overheads
resulted in some performance gains.

exec_stmt() :

plpgsql_exec_function() and other toplevel block executors currently
call exec_stmt(). But actually they don't need to do everything that
exec_stmt() does. So they can call a new function instead of
exec_stmt(), and all the exec_stmt() code can be moved to
exec_stmts(). The things that exec_stmt() do, but are not necessary
for a top level block stmt, are :

1. save_estmt = estate->err_stmt; estate->err_stmt = stmt;
For top level blocks, saving the estate->err_stmt is not necessary,
because there is no statement after this block statement. Anyways,
plpgsql_exec_function() assigns estate.err_stmt just before calling
exec_stmt so there is really no point in exec_stmt() setting it again.

2. CHECK_FOR_INTERRUPTS()
This is not necessary for toplevel block callers.

3. exec_stmt_block() can be directly called rather than exec_stmt()
because func->action is a block statement. So the switch statement is
not necessary.

But this one might be necessary for toplevel block statement:
if (*plpgsql_plugin_ptr && (*plpgsql_plugin_ptr)->stmt_beg)
((*plpgsql_plugin_ptr)->stmt_beg) (estate, stmt);

There was already a repetitive code in plpgsql_exec_function() and
other functions around the exec_stmt() call. So in a separate patch
0001*.patch, I moved that code into a common function
exec_toplevel_block(). In the main patch
0002-Get-rid-of-exec_stmt-function-call.patch, I additionally called
plpgsql_plugin_ptr->stmt_beg() inside exec_toplevel_block(). And moved
exec_stmt() code into exec_stmts().

exec_cast_value() :

This function does not do the casting if not required. So moved the
code that actually does the cast into a separate function, so as to
reduce the exec_cast_value() code and make it inline. Attached is the
0003-Inline-exec_cast_value.patch

Testing
----------

I used two available VMs (one x86_64 and the other arm64), and the
benefit showed up on both of these machines. Attached patches 0001,
0002, 0003 are to be applied in that order. 0001 is just a preparatory
patch.

First I tried with a simple for loop with a single assignment
(attached forcounter.sql)

By inlining of the two functions, found noticeable reduction in
execution time as shown (figures are in milliseconds, averaged over
multiple runs; taken from 'explain analyze' execution times) :
ARM VM :
HEAD : 100 ; Patched : 88 => 13.6% improvement
x86 VM :
HEAD : 71 ; Patched : 66 => 7.63% improvement.

Then I included many assignment statements as shown in attachment
assignmany.sql. This showed further benefit :
ARM VM :
HEAD : 1820 ; Patched : 1549 => 17.5% improvement
x86 VM :
HEAD : 1020 ; Patched : 869 => 17.4% improvement

Inlining just exec_stmt() showed the improvement mainly on the arm64
VM (7.4%). For x86, it was 2.7%
But inlining exec_stmt() and exec_cast_value() together showed
benefits on both machines, as can be seen above.

--
Thanks,
-Amit Khandekar
Huawei Technologies

Attachments:

0001-Modularize-code-in-toplevel-pl-pgsql-block-callers.patchtext/x-patch; charset=US-ASCII; name=0001-Modularize-code-in-toplevel-pl-pgsql-block-callers.patchDownload
From 66c607ef6f0b7b655819b4b19383e024c5f8788c Mon Sep 17 00:00:00 2001
From: Amit Khandekar <amitdkhan.pg@gmail.com>
Date: Sat, 23 May 2020 21:39:41 +0800
Subject: [PATCH 1/3] Modularize code in toplevel pl/pgsql block callers

Functions that call exec_stmt() for executing the toplevel block have
a repetitive code that is now moved into a common function
exec_toplevel_block(). This in preparation for further changes in
this part of code.
---
 src/pl/plpgsql/src/pl_exec.c | 86 ++++++++++++++++--------------------
 1 file changed, 38 insertions(+), 48 deletions(-)

diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c
index 9a87cd70f1..0a70ceddbb 100644
--- a/src/pl/plpgsql/src/pl_exec.c
+++ b/src/pl/plpgsql/src/pl_exec.c
@@ -267,6 +267,9 @@ static MemoryContext get_stmt_mcontext(PLpgSQL_execstate *estate);
 static void push_stmt_mcontext(PLpgSQL_execstate *estate);
 static void pop_stmt_mcontext(PLpgSQL_execstate *estate);
 
+static void exec_toplevel_block(PLpgSQL_execstate *estate,
+								PLpgSQL_stmt_block *block,
+								char *object_type, char *err_text);
 static int	exec_stmt_block(PLpgSQL_execstate *estate,
 							PLpgSQL_stmt_block *block);
 static int	exec_stmts(PLpgSQL_execstate *estate,
@@ -475,7 +478,6 @@ plpgsql_exec_function(PLpgSQL_function *func, FunctionCallInfo fcinfo,
 	PLpgSQL_execstate estate;
 	ErrorContextCallback plerrcontext;
 	int			i;
-	int			rc;
 
 	/*
 	 * Setup the execution state
@@ -605,23 +607,8 @@ plpgsql_exec_function(PLpgSQL_function *func, FunctionCallInfo fcinfo,
 	/*
 	 * Now call the toplevel block of statements
 	 */
-	estate.err_text = NULL;
-	estate.err_stmt = (PLpgSQL_stmt *) (func->action);
-	rc = exec_stmt(&estate, (PLpgSQL_stmt *) func->action);
-	if (rc != PLPGSQL_RC_RETURN)
-	{
-		estate.err_stmt = NULL;
-		estate.err_text = NULL;
-		ereport(ERROR,
-				(errcode(ERRCODE_S_R_E_FUNCTION_EXECUTED_NO_RETURN_STATEMENT),
-				 errmsg("control reached end of function without RETURN")));
-	}
-
-	/*
-	 * We got a return value - process it
-	 */
-	estate.err_stmt = NULL;
-	estate.err_text = gettext_noop("while casting return value to function's return type");
+	exec_toplevel_block(&estate, func->action, "function",
+		gettext_noop("while casting return value to function's return type"));
 
 	fcinfo->isnull = estate.retisnull;
 
@@ -909,7 +896,7 @@ plpgsql_exec_trigger(PLpgSQL_function *func,
 {
 	PLpgSQL_execstate estate;
 	ErrorContextCallback plerrcontext;
-	int			rc;
+	int			rc PG_USED_FOR_ASSERTS_ONLY;
 	TupleDesc	tupdesc;
 	PLpgSQL_rec *rec_new,
 			   *rec_old;
@@ -1021,20 +1008,8 @@ plpgsql_exec_trigger(PLpgSQL_function *func,
 	/*
 	 * Now call the toplevel block of statements
 	 */
-	estate.err_text = NULL;
-	estate.err_stmt = (PLpgSQL_stmt *) (func->action);
-	rc = exec_stmt(&estate, (PLpgSQL_stmt *) func->action);
-	if (rc != PLPGSQL_RC_RETURN)
-	{
-		estate.err_stmt = NULL;
-		estate.err_text = NULL;
-		ereport(ERROR,
-				(errcode(ERRCODE_S_R_E_FUNCTION_EXECUTED_NO_RETURN_STATEMENT),
-				 errmsg("control reached end of trigger procedure without RETURN")));
-	}
-
-	estate.err_stmt = NULL;
-	estate.err_text = gettext_noop("during function exit");
+	exec_toplevel_block(&estate, func->action, "trigger procedure",
+						gettext_noop("during function exit"));
 
 	if (estate.retisset)
 		ereport(ERROR,
@@ -1151,7 +1126,6 @@ plpgsql_exec_event_trigger(PLpgSQL_function *func, EventTriggerData *trigdata)
 {
 	PLpgSQL_execstate estate;
 	ErrorContextCallback plerrcontext;
-	int			rc;
 
 	/*
 	 * Setup the execution state
@@ -1182,20 +1156,8 @@ plpgsql_exec_event_trigger(PLpgSQL_function *func, EventTriggerData *trigdata)
 	/*
 	 * Now call the toplevel block of statements
 	 */
-	estate.err_text = NULL;
-	estate.err_stmt = (PLpgSQL_stmt *) (func->action);
-	rc = exec_stmt(&estate, (PLpgSQL_stmt *) func->action);
-	if (rc != PLPGSQL_RC_RETURN)
-	{
-		estate.err_stmt = NULL;
-		estate.err_text = NULL;
-		ereport(ERROR,
-				(errcode(ERRCODE_S_R_E_FUNCTION_EXECUTED_NO_RETURN_STATEMENT),
-				 errmsg("control reached end of trigger procedure without RETURN")));
-	}
-
-	estate.err_stmt = NULL;
-	estate.err_text = gettext_noop("during function exit");
+	exec_toplevel_block(&estate, func->action, "trigger procedure",
+						gettext_noop("during function exit"));
 
 	/*
 	 * Let the instrumentation plugin peek at this function
@@ -1591,6 +1553,34 @@ exception_matches_conditions(ErrorData *edata, PLpgSQL_condition *cond)
 }
 
 
+/* ----------
+ * exec_toplevel_block			Execute the toplevel block
+ * ----------
+ */
+static void
+exec_toplevel_block(PLpgSQL_execstate *estate, PLpgSQL_stmt_block *block,
+					char *object_type, char *err_text)
+{
+	int			rc;
+
+	estate->err_text = NULL;
+	estate->err_stmt = (PLpgSQL_stmt *)block;
+
+	rc = exec_stmt(estate, (PLpgSQL_stmt *)block);
+
+	if (rc != PLPGSQL_RC_RETURN)
+	{
+		estate->err_stmt = NULL;
+		estate->err_text = NULL;
+		ereport(ERROR,
+				(errcode(ERRCODE_S_R_E_FUNCTION_EXECUTED_NO_RETURN_STATEMENT),
+				 errmsg("control reached end of %s without RETURN", object_type)));
+	}
+
+	estate->err_stmt = NULL;
+	estate->err_text = err_text;
+}
+
 /* ----------
  * exec_stmt_block			Execute a block of statements
  * ----------
-- 
2.17.1

0002-Get-rid-of-exec_stmt-function-call.patchtext/x-patch; charset=US-ASCII; name=0002-Get-rid-of-exec_stmt-function-call.patchDownload
From ff9906d78c5c59c296b7a9ecfefd8715f29a0fe0 Mon Sep 17 00:00:00 2001
From: Amit Khandekar <amitdkhan.pg@gmail.com>
Date: Sat, 23 May 2020 21:44:28 +0800
Subject: [PATCH 2/3] Get rid of exec_stmt() function call

Reduce exec_stmt() callers and move its code into exec_stmts().
All other callers of exec_stmt() were toplevel block executors.
These don't need to exec_stmt(); they can directly call
exec_stmt_block().

This reduces funtion call overhead because exec_stmt() is called
frequently. This is shown to obtain as much as 7% performance
benefit on some systems.
---
 src/pl/plpgsql/src/pl_exec.c | 236 +++++++++++++++++------------------
 1 file changed, 114 insertions(+), 122 deletions(-)

diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c
index 0a70ceddbb..d5377a6dad 100644
--- a/src/pl/plpgsql/src/pl_exec.c
+++ b/src/pl/plpgsql/src/pl_exec.c
@@ -274,8 +274,6 @@ static int	exec_stmt_block(PLpgSQL_execstate *estate,
 							PLpgSQL_stmt_block *block);
 static int	exec_stmts(PLpgSQL_execstate *estate,
 					   List *stmts);
-static int	exec_stmt(PLpgSQL_execstate *estate,
-					  PLpgSQL_stmt *stmt);
 static int	exec_stmt_assign(PLpgSQL_execstate *estate,
 							 PLpgSQL_stmt_assign *stmt);
 static int	exec_stmt_perform(PLpgSQL_execstate *estate,
@@ -1566,7 +1564,15 @@ exec_toplevel_block(PLpgSQL_execstate *estate, PLpgSQL_stmt_block *block,
 	estate->err_text = NULL;
 	estate->err_stmt = (PLpgSQL_stmt *)block;
 
-	rc = exec_stmt(estate, (PLpgSQL_stmt *)block);
+	/* Let the plugin know that we are about to execute this statement */
+	if (*plpgsql_plugin_ptr && (*plpgsql_plugin_ptr)->stmt_beg)
+		((*plpgsql_plugin_ptr)->stmt_beg) (estate, (PLpgSQL_stmt *)block);
+
+	rc = exec_stmt_block(estate, block);
+
+	/* Let the plugin know that we have finished executing this statement */
+	if (*plpgsql_plugin_ptr && (*plpgsql_plugin_ptr)->stmt_end)
+		((*plpgsql_plugin_ptr)->stmt_end) (estate, (PLpgSQL_stmt *)block);
 
 	if (rc != PLPGSQL_RC_RETURN)
 	{
@@ -1930,162 +1936,148 @@ exec_stmts(PLpgSQL_execstate *estate, List *stmts)
 	foreach(s, stmts)
 	{
 		PLpgSQL_stmt *stmt = (PLpgSQL_stmt *) lfirst(s);
-		int			rc = exec_stmt(estate, stmt);
+		PLpgSQL_stmt *save_estmt;
+		int			rc = -1;
 
-		if (rc != PLPGSQL_RC_OK)
-			return rc;
-	}
-
-	return PLPGSQL_RC_OK;
-}
+		save_estmt = estate->err_stmt;
+		estate->err_stmt = stmt;
 
+		/* Let the plugin know that we are about to execute this statement */
+		if (*plpgsql_plugin_ptr && (*plpgsql_plugin_ptr)->stmt_beg)
+			((*plpgsql_plugin_ptr)->stmt_beg) (estate, stmt);
 
-/* ----------
- * exec_stmt			Distribute one statement to the statements
- *				type specific execution function.
- * ----------
- */
-static int
-exec_stmt(PLpgSQL_execstate *estate, PLpgSQL_stmt *stmt)
-{
-	PLpgSQL_stmt *save_estmt;
-	int			rc = -1;
+		CHECK_FOR_INTERRUPTS();
 
-	save_estmt = estate->err_stmt;
-	estate->err_stmt = stmt;
+		switch (stmt->cmd_type)
+		{
+			case PLPGSQL_STMT_BLOCK:
+				rc = exec_stmt_block(estate, (PLpgSQL_stmt_block *) stmt);
+				break;
 
-	/* Let the plugin know that we are about to execute this statement */
-	if (*plpgsql_plugin_ptr && (*plpgsql_plugin_ptr)->stmt_beg)
-		((*plpgsql_plugin_ptr)->stmt_beg) (estate, stmt);
+			case PLPGSQL_STMT_ASSIGN:
+				rc = exec_stmt_assign(estate, (PLpgSQL_stmt_assign *) stmt);
+				break;
 
-	CHECK_FOR_INTERRUPTS();
+			case PLPGSQL_STMT_PERFORM:
+				rc = exec_stmt_perform(estate, (PLpgSQL_stmt_perform *) stmt);
+				break;
 
-	switch (stmt->cmd_type)
-	{
-		case PLPGSQL_STMT_BLOCK:
-			rc = exec_stmt_block(estate, (PLpgSQL_stmt_block *) stmt);
-			break;
+			case PLPGSQL_STMT_CALL:
+				rc = exec_stmt_call(estate, (PLpgSQL_stmt_call *) stmt);
+				break;
 
-		case PLPGSQL_STMT_ASSIGN:
-			rc = exec_stmt_assign(estate, (PLpgSQL_stmt_assign *) stmt);
-			break;
+			case PLPGSQL_STMT_GETDIAG:
+				rc = exec_stmt_getdiag(estate, (PLpgSQL_stmt_getdiag *) stmt);
+				break;
 
-		case PLPGSQL_STMT_PERFORM:
-			rc = exec_stmt_perform(estate, (PLpgSQL_stmt_perform *) stmt);
-			break;
+			case PLPGSQL_STMT_IF:
+				rc = exec_stmt_if(estate, (PLpgSQL_stmt_if *) stmt);
+				break;
 
-		case PLPGSQL_STMT_CALL:
-			rc = exec_stmt_call(estate, (PLpgSQL_stmt_call *) stmt);
-			break;
+			case PLPGSQL_STMT_CASE:
+				rc = exec_stmt_case(estate, (PLpgSQL_stmt_case *) stmt);
+				break;
 
-		case PLPGSQL_STMT_GETDIAG:
-			rc = exec_stmt_getdiag(estate, (PLpgSQL_stmt_getdiag *) stmt);
-			break;
+			case PLPGSQL_STMT_LOOP:
+				rc = exec_stmt_loop(estate, (PLpgSQL_stmt_loop *) stmt);
+				break;
 
-		case PLPGSQL_STMT_IF:
-			rc = exec_stmt_if(estate, (PLpgSQL_stmt_if *) stmt);
-			break;
+			case PLPGSQL_STMT_WHILE:
+				rc = exec_stmt_while(estate, (PLpgSQL_stmt_while *) stmt);
+				break;
 
-		case PLPGSQL_STMT_CASE:
-			rc = exec_stmt_case(estate, (PLpgSQL_stmt_case *) stmt);
-			break;
+			case PLPGSQL_STMT_FORI:
+				rc = exec_stmt_fori(estate, (PLpgSQL_stmt_fori *) stmt);
+				break;
 
-		case PLPGSQL_STMT_LOOP:
-			rc = exec_stmt_loop(estate, (PLpgSQL_stmt_loop *) stmt);
-			break;
+			case PLPGSQL_STMT_FORS:
+				rc = exec_stmt_fors(estate, (PLpgSQL_stmt_fors *) stmt);
+				break;
 
-		case PLPGSQL_STMT_WHILE:
-			rc = exec_stmt_while(estate, (PLpgSQL_stmt_while *) stmt);
-			break;
+			case PLPGSQL_STMT_FORC:
+				rc = exec_stmt_forc(estate, (PLpgSQL_stmt_forc *) stmt);
+				break;
 
-		case PLPGSQL_STMT_FORI:
-			rc = exec_stmt_fori(estate, (PLpgSQL_stmt_fori *) stmt);
-			break;
+			case PLPGSQL_STMT_FOREACH_A:
+				rc = exec_stmt_foreach_a(estate, (PLpgSQL_stmt_foreach_a *) stmt);
+				break;
 
-		case PLPGSQL_STMT_FORS:
-			rc = exec_stmt_fors(estate, (PLpgSQL_stmt_fors *) stmt);
-			break;
+			case PLPGSQL_STMT_EXIT:
+				rc = exec_stmt_exit(estate, (PLpgSQL_stmt_exit *) stmt);
+				break;
 
-		case PLPGSQL_STMT_FORC:
-			rc = exec_stmt_forc(estate, (PLpgSQL_stmt_forc *) stmt);
-			break;
+			case PLPGSQL_STMT_RETURN:
+				rc = exec_stmt_return(estate, (PLpgSQL_stmt_return *) stmt);
+				break;
 
-		case PLPGSQL_STMT_FOREACH_A:
-			rc = exec_stmt_foreach_a(estate, (PLpgSQL_stmt_foreach_a *) stmt);
-			break;
+			case PLPGSQL_STMT_RETURN_NEXT:
+				rc = exec_stmt_return_next(estate, (PLpgSQL_stmt_return_next *) stmt);
+				break;
 
-		case PLPGSQL_STMT_EXIT:
-			rc = exec_stmt_exit(estate, (PLpgSQL_stmt_exit *) stmt);
-			break;
+			case PLPGSQL_STMT_RETURN_QUERY:
+				rc = exec_stmt_return_query(estate, (PLpgSQL_stmt_return_query *) stmt);
+				break;
 
-		case PLPGSQL_STMT_RETURN:
-			rc = exec_stmt_return(estate, (PLpgSQL_stmt_return *) stmt);
-			break;
+			case PLPGSQL_STMT_RAISE:
+				rc = exec_stmt_raise(estate, (PLpgSQL_stmt_raise *) stmt);
+				break;
 
-		case PLPGSQL_STMT_RETURN_NEXT:
-			rc = exec_stmt_return_next(estate, (PLpgSQL_stmt_return_next *) stmt);
-			break;
+			case PLPGSQL_STMT_ASSERT:
+				rc = exec_stmt_assert(estate, (PLpgSQL_stmt_assert *) stmt);
+				break;
 
-		case PLPGSQL_STMT_RETURN_QUERY:
-			rc = exec_stmt_return_query(estate, (PLpgSQL_stmt_return_query *) stmt);
-			break;
+			case PLPGSQL_STMT_EXECSQL:
+				rc = exec_stmt_execsql(estate, (PLpgSQL_stmt_execsql *) stmt);
+				break;
 
-		case PLPGSQL_STMT_RAISE:
-			rc = exec_stmt_raise(estate, (PLpgSQL_stmt_raise *) stmt);
-			break;
+			case PLPGSQL_STMT_DYNEXECUTE:
+				rc = exec_stmt_dynexecute(estate, (PLpgSQL_stmt_dynexecute *) stmt);
+				break;
 
-		case PLPGSQL_STMT_ASSERT:
-			rc = exec_stmt_assert(estate, (PLpgSQL_stmt_assert *) stmt);
-			break;
+			case PLPGSQL_STMT_DYNFORS:
+				rc = exec_stmt_dynfors(estate, (PLpgSQL_stmt_dynfors *) stmt);
+				break;
 
-		case PLPGSQL_STMT_EXECSQL:
-			rc = exec_stmt_execsql(estate, (PLpgSQL_stmt_execsql *) stmt);
-			break;
+			case PLPGSQL_STMT_OPEN:
+				rc = exec_stmt_open(estate, (PLpgSQL_stmt_open *) stmt);
+				break;
 
-		case PLPGSQL_STMT_DYNEXECUTE:
-			rc = exec_stmt_dynexecute(estate, (PLpgSQL_stmt_dynexecute *) stmt);
-			break;
+			case PLPGSQL_STMT_FETCH:
+				rc = exec_stmt_fetch(estate, (PLpgSQL_stmt_fetch *) stmt);
+				break;
 
-		case PLPGSQL_STMT_DYNFORS:
-			rc = exec_stmt_dynfors(estate, (PLpgSQL_stmt_dynfors *) stmt);
-			break;
+			case PLPGSQL_STMT_CLOSE:
+				rc = exec_stmt_close(estate, (PLpgSQL_stmt_close *) stmt);
+				break;
 
-		case PLPGSQL_STMT_OPEN:
-			rc = exec_stmt_open(estate, (PLpgSQL_stmt_open *) stmt);
-			break;
+			case PLPGSQL_STMT_COMMIT:
+				rc = exec_stmt_commit(estate, (PLpgSQL_stmt_commit *) stmt);
+				break;
 
-		case PLPGSQL_STMT_FETCH:
-			rc = exec_stmt_fetch(estate, (PLpgSQL_stmt_fetch *) stmt);
-			break;
+			case PLPGSQL_STMT_ROLLBACK:
+				rc = exec_stmt_rollback(estate, (PLpgSQL_stmt_rollback *) stmt);
+				break;
 
-		case PLPGSQL_STMT_CLOSE:
-			rc = exec_stmt_close(estate, (PLpgSQL_stmt_close *) stmt);
-			break;
+			case PLPGSQL_STMT_SET:
+				rc = exec_stmt_set(estate, (PLpgSQL_stmt_set *) stmt);
+				break;
 
-		case PLPGSQL_STMT_COMMIT:
-			rc = exec_stmt_commit(estate, (PLpgSQL_stmt_commit *) stmt);
-			break;
+			default:
+				estate->err_stmt = save_estmt;
+				elog(ERROR, "unrecognized cmd_type: %d", stmt->cmd_type);
+		}
 
-		case PLPGSQL_STMT_ROLLBACK:
-			rc = exec_stmt_rollback(estate, (PLpgSQL_stmt_rollback *) stmt);
-			break;
+		/* Let the plugin know that we have finished executing this statement */
+		if (*plpgsql_plugin_ptr && (*plpgsql_plugin_ptr)->stmt_end)
+			((*plpgsql_plugin_ptr)->stmt_end) (estate, stmt);
 
-		case PLPGSQL_STMT_SET:
-			rc = exec_stmt_set(estate, (PLpgSQL_stmt_set *) stmt);
-			break;
+		estate->err_stmt = save_estmt;
 
-		default:
-			estate->err_stmt = save_estmt;
-			elog(ERROR, "unrecognized cmd_type: %d", stmt->cmd_type);
+		if (rc != PLPGSQL_RC_OK)
+			return rc;
 	}
 
-	/* Let the plugin know that we have finished executing this statement */
-	if (*plpgsql_plugin_ptr && (*plpgsql_plugin_ptr)->stmt_end)
-		((*plpgsql_plugin_ptr)->stmt_end) (estate, stmt);
-
-	estate->err_stmt = save_estmt;
-
-	return rc;
+	return PLPGSQL_RC_OK;
 }
 
 
-- 
2.17.1

0003-Inline-exec_cast_value.patchtext/x-patch; charset=US-ASCII; name=0003-Inline-exec_cast_value.patchDownload
From 56aac7dff8243ff6dc9b8e72651cb1d9a018f1b3 Mon Sep 17 00:00:00 2001
From: Amit Khandekar <amitdkhan.pg@gmail.com>
Date: Sat, 23 May 2020 21:53:24 +0800
Subject: [PATCH 3/3] Inline exec_cast_value().

This function does not do the casting if not required. So move the
code that actually does the cast into a separate function, so as to
reduce the exec_cast_value() code and make it inline.

There are frequent calls of this function, so inlining it has shown to
improve performance by as much as 14%
---
 src/pl/plpgsql/src/pl_exec.c | 63 ++++++++++++++++++++++++------------
 1 file changed, 43 insertions(+), 20 deletions(-)

diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c
index d5377a6dad..4028a3f0f6 100644
--- a/src/pl/plpgsql/src/pl_exec.c
+++ b/src/pl/plpgsql/src/pl_exec.c
@@ -425,10 +425,14 @@ static void instantiate_empty_record_variable(PLpgSQL_execstate *estate,
 											  PLpgSQL_rec *rec);
 static char *convert_value_to_string(PLpgSQL_execstate *estate,
 									 Datum value, Oid valtype);
-static Datum exec_cast_value(PLpgSQL_execstate *estate,
+static inline Datum exec_cast_value(PLpgSQL_execstate *estate,
 							 Datum value, bool *isnull,
 							 Oid valtype, int32 valtypmod,
 							 Oid reqtype, int32 reqtypmod);
+static Datum do_cast_value(PLpgSQL_execstate *estate,
+				Datum value, bool *isnull,
+				Oid valtype, int32 valtypmod,
+				Oid reqtype, int32 reqtypmod);
 static plpgsql_CastHashEntry *get_cast_hashentry(PLpgSQL_execstate *estate,
 												 Oid srctype, int32 srctypmod,
 												 Oid dsttype, int32 dsttypmod);
@@ -7764,9 +7768,11 @@ convert_value_to_string(PLpgSQL_execstate *estate, Datum value, Oid valtype)
  * also contain the result Datum if we have to do a conversion to a pass-
  * by-reference data type.  Be sure to do an exec_eval_cleanup() call when
  * done with the result.
+ * The actual code to cast is kept outside of this function, to keep it short
+ * since it is an inline function, being called frequently.
  * ----------
  */
-static Datum
+static inline Datum
 exec_cast_value(PLpgSQL_execstate *estate,
 				Datum value, bool *isnull,
 				Oid valtype, int32 valtypmod,
@@ -7777,31 +7783,48 @@ exec_cast_value(PLpgSQL_execstate *estate,
 	 */
 	if (valtype != reqtype ||
 		(valtypmod != reqtypmod && reqtypmod != -1))
-	{
-		plpgsql_CastHashEntry *cast_entry;
+		value = do_cast_value(estate, value, isnull, valtype, valtypmod,
+							  reqtype, reqtypmod);
 
-		cast_entry = get_cast_hashentry(estate,
-										valtype, valtypmod,
-										reqtype, reqtypmod);
-		if (cast_entry)
-		{
-			ExprContext *econtext = estate->eval_econtext;
-			MemoryContext oldcontext;
+	return value;
+}
 
-			oldcontext = MemoryContextSwitchTo(get_eval_mcontext(estate));
+/* ----------
+ * do_cast_value			cast the input value.
+ *
+ * Returns the cast value.
+ * Check comments in the wrapper function exec_cast_value().
+ * ----------
+ */
+static Datum
+do_cast_value(PLpgSQL_execstate *estate,
+				Datum value, bool *isnull,
+				Oid valtype, int32 valtypmod,
+				Oid reqtype, int32 reqtypmod)
+{
+	plpgsql_CastHashEntry *cast_entry;
 
-			econtext->caseValue_datum = value;
-			econtext->caseValue_isNull = *isnull;
+	cast_entry = get_cast_hashentry(estate,
+									valtype, valtypmod,
+									reqtype, reqtypmod);
+	if (cast_entry)
+	{
+		ExprContext *econtext = estate->eval_econtext;
+		MemoryContext oldcontext;
 
-			cast_entry->cast_in_use = true;
+		oldcontext = MemoryContextSwitchTo(get_eval_mcontext(estate));
 
-			value = ExecEvalExpr(cast_entry->cast_exprstate, econtext,
-								 isnull);
+		econtext->caseValue_datum = value;
+		econtext->caseValue_isNull = *isnull;
 
-			cast_entry->cast_in_use = false;
+		cast_entry->cast_in_use = true;
 
-			MemoryContextSwitchTo(oldcontext);
-		}
+		value = ExecEvalExpr(cast_entry->cast_exprstate, econtext,
+							 isnull);
+
+		cast_entry->cast_in_use = false;
+
+		MemoryContextSwitchTo(oldcontext);
 	}
 
 	return value;
-- 
2.17.1

forcounter.sqlapplication/sql; name=forcounter.sqlDownload
assignmany.sqlapplication/sql; name=assignmany.sqlDownload
#2Pavel Stehule
pavel.stehule@gmail.com
In reply to: Amit Khandekar (#1)
Re: Inlining of couple of functions in pl_exec.c improves performance

Hi

so 23. 5. 2020 v 19:03 odesílatel Amit Khandekar <amitdkhan.pg@gmail.com>
napsal:

There are a couple of function call overheads I observed in pl/pgsql
code : exec_stmt() and exec_cast_value(). Removing these overheads
resulted in some performance gains.

exec_stmt() :

plpgsql_exec_function() and other toplevel block executors currently
call exec_stmt(). But actually they don't need to do everything that
exec_stmt() does. So they can call a new function instead of
exec_stmt(), and all the exec_stmt() code can be moved to
exec_stmts(). The things that exec_stmt() do, but are not necessary
for a top level block stmt, are :

1. save_estmt = estate->err_stmt; estate->err_stmt = stmt;
For top level blocks, saving the estate->err_stmt is not necessary,
because there is no statement after this block statement. Anyways,
plpgsql_exec_function() assigns estate.err_stmt just before calling
exec_stmt so there is really no point in exec_stmt() setting it again.

2. CHECK_FOR_INTERRUPTS()
This is not necessary for toplevel block callers.

3. exec_stmt_block() can be directly called rather than exec_stmt()
because func->action is a block statement. So the switch statement is
not necessary.

But this one might be necessary for toplevel block statement:
if (*plpgsql_plugin_ptr && (*plpgsql_plugin_ptr)->stmt_beg)
((*plpgsql_plugin_ptr)->stmt_beg) (estate, stmt);

There was already a repetitive code in plpgsql_exec_function() and
other functions around the exec_stmt() call. So in a separate patch
0001*.patch, I moved that code into a common function
exec_toplevel_block(). In the main patch
0002-Get-rid-of-exec_stmt-function-call.patch, I additionally called
plpgsql_plugin_ptr->stmt_beg() inside exec_toplevel_block(). And moved
exec_stmt() code into exec_stmts().

exec_cast_value() :

This function does not do the casting if not required. So moved the
code that actually does the cast into a separate function, so as to
reduce the exec_cast_value() code and make it inline. Attached is the
0003-Inline-exec_cast_value.patch

Testing
----------

I used two available VMs (one x86_64 and the other arm64), and the
benefit showed up on both of these machines. Attached patches 0001,
0002, 0003 are to be applied in that order. 0001 is just a preparatory
patch.

First I tried with a simple for loop with a single assignment
(attached forcounter.sql)

By inlining of the two functions, found noticeable reduction in
execution time as shown (figures are in milliseconds, averaged over
multiple runs; taken from 'explain analyze' execution times) :
ARM VM :
HEAD : 100 ; Patched : 88 => 13.6% improvement
x86 VM :
HEAD : 71 ; Patched : 66 => 7.63% improvement.

Then I included many assignment statements as shown in attachment
assignmany.sql. This showed further benefit :
ARM VM :
HEAD : 1820 ; Patched : 1549 => 17.5% improvement
x86 VM :
HEAD : 1020 ; Patched : 869 => 17.4% improvement

Inlining just exec_stmt() showed the improvement mainly on the arm64
VM (7.4%). For x86, it was 2.7%
But inlining exec_stmt() and exec_cast_value() together showed
benefits on both machines, as can be seen above.

FOR counter IN 1..1800000 LOOP
id = 0; id = 0; id1 = 0;
id2 = 0; id3 = 0; id1 = 0; id2 = 0;
id3 = 0; id = 0; id = 0; id1 = 0;
id2 = 0; id3 = 0; id1 = 0; id2 = 0;
id3 = 0;
END LOOP;

This is not too much typical PLpgSQL code. All expressions are not
parametrized - so this test is little bit obscure.

Last strange performance plpgsql benchmark did calculation of pi value. It
does something real

Regards

Pavel

Show quoted text

--
Thanks,
-Amit Khandekar
Huawei Technologies

#3Amit Khandekar
amitdkhan.pg@gmail.com
In reply to: Pavel Stehule (#2)
Re: Inlining of couple of functions in pl_exec.c improves performance

On Sat, 23 May 2020 at 23:24, Pavel Stehule <pavel.stehule@gmail.com> wrote:

FOR counter IN 1..1800000 LOOP
id = 0; id = 0; id1 = 0;
id2 = 0; id3 = 0; id1 = 0; id2 = 0;
id3 = 0; id = 0; id = 0; id1 = 0;
id2 = 0; id3 = 0; id1 = 0; id2 = 0;
id3 = 0;
END LOOP;

This is not too much typical PLpgSQL code. All expressions are not parametrized - so this test is little bit obscure.

Last strange performance plpgsql benchmark did calculation of pi value. It does something real

Yeah, basically I wanted to have many statements, and that too with
many assignments where casts are not required. Let me check if I can
come up with a real-enough testcase. Thanks.

#4Amit Khandekar
amitdkhan.pg@gmail.com
In reply to: Amit Khandekar (#3)
Re: Inlining of couple of functions in pl_exec.c improves performance

On Tue, 26 May 2020 at 09:06, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:

On Sat, 23 May 2020 at 23:24, Pavel Stehule <pavel.stehule@gmail.com> wrote:

FOR counter IN 1..1800000 LOOP
id = 0; id = 0; id1 = 0;
id2 = 0; id3 = 0; id1 = 0; id2 = 0;
id3 = 0; id = 0; id = 0; id1 = 0;
id2 = 0; id3 = 0; id1 = 0; id2 = 0;
id3 = 0;
END LOOP;

This is not too much typical PLpgSQL code. All expressions are not parametrized - so this test is little bit obscure.

Last strange performance plpgsql benchmark did calculation of pi value. It does something real

Yeah, basically I wanted to have many statements, and that too with
many assignments where casts are not required. Let me check if I can
come up with a real-enough testcase. Thanks.

create table tab (id int[]);
insert into tab select array((select ((random() * 100000)::bigint) id
from generate_series(1, 30000) order by 1));
insert into tab select array((select ((random() * 600000)::bigint) id
from generate_series(1, 30000) order by 1));
insert into tab select array((select ((random() * 1000000)::bigint) id
from generate_series(1, 30000) order by 1));
insert into tab select array((select ((random() * 100000)::bigint) id
from generate_series(1, 30000) order by 1));
insert into tab select array((select ((random() * 600000)::bigint) id
from generate_series(1, 30000) order by 1));
insert into tab select array((select ((random() * 1000000)::bigint) id
from generate_series(1, 30000) order by 1));
insert into tab select array((select ((random() * 100000)::bigint) id
from generate_series(1, 30000) order by 1));
insert into tab select array((select ((random() * 600000)::bigint) id
from generate_series(1, 30000) order by 1));
insert into tab select array((select ((random() * 1000000)::bigint) id
from generate_series(1, 30000) order by 1));

-- Return how much two consecutive array elements are apart from each
other, on average; i.e. how much the numbers are spaced out.
-- Input is an ordered array of integers.
CREATE OR REPLACE FUNCTION avg_space(int[]) RETURNS bigint AS $$
DECLARE
diff int = 0;
num int;
prevnum int = 1;
BEGIN
FOREACH num IN ARRAY $1
LOOP
diff = diff + num - prevnum;
prevnum = num;
END LOOP;
RETURN diff/array_length($1, 1);
END;
$$ LANGUAGE plpgsql;

explain analyze select avg_space(id) from tab;
Like earlier figures, these are execution times in milliseconds, taken
from explain-analyze.
ARM VM:
HEAD : 49.8
patch 0001+0002 : 47.8 => 4.2%
patch 0001+0002+0003 : 42.9 => 16.1%
x86 VM:
HEAD : 32.8
patch 0001+0002 : 32.7 => 0%
patch 0001+0002+0003 : 28.0 => 17.1%

#5Pavel Stehule
pavel.stehule@gmail.com
In reply to: Amit Khandekar (#4)
Re: Inlining of couple of functions in pl_exec.c improves performance

Hi

st 27. 5. 2020 v 13:31 odesílatel Amit Khandekar <amitdkhan.pg@gmail.com>
napsal:

On Tue, 26 May 2020 at 09:06, Amit Khandekar <amitdkhan.pg@gmail.com>
wrote:

On Sat, 23 May 2020 at 23:24, Pavel Stehule <pavel.stehule@gmail.com>

wrote:

FOR counter IN 1..1800000 LOOP
id = 0; id = 0; id1 = 0;
id2 = 0; id3 = 0; id1 = 0; id2 = 0;
id3 = 0; id = 0; id = 0; id1 = 0;
id2 = 0; id3 = 0; id1 = 0; id2 = 0;
id3 = 0;
END LOOP;

This is not too much typical PLpgSQL code. All expressions are not

parametrized - so this test is little bit obscure.

Last strange performance plpgsql benchmark did calculation of pi

value. It does something real

Yeah, basically I wanted to have many statements, and that too with
many assignments where casts are not required. Let me check if I can
come up with a real-enough testcase. Thanks.

create table tab (id int[]);
insert into tab select array((select ((random() * 100000)::bigint) id
from generate_series(1, 30000) order by 1));
insert into tab select array((select ((random() * 600000)::bigint) id
from generate_series(1, 30000) order by 1));
insert into tab select array((select ((random() * 1000000)::bigint) id
from generate_series(1, 30000) order by 1));
insert into tab select array((select ((random() * 100000)::bigint) id
from generate_series(1, 30000) order by 1));
insert into tab select array((select ((random() * 600000)::bigint) id
from generate_series(1, 30000) order by 1));
insert into tab select array((select ((random() * 1000000)::bigint) id
from generate_series(1, 30000) order by 1));
insert into tab select array((select ((random() * 100000)::bigint) id
from generate_series(1, 30000) order by 1));
insert into tab select array((select ((random() * 600000)::bigint) id
from generate_series(1, 30000) order by 1));
insert into tab select array((select ((random() * 1000000)::bigint) id
from generate_series(1, 30000) order by 1));

-- Return how much two consecutive array elements are apart from each
other, on average; i.e. how much the numbers are spaced out.
-- Input is an ordered array of integers.
CREATE OR REPLACE FUNCTION avg_space(int[]) RETURNS bigint AS $$
DECLARE
diff int = 0;
num int;
prevnum int = 1;
BEGIN
FOREACH num IN ARRAY $1
LOOP
diff = diff + num - prevnum;
prevnum = num;
END LOOP;
RETURN diff/array_length($1, 1);
END;
$$ LANGUAGE plpgsql;

explain analyze select avg_space(id) from tab;
Like earlier figures, these are execution times in milliseconds, taken
from explain-analyze.
ARM VM:
HEAD : 49.8
patch 0001+0002 : 47.8 => 4.2%
patch 0001+0002+0003 : 42.9 => 16.1%
x86 VM:
HEAD : 32.8
patch 0001+0002 : 32.7 => 0%
patch 0001+0002+0003 : 28.0 => 17.1%

I tested these patches on my notebook - Lenovo T520 (x64) - on pi
calculation

CREATE OR REPLACE FUNCTION pi_est_1(n int)
RETURNS numeric AS $$
DECLARE
accum double precision DEFAULT 1.0;
c1 double precision DEFAULT 2.0;
c2 double precision DEFAULT 1.0;
BEGIN
FOR i IN 1..n
LOOP
accum := accum * ((c1 * c1) / (c2 * (c2 + 2.0)));
c1 := c1 + 2.0;
c2 := c2 + 2.0;
END LOOP;
RETURN accum * 2.0;
END;
$$ LANGUAGE plpgsql;

and I see about 3-5% of speedup

extra simply test shows

do $$ declare i int default 0; begin while i < 100000000 loop i := i + 1;
end loop; raise notice 'i=%', i;end $$;

2% speedup

I don't see 17% anywhere, but 3-5% is not bad.

patch 0001 has sense and can help with code structure
patch 0002 it is little bit against simplicity, but for PLpgSQL with blocks
structure it is correct.
patch 0003 has sense too

tested on Fedora 32 with gcc 10.1.1 and -O2 option

Regards

Pavel

#6Amit Khandekar
amitdkhan.pg@gmail.com
In reply to: Pavel Stehule (#5)
Re: Inlining of couple of functions in pl_exec.c improves performance

On Thu, 28 May 2020 at 14:39, Pavel Stehule <pavel.stehule@gmail.com> wrote:

I don't see 17% anywhere, but 3-5% is not bad.

Did you see 3-5% only for the pi function, or did you see the same
improvement also for the functions that I wrote ? I was getting a
consistent result of 14-18 % on both of the VMs. Also, is your test
machine running on Windows ? All the machines I tested were on Linux
kernel (Ubuntu)

Below are my results for your pi_est_1() function. For this function,
I am consistently getting 5-9 % improvement. I tested on 3 machines :

gcc : 8.4.0. -O2 option
OS : Ubuntu Bionic

explain analyze select pi_est_1(10000000)

1. x86_64 laptop VM (Intel Core i7-8665U)
HEAD : 2666 2617 2600 2630
Patched : 2502 2409 2460 2444

2. x86_64 VM (Xeon Gold 6151)
HEAD : 1664 1662 1721 1660
Patched : 1541 1548 1537 1526

3. ARM64 VM (Kunpeng)
HEAD : 2873 2864 2860 2861
Patched : 2568 2513 2501 2538

patch 0001 has sense and can help with code structure
patch 0002 it is little bit against simplicity, but for PLpgSQL with blocks structure it is correct.

Here, I moved the exec_stmt code into exec_stmts() function because
exec_stmts() was the only caller, and that function is not that big. I
am assuming you were referring to this point when you said it is a bit
against simplicity. But I didn't get what you implied by "but for
PLpgSQL with blocks structure it is correct"

--
Thanks,
-Amit Khandekar
Huawei Technologies

#7Pavel Stehule
pavel.stehule@gmail.com
In reply to: Amit Khandekar (#6)
Re: Inlining of couple of functions in pl_exec.c improves performance

so 30. 5. 2020 v 7:28 odesílatel Amit Khandekar <amitdkhan.pg@gmail.com>
napsal:

On Thu, 28 May 2020 at 14:39, Pavel Stehule <pavel.stehule@gmail.com>
wrote:

I don't see 17% anywhere, but 3-5% is not bad.

Did you see 3-5% only for the pi function, or did you see the same
improvement also for the functions that I wrote ? I was getting a
consistent result of 14-18 % on both of the VMs. Also, is your test
machine running on Windows ? All the machines I tested were on Linux
kernel (Ubuntu)

It was similar with your example too.

I tested it on Linux Fedora Core 32 - laptop T520 - I7.

I think so the effect of these patches strongly depends on CPU and compiler
- but it is micro optimization, and when I look to profiler, the bottle
neck is elsewhere.

Below are my results for your pi_est_1() function. For this function,
I am consistently getting 5-9 % improvement. I tested on 3 machines :

gcc : 8.4.0. -O2 option
OS : Ubuntu Bionic

explain analyze select pi_est_1(10000000)

1. x86_64 laptop VM (Intel Core i7-8665U)
HEAD : 2666 2617 2600 2630
Patched : 2502 2409 2460 2444

2. x86_64 VM (Xeon Gold 6151)
HEAD : 1664 1662 1721 1660
Patched : 1541 1548 1537 1526

3. ARM64 VM (Kunpeng)
HEAD : 2873 2864 2860 2861
Patched : 2568 2513 2501 2538

patch 0001 has sense and can help with code structure
patch 0002 it is little bit against simplicity, but for PLpgSQL with

blocks structure it is correct.

Here, I moved the exec_stmt code into exec_stmts() function because
exec_stmts() was the only caller, and that function is not that big. I
am assuming you were referring to this point when you said it is a bit
against simplicity. But I didn't get what you implied by "but for
PLpgSQL with blocks structure it is correct"

Nested statement in PLpgSQL is always a list of statements. It is not
single statement ever. So is not too strange don't have a function
execute_stmt.

Pavel

Show quoted text

--
Thanks,
-Amit Khandekar
Huawei Technologies

#8Michael Paquier
michael@paquier.xyz
In reply to: Amit Khandekar (#1)
Re: Inlining of couple of functions in pl_exec.c improves performance

On Sat, May 23, 2020 at 10:33:43PM +0530, Amit Khandekar wrote:

By inlining of the two functions, found noticeable reduction in
execution time as shown (figures are in milliseconds, averaged over
multiple runs; taken from 'explain analyze' execution times) :
ARM VM :
HEAD : 100 ; Patched : 88 => 13.6% improvement
x86 VM :
HEAD : 71 ; Patched : 66 => 7.63% improvement.

Then I included many assignment statements as shown in attachment
assignmany.sql. This showed further benefit :
ARM VM :
HEAD : 1820 ; Patched : 1549 => 17.5% improvement
x86 VM :
HEAD : 1020 ; Patched : 869 => 17.4% improvement

Inlining just exec_stmt() showed the improvement mainly on the arm64
VM (7.4%). For x86, it was 2.7%
But inlining exec_stmt() and exec_cast_value() together showed
benefits on both machines, as can be seen above.

This stuff is interesting. Do you have some perf profiles to share?
I am wondering what's the effect of the inlining with your test
cases.
--
Michael

#9Amit Khandekar
amitdkhan.pg@gmail.com
In reply to: Michael Paquier (#8)
Re: Inlining of couple of functions in pl_exec.c improves performance

On Sun, 31 May 2020 at 08:04, Michael Paquier <michael@paquier.xyz> wrote:

This stuff is interesting. Do you have some perf profiles to share?
I am wondering what's the effect of the inlining with your test
cases.

Below are the perf numbers for asignmany.sql :

HEAD :

+   16.88%  postgres  postgres           [.] CachedPlanIsSimplyValid
+   16.64%  postgres  plpgsql.so         [.] exec_stmt
+   15.56%  postgres  plpgsql.so         [.] exec_eval_expr
+   13.58%  postgres  plpgsql.so         [.] exec_assign_value
+    7.49%  postgres  plpgsql.so         [.] exec_cast_value
+    7.17%  postgres  plpgsql.so         [.] exec_assign_expr
+    5.39%  postgres  postgres           [.] MemoryContextReset
+    3.91%  postgres  postgres           [.] ExecJustConst
+    3.33%  postgres  postgres           [.] recomputeNamespacePath
+    2.88%  postgres  postgres           [.] OverrideSearchPathMatchesCurrent
+    2.18%  postgres  plpgsql.so         [.] exec_eval_cleanup.isra.17
+    2.15%  postgres  plpgsql.so         [.] exec_stmts
+    1.32%  postgres  plpgsql.so         [.] MemoryContextReset@plt
+    0.57%  postgres  plpgsql.so         [.] CachedPlanIsSimplyValid@plt
+    0.57%  postgres  postgres           [.] GetUserId
     0.30%  postgres  plpgsql.so         [.] assign_simple_var.isra.13
     0.05%  postgres  [kernel.kallsyms]  [k] unmap_page_range

Patched :

+   18.22%  postgres  postgres           [.] CachedPlanIsSimplyValid
+   17.25%  postgres  plpgsql.so         [.] exec_eval_expr
+   16.31%  postgres  plpgsql.so         [.] exec_stmts
+   15.00%  postgres  plpgsql.so         [.] exec_assign_value
+    7.56%  postgres  plpgsql.so         [.] exec_assign_expr
+    5.64%  postgres  postgres           [.] MemoryContextReset
+    5.16%  postgres  postgres           [.] ExecJustConst
+    4.86%  postgres  postgres           [.] recomputeNamespacePath
+    4.54%  postgres  postgres           [.] OverrideSearchPathMatchesCurrent
+    2.33%  postgres  plpgsql.so         [.] exec_eval_cleanup.isra.17
+    1.26%  postgres  plpgsql.so         [.] MemoryContextReset@plt
+    0.81%  postgres  postgres           [.] GetUserId
+    0.71%  postgres  plpgsql.so         [.] CachedPlanIsSimplyValid@plt
     0.26%  postgres  plpgsql.so         [.] assign_simple_var.isra.13
     0.03%  postgres  [kernel.kallsyms]  [k] unmap_page_range
     0.02%  postgres  [kernel.kallsyms]  [k] mark_page_accessed

Notice the reduction in percentages :
HEAD : exec_stmts + exec_stmt = 18.79
Patched : exec_stmts = 16.31

HEAD : exec_assign_value + exec_cast_value : 21.07
Patched : exec_assign_value = 15.00

As expected, reduction of percentage in these two functions caused
other functions like CachedPlanIsSimplyValid() and exec_eval_expr() to
show rise in their percentages.

#10Amit Khandekar
amitdkhan.pg@gmail.com
In reply to: Pavel Stehule (#7)
Re: Inlining of couple of functions in pl_exec.c improves performance

On Sat, 30 May 2020 at 11:11, Pavel Stehule <pavel.stehule@gmail.com> wrote:

I think so the effect of these patches strongly depends on CPU and compile

I quickly tried pi() with gcc 10 as well, and saw more or less the
same benefit. I think, we are bound to see some differences in the
benefits across architectures, kernels and compilers; but looks like
some benefit is always there.

but it is micro optimization, and when I look to profiler, the bottle neck is elsewhere.

Please check the perf numbers in my reply to Michael. I suppose you
meant CachedPlanIsSimplyValid() when you say the bottle neck is
elsewhere ? Yeah, this function is always the hottest spot, which I
recall is being discussed elsewhere. But I always see exec_stmt(),
exec_assign_value as the next functions.

patch 0002 it is little bit against simplicity, but for PLpgSQL with blocks structure it is correct.

Here, I moved the exec_stmt code into exec_stmts() function because
exec_stmts() was the only caller, and that function is not that big. I
am assuming you were referring to this point when you said it is a bit
against simplicity. But I didn't get what you implied by "but for
PLpgSQL with blocks structure it is correct"

Nested statement in PLpgSQL is always a list of statements. It is not single statement ever. So is not too strange don't have a function execute_stmt.

Right.

#11Pavel Stehule
pavel.stehule@gmail.com
In reply to: Amit Khandekar (#10)
Re: Inlining of couple of functions in pl_exec.c improves performance

po 1. 6. 2020 v 8:15 odesílatel Amit Khandekar <amitdkhan.pg@gmail.com>
napsal:

On Sat, 30 May 2020 at 11:11, Pavel Stehule <pavel.stehule@gmail.com>
wrote:

I think so the effect of these patches strongly depends on CPU and

compile

I quickly tried pi() with gcc 10 as well, and saw more or less the
same benefit. I think, we are bound to see some differences in the
benefits across architectures, kernels and compilers; but looks like
some benefit is always there.

but it is micro optimization, and when I look to profiler, the bottle

neck is elsewhere.

Please check the perf numbers in my reply to Michael. I suppose you
meant CachedPlanIsSimplyValid() when you say the bottle neck is
elsewhere ? Yeah, this function is always the hottest spot, which I
recall is being discussed elsewhere. But I always see exec_stmt(),
exec_assign_value as the next functions.

It is hard to read the profile result, because these functions are nested
together. For your example

18.22% postgres postgres [.] CachedPlanIsSimplyValid

Is little bit strange, and probably this is real bottleneck in your simple
example, and maybe some work can be done there, because you assign just
constant.

On second hand, your example is pretty unrealistic - and against any
developer's best practices for writing cycles.

I think so we can look on PostGIS, where is some computing heavy routines
in PLpgSQL, and we can look on real profiles.

Probably the most people will have benefit from these optimization.

Show quoted text

patch 0002 it is little bit against simplicity, but for PLpgSQL with

blocks structure it is correct.

Here, I moved the exec_stmt code into exec_stmts() function because
exec_stmts() was the only caller, and that function is not that big. I
am assuming you were referring to this point when you said it is a bit
against simplicity. But I didn't get what you implied by "but for
PLpgSQL with blocks structure it is correct"

Nested statement in PLpgSQL is always a list of statements. It is not

single statement ever. So is not too strange don't have a function
execute_stmt.

Right.

#12Amit Khandekar
amitdkhan.pg@gmail.com
In reply to: Pavel Stehule (#11)
Re: Inlining of couple of functions in pl_exec.c improves performance

On Mon, 1 Jun 2020 at 12:27, Pavel Stehule <pavel.stehule@gmail.com> wrote:

po 1. 6. 2020 v 8:15 odesílatel Amit Khandekar <amitdkhan.pg@gmail.com> napsal:

On Sat, 30 May 2020 at 11:11, Pavel Stehule <pavel.stehule@gmail.com> wrote:

I think so the effect of these patches strongly depends on CPU and compile

I quickly tried pi() with gcc 10 as well, and saw more or less the
same benefit. I think, we are bound to see some differences in the
benefits across architectures, kernels and compilers; but looks like
some benefit is always there.

but it is micro optimization, and when I look to profiler, the bottle neck is elsewhere.

Please check the perf numbers in my reply to Michael. I suppose you
meant CachedPlanIsSimplyValid() when you say the bottle neck is
elsewhere ? Yeah, this function is always the hottest spot, which I
recall is being discussed elsewhere. But I always see exec_stmt(),
exec_assign_value as the next functions.

It is hard to read the profile result, because these functions are nested together. For your example

18.22% postgres postgres [.] CachedPlanIsSimplyValid

Is little bit strange, and probably this is real bottleneck in your simple example, and maybe some work can be done there, because you assign just constant.

I had earlier had a quick look on this one. CachedPlanIsSimplyValid()
was, I recall, hitting a hotspot when it tries to access
plansource->search_path (possibly cacheline miss). But didn't get a
chance to further dig on that. For now, i am focusing on these other
functions for which the patches were submitted.

On second hand, your example is pretty unrealistic - and against any developer's best practices for writing cycles.

I think so we can look on PostGIS, where is some computing heavy routines in PLpgSQL, and we can look on real profiles.

Probably the most people will have benefit from these optimization.

I understand it's not a real world example. For generating perf
figures, I had to use an example which amplifies the benefits, so that
the effect of the patches on the perf figures also becomes visible.
Hence, used that example. I had shown the benefits up-thread using a
practical function avg_space(). But the perf figures for that example
were varying a lot.

So below, what I did was : Run the avg_space() ~150 times, and took
perf report. This stabilized the results a bit :

HEAD :
+   16.10%  17.29%  16.82%  postgres  postgres            [.]
ExecInterpExpr
+   13.80%  13.56%  14.49%  postgres  plpgsql.so          [.]
exec_assign_value
+   12.64%  12.10%  12.09%  postgres  plpgsql.so          [.]
plpgsql_param_eval_var
+   12.15%  11.28%  11.05%  postgres  plpgsql.so          [.]
exec_stmt
+   10.81%  10.24%  10.55%  postgres  plpgsql.so          [.]
exec_eval_expr
+    9.50%   9.35%   9.37%  postgres  plpgsql.so          [.]
exec_cast_value
.....
+    1.19%   1.06%   1.21%  postgres  plpgsql.so          [.]
exec_stmts
0001+0002 patches applied (i.e. inline exec_stmt) :
+   16.90%  17.20%  16.54%  postgres  postgres            [.]
ExecInterpExpr
+   16.42%  15.37%  15.28%  postgres  plpgsql.so          [.]
exec_assign_value
+   11.34%  11.92%  11.93%  postgres  plpgsql.so          [.]
plpgsql_param_eval_var
+   11.18%  11.86%  10.99%  postgres  plpgsql.so          [.] exec_stmts.part.0
+   10.51%   9.52%  10.61%  postgres  plpgsql.so          [.]
exec_eval_expr
+    9.39%   9.48%   9.30%  postgres  plpgsql.so          [.]
exec_cast_value

HEAD : exec_stmts + exec_stmt = ~12.7 %
Patched (0001+0002): exec_stmts = 11.3 %

Just 0003 patch applied (i.e. inline exec_cast_value) :
+   17.00%  16.77%  17.09% postgres  postgres           [.] ExecInterpExpr
+   15.21%  15.64%  15.09% postgres  plpgsql.so         [.] exec_assign_value
+   14.48%  14.06%  13.94% postgres  plpgsql.so         [.] exec_stmt
+   13.26%  13.30%  13.14% postgres  plpgsql.so         [.]
plpgsql_param_eval_var
+   11.48%  11.64%  12.66% postgres  plpgsql.so         [.] exec_eval_expr
....
+    1.03%   0.85%   0.87% postgres  plpgsql.so         [.] exec_stmts

HEAD : exec_assign_value + exec_cast_value = ~23.4 %
Patched (0001+0002): exec_assign_value = 15.3%

Time in milliseconds after calling avg_space() 150 times :
HEAD : 7210
Patch 0001+0002 : 6925
Patch 0003 : 6670
Patch 0001+0002+0003 : 6346

#13Pavel Stehule
pavel.stehule@gmail.com
In reply to: Amit Khandekar (#12)
Re: Inlining of couple of functions in pl_exec.c improves performance

po 1. 6. 2020 v 15:59 odesílatel Amit Khandekar <amitdkhan.pg@gmail.com>
napsal:

On Mon, 1 Jun 2020 at 12:27, Pavel Stehule <pavel.stehule@gmail.com>
wrote:

po 1. 6. 2020 v 8:15 odesílatel Amit Khandekar <amitdkhan.pg@gmail.com>

napsal:

On Sat, 30 May 2020 at 11:11, Pavel Stehule <pavel.stehule@gmail.com>

wrote:

I think so the effect of these patches strongly depends on CPU and

compile

I quickly tried pi() with gcc 10 as well, and saw more or less the
same benefit. I think, we are bound to see some differences in the
benefits across architectures, kernels and compilers; but looks like
some benefit is always there.

but it is micro optimization, and when I look to profiler, the bottle

neck is elsewhere.

Please check the perf numbers in my reply to Michael. I suppose you
meant CachedPlanIsSimplyValid() when you say the bottle neck is
elsewhere ? Yeah, this function is always the hottest spot, which I
recall is being discussed elsewhere. But I always see exec_stmt(),
exec_assign_value as the next functions.

It is hard to read the profile result, because these functions are

nested together. For your example

18.22% postgres postgres [.] CachedPlanIsSimplyValid

Is little bit strange, and probably this is real bottleneck in your

simple example, and maybe some work can be done there, because you assign
just constant.

I had earlier had a quick look on this one. CachedPlanIsSimplyValid()
was, I recall, hitting a hotspot when it tries to access
plansource->search_path (possibly cacheline miss). But didn't get a
chance to further dig on that. For now, i am focusing on these other
functions for which the patches were submitted.

On second hand, your example is pretty unrealistic - and against any

developer's best practices for writing cycles.

I think so we can look on PostGIS, where is some computing heavy

routines in PLpgSQL, and we can look on real profiles.

Probably the most people will have benefit from these optimization.

I understand it's not a real world example. For generating perf
figures, I had to use an example which amplifies the benefits, so that
the effect of the patches on the perf figures also becomes visible.
Hence, used that example. I had shown the benefits up-thread using a
practical function avg_space(). But the perf figures for that example
were varying a lot.

So below, what I did was : Run the avg_space() ~150 times, and took
perf report. This stabilized the results a bit :

HEAD :
+   16.10%  17.29%  16.82%  postgres  postgres            [.]
ExecInterpExpr
+   13.80%  13.56%  14.49%  postgres  plpgsql.so          [.]
exec_assign_value
+   12.64%  12.10%  12.09%  postgres  plpgsql.so          [.]
plpgsql_param_eval_var
+   12.15%  11.28%  11.05%  postgres  plpgsql.so          [.]
exec_stmt
+   10.81%  10.24%  10.55%  postgres  plpgsql.so          [.]
exec_eval_expr
+    9.50%   9.35%   9.37%  postgres  plpgsql.so          [.]
exec_cast_value
.....
+    1.19%   1.06%   1.21%  postgres  plpgsql.so          [.]
exec_stmts
0001+0002 patches applied (i.e. inline exec_stmt) :
+   16.90%  17.20%  16.54%  postgres  postgres            [.]
ExecInterpExpr
+   16.42%  15.37%  15.28%  postgres  plpgsql.so          [.]
exec_assign_value
+   11.34%  11.92%  11.93%  postgres  plpgsql.so          [.]
plpgsql_param_eval_var
+   11.18%  11.86%  10.99%  postgres  plpgsql.so          [.]
exec_stmts.part.0
+   10.51%   9.52%  10.61%  postgres  plpgsql.so          [.]
exec_eval_expr
+    9.39%   9.48%   9.30%  postgres  plpgsql.so          [.]
exec_cast_value

HEAD : exec_stmts + exec_stmt = ~12.7 %
Patched (0001+0002): exec_stmts = 11.3 %

Just 0003 patch applied (i.e. inline exec_cast_value) :
+   17.00%  16.77%  17.09% postgres  postgres           [.] ExecInterpExpr
+   15.21%  15.64%  15.09% postgres  plpgsql.so         [.]
exec_assign_value
+   14.48%  14.06%  13.94% postgres  plpgsql.so         [.] exec_stmt
+   13.26%  13.30%  13.14% postgres  plpgsql.so         [.]
plpgsql_param_eval_var
+   11.48%  11.64%  12.66% postgres  plpgsql.so         [.] exec_eval_expr
....
+    1.03%   0.85%   0.87% postgres  plpgsql.so         [.] exec_stmts

HEAD : exec_assign_value + exec_cast_value = ~23.4 %
Patched (0001+0002): exec_assign_value = 15.3%

Time in milliseconds after calling avg_space() 150 times :
HEAD : 7210
Patch 0001+0002 : 6925
Patch 0003 : 6670
Patch 0001+0002+0003 : 6346

Is your patch in commitfest in commitfest application?

Regards

Pavel

#14Amit Khandekar
amitdkhan.pg@gmail.com
In reply to: Pavel Stehule (#13)
Re: Inlining of couple of functions in pl_exec.c improves performance

On Tue, 9 Jun 2020 at 21:49, Pavel Stehule <pavel.stehule@gmail.com> wrote:

Is your patch in commitfest in commitfest application?

Thanks for reminding me. Just added.
https://commitfest.postgresql.org/28/2590/

#15Tom Lane
tgl@sss.pgh.pa.us
In reply to: Amit Khandekar (#1)
2 attachment(s)
Re: Inlining of couple of functions in pl_exec.c improves performance

Amit Khandekar <amitdkhan.pg@gmail.com> writes:

There are a couple of function call overheads I observed in pl/pgsql
code : exec_stmt() and exec_cast_value(). Removing these overheads
resulted in some performance gains.

I took a look at the 0001/0002 patches (not 0003 as yet). I do not
like 0001 too much. The most concrete problem with it is that
you broke translatability of the error messages, cf the first
translatability guideline at [1]https://www.postgresql.org/docs/devel/nls-programmer.html#NLS-GUIDELINES. While that could be fixed by passing
the entire message not just part of it, I don't see anything that we're
gaining by moving that stuff into exec_toplevel_block in the first place.
Certainly, passing a string that describes what will happen *after*
exec_toplevel_block is just weird. I think what you've got here is
a very arbitrary chopping-up of the existing code based on chance
similarities of the existing callers. I think we're better off to make
exec_toplevel_block be as nearly as possible a match for exec_stmts'
semantics.

Hence, I propose the attached 0001 to replace 0001/0002. This should
be basically indistinguishable performance-wise, though I have not
tried to benchmark. Note that for reviewability's sake, I did not
reindent the former body of exec_stmt, though we'd want to do that
before committing.

Also, 0002 is a small patch on top of that to avoid redundant saves
and restores of estate->err_stmt within the loop in exec_stmts. This
may well not be a measurable improvement, but it's a pretty obvious
inefficiency in exec_stmts now that it's refactored this way.

regards, tom lane

[1]: https://www.postgresql.org/docs/devel/nls-programmer.html#NLS-GUIDELINES

Attachments:

0001-inline-exec_stmt-1.patchtext/x-diff; charset=us-ascii; name=0001-inline-exec_stmt-1.patchDownload
diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c
index f41d675d65..b02567c88d 100644
--- a/src/pl/plpgsql/src/pl_exec.c
+++ b/src/pl/plpgsql/src/pl_exec.c
@@ -260,12 +260,12 @@ static MemoryContext get_stmt_mcontext(PLpgSQL_execstate *estate);
 static void push_stmt_mcontext(PLpgSQL_execstate *estate);
 static void pop_stmt_mcontext(PLpgSQL_execstate *estate);
 
+static int	exec_toplevel_block(PLpgSQL_execstate *estate,
+								PLpgSQL_stmt_block *block);
 static int	exec_stmt_block(PLpgSQL_execstate *estate,
 							PLpgSQL_stmt_block *block);
 static int	exec_stmts(PLpgSQL_execstate *estate,
 					   List *stmts);
-static int	exec_stmt(PLpgSQL_execstate *estate,
-					  PLpgSQL_stmt *stmt);
 static int	exec_stmt_assign(PLpgSQL_execstate *estate,
 							 PLpgSQL_stmt_assign *stmt);
 static int	exec_stmt_perform(PLpgSQL_execstate *estate,
@@ -599,8 +599,7 @@ plpgsql_exec_function(PLpgSQL_function *func, FunctionCallInfo fcinfo,
 	 * Now call the toplevel block of statements
 	 */
 	estate.err_text = NULL;
-	estate.err_stmt = (PLpgSQL_stmt *) (func->action);
-	rc = exec_stmt(&estate, (PLpgSQL_stmt *) func->action);
+	rc = exec_toplevel_block(&estate, func->action);
 	if (rc != PLPGSQL_RC_RETURN)
 	{
 		estate.err_stmt = NULL;
@@ -1015,8 +1014,7 @@ plpgsql_exec_trigger(PLpgSQL_function *func,
 	 * Now call the toplevel block of statements
 	 */
 	estate.err_text = NULL;
-	estate.err_stmt = (PLpgSQL_stmt *) (func->action);
-	rc = exec_stmt(&estate, (PLpgSQL_stmt *) func->action);
+	rc = exec_toplevel_block(&estate, func->action);
 	if (rc != PLPGSQL_RC_RETURN)
 	{
 		estate.err_stmt = NULL;
@@ -1176,8 +1174,7 @@ plpgsql_exec_event_trigger(PLpgSQL_function *func, EventTriggerData *trigdata)
 	 * Now call the toplevel block of statements
 	 */
 	estate.err_text = NULL;
-	estate.err_stmt = (PLpgSQL_stmt *) (func->action);
-	rc = exec_stmt(&estate, (PLpgSQL_stmt *) func->action);
+	rc = exec_toplevel_block(&estate, func->action);
 	if (rc != PLPGSQL_RC_RETURN)
 	{
 		estate.err_stmt = NULL;
@@ -1584,6 +1581,38 @@ exception_matches_conditions(ErrorData *edata, PLpgSQL_condition *cond)
 }
 
 
+/* ----------
+ * exec_toplevel_block			Execute the toplevel block
+ *
+ * This is intentionally equivalent to executing exec_stmts() with a
+ * list consisting of the one statement.  One tiny difference is that
+ * we do not bother to save and restore estate->err_stmt; the caller
+ * is expected to clear that after we return.
+ * ----------
+ */
+static int
+exec_toplevel_block(PLpgSQL_execstate *estate, PLpgSQL_stmt_block *block)
+{
+	int			rc;
+
+	estate->err_stmt = (PLpgSQL_stmt *) block;
+
+	/* Let the plugin know that we are about to execute this statement */
+	if (*plpgsql_plugin_ptr && (*plpgsql_plugin_ptr)->stmt_beg)
+		((*plpgsql_plugin_ptr)->stmt_beg) (estate, (PLpgSQL_stmt *) block);
+
+	CHECK_FOR_INTERRUPTS();
+
+	rc = exec_stmt_block(estate, block);
+
+	/* Let the plugin know that we have finished executing this statement */
+	if (*plpgsql_plugin_ptr && (*plpgsql_plugin_ptr)->stmt_end)
+		((*plpgsql_plugin_ptr)->stmt_end) (estate, (PLpgSQL_stmt *) block);
+
+	return rc;
+}
+
+
 /* ----------
  * exec_stmt_block			Execute a block of statements
  * ----------
@@ -1933,24 +1962,6 @@ exec_stmts(PLpgSQL_execstate *estate, List *stmts)
 	foreach(s, stmts)
 	{
 		PLpgSQL_stmt *stmt = (PLpgSQL_stmt *) lfirst(s);
-		int			rc = exec_stmt(estate, stmt);
-
-		if (rc != PLPGSQL_RC_OK)
-			return rc;
-	}
-
-	return PLPGSQL_RC_OK;
-}
-
-
-/* ----------
- * exec_stmt			Distribute one statement to the statements
- *				type specific execution function.
- * ----------
- */
-static int
-exec_stmt(PLpgSQL_execstate *estate, PLpgSQL_stmt *stmt)
-{
 	PLpgSQL_stmt *save_estmt;
 	int			rc = -1;
 
@@ -2088,7 +2099,11 @@ exec_stmt(PLpgSQL_execstate *estate, PLpgSQL_stmt *stmt)
 
 	estate->err_stmt = save_estmt;
 
-	return rc;
+	if (rc != PLPGSQL_RC_OK)
+		return rc;
+	}							/* end of loop over statements */
+
+	return PLPGSQL_RC_OK;
 }
 
 
0002-remove-redundant-err_stmt-changes-1.patchtext/x-diff; charset=us-ascii; name=0002-remove-redundant-err_stmt-changes-1.patchDownload
diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c
index b02567c88d..8bc4a7a3d2 100644
--- a/src/pl/plpgsql/src/pl_exec.c
+++ b/src/pl/plpgsql/src/pl_exec.c
@@ -1946,6 +1946,7 @@ exec_stmt_block(PLpgSQL_execstate *estate, PLpgSQL_stmt_block *block)
 static int
 exec_stmts(PLpgSQL_execstate *estate, List *stmts)
 {
+	PLpgSQL_stmt *save_estmt = estate->err_stmt;
 	ListCell   *s;
 
 	if (stmts == NIL)
@@ -1962,10 +1963,8 @@ exec_stmts(PLpgSQL_execstate *estate, List *stmts)
 	foreach(s, stmts)
 	{
 		PLpgSQL_stmt *stmt = (PLpgSQL_stmt *) lfirst(s);
-	PLpgSQL_stmt *save_estmt;
 	int			rc = -1;
 
-	save_estmt = estate->err_stmt;
 	estate->err_stmt = stmt;
 
 	/* Let the plugin know that we are about to execute this statement */
@@ -2097,12 +2096,14 @@ exec_stmts(PLpgSQL_execstate *estate, List *stmts)
 	if (*plpgsql_plugin_ptr && (*plpgsql_plugin_ptr)->stmt_end)
 		((*plpgsql_plugin_ptr)->stmt_end) (estate, stmt);
 
-	estate->err_stmt = save_estmt;
-
 	if (rc != PLPGSQL_RC_OK)
+	{
+		estate->err_stmt = save_estmt;
 		return rc;
+	}
 	}							/* end of loop over statements */
 
+	estate->err_stmt = save_estmt;
 	return PLPGSQL_RC_OK;
 }
 
#16Amit Khandekar
amitdkhan.pg@gmail.com
In reply to: Tom Lane (#15)
Re: Inlining of couple of functions in pl_exec.c improves performance

On Thu, 2 Jul 2020 at 03:47, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Amit Khandekar <amitdkhan.pg@gmail.com> writes:

There are a couple of function call overheads I observed in pl/pgsql
code : exec_stmt() and exec_cast_value(). Removing these overheads
resulted in some performance gains.

I took a look at the 0001/0002 patches (not 0003 as yet). I do not
like 0001 too much. The most concrete problem with it is that
you broke translatability of the error messages, cf the first
translatability guideline at [1].

Yeah, I thought we can safely use %s for proper nouns such as "trigger
procedure" or "function" as those would not be translated. But looks
like even if they won't be translated, the difference in word order
among languages might create problems with this.

While that could be fixed by passing
the entire message not just part of it, I don't see anything that we're
gaining by moving that stuff into exec_toplevel_block in the first place.
Certainly, passing a string that describes what will happen *after*
exec_toplevel_block is just weird. I think what you've got here is
a very arbitrary chopping-up of the existing code based on chance
similarities of the existing callers. I think we're better off to make
exec_toplevel_block be as nearly as possible a match for exec_stmts'
semantics.

I thought some of those things that I kept in exec_toplevel_block() do
look like they belong to a top level function. But what you are saying
also makes sense : better to keep it similar to exec_stmts.

Hence, I propose the attached 0001 to replace 0001/0002. This should
be basically indistinguishable performance-wise, though I have not
tried to benchmark.

Thanks for the patches. Yeah, performance-wise it does look similar;
but anyways I tried running, and got similar performance numbers.

Note that for reviewability's sake, I did not
reindent the former body of exec_stmt, though we'd want to do that
before committing.

Right.

Also, 0002 is a small patch on top of that to avoid redundant saves
and restores of estate->err_stmt within the loop in exec_stmts. This
may well not be a measurable improvement, but it's a pretty obvious
inefficiency in exec_stmts now that it's refactored this way.

0002 also makes sense.

#17Tom Lane
tgl@sss.pgh.pa.us
In reply to: Amit Khandekar (#16)
1 attachment(s)
Re: Inlining of couple of functions in pl_exec.c improves performance

I did some performance testing on 0001+0002 here, and found that
for me, there's basically no change on x86_64 but a win of 2 to 3
percent on aarch64, using Pavel's pi_est_1() as a representative
case for simple plpgsql statements. That squares with your original
results I believe. It's not clear to me whether any of the later
tests in this thread measured these changes in isolation, or only
with 0003 added.

Anyway, that's good enough for me, so I pushed 0001+0002 after a
little bit of additional cosmetic tweaking.

I attach your original 0003 here (it still applies, with some line
offsets). That's just so the cfbot doesn't get confused about what
it's supposed to test now.

regards, tom lane

Attachments:

0003-Inline-exec_cast_value.patchtext/x-diff; charset=us-ascii; name=0003-Inline-exec_cast_value.patchDownload
From 56aac7dff8243ff6dc9b8e72651cb1d9a018f1b3 Mon Sep 17 00:00:00 2001
From: Amit Khandekar <amitdkhan.pg@gmail.com>
Date: Sat, 23 May 2020 21:53:24 +0800
Subject: [PATCH 3/3] Inline exec_cast_value().

This function does not do the casting if not required. So move the
code that actually does the cast into a separate function, so as to
reduce the exec_cast_value() code and make it inline.

There are frequent calls of this function, so inlining it has shown to
improve performance by as much as 14%
---
 src/pl/plpgsql/src/pl_exec.c | 63 ++++++++++++++++++++++++------------
 1 file changed, 43 insertions(+), 20 deletions(-)

diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c
index d5377a6dad..4028a3f0f6 100644
--- a/src/pl/plpgsql/src/pl_exec.c
+++ b/src/pl/plpgsql/src/pl_exec.c
@@ -425,10 +425,14 @@ static void instantiate_empty_record_variable(PLpgSQL_execstate *estate,
 											  PLpgSQL_rec *rec);
 static char *convert_value_to_string(PLpgSQL_execstate *estate,
 									 Datum value, Oid valtype);
-static Datum exec_cast_value(PLpgSQL_execstate *estate,
+static inline Datum exec_cast_value(PLpgSQL_execstate *estate,
 							 Datum value, bool *isnull,
 							 Oid valtype, int32 valtypmod,
 							 Oid reqtype, int32 reqtypmod);
+static Datum do_cast_value(PLpgSQL_execstate *estate,
+				Datum value, bool *isnull,
+				Oid valtype, int32 valtypmod,
+				Oid reqtype, int32 reqtypmod);
 static plpgsql_CastHashEntry *get_cast_hashentry(PLpgSQL_execstate *estate,
 												 Oid srctype, int32 srctypmod,
 												 Oid dsttype, int32 dsttypmod);
@@ -7764,9 +7768,11 @@ convert_value_to_string(PLpgSQL_execstate *estate, Datum value, Oid valtype)
  * also contain the result Datum if we have to do a conversion to a pass-
  * by-reference data type.  Be sure to do an exec_eval_cleanup() call when
  * done with the result.
+ * The actual code to cast is kept outside of this function, to keep it short
+ * since it is an inline function, being called frequently.
  * ----------
  */
-static Datum
+static inline Datum
 exec_cast_value(PLpgSQL_execstate *estate,
 				Datum value, bool *isnull,
 				Oid valtype, int32 valtypmod,
@@ -7777,31 +7783,48 @@ exec_cast_value(PLpgSQL_execstate *estate,
 	 */
 	if (valtype != reqtype ||
 		(valtypmod != reqtypmod && reqtypmod != -1))
-	{
-		plpgsql_CastHashEntry *cast_entry;
+		value = do_cast_value(estate, value, isnull, valtype, valtypmod,
+							  reqtype, reqtypmod);
 
-		cast_entry = get_cast_hashentry(estate,
-										valtype, valtypmod,
-										reqtype, reqtypmod);
-		if (cast_entry)
-		{
-			ExprContext *econtext = estate->eval_econtext;
-			MemoryContext oldcontext;
+	return value;
+}
 
-			oldcontext = MemoryContextSwitchTo(get_eval_mcontext(estate));
+/* ----------
+ * do_cast_value			cast the input value.
+ *
+ * Returns the cast value.
+ * Check comments in the wrapper function exec_cast_value().
+ * ----------
+ */
+static Datum
+do_cast_value(PLpgSQL_execstate *estate,
+				Datum value, bool *isnull,
+				Oid valtype, int32 valtypmod,
+				Oid reqtype, int32 reqtypmod)
+{
+	plpgsql_CastHashEntry *cast_entry;
 
-			econtext->caseValue_datum = value;
-			econtext->caseValue_isNull = *isnull;
+	cast_entry = get_cast_hashentry(estate,
+									valtype, valtypmod,
+									reqtype, reqtypmod);
+	if (cast_entry)
+	{
+		ExprContext *econtext = estate->eval_econtext;
+		MemoryContext oldcontext;
 
-			cast_entry->cast_in_use = true;
+		oldcontext = MemoryContextSwitchTo(get_eval_mcontext(estate));
 
-			value = ExecEvalExpr(cast_entry->cast_exprstate, econtext,
-								 isnull);
+		econtext->caseValue_datum = value;
+		econtext->caseValue_isNull = *isnull;
 
-			cast_entry->cast_in_use = false;
+		cast_entry->cast_in_use = true;
 
-			MemoryContextSwitchTo(oldcontext);
-		}
+		value = ExecEvalExpr(cast_entry->cast_exprstate, econtext,
+							 isnull);
+
+		cast_entry->cast_in_use = false;
+
+		MemoryContextSwitchTo(oldcontext);
 	}
 
 	return value;
-- 
2.17.1

#18Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#17)
Re: Inlining of couple of functions in pl_exec.c improves performance

I wrote:

I attach your original 0003 here (it still applies, with some line
offsets). That's just so the cfbot doesn't get confused about what
it's supposed to test now.

Pushed that part now, too.

BTW, the first test run I did on this (on x86_64) was actually several
percent *slower* than HEAD. I couldn't reproduce that after restarting
the postmaster; all later tests concurred that there was a speedup.
So I suppose that was just some phase-of-the-moon effect, perhaps caused
by an ASLR-dependent collision of bits of code in processor cache.
Still, that illustrates the difficulty of getting useful, reproducible
improvements when doing this kind of hacking. I tend to think that
most of the time we're better off leaving this to the compiler.

regards, tom lane

#19Amit Khandekar
amitdkhan.pg@gmail.com
In reply to: Tom Lane (#17)
Re: Inlining of couple of functions in pl_exec.c improves performance

On Sat, 4 Jul 2020 at 01:19, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I did some performance testing on 0001+0002 here, and found that
for me, there's basically no change on x86_64 but a win of 2 to 3
percent on aarch64, using Pavel's pi_est_1() as a representative
case for simple plpgsql statements. That squares with your original
results I believe. It's not clear to me whether any of the later
tests in this thread measured these changes in isolation, or only
with 0003 added.

Yeah I had the same observation. 0001+0002 seems to benefit mostly on
aarch64. And 0003 (exec_case_value) benefited both on amd64 and
aarch64.

Anyway, that's good enough for me, so I pushed 0001+0002 after a
little bit of additional cosmetic tweaking.

I attach your original 0003 here (it still applies, with some line
offsets). That's just so the cfbot doesn't get confused about what
it's supposed to test now.

Thanks for pushing all the three !

Thanks,
-Amit Khandekar
Huawei Technologies