dynamic SQL - possible performance regression in 9.2

Started by Pavel Stehule about 13 years ago · 22 messages

#1 Pavel Stehule
pavel.stehule@gmail.com

Hello

I rechecked performance of dynamic SQL and it is significantly slower
in 9.2 than 9.1

-- 9.1
postgres=# create or replace function test() returns void as $$ begin
for i in 1..1000000 loop execute 'select 1'; end loop; end $$ language
plpgsql;
CREATE FUNCTION
postgres=# \timing
Timing is on.
postgres=# select test();
test
------

(1 row)

Time: 7652.904 ms
postgres=# select test();
test
------

(1 row)

Time: 7828.025 ms

-- 9.2
postgres=# create or replace function test() returns void as $$ begin
for i in 1..1000000 loop execute 'select 1'; end loop; end $$ language
plpgsql;
CREATE FUNCTION
Time: 59.272 ms
postgres=# select test();
test
------

(1 row)

Time: 11153.646 ms
postgres=# select test();
test
------

(1 row)

Time: 11081.899 ms

This test is synthetic, but it shows that anybody who uses dynamic SQL
in triggers (for partitioning) can see slower operations.
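[Editor's note: Pavel's point about partitioning triggers can be made concrete with a minimal sketch. The table, column, and partition naming below are hypothetical, not from the thread; what matters is that every inserted row runs a fresh dynamic EXECUTE, which goes through the SPI_execute one-shot path being discussed.]

```sql
-- Hypothetical month-based partition routing: each inserted row fires
-- one dynamic EXECUTE, so each row pays the per-statement parse/plan
-- (and, in 9.2, extra tree-copying) cost.
create or replace function route_insert() returns trigger as $$
begin
  execute format('insert into %I select $1.*',
                 'measurements_' || to_char(new.logdate, 'YYYY_MM'))
    using new;
  return null;  -- suppress the insert into the parent table
end $$ language plpgsql;

create trigger route_insert before insert on measurements
  for each row execute procedure route_insert();
```

On a bulk load, any per-EXECUTE overhead is multiplied by the row count, which is why a regression in this path shows up so clearly in partitioned-insert workloads.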

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#2 Peter Eisentraut
peter_e@gmx.net
In reply to: Pavel Stehule (#1)
Re: dynamic SQL - possible performance regression in 9.2

On 12/27/12 1:07 AM, Pavel Stehule wrote:

Hello

I rechecked performance of dynamic SQL and it is significantly slower
in 9.2 than 9.1

-- 9.1
postgres=# create or replace function test() returns void as $$ begin
for i in 1..1000000 loop execute 'select 1'; end loop; end $$ language
plpgsql;

I think this is the same as the case discussed at
<CAD4+=qWnGU0qi+iq=EPh6EGPuUnSCYsGDTgKazizEvrGgjo0Sg@mail.gmail.com>.


#3 Heikki Linnakangas
hlinnakangas@vmware.com
In reply to: Peter Eisentraut (#2)
1 attachment(s)
Re: dynamic SQL - possible performance regression in 9.2

On 28.12.2012 23:53, Peter Eisentraut wrote:

On 12/27/12 1:07 AM, Pavel Stehule wrote:

Hello

I rechecked performance of dynamic SQL and it is significantly slower
in 9.2 than 9.1

-- 9.1
postgres=# create or replace function test() returns void as $$ begin
for i in 1..1000000 loop execute 'select 1'; end loop; end $$ language
plpgsql;

I think this is the same as the case discussed at
<CAD4+=qWnGU0qi+iq=EPh6EGPuUnSCYsGDTgKazizEvrGgjo0Sg@mail.gmail.com>.

Yeah, probably so.

As it happens, I just spent a lot of time today narrowing down yet
another report of a regression in 9.2, when running DBT-2:
http://archives.postgresql.org/pgsql-performance/2012-11/msg00007.php.
It looks like that is also caused by the plancache changes. DBT-2
implements the transactions using C functions, which use SPI_execute()
to run all the queries.

It looks like the regression is caused by extra copying of the parse
tree and plan trees. Node-copy-related functions like AllocSetAlloc and
_copy* are high in the profile. They are also high in the 9.1 profile,
but even more so in 9.2.

I hacked together a quick&dirty patch to reduce the copying of
single-shot plans, and was able to buy back much of the regression I was
seeing on DBT-2. Patch attached. But of course, DBT-2 really should be
preparing the queries once with SPI_prepare, and reusing them thereafter.
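[Editor's note: as a sketch of the reuse pattern Heikki recommends — the query, types, and function name below are illustrative, not DBT-2's actual code — a C function can prepare once, save the plan with SPI_keepplan (new in 9.2), and execute the saved plan on every later call:]

```c
#include "postgres.h"
#include "executor/spi.h"
#include "catalog/pg_type.h"

/* Prepared once per backend, reused on every later call. */
static SPIPlanPtr saved_plan = NULL;

static void
lookup_order(Datum order_id)
{
	Datum	values[1];

	if (saved_plan == NULL)
	{
		Oid		argtypes[1] = {INT4OID};

		saved_plan = SPI_prepare("select * from orders where id = $1",
								 1, argtypes);
		if (saved_plan == NULL)
			elog(ERROR, "SPI_prepare failed: %d", SPI_result);
		/* Move the plan out of the transient SPI context so it survives. */
		if (SPI_keepplan(saved_plan))
			elog(ERROR, "SPI_keepplan failed");
	}

	values[0] = order_id;
	/* Reuses the saved plan instead of re-parsing on every call. */
	if (SPI_execute_plan(saved_plan, values, NULL, true, 0) < 0)
		elog(ERROR, "SPI_execute_plan failed");
}
```

With this pattern the per-call work is just plan revalidation and execution; the parse-tree and plan-tree copying that dominates the SPI_execute profiles happens once at prepare time.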

- Heikki

Attachments:

reduce-copies-in-spi-execute-2.patch (text/x-diff)
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index e3de7f2..0b047d0 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -75,7 +75,7 @@ PrepareQuery(PrepareStmt *stmt, const char *queryString)
 	 * to see the unmodified raw parse tree.
 	 */
 	plansource = CreateCachedPlan(stmt->query, queryString,
-								  CreateCommandTag(stmt->query));
+								  CreateCommandTag(stmt->query), false);
 
 	/* Transform list of TypeNames to array of type OIDs */
 	nargs = list_length(stmt->argtypes);
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 5a11c6f..d7522f1 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -49,7 +49,7 @@ static Portal SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
 						 ParamListInfo paramLI, bool read_only);
 
 static void _SPI_prepare_plan(const char *src, SPIPlanPtr plan,
-				  ParamListInfo boundParams);
+							  ParamListInfo boundParams, bool oneshot);
 
 static int _SPI_execute_plan(SPIPlanPtr plan, ParamListInfo paramLI,
 				  Snapshot snapshot, Snapshot crosscheck_snapshot,
@@ -354,7 +354,7 @@ SPI_execute(const char *src, bool read_only, long tcount)
 	plan.magic = _SPI_PLAN_MAGIC;
 	plan.cursor_options = 0;
 
-	_SPI_prepare_plan(src, &plan, NULL);
+	_SPI_prepare_plan(src, &plan, NULL, true);
 
 	res = _SPI_execute_plan(&plan, NULL,
 							InvalidSnapshot, InvalidSnapshot,
@@ -505,7 +505,7 @@ SPI_execute_with_args(const char *src,
 	paramLI = _SPI_convert_params(nargs, argtypes,
 								  Values, Nulls);
 
-	_SPI_prepare_plan(src, &plan, paramLI);
+	_SPI_prepare_plan(src, &plan, paramLI, true);
 
 	res = _SPI_execute_plan(&plan, paramLI,
 							InvalidSnapshot, InvalidSnapshot,
@@ -546,7 +546,7 @@ SPI_prepare_cursor(const char *src, int nargs, Oid *argtypes,
 	plan.parserSetup = NULL;
 	plan.parserSetupArg = NULL;
 
-	_SPI_prepare_plan(src, &plan, NULL);
+	_SPI_prepare_plan(src, &plan, NULL, false);
 
 	/* copy plan to procedure context */
 	result = _SPI_make_plan_non_temp(&plan);
@@ -583,7 +583,7 @@ SPI_prepare_params(const char *src,
 	plan.parserSetup = parserSetup;
 	plan.parserSetupArg = parserSetupArg;
 
-	_SPI_prepare_plan(src, &plan, NULL);
+	_SPI_prepare_plan(src, &plan, NULL, false);
 
 	/* copy plan to procedure context */
 	result = _SPI_make_plan_non_temp(&plan);
@@ -1082,7 +1082,7 @@ SPI_cursor_open_with_args(const char *name,
 	paramLI = _SPI_convert_params(nargs, argtypes,
 								  Values, Nulls);
 
-	_SPI_prepare_plan(src, &plan, paramLI);
+	_SPI_prepare_plan(src, &plan, paramLI, true);
 
 	/* We needn't copy the plan; SPI_cursor_open_internal will do so */
 
@@ -1656,7 +1656,8 @@ spi_printtup(TupleTableSlot *slot, DestReceiver *self)
  * parsing is also left in CurrentMemoryContext.
  */
 static void
-_SPI_prepare_plan(const char *src, SPIPlanPtr plan, ParamListInfo boundParams)
+_SPI_prepare_plan(const char *src, SPIPlanPtr plan, ParamListInfo boundParams,
+				  bool oneshot)
 {
 	List	   *raw_parsetree_list;
 	List	   *plancache_list;
@@ -1688,6 +1689,8 @@ _SPI_prepare_plan(const char *src, SPIPlanPtr plan, ParamListInfo boundParams)
 		Node	   *parsetree = (Node *) lfirst(list_item);
 		List	   *stmt_list;
 		CachedPlanSource *plansource;
+		MemoryContext querytree_ctx;
+		MemoryContext oldctx;
 
 		/*
 		 * Create the CachedPlanSource before we do parse analysis, since it
@@ -1695,12 +1698,25 @@ _SPI_prepare_plan(const char *src, SPIPlanPtr plan, ParamListInfo boundParams)
 		 */
 		plansource = CreateCachedPlan(parsetree,
 									  src,
-									  CreateCommandTag(parsetree));
+									  CreateCommandTag(parsetree),
+									  oneshot);
 
 		/*
 		 * Parameter datatypes are driven by parserSetup hook if provided,
 		 * otherwise we use the fixed parameter list.
 		 */
+		oldctx = CurrentMemoryContext;
+		if (oneshot)
+		{
+			querytree_ctx = AllocSetContextCreate(CurrentMemoryContext,
+												  "Short-lived querytree context",
+												  ALLOCSET_SMALL_MINSIZE,
+												  ALLOCSET_SMALL_INITSIZE,
+												  ALLOCSET_DEFAULT_MAXSIZE);
+			MemoryContextSwitchTo(querytree_ctx);
+		}
+		else
+			querytree_ctx = NULL;
 		if (plan->parserSetup != NULL)
 		{
 			Assert(plan->nargs == 0);
@@ -1717,10 +1733,13 @@ _SPI_prepare_plan(const char *src, SPIPlanPtr plan, ParamListInfo boundParams)
 											   plan->nargs);
 		}
 
+		if (oneshot)
+			MemoryContextSwitchTo(oldctx);
+
 		/* Finish filling in the CachedPlanSource */
 		CompleteCachedPlan(plansource,
 						   stmt_list,
-						   NULL,
+						   querytree_ctx,
 						   plan->argtypes,
 						   plan->nargs,
 						   plan->parserSetup,
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index f1633f9..1eae238 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1260,7 +1260,7 @@ exec_parse_message(const char *query_string,	/* string to execute */
 		 * Create the CachedPlanSource before we do parse analysis, since it
 		 * needs to see the unmodified raw parse tree.
 		 */
-		psrc = CreateCachedPlan(raw_parse_tree, query_string, commandTag);
+		psrc = CreateCachedPlan(raw_parse_tree, query_string, commandTag, false);
 
 		/*
 		 * Set up a snapshot if parse analysis will need one.
@@ -1312,7 +1312,7 @@ exec_parse_message(const char *query_string,	/* string to execute */
 		/* Empty input string.	This is legal. */
 		raw_parse_tree = NULL;
 		commandTag = NULL;
-		psrc = CreateCachedPlan(raw_parse_tree, query_string, commandTag);
+		psrc = CreateCachedPlan(raw_parse_tree, query_string, commandTag, false);
 		querytree_list = NIL;
 	}
 
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index c42765c..4e5ff2b 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -134,7 +134,8 @@ InitPlanCache(void)
 CachedPlanSource *
 CreateCachedPlan(Node *raw_parse_tree,
 				 const char *query_string,
-				 const char *commandTag)
+				 const char *commandTag,
+				 bool oneshot)
 {
 	CachedPlanSource *plansource;
 	MemoryContext source_context;
@@ -183,6 +184,7 @@ CreateCachedPlan(Node *raw_parse_tree,
 	plansource->is_complete = false;
 	plansource->is_saved = false;
 	plansource->is_valid = false;
+	plansource->is_oneshot = oneshot;
 	plansource->generation = 0;
 	plansource->next_saved = NULL;
 	plansource->generic_cost = -1;
@@ -339,6 +341,9 @@ SaveCachedPlan(CachedPlanSource *plansource)
 	Assert(plansource->is_complete);
 	Assert(!plansource->is_saved);
 
+	if (plansource->is_oneshot)
+		elog(ERROR, "one-shot cached plans cannot be reused");
+
 	/*
 	 * In typical use, this function would be called before generating any
 	 * plans from the CachedPlanSource.  If there is a generic plan, moving it
@@ -741,7 +746,27 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
 	 * scribbled on by the planner, make one.
 	 */
 	if (qlist == NIL)
-		qlist = (List *) copyObject(plansource->query_list);
+	{
+		/*
+		 * If the caller said this is a one-shot query, we feel free to
+		 * scribble on the query-tree list directly
+		 */
+		if (plansource->is_oneshot)
+			qlist = plansource->query_list;
+		else
+			qlist = (List *) copyObject(plansource->query_list);
+	}
+
+	/*
+	 * Make a dedicated memory context for the CachedPlan and its subsidiary
+	 * data.  It's probably not going to be large, but just in case, use the
+	 * default maxsize parameter.  It's transient for the moment.
+	 */
+	plan_context = AllocSetContextCreate(CurrentMemoryContext,
+										 "CachedPlan",
+										 ALLOCSET_SMALL_MINSIZE,
+										 ALLOCSET_SMALL_INITSIZE,
+										 ALLOCSET_DEFAULT_MAXSIZE);
 
 	/*
 	 * Restore the search_path that was in use when the plan was made. See
@@ -780,6 +805,16 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
 	/*
 	 * Generate the plan.
 	 */
+
+	/*
+	 * If this is a one-shot query, do the planning directly in the
+	 * CachedPlan's context. It's going to be littered by the planning
+	 * process, but we don't need to copy the result to the CachedPlan's
+	 * context. That's a good tradeoff for a single-shot query, as the
+	 * CachedPLan won't live long.
+	 */
+	if (plansource->is_oneshot)
+		MemoryContextSwitchTo(plan_context);
 	plist = pg_plan_queries(qlist, plansource->cursor_options, boundParams);
 
 	/* Clean up SPI state */
@@ -792,23 +827,14 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
 	/* Now we can restore current search path */
 	PopOverrideSearchPath();
 
-	/*
-	 * Make a dedicated memory context for the CachedPlan and its subsidiary
-	 * data.  It's probably not going to be large, but just in case, use the
-	 * default maxsize parameter.  It's transient for the moment.
-	 */
-	plan_context = AllocSetContextCreate(CurrentMemoryContext,
-										 "CachedPlan",
-										 ALLOCSET_SMALL_MINSIZE,
-										 ALLOCSET_SMALL_INITSIZE,
-										 ALLOCSET_DEFAULT_MAXSIZE);
+	oldcxt = MemoryContextSwitchTo(plan_context);
 
 	/*
-	 * Copy plan into the new context.
+	 * Copy plan into the new context, if we didn't create it there to begin
+	 * with.
 	 */
-	oldcxt = MemoryContextSwitchTo(plan_context);
-
-	plist = (List *) copyObject(plist);
+	if (!plansource->is_oneshot)
+		plist = (List *) copyObject(plist);
 
 	/*
 	 * Create and fill the CachedPlan struct within the new context.
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index 413e846..b6eed19 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -91,6 +91,7 @@ typedef struct CachedPlanSource
 	bool		is_complete;	/* has CompleteCachedPlan been done? */
 	bool		is_saved;		/* has CachedPlanSource been "saved"? */
 	bool		is_valid;		/* is the query_list currently valid? */
+	bool		is_oneshot;
 	int			generation;		/* increments each time we create a plan */
 	/* If CachedPlanSource has been saved, it is a member of a global list */
 	struct CachedPlanSource *next_saved;		/* list link, if so */
@@ -128,7 +129,8 @@ extern void ResetPlanCache(void);
 
 extern CachedPlanSource *CreateCachedPlan(Node *raw_parse_tree,
 				 const char *query_string,
-				 const char *commandTag);
+				 const char *commandTag,
+				 bool is_oneshot);
 extern void CompleteCachedPlan(CachedPlanSource *plansource,
 				   List *querytree_list,
 				   MemoryContext querytree_context,
#4 Pavel Stehule
pavel.stehule@gmail.com
In reply to: Heikki Linnakangas (#3)
Re: dynamic SQL - possible performance regression in 9.2

Hello

2012/12/28 Heikki Linnakangas <hlinnakangas@vmware.com>:

On 28.12.2012 23:53, Peter Eisentraut wrote:

On 12/27/12 1:07 AM, Pavel Stehule wrote:

Hello

I rechecked performance of dynamic SQL and it is significantly slower
in 9.2 than 9.1

-- 9.1
postgres=# create or replace function test() returns void as $$ begin
for i in 1..1000000 loop execute 'select 1'; end loop; end $$ language
plpgsql;

I think this is the same as the case discussed at
<CAD4+=qWnGU0qi+iq=EPh6EGPuUnSCYsGDTgKazizEvrGgjo0Sg@mail.gmail.com>.

Yeah, probably so.

As it happens, I just spent a lot of time today narrowing down yet another
report of a regression in 9.2, when running DBT-2:
http://archives.postgresql.org/pgsql-performance/2012-11/msg00007.php. It
looks like that is also caused by the plancache changes. DBT-2 implements
the transactions using C functions, which use SPI_execute() to run all the
queries.

It looks like the regression is caused by extra copying of the parse tree
and plan trees. Node-copy-related functions like AllocSetAlloc and _copy*
are high in the profile. They are also high in the 9.1 profile, but even
more so in 9.2.

I hacked together a quick&dirty patch to reduce the copying of single-shot
plans, and was able to buy back much of the regression I was seeing on
DBT-2. Patch attached. But of course, DBT-2 really should be preparing the
queries once with SPI_prepare, and reusing them thereafter.

The performance regression is about 30-50%.

Your copy-reduce patch increases speed by about 8%.

Regards

Pavel

- Heikki


#5 Peter Eisentraut
peter_e@gmx.net
In reply to: Heikki Linnakangas (#3)
Re: dynamic SQL - possible performance regression in 9.2

On 12/28/12 5:11 PM, Heikki Linnakangas wrote:

As it happens, I just spent a lot of time today narrowing down yet
another report of a regression in 9.2, when running DBT-2:
http://archives.postgresql.org/pgsql-performance/2012-11/msg00007.php.
It looks like that is also caused by the plancache changes. DBT-2
implements the transactions using C functions, which use SPI_execute()
to run all the queries.

It looks like the regression is caused by extra copying of the parse
tree and plan trees. Node-copy-related functions like AllocSetAlloc and
_copy* are high in the profile. They are also high in the 9.1 profile,
but even more so in 9.2.

I hacked together a quick&dirty patch to reduce the copying of
single-shot plans, and was able to buy back much of the regression I was
seeing on DBT-2. Patch attached. But of course, DBT-2 really should be
preparing the queries once with SPI_prepare, and reusing them thereafter.

I was recently profiling an application that uses a fair amount of
PL/pgSQL with dynamic queries and also noticed AllocSetAlloc high in the
profile. That made me suspicious, so I compared 9.1 and 9.2
performance: 9.2 is consistently about 3% slower. Your patch doesn't
seem to have a measurable effect, but it might if I ran the test for
longer.


#6 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#5)
Re: dynamic SQL - possible performance regression in 9.2

Peter Eisentraut <peter_e@gmx.net> writes:

On 12/28/12 5:11 PM, Heikki Linnakangas wrote:

It looks like the regression is caused by extra copying of the parse
tree and plan trees. Node-copy-related functions like AllocSetAlloc and
_copy* are high in the profile. They are also high in the 9.1 profile,
but even more so in 9.2.

Hm ... those 9.2 changes were supposed to *improve* performance, and I
believe they did so for code paths where the plan cache is actually
doing something useful. In this path, it's basically useless.

I hacked together a quick&dirty patch to reduce the copying of
single-shot plans, and was able to buy back much of the regression I was
seeing on DBT-2. Patch attached. But of course, DBT-2 really should be
preparing the queries once with SPI_prepare, and reusing them thereafter.

I was recently profiling an application that uses a fair amount of
PL/pgSQL with dynamic queries and also noticed AllocSetAlloc high in the
profile. I was getting suspicious now and compared 9.1 and 9.2
performance: 9.2 is consistently about 3% slower. Your patch doesn't
seem to have a measurable effect, but it might be if I ran the test for
longer.

I'm inclined to think that Heikki's patch doesn't go far enough, if we
want to optimize behavior in this case. What we really want to happen
is that parsing, planning, and execution all happen in the caller's
memory context, with no copying of parse or plan trees at all - and we
could do without overhead such as dependency extraction and invalidation
checking, too. This would make SPI_execute a lot more comparable to the
behavior of exec_simple_query().

So basically plancache.c has got no useful functionality to offer here.

But to avoid having multiple drastically different code paths in spi.c,
it would be nice if we had some sort of "shell" API that would provide
the illusion of using a CachedPlan without any of the overhead that
comes along with actually being able to save or reuse the plan.
Heikki's "oneshot" concept is moving in that direction, but not far
enough IMO. I'm thinking we don't want it to create any new memory
contexts at all, just palloc a CachedPlan object directly in the
caller's memory context and link to the caller-supplied trees.

I'll take a whack at that ...

regards, tom lane


#7 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#6)
1 attachment(s)
Re: dynamic SQL - possible performance regression in 9.2

I wrote:

I'm inclined to think that Heikki's patch doesn't go far enough, if we
want to optimize behavior in this case. What we really want to happen
is that parsing, planning, and execution all happen in the caller's
memory context, with no copying of parse or plan trees at all - and we
could do without overhead such as dependency extraction and invalidation
checking, too. This would make SPI_execute a lot more comparable to the
behavior of exec_simple_query().

Here's a draft patch for that. My initial hack at it had a
disadvantage, which was that because no invalidation checking happened,
a SPI_execute query string containing a DDL command (such as ALTER TABLE)
followed by a command affected by the DDL would fail to reparse/replan
the second command properly. (I suspect that Heikki's version had a
related defect, but haven't looked closely.) Now that's not a huge deal
IMO, because in many common cases parse analysis of the second command
would fail anyway. For instance, this has never worked in any PG
release:

do $$ begin execute 'create table foo(f1 int); insert into foo values(1);'; end $$;

However it troubled me that there might be some regression there, and
after a bit of reflection I decided the right fix would be to rearrange
the code in spi.c so that parse analysis of later parsetrees follows
execution of earlier ones. This makes the behavior of SPI_execute()
even more like that of exec_simple_query(), and shouldn't cost anything
noticeable given the other changes here.

I'm not entirely sure about performance of this fix, though. I got
numbers varying between roughly-on-par with 9.1 and 10% slower than 9.1
for Pavel's example, depending on seemingly not-performance-related
rearrangements of the code in spi.c. I think this must be chance
effects of cache line alignment, but it would be good to hear what other
people get, both on Pavel's example and the other ones alluded to.
In any case this seems better than unmodified HEAD, which was 40% slower
than 9.1 for me.
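[Editor's note: for anyone wanting to repeat Tom's comparison, a minimal harness along these lines works. The ports are assumptions for two locally running installations; the second select shows warm-cache timings.]

```shell
# Run Pavel's test against 9.1 and 9.2 servers (assumed on ports
# 5491 and 5492) and compare the \timing output of the two runs.
for port in 5491 5492; do
    psql -p "$port" -d postgres <<'EOF'
create or replace function test() returns void as $$
begin
  for i in 1..1000000 loop execute 'select 1'; end loop;
end $$ language plpgsql;
\timing
select test();
select test();
EOF
done
```

Given the cache-alignment sensitivity Tom mentions, several runs per server are advisable before trusting a percentage.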

regards, tom lane

Attachments:

oneshot-cached-plans.patch (text/x-patch; charset=us-ascii)
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 416a2c4f3bbe2132ed6d66f6eeaee9067e915d9b..cb5b84509ce8a89dfa8bf309c9c64ffdfdffe43d 100644
*** a/src/backend/executor/spi.c
--- b/src/backend/executor/spi.c
*************** static int	_SPI_curid = -1;
*** 49,56 ****
  static Portal SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
  						 ParamListInfo paramLI, bool read_only);
  
! static void _SPI_prepare_plan(const char *src, SPIPlanPtr plan,
! 				  ParamListInfo boundParams);
  
  static int _SPI_execute_plan(SPIPlanPtr plan, ParamListInfo paramLI,
  				  Snapshot snapshot, Snapshot crosscheck_snapshot,
--- 49,57 ----
  static Portal SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
  						 ParamListInfo paramLI, bool read_only);
  
! static void _SPI_prepare_plan(const char *src, SPIPlanPtr plan);
! 
! static void _SPI_prepare_oneshot_plan(const char *src, SPIPlanPtr plan);
  
  static int _SPI_execute_plan(SPIPlanPtr plan, ParamListInfo paramLI,
  				  Snapshot snapshot, Snapshot crosscheck_snapshot,
*************** SPI_execute(const char *src, bool read_o
*** 355,361 ****
  	plan.magic = _SPI_PLAN_MAGIC;
  	plan.cursor_options = 0;
  
! 	_SPI_prepare_plan(src, &plan, NULL);
  
  	res = _SPI_execute_plan(&plan, NULL,
  							InvalidSnapshot, InvalidSnapshot,
--- 356,362 ----
  	plan.magic = _SPI_PLAN_MAGIC;
  	plan.cursor_options = 0;
  
! 	_SPI_prepare_oneshot_plan(src, &plan);
  
  	res = _SPI_execute_plan(&plan, NULL,
  							InvalidSnapshot, InvalidSnapshot,
*************** SPI_execute_with_args(const char *src,
*** 506,512 ****
  	paramLI = _SPI_convert_params(nargs, argtypes,
  								  Values, Nulls);
  
! 	_SPI_prepare_plan(src, &plan, paramLI);
  
  	res = _SPI_execute_plan(&plan, paramLI,
  							InvalidSnapshot, InvalidSnapshot,
--- 507,513 ----
  	paramLI = _SPI_convert_params(nargs, argtypes,
  								  Values, Nulls);
  
! 	_SPI_prepare_oneshot_plan(src, &plan);
  
  	res = _SPI_execute_plan(&plan, paramLI,
  							InvalidSnapshot, InvalidSnapshot,
*************** SPI_prepare_cursor(const char *src, int 
*** 547,553 ****
  	plan.parserSetup = NULL;
  	plan.parserSetupArg = NULL;
  
! 	_SPI_prepare_plan(src, &plan, NULL);
  
  	/* copy plan to procedure context */
  	result = _SPI_make_plan_non_temp(&plan);
--- 548,554 ----
  	plan.parserSetup = NULL;
  	plan.parserSetupArg = NULL;
  
! 	_SPI_prepare_plan(src, &plan);
  
  	/* copy plan to procedure context */
  	result = _SPI_make_plan_non_temp(&plan);
*************** SPI_prepare_params(const char *src,
*** 584,590 ****
  	plan.parserSetup = parserSetup;
  	plan.parserSetupArg = parserSetupArg;
  
! 	_SPI_prepare_plan(src, &plan, NULL);
  
  	/* copy plan to procedure context */
  	result = _SPI_make_plan_non_temp(&plan);
--- 585,591 ----
  	plan.parserSetup = parserSetup;
  	plan.parserSetupArg = parserSetupArg;
  
! 	_SPI_prepare_plan(src, &plan);
  
  	/* copy plan to procedure context */
  	result = _SPI_make_plan_non_temp(&plan);
*************** SPI_keepplan(SPIPlanPtr plan)
*** 599,605 ****
  {
  	ListCell   *lc;
  
! 	if (plan == NULL || plan->magic != _SPI_PLAN_MAGIC || plan->saved)
  		return SPI_ERROR_ARGUMENT;
  
  	/*
--- 600,607 ----
  {
  	ListCell   *lc;
  
! 	if (plan == NULL || plan->magic != _SPI_PLAN_MAGIC ||
! 		plan->saved || plan->oneshot)
  		return SPI_ERROR_ARGUMENT;
  
  	/*
*************** SPI_cursor_open_with_args(const char *na
*** 1083,1089 ****
  	paramLI = _SPI_convert_params(nargs, argtypes,
  								  Values, Nulls);
  
! 	_SPI_prepare_plan(src, &plan, paramLI);
  
  	/* We needn't copy the plan; SPI_cursor_open_internal will do so */
  
--- 1085,1091 ----
  	paramLI = _SPI_convert_params(nargs, argtypes,
  								  Values, Nulls);
  
! 	_SPI_prepare_plan(src, &plan);
  
  	/* We needn't copy the plan; SPI_cursor_open_internal will do so */
  
*************** spi_printtup(TupleTableSlot *slot, DestR
*** 1645,1654 ****
   *
   * At entry, plan->argtypes and plan->nargs (or alternatively plan->parserSetup
   * and plan->parserSetupArg) must be valid, as must plan->cursor_options.
-  * If boundParams isn't NULL then it represents parameter values that are made
-  * available to the planner (as either estimates or hard values depending on
-  * their PARAM_FLAG_CONST marking).  The boundParams had better match the
-  * param type information embedded in the plan!
   *
   * Results are stored into *plan (specifically, plan->plancache_list).
   * Note that the result data is all in CurrentMemoryContext or child contexts
--- 1647,1652 ----
*************** spi_printtup(TupleTableSlot *slot, DestR
*** 1657,1669 ****
   * parsing is also left in CurrentMemoryContext.
   */
  static void
! _SPI_prepare_plan(const char *src, SPIPlanPtr plan, ParamListInfo boundParams)
  {
  	List	   *raw_parsetree_list;
  	List	   *plancache_list;
  	ListCell   *list_item;
  	ErrorContextCallback spierrcontext;
- 	int			cursor_options = plan->cursor_options;
  
  	/*
  	 * Setup error traceback support for ereport()
--- 1655,1666 ----
   * parsing is also left in CurrentMemoryContext.
   */
  static void
! _SPI_prepare_plan(const char *src, SPIPlanPtr plan)
  {
  	List	   *raw_parsetree_list;
  	List	   *plancache_list;
  	ListCell   *list_item;
  	ErrorContextCallback spierrcontext;
  
  	/*
  	 * Setup error traceback support for ereport()
*************** _SPI_prepare_plan(const char *src, SPIPl
*** 1726,1738 ****
  						   plan->nargs,
  						   plan->parserSetup,
  						   plan->parserSetupArg,
! 						   cursor_options,
  						   false);		/* not fixed result */
  
  		plancache_list = lappend(plancache_list, plansource);
  	}
  
  	plan->plancache_list = plancache_list;
  
  	/*
  	 * Pop the error context stack
--- 1723,1802 ----
  						   plan->nargs,
  						   plan->parserSetup,
  						   plan->parserSetupArg,
! 						   plan->cursor_options,
  						   false);		/* not fixed result */
  
  		plancache_list = lappend(plancache_list, plansource);
  	}
  
  	plan->plancache_list = plancache_list;
+ 	plan->oneshot = false;
+ 
+ 	/*
+ 	 * Pop the error context stack
+ 	 */
+ 	error_context_stack = spierrcontext.previous;
+ }
+ 
+ /*
+  * Parse, but don't analyze, a querystring.
+  *
+  * This is a stripped-down version of _SPI_prepare_plan that only does the
+  * initial raw parsing.  It creates "one shot" CachedPlanSources
+  * that still require parse analysis before execution is possible.
+  *
+  * The advantage of using the "one shot" form of CachedPlanSource is that
+  * we eliminate data copying and invalidation overhead.  Postponing parse
+  * analysis also prevents issues if some of the raw parsetrees are DDL
+  * commands that affect validity of later parsetrees.  Both of these
+  * attributes are good things for SPI_execute() and similar cases.
+  *
+  * Results are stored into *plan (specifically, plan->plancache_list).
+  * Note that the result data is all in CurrentMemoryContext or child contexts
+  * thereof; in practice this means it is in the SPI executor context, and
+  * what we are creating is a "temporary" SPIPlan.  Cruft generated during
+  * parsing is also left in CurrentMemoryContext.
+  */
+ static void
+ _SPI_prepare_oneshot_plan(const char *src, SPIPlanPtr plan)
+ {
+ 	List	   *raw_parsetree_list;
+ 	List	   *plancache_list;
+ 	ListCell   *list_item;
+ 	ErrorContextCallback spierrcontext;
+ 
+ 	/*
+ 	 * Setup error traceback support for ereport()
+ 	 */
+ 	spierrcontext.callback = _SPI_error_callback;
+ 	spierrcontext.arg = (void *) src;
+ 	spierrcontext.previous = error_context_stack;
+ 	error_context_stack = &spierrcontext;
+ 
+ 	/*
+ 	 * Parse the request string into a list of raw parse trees.
+ 	 */
+ 	raw_parsetree_list = pg_parse_query(src);
+ 
+ 	/*
+ 	 * Construct plancache entries, but don't do parse analysis yet.
+ 	 */
+ 	plancache_list = NIL;
+ 
+ 	foreach(list_item, raw_parsetree_list)
+ 	{
+ 		Node	   *parsetree = (Node *) lfirst(list_item);
+ 		CachedPlanSource *plansource;
+ 
+ 		plansource = CreateOneShotCachedPlan(parsetree,
+ 											 src,
+ 											 CreateCommandTag(parsetree));
+ 
+ 		plancache_list = lappend(plancache_list, plansource);
+ 	}
+ 
+ 	plan->plancache_list = plancache_list;
+ 	plan->oneshot = true;
  
  	/*
  	 * Pop the error context stack
*************** _SPI_execute_plan(SPIPlanPtr plan, Param
*** 1770,1776 ****
  	 * Setup error traceback support for ereport()
  	 */
  	spierrcontext.callback = _SPI_error_callback;
! 	spierrcontext.arg = NULL;
  	spierrcontext.previous = error_context_stack;
  	error_context_stack = &spierrcontext;
  
--- 1834,1840 ----
  	 * Setup error traceback support for ereport()
  	 */
  	spierrcontext.callback = _SPI_error_callback;
! 	spierrcontext.arg = NULL;	/* we'll fill this below */
  	spierrcontext.previous = error_context_stack;
  	error_context_stack = &spierrcontext;
  
*************** _SPI_execute_plan(SPIPlanPtr plan, Param
*** 1817,1822 ****
--- 1881,1927 ----
  		spierrcontext.arg = (void *) plansource->query_string;
  
  		/*
+ 		 * If this is a one-shot plan, we still need to do parse analysis.
+ 		 */
+ 		if (plan->oneshot)
+ 		{
+ 			Node	   *parsetree = plansource->raw_parse_tree;
+ 			const char *src = plansource->query_string;
+ 			List	   *stmt_list;
+ 
+ 			/*
+ 			 * Parameter datatypes are driven by parserSetup hook if provided,
+ 			 * otherwise we use the fixed parameter list.
+ 			 */
+ 			if (plan->parserSetup != NULL)
+ 			{
+ 				Assert(plan->nargs == 0);
+ 				stmt_list = pg_analyze_and_rewrite_params(parsetree,
+ 														  src,
+ 														  plan->parserSetup,
+ 														  plan->parserSetupArg);
+ 			}
+ 			else
+ 			{
+ 				stmt_list = pg_analyze_and_rewrite(parsetree,
+ 												   src,
+ 												   plan->argtypes,
+ 												   plan->nargs);
+ 			}
+ 
+ 			/* Finish filling in the CachedPlanSource */
+ 			CompleteCachedPlan(plansource,
+ 							   stmt_list,
+ 							   NULL,
+ 							   plan->argtypes,
+ 							   plan->nargs,
+ 							   plan->parserSetup,
+ 							   plan->parserSetupArg,
+ 							   plan->cursor_options,
+ 							   false);		/* not fixed result */
+ 		}
+ 
+ 		/*
  		 * Replan if needed, and increment plan refcount.  If it's a saved
  		 * plan, the refcount must be backed by the CurrentResourceOwner.
  		 */
*************** _SPI_make_plan_non_temp(SPIPlanPtr plan)
*** 2313,2318 ****
--- 2418,2425 ----
  	/* Assert the input is a temporary SPIPlan */
  	Assert(plan->magic == _SPI_PLAN_MAGIC);
  	Assert(plan->plancxt == NULL);
+ 	/* One-shot plans can't be saved */
+ 	Assert(!plan->oneshot);
  
  	/*
  	 * Create a memory context for the plan, underneath the procedure context.
*************** _SPI_make_plan_non_temp(SPIPlanPtr plan)
*** 2330,2335 ****
--- 2437,2443 ----
  	newplan = (SPIPlanPtr) palloc(sizeof(_SPI_plan));
  	newplan->magic = _SPI_PLAN_MAGIC;
  	newplan->saved = false;
+ 	newplan->oneshot = false;
  	newplan->plancache_list = NIL;
  	newplan->plancxt = plancxt;
  	newplan->cursor_options = plan->cursor_options;
*************** _SPI_save_plan(SPIPlanPtr plan)
*** 2379,2384 ****
--- 2487,2495 ----
  	MemoryContext oldcxt;
  	ListCell   *lc;
  
+ 	/* One-shot plans can't be saved */
+ 	Assert(!plan->oneshot);
+ 
  	/*
  	 * Create a memory context for the plan.  We don't expect the plan to be
  	 * very large, so use smaller-than-default alloc parameters.  It's a
*************** _SPI_save_plan(SPIPlanPtr plan)
*** 2395,2400 ****
--- 2506,2512 ----
  	newplan = (SPIPlanPtr) palloc(sizeof(_SPI_plan));
  	newplan->magic = _SPI_PLAN_MAGIC;
  	newplan->saved = false;
+ 	newplan->oneshot = false;
  	newplan->plancache_list = NIL;
  	newplan->plancxt = plancxt;
  	newplan->cursor_options = plan->cursor_options;
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 8c0391ffcbdfa9058c345930c6d9568dbc9c526d..a211869866674ffbf06393d29a742cf5c69a7639 100644
*** a/src/backend/utils/cache/plancache.c
--- b/src/backend/utils/cache/plancache.c
*************** CreateCachedPlan(Node *raw_parse_tree,
*** 181,186 ****
--- 181,187 ----
  	plansource->invalItems = NIL;
  	plansource->query_context = NULL;
  	plansource->gplan = NULL;
+ 	plansource->is_oneshot = false;
  	plansource->is_complete = false;
  	plansource->is_saved = false;
  	plansource->is_valid = false;
*************** CreateCachedPlan(Node *raw_parse_tree,
*** 196,201 ****
--- 197,265 ----
  }
  
  /*
+  * CreateOneShotCachedPlan: initially create a one-shot plan cache entry.
+  *
+  * This variant of CreateCachedPlan creates a plan cache entry that is meant
+  * to be used only once.  No data copying occurs: all data structures remain
+  * in the caller's memory context (which typically should get cleared after
+  * completing execution).  The CachedPlanSource struct itself is also created
+  * in that context.
+  *
+  * A one-shot plan cannot be saved or copied, since we make no effort to
+  * preserve the raw parse tree unmodified.  There is also no support for
+  * invalidation, so plan use must be completed in the current transaction,
+  * and DDL that might invalidate the querytree_list must be avoided as well.
+  *
+  * raw_parse_tree: output of raw_parser()
+  * query_string: original query text
+  * commandTag: compile-time-constant tag for query, or NULL if empty query
+  */
+ CachedPlanSource *
+ CreateOneShotCachedPlan(Node *raw_parse_tree,
+ 						const char *query_string,
+ 						const char *commandTag)
+ {
+ 	CachedPlanSource *plansource;
+ 
+ 	Assert(query_string != NULL);		/* required as of 8.4 */
+ 
+ 	/*
+ 	 * Create and fill the CachedPlanSource struct within the caller's memory
+ 	 * context.  Most fields are just left empty for the moment.
+ 	 */
+ 	plansource = (CachedPlanSource *) palloc0(sizeof(CachedPlanSource));
+ 	plansource->magic = CACHEDPLANSOURCE_MAGIC;
+ 	plansource->raw_parse_tree = raw_parse_tree;
+ 	plansource->query_string = query_string;
+ 	plansource->commandTag = commandTag;
+ 	plansource->param_types = NULL;
+ 	plansource->num_params = 0;
+ 	plansource->parserSetup = NULL;
+ 	plansource->parserSetupArg = NULL;
+ 	plansource->cursor_options = 0;
+ 	plansource->fixed_result = false;
+ 	plansource->resultDesc = NULL;
+ 	plansource->search_path = NULL;
+ 	plansource->context = CurrentMemoryContext;
+ 	plansource->query_list = NIL;
+ 	plansource->relationOids = NIL;
+ 	plansource->invalItems = NIL;
+ 	plansource->query_context = NULL;
+ 	plansource->gplan = NULL;
+ 	plansource->is_oneshot = true;
+ 	plansource->is_complete = false;
+ 	plansource->is_saved = false;
+ 	plansource->is_valid = false;
+ 	plansource->generation = 0;
+ 	plansource->next_saved = NULL;
+ 	plansource->generic_cost = -1;
+ 	plansource->total_custom_cost = 0;
+ 	plansource->num_custom_plans = 0;
+ 
+ 	return plansource;
+ }
+ 
+ /*
   * CompleteCachedPlan: second step of creating a plan cache entry.
   *
   * Pass in the analyzed-and-rewritten form of the query, as well as the
*************** CreateCachedPlan(Node *raw_parse_tree,
*** 222,227 ****
--- 286,295 ----
   * option, it is caller's responsibility that the referenced data remains
   * valid for as long as the CachedPlanSource exists.
   *
+  * If the CachedPlanSource is a "oneshot" plan, then no querytree copying
+  * occurs at all, and querytree_context is ignored; it is caller's
+  * responsibility that the passed querytree_list is sufficiently long-lived.
+  *
   * plansource: structure returned by CreateCachedPlan
   * querytree_list: analyzed-and-rewritten form of query (list of Query nodes)
   * querytree_context: memory context containing querytree_list,
*************** CompleteCachedPlan(CachedPlanSource *pla
*** 254,262 ****
  	/*
  	 * If caller supplied a querytree_context, reparent it underneath the
  	 * CachedPlanSource's context; otherwise, create a suitable context and
! 	 * copy the querytree_list into it.
  	 */
! 	if (querytree_context != NULL)
  	{
  		MemoryContextSetParent(querytree_context, source_context);
  		MemoryContextSwitchTo(querytree_context);
--- 322,336 ----
  	/*
  	 * If caller supplied a querytree_context, reparent it underneath the
  	 * CachedPlanSource's context; otherwise, create a suitable context and
! 	 * copy the querytree_list into it.  But no data copying should be done
! 	 * for one-shot plans; for those, assume the passed querytree_list is
! 	 * sufficiently long-lived.
  	 */
! 	if (plansource->is_oneshot)
! 	{
! 		querytree_context = CurrentMemoryContext;
! 	}
! 	else if (querytree_context != NULL)
  	{
  		MemoryContextSetParent(querytree_context, source_context);
  		MemoryContextSwitchTo(querytree_context);
*************** CompleteCachedPlan(CachedPlanSource *pla
*** 279,289 ****
  	/*
  	 * Use the planner machinery to extract dependencies.  Data is saved in
  	 * query_context.  (We assume that not a lot of extra cruft is created by
! 	 * this call.)
  	 */
! 	extract_query_dependencies((Node *) querytree_list,
! 							   &plansource->relationOids,
! 							   &plansource->invalItems);
  
  	/*
  	 * Save the final parameter types (or other parameter specification data)
--- 353,364 ----
  	/*
  	 * Use the planner machinery to extract dependencies.  Data is saved in
  	 * query_context.  (We assume that not a lot of extra cruft is created by
! 	 * this call.)  We can skip this for one-shot plans.
  	 */
! 	if (!plansource->is_oneshot)
! 		extract_query_dependencies((Node *) querytree_list,
! 								   &plansource->relationOids,
! 								   &plansource->invalItems);
  
  	/*
  	 * Save the final parameter types (or other parameter specification data)
*************** CompleteCachedPlan(CachedPlanSource *pla
*** 326,332 ****
   * it to the list of cached plans that are checked for invalidation when an
   * sinval event occurs.
   *
!  * This is guaranteed not to throw error; callers typically depend on that
   * since this is called just before or just after adding a pointer to the
   * CachedPlanSource to some permanent data structure of their own.	Up until
   * this is done, a CachedPlanSource is just transient data that will go away
--- 401,408 ----
   * it to the list of cached plans that are checked for invalidation when an
   * sinval event occurs.
   *
!  * This is guaranteed not to throw error, except for the caller-error case
!  * of trying to save a one-shot plan.  Callers typically depend on that
   * since this is called just before or just after adding a pointer to the
   * CachedPlanSource to some permanent data structure of their own.	Up until
   * this is done, a CachedPlanSource is just transient data that will go away
*************** SaveCachedPlan(CachedPlanSource *plansou
*** 340,345 ****
--- 416,425 ----
  	Assert(plansource->is_complete);
  	Assert(!plansource->is_saved);
  
+ 	/* This seems worth a real test, though */
+ 	if (plansource->is_oneshot)
+ 		elog(ERROR, "cannot save one-shot cached plan");
+ 
  	/*
  	 * In typical use, this function would be called before generating any
  	 * plans from the CachedPlanSource.  If there is a generic plan, moving it
*************** DropCachedPlan(CachedPlanSource *plansou
*** 402,412 ****
  	/* Decrement generic CachePlan's refcount and drop if no longer needed */
  	ReleaseGenericPlan(plansource);
  
  	/*
  	 * Remove the CachedPlanSource and all subsidiary data (including the
! 	 * query_context if any).
  	 */
! 	MemoryContextDelete(plansource->context);
  }
  
  /*
--- 482,496 ----
  	/* Decrement generic CachePlan's refcount and drop if no longer needed */
  	ReleaseGenericPlan(plansource);
  
+ 	/* Mark it no longer valid */
+ 	plansource->magic = 0;
+ 
  	/*
  	 * Remove the CachedPlanSource and all subsidiary data (including the
! 	 * query_context if any).  But if it's a one-shot we can't free anything.
  	 */
! 	if (!plansource->is_oneshot)
! 		MemoryContextDelete(plansource->context);
  }
  
  /*
*************** RevalidateCachedQuery(CachedPlanSource *
*** 452,457 ****
--- 536,552 ----
  	MemoryContext oldcxt;
  
  	/*
+ 	 * For one-shot plans, we do not support revalidation checking; it's
+ 	 * assumed the query is parsed, planned, and executed in one transaction,
+ 	 * so that no lock re-acquisition is necessary.
+ 	 */
+ 	if (plansource->is_oneshot)
+ 	{
+ 		Assert(plansource->is_valid);
+ 		return NIL;
+ 	}
+ 
+ 	/*
  	 * If the query is currently valid, acquire locks on the referenced
  	 * objects; then check again.  We need to do it this way to cover the race
  	 * condition that an invalidation message arrives before we get the locks.
*************** CheckCachedPlan(CachedPlanSource *planso
*** 649,654 ****
--- 744,751 ----
  		return false;
  
  	Assert(plan->magic == CACHEDPLAN_MAGIC);
+ 	/* Generic plans are never one-shot */
+ 	Assert(!plan->is_oneshot);
  
  	/*
  	 * If it appears valid, acquire locks and recheck; this is much the same
*************** CheckCachedPlan(CachedPlanSource *planso
*** 708,714 ****
   * hint rather than a hard constant.
   *
   * Planning work is done in the caller's memory context.  The finished plan
!  * is in a child memory context, which typically should get reparented.
   */
  static CachedPlan *
  BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
--- 805,812 ----
   * hint rather than a hard constant.
   *
   * Planning work is done in the caller's memory context.  The finished plan
!  * is in a child memory context, which typically should get reparented
!  * (unless this is a one-shot plan, in which case we don't copy the plan).
   */
  static CachedPlan *
  BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
*************** BuildCachedPlan(CachedPlanSource *planso
*** 719,725 ****
  	bool		snapshot_set;
  	bool		spi_pushed;
  	MemoryContext plan_context;
! 	MemoryContext oldcxt;
  
  	/*
  	 * Normally the querytree should be valid already, but if it's not,
--- 817,823 ----
  	bool		snapshot_set;
  	bool		spi_pushed;
  	MemoryContext plan_context;
! 	MemoryContext oldcxt = CurrentMemoryContext;
  
  	/*
  	 * Normally the querytree should be valid already, but if it's not,
*************** BuildCachedPlan(CachedPlanSource *planso
*** 739,748 ****
  
  	/*
  	 * If we don't already have a copy of the querytree list that can be
! 	 * scribbled on by the planner, make one.
  	 */
  	if (qlist == NIL)
! 		qlist = (List *) copyObject(plansource->query_list);
  
  	/*
  	 * Restore the search_path that was in use when the plan was made. See
--- 837,852 ----
  
  	/*
  	 * If we don't already have a copy of the querytree list that can be
! 	 * scribbled on by the planner, make one.  For a one-shot plan, we assume
! 	 * it's okay to scribble on the original query_list.
  	 */
  	if (qlist == NIL)
! 	{
! 		if (!plansource->is_oneshot)
! 			qlist = (List *) copyObject(plansource->query_list);
! 		else
! 			qlist = plansource->query_list;
! 	}
  
  	/*
  	 * Restore the search_path that was in use when the plan was made. See
*************** BuildCachedPlan(CachedPlanSource *planso
*** 794,815 ****
  	PopOverrideSearchPath();
  
  	/*
! 	 * Make a dedicated memory context for the CachedPlan and its subsidiary
! 	 * data.  It's probably not going to be large, but just in case, use the
! 	 * default maxsize parameter.  It's transient for the moment.
  	 */
! 	plan_context = AllocSetContextCreate(CurrentMemoryContext,
! 										 "CachedPlan",
! 										 ALLOCSET_SMALL_MINSIZE,
! 										 ALLOCSET_SMALL_INITSIZE,
! 										 ALLOCSET_DEFAULT_MAXSIZE);
  
! 	/*
! 	 * Copy plan into the new context.
! 	 */
! 	oldcxt = MemoryContextSwitchTo(plan_context);
  
! 	plist = (List *) copyObject(plist);
  
  	/*
  	 * Create and fill the CachedPlan struct within the new context.
--- 898,926 ----
  	PopOverrideSearchPath();
  
  	/*
! 	 * Normally we make a dedicated memory context for the CachedPlan and its
! 	 * subsidiary data.  (It's probably not going to be large, but just in
! 	 * case, use the default maxsize parameter.  It's transient for the
! 	 * moment.)  But for a one-shot plan, we just leave it in the caller's
! 	 * memory context.
  	 */
! 	if (!plansource->is_oneshot)
! 	{
! 		plan_context = AllocSetContextCreate(CurrentMemoryContext,
! 											 "CachedPlan",
! 											 ALLOCSET_SMALL_MINSIZE,
! 											 ALLOCSET_SMALL_INITSIZE,
! 											 ALLOCSET_DEFAULT_MAXSIZE);
  
! 		/*
! 		 * Copy plan into the new context.
! 		 */
! 		MemoryContextSwitchTo(plan_context);
  
! 		plist = (List *) copyObject(plist);
! 	}
! 	else
! 		plan_context = CurrentMemoryContext;
  
  	/*
  	 * Create and fill the CachedPlan struct within the new context.
*************** BuildCachedPlan(CachedPlanSource *planso
*** 826,831 ****
--- 937,943 ----
  		plan->saved_xmin = InvalidTransactionId;
  	plan->refcount = 0;
  	plan->context = plan_context;
+ 	plan->is_oneshot = plansource->is_oneshot;
  	plan->is_saved = false;
  	plan->is_valid = true;
  
*************** choose_custom_plan(CachedPlanSource *pla
*** 847,853 ****
  {
  	double		avg_custom_cost;
  
! 	/* Never any point in a custom plan if there's no parameters */
  	if (boundParams == NULL)
  		return false;
  
--- 959,969 ----
  {
  	double		avg_custom_cost;
  
! 	/* One-shot plans will always be considered custom */
! 	if (plansource->is_oneshot)
! 		return true;
! 
! 	/* Otherwise, never any point in a custom plan if there's no parameters */
  	if (boundParams == NULL)
  		return false;
  
*************** ReleaseCachedPlan(CachedPlan *plan, bool
*** 1049,1055 ****
  	Assert(plan->refcount > 0);
  	plan->refcount--;
  	if (plan->refcount == 0)
! 		MemoryContextDelete(plan->context);
  }
  
  /*
--- 1165,1178 ----
  	Assert(plan->refcount > 0);
  	plan->refcount--;
  	if (plan->refcount == 0)
! 	{
! 		/* Mark it no longer valid */
! 		plan->magic = 0;
! 
! 		/* One-shot plans do not own their context, so we can't free them */
! 		if (!plan->is_oneshot)
! 			MemoryContextDelete(plan->context);
! 	}
  }
  
  /*
*************** CachedPlanSetParentContext(CachedPlanSou
*** 1066,1074 ****
  	Assert(plansource->magic == CACHEDPLANSOURCE_MAGIC);
  	Assert(plansource->is_complete);
  
! 	/* This seems worth a real test, though */
  	if (plansource->is_saved)
  		elog(ERROR, "cannot move a saved cached plan to another context");
  
  	/* OK, let the caller keep the plan where he wishes */
  	MemoryContextSetParent(plansource->context, newcontext);
--- 1189,1199 ----
  	Assert(plansource->magic == CACHEDPLANSOURCE_MAGIC);
  	Assert(plansource->is_complete);
  
! 	/* These seem worth real tests, though */
  	if (plansource->is_saved)
  		elog(ERROR, "cannot move a saved cached plan to another context");
+ 	if (plansource->is_oneshot)
+ 		elog(ERROR, "cannot move a one-shot cached plan to another context");
  
  	/* OK, let the caller keep the plan where he wishes */
  	MemoryContextSetParent(plansource->context, newcontext);
*************** CopyCachedPlan(CachedPlanSource *plansou
*** 1105,1110 ****
--- 1230,1242 ----
  	Assert(plansource->magic == CACHEDPLANSOURCE_MAGIC);
  	Assert(plansource->is_complete);
  
+ 	/*
+ 	 * One-shot plans can't be copied, because we haven't taken care that
+ 	 * parsing/planning didn't scribble on the raw parse tree or querytrees.
+ 	 */
+ 	if (plansource->is_oneshot)
+ 		elog(ERROR, "cannot copy a one-shot cached plan");
+ 
  	source_context = AllocSetContextCreate(CurrentMemoryContext,
  										   "CachedPlanSource",
  										   ALLOCSET_SMALL_MINSIZE,
*************** CopyCachedPlan(CachedPlanSource *plansou
*** 1152,1157 ****
--- 1284,1290 ----
  
  	newsource->gplan = NULL;
  
+ 	newsource->is_oneshot = false;
  	newsource->is_complete = true;
  	newsource->is_saved = false;
  	newsource->is_valid = plansource->is_valid;
diff --git a/src/include/executor/spi_priv.h b/src/include/executor/spi_priv.h
index 4fbb548af4b93d360b6cd48f8f03993bd383594f..1a828912fcddaa89d02a5faf5b837a6d037f679c 100644
*** a/src/include/executor/spi_priv.h
--- b/src/include/executor/spi_priv.h
*************** typedef struct
*** 59,64 ****
--- 59,70 ----
   * while additional data such as argtypes and list cells is loose in the SPI
   * executor context.  Such plans can be identified by having plancxt == NULL.
   *
+  * We can also have "one-shot" SPI plans (which are typically temporary,
+  * as described above).  These are meant to be executed once and discarded,
+  * and various optimizations are made on the assumption of single use.
+  * Note in particular that the CachedPlanSources within such an SPI plan
+  * are not "complete" until execution.
+  *
   * Note: if the original query string contained only whitespace and comments,
   * the plancache_list will be NIL and so there is no place to store the
   * query string.  We don't care about that, but we do care about the
*************** typedef struct _SPI_plan
*** 68,73 ****
--- 74,80 ----
  {
  	int			magic;			/* should equal _SPI_PLAN_MAGIC */
  	bool		saved;			/* saved or unsaved plan? */
+ 	bool		oneshot;		/* one-shot plan? */
  	List	   *plancache_list; /* one CachedPlanSource per parsetree */
  	MemoryContext plancxt;		/* Context containing _SPI_plan and data */
  	int			cursor_options; /* Cursor options used for planning */
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index 413e8462a6c6e50241574d6ebcb2f38837b0af25..7d0fd6e4d2a6c6695b79e3a9efcbc7ee9d749122 100644
*** a/src/include/utils/plancache.h
--- b/src/include/utils/plancache.h
***************
*** 60,65 ****
--- 60,73 ----
   * context that holds the rewritten query tree and associated data.  This
   * allows the query tree to be discarded easily when it is invalidated.
   *
+  * Some callers wish to use the CachedPlan API even with one-shot queries
+  * that have no reason to be saved at all.  We therefore support a "oneshot"
+  * variant that does no data copying or invalidation checking.  In this case
+  * there are no separate memory contexts: the CachedPlanSource struct and
+  * all subsidiary data live in the caller's CurrentMemoryContext, and there
+  * is no way to free memory short of clearing that entire context.  A oneshot
+  * plan is always treated as unsaved.
+  *
   * Note: the string referenced by commandTag is not subsidiary storage;
   * it is assumed to be a compile-time-constant string.	As with portals,
   * commandTag shall be NULL if and only if the original query string (before
*************** typedef struct CachedPlanSource
*** 69,75 ****
  {
  	int			magic;			/* should equal CACHEDPLANSOURCE_MAGIC */
  	Node	   *raw_parse_tree; /* output of raw_parser() */
! 	char	   *query_string;	/* source text of query */
  	const char *commandTag;		/* command tag (a constant!), or NULL */
  	Oid		   *param_types;	/* array of parameter type OIDs, or NULL */
  	int			num_params;		/* length of param_types array */
--- 77,83 ----
  {
  	int			magic;			/* should equal CACHEDPLANSOURCE_MAGIC */
  	Node	   *raw_parse_tree; /* output of raw_parser() */
! 	const char *query_string;	/* source text of query */
  	const char *commandTag;		/* command tag (a constant!), or NULL */
  	Oid		   *param_types;	/* array of parameter type OIDs, or NULL */
  	int			num_params;		/* length of param_types array */
*************** typedef struct CachedPlanSource
*** 88,93 ****
--- 96,102 ----
  	/* If we have a generic plan, this is a reference-counted link to it: */
  	struct CachedPlan *gplan;	/* generic plan, or NULL if not valid */
  	/* Some state flags: */
+ 	bool		is_oneshot;		/* is it a "oneshot" plan? */
  	bool		is_complete;	/* has CompleteCachedPlan been done? */
  	bool		is_saved;		/* has CachedPlanSource been "saved"? */
  	bool		is_valid;		/* is the query_list currently valid? */
*************** typedef struct CachedPlanSource
*** 106,118 ****
   * (if any), and any active plan executions, so the plan can be discarded
   * exactly when refcount goes to zero.	Both the struct itself and the
   * subsidiary data live in the context denoted by the context field.
!  * This makes it easy to free a no-longer-needed cached plan.
   */
  typedef struct CachedPlan
  {
  	int			magic;			/* should equal CACHEDPLAN_MAGIC */
  	List	   *stmt_list;		/* list of statement nodes (PlannedStmts and
  								 * bare utility statements) */
  	bool		is_saved;		/* is CachedPlan in a long-lived context? */
  	bool		is_valid;		/* is the stmt_list currently valid? */
  	TransactionId saved_xmin;	/* if valid, replan when TransactionXmin
--- 115,130 ----
   * (if any), and any active plan executions, so the plan can be discarded
   * exactly when refcount goes to zero.	Both the struct itself and the
   * subsidiary data live in the context denoted by the context field.
!  * This makes it easy to free a no-longer-needed cached plan.  (However,
!  * if is_oneshot is true, the context does not belong solely to the CachedPlan
!  * so no freeing is possible.)
   */
  typedef struct CachedPlan
  {
  	int			magic;			/* should equal CACHEDPLAN_MAGIC */
  	List	   *stmt_list;		/* list of statement nodes (PlannedStmts and
  								 * bare utility statements) */
+ 	bool		is_oneshot;		/* is it a "oneshot" plan? */
  	bool		is_saved;		/* is CachedPlan in a long-lived context? */
  	bool		is_valid;		/* is the stmt_list currently valid? */
  	TransactionId saved_xmin;	/* if valid, replan when TransactionXmin
*************** extern void ResetPlanCache(void);
*** 129,134 ****
--- 141,149 ----
  extern CachedPlanSource *CreateCachedPlan(Node *raw_parse_tree,
  				 const char *query_string,
  				 const char *commandTag);
+ extern CachedPlanSource *CreateOneShotCachedPlan(Node *raw_parse_tree,
+ 				 const char *query_string,
+ 				 const char *commandTag);
  extern void CompleteCachedPlan(CachedPlanSource *plansource,
  				   List *querytree_list,
  				   MemoryContext querytree_context,
#8Jeff Janes
jeff.janes@gmail.com
In reply to: Heikki Linnakangas (#3)
Re: dynamic SQL - possible performance regression in 9.2

On Friday, December 28, 2012, Heikki Linnakangas wrote:

On 28.12.2012 23:53, Peter Eisentraut wrote:

On 12/27/12 1:07 AM, Pavel Stehule wrote:

Hello

I rechecked performance of dynamic SQL and it is significantly slower
in 9.2 than 9.1

-- 9.1
postgres=# create or replace function test() returns void as $$ begin
for i in 1..1000000 loop execute 'select 1'; end loop; end $$ language
plpgsql;

I think this is the same as the case discussed at
<CAD4+=qWnGU0qi+iq=EPh6EGPuUnSCYsGDTgKazizEvrGgjo0Sg@mail.gmail.com>.

Yeah, probably so.

As it happens, I just spent a lot of time today narrowing down yet another
report of a regression in 9.2, when running DBT-2:
http://archives.postgresql.org/pgsql-performance/2012-11/msg00007.php.
It looks like that is also caused by the plancache changes. DBT-2
implements the transactions using C functions, which use SPI_execute() to
run all the queries.

It looks like the regression is caused by extra copying of the parse tree
and plan trees. Node-copy-related functions like AllocSetAlloc and _copy*
are high in the profile. They are also high in the 9.1 profile, but even
more so in 9.2.

I hacked together a quick&dirty patch to reduce the copying of single-shot
plans, and was able to buy back much of the regression I was seeing on
DBT-2. Patch attached.

The plancache change slowed down a dynamic SQL partitioning trigger by about
26%, and your patch redeems about half of that cost.

Using RULE-based partitioning instead, with row-by-row insertion, the
plancache changes slowed it down by 300%, and this patch doesn't change
that. But that seems to be down to the insertion getting planned
repeatedly, because it decides the custom plan is cheaper than the generic
plan. Whatever savings the custom plan may have are clearly less than the
cost of doing the planning repeatedly.
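(For concreteness, the trigger-based case is roughly this shape; the table,
column, and function names here are invented for illustration, not the actual
benchmark schema:)

```sql
-- Rough sketch of a dynamic-SQL partitioning trigger of the kind under test
-- (all names illustrative only):
create table parent (part_key int, payload text);
create table parent_0 () inherits (parent);
create table parent_1 () inherits (parent);

create function parent_ins() returns trigger as $$
begin
    -- every inserted row EXECUTEs a freshly parsed and planned INSERT,
    -- which is where the 9.2 plancache copying overhead shows up
    execute format('insert into parent_%s select ($1).*',
                   new.part_key % 2) using new;
    return null;   -- suppress the insert into the parent itself
end
$$ language plpgsql;

create trigger parent_ins before insert on parent
    for each row execute procedure parent_ins();
```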

Cheers,

Jeff

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jeff Janes (#8)
Re: dynamic SQL - possible performance regression in 9.2

Jeff Janes <jeff.janes@gmail.com> writes:

Using a RULE-based partitioning instead with row by row insertion, the
plancache changes slowed it down by 300%, and this patch doesn't change
that. But that seems to be down to the insertion getting planned
repeatedly, because it decides the custom plan is cheaper than the generic
plan. Whatever savings the custom plan may have are clearly less than the
cost of doing the planning repeatedly.

That scenario doesn't sound like it has anything to do with the one being
discussed in this thread. But what do you mean by "rule-based
partitioning" exactly? A rule per se wouldn't result in a cached plan
at all, let alone one with parameters, which would be necessary to
trigger any use of the custom-cached-plan code path.
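(If what's meant is something like the following, with invented names, note
that the rules are expanded during query rewriting, so the redirected INSERTs
are planned as part of the original statement:)

```sql
-- Hypothetical "rule-based partitioning" setup (names invented):
create rule parent_ins_even as on insert to parent
    where new.part_key % 2 = 0
    do instead insert into parent_0 values (new.part_key, new.payload);
create rule parent_ins_odd as on insert to parent
    where new.part_key % 2 = 1
    do instead insert into parent_1 values (new.part_key, new.payload);
```

A plain INSERT rewritten by such rules never goes through the plancache at
all; only a prepared INSERT with parameters would reach the
generic-versus-custom plan decision.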

Test cases are way more interesting than hand-wavy complaints.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10Peter Eisentraut
peter_e@gmx.net
In reply to: Tom Lane (#7)
Re: dynamic SQL - possible performance regression in 9.2

On 1/1/13 6:48 PM, Tom Lane wrote:

I wrote:

I'm inclined to think that Heikki's patch doesn't go far enough, if we
want to optimize behavior in this case. What we really want to happen
is that parsing, planning, and execution all happen in the caller's
memory context, with no copying of parse or plan trees at all - and we
could do without overhead such as dependency extraction and invalidation
checking, too. This would make SPI_execute a lot more comparable to the
behavior of exec_simple_query().

Here's a draft patch for that.

This didn't make a difference in my test case. I might have to do some
bisecting to find where the problem was introduced.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#10)
Re: dynamic SQL - possible performance regression in 9.2

Peter Eisentraut <peter_e@gmx.net> writes:

On 1/1/13 6:48 PM, Tom Lane wrote:

Here's a draft patch for that.

This didn't make a difference in my test case. I might have to do some
bisecting to find where the problem was introduced.

Could we see the test case? Or at least oprofile results for it?

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12Dong Ye
yed@vmware.com
In reply to: Tom Lane (#7)
1 attachment(s)
Re: dynamic SQL - possible performance regression in 9.2

I did three back-to-back runs using the same settings as in
http://archives.postgresql.org/pgsql-performance/2012-11/msg00007.php
Except:
- use no prepared statement
- use 40 db connections
- build source from postgresql.git on the server box using: REL9_1_7,
REL9_2_2, REL9_2_2 + this patch

NOTPM results:
REL9_1_7: 46512.66
REL9_2_2: 42828.66
REL9_2_2 + this patch: 46973.70

Thanks,
Dong

PS: the top 20 lines of the oprofile reports for these runs are attached.

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Tuesday, January 01, 2013 6:48 PM
To: Peter Eisentraut
Cc: Heikki Linnakangas; Pavel Stehule; PostgreSQL Hackers; Dong Ye
Subject: Re: [HACKERS] dynamic SQL - possible performance regression in
9.2

I wrote:

I'm inclined to think that Heikki's patch doesn't go far enough, if we
want to optimize behavior in this case. What we really want to happen
is that parsing, planning, and execution all happen in the caller's
memory context, with no copying of parse or plan trees at all - and we
could do without overhead such as dependency extraction and invalidation
checking, too. This would make SPI_execute a lot more comparable to the
behavior of exec_simple_query().

Here's a draft patch for that. My initial hack at it had a
disadvantage, which was that because no invalidation checking happened,
a SPI_execute query string containing a DDL command (such as ALTER TABLE)
followed by a command affected by the DDL would fail to reparse/replan
the second command properly. (I suspect that Heikki's version had a
related defect, but haven't looked closely.) Now that's not a huge deal
IMO, because in many common cases parse analysis of the second command
would fail anyway. For instance, this has never worked in any PG
release:

do $$ begin execute 'create table foo(f1 int); insert into foo
values(1);'; end $$;

However it troubled me that there might be some regression there, and
after a bit of reflection I decided the right fix would be to rearrange
the code in spi.c so that parse analysis of later parsetrees follows
execution of earlier ones. This makes the behavior of SPI_execute()
even more like that of exec_simple_query(), and shouldn't cost anything
noticeable given the other changes here.

I'm not entirely sure about performance of this fix, though. I got
numbers varying between roughly-on-par with 9.1 and 10% slower than 9.1
for Pavel's example, depending on seemingly not-performance-related
rearrangements of the code in spi.c. I think this must be chance
effects of cache line alignment, but it would be good to hear what other
people get, both on Pavel's example and the other ones alluded to.
In any case this seems better than unmodified HEAD, which was 40% slower
than 9.1 for me.

regards, tom lane

Attachments:

opreports.txt (text/plain)
#13Tom Lane
tgl@sss.pgh.pa.us
In reply to: Dong Ye (#12)
Re: dynamic SQL - possible performance regression in 9.2

"Dong Ye" <yed@vmware.com> writes:

I did three back-to-back runs using the same settings as in
http://archives.postgresql.org/pgsql-performance/2012-11/msg00007.php
Except:
- use no prepared statement
- use 40 db connections
- build source from postgresql.git on the server box using: REL9_1_7,
REL9_2_2, REL9_2_2 + this patch

NOTPM results:
REL9_1_7: 46512.66
REL9_2_2: 42828.66
REL9_2_2 + this patch: 46973.70

Thanks! I think this is probably sufficient evidence to conclude that
we should apply this patch, at least in HEAD. Whatever Peter is seeing
must be some other issue, which we can address whenever we understand
what it is.

Next question is what people think about back-patching into 9.2 so as
to eliminate the performance regression vs 9.1. I believe this would
be safe (although some care would have to be taken to put the added
boolean fields into places where they'd not result in an ABI break).
However it may not be worth the risk. The 40% slowdown seen with
Pavel's example seems to me to be an extreme corner case --- Dong's
result of 8% slowdown is probably more realistic for normal uses
of SPI_execute. Might be better to just live with it in 9.2.
Thoughts?

regards, tom lane


#14Pavel Stehule
pavel.stehule@gmail.com
In reply to: Tom Lane (#13)
Re: dynamic SQL - possible performance regression in 9.2

2013/1/4 Tom Lane <tgl@sss.pgh.pa.us>:

"Dong Ye" <yed@vmware.com> writes:

I did three back-to-back runs using the same settings as in
http://archives.postgresql.org/pgsql-performance/2012-11/msg00007.php
Except:
- use no prepared statement
- use 40 db connections
- build source from postgresql.git on the server box using: REL9_1_7,
REL9_2_2, REL9_2_2 + this patch

NOTPM results:
REL9_1_7: 46512.66
REL9_2_2: 42828.66
REL9_2_2 + this patch: 46973.70

Thanks! I think this is probably sufficient evidence to conclude that
we should apply this patch, at least in HEAD. Whatever Peter is seeing
must be some other issue, which we can address whenever we understand
what it is.

Next question is what people think about back-patching into 9.2 so as
to eliminate the performance regression vs 9.1. I believe this would
be safe (although some care would have to be taken to put the added
boolean fields into places where they'd not result in an ABI break).
However it may not be worth the risk. The 40% slowdown seen with
Pavel's example seems to me to be an extreme corner case --- Dong's
result of 8% slowdown is probably more realistic for normal uses
of SPI_execute. Might be better to just live with it in 9.2.
Thoughts?

I am for back-patching. I agree with you that my example is a corner
case, and probably close to the worst case, but a performance
regression of about 5-10% can be confusing for users, because they may
end up searching for the regression in their own applications.

Regards

Pavel Stehule

regards, tom lane


#15Josh Berkus
josh@agliodbs.com
In reply to: Tom Lane (#13)
Re: dynamic SQL - possible performance regression in 9.2

Next question is what people think about back-patching into 9.2 so as
to eliminate the performance regression vs 9.1. I believe this would
be safe (although some care would have to be taken to put the added
boolean fields into places where they'd not result in an ABI break).
However it may not be worth the risk. The 40% slowdown seen with
Pavel's example seems to me to be an extreme corner case --- Dong's
result of 8% slowdown is probably more realistic for normal uses
of SPI_execute. Might be better to just live with it in 9.2.
Thoughts?

8% is a pretty serious regression, for those of us with applications
which do a lot of dynamic SQL. As a reminder, many people do dynamic
SQL even in repetitive, performance-sensitive functions in order to
avoid plan caching. Also partition-handlers often use dynamic SQL, and
a 10% drop in loading rows/second would be a big deal.

Let's put it this way: if the community doesn't backport it, we'll end
up doing so ad-hoc for some of our customers.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


#16Heikki Linnakangas
hlinnakangas@vmware.com
In reply to: Josh Berkus (#15)
Re: dynamic SQL - possible performance regression in 9.2

On 04.01.2013 22:05, Josh Berkus wrote:

Next question is what people think about back-patching into 9.2 so as
to eliminate the performance regression vs 9.1. I believe this would
be safe (although some care would have to be taken to put the added
boolean fields into places where they'd not result in an ABI break).
However it may not be worth the risk. The 40% slowdown seen with
Pavel's example seems to me to be an extreme corner case --- Dong's
result of 8% slowdown is probably more realistic for normal uses
of SPI_execute. Might be better to just live with it in 9.2.
Thoughts?

8% is a pretty serious regression, for those of us with applications
which do a lot of dynamic SQL. As a reminder, many people do dynamic
SQL even in repetitive, performance-sensitive functions in order to
avoid plan caching. Also partition-handlers often use dynamic SQL, and
a 10% drop in loading rows/second would be a big deal.

+1 for backpatching.

- Heikki


#17Joshua D. Drake
jd@commandprompt.com
In reply to: Josh Berkus (#15)
Re: dynamic SQL - possible performance regression in 9.2

On 01/04/2013 12:05 PM, Josh Berkus wrote:

Next question is what people think about back-patching into 9.2 so as
to eliminate the performance regression vs 9.1. I believe this would
be safe (although some care would have to be taken to put the added
boolean fields into places where they'd not result in an ABI break).
However it may not be worth the risk. The 40% slowdown seen with
Pavel's example seems to me to be an extreme corner case --- Dong's
result of 8% slowdown is probably more realistic for normal uses
of SPI_execute. Might be better to just live with it in 9.2.
Thoughts?

8% is a pretty serious regression, for those of us with applications
which do a lot of dynamic SQL. As a reminder, many people do dynamic
SQL even in repetitive, performance-sensitive functions in order to
avoid plan caching. Also partition-handlers often use dynamic SQL, and
a 10% drop in loading rows/second would be a big deal.

Let's put it this way: if the community doesn't backport it, we'll end
up doing so ad-hoc for some of our customers.

Exactly. This is a significant reduction in the production quality of
PostgreSQL as it pertains to dynamic SQL. To put it more bluntly, we
will have people not upgrade to 9.2 specifically because of this problem.

Joshua D. Drake

--
Command Prompt, Inc. - http://www.commandprompt.com/
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC
@cmdpromptinc - 509-416-6579


#18Tom Lane
tgl@sss.pgh.pa.us
In reply to: Josh Berkus (#15)
Re: dynamic SQL - possible performance regression in 9.2

Josh Berkus <josh@agliodbs.com> writes:

Next question is what people think about back-patching into 9.2 so as
to eliminate the performance regression vs 9.1.

8% is a pretty serious regression, for those of us with applications
which do a lot of dynamic SQL. As a reminder, many people do dynamic
SQL even in repetitive, performance-sensitive functions in order to
avoid plan caching.

Well, of course, people with that type of problem should probably
rethink their use of dynamic SQL when they move to 9.2 anyway, because
that's the case where the new plancache code could actually help them
if they'd only let it.

But anyway, nobody seems to be speaking against back-patching, so
I'll go do it.

regards, tom lane


#19Joshua D. Drake
jd@commandprompt.com
In reply to: Tom Lane (#18)
Re: dynamic SQL - possible performance regression in 9.2

On 01/04/2013 01:17 PM, Tom Lane wrote:

Well, of course, people with that type of problem should probably
rethink their use of dynamic SQL when they move to 9.2 anyway, because
that's the case where the new plancache code could actually help them
if they'd only let it.

Not to prolong the argument, since you've already agreed, but I think
it is important to keep in mind that the less work we make an upgrade,
the more likely it is that people will do it.

It took forever (and we still have stragglers) to get people past 8.2
because of the casting change in 8.3. This is a similar problem in that
if we want them to upgrade to take advantage of features, we have to
make it so the least amount of work possible is needed to make that
upgrade happen.

Rewriting many thousands of lines of dynamic SQL to upgrade to 9.2 is
certainly not doing that :).

Sincerely,

Joshua D. Drake

But anyway, nobody seems to be speaking against back-patching, so
I'll go do it.

regards, tom lane

--
Command Prompt, Inc. - http://www.commandprompt.com/
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC
@cmdpromptinc - 509-416-6579


#20Josh Berkus
josh@agliodbs.com
In reply to: Tom Lane (#18)
Re: dynamic SQL - possible performance regression in 9.2

Well, of course, people with that type of problem should probably
rethink their use of dynamic SQL when they move to 9.2 anyway, because
that's the case where the new plancache code could actually help them
if they'd only let it.

Oh, no question. But it'll take time for users with 1000's of lines of
PLpgSQL from 8.2 to rewrite it.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


#21Jeff Janes
jeff.janes@gmail.com
In reply to: Josh Berkus (#20)
2 attachment(s)

On Wednesday, January 2, 2013, Tom Lane wrote:

Jeff Janes <jeff.janes@gmail.com> writes:

Using a RULE-based partitioning instead with row by row insertion, the
plancache changes slowed it down by 300%, and this patch doesn't change
that. But that seems to be down to the insertion getting planned
repeatedly, because it decides the custom plan is cheaper than the
generic plan. Whatever savings the custom plan may have are clearly
less than the cost of doing the planning repeatedly.

That scenario doesn't sound like it has anything to do with the one being
discussed in this thread. But what do you mean by "rule-based
partitioning" exactly? A rule per se wouldn't result in a cached plan
at all, let alone one with parameters, which would be necessary to
trigger any use of the custom-cached-plan code path.

Right, it is not related to dynamic SQL, but it is related to the plan cache.

Test cases are way more interesting than hand-wavy complaints.

Sorry, when exiled to the hinterlands I have more time to test various
things but not good enough connectivity to describe them well. I'm
attaching the test case to load 1e5 rows into a very skinny table with 100
partitions using rules.

"origin" is from a few days ago, "origin_reduce_copies" is Heikki's patch,
and "origin_one_shot" is your now-committed patch. (unshown are
e6faf910d75027 and e6faf910d75027_prev, but that is where the regression
was introduced)

JJ /usr/local/pgsql_REL9_1_7/
Time: 64252.6907920837 ms
JJ origin/
Time: 186657.824039459 ms
JJ origin_reduce_copies/
Time: 185370.236873627 ms
JJ origin_one_shot/
Time: 189104.484081268 ms

The root problem is that it thinks the generic plan costs about 50% more
than the custom one. I don't know why it thinks that, or how much it is
worth chasing it down.

On the other hand, your patch does fix almost all of the 9.2.[012]
regression of using the following dynamic SQL trigger (instead of RULES) to
load into the same test case.

CREATE OR REPLACE FUNCTION foo_insert_trigger()
RETURNS trigger AS $$
DECLARE tablename varchar(24);
BEGIN
tablename = 'foo_' || new.partition;
EXECUTE 'INSERT INTO '|| tablename ||' VALUES (($1).*)' USING NEW ;
RETURN NULL;
END;
$$
LANGUAGE plpgsql;

CREATE TRIGGER foo_insert_trigger
BEFORE INSERT ON foo
FOR EACH ROW EXECUTE PROCEDURE foo_insert_trigger();

Cheers,

Jeff

Attachments:

insert.pl (text/x-perl)
rules_test.sql (text/x-sql)
#22Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jeff Janes (#21)
Re: dynamic SQL - possible performance regression in 9.2

Jeff Janes <jeff.janes@gmail.com> writes:

On Wednesday, January 2, 2013, Tom Lane wrote:

That scenario doesn't sound like it has anything to do with the one being
discussed in this thread. But what do you mean by "rule-based
partitioning" exactly? A rule per se wouldn't result in a cached plan
at all, let alone one with parameters, which would be necessary to
trigger any use of the custom-cached-plan code path.

Sorry, when exiled to the hinterlands I have more time to test various
things but not good enough connectivity to describe them well. I'm
attaching the test case to load 1e5 rows into a very skinny table with 100
partitions using rules.

Ah. I see what's going on: the generic plan has got 100 separate
subplans that look like

Insert on foo_10 (cost=0.00..0.01 rows=1 width=0)
-> Result (cost=0.00..0.01 rows=1 width=0)
One-Time Filter: ($2 = 10)

Insert on foo_11 (cost=0.00..0.01 rows=1 width=0)
-> Result (cost=0.00..0.01 rows=1 width=0)
One-Time Filter: ($2 = 11)

Insert on foo_12 (cost=0.00..0.01 rows=1 width=0)
-> Result (cost=0.00..0.01 rows=1 width=0)
One-Time Filter: ($2 = 12)

Insert on foo_13 (cost=0.00..0.01 rows=1 width=0)
-> Result (cost=0.00..0.01 rows=1 width=0)
One-Time Filter: ($2 = 13)

while a custom plan simplifies that to something like

Insert on foo_10 (cost=0.00..0.01 rows=1 width=0)
-> Result (cost=0.00..0.01 rows=1 width=0)
One-Time Filter: false

Insert on foo_11 (cost=0.00..0.01 rows=1 width=0)
-> Result (cost=0.00..0.01 rows=1 width=0)

Insert on foo_12 (cost=0.00..0.01 rows=1 width=0)
-> Result (cost=0.00..0.01 rows=1 width=0)
One-Time Filter: false

Insert on foo_13 (cost=0.00..0.01 rows=1 width=0)
-> Result (cost=0.00..0.01 rows=1 width=0)
One-Time Filter: false

(here foo_11 is the actual target). This is indeed a bit faster than
the generic plan, and would be more so if we tried harder to get rid of
no-op subplans entirely ... but it's not enough faster to justify the
extra planning time.

In the particular case at hand, we're getting estimated costs of about
1.01 for a custom plan versus 1.51 for the generic plan. So I think
that whether the 50% ratio is accurate is a bit beside the point --- the
real point is that we're only saving half a cost unit of estimated cost.
So that suggests that choose_custom_plan should impose not only a
minimum fractional savings but also a minimum absolute savings to
justify selecting the custom plan approach. I tried this and verified
that it fixed the problem:

diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/planc
index cbc7c49..1973ed3 100644
*** a/src/backend/utils/cache/plancache.c
--- b/src/backend/utils/cache/plancache.c
*************** choose_custom_plan(CachedPlanSource *pla
*** 990,995 ****
--- 990,997 ----
     */
    if (plansource->generic_cost < avg_custom_cost * 1.1)
        return false;
+   if (plansource->generic_cost < avg_custom_cost + 10)
+       return false;

return true;
}

but of course it's hard to say what that cutoff number ought to be.

Back during the development of the plancache patch we speculated about
how we might take cost-of-planning into account in deciding whether to
use a custom plan or not. The root problem is that we've got no data
about how much work the planner does for a given query and how that
compares to estimated-cost numbers. There was some argument that we
could just take gettimeofday() measurements and drive it off that, but
I don't care for that too much; for one thing it would make the behavior
unstable with variations in system load.

Anyway the bottom line is that we never went back to do the research
about what the policy followed by choose_custom_plan ought to be.
It's probably time to think about that, or at least find a better
stopgap solution than what's in there.

regards, tom lane
