Excessive memory usage in multi-statement queries w/ partitioning

Started by Andreas Seltenreich, over 6 years ago (23 messages)
#1 Andreas Seltenreich
andreas.seltenreich@credativ.de

Hi,

a customer reported excessive memory usage and out-of-memory ERRORs
after introducing native partitioning in one of their databases. We
could narrow it down to the overhead introduced by the partitioning when
issuing multiple statements in a single query. I could reduce the
problem to the following recipe:

--8<---------------cut here---------------start------------->8---
#!/bin/bash

# create 100 partitions
psql -c 'create table t(c int primary key) partition by range(c)'
for i in {1..100}; do
    psql -e -c "create table t$i partition of t for values from ($(((i-1)*100))) to ($((i*100-1)))"
done

# artificially limit per-process memory by setting a resource limit for
# the postmaster to 256MB

prlimit -d$((256*1024*1024)) -p $POSTMASTER_PID
--8<---------------cut here---------------end--------------->8---

Now, updates to a partition are fine with 4000 update statements:

,----
| $ psql -c "$(yes update t2 set c=c where c=6 \; | head -n 4000)"
| UPDATE 0
`----

…but when doing it on the parent relation, even 100 statements are
enough to exceed the limit:

,----
| $ psql -c "$(yes update t set c=c where c=6 \; | head -n 100)"
| FEHLER: Speicher aufgebraucht ("ERROR: out of memory")
| DETAIL: Failed on request of size 200 in memory context "MessageContext".
`----

The memory context dump shows plausible values except for the MessageContext:

TopMemoryContext: 124336 total in 8 blocks; 18456 free (11 chunks); 105880 used
[...]
MessageContext: 264241152 total in 42 blocks; 264 free (0 chunks); 264240888 used
[...]

Maybe some tactically placed pfrees, or avoiding putting redundant stuff
into MessageContext, could ease the situation?

regards,
Andreas

#2 David Rowley
david.rowley@2ndquadrant.com
In reply to: Andreas Seltenreich (#1)
1 attachment(s)
Re: Excessive memory usage in multi-statement queries w/ partitioning

On Thu, 23 May 2019 at 17:55, Andreas Seltenreich
<andreas.seltenreich@credativ.de> wrote:

a customer reported excessive memory usage and out-of-memory ERRORs
after introducing native partitioning in one of their databases. We
could narrow it down to the overhead introduced by the partitioning when
issuing multiple statements in a single query.

"multiple statements in a single query", did you mean to write session
or maybe transaction there?

Which version?

I tried your test case with REL_11_STABLE and I see nowhere near as
much memory used in MessageContext.

After repeating the query twice, I see:

MessageContext: 8388608 total in 11 blocks; 3776960 free (1 chunks); 4611648 used
Grand total: 8388608 bytes in 11 blocks; 3776960 free (1 chunks); 4611648 used
MessageContext: 8388608 total in 11 blocks; 3776960 free (1 chunks); 4611648 used
Grand total: 8388608 bytes in 11 blocks; 3776960 free (1 chunks); 4611648 used

which is quite a long way off the 252MB you're getting.

Perhaps I'm not testing with the same version as you are.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

dump_MessageContext_stats.diff (application/octet-stream)
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 44a59e1d4f..7b75daf29e 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -4108,6 +4108,7 @@ PostgresMain(int argc, char *argv[],
 		 */
 		doing_extended_query_message = false;
 
+		MemoryContextStats(MessageContext);
 		/*
 		 * Release storage left over from prior query cycle, and create a new
 		 * query input buffer in the cleared MessageContext.
#3 Julian Schauder
julian.schauder@credativ.de
In reply to: David Rowley (#2)
Re: Excessive memory usage in multi-statement queries w/ partitioning

Hey David,

"multiple statements in a single query", did you mean to write
session
or maybe transaction there?

Maybe the wording isn't perfect. It is required that the queries are
sent as a single batch. Try the exact bash script Andreas used for
updating the parent.

Which version?

Tested up to and including 11.2. Initially found on 11.1. Memory
consumption scales somewhat linearly with the number of existing
partitions and the number of ';'-delimited queries per batch.

regards
--
Julian Schauder

#4 David Rowley
david.rowley@2ndquadrant.com
In reply to: Julian Schauder (#3)
Re: Excessive memory usage in multi-statement queries w/ partitioning

On Thu, 23 May 2019 at 21:19, Julian Schauder
<julian.schauder@credativ.de> wrote:

"multiple statements in a single query", did you mean to write
session
or maybe transaction there?

Maybe the wording isn't perfect. It is required that the queries are
sent as a single batch. Try the exact bash script Andreas used for
updating the parent.

Thanks for explaining.

Which version?

Tested up to and including 11.2. Initially found on 11.1. Memory
consumption scales somewhat linearly with the number of existing
partitions and the number of ';'-delimited queries per batch.

Yeah, unfortunately, if the batch contains 100 of those statements
then the planner is going to eat 100 times the memory since it stores
all 100 plans at once.

Since you're pruning all but 1 partition, the situation should be
much better for you when you can upgrade to v12. Unfortunately, that's
still about 5 months away.

The best thing you can do for now is to either reduce the number of
partitions, reduce the number of statements in the batch... or install
more memory.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#5 Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Andreas Seltenreich (#1)
Re: Excessive memory usage in multi-statement queries w/ partitioning

Hi,

On 2019/05/23 4:15, Andreas Seltenreich wrote:

…but when doing it on the parent relation, even 100 statements are
enough to exceed the limit:

,----
| $ psql -c "$(yes update t set c=c where c=6 \; | head -n 100)"
| FEHLER: Speicher aufgebraucht ("ERROR: out of memory")
| DETAIL: Failed on request of size 200 in memory context "MessageContext".
`----

The memory context dump shows plausible values except for the MessageContext:

TopMemoryContext: 124336 total in 8 blocks; 18456 free (11 chunks); 105880 used
[...]
MessageContext: 264241152 total in 42 blocks; 264 free (0 chunks); 264240888 used
[...]

As David Rowley said, planning that query hundreds of times under a single
MessageContext is not something that will end well on 11.3, because even a
single instance takes up tons of memory that's only released when
MessageContext is reset.

Maybe some tactically placed pfrees or avoiding putting redundant stuff
into MessageContext can relax the situation?

I too have had similar thoughts on the matter. If the planner had built
all its subsidiary data structures in its own private context (or tree of
contexts) which is reset once a plan for a given query is built and passed
on, then there wouldn't be an issue of all of that subsidiary memory
leaking into MessageContext. However, the problem may really be that
we're subjecting the planner to use cases that it wasn't perhaps designed
to perform equally well under -- running it many times while handling the
same message. It is worsened by the fact that the query in question is
something that ought to have been documented as not well supported by the
planner; David has posted a documentation patch for that [1]. PG 12 has
alleviated the situation to a large degree, so you won't see the OOM
occurring for this query, but not for all queries unfortunately.

With that said, we may want to look into the planner sometimes hoarding
memory, especially when planning complex queries involving partitions.
AFAIK, one of the reasons partition-wise join and aggregate are turned
off by default is that their planning consumes a lot of CPU and memory,
partly because the planner doesn't actively release the memory of its
subsidiary structures, or maybe because of inferior ways in which
partitions and partitioning properties are represented in the planner.
Though if there's evidence that it's the latter, maybe we should fix that
before pondering any sophisticated planner memory management.

Thanks,
Amit

[1]: /messages/by-id/CAKJS1f-2rx+E9mG3xrCVHupefMjAp1+tpczQa9SEOZWyU7fjEA@mail.gmail.com

#6 Joe Conway
mail@joeconway.com
In reply to: Amit Langote (#5)
Re: Excessive memory usage in multi-statement queries w/ partitioning

On 5/24/19 1:47 AM, Amit Langote wrote:

On 2019/05/23 4:15, Andreas Seltenreich wrote:

…but when doing it on the parent relation, even 100 statements are
enough to exceed the limit:

,----
| $ psql -c "$(yes update t set c=c where c=6 \; | head -n 100)"
| FEHLER: Speicher aufgebraucht ("ERROR: out of memory")
| DETAIL: Failed on request of size 200 in memory context "MessageContext".
`----

The memory context dump shows plausible values except for the MessageContext:

TopMemoryContext: 124336 total in 8 blocks; 18456 free (11 chunks); 105880 used
[...]
MessageContext: 264241152 total in 42 blocks; 264 free (0 chunks); 264240888 used
[...]

As David Rowley said, planning that query hundreds of times under a single
MessageContext is not something that will end well on 11.3, because even a
single instance takes up tons of memory that's only released when
MessageContext is reset.

Maybe some tactically placed pfrees or avoiding putting redundant stuff
into MessageContext can relax the situation?

I too have had similar thoughts on the matter. If the planner had built
all its subsidiary data structures in its own private context (or tree of
contexts) which is reset once a plan for a given query is built and passed
on, then there wouldn't be an issue of all of that subsidiary memory
leaking into MessageContext. However, the problem may really be that
we're subjecting the planner to use cases that it wasn't perhaps designed
to perform equally well under -- running it many times while handling the
same message. It is worsened by the fact that the query in question is
something that ought to have been documented as not well supported by the
planner; David has posted a documentation patch for that [1]. PG 12 has
alleviated the situation to a large degree, so you won't see the OOM
occurring for this query, but not for all queries unfortunately.

I admittedly haven't followed this thread too closely, but if having 100
partitions causes out of memory on pg11, that sounds like a massive
regression to me.

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development

#7 David Rowley
david.rowley@2ndquadrant.com
In reply to: Joe Conway (#6)
Re: Excessive memory usage in multi-statement queries w/ partitioning

On Sat, 25 May 2019 at 00:18, Joe Conway <mail@joeconway.com> wrote:

I admittedly haven't followed this thread too closely, but if having 100
partitions causes out of memory on pg11, that sounds like a massive
regression to me.

For it to have regressed it would have had to once have been better,
but where was that mentioned? The only thing I saw was
non-partitioned tables compared to partitioned tables, but you can't
really say it's a regression if you're comparing apples to oranges.

I think the only regression here is in the documentation: bebc46931a1
removed the warning about too many partitions in a partitioned
table at the end of ddl.sgml. As Amit mentions, we'd like to put
something back about that.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#8 Joe Conway
mail@joeconway.com
In reply to: David Rowley (#7)
Re: Excessive memory usage in multi-statement queries w/ partitioning

On 5/24/19 9:33 AM, David Rowley wrote:

On Sat, 25 May 2019 at 00:18, Joe Conway <mail@joeconway.com> wrote:

I admittedly haven't followed this thread too closely, but if having 100
partitions causes out of memory on pg11, that sounds like a massive
regression to me.

For it to have regressed it would have had to once have been better,
but where was that mentioned? The only thing I saw was
non-partitioned tables compared to partitioned tables, but you can't
really say it's a regression if you're comparing apples to oranges.

I have very successfully used multiple hundreds and even low thousands
of partitions without running out of memory under the older inheritance
based "partitioning", and declarative partitioning is supposed to be
(and we have advertised it to be) better, not worse, isn't it?

At least from my point of view, if 100 partitions is unusable due to
memory leaks, it is a regression. Perhaps not *technically* a regression,
assuming it behaves this way in pg10 also, but I bet lots of users would
see it that way.

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development

#9 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Joe Conway (#8)
Re: Excessive memory usage in multi-statement queries w/ partitioning

Joe Conway <mail@joeconway.com> writes:

On 5/24/19 9:33 AM, David Rowley wrote:

For it to have regressed it would have had to once have been better,
but where was that mentioned? The only thing I saw was
non-partitioned tables compared to partitioned tables, but you can't
really say it's a regression if you're comparing apples to oranges.

I have very successfully used multiple hundreds and even low thousands
of partitions without running out of memory under the older inheritance
based "partitioning", and declarative partitioning is supposed to be
(and we have advertised it to be) better, not worse, isn't it?

Have you done the exact thing described in the test case? I think
that's going to be quite unpleasantly memory-intensive in any version.

The real issue here is that we have designed around the assumption
that MessageContext will be used to parse and plan one single statement
before being reset. The described usage breaks that assumption.
No matter how memory-efficient any one statement is or isn't,
if you throw enough of them at the backend without giving it a chance
to reset MessageContext, it won't end well.

So my thought, if we want to do something about this, is not "find
some things we can pfree at the end of planning" but "find a way
to use a separate context for each statement in the query string".
Maybe multi-query strings could be handled by setting up a child
context of MessageContext (after we've done the raw parsing there
and determined that indeed there are multiple queries), running
parse analysis and planning in that context, and resetting that
context after each query.
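
Roughly this shape, just as an untested sketch (variable and context
names are only for illustration; it glosses over only doing this when
there is more than one raw parsetree, and over the portal/snapshot
bookkeeping):

    foreach(parsetree_item, parsetree_list)
    {
        /* Per-statement child of MessageContext, discarded after each query */
        MemoryContext stmt_context = AllocSetContextCreate(MessageContext,
                                                           "statement parse/plan context",
                                                           ALLOCSET_DEFAULT_SIZES);
        MemoryContext oldcontext = MemoryContextSwitchTo(stmt_context);

        querytree_list = pg_analyze_and_rewrite(parsetree, query_string,
                                                NULL, 0, NULL);
        plantree_list = pg_plan_queries(querytree_list,
                                        CURSOR_OPT_PARALLEL_OK, NULL);

        /* ... set up the portal and run the statement to completion ... */

        /* This statement's parse/plan memory can now go away in one sweep */
        MemoryContextSwitchTo(oldcontext);
        MemoryContextDelete(stmt_context);
    }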

regards, tom lane

#10 Joe Conway
mail@joeconway.com
In reply to: Tom Lane (#9)
Re: Excessive memory usage in multi-statement queries w/ partitioning

On 5/24/19 10:28 AM, Tom Lane wrote:

Joe Conway <mail@joeconway.com> writes:

On 5/24/19 9:33 AM, David Rowley wrote:

For it to have regressed it would have had to once have been better,
but where was that mentioned? The only thing I saw was
non-partitioned tables compared to partitioned tables, but you can't
really say it's a regression if you're comparing apples to oranges.

I have very successfully used multiple hundreds and even low thousands
of partitions without running out of memory under the older inheritance
based "partitioning", and declarative partitioning is supposed to be
(and we have advertised it to be) better, not worse, isn't it?

Have you done the exact thing described in the test case? I think
that's going to be quite unpleasantly memory-intensive in any version.

Ok, fair point. Will test and report back.

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development

#11 Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Tom Lane (#9)
1 attachment(s)
Re: Excessive memory usage in multi-statement queries w/ partitioning

On 2019/05/24 23:28, Tom Lane wrote:

So my thought, if we want to do something about this, is not "find
some things we can pfree at the end of planning" but "find a way
to use a separate context for each statement in the query string".
Maybe multi-query strings could be handled by setting up a child
context of MessageContext (after we've done the raw parsing there
and determined that indeed there are multiple queries), running
parse analysis and planning in that context, and resetting that
context after each query.

Maybe like the attached? I'm not sure if we need to likewise be concerned
about execute_sql_string() being handed multi-query strings.

Thanks,
Amit

Attachments:

parse-plan-memcxt.patch (text/plain; charset=UTF-8)
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 44a59e1d4f..0165f794b0 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -984,7 +984,8 @@ static void
 exec_simple_query(const char *query_string)
 {
 	CommandDest dest = whereToSendOutput;
-	MemoryContext oldcontext;
+	MemoryContext oldcontext,
+				  stmtcontext;
 	List	   *parsetree_list;
 	ListCell   *parsetree_item;
 	bool		save_log_statement_stats = log_statement_stats;
@@ -1132,10 +1133,14 @@ exec_simple_query(const char *query_string)
 		/*
 		 * OK to analyze, rewrite, and plan this query.
 		 *
-		 * Switch to appropriate context for constructing querytrees (again,
-		 * these must outlive the execution context).
+		 * Switch to appropriate context for constructing querytrees.
+		 * Memory allocated during this construction is released before
+		 * the generated plan is executed.
 		 */
-		oldcontext = MemoryContextSwitchTo(MessageContext);
+		stmtcontext = AllocSetContextCreate(MessageContext,
+											"statement parse/plan context",
+											ALLOCSET_DEFAULT_SIZES);
+		oldcontext = MemoryContextSwitchTo(stmtcontext);
 
 		querytree_list = pg_analyze_and_rewrite(parsetree, query_string,
 												NULL, 0, NULL);
@@ -1143,6 +1148,14 @@ exec_simple_query(const char *query_string)
 		plantree_list = pg_plan_queries(querytree_list,
 										CURSOR_OPT_PARALLEL_OK, NULL);
 
+		/*
+		 * Copy the plan trees into the longer-lived MessageContext and delete
+		 * the temporary context used to generate them.
+		 */
+		MemoryContextSwitchTo(MessageContext);
+		plantree_list = copyObject(plantree_list);
+		MemoryContextDelete(stmtcontext);
+
 		/* Done with the snapshot used for parsing/planning */
 		if (snapshot_set)
 			PopActiveSnapshot();
#12 Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Joe Conway (#6)
Re: Excessive memory usage in multi-statement queries w/ partitioning

Hi,

On 2019/05/24 21:18, Joe Conway wrote:

On 5/24/19 1:47 AM, Amit Langote wrote:

I too have had similar thoughts on the matter. If the planner had built
all its subsidiary data structures in its own private context (or tree of
contexts) which is reset once a plan for a given query is built and passed
on, then there wouldn't be an issue of all of that subsidiary memory
leaking into MessageContext. However, the problem may really be that
we're subjecting the planner to use cases that it wasn't perhaps designed
to perform equally well under -- running it many times while handling the
same message. It is worsened by the fact that the query in question is
something that ought to have been documented as not well supported by the
planner; David has posted a documentation patch for that [1]. PG 12 has
alleviated the situation to a large degree, so you won't see the OOM
occurring for this query, but not for all queries unfortunately.

I admittedly haven't followed this thread too closely, but if having 100
partitions causes out of memory on pg11, that sounds like a massive
regression to me.

You won't run out of memory if you are running just one query per message,
but that's not the case in this discussion. With multi-query submissions
like the one in this case, the memory taken up by parsing and planning
*all* of the queries adds up in a single MessageContext, which can lead to
OOM if there are enough queries to push MessageContext beyond the limit.
The only point I was trying to make is that reaching OOM of this sort is
easier with partitioning, because of the age-old behavior that planning
UPDATE/DELETE queries on inherited tables (and so partitioned tables)
needs tons of memory, which grows as the number of child tables / partitions
increases.
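
To illustrate, the relevant loop in exec_simple_query() today is shaped
roughly like this (simplified sketch, not the exact source):

    foreach(parsetree_item, parsetree_list)
    {
        /* Parse/plan memory for *every* statement lands in MessageContext */
        oldcontext = MemoryContextSwitchTo(MessageContext);

        querytree_list = pg_analyze_and_rewrite(parsetree, query_string,
                                                NULL, 0, NULL);
        plantree_list = pg_plan_queries(querytree_list,
                                        CURSOR_OPT_PARALLEL_OK, NULL);

        /* ... execute the statement; nothing allocated above is freed here ... */
    }
    /* MessageContext is only reset when the next client message arrives */

So with a 100-statement batch against a 100-partition table, you pay for
100 UPDATE plans' worth of planner data structures at once.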

We fixed things in PG 12, at least for partitioning, so that as long as a
query needs to affect only a small number of the partitions present, its
planning will use only a fixed amount of CPU and memory, and increasing
the number of partitions won't lead to explosive growth in the memory
used. You might be able to tell, however, that that effort had nothing to
do with improving the situation for multi-query submissions.

Thanks,
Amit

#13 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Amit Langote (#11)
Re: Excessive memory usage in multi-statement queries w/ partitioning

Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> writes:

On 2019/05/24 23:28, Tom Lane wrote:

So my thought, if we want to do something about this, is not "find
some things we can pfree at the end of planning" but "find a way
to use a separate context for each statement in the query string".

Maybe like the attached? I'm not sure if we need to likewise be concerned
about execute_sql_string() being handed multi-query strings.

Please add this to the upcoming CF so we don't forget about it.
(I don't think there's anything very new about this behavior, so
I don't feel that we should consider it an open item for v12 ---
anyone think differently?)

regards, tom lane

#14 Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Tom Lane (#13)
Re: Excessive memory usage in multi-statement queries w/ partitioning

On 2019/05/27 21:56, Tom Lane wrote:

Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> writes:

On 2019/05/24 23:28, Tom Lane wrote:

So my thought, if we want to do something about this, is not "find
some things we can pfree at the end of planning" but "find a way
to use a separate context for each statement in the query string".

Maybe like the attached? I'm not sure if we need to likewise be concerned
about execute_sql_string() being handed multi-query strings.

Please add this to the upcoming CF so we don't forget about it.

Done; added to Performance for lack of a better topic for this.

https://commitfest.postgresql.org/23/2131/

(I don't think there's anything very new about this behavior, so
I don't feel that we should consider it an open item for v12 ---
anyone think differently?)

Agree that there's nothing new about the behavior itself. What may be new
though is people getting increasingly bitten by it if they query tables
containing large numbers of partitions, most of which need to be scanned
[1], and the query string they submit as a single message contains
hundreds of such queries to begin with.

Thanks,
Amit

[1]: AFAICT, that's the only class of queries where the planner needs to
keep a lot of stuff around, the memory cost of which increases with the
number of partitions. I was thinking that planning complex queries that go
through tons of indexes, joins, etc. also hoards tons of memory, but
apparently not, because the planner seems fairly good at cleaning up after
itself as it's doing the work.

#15 Julien Rouhaud
rjuju123@gmail.com
In reply to: Amit Langote (#14)
Re: Excessive memory usage in multi-statement queries w/ partitioning

On Tue, May 28, 2019 at 6:57 AM Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:

On 2019/05/27 21:56, Tom Lane wrote:

Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> writes:

On 2019/05/24 23:28, Tom Lane wrote:

So my thought, if we want to do something about this, is not "find
some things we can pfree at the end of planning" but "find a way
to use a separate context for each statement in the query string".

Maybe like the attached? I'm not sure if we need to likewise be concerned
about execute_sql_string() being handed multi-query strings.

The whole extension SQL script is passed to execute_sql_string(), so I
think it's a good thing to have a similar workaround there.

About the patch:

 -        * Switch to appropriate context for constructing querytrees (again,
-        * these must outlive the execution context).
+        * Switch to appropriate context for constructing querytrees.
+        * Memory allocated during this construction is released before
+        * the generated plan is executed.

The comment should mention query and plan trees; everything else seems OK to me.

#16 Amit Langote
amitlangote09@gmail.com
In reply to: Julien Rouhaud (#15)
1 attachment(s)
Re: Excessive memory usage in multi-statement queries w/ partitioning

Hi Julien,

Thanks for taking a look at this.

On Thu, Jul 4, 2019 at 6:52 PM Julien Rouhaud <rjuju123@gmail.com> wrote:

On Tue, May 28, 2019 at 6:57 AM Amit Langote wrote:

Maybe like the attached? I'm not sure if we need to likewise be concerned
about execute_sql_string() being handed multi-query strings.

the whole extension sql script is passed to execute_sql_string(), so I
think that it's a good thing to have similar workaround there.

That makes sense, although it is perhaps much less likely for a memory
usage explosion to occur in execute_sql_string(), because the scripts
passed to execute_sql_string() mostly contain utility statements and
rarely anything whose planning will explode in memory usage.

Anyway, I've added similar handling in execute_sql_string() for consistency.

Now I wonder if we'll need to consider another path which calls
pg_plan_queries() on a possibly multi-statement query --
BuildCachedPlan()...

About the patch:

-        * Switch to appropriate context for constructing querytrees (again,
-        * these must outlive the execution context).
+        * Switch to appropriate context for constructing querytrees.
+        * Memory allocated during this construction is released before
+        * the generated plan is executed.

The comment should mention query and plan trees, everything else seems ok to me.

Okay, fixed.

Attached updated patch. Thanks again.

Regards,
Amit

Attachments:

parse-plan-memcxt_v2.patch (application/octet-stream)
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 59ca5cd5a9..7a5feda2d9 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -719,10 +719,24 @@ execute_sql_string(const char *sql)
 		RawStmt    *parsetree = lfirst_node(RawStmt, lc1);
 		List	   *stmt_list;
 		ListCell   *lc2;
+		MemoryContext oldcontext,
+					  stmtcontext;
 
 		/* Be sure parser can see any DDL done so far */
 		CommandCounterIncrement();
 
+		/*
+		 * OK to analyze, rewrite, and plan this raw statement.
+		 *
+		 * Switch to a temporary context before constructing query and plan
+		 * trees.  Memory allocated during this construction will be released
+		 * before executing the generated plan(s).
+		 */
+		stmtcontext = AllocSetContextCreate(CurrentMemoryContext,
+											"statement parse/plan context",
+											ALLOCSET_DEFAULT_SIZES);
+		oldcontext = MemoryContextSwitchTo(stmtcontext);
+
 		stmt_list = pg_analyze_and_rewrite(parsetree,
 										   sql,
 										   NULL,
@@ -730,6 +744,14 @@ execute_sql_string(const char *sql)
 										   NULL);
 		stmt_list = pg_plan_queries(stmt_list, CURSOR_OPT_PARALLEL_OK, NULL);
 
+		/*
+		 * Copy the plan trees into the longer-lived context and delete the
+		 * temporary context used to generate them.
+		 */
+		MemoryContextSwitchTo(oldcontext);
+		stmt_list = copyObject(stmt_list);
+		MemoryContextDelete(stmtcontext);
+
 		foreach(lc2, stmt_list)
 		{
 			PlannedStmt *stmt = lfirst_node(PlannedStmt, lc2);
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 44a59e1d4f..207a9d6488 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -984,7 +984,8 @@ static void
 exec_simple_query(const char *query_string)
 {
 	CommandDest dest = whereToSendOutput;
-	MemoryContext oldcontext;
+	MemoryContext oldcontext,
+				  stmtcontext;
 	List	   *parsetree_list;
 	ListCell   *parsetree_item;
 	bool		save_log_statement_stats = log_statement_stats;
@@ -1132,10 +1133,14 @@ exec_simple_query(const char *query_string)
 		/*
 		 * OK to analyze, rewrite, and plan this query.
 		 *
-		 * Switch to appropriate context for constructing querytrees (again,
-		 * these must outlive the execution context).
+		 * Switch to a temporary context before constructing query and plan
+		 * trees.  Memory allocated during this construction will be released
+		 * before executing the generated plan(s).
 		 */
-		oldcontext = MemoryContextSwitchTo(MessageContext);
+		stmtcontext = AllocSetContextCreate(MessageContext,
+											"statement parse/plan context",
+											ALLOCSET_DEFAULT_SIZES);
+		oldcontext = MemoryContextSwitchTo(stmtcontext);
 
 		querytree_list = pg_analyze_and_rewrite(parsetree, query_string,
 												NULL, 0, NULL);
@@ -1143,6 +1148,14 @@ exec_simple_query(const char *query_string)
 		plantree_list = pg_plan_queries(querytree_list,
 										CURSOR_OPT_PARALLEL_OK, NULL);
 
+		/*
+		 * Copy the plan trees into the longer-lived MessageContext and delete
+		 * the temporary context used to generate them.
+		 */
+		MemoryContextSwitchTo(MessageContext);
+		plantree_list = copyObject(plantree_list);
+		MemoryContextDelete(stmtcontext);
+
 		/* Done with the snapshot used for parsing/planning */
 		if (snapshot_set)
 			PopActiveSnapshot();
#17 Julien Rouhaud
rjuju123@gmail.com
In reply to: Amit Langote (#16)
Re: Excessive memory usage in multi-statement queries w/ partitioning

Hi Amit,

On Mon, Jul 8, 2019 at 10:52 AM Amit Langote <amitlangote09@gmail.com> wrote:

On Thu, Jul 4, 2019 at 6:52 PM Julien Rouhaud <rjuju123@gmail.com> wrote:

On Tue, May 28, 2019 at 6:57 AM Amit Langote wrote:

Maybe like the attached? I'm not sure if we need to likewise be concerned
about execute_sql_string() being handed multi-query strings.

The whole extension SQL script is passed to execute_sql_string(), so I
think it's a good thing to have a similar workaround there.

That makes sense, although it is perhaps much less likely for a memory
usage explosion to occur in execute_sql_string(), because the scripts
passed to execute_sql_string() mostly contain utility statements and
rarely anything whose planning will explode in memory usage.

Anyway, I've added similar handling in execute_sql_string() for consistency.

Thanks!

Now I wonder if we'll need to consider another path which calls
pg_plan_queries() on a possibly multi-statement query --
BuildCachedPlan()...

I also thought about this when reviewing the patch, but AFAICS you
can't provide a multi-statement query to BuildCachedPlan() using
prepared statements, and I'm not sure that SPI is worth the trouble.
I'll mark this patch as ready for committer.


About the patch:

-        * Switch to appropriate context for constructing querytrees (again,
-        * these must outlive the execution context).
+        * Switch to appropriate context for constructing querytrees.
+        * Memory allocated during this construction is released before
+        * the generated plan is executed.

The comment should mention query and plan trees, everything else seems ok to me.

Okay, fixed.

Attached updated patch. Thanks again.

Regards,
Amit

#18 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Amit Langote (#16)
Re: Excessive memory usage in multi-statement queries w/ partitioning

Amit Langote <amitlangote09@gmail.com> writes:

[ parse-plan-memcxt_v2.patch ]

I got around to looking at this finally. I'm not at all happy with
the fact that it's added a plantree copy step to the only execution
path through exec_simple_query. That's a very significant overhead,
in many use-cases, to solve something that nobody had complained
about for a couple of decades before now. I don't see the need for
any added copy step anyway. The only reason you're doing it AFAICS
is so you can release the per-statement context a bit earlier, which
is a completely unnecessary optimization. Just wait to release it
till the bottom of the loop.

Also, creating/deleting the sub-context is in itself an added overhead
that accomplishes exactly nothing in the typical case where there's
not multiple statements. I thought the idea was to do that only if
there was more than one raw parsetree (or, maybe better, do it for
all but the last parsetree).

To show that this isn't an empty concern, I did a quick pgbench
test. Using a single-client select-only test ("pgbench -S -T 60"
in an -s 10 database), I got these numbers in three trials with HEAD:

tps = 9593.818478 (excluding connections establishing)
tps = 9570.189163 (excluding connections establishing)
tps = 9596.579038 (excluding connections establishing)

and these numbers after applying the patch:

tps = 9411.918165 (excluding connections establishing)
tps = 9389.279079 (excluding connections establishing)
tps = 9409.350175 (excluding connections establishing)

That's about a 2% dropoff. Now it's possible that that can be
explained away as random variations from a slightly different layout
of critical loops vs cacheline boundaries ... but I don't believe it.

regards, tom lane

#19 Amit Langote
amitlangote09@gmail.com
In reply to: Tom Lane (#18)
1 attachment(s)
Re: Excessive memory usage in multi-statement queries w/ partitioning

On Tue, Jul 9, 2019 at 6:21 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Amit Langote <amitlangote09@gmail.com> writes:

[ parse-plan-memcxt_v2.patch ]

I got around to looking at this finally.

Thanks for the review.

I'm not at all happy with
the fact that it's added a plantree copy step to the only execution
path through exec_simple_query. That's a very significant overhead,
in many use-cases, to solve something that nobody had complained
about for a couple of decades before now. I don't see the need for
any added copy step anyway. The only reason you're doing it AFAICS
is so you can release the per-statement context a bit earlier, which
is a completely unnecessary optimization. Just wait to release it
till the bottom of the loop.

Ah, that makes sense. I've removed the copying of the plan tree and also
moved the temporary context deletion to the bottom of the loop.

Also, creating/deleting the sub-context is in itself an added overhead
that accomplishes exactly nothing in the typical case where there's
not multiple statements. I thought the idea was to do that only if
there was more than one raw parsetree (or, maybe better, do it for
all but the last parsetree).

That makes sense too. I've made it (creation/deletion of the child
context) conditional on whether there is more than one query to plan.

To show that this isn't an empty concern, I did a quick pgbench
test. Using a single-client select-only test ("pgbench -S -T 60"
in an -s 10 database), I got these numbers in three trials with HEAD:

tps = 9593.818478 (excluding connections establishing)
tps = 9570.189163 (excluding connections establishing)
tps = 9596.579038 (excluding connections establishing)

and these numbers after applying the patch:

tps = 9411.918165 (excluding connections establishing)
tps = 9389.279079 (excluding connections establishing)
tps = 9409.350175 (excluding connections establishing)

That's about a 2% dropoff.

With the updated patch, here are the numbers on my machine (HEAD vs patch)

HEAD:

tps = 3586.233815 (excluding connections establishing)
tps = 3569.252542 (excluding connections establishing)
tps = 3559.027733 (excluding connections establishing)

Patched:

tps = 3586.988057 (excluding connections establishing)
tps = 3585.169589 (excluding connections establishing)
tps = 3526.437968 (excluding connections establishing)

A bit noisy but not much degradation.

Attached updated patch. Thanks again.

Regards,
Amit

Attachments:

parse-plan-memcxt_v3.patch (application/octet-stream)
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 59ca5cd5a9..18dd22691c 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -719,10 +719,24 @@ execute_sql_string(const char *sql)
 		RawStmt    *parsetree = lfirst_node(RawStmt, lc1);
 		List	   *stmt_list;
 		ListCell   *lc2;
+		MemoryContext oldcontext = CurrentMemoryContext,
+					  plancontext = NULL;
 
 		/* Be sure parser can see any DDL done so far */
 		CommandCounterIncrement();
 
+		/*
+		 * If there are more queries to run, use a temporary child context
+		 * that will be reset after executing this query.
+		 */
+		if (lnext(lc1) != NULL)
+		{
+			plancontext = AllocSetContextCreate(CurrentMemoryContext,
+												"statement planning context",
+												ALLOCSET_DEFAULT_SIZES);
+			MemoryContextSwitchTo(plancontext);
+		}
+
 		stmt_list = pg_analyze_and_rewrite(parsetree,
 										   sql,
 										   NULL,
@@ -730,6 +744,10 @@ execute_sql_string(const char *sql)
 										   NULL);
 		stmt_list = pg_plan_queries(stmt_list, CURSOR_OPT_PARALLEL_OK, NULL);
 
+		/* Switch context for execution. */
+		if (plancontext)
+			MemoryContextSwitchTo(oldcontext);
+
 		foreach(lc2, stmt_list)
 		{
 			PlannedStmt *stmt = lfirst_node(PlannedStmt, lc2);
@@ -772,6 +790,13 @@ execute_sql_string(const char *sql)
 
 			PopActiveSnapshot();
 		}
+
+		/*
+		 * Delete the planning context unless this is the last statement, in
+		 * which case, deleting the parent context will get the job done.
+		 */
+		if (plancontext)
+			MemoryContextDelete(plancontext);
 	}
 
 	/* Be sure to advance the command counter after the last script command */
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 44a59e1d4f..67b46bff8a 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1075,6 +1075,7 @@ exec_simple_query(const char *query_string)
 		Portal		portal;
 		DestReceiver *receiver;
 		int16		format;
+		MemoryContext plancontext = NULL;
 
 		/*
 		 * Get the command name for use in status display (it also becomes the
@@ -1132,10 +1133,22 @@ exec_simple_query(const char *query_string)
 		/*
 		 * OK to analyze, rewrite, and plan this query.
 		 *
-		 * Switch to appropriate context for constructing querytrees (again,
-		 * these must outlive the execution context).
+		 * Switch to appropriate context for constructing query and plan trees
+		 * (again, these must outlive the execution context).  Normally, it's
+		 * MessageContext, but if there are more queries to plan, we use a
+		 * temporary child context that will be reset after executing this
+		 * query.  We avoid that overhead of setting up a separate context
+		 * for the common case of having just a single query.
 		 */
-		oldcontext = MemoryContextSwitchTo(MessageContext);
+		if (lnext(parsetree_item) != NULL)
+		{
+			plancontext = AllocSetContextCreate(MessageContext,
+												"statement planning context",
+												ALLOCSET_DEFAULT_SIZES);
+			oldcontext = MemoryContextSwitchTo(plancontext);
+		}
+		else
+			oldcontext = MemoryContextSwitchTo(MessageContext);
 
 		querytree_list = pg_analyze_and_rewrite(parsetree, query_string,
 												NULL, 0, NULL);
@@ -1143,6 +1156,10 @@ exec_simple_query(const char *query_string)
 		plantree_list = pg_plan_queries(querytree_list,
 										CURSOR_OPT_PARALLEL_OK, NULL);
 
+		/* Switch to MessageContext for creating the portal. */
+		if (plancontext)
+			MemoryContextSwitchTo(MessageContext);
+
 		/* Done with the snapshot used for parsing/planning */
 		if (snapshot_set)
 			PopActiveSnapshot();
@@ -1263,6 +1280,10 @@ exec_simple_query(const char *query_string)
 		 * aborted by error will not send an EndCommand report at all.)
 		 */
 		EndCommand(completionTag, dest);
+
+		/* Delete the planning context if one was created. */
+		if (plancontext)
+			MemoryContextDelete(plancontext);
 	}							/* end loop over parsetrees */
 
 	/*
#20 Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Amit Langote (#19)
Re: Excessive memory usage in multi-statement queries w/ partitioning

Hi,

At Wed, 10 Jul 2019 16:35:18 +0900, Amit Langote <amitlangote09@gmail.com> wrote in <CA+HiwqFCO4c8tdQmXcDNzyaD43A81caapYLJ6CEh8H3P0tPL4A@mail.gmail.com>

On Tue, Jul 9, 2019 at 6:21 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Amit Langote <amitlangote09@gmail.com> writes:

[ parse-plan-memcxt_v2.patch ]

I got around to looking at this finally.

Thanks for the review.

I'm not at all happy with
the fact that it's added a plantree copy step to the only execution
path through exec_simple_query. That's a very significant overhead,
in many use-cases, to solve something that nobody had complained
about for a couple of decades before now. I don't see the need for
any added copy step anyway. The only reason you're doing it AFAICS
is so you can release the per-statement context a bit earlier, which
is a completely unnecessary optimization. Just wait to release it
till the bottom of the loop.

Ah, that makes sense. I've removed the copying of the plan tree and also
moved the temporary context deletion to the bottom of the loop.

-     * Switch to appropriate context for constructing querytrees (again,
-     * these must outlive the execution context).
+     * Switch to appropriate context for constructing query and plan trees
+     * (again, these must outlive the execution context).  Normally, it's
+     * MessageContext, but if there are more queries to plan, we use a
+     * temporary child context that will be reset after executing this
+     * query.  We avoid that overhead of setting up a separate context
+     * for the common case of having just a single query.

Might be stupid, but I feel uneasy about "usually it must live in
MessageContext, but not necessarily if there is a succeeding
query"... *I* need more explanation of why it is safe to use that
short-lived context.

Also, creating/deleting the sub-context is in itself an added overhead
that accomplishes exactly nothing in the typical case where there's
not multiple statements. I thought the idea was to do that only if
there was more than one raw parsetree (or, maybe better, do it for
all but the last parsetree).

That makes sense too. I've made it (creation/deletion of the child
context) conditional on whether there is more than one query to plan.

To show that this isn't an empty concern, I did a quick pgbench
test. Using a single-client select-only test ("pgbench -S -T 60"
in an -s 10 database), I got these numbers in three trials with HEAD:

tps = 9593.818478 (excluding connections establishing)
tps = 9570.189163 (excluding connections establishing)
tps = 9596.579038 (excluding connections establishing)

and these numbers after applying the patch:

tps = 9411.918165 (excluding connections establishing)
tps = 9389.279079 (excluding connections establishing)
tps = 9409.350175 (excluding connections establishing)

That's about a 2% dropoff.

With the updated patch, here are the numbers on my machine (HEAD vs patch)

HEAD:

tps = 3586.233815 (excluding connections establishing)
tps = 3569.252542 (excluding connections establishing)
tps = 3559.027733 (excluding connections establishing)

Patched:

tps = 3586.988057 (excluding connections establishing)
tps = 3585.169589 (excluding connections establishing)
tps = 3526.437968 (excluding connections establishing)

A bit noisy but not much degradation.

Attached updated patch. Thanks again.

Regards,
Amit

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#21 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Amit Langote (#19)
Re: Excessive memory usage in multi-statement queries w/ partitioning

Amit Langote <amitlangote09@gmail.com> writes:

Attached updated patch. Thanks again.

Pushed with a bit of further cleanup --- most notably, the way
you had execute_sql_string(), it was still leaking any cruft
ProcessUtility might generate. We can fix that by running
ProcessUtility in the per-statement context too.

I also dropped the optimization for a single/last statement in
execute_sql_string(), and simplified it to just always create
and delete a child context. This was based on a couple of
thoughts. The norm in this code path is that there's multiple
statements, probably lots of them, so that the percentage savings
from getting rid of one context creation is likely negligible.
Also, unlike exec_simple_query, we *don't* know that the outer
context is due to be cleared right afterwards. Since
execute_sql_string() can run multiple times in one extension
command, in principle we could get bloat from not cleaning up
after the last command of each string. Admittedly, it's not
likely that you'd have so many strings involved that that
amounts to a lot, but between following upgrade-script chains
and cascaded module loads, there could be more than a couple.
So it seems like the argument for saving a context creation is
much weaker here than in exec_simple_query.
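
For reference, the per-statement loop in execute_sql_string() now has
roughly this shape (paraphrased sketch; names are approximate, not the
committed code verbatim):

    foreach(lc1, raw_parsetree_list)
    {
        RawStmt    *parsetree = lfirst_node(RawStmt, lc1);
        MemoryContext per_stmt_context,
                      oldcontext;

        /* Be sure parser can see any DDL done so far */
        CommandCounterIncrement();

        /* Parse, plan, and execute each statement in its own child context */
        per_stmt_context = AllocSetContextCreate(CurrentMemoryContext,
                                                 "statement parse/plan/execute context",
                                                 ALLOCSET_DEFAULT_SIZES);
        oldcontext = MemoryContextSwitchTo(per_stmt_context);

        stmt_list = pg_analyze_and_rewrite(parsetree, sql, NULL, 0, NULL);
        stmt_list = pg_plan_queries(stmt_list, CURSOR_OPT_PARALLEL_OK, NULL);

        foreach(lc2, stmt_list)
        {
            /* ... ExecutorRun()/ProcessUtility() also run in this context ... */
        }

        /* Everything this statement allocated, utility cruft included, goes away */
        MemoryContextSwitchTo(oldcontext);
        MemoryContextDelete(per_stmt_context);
    }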

I tried to improve the comments too. I noticed that the bit about
"(again, these must outlive the execution context)" seemed to be
a dangling reference --- whatever previous comment it was referring
to is not to be found anymore. So I made that self-contained.

regards, tom lane

#22 Amit Langote
amitlangote09@gmail.com
In reply to: Tom Lane (#21)
Re: Excessive memory usage in multi-statement queries w/ partitioning

On Thu, Jul 11, 2019 at 3:46 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Amit Langote <amitlangote09@gmail.com> writes:

Attached updated patch. Thanks again.

Pushed with a bit of further cleanup

Thanks a lot.

--- most notably, the way
you had execute_sql_string(), it was still leaking any cruft
ProcessUtility might generate.  We can fix that by running
ProcessUtility in the per-statement context too.

Ah, I was thinking only about planning.

I also dropped the optimization for a single/last statement in
execute_sql_string(), and simplified it to just always create
and delete a child context. This was based on a couple of
thoughts. The norm in this code path is that there's multiple
statements, probably lots of them, so that the percentage savings
from getting rid of one context creation is likely negligible.
Also, unlike exec_simple_query, we *don't* know that the outer
context is due to be cleared right afterwards. Since
execute_sql_string() can run multiple times in one extension
command, in principle we could get bloat from not cleaning up
after the last command of each string. Admittedly, it's not
likely that you'd have so many strings involved that that
amounts to a lot, but between following upgrade-script chains
and cascaded module loads, there could be more than a couple.
So it seems like the argument for saving a context creation is
much weaker here than in exec_simple_query.

Agreed.

I tried to improve the comments too. I noticed that the bit about
"(again, these must outlive the execution context)" seemed to be
a dangling reference --- whatever previous comment it was referring
to is not to be found anymore. So I made that self-contained.

Thanks.

Regards,
Amit

#23 Amit Langote
amitlangote09@gmail.com
In reply to: Kyotaro Horiguchi (#20)
Re: Excessive memory usage in multi-statement queries w/ partitioning

Horiguchi-san,

Thanks for the comment. My reply is a bit late now, but....

On Wed, Jul 10, 2019 at 5:39 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:

At Wed, 10 Jul 2019 16:35:18 +0900, Amit Langote <amitlangote09@gmail.com> wrote:
-     * Switch to appropriate context for constructing querytrees (again,
-     * these must outlive the execution context).
+     * Switch to appropriate context for constructing query and plan trees
+     * (again, these must outlive the execution context).  Normally, it's
+     * MessageContext, but if there are more queries to plan, we use a
+     * temporary child context that will be reset after executing this
+     * query.  We avoid that overhead of setting up a separate context
+     * for the common case of having just a single query.

Might be stupid, but I feel uneasy about "usually it must live in
MessageContext, but not necessarily if there is a succeeding
query"... *I* need more explanation of why it is safe to use that
short-lived context.

So the problem we're trying to solve with this is that memory consumed
when parsing/planning individual statements piles up in a single
context (currently, MessageContext), which can lead to severe bloat,
especially if the planning of individual statements consumes a huge
amount of memory. The solution is to use a sub-context of
MessageContext for each statement that's reset when its execution is
finished. I think it's safe to use a shorter-lived context for each
statement because the memory of a given statement should not need to
be referenced when its execution finishes. Do you see any problem
with that assumption?

Thanks,
Amit