Adding column "mem_usage" to view pg_prepared_statements

Started by Daniel Migowski · over 6 years ago · 18 messages
#1 Daniel Migowski
dmigowski@ikoffice.de

Hello,

I just implemented a small change that adds another column "mem_usage" to the system view "pg_prepared_statements". It returns the total memory usage of CachedPlanSource.context, CachedPlanSource.query_context and, if available, CachedPlanSource.gplan.context.

Looks like this:

IKOffice_Daume=# prepare test as select * from vw_report_salesinvoice where salesinvoice_id = $1;
PREPARE
IKOffice_Daume=# select * from pg_prepared_statements;
name | statement | prepare_time | parameter_types | from_sql | mem_usage
------+----------------------------------------------------------------------------------+------------------------------+-----------------+----------+-----------
test | prepare test as select * from vw_report_salesinvoice where salesinvoice_id = $1; | 2019-07-27 20:21:12.63093+02 | {integer} | t | 33580232
(1 row)

I did this in preparation for reducing the memory usage of prepared statements and believe it gives client applications an option to investigate which prepared statements should be dropped. It also makes it possible to directly examine the results of further changes and their effectiveness in reducing the memory load of prepared statements.

Is a patch welcome or is this feature not of interest?

Also, I wonder why the "prepare test as" is part of the statement column. It isn't even part of the real statement that is prepared, as far as I can tell. I would prefer to have just the "select *..." in that column.

Kind regards,
Daniel Migowski

#2 Andres Freund
andres@anarazel.de
In reply to: Daniel Migowski (#1)
Re: Adding column "mem_usage" to view pg_prepared_statements

Hi,

On 2019-07-27 18:29:23 +0000, Daniel Migowski wrote:

I just implemented a small change that adds another column "mem_usage"
to the system view "pg_prepared_statements". It returns the memory
usage total of CachedPlanSource.context,
CachedPlanSource.query_context and if available
CachedPlanSource.gplan.context.

FWIW, it's generally easier to comment if you actually provide the
patch, even if it's just POC, as that gives a better handle on how much
additional complexity it introduces.

I think this could be a useful feature. I'm not so sure we want it tied
to just cached statements however - perhaps we ought to generalize it a
bit more.

Regarding the prepared statements specific considerations: I don't think
we ought to explicitly reference CachedPlanSource.query_context and
CachedPlanSource.gplan.context.

In the case of actual prepared statements (rather than oneshot plans),
CachedPlanSource.query_context IIRC should live under
CachedPlanSource.context. I think there are no relevant cases where
gplan.context isn't a child of CachedPlanSource.context either, but I'm
not quite sure.

Then we ought to just include child contexts in the memory computation
(cf. the logic in MemoryContextStatsInternal(), although you obviously
wouldn't need all of that). That way, if the cached statement has child
contexts, we stay accurate.
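Andres's point about recursing into child contexts can be sketched in standalone C. ToyContext below is a hypothetical, much-simplified stand-in for PostgreSQL's MemoryContext, which links children the same way through firstchild/nextchild; real code would reuse the counters logic of MemoryContextStatsInternal() rather than reimplement it:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a MemoryContext: a tree linked through
 * firstchild (first child) and nextchild (next sibling). */
typedef struct ToyContext
{
    size_t             totalspace;   /* bytes allocated for this context */
    struct ToyContext *firstchild;
    struct ToyContext *nextchild;
} ToyContext;

/* Sum a context's space together with that of all its descendants, so a
 * cached statement with child contexts is still accounted for accurately. */
static size_t
context_total_space(const ToyContext *ctx)
{
    size_t total = ctx->totalspace;

    for (const ToyContext *c = ctx->firstchild; c != NULL; c = c->nextchild)
        total += context_total_space(c);

    return total;
}
```

The same traversal works regardless of how deep the plan's context tree gets, which is the point of not referencing individual contexts by name.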

Also I wonder why the "prepare test as" is part of the statement
column. It isn't even part of the real statement that is prepared as
far as I would assume. Would prefer to just have the "select *..." in
that column.

It's the statement that was executed. Note that you'll not see that in
the case of protocol level prepared statements. It will sometimes
include relevant information, e.g. about the types specified as part of
the prepare (as in PREPARE foo(int, float, ...) AS ...).

Greetings,

Andres Freund

#3 Daniel Migowski
dmigowski@ikoffice.de
In reply to: Andres Freund (#2)
1 attachment(s)
Re: Adding column "mem_usage" to view pg_prepared_statements

Hello Andres,

How do you want to generalize it? Are you thinking about a view solely for displaying the memory usage of different objects, like functions or views (which also have a plan associated with them, when I think about it)? While that would be interesting, I still believe monitoring the memory usage of prepared statements is more important than that of other objects, because of how they change the memory consumption of the server without using any DDL or configuration options; I am not aware of other objects with the same properties, or are there some? And for the other volatile objects like tables and indexes and their contents, PostgreSQL already has its information functions.

Regardless of that, here is the patch for now. I didn't want to fiddle too much with MemoryContexts yet, so it still doesn't recurse into child contexts, but I will change that when I try to build a more compact MemoryContext implementation and see how that works out.

Thanks for pointing out the relevant information in the statement column of the view.

Regards,
Daniel Migowski

Attachments:

prepared_statements_mem_usage.diff (application/octet-stream)
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 0fef0ca..52ebc2d 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -9525,6 +9525,13 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
        frontend/backend protocol
       </entry>
      </row>
+     <row>
+      <entry><structfield>mem_usage</structfield></entry>
+      <entry><type>int8</type></entry>
+      <entry>
+       The memory reserved in the backend for the prepared statement.
+      </entry>
+     </row>
     </tbody>
    </tgroup>
   </table>
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index b945b15..1b24254 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -35,6 +35,7 @@
 #include "utils/builtins.h"
 #include "utils/snapmgr.h"
 #include "utils/timestamp.h"
+#include "utils/plancache.h"
 
 
 /*
@@ -704,7 +705,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
 
 /*
  * This set returning function reads all the prepared statements and
- * returns a set of (name, statement, prepare_time, param_types, from_sql).
+ * returns a set of (name, statement, prepare_time, param_types, from_sql, mem_usage).
  */
 Datum
 pg_prepared_statement(PG_FUNCTION_ARGS)
@@ -734,7 +735,7 @@ pg_prepared_statement(PG_FUNCTION_ARGS)
 	 * build tupdesc for result tuples. This must match the definition of the
 	 * pg_prepared_statements view in system_views.sql
 	 */
-	tupdesc = CreateTemplateTupleDesc(5, false);
+	tupdesc = CreateTemplateTupleDesc(6, false);
 	TupleDescInitEntry(tupdesc, (AttrNumber) 1, "name",
 					   TEXTOID, -1, 0);
 	TupleDescInitEntry(tupdesc, (AttrNumber) 2, "statement",
@@ -745,6 +746,9 @@ pg_prepared_statement(PG_FUNCTION_ARGS)
 					   REGTYPEARRAYOID, -1, 0);
 	TupleDescInitEntry(tupdesc, (AttrNumber) 5, "from_sql",
 					   BOOLOID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 6, "mem_usage",
+					   INT8OID, -1, 0);
+
 
 	/*
 	 * We put all the tuples into a tuplestore in one scan of the hashtable.
@@ -766,8 +770,8 @@ pg_prepared_statement(PG_FUNCTION_ARGS)
 		hash_seq_init(&hash_seq, prepared_queries);
 		while ((prep_stmt = hash_seq_search(&hash_seq)) != NULL)
 		{
-			Datum		values[5];
-			bool		nulls[5];
+			Datum		values[6];
+			bool		nulls[6];
 
 			MemSet(nulls, 0, sizeof(nulls));
 
@@ -777,6 +781,7 @@ pg_prepared_statement(PG_FUNCTION_ARGS)
 			values[3] = build_regtype_array(prep_stmt->plansource->param_types,
 											prep_stmt->plansource->num_params);
 			values[4] = BoolGetDatum(prep_stmt->from_sql);
+			values[5] = Int8GetDatum((int64)CachedPlanMemoryUsage(prep_stmt->plansource));
 
 			tuplestore_putvalues(tupstore, tupdesc, values, nulls);
 		}
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 0ad3e3c..291b3f3 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -1451,6 +1451,42 @@ CachedPlanGetTargetList(CachedPlanSource *plansource,
 }
 
 /*
+ * CachedPlanMemoryUsage: Return memory used by the CachedPlanSource
+ *
+ * Returns the malloced memory used by the two MemoryContexts in
+ * CachedPlanSource and (if available) the MemoryContext in the generic plan.
+ * Does not account for the free memory in those MemoryContexts because it is
+ * very unlikely that it will be reused for anything else and it can be
+ * considered dead memory anyway. The size of the CachedPlanSource struct is
+ * also added.
+ *
+ * This function is used only for the pg_prepared_statements view to allow
+ * client applications to monitor the memory used by prepared statements and
+ * to select candidates for eviction in memory-constrained environments with
+ * automatic preparation of often-called queries.
+ */
+Size
+CachedPlanMemoryUsage(CachedPlanSource *plan)
+{
+	MemoryContextCounters counters;
+	MemoryContext context;
+
+	memset(&counters, 0, sizeof(counters));
+
+	context = plan->context;
+	context->methods->stats(context,NULL,NULL,&counters);
+
+	context = plan->query_context;
+	context->methods->stats(context,NULL,NULL,&counters);
+
+	if( plan->gplan ) {
+		context = plan->gplan->context;
+		context->methods->stats(context,NULL,NULL,&counters);
+	}
+
+	return counters.totalspace;
+}
+
+/*
  * QueryListGetPrimaryStmt
  *		Get the "primary" stmt within a list, ie, the one marked canSetTag.
  *
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index c4fc50d..de0277e 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -7633,9 +7633,9 @@
 { oid => '2510', descr => 'get the prepared statements for this session',
   proname => 'pg_prepared_statement', prorows => '1000', proretset => 't',
   provolatile => 's', proparallel => 'r', prorettype => 'record',
-  proargtypes => '', proallargtypes => '{text,text,timestamptz,_regtype,bool}',
-  proargmodes => '{o,o,o,o,o}',
-  proargnames => '{name,statement,prepare_time,parameter_types,from_sql}',
+  proargtypes => '', proallargtypes => '{text,text,timestamptz,_regtype,bool,int8}',
+  proargmodes => '{o,o,o,o,o,o}',
+  proargnames => '{name,statement,prepare_time,parameter_types,from_sql,mem_usage}',
   prosrc => 'pg_prepared_statement' },
 { oid => '2511', descr => 'get the open cursors for this session',
   proname => 'pg_cursor', prorows => '1000', proretset => 't',
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index ab20aa0..4517f6a 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -182,4 +182,6 @@ extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
 			  QueryEnvironment *queryEnv);
 extern void ReleaseCachedPlan(CachedPlan *plan, bool useResOwner);
 
+extern Size CachedPlanMemoryUsage(CachedPlanSource *plansource);
+
 #endif							/* PLANCACHE_H */
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 7e2b194..e2abdaf 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1425,8 +1425,9 @@ pg_prepared_statements| SELECT p.name,
     p.statement,
     p.prepare_time,
     p.parameter_types,
-    p.from_sql
-   FROM pg_prepared_statement() p(name, statement, prepare_time, parameter_types, from_sql);
+    p.from_sql,
+    p.mem_usage
+   FROM pg_prepared_statement() p(name, statement, prepare_time, parameter_types, from_sql, mem_usage);
 pg_prepared_xacts| SELECT p.transaction,
     p.gid,
     p.prepared,
#4 Daniel Migowski
dmigowski@ikoffice.de
In reply to: Daniel Migowski (#3)
AW: Adding column "mem_usage" to view pg_prepared_statements

Hello,

Will my patch be considered for 12.0? The calculation of the mem_usage value might be improved later on but because the system catalog is changed I would love to add it before 12.0 becomes stable.

Regards,
Daniel Migowski

#5 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Daniel Migowski (#4)
Re: AW: Adding column "mem_usage" to view pg_prepared_statements

Daniel Migowski <dmigowski@ikoffice.de> writes:

Will my patch be considered for 12.0? The calculation of the mem_usage value might be improved later on but because the system catalog is changed I would love to add it before 12.0 becomes stable.

v12 has been feature-frozen for months, and it's pretty hard to paint
this as a bug fix, so I'd say the answer is "no".

regards, tom lane

#6 Daniel Migowski
dmigowski@ikoffice.de
In reply to: Tom Lane (#5)
AW: AW: Adding column "mem_usage" to view pg_prepared_statements

Ok, I have just read about the commitfest thing. Is the patch OK for that? Or is there generally no love for a mem_usage column here? If there is, I really would add some memory monitoring to our app regarding this.

#7 Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Daniel Migowski (#4)
Re: Adding column "mem_usage" to view pg_prepared_statements

On Tue, Jul 30, 2019 at 10:01:09PM +0000, Daniel Migowski wrote:

Hello,

Will my patch be considered for 12.0? The calculation of the mem_usage
value might be improved later on but because the system catalog is
changed I would love to add it before 12.0 becomes stable.

Nope. The code freeze for PG12 was April 7, 2019. You're a bit late for
that, unfortunately. We're in PG13 land now.

FWIW, not sure what mail client you're using, but it seems to be breaking
the threads for some reason, splitting it into two - see [1].

Also, please stop top-posting. It makes it way harder to follow the
discussion.

[1]: /messages/by-id/41ED3F5450C90F4D8381BC4D8DF6BBDCF02E10B4@EXCHANGESERVER.ikoffice.de

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#8 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Daniel Migowski (#6)
Re: AW: AW: Adding column "mem_usage" to view pg_prepared_statements

Daniel Migowski <dmigowski@ikoffice.de> writes:

Ok, I have just read about the commitfest thing. Is the patch OK for that? Or is there generally no love for a mem_usage column here? If there is, I really would add some memory monitoring to our app regarding this.

You should certainly put it into the next commitfest. We might or
might not accept it, but if it's not listed in the CF we might
not remember to even review it. (The CF app is really a to-do
list for patches ...)

regards, tom lane

#9 Daniel Migowski
dmigowski@ikoffice.de
In reply to: Tomas Vondra (#7)
Re: Adding column "mem_usage" to view pg_prepared_statements

On 31.07.2019 at 00:17, Tomas Vondra wrote:

FWIW not sure what mail client you're using, but it seems to be breaking
the threads for some reason, splitting it into two - see [1].

Also, please stop top-posting. It makes it way harder to follow the
discussion.

I was using Outlook because it's my company's mail client, but I am
switching to Thunderbird now for communication with the list, trying to
be a good citizen.

Regards,
Daniel Migowski

#10 Daniel Migowski
dmigowski@ikoffice.de
In reply to: Tom Lane (#8)
Re: Adding column "mem_usage" to view pg_prepared_statements

On 31.07.2019 at 00:29, Tom Lane wrote:

Daniel Migowski <dmigowski@ikoffice.de> writes:

Ok, I have just read about the commitfest thing. Is the patch OK for that? Or is there generally no love for a mem_usage column here? If there is, I really would add some memory monitoring to our app regarding this.

You should certainly put it into the next commitfest. We might or
might not accept it, but if it's not listed in the CF we might
not remember to even review it. (The CF app is really a to-do
list for patches ...)

OK, added it there. Thanks for your patience :).

Regards,
Daniel Migowski

#11 David Fetter
david@fetter.org
In reply to: Daniel Migowski (#4)
Re: Adding column "mem_usage" to view pg_prepared_statements

On Tue, Jul 30, 2019 at 10:01:09PM +0000, Daniel Migowski wrote:

Hello,

Will my patch be considered for 12.0? The calculation of the
mem_usage value might be improved later on but because the system
catalog is changed I would love to add it before 12.0 becomes
stable.

Feature freeze was April 8, so no.

https://www.mail-archive.com/pgsql-hackers@lists.postgresql.org/msg37059.html

You're in plenty of time for 13, though!

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

#12 Konstantin Knizhnik
k.knizhnik@postgrespro.ru
In reply to: Daniel Migowski (#10)
Re: Adding column "mem_usage" to view pg_prepared_statements

On 31.07.2019 1:38, Daniel Migowski wrote:

On 31.07.2019 at 00:29, Tom Lane wrote:

Daniel Migowski <dmigowski@ikoffice.de> writes:

Ok, I have just read about the commitfest thing. Is the patch OK for
that? Or is there generally no love for a mem_usage column here? If
there is, I really would add some memory monitoring to our app
regarding this.

You should certainly put it into the next commitfest.  We might or
might not accept it, but if it's not listed in the CF we might
not remember to even review it.  (The CF app is really a to-do
list for patches ...)

OK, added it there. Thanks for your patience :).

Regards,
Daniel Migowski

The patch does not apply to the most recent source because an extra
parameter was added to the CreateTemplateTupleDesc method.
Please rebase it - the fix is trivial.

I think that including information about the memory used by a statement
in pg_prepared_statements is very useful.
The CachedPlanMemoryUsage function may be useful not only for this view;
for example, it is also needed in my autoprepare patch.

I wonder if you would consider going further and not only reporting but
also controlling the memory used by prepared statements?
For example, implement some LRU replacement discipline on top of the
prepared statements cache which can
evict rarely used prepared statements to avoid memory overflow.
We have such a patch for PgPro-EE, but it limits only the number of
prepared statements and does not take into account the amount of memory
used by them.
I think that a memory-based limit will be more accurate (although it
adds more overhead).
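The memory-based eviction idea can be sketched in standalone C under stated assumptions: entries sit in a list ordered most-recently-used first, and everything past the point where the running total would exceed the limit is evicted. CacheEntry and enforce_memory_limit are illustrative names, not PgPro or PostgreSQL code:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cache entry; list head = most recently used. */
typedef struct CacheEntry
{
    size_t             mem_usage;
    struct CacheEntry *next;
} CacheEntry;

/* Keep entries from the MRU end while the running total stays within
 * limit; cut the list off at the first entry that would exceed it.
 * Evicted entries are simply unlinked here (a real implementation
 * would free their memory contexts).  Returns the new list head and
 * writes the retained total into *total. */
static CacheEntry *
enforce_memory_limit(CacheEntry *head, size_t limit, size_t *total)
{
    CacheEntry **link = &head;

    *total = 0;
    while (*link != NULL)
    {
        if (*total + (*link)->mem_usage > limit)
        {
            *link = NULL;       /* evict this entry and everything after it */
            break;
        }
        *total += (*link)->mem_usage;
        link = &(*link)->next;
    }
    return head;
}
```

A per-entry mem_usage value, as the proposed CachedPlanMemoryUsage would provide, is exactly what such a policy needs as input.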

If you want, I can be a reviewer of your patch.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#13 Andres Freund
andres@anarazel.de
In reply to: Daniel Migowski (#3)
Re: Adding column "mem_usage" to view pg_prepared_statements

Hi,

On 2019-07-28 06:20:40 +0000, Daniel Migowski wrote:

how do you want to generalize it? Are you thinking about a view solely
for the display of the memory usage of different objects?

I'm not quite sure. I'm just not sure that adding separate
infrastructure for various objects is a sustainable approach. We'd likely
want to have this for prepared statements, for cursors, for the current
statement, for various caches, ...

I think an approach would be to add an 'owning_object' field to memory
contexts, which has to point to a Node type if set. A table-returning
reporting function could recursively walk through the memory contexts,
starting at TopMemoryContext. Whenever it encounters a context with
owning_object set, it prints a string version of nodeTag(owning_object).
For some node types it knows about (e.g. PreparedStatement, Portal,
perhaps some of the caches), it prints additional metadata specific to
the type (so for a prepared statement it'd be something like 'prepared
statement', '[name of prepared statement]'), and prints information
about the associated context and all its children.

The general context information probably should be something like:
context_name, context_ident,
context_total_bytes, context_total_blocks, context_total_freespace, context_total_freechunks, context_total_used, context_total_children
context_self_bytes, context_self_blocks, context_self_freespace, context_self_freechunks, context_self_used, context_self_children,

It might make sense to have said function return a row for the contexts
it encounters that do not have an owner set too (that way we'd e.g. get
CacheMemoryContext handled), but then still recurse.
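A rough standalone-C sketch of that walk, with WalkContext and owner_tag as illustrative stand-ins for a context tree carrying the proposed owning_object field (reduced here to counting the contexts that would produce a row):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical context node: owner_tag plays the role of the proposed
 * owning_object field (NULL means the context has no owner set). */
typedef struct WalkContext
{
    const char         *name;
    const char         *owner_tag;   /* e.g. "PreparedStatement", or NULL */
    size_t              total_bytes;
    struct WalkContext *firstchild;
    struct WalkContext *nextchild;
} WalkContext;

/* Count how many contexts in the tree carry an owner tag, i.e. would
 * produce a row in the reporting function; the recursion continues
 * below owned contexts too, as the mail describes. */
static int
count_owned_contexts(const WalkContext *ctx)
{
    int rows = (ctx->owner_tag != NULL) ? 1 : 0;

    for (const WalkContext *c = ctx->firstchild; c != NULL; c = c->nextchild)
        rows += count_owned_contexts(c);

    return rows;
}
```

A real version would emit the context_total_*/context_self_* columns listed above for each visited context instead of just counting.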

Arguably the proposed owning_object field would be a bit redundant with
the already existing ident/MemoryContextSetIdentifier field, which
e.g. already associates the query string with the contexts used for a
prepared statement. But I'm not convinced that's going to be enough
context in a lot of cases, because e.g. for prepared statements it could
be interesting to have access to both the prepared statement name, and
the statement.

The reason I like something like this is that we wouldn't add new
columns to a number of views, and lack views to associate such
information to for some objects. And it'd be disproportional to add all
the information to numerous places anyway.

One counter-argument is that it'd be more expensive to get information
specific to prepared statements (or other object types) that way. I'm
not sure I buy that that's a problem - this isn't something that's
likely going to be used at a high frequency. But if it becomes a
problem, we can add a function that starts that process at a distinct
memory context (e.g. a function that does this just for a single
prepared statement, identified by name) - but I'd not start there.

While being interesting, I still believe monitoring the memory usage of
prepared statements is more important than that of other objects
because of how they change the memory consumption of the server without
using any DDL or configuration options, and I am not aware of other
objects with the same properties, or are there some? And for the other
volatile objects like tables and indexes and their contents, PostgreSQL
already has its information functions.

Plenty of other objects have that property. E.g. cursors. And for the
catalog/relation/... caches it's even more pernicious - the client might
have closed all its "handles", but we still use memory (and it's
absolutely crucial for performance).

Greetings,

Andres Freund

#14 Daniel Migowski
dmigowski@ikoffice.de
In reply to: Konstantin Knizhnik (#12)
Re: Adding column "mem_usage" to view pg_prepared_statements

On 05.08.2019 at 18:30, Konstantin Knizhnik wrote:

On 31.07.2019 1:38, Daniel Migowski wrote:

On 31.07.2019 at 00:29, Tom Lane wrote:

Daniel Migowski <dmigowski@ikoffice.de> writes:

Ok, I have just read about the commitfest thing. Is the patch OK for
that? Or is there generally no love for a mem_usage column here? If
there is, I really would add some memory monitoring to our app
regarding this.

You should certainly put it into the next commitfest.  We might or
might not accept it, but if it's not listed in the CF we might
not remember to even review it.  (The CF app is really a to-do
list for patches ...)

OK, added it there. Thanks for your patience :).

The patch does not apply to the most recent source because an extra
parameter was added to the CreateTemplateTupleDesc method.
Please rebase it - the fix is trivial.

OK, will do.

I think that including information about the memory used by a statement
in pg_prepared_statements is very useful.
The CachedPlanMemoryUsage function may be useful not only for this
view; for example, it is also needed in my autoprepare patch.

I would love to use your work once it's done, and would also love to
work together here. I am quite a novice in C though, so it might take
me some time to get things right.

I wonder if you would consider going further and not only reporting but
also controlling the memory used by prepared statements?
For example, implement some LRU replacement discipline on top of the
prepared statements cache which can
evict rarely used prepared statements to avoid memory overflow.

THIS! Having some kind of safety net here would finally make sure that
my precious processes will not grow endlessly until all memory is eaten
up, even with prepared statement count limits.

While working on this I noticed there are three things stored in a
CachedPlanSource: the raw query tree (a relatively small thing), the
query list (the analyzed-and-rewritten query tree), which takes up the
most memory (at least here; it may be different with your use cases),
and (often after the 6th call) the CachedPlan, which is more optimized
than the query list and often needs less memory (half of the query list
here).

The query list seems to take the most time to create here, because I
hit the GEQO engine, but it could be recreated easily (up to 500ms for
some queries). Creating the CachedPlan afterwards takes 60ms in some
use cases. If we could just invalidate them from time to time, that
would be great.

Also, invalidating just the queries or the CachedPlan would not
invalidate the whole prepared statement, which would break clients'
expectations; it would just make them a bit slower, adding much to the
stability of the system. I would pay that price, because I don't use
manually named prepared statements anyway and just autogenerate them as
performance sugar without thinking about what really needs to be
prepared. There is an option in the JDBC driver to use prepared
statements automatically after you have used them a few times.
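That distinction - freeing the recreatable parts while keeping the statement entry valid - can be sketched like this; StmtEntry and shrink_entry are hypothetical names, not PostgreSQL code:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cached-statement entry: the raw tree is small and always
 * kept, while the expensive parts (query list / generic plan) can be
 * rebuilt on demand and are therefore candidates for shrinking. */
typedef struct StmtEntry
{
    size_t raw_bytes;    /* raw parse tree: small, always kept */
    size_t plan_bytes;   /* query list / generic plan: large, recreatable */
    int    has_plan;
} StmtEntry;

/* Free the recreatable parts under memory pressure and report how many
 * bytes that released.  The entry itself survives, so the client's
 * prepared statement stays valid, just slower on its next execution. */
static size_t
shrink_entry(StmtEntry *e)
{
    size_t freed = 0;

    if (e->has_plan)
    {
        freed = e->plan_bytes;   /* stands in for deleting the plan's context */
        e->plan_bytes = 0;
        e->has_plan = 0;
    }
    return freed;
}
```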

We have such a patch for PgPro-EE, but it limits only the number of
prepared statements and does not take into account the amount of
memory used by them.
I think that a memory-based limit will be more accurate (although it
adds more overhead).

Limiting them by number is already done automatically here and would
really not be of much value, but having a memory limit would be great.
We could have a combined memory limit for your autoprepared statements
as well as the manually prepared ones, so clients can know for sure the
server processes won't eat up more than e.g. 800MB for prepared
statements. I would also like to have this value spread across all
client processes, e.g. specifying max_prepared_statement_total_mem=5G
for the server, and maybe max_prepared_statement_mem=1G for client
processes. Of course we would have to implement cross-process
invalidation here, and I don't know if communicating client processes
are even intended.

Anyway, a memory limit won't really add that much more overhead. At
least not more than having no prepared statements at all because of the
fear of server OOMs, or having just a small count of those statements. I
was even thinking about a prepared statement reaper that checks the
pg_prepared_statements every few minutes to clean things up manually,
but having this in the server would be of great value to me.

If you want, I can be reviewer of your patch.

I'd love to have you as my reviewer.

Regards,
Daniel Migowski

#15Daniel Migowski
dmigowski@ikoffice.de
In reply to: Andres Freund (#13)
Re: Adding column "mem_usage" to view pg_prepared_statements

Am 05.08.2019 um 19:16 schrieb Andres Freund:

On 2019-07-28 06:20:40 +0000, Daniel Migowski wrote:

how do you want to generalize it? Are you thinking about a view solely
for the display of the memory usage of different objects?

I'm not quite sure. I'm just not sure that adding separate
infrastructure for various objects is a sustainable approach. We'd likely
want to have this for prepared statements, for cursors, for the current
statement, for various caches, ...

I think an approach would be to add an 'owning_object' field to memory
contexts, which has to point to a Node type if set. A table-returning reporting
function could recursively walk through the memory contexts, starting at
TopMemoryContext. Whenever it encounters a context with owning_object
set, it prints a string version of nodeTag(owning_object). For some node
types it knows about (e.g. PreparedStatement, Portal, perhaps some of
the caches), it prints additional metadata specific to the type (so for
prepared statement it'd be something like 'prepared statement', '[name
of prepared statement]'), and prints information about the associated
context and all its children.

I understand. So it would be something like the output of
MemoryContextStatsInternal, but in table form with some extra columns. I
would have loved this extra information already in
MemoryContextStatsInternal btw., so it might be a good idea to upgrade
it first to find the information and wrap a table function over it
afterwards.

The general context information probably should be something like:
context_name, context_ident,
context_total_bytes, context_total_blocks, context_total_freespace, context_total_freechunks, context_total_used, context_total_children
context_self_bytes, context_self_blocks, context_self_freespace, context_self_freechunks, context_self_used, context_self_children,

It might make sense to have said function return a row for the contexts
it encounters that do not have an owner set too (that way we'd e.g. get
CacheMemoryContext handled), but then still recurse.

A nice way to learn about the internals of the server and to analyze the
effects of memory reducing enhancements.

Arguably the proposed owning_object field would be a bit redundant with
the already existing ident/MemoryContextSetIdentifier field, which
e.g. already associates the query string with the contexts used for a
prepared statement. But I'm not convinced that's going to be enough
context in a lot of cases, because e.g. for prepared statements it could
be interesting to have access to both the prepared statement name, and
the statement.

The identifier seems to be more like a category at the moment, because
it does not seem to hold any relevant information about the object in
question. So a more specific name would be nice.

The reason I like something like this is that we wouldn't add new
columns to a number of views, and lack views to associate such
information to for some objects. And it'd be disproportional to add all
the information to numerous places anyway.

I understand your argumentation, but things like Cursors and Portals are
rather short living while prepared statements seem to be the place where
memory really builds up.

One counter-argument is that it'd be more expensive to get information
specific to prepared statements (or other object types) that way. I'm
not sure I buy that that's a problem - this isn't something that's
likely going to be used at a high frequency. But if it becomes a
problem, we can add a function that starts that process at a distinct
memory context (e.g. a function that does this just for a single
prepared statement, identified by name) - but I'd not start there.

I also see no problem here, and with Konstantin Knizhnik's autoprepare I
wouldn't use this very often anyway, more just for monitoring purposes,
where I don't care if my query is a bit more complex.

While being interesting I still believe monitoring the mem usage of
prepared statements is a bit more important than that of other objects
because of how they change memory consumption of the server without
using any DDL or configuration options and I am not aware of other
objects with the same properties, or are there some? And for the other
volatile objects like tables and indexes and their contents PostgreSQL
already has its information functions.

Plenty other objects have that property. E.g. cursors. And for the
catalog/relation/... caches it's even more pernicious - the client might
have closed all its "handles", but we still use memory (and it's
absolutely crucial for performance).

Maybe we can do both? Add a single column to pg_prepared_statements, and
add another table for the output of MemoryContextStatsDetail? This has
the advantage that the single real memory indicator useful for end users
(to the question: How much mem takes my sh*t up?) is in
pg_prepared_statements and some more intrinsic information in a detail
view.

Thinking about the latter I am against such a table, at least in the
form where it gives information like context_total_freechunks, because
it would just be useful for us developers. Why should any end user care
for how many chunks are still open in a MemoryContext, except when he is
working on C-style extensions. Could just be a source of confusion for
them.

Let's think about the goal this should have: The end user should be able
to monitor the memory consumption of things he's in control of or that
could affect system performance. Should such a table automatically
aggregate some information? I think so. I would not add more than two
memory columns to the view, just mem_used and mem_reserved. And even
mem_used is questionable, because in his eyes only the memory he cannot
use for other stuff because of object x is important for him (that was
the reason I just added one column). He would even ask: WHY is there 50%
more memory reserved than used, and how can I optimize it? (Would lead
to more curious PostgreSQL developers maybe, so that's maybe a plus).

Something that also clearly speaks FOR such a table and against my
proposal is, that if someone cares for memory, he would most likely care
for ALL his memory, and in that case monitoring prepared statements
would just be a small subset of stuff to monitor. Ok, I am defeated and
will rewrite my patch if the next proposal finds approval:

I would propose a table pg_mem_usage containing the columns
object_class, name, detail, mem_usage (rename them if it fits the style
of the other tables more). The name would be empty for some objects like
the unnamed prepared statement, the query strings would be in the detail
column. One could add a final "Other" row containing the memory that no
specific output line has accounted for. Also it could contain lines for
Cursors and other stuff I am too novice to think of here.

And last: A reason why still we need a child-parent-relationship in this
table (and distinct this_ and total_ mem functions), is that prepared
statements start up to use much more memory when the Generic Plan is
stored in it after a few uses. As a user I always had the assumption
that prepared a statement would already do all the required work to be
fast, but a statement just becomes blazingly fast when the Generic Plan
is available (and used), and it would be nice to see for which
statements that plan has already been generated and is consuming memory. I
believe the reason for this would be the fear of excessive memory usage.

On the other hand: The Generic Plan had been created for the first
invocation of the prepared statement, why not store it immediately. It is
a named statement for a reason that it is intended to be reused, even
when it is just twice, and since memory seems not to be seen as a scarce
resource in this context, why not store it right away. That would drop the
need for a hierarchy here also.

Any comments?

Regards,
Daniel Migowski

#16Andres Freund
andres@anarazel.de
In reply to: Daniel Migowski (#15)
Re: Adding column "mem_usage" to view pg_prepared_statements

Hi,

On 2019-08-05 22:46:47 +0200, Daniel Migowski wrote:

Arguably the proposed owning_object field would be a bit redundant with
the already existing ident/MemoryContextSetIdentifier field, which
e.g. already associates the query string with the contexts used for a
prepared statement. But I'm not convinced that's going to be enough
context in a lot of cases, because e.g. for prepared statements it could
be interesting to have access to both the prepared statement name, and
the statement.

The identifier seems to be more like a category at the moment, because it
does not seem to hold any relevant information about the object in question.
So a more specific name would be nice.

I think you might be thinking of the context's name, not ident? E.g. for
prepared statements the context name is:

source_context = AllocSetContextCreate(CurrentMemoryContext,
"CachedPlanSource",
ALLOCSET_START_SMALL_SIZES);

which is obviously the same for every statement. But then there's

MemoryContextSetIdentifier(source_context, plansource->query_string);

which obviously differs.

The reason I like something like this is that we wouldn't add new
columns to a number of views, and lack views to associate such
information to for some objects. And it'd be disproportional to add all
the information to numerous places anyway.

I understand your argumentation, but things like Cursors and Portals are
rather short living while prepared statements seem to be the place where
memory really builds up.

That's not necessarily true, especially given WITH HOLD cursors. Nor
does one only run out of memory in the context of long-lived objects.

While being interesting I still believe monitoring the mem usage of
prepared statements is a bit more important than that of other objects
because of how they change memory consumption of the server without
using any DDL or configuration options and I am not aware of other
objects with the same properties, or are there some? And for the other
volatile objects like tables and indexes and their contents PostgreSQL
already has its information functions.

Plenty other objects have that property. E.g. cursors. And for the
catalog/relation/... caches it's even more pernicious - the client might
have closed all its "handles", but we still use memory (and it's
absolutely crucial for performance).

Maybe we can do both? Add a single column to pg_prepared_statements, and add
another table for the output of MemoryContextStatsDetail? This has the
advantage that the single real memory indicator useful for end users (to the
question: How much mem takes my sh*t up?) is in pg_prepared_statements and
some more intrinsic information in a detail view.

I don't see why we'd want to do both. Just makes pg_prepared_statements
considerably more expensive. And that's used by some applications /
clients in an automated manner.

Thinking about the latter I am against such a table, at least in the form
where it gives information like context_total_freechunks, because it would
just be useful for us developers.

Developers are also an audience for us. I mean we certainly can use this
information during development. But even for bug reports such information
would be useful.

Why should any end user care for how many
chunks are still open in a MemoryContext, except when he is working on
C-style extensions? It could just be a source of confusion for them.

Meh. As long as the crucial stuff is first, that's imo enough.

Let's think about the goal this should have: The end user should be able to
monitor the memory consumption of things he's in control of or could affect
the system performance. Should such a table automatically aggregate some
information? I think so. I would not add more than two memory columns to the
view, just mem_used and mem_reserved. And even mem_used is questionable,
because in his eyes only the memory he cannot use for other stuff because of
object x is important for him (that was the reason I just added one column).
He would even ask: WHY is there 50% more memory reserved than used, and how
can I optimize it? (Would lead to more curious PostgreSQL developers maybe,
so that's maybe a plus).

It's important because it influences how memory usage will grow.

On the other hand: The Generic Plan had been created for the first
invocation of the prepared statement, why not store it immediately. It is a
named statement for a reason that it is intended to be reused, even when it
is just twice, and since memory seems not to be seen as a scarce resource in
this context, why not store it right away. That would drop the need for a
hierarchy here also.

Well, we'll maybe never use it, so ...

Greetings,

Andres Freund

#17Konstantin Knizhnik
k.knizhnik@postgrespro.ru
In reply to: Daniel Migowski (#14)
Re: Adding column "mem_usage" to view pg_prepared_statements

On 05.08.2019 22:35, Daniel Migowski wrote:


I think that including in pg_prepared_statements information about the
memory used by this statement is very useful.
The CachedPlanMemoryUsage function may be useful not only for this view;
for example, it is also needed in my autoprepare patch.

I would love to use your work if it's done, and would also love to
work together here. I am quite a novice in C though, so I might take my
time to get things right.

Right now I reused your implementation of the CachedPlanMemoryUsage function :)
Before, I took into account only memory used by plan->context, but not
plan->query_context and plan->gplan->context (although the query_context
for the raw parse tree seems to be much smaller).

I wonder if you would consider going further and not only reporting but
controlling the memory used by prepared statements?
For example, implementing some LRU replacement discipline on top of the
prepared statements cache which can
evict rarely used prepared statements to avoid memory overflow.

THIS! Having some kind of safety net here would finally make sure that
my precious processes will not grow endlessly until all mem is eaten
up, even with prep statement count limits.

While working on stuff I noticed there are three things stored in a
CachedPlanSource. The raw query tree (a relatively small thing), the
query list (analyzed-and-rewritten query tree) which takes up the most
memory (at least here, maybe different with your usecases), and (often
after the 6th call) the CachedPlan, which is more optimized than the
query list and often needs less memory (half of the query list here).

The query list seems to take the most time to create here, because I
hit the GEQO engine here, but it could be recreated easily (up to
500ms for some queries). Creating the CachedPlan afterwards takes 60ms
in some usecases. If we could just invalidate them from time to time,
that would be great.

Also, invalidating just the queries or the CachedPlan would not
invalidate the whole prepared statement, which would break clients
expectations, but just make them slower, adding much to the
stability of the system. I would pay that price, because I just don't
use manually named prepared statements anyway and just autogenerate
them as performance sugar without thinking about what really needs to
be prepared anyway. There is an option in the JDBC driver to use
prepared statements automatically after you have used them a few times.

I have noticed that cached plans for implicitly prepared statements in
stored procedures are not shown in pg_prepared_statements view.
It may not be a problem in your case (if you are accessing Postgres
through JDBC and not using prepared statements), but it can cause memory
overflow in applications actively using stored procedures, because
unlike explicitly created prepared statements, it is very difficult to
estimate and control statements implicitly prepared by plpgsql.

I am not sure what will be the best solution in this case. Adding yet
another view for implicitly prepared statements? Or include them in
pg_prepared_statements view?

We have such a patch for PgPro-EE but it limits only the number of prepared
statements, not taking into account the amount of memory used by them.
I think that a memory-based limit will be more accurate (although it
adds more overhead).

Limiting them by number is already done automatically here and would
really not be of much value, but having a mem limit would be great. We
could have a combined memory limit for your autoprepared statements as
well as the manually prepared ones, so clients can know for sure the
server processes won't eat up more than e.g. 800MB for prepared
statements. And also I would like to have this value spread across all
client processes, e.g. specifying max_prepared_statement_total_mem=5G
for the server, and maybe max_prepared_statement_mem=1G for client
processes. Of course we would have to implement cross client process
invalidation here, and I don't know if communicating client processes
are even intended.

Anyway, a memory limit won't really add that much more overhead. At
least not more than having no prepared statements at all because of
the fear of server OOMs, or having just a small count of those
statements. I was even thinking about a prepared statement reaper that
checks the pg_prepared_statements every few minutes to clean things up
manually, but having this in the server would be of great value to me.

Right now a memory context has no field containing the amount of currently
used memory.
This is why the context->methods->stats function implementation has to
traverse all blocks to calculate the size of memory used by a context.
It may not be so fast for large contexts. But I do not expect that
contexts of prepared statements will be very large, although
I have dealt with customers who issued queries with query text length
larger than a few megabytes. I'm afraid to estimate the size of the plan
for such queries.
This is the reason for my concern that calculating memory context size
may have a negative effect on performance. But it has to be done only once,
when the statement is prepared. So maybe it is not a problem at all.

--

Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#18Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Andres Freund (#16)
Re: Adding column "mem_usage" to view pg_prepared_statements

Hello Daniel,

This patch no longer applies. Please submit an updated version. Also,
there's voluminous discussion that doesn't seem to have resulted in any
revision of the code. Please fix that too.

Thanks,

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services