Enhancing Memory Context Statistics Reporting
Hi,
PostgreSQL provides the following capabilities for reporting memory context
statistics.
1. pg_get_backend_memory_contexts(); [1]
2. pg_log_backend_memory_contexts(pid); [2]
[1] provides a view of memory context statistics for a local backend,
while [2] prints the memory context statistics of any backend or auxiliary
process to the PostgreSQL logs. Although [1] offers detailed statistics,
it is limited to the local backend, restricting its use to PostgreSQL
client backends only.
On the other hand, [2] provides the statistics for all backends but logs
them in a file, which may not be convenient for quick access.
I propose enhancing memory context statistics reporting by combining these
capabilities and offering a view of memory statistics for all PostgreSQL
backends and auxiliary processes.
Attached is a patch that implements this functionality. It introduces a SQL
function that takes the PID of a backend as an argument, returning a set of
records, each containing statistics for a single memory context. The
underlying C function sends a signal to the backend and waits for it to
publish its memory context statistics before returning them to the user.
The publishing backend copies these statistics during the next
CHECK_FOR_INTERRUPTS call.
This approach facilitates on-demand publication of memory statistics
for a specific backend, rather than collecting them at regular intervals.
Since past memory context statistics may no longer be relevant,
there is little value in retaining historical data. Any collected
statistics can be discarded once read by the client backend.
A fixed-size shared memory block, currently accommodating 30 records,
is used to store the statistics. This number was chosen arbitrarily,
as it covers all parent contexts at level 1 (i.e., direct children of the
top memory context) based on my tests. Further experiments are needed to
determine the optimal number for summarizing memory statistics.
Any additional statistics that exceed the shared memory capacity
are written to a per-backend file in PG_TEMP_FILES_DIR. The client
backend first reads from the shared memory and, if necessary, retrieves
the remaining data from the file, combining everything into a unified
view. The files are cleaned up automatically if a backend crashes or
during server restarts.
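The overflow scheme described above can be illustrated with a minimal,
standalone sketch. It is not the patch's code: StatRecord, SharedArea,
publish_stats and read_stats are hypothetical names, a plain struct stands
in for the shared memory segment, and any FILE opened for update stands in
for the per-backend temp file.

```c
#include <assert.h>
#include <stdio.h>

#define SHARED_SLOTS 30			/* the 30-record figure quoted in this mail */

/* Hypothetical, simplified statistics record. */
typedef struct StatRecord
{
	int			context_id;
	long		total_bytes;
} StatRecord;

/* Hypothetical stand-in for the fixed-size shared memory block. */
typedef struct SharedArea
{
	int			in_memory;		/* records stored in slots[] */
	int			total;			/* records published overall */
	StatRecord	slots[SHARED_SLOTS];
} SharedArea;

/*
 * Publisher side: the first SHARED_SLOTS records go to the fixed area,
 * the remainder is appended to the overflow file.  Returns 0 on success,
 * -1 if a record had to spill but could not be written.
 */
int
publish_stats(SharedArea *area, FILE *overflow,
			  const StatRecord *recs, int nrecs)
{
	assert(area != NULL);
	area->in_memory = 0;
	area->total = 0;
	for (int i = 0; i < nrecs; i++)
	{
		if (i < SHARED_SLOTS)
			area->slots[area->in_memory++] = recs[i];
		else if (overflow == NULL ||
				 fwrite(&recs[i], sizeof(StatRecord), 1, overflow) != 1)
			return -1;
		area->total++;
	}
	return 0;
}

/*
 * Reader side: copy the in-memory records first, then read the file only
 * if more records were published than fit in memory.  Returns the number
 * of records copied into out[].
 */
int
read_stats(const SharedArea *area, FILE *overflow, StatRecord *out, int max)
{
	int			n = 0;

	for (int i = 0; i < area->in_memory && n < max; i++)
		out[n++] = area->slots[i];
	if (area->total > area->in_memory && overflow != NULL)
	{
		rewind(overflow);
		while (n < max && fread(&out[n], sizeof(StatRecord), 1, overflow) == 1)
			n++;
	}
	return n;
}
```

A summary-only request corresponds to calling publish_stats with at most
SHARED_SLOTS records, in which case the file is never touched.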
The statistics are reported in breadth-first order of the memory context
tree, with parent contexts reported before their children. This provides
a cumulative summary before diving into the details of each child
context's consumption.
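As a rough illustration of that ordering, here is a small standalone sketch
of a breadth-first walk over a tree linked through firstchild/nextchild
pointers, the same link names the patch follows on memory contexts. Node and
bfs_order are hypothetical names used only for this example; the patch itself
keeps the pending contexts in a List rather than an array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a memory context: only the tree links matter. */
typedef struct Node
{
	const char *name;
	struct Node *firstchild;	/* first child, or NULL */
	struct Node *nextchild;		/* next sibling, or NULL */
} Node;

/*
 * Visit the tree rooted at 'root' breadth-first and store the visited
 * nodes in 'out' (capacity 'max').  Returns the number of nodes stored.
 * The output array doubles as the work queue: 'head' indexes the node
 * being expanded, 'count' is where newly found children are appended.
 */
size_t
bfs_order(Node *root, Node **out, size_t max)
{
	size_t		head = 0;
	size_t		count = 0;

	assert(out != NULL);
	if (root == NULL || max == 0)
		return 0;

	out[count++] = root;
	while (head < count)
	{
		Node	   *cur = out[head++];

		/* Queue the children; they are visited after all of cur's siblings. */
		for (Node *c = cur->firstchild; c != NULL && count < max; c = c->nextchild)
			out[count++] = c;
	}
	return count;
}
```

Because every parent is emitted before any of its children, truncating the
output after the first few entries still yields the top of the tree, which is
what makes a fixed-size summary meaningful.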
The rationale behind the shared memory chunk is to ensure that the
majority of contexts that are direct children of TopMemoryContext
fit into memory. This allows a client to request a summary of memory
statistics, which can be served from memory without the overhead of
file access unless necessary.
A publishing backend signals waiting client backends using a condition
variable when it has finished writing its statistics to memory.
The client backend checks whether the statistics belong to the requested
backend. If not, it continues waiting on the condition variable, timing
out after 2 minutes. This timeout is an arbitrary choice, and further work
is required to determine a more practical value.
All backends use the same memory space to publish their statistics.
Before publishing, a backend checks whether the previous statistics have
been successfully read by a client, using a shared flag, "in_use".
This flag is set by the publishing backend and cleared by the client
backend once the data is read. If a backend cannot publish because the
shared memory is occupied, it exits the interrupt processing code,
and the client backend times out with a warning.
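The "in_use" handshake can be modelled in a few lines. This is a simplified,
single-process sketch under stated assumptions: SharedSlot, try_publish and
try_consume are hypothetical names, a single int stands in for the statistics,
and the locking and condition-variable signalling that the patch performs
around these steps are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the shared area and its bookkeeping fields. */
typedef struct SharedSlot
{
	bool		in_use;			/* set by publisher, cleared by client */
	int			owner_pid;		/* PID whose statistics are stored */
	int			payload;		/* stands in for the statistics */
} SharedSlot;

/*
 * Publisher: refuse to overwrite statistics that no client has read yet.
 * Returns true if the data was published.
 */
bool
try_publish(SharedSlot *slot, int my_pid, int payload)
{
	if (slot->in_use)
		return false;			/* the requesting client would time out */
	slot->owner_pid = my_pid;
	slot->payload = payload;
	slot->in_use = true;
	return true;
}

/*
 * Client: accept the data only if it belongs to the requested PID, then
 * clear the flag so that another backend can publish.
 */
bool
try_consume(SharedSlot *slot, int wanted_pid, int *payload)
{
	assert(payload != NULL);
	if (!slot->in_use || slot->owner_pid != wanted_pid)
		return false;			/* keep waiting */
	*payload = slot->payload;
	slot->in_use = false;
	return true;
}
```

The sketch also makes one consequence of the design visible: while one
backend's statistics sit unread in the slot, a second publisher is turned
away, so a client asking for that second backend ends up waiting for the
timeout.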
Below is an example query that fetches memory contexts from the backend
with PID 116292. The second argument, get_summary, is false,
indicating a request for statistics of all the contexts.
postgres=# select * FROM pg_get_remote_backend_memory_contexts('116292', false) LIMIT 2;
-[ RECORD 1 ]-+----------------------
name | TopMemoryContext
ident |
type | AllocSet
path | {0}
total_bytes | 97696
total_nblocks | 5
free_bytes | 15376
free_chunks | 11
used_bytes | 82320
pid | 116292
-[ RECORD 2 ]-+----------------------
name | RowDescriptionContext
ident |
type | AllocSet
path | {0,1}
total_bytes | 8192
total_nblocks | 1
free_bytes | 6912
free_chunks | 0
used_bytes | 1280
pid | 116292
TODO:
1. Determine the behaviour when the statistics don't fit in one file.
[1] PostgreSQL: Re: Creating a function for exposing memory usage of backend process
<https://www.postgresql.org/message-id/0a768ae1-1703-59c7-86cc-7068ff5e318c%40oss.nttdata.com>
[2] PostgreSQL: Re: Get memory contexts of an arbitrary backend process
<https://www.postgresql.org/message-id/bea016ad-d1a7-f01d-a7e8-01106a1de77f%40oss.nttdata.com>
Thank you,
Rahila Syed
Attachments:
0001-Function-to-report-memory-context-stats-of-any-backe.patch (application/octet-stream)
From 28a0f55544fe9fc96d53fb04519770b3f318267c Mon Sep 17 00:00:00 2001
From: Rahila Syed <rahilasyed.90@gmail.com>
Date: Sun, 15 Sep 2024 17:56:06 +0530
Subject: [PATCH 1/1] Function to report memory context stats of any backend
This function sends a signal to a backend to publish
statistics of all its memory contexts. Signal handler
sets a flag, which causes the relevant backend to copy its
MemoryContextStats to a fixed shared memory area, as part
of next CHECK_FOR_INTERRUPTS().
If there are more statistics to be shared, it creates
a file and copies remaining stats to that file.
Once it's done, it signals the
client backend using a condition variable. The client backend
which was waiting on that condition variable, then wakes up,
reads the shared memory and returns these values in the form of
set of records, one for each memory context, to the user.
The client backend tries to read the remaining
statistics from the file if it exists. The client backend
is responsible for deleting the file when it finishes
reading and also marking the shared memory area as not in
use, in order to allow the other backends to fill it with
new statistics.
---
src/backend/postmaster/autovacuum.c | 4 +
src/backend/postmaster/checkpointer.c | 4 +
src/backend/postmaster/interrupt.c | 4 +
src/backend/postmaster/pgarch.c | 4 +
src/backend/postmaster/startup.c | 4 +
src/backend/postmaster/walsummarizer.c | 4 +
src/backend/storage/ipc/ipci.c | 2 +
src/backend/storage/ipc/procsignal.c | 3 +
src/backend/tcop/postgres.c | 3 +
.../utils/activity/wait_event_names.txt | 1 +
src/backend/utils/adt/mcxtfuncs.c | 274 ++++++++++++-
src/backend/utils/init/globals.c | 1 +
src/backend/utils/mmgr/mcxt.c | 373 +++++++++++++++++-
src/include/catalog/pg_proc.dat | 10 +
src/include/miscadmin.h | 1 +
src/include/storage/procsignal.h | 1 +
src/include/utils/memutils.h | 60 +++
17 files changed, 741 insertions(+), 12 deletions(-)
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index dc3cf87aba..5d01497ada 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -768,6 +768,10 @@ HandleAutoVacLauncherInterrupts(void)
/* Process sinval catchup interrupts that happened while sleeping */
ProcessCatchupInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index 9087e3f8db..4551ff2183 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -616,6 +616,10 @@ HandleCheckpointerInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/postmaster/interrupt.c b/src/backend/postmaster/interrupt.c
index eedc0980cf..1107ff6d45 100644
--- a/src/backend/postmaster/interrupt.c
+++ b/src/backend/postmaster/interrupt.c
@@ -48,6 +48,10 @@ HandleMainLoopInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 02f91431f5..467a253ccd 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -865,6 +865,10 @@ HandlePgArchInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
if (ConfigReloadPending)
{
char *archiveLib = pstrdup(XLogArchiveLibrary);
diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c
index ef6f98ebcd..17beb8737d 100644
--- a/src/backend/postmaster/startup.c
+++ b/src/backend/postmaster/startup.c
@@ -192,6 +192,10 @@ HandleStartupProcInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
diff --git a/src/backend/postmaster/walsummarizer.c b/src/backend/postmaster/walsummarizer.c
index ee6f1afc9a..b4b56142cd 100644
--- a/src/backend/postmaster/walsummarizer.c
+++ b/src/backend/postmaster/walsummarizer.c
@@ -876,6 +876,10 @@ HandleWalSummarizerInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 10fc18f252..ff4a607fb3 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -53,6 +53,7 @@
#include "storage/spin.h"
#include "utils/guc.h"
#include "utils/injection_point.h"
+#include "utils/memutils.h"
/* GUCs */
int shared_memory_type = DEFAULT_SHARED_MEMORY_TYPE;
@@ -345,6 +346,7 @@ CreateOrAttachShmemStructs(void)
WaitEventCustomShmemInit();
InjectionPointShmemInit();
WaitLSNShmemInit();
+ MemCtxShmemInit();
}
/*
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index 87027f27eb..621726cf03 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -688,6 +688,9 @@ procsignal_sigusr1_handler(SIGNAL_ARGS)
if (CheckProcSignal(PROCSIG_LOG_MEMORY_CONTEXT))
HandleLogMemoryContextInterrupt();
+ if (CheckProcSignal(PROCSIG_GET_MEMORY_CONTEXT))
+ HandleGetMemoryContextInterrupt();
+
if (CheckProcSignal(PROCSIG_PARALLEL_APPLY_MESSAGE))
HandleParallelApplyMessageInterrupt();
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 7f5eada9d4..eb0316442c 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3501,6 +3501,9 @@ ProcessInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
if (ParallelApplyMessagePending)
HandleParallelApplyMessages();
}
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 8efb4044d6..95b1e36303 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -159,6 +159,7 @@ WAL_RECEIVER_EXIT "Waiting for the WAL receiver to exit."
WAL_RECEIVER_WAIT_START "Waiting for startup process to send initial data for streaming replication."
WAL_SUMMARY_READY "Waiting for a new WAL summary to be generated."
XACT_GROUP_UPDATE "Waiting for the group leader to update transaction status at transaction end."
+MEM_CTX_PUBLISH "Waiting for backend to publish memory information."
ABI_compatibility:
diff --git a/src/backend/utils/adt/mcxtfuncs.c b/src/backend/utils/adt/mcxtfuncs.c
index 6a6634e1cd..4c5da91538 100644
--- a/src/backend/utils/adt/mcxtfuncs.c
+++ b/src/backend/utils/adt/mcxtfuncs.c
@@ -17,28 +17,23 @@
#include "funcapi.h"
#include "mb/pg_wchar.h"
+#include "miscadmin.h"
+#include "nodes/pg_list.h"
#include "storage/proc.h"
#include "storage/procarray.h"
#include "utils/array.h"
#include "utils/builtins.h"
#include "utils/hsearch.h"
+#include "utils/memutils.h"
+#include "utils/wait_event_types.h"
+#include "common/file_utils.h"
/* ----------
* The max bytes for showing identifiers of MemoryContext.
* ----------
*/
-#define MEMORY_CONTEXT_IDENT_DISPLAY_SIZE 1024
-/*
- * MemoryContextId
- * Used for storage of transient identifiers for
- * pg_get_backend_memory_contexts.
- */
-typedef struct MemoryContextId
-{
- MemoryContext context;
- int context_id;
-} MemoryContextId;
+struct MemoryContextState *memCtxState = NULL;
/*
* int_list_to_array
@@ -305,3 +300,260 @@ pg_log_backend_memory_contexts(PG_FUNCTION_ARGS)
PG_RETURN_BOOL(true);
}
+
+/*
+ * pg_get_remote_backend_memory_contexts
+ * Signal a backend or an auxiliary process to send its memory contexts.
+ *
+ * On receipt of this signal, a backend or an auxiliary process sets the flag
+ * in the signal handler, which causes the next CHECK_FOR_INTERRUPTS()
+ * or process-specific interrupt handler to copy the memory context statistics
+ * in a shared memory space. The statistics that do not fit in shared
+ * memory area are copied to a file by the backend.
+ *
+ * Wait for the backend to signal the condition variable after writing
+ * statistics to shared memory and, if needed, to a temp file.
+ * Once the condition variable wait returns, check whether the requested
+ * backend's statistics are available to read and display.
+ */
+Datum
+pg_get_remote_backend_memory_contexts(PG_FUNCTION_ARGS)
+{
+ int pid = PG_GETARG_INT32(0);
+ bool get_summary = PG_GETARG_BOOL(1);
+ PGPROC *proc;
+ ProcNumber procNumber = INVALID_PROC_NUMBER;
+ int i;
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ MemoryContextParams *mem_stat = NULL;
+ char tmpfilename[MAXPGPATH];
+ FILE *fp = NULL;
+
+ InitMaterializedSRF(fcinfo, 0);
+
+ /*
+ * See if the process with given pid is a backend or an auxiliary process.
+ */
+ proc = BackendPidGetProc(pid);
+ if (proc == NULL)
+ proc = AuxiliaryPidGetProc(pid);
+
+ /*
+ * BackendPidGetProc() and AuxiliaryPidGetProc() return NULL if the pid
+ * isn't valid; but by the time we reach kill(), a process for which we
+ * get a valid proc here might have terminated on its own. There's no way
+ * to acquire a lock on an arbitrary process to prevent that. But since
+ * this mechanism is usually used to debug a backend or an auxiliary
+ * process running and consuming lots of memory, that it might end on its
+ * own first and its memory contexts are not logged is not a problem.
+ */
+ if (proc == NULL)
+ {
+ /*
+ * This is just a warning so a loop-through-resultset will not abort
+ * if one backend terminated on its own during the run.
+ */
+ ereport(WARNING,
+ (errmsg("PID %d is not a PostgreSQL server process", pid)));
+ return (Datum) 0;
+ }
+
+ procNumber = GetNumberFromPGProc(proc);
+
+ /* Only request the statistics that fit in memory, if get_summary is true. */
+ LWLockAcquire(&memCtxState->lw_lock, LW_EXCLUSIVE);
+ memCtxState->get_summary = get_summary;
+ LWLockRelease(&memCtxState->lw_lock);
+ if (SendProcSignal(pid, PROCSIG_GET_MEMORY_CONTEXT, procNumber) < 0)
+ {
+ ereport(WARNING,
+ (errmsg("could not send signal to process %d: %m", pid)));
+ return (Datum) 0;
+ }
+
+ /*
+ * Wait for a backend to publish stats, indicated when in_use is set true
+ * by the backend
+ */
+ while (1)
+ {
+ LWLockAcquire(&memCtxState->lw_lock, LW_EXCLUSIVE);
+
+ /*
+ * We expect to come out of sleep only when at least one backend has
+ * published some memcontext information.
+ *
+ * Make sure that all the stats have been published and that the
+ * information belongs to the pid we requested; otherwise loop back
+ * and wait for the correct backend to publish the information.
+ */
+ if (memCtxState->in_use == true && memCtxState->proc_id == pid)
+ break;
+ else
+ LWLockRelease(&memCtxState->lw_lock);
+
+ if (ConditionVariableTimedSleep(&memCtxState->memctx_cv, 120000,
+ WAIT_EVENT_MEM_CTX_PUBLISH))
+ {
+ ereport(WARNING,
+ (errmsg("Wait for %d process to publish stats timed out, try again", pid)));
+ return (Datum) 0;
+ }
+ }
+ /* Backend has finished publishing the stats, read them */
+ for (i = 0; i < memCtxState->in_memory_stats; i++)
+ {
+ ArrayType *path_array;
+ int path_length;
+ Datum values[10];
+ bool nulls[10];
+
+ memset(values, 0, sizeof(values));
+ memset(nulls, 0, sizeof(nulls));
+
+ if (strlen(memCtxState->memctx_infos[i].name) != 0)
+ values[0] = CStringGetTextDatum(memCtxState->memctx_infos[i].name);
+ else
+ nulls[0] = true;
+ if (strlen(memCtxState->memctx_infos[i].ident) != 0)
+ values[1] = CStringGetTextDatum(memCtxState->memctx_infos[i].ident);
+ else
+ nulls[1] = true;
+
+ values[2] = CStringGetTextDatum(memCtxState->memctx_infos[i].type);
+ path_length = memCtxState->memctx_infos[i].path_length;
+ path_array = construct_array_builtin(memCtxState->memctx_infos[i].path, path_length, INT4OID);
+ values[3] = PointerGetDatum(path_array);
+ values[4] = Int64GetDatum(memCtxState->memctx_infos[i].totalspace);
+ values[5] = Int64GetDatum(memCtxState->memctx_infos[i].nblocks);
+ values[6] = Int64GetDatum(memCtxState->memctx_infos[i].freespace);
+ values[7] = Int64GetDatum(memCtxState->memctx_infos[i].freechunks);
+ values[8] = Int64GetDatum(memCtxState->memctx_infos[i].totalspace - memCtxState->memctx_infos[i].freespace);
+ values[9] = Int32GetDatum(memCtxState->proc_id);
+
+ tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc, values, nulls);
+ }
+ /* No more stats to read; return */
+ if (memCtxState->total_stats == i)
+ {
+ /*
+ * Clear the in_use flag after we have finished reading the stats, so
+ * another backend can use the shared space
+ */
+ memCtxState->in_use = false;
+ memset(&memCtxState->memctx_infos, 0, 30 * sizeof(MemoryContextInfo));
+ LWLockRelease(&memCtxState->lw_lock);
+ return (Datum) 0;
+ }
+ /* Compute name for temp mem stat file */
+ snprintf(tmpfilename, MAXPGPATH, "%s/%s.memstats.%d",
+ PG_TEMP_FILES_DIR, PG_TEMP_FILE_PREFIX,
+ memCtxState->proc_id);
+ LWLockRelease(&memCtxState->lw_lock);
+ ConditionVariableCancelSleep();
+
+ /* Open file */
+ fp = AllocateFile(tmpfilename, PG_BINARY_R);
+ if (!fp)
+ {
+ ereport(WARNING,
+ (errcode_for_file_access(),
+ errmsg("could not read from the file")));
+ LWLockAcquire(&memCtxState->lw_lock, LW_EXCLUSIVE);
+ memCtxState->in_use = false;
+ memset(&memCtxState->memctx_infos, 0, 30 * sizeof(MemoryContextInfo));
+ LWLockRelease(&memCtxState->lw_lock);
+ return (Datum) 0;
+ }
+ mem_stat = palloc0(sizeof(MemoryContextParams));
+ while (!feof(fp))
+ {
+ int path_length;
+ ArrayType *path_array;
+ Datum values[10];
+ bool nulls[10];
+
+ memset(values, 0, sizeof(values));
+ memset(nulls, 0, sizeof(nulls));
+
+ /* Read stats from file */
+ if (fread(mem_stat, sizeof(MemoryContextParams), 1, fp) != 1)
+ {
+ if (ferror(fp))
+ {
+ elog(WARNING, "File read error");
+ break;
+ }
+ /* EOF reached */
+ break;
+ }
+ path_length = mem_stat->path_length;
+ if (strlen(mem_stat->name) != 0)
+ values[0] = CStringGetTextDatum(mem_stat->name);
+ else
+ nulls[0] = true;
+
+ if (strlen(mem_stat->ident) != 0)
+ values[1] = CStringGetTextDatum(mem_stat->ident);
+ else
+ nulls[1] = true;
+
+ values[2] = CStringGetTextDatum(mem_stat->type);
+
+ path_array = construct_array_builtin(mem_stat->path, path_length, INT4OID);
+ values[3] = PointerGetDatum(path_array);
+ values[4] = Int64GetDatum(mem_stat->totalspace);
+ values[5] = Int64GetDatum(mem_stat->nblocks);
+ values[6] = Int64GetDatum(mem_stat->freespace);
+ values[7] = Int64GetDatum(mem_stat->freechunks);
+ values[8] = Int64GetDatum(mem_stat->totalspace - mem_stat->freespace);
+ LWLockAcquire(&memCtxState->lw_lock, LW_EXCLUSIVE);
+ values[9] = Int32GetDatum(memCtxState->proc_id);
+ LWLockRelease(&memCtxState->lw_lock);
+
+ tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc, values, nulls);
+ }
+
+ /*
+ * Clear the in_use flag after we have finished reading the stats, so
+ * another backend can use the shared space, also reset the contents.
+ */
+ LWLockAcquire(&memCtxState->lw_lock, LW_EXCLUSIVE);
+ memCtxState->in_use = false;
+ memset(&memCtxState->memctx_infos, 0, 30 * sizeof(MemoryContextInfo));
+ LWLockRelease(&memCtxState->lw_lock);
+ pfree(mem_stat);
+ FreeFile(fp);
+ /* Delete the temp file that stores memory stats */
+ unlink(tmpfilename);
+
+ return (Datum) 0;
+}
+
+static Size
+MemCtxShmemSize(void)
+{
+ Size size;
+
+ size = offsetof(MemoryContextState, memctx_infos);
+ size = add_size(size, mul_size(30, sizeof(MemoryContextInfo)));
+ return size;
+}
+
+void
+MemCtxShmemInit(void)
+{
+ bool found;
+
+ memCtxState = (MemoryContextState *) ShmemInitStruct("MemoryContextState",
+ MemCtxShmemSize(),
+ &found);
+ if (!found)
+ {
+ ConditionVariableInit(&memCtxState->memctx_cv);
+ memCtxState->in_use = false;
+ memset(&memCtxState->memctx_infos, 0, 30 * sizeof(MemoryContextInfo));
+ LWLockInitialize(&memCtxState->lw_lock, LWLockNewTrancheId());
+ LWLockRegisterTranche(memCtxState->lw_lock.tranche, "mem_context_stats_reporting");
+ }
+}
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index 03a54451ac..7fc600ff7b 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -42,6 +42,7 @@ volatile sig_atomic_t IdleStatsUpdateTimeoutPending = false;
volatile uint32 InterruptHoldoffCount = 0;
volatile uint32 QueryCancelHoldoffCount = 0;
volatile uint32 CritSectionCount = 0;
+volatile sig_atomic_t PublishMemoryContextPending = false;
int MyProcPid;
pg_time_t MyStartTime;
diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index bde54326c6..f915d8130c 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -21,18 +21,23 @@
#include "postgres.h"
+#include "common/file_utils.h"
#include "mb/pg_wchar.h"
#include "miscadmin.h"
+#include "nodes/pg_list.h"
+#include "storage/fd.h"
+#include "storage/lwlock.h"
+#include "utils/hsearch.h"
#include "utils/memdebug.h"
#include "utils/memutils.h"
#include "utils/memutils_internal.h"
#include "utils/memutils_memorychunk.h"
-
static void BogusFree(void *pointer);
static void *BogusRealloc(void *pointer, Size size, int flags);
static MemoryContext BogusGetChunkContext(void *pointer);
static Size BogusGetChunkSpace(void *pointer);
+static int PublishMemoryContextToFile(MemoryContext context, FILE *fp, List *path, char *clipped_ident);
/*****************************************************************************
* GLOBAL MEMORY *
@@ -166,6 +171,7 @@ static void MemoryContextStatsInternal(MemoryContext context, int level,
static void MemoryContextStatsPrint(MemoryContext context, void *passthru,
const char *stats_string,
bool print_to_stderr);
+static void PublishMemoryContext(MemoryContext context, int64 counter, List *path, char *clipped_ident);
/*
* You should not do memory allocations within a critical section, because
@@ -1276,6 +1282,21 @@ HandleLogMemoryContextInterrupt(void)
/* latch will be set by procsignal_sigusr1_handler */
}
+/*
+ * HandleGetMemoryContextInterrupt
+ * Handle receipt of an interrupt indicating publishing of memory
+ * contexts.
+ *
+ * All the actual work is deferred to ProcessGetMemoryContextInterrupt()
+ */
+void
+HandleGetMemoryContextInterrupt(void)
+{
+ InterruptPending = true;
+ PublishMemoryContextPending = true;
+ /* latch will be set by procsignal_sigusr1_handler */
+}
+
/*
* ProcessLogMemoryContextInterrupt
* Perform logging of memory contexts of this backend process.
@@ -1313,6 +1334,356 @@ ProcessLogMemoryContextInterrupt(void)
MemoryContextStatsDetail(TopMemoryContext, 100, 100, false);
}
+/*
+ * Run by each backend to publish their memory context
+ * statistics. It performs a breadth first search
+ * on the memory context tree, so that the parents
+ * get a chance to report stats before their children.
+ *
+ * Statistics are shared via fixed shared memory which
+ * can hold statistics for 29 contexts. The rest of the
+ * statistics are stored in a file. This file is created
+ * in PG_TEMP_FILES_DIR and deleted by the client after
+ * reading the stats.
+ */
+void
+ProcessGetMemoryContextInterrupt(void)
+{
+ /* Store the memory context details in shared memory */
+
+ List *contexts;
+ FILE *fp = NULL;
+ char tmpfilename[MAXPGPATH];
+
+ HASHCTL ctl;
+ HTAB *context_id_lookup;
+ int context_id = 0;
+ bool found;
+ MemoryContext stat_cxt;
+ bool get_summary = false;
+
+ PublishMemoryContextPending = false;
+
+ /*
+ * Shared memory is not available to be written, return. The waiting
+ * client backend will timeout with a warning.
+ */
+ LWLockAcquire(&memCtxState->lw_lock, LW_EXCLUSIVE);
+ if (memCtxState->in_use)
+ {
+ LWLockRelease(&memCtxState->lw_lock);
+ return;
+ }
+ LWLockRelease(&memCtxState->lw_lock);
+
+ /*
+ * The hash table is used for constructing the "path" column of the
+ * pg_get_remote_backend_memory_contexts view, similar to its local
+ * backend counterpart.
+ */
+
+ /*
+ * Make a new context that will contain the hash table, to ease the
+ * cleanup
+ */
+
+ stat_cxt = AllocSetContextCreate(CurrentMemoryContext,
+ "Memory context statistics",
+ ALLOCSET_DEFAULT_SIZES);
+
+ ctl.keysize = sizeof(MemoryContext);
+ ctl.entrysize = sizeof(MemoryContextId);
+ ctl.hcxt = stat_cxt;
+
+ context_id_lookup = hash_create("pg_get_remote_backend_memory_contexts",
+ 256,
+ &ctl,
+ HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+
+ contexts = list_make1(TopMemoryContext);
+
+ /*
+ * As in OpenTemporaryFileInTablespace, try to make the temp-file
+ * directory, ignoring errors.
+ */
+ (void) MakePGDirectory(PG_TEMP_FILES_DIR);
+
+ LWLockAcquire(&memCtxState->lw_lock, LW_EXCLUSIVE);
+ memCtxState->proc_id = MyProcPid;
+
+ foreach_ptr(MemoryContextData, cur, contexts)
+ {
+ MemoryContextId *entry;
+ List *path = NIL;
+ char clipped_ident[MEMORY_CONTEXT_IDENT_DISPLAY_SIZE];
+
+ entry = (MemoryContextId *) hash_search(context_id_lookup, &cur,
+ HASH_ENTER, &found);
+ entry->context_id = context_id;
+
+ /*
+ * Figure out the transient context_id of this context and each of its
+ * ancestors.
+ */
+ for (MemoryContext cur_context = cur; cur_context != NULL; cur_context = cur_context->parent)
+ {
+ MemoryContextId *cur_entry;
+
+ cur_entry = hash_search(context_id_lookup, &cur_context, HASH_FIND, &found);
+
+ if (!found)
+ {
+ elog(LOG, "hash table corrupted, can't construct path value");
+ break;
+ }
+ path = lcons_int(cur_entry->context_id, path);
+ }
+ /* Trim and copy the identifier if it is not set to NULL */
+ if (cur->ident != NULL)
+ {
+ int idlen = strlen(cur->ident);
+
+ /*
+ * Some identifiers such as SQL query string can be very long,
+ * truncate oversize identifiers.
+ */
+ if (idlen >= MEMORY_CONTEXT_IDENT_DISPLAY_SIZE)
+ idlen = pg_mbcliplen(cur->ident, idlen, MEMORY_CONTEXT_IDENT_DISPLAY_SIZE - 1);
+
+ memcpy(clipped_ident, cur->ident, idlen);
+ clipped_ident[idlen] = '\0';
+ }
+ if (context_id <= 28)
+ {
+ /* Copy statistics to shared memory */
+ PublishMemoryContext(cur, context_id, path, (cur->ident != NULL ? clipped_ident : NULL));
+ }
+ else
+ {
+ if (PublishMemoryContextToFile(cur, fp, path, (cur->ident != NULL ? clipped_ident : NULL)) == -1)
+ break;
+ }
+ /* Append the children of the current context to the main list */
+ for (MemoryContext c = cur->firstchild; c != NULL; c = c->nextchild)
+ contexts = lappend(contexts, c);
+
+ /*
+ * Shared memory is full, release lock and write to file from next
+ * iteration
+ */
+ context_id++;
+ if (context_id == 29)
+ {
+ memCtxState->in_memory_stats = context_id;
+ get_summary = memCtxState->get_summary;
+ LWLockRelease(&memCtxState->lw_lock);
+ /* Construct name for temp file */
+ snprintf(tmpfilename, MAXPGPATH, "%s/%s.memstats.%d",
+ PG_TEMP_FILES_DIR, PG_TEMP_FILE_PREFIX,
+ MyProcPid);
+ /* Open file to copy rest of the stats in the file */
+ fp = AllocateFile(tmpfilename, PG_BINARY_A);
+
+ /*
+ * Only in-memory stats(summary) are requested, so do not write to
+ * file
+ */
+ if (fp == NULL || get_summary)
+ break;
+ }
+ }
+ if (context_id < 29)
+ {
+ memCtxState->in_memory_stats = context_id;
+ LWLockRelease(&memCtxState->lw_lock);
+ }
+
+ /* Delete the hash table memory context */
+ MemoryContextDelete(stat_cxt);
+
+ /*
+ * Signal the waiting client backend after setting the exit condition flag
+ */
+ LWLockAcquire(&memCtxState->lw_lock, LW_EXCLUSIVE);
+ memCtxState->in_use = true;
+ memCtxState->total_stats = context_id;
+ LWLockRelease(&memCtxState->lw_lock);
+ ConditionVariableBroadcast(&memCtxState->memctx_cv);
+
+ /* Release file */
+ if (fp && FreeFile(fp))
+ {
+ ereport(LOG,
+ (errcode_for_file_access(),
+ errmsg("could not free file \"%s\": %m", tmpfilename)));
+ }
+}
+
+static void
+PublishMemoryContext(MemoryContext context, int64 counter, List *path, char *clipped_ident)
+{
+ MemoryContextCounters stat;
+ char *type;
+
+ if (context->name != NULL)
+ {
+ Assert(strlen(context->name) < MEMORY_CONTEXT_IDENT_DISPLAY_SIZE);
+ /* strlcpy always NUL-terminates, unlike strncpy bounded by strlen(src) */
+ strlcpy(memCtxState->memctx_infos[counter].name, context->name, sizeof(memCtxState->memctx_infos[counter].name));
+ }
+ else
+ memCtxState->memctx_infos[counter].name[0] = '\0';
+
+ if (clipped_ident != NULL)
+ {
+ /*
+ * To be consistent with logging output, we label dynahash contexts
+ * with just the hash table name as with MemoryContextStatsPrint().
+ */
+ if (!strncmp(context->name, "dynahash", 8))
+ {
+ /* Overwrites "dynahash" completely, even if the ident is shorter */
+ strlcpy(memCtxState->memctx_infos[counter].name, clipped_ident, sizeof(memCtxState->memctx_infos[counter].name));
+ memCtxState->memctx_infos[counter].ident[0] = '\0';
+ }
+ else
+ strlcpy(memCtxState->memctx_infos[counter].ident, clipped_ident, sizeof(memCtxState->memctx_infos[counter].ident));
+ }
+ else
+ memCtxState->memctx_infos[counter].ident[0] = '\0';
+
+ memCtxState->memctx_infos[counter].path_length = list_length(path);
+ foreach_int(i, path)
+ memCtxState->memctx_infos[counter].path[foreach_current_index(i)] = Int32GetDatum(i);
+
+ /* Examine the context stats */
+ memset(&stat, 0, sizeof(stat));
+ (*context->methods->stats) (context, NULL, NULL, &stat, true);
+
+ switch (context->type)
+ {
+ case T_AllocSetContext:
+ type = "AllocSet";
+ strncpy(memCtxState->memctx_infos[counter].type, type, strlen(type));
+ break;
+ case T_GenerationContext:
+ type = "Generation";
+ strncpy(memCtxState->memctx_infos[counter].type, type, strlen(type));
+ break;
+ case T_SlabContext:
+ type = "Slab";
+ strncpy(memCtxState->memctx_infos[counter].type, type, strlen(type));
+ break;
+ case T_BumpContext:
+ type = "Bump";
+ strncpy(memCtxState->memctx_infos[counter].type, type, strlen(type));
+ break;
+ default:
+ type = "???";
+ strncpy(memCtxState->memctx_infos[counter].type, type, strlen(type));
+ break;
+ }
+ memCtxState->memctx_infos[counter].totalspace = stat.totalspace;
+ memCtxState->memctx_infos[counter].nblocks = stat.nblocks;
+ memCtxState->memctx_infos[counter].freespace = stat.freespace;
+ memCtxState->memctx_infos[counter].freechunks = stat.freechunks;
+}
+
+static int
+PublishMemoryContextToFile(MemoryContext context, FILE *fp, List *path, char *clipped_ident)
+{
+ MemoryContextCounters stat;
+ MemoryContextParams *mem_stat;
+ char *type;
+
+ mem_stat = palloc0(sizeof(MemoryContextParams));
+
+ /*
+ * Assuming the context name will not exceed context identifier display
+ * size.
+ * XXX Reduce the limit for name length to correctly reflect practical
+ * examples.
+ * XXX Add handling similar to clipped_ident if name exceeds the size
+ * limit.
+ */
+ if (context->name != NULL)
+ {
+ Assert(strlen(context->name) < MEMORY_CONTEXT_IDENT_DISPLAY_SIZE);
+ /* strlcpy always NUL-terminates, unlike strncpy bounded by strlen(src) */
+ strlcpy(mem_stat->name, context->name, sizeof(mem_stat->name));
+ }
+ else
+ mem_stat->name[0] = '\0';
+
+ if (clipped_ident != NULL)
+ {
+ /*
+ * To be consistent with logging output, we label dynahash contexts
+ * with just the hash table name as with MemoryContextStatsPrint().
+ */
+ if (!strncmp(context->name, "dynahash", 8))
+ {
+ /* Overwrites "dynahash" completely, even if the ident is shorter */
+ strlcpy(mem_stat->name, clipped_ident, sizeof(mem_stat->name));
+ mem_stat->ident[0] = '\0';
+ }
+ else
+ strlcpy(mem_stat->ident, clipped_ident, sizeof(mem_stat->ident));
+ }
+ else
+ mem_stat->ident[0] = '\0';
+
+ mem_stat->path_length = list_length(path);
+ foreach_int(i, path)
+ mem_stat->path[foreach_current_index(i)] = Int32GetDatum(i);
+
+ /* Examine the context itself */
+ memset(&stat, 0, sizeof(stat));
+ (*context->methods->stats) (context, NULL, NULL, &stat, true);
+
+ switch (context->type)
+ {
+ case T_AllocSetContext:
+ type = "AllocSet";
+ strncpy(mem_stat->type, type, strlen(type));
+ break;
+ case T_GenerationContext:
+ type = "Generation";
+ strncpy(mem_stat->type, type, strlen(type));
+ break;
+ case T_SlabContext:
+ type = "Slab";
+ strncpy(mem_stat->type, type, strlen(type));
+ break;
+ case T_BumpContext:
+ type = "Bump";
+ strncpy(mem_stat->type, type, strlen(type));
+ break;
+ default:
+ type = "???";
+ strncpy(mem_stat->type, type, strlen(type));
+ break;
+ }
+ mem_stat->totalspace = stat.totalspace;
+ mem_stat->nblocks = stat.nblocks;
+ mem_stat->freespace = stat.freespace;
+ mem_stat->freechunks = stat.freechunks;
+
+ if (!fp)
+ {
+ ereport(LOG,
+ (errcode_for_file_access(),
+ errmsg("could not create file")));
+ pfree(mem_stat);
+ return -1;
+ }
+ if (fwrite(mem_stat, sizeof(MemoryContextParams), 1, fp) != 1)
+ {
+ ereport(LOG,
+ (errcode_for_file_access(),
+ errmsg("could not write to file")));
+ pfree(mem_stat);
+ return -1;
+ }
+ pfree(mem_stat);
+
+ return 0;
+}
+
void *
palloc(Size size)
{
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 7c0b74fe05..5d7d0bcbf5 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8429,6 +8429,16 @@
prorettype => 'bool', proargtypes => 'int4',
prosrc => 'pg_log_backend_memory_contexts' },
+# publishing memory contexts of the specified backend
+{ oid => '2173', descr => 'publish memory contexts of the specified backend',
+ proname => 'pg_get_remote_backend_memory_contexts', provolatile => 'v',
+ prorows => '100', proretset => 't', proparallel => 'r',
+ prorettype => 'record', proargtypes => 'int4 bool',
+ proallargtypes => '{int4,bool,text,text,text,_int4,int4,int4,int4,int4,int4,int4}',
+ proargmodes => '{i,i,o,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{oid, summary, name, ident, type, path, total_bytes, total_nblocks, free_bytes, free_chunks, used_bytes, pid}',
+ prosrc => 'pg_get_remote_backend_memory_contexts' },
+
# non-persistent series generator
{ oid => '1066', descr => 'non-persistent series generator',
proname => 'generate_series', prorows => '1000',
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index e26d108a47..da07f99d7d 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -96,6 +96,7 @@ extern PGDLLIMPORT volatile sig_atomic_t IdleSessionTimeoutPending;
extern PGDLLIMPORT volatile sig_atomic_t ProcSignalBarrierPending;
extern PGDLLIMPORT volatile sig_atomic_t LogMemoryContextPending;
extern PGDLLIMPORT volatile sig_atomic_t IdleStatsUpdateTimeoutPending;
+extern PGDLLIMPORT volatile sig_atomic_t PublishMemoryContextPending;
extern PGDLLIMPORT volatile sig_atomic_t CheckClientConnectionPending;
extern PGDLLIMPORT volatile sig_atomic_t ClientConnectionLost;
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index 221073def3..8cbf6e201c 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -35,6 +35,7 @@ typedef enum
PROCSIG_WALSND_INIT_STOPPING, /* ask walsenders to prepare for shutdown */
PROCSIG_BARRIER, /* global barrier interrupt */
PROCSIG_LOG_MEMORY_CONTEXT, /* ask backend to log the memory contexts */
+ PROCSIG_GET_MEMORY_CONTEXT, /* ask backend to publish the memory contexts */
PROCSIG_PARALLEL_APPLY_MESSAGE, /* Message from parallel apply workers */
/* Recovery conflict reasons */
diff --git a/src/include/utils/memutils.h b/src/include/utils/memutils.h
index cd9596ff21..6d05465253 100644
--- a/src/include/utils/memutils.h
+++ b/src/include/utils/memutils.h
@@ -18,6 +18,8 @@
#define MEMUTILS_H
#include "nodes/memnodes.h"
+#include "storage/condition_variable.h"
+#include "storage/lmgr.h"
/*
@@ -48,6 +50,8 @@
#define AllocHugeSizeIsValid(size) ((Size) (size) <= MaxAllocHugeSize)
+#define MEMORY_CONTEXT_IDENT_DISPLAY_SIZE 1024
+
/*
* Standard top-level memory contexts.
@@ -115,6 +119,62 @@ extern MemoryContext AllocSetContextCreateInternal(MemoryContext parent,
Size initBlockSize,
Size maxBlockSize);
+/* Shared memory state for Memory Context Statistics reporting */
+typedef struct MemoryContextInfo
+{
+ char name[MEMORY_CONTEXT_IDENT_DISPLAY_SIZE];
+ char ident[MEMORY_CONTEXT_IDENT_DISPLAY_SIZE];
+ Datum path[128];
+ char type[128];
+ int path_length;
+ int64 totalspace;
+ int64 nblocks;
+ int64 freespace;
+ int64 freechunks;
+} MemoryContextInfo;
+
+typedef struct MemoryContextState
+{
+ ConditionVariable memctx_cv;
+ LWLock lw_lock;
+ bool in_use;
+ int proc_id;
+ int in_memory_stats;
+ int total_stats;
+ bool get_summary;
+ MemoryContextInfo memctx_infos[30];
+} MemoryContextState;
+
+/* Backend local struct used to write statistics to a file */
+typedef struct MemoryContextParams
+{
+ char name[1024];
+ char ident[1024];
+ char type[128];
+ Datum path[128];
+ int path_length;
+ int64 totalspace;
+ int64 nblocks;
+ int64 freespace;
+ int64 freechunks;
+} MemoryContextParams;
+
+/*
+ * MemoryContextId
+ * Used for storage of transient identifiers for
+ * pg_get_backend_memory_contexts.
+ */
+typedef struct MemoryContextId
+{
+ MemoryContext context;
+ int context_id;
+} MemoryContextId;
+
+extern PGDLLIMPORT MemoryContextState * memCtxState;
+extern void ProcessGetMemoryContextInterrupt(void);
+extern void HandleGetMemoryContextInterrupt(void);
+extern void MemCtxShmemInit(void);
+
/*
* This wrapper macro exists to check for non-constant strings used as context
* names; that's no longer supported. (Use MemoryContextSetIdentifier if you
--
2.34.1
On Mon, Oct 21, 2024 at 11:54:21PM +0530, Rahila Syed wrote:
On the other hand, [2] provides the statistics for all backends but logs
them in a file, which may not be convenient for quick access.
To be precise, pg_log_backend_memory_contexts() pushes the memory
context stats to LOG_SERVER_ONLY or stderr, hence this is appended to
the server logs.
A fixed-size shared memory block, currently accommodating 30 records,
is used to store the statistics. This number was chosen arbitrarily,
as it covers all parent contexts at level 1 (i.e., direct children of the
top memory context)
based on my tests.
Further experiments are needed to determine the optimal number
for summarizing memory statistics.
+ * Statistics are shared via fixed shared memory which
+ * can hold statistics for 29 contexts. The rest of the
[...]
+ MemoryContextInfo memctx_infos[30];
[...]
+ memset(&memCtxState->memctx_infos, 0, 30 * sizeof(MemoryContextInfo));
[...]
+ size = add_size(size, mul_size(30, sizeof(MemoryContextInfo)));
[...]
+ memset(&memCtxState->memctx_infos, 0, 30 * sizeof(MemoryContextInfo));
[...]
+ memset(&memCtxState->memctx_infos, 0, 30 * sizeof(MemoryContextInfo));
This number is tied to MemoryContextState added by the patch. Sounds
like this would be better as a constant properly defined rather than
hardcoded in all these places. This would make the upper-bound more
easily switchable in the patch.
+ Datum path[128];
+ char type[128];
[...]
+ char name[1024];
+ char ident[1024];
+ char type[128];
+ Datum path[128];
Again, constants. Why these values? You may want to use more
#defines here.
Any additional statistics that exceed the shared memory capacity
are written to a file per backend in the PG_TEMP_FILES_DIR. The client
backend
first reads from the shared memory, and if necessary, retrieves the
remaining data from the file,
combining everything into a unified view. The files are cleaned up
automatically
if a backend crashes or during server restarts.
Is the addition of the file to write any remaining stats really that
useful? This comes with a heavy cost in the patch with the "in_use"
flag, the various tweaks around the LWLock release/acquire protecting
the shmem area and the extra cleanup steps required after even a clean
restart. That's a lot of facility for this kind of information.
Another thing that may be worth considering is to put this information
in a DSM per the variable-size nature of the information, perhaps cap
it to a max to make the memory footprint cheaper, and avoid all
on-disk footprint because we don't need it to begin with as this is
information that makes sense only while the server is running.
Also, why the single-backend limitation? One could imagine a shared
memory area indexed similarly to pgproc entries, that includes
auxiliary processes as much as backends, so as it can be possible to
get more memory footprints through SQL for more than one single
process at one moment in time. If each backend has its own area of
shmem to deal with, they could use a shared LWLock on the shmem area
with an extra spinlock while the context data is dumped into memory as
the copy is short-lived. Each one of them could save the information
in a DSM created only when a dump of the shmem is requested for a
given PID, for example.
--
Michael
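The per-process layout suggested above can be sketched in plain C, as an illustration only. In this sketch malloc stands in for a DSM segment created on demand, the array index stands in for a pgproc-style process number, and all names (ProcStatSlot, slot_publish, slot_consume, SKETCH_MAX_ENTRIES) are hypothetical. The LWLock and spinlock mentioned above are omitted because the sketch is single-threaded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_MAX_PROCS 8      /* stand-in for backends plus auxiliary processes */
#define SKETCH_MAX_ENTRIES 100  /* hypothetical cap on published contexts */

typedef struct StatEntry
{
    int64_t     totalspace;
    int64_t     freespace;
} StatEntry;

/* One slot per process; buf stands in for a DSM segment created on demand. */
typedef struct ProcStatSlot
{
    int         pid;            /* owner of the published data, 0 if empty */
    int         nentries;       /* number of entries stored in buf */
    bool        truncated;      /* true if the cap cut the dump short */
    StatEntry  *buf;            /* NULL until a dump is requested */
} ProcStatSlot;

static ProcStatSlot slots[SKETCH_MAX_PROCS];

/* Publish at most SKETCH_MAX_ENTRIES entries into the slot of procno. */
static bool
slot_publish(int procno, int pid, const StatEntry *stats, int nstats)
{
    ProcStatSlot *slot;
    int         n;

    assert(procno >= 0 && procno < SKETCH_MAX_PROCS);
    assert(stats != NULL && nstats > 0);
    slot = &slots[procno];
    n = nstats < SKETCH_MAX_ENTRIES ? nstats : SKETCH_MAX_ENTRIES;

    /* Drop any previous dump of this process before allocating a new one. */
    free(slot->buf);
    slot->pid = 0;
    slot->nentries = 0;
    slot->buf = malloc(sizeof(StatEntry) * (size_t) n);
    if (slot->buf == NULL)
        return false;

    memcpy(slot->buf, stats, sizeof(StatEntry) * (size_t) n);
    slot->pid = pid;
    slot->nentries = n;
    slot->truncated = (nstats > n);
    return true;
}

/* Copy a dump out and release it; returns the entry count, or -1 if absent. */
static int
slot_consume(int procno, int pid, StatEntry *out, int outlen)
{
    ProcStatSlot *slot;
    int         n;

    assert(procno >= 0 && procno < SKETCH_MAX_PROCS);
    slot = &slots[procno];
    if (slot->buf == NULL || slot->pid != pid)
        return -1;

    n = slot->nentries < outlen ? slot->nentries : outlen;
    memcpy(out, slot->buf, sizeof(StatEntry) * (size_t) n);
    free(slot->buf);
    slot->buf = NULL;
    slot->pid = 0;
    slot->nentries = 0;
    return n;
}
```

Because every process owns its slot, two dumps can be outstanding at the same time, and nothing is written to disk.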
On 2024-10-22 03:24, Rahila Syed wrote:
Hi,
PostgreSQL provides following capabilities for reporting memory
contexts statistics.
1. pg_get_backend_memory_contexts(); [1]
2. pg_log_backend_memory_contexts(pid); [2]
[1] provides a view of memory context statistics for a local backend,
while [2] prints the memory context statistics of any backend or
auxiliary process to the PostgreSQL logs. Although [1] offers detailed
statistics, it is limited to the local backend, restricting its use to
PostgreSQL client backends only.
On the other hand, [2] provides the statistics for all backends but
logs them in a file, which may not be convenient for quick access.
I propose enhancing memory context statistics reporting by combining
these capabilities and offering a view of memory statistics for all
PostgreSQL backends and auxiliary processes.
Thanks for working on this!
I originally tried to develop something like your proposal in [2], but
there were some difficulties and settled down to implement
pg_log_backend_memory_contexts().
Attached is a patch that implements this functionality. It introduces
a SQL function
that takes the PID of a backend as an argument, returning a set of
records,
each containing statistics for a single memory context. The underlying
C function
sends a signal to the backend and waits for it to publish its memory
context statistics
before returning them to the user. The publishing backend copies
these statistics
during the next CHECK_FOR_INTERRUPTS call.
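The deferred handling described above (the signal handler only sets a flag, and the work happens at the next interrupt check) can be sketched in standalone C as follows. This is an illustration under assumptions, not the patch's code: publish_pending plays the role of PublishMemoryContextPending, check_for_interrupts_sketch stands in for CHECK_FOR_INTERRUPTS, SIGUSR1 is used directly instead of the procsignal machinery, and the actual copying of statistics is replaced by a counter.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <string.h>

/* Set from the signal handler, consumed at the next interrupt check. */
static volatile sig_atomic_t publish_pending = 0;

/* Counts how many times statistics would have been published. */
static int  publish_count = 0;

/* The handler does the minimum: it records the request and returns. */
static void
handle_publish_signal(int signo)
{
    (void) signo;
    publish_pending = 1;
}

static void
install_publish_handler(void)
{
    struct sigaction sa;
    int         rc;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = handle_publish_signal;
    sigemptyset(&sa.sa_mask);
    rc = sigaction(SIGUSR1, &sa, NULL);
    assert(rc == 0);
    (void) rc;
}

/* Called at safe points; returns true if a publish request was served. */
static bool
check_for_interrupts_sketch(void)
{
    if (!publish_pending)
        return false;
    publish_pending = 0;
    publish_count++;            /* real code would copy the statistics here */
    return true;
}
```

Several signals arriving before the next check collapse into a single publication, which is why the requesting side has to wait for the result rather than assume it is ready immediately.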
I remember that waiting for the memory context stats to be dumped could
cause trouble in some error cases.
For example, if the caller of pg_get_remote_backend_memory_contexts() is
terminated just after the target process has finished dumping stats but
before it reads them, subsequent calls to
pg_get_remote_backend_memory_contexts() no longer get a response:
[session1]$ psql
(40699)=#
$ kill -s SIGSTOP 40699
[session2] psql
(40866)=# select * FROM
pg_get_remote_backend_memory_contexts('40699', false); -- waiting
$ kill -s SIGSTOP 40866
$ kill -s SIGCONT 40699
[session3] psql
(47656)=# select pg_terminate_backend(40866);
$ kill -s SIGCONT 40866 -- session2 terminated
[session3] (47656)=# select * FROM
pg_get_remote_backend_memory_contexts('47656', false); -- no response
It seems the reason is that memCtxState->in_use is now true and
memCtxState->proc_id is 40699.
We can continue to use pg_get_remote_backend_memory_contexts() after
specifying 40699, but that would be hard for users to understand.
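The stuck state can be reproduced with a small model of the handshake. This is a simplified sketch, assuming only what the thread states: the publisher refuses to overwrite unread statistics, and the client clears in_use after reading data that belongs to the requested PID. SharedAreaSketch, try_publish and try_read are hypothetical names, and the condition variable wait and timeout are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the in_use and proc_id fields of the shared state. */
typedef struct SharedAreaSketch
{
    bool        in_use;         /* set by the publisher, cleared by the reader */
    int         proc_id;        /* PID whose statistics are stored */
} SharedAreaSketch;

/* Publisher side: refuse to overwrite statistics that nobody has read yet. */
static bool
try_publish(SharedAreaSketch *area, int my_pid)
{
    assert(area != NULL);
    if (area->in_use)
        return false;
    area->proc_id = my_pid;
    area->in_use = true;
    return true;
}

/* Client side: succeeds only if the area holds data of the requested PID. */
static bool
try_read(SharedAreaSketch *area, int wanted_pid)
{
    assert(area != NULL);
    if (!area->in_use || area->proc_id != wanted_pid)
        return false;
    area->in_use = false;       /* hand the area back to publishers */
    return true;
}
```

In the scenario above, 40699 publishes and the reader is terminated before calling the equivalent of try_read. From then on try_publish fails for every other process, and only a request for 40699 clears the flag.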
This approach facilitates on-demand publication of memory statistics
for a specific backend, rather than collecting them at regular
intervals.
Since past memory context statistics may no longer be relevant,
there is little value in retaining historical data. Any collected
statistics can be discarded once read by the client backend.
A fixed-size shared memory block, currently accommodating 30 records,
is used to store the statistics. This number was chosen arbitrarily,
as it covers all parent contexts at level 1 (i.e., direct children of
the top memory context) based on my tests.
Further experiments are needed to determine the optimal number
for summarizing memory statistics.
Any additional statistics that exceed the shared memory capacity
are written to a file per backend in the PG_TEMP_FILES_DIR. The client
backend first reads from the shared memory, and if necessary, retrieves
the remaining data from the file, combining everything into a unified
view. The files are cleaned up automatically if a backend crashes or
during server restarts.
The statistics are reported in a breadth-first search order of the
memory context tree, with parent contexts reported before their
children. This provides a cumulative summary before diving into the
details of each child context's consumption.
The rationale behind the shared memory chunk is to ensure that the
majority of contexts which are the direct children of TopMemoryContext
fit into memory.
This allows a client to request a summary of memory statistics,
which can be served from memory without the overhead of file access,
unless necessary.
A publishing backend signals waiting client backends using a condition
variable when it has finished writing its statistics to memory.
The client backend checks whether the statistics belong to the
requested backend.
If not, it continues waiting on the condition variable, timing out
after 2 minutes.
This timeout is an arbitrary choice, and further work is required to
determine a more practical value.
All backends use the same memory space to publish their statistics.
Before publishing, a backend checks whether the previous statistics
have been successfully read by a client using a shared flag, "in_use."
This flag is set by the publishing backend and cleared by the client
backend once the data is read. If a backend cannot publish due to
shared memory being occupied, it exits the interrupt processing code,
and the client backend times out with a warning.
Please find below an example query to fetch memory contexts from the
backend with id '106114'. Second argument -'get_summary' is 'false',
indicating a request for statistics of all the contexts.
postgres=# select * FROM pg_get_remote_backend_memory_contexts('116292', false)
LIMIT 2;
-[ RECORD 1 ]-+----------------------
name | TopMemoryContext
ident |
type | AllocSet
path | {0}
total_bytes | 97696
total_nblocks | 5
free_bytes | 15376
free_chunks | 11
used_bytes | 82320
pid | 116292
-[ RECORD 2 ]-+----------------------
name | RowDescriptionContext
ident |
type | AllocSet
path | {0,1}
total_bytes | 8192
total_nblocks | 1
free_bytes | 6912
free_chunks | 0
used_bytes | 1280
pid | 116292
32d3ed8165f821f introduced 1-based path to pg_backend_memory_contexts,
but pg_get_remote_backend_memory_contexts() seems to have 0-base path.
pg_backend_memory_contexts has "level" column, but
pg_get_remote_backend_memory_contexts doesn't.
Are there any reasons for these?
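For reference, the path values in the output above follow from a breadth-first walk that numbers contexts in visit order. The following is a minimal standalone sketch of that numbering, assuming 0-based ids as in the output shown. It is not the patch's code: CtxNode and assign_ids_bfs are hypothetical names, and the path is built by copying the parent's path instead of looking up each ancestor in a hash table.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_NODES 16
#define SKETCH_MAX_DEPTH 8

typedef struct CtxNode
{
    const char *name;
    struct CtxNode *parent;
    struct CtxNode *firstchild;
    struct CtxNode *nextchild;
    int         context_id;     /* transient id, assigned in BFS order */
    int         path[SKETCH_MAX_DEPTH]; /* ids from the root to this node */
    int         path_length;
} CtxNode;

/*
 * Walk the tree breadth-first, using order[] as the queue. Returns the number
 * of nodes visited, or -1 if a node is deeper than SKETCH_MAX_DEPTH.
 */
static int
assign_ids_bfs(CtxNode *root, CtxNode **order, int order_cap)
{
    int         head = 0;
    int         tail = 0;

    assert(root != NULL && order != NULL && order_cap > 0);
    order[tail++] = root;

    while (head < tail)
    {
        CtxNode    *cur = order[head];
        int         plen = (cur->parent != NULL) ? cur->parent->path_length : 0;

        if (plen + 1 > SKETCH_MAX_DEPTH)
            return -1;

        /* A parent is always visited first, so its path is already complete. */
        for (int i = 0; i < plen; i++)
            cur->path[i] = cur->parent->path[i];
        cur->context_id = head;
        cur->path[plen] = cur->context_id;
        cur->path_length = plen + 1;

        /* Children go to the back of the queue, after all current siblings. */
        for (CtxNode *c = cur->firstchild; c != NULL && tail < order_cap; c = c->nextchild)
            order[tail++] = c;
        head++;
    }
    return tail;
}
```

With this numbering the root gets path {0} and its first child {0,1}, matching the two records shown. A 1-based variant would only need to start the ids at 1.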
TODO:
1. Determine the behaviour when the statistics don't fit in one file.
[1] PostgreSQL: Re: Creating a function for exposing memory usage of
backend process [1]
[2] PostgreSQL: Re: Get memory contexts of an arbitrary backend
process [2]
Thank you,
Rahila Syed
Links:
------
[1]
https://www.postgresql.org/message-id/0a768ae1-1703-59c7-86cc-7068ff5e318c%40oss.nttdata.com
[2]
https://www.postgresql.org/message-id/bea016ad-d1a7-f01d-a7e8-01106a1de77f%40oss.nttdata.com
--
Regards,
--
Atsushi Torikoshi
Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K.
Hi Michael,
Thank you for the review.
On Tue, Oct 22, 2024 at 12:18 PM Michael Paquier <michael@paquier.xyz>
wrote:
On Mon, Oct 21, 2024 at 11:54:21PM +0530, Rahila Syed wrote:
On the other hand, [2] provides the statistics for all backends but logs
them in a file, which may not be convenient for quick access.
To be precise, pg_log_backend_memory_contexts() pushes the memory
context stats to LOG_SERVER_ONLY or stderr, hence this is appended to
the server logs.
A fixed-size shared memory block, currently accommodating 30 records,
is used to store the statistics. This number was chosen arbitrarily,
as it covers all parent contexts at level 1 (i.e., direct children of the
top memory context)
based on my tests.
Further experiments are needed to determine the optimal number
for summarizing memory statistics.
+ * Statistics are shared via fixed shared memory which
+ * can hold statistics for 29 contexts. The rest of the
[...]
+ MemoryContextInfo memctx_infos[30];
[...]
+ memset(&memCtxState->memctx_infos, 0, 30 * sizeof(MemoryContextInfo));
[...]
+ size = add_size(size, mul_size(30, sizeof(MemoryContextInfo)));
[...]
+ memset(&memCtxState->memctx_infos, 0, 30 * sizeof(MemoryContextInfo));
[...]
+ memset(&memCtxState->memctx_infos, 0, 30 * sizeof(MemoryContextInfo));
This number is tied to MemoryContextState added by the patch. Sounds
like this would be better as a constant properly defined rather than
hardcoded in all these places. This would make the upper-bound more
easily switchable in the patch.
Makes sense. Fixed in the attached patch.
+ Datum path[128];
+ char type[128];
[...]
+ char name[1024];
+ char ident[1024];
+ char type[128];
+ Datum path[128];
Again, constants. Why these values? You may want to use more
#defines here.
I added the #defines for these in the attached patch.
Size of the path array should match the number of levels in the memory
context tree and type is a constant string.
For the name and ident, I have used the existing #define
MEMORY_CONTEXT_IDENT_DISPLAY_SIZE as the size limit.
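As an illustration of the kind of change discussed here, the fixed sizes could be gathered as named constants and every size computation derived from them. This is only a sketch: the revised patch is not reproduced in this message, so every macro name except MEMORY_CONTEXT_IDENT_DISPLAY_SIZE is hypothetical, the values are the ones hardcoded in the first draft, and int32_t replaces Datum so that the example compiles on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Existing macro from the patch. */
#define MEMORY_CONTEXT_IDENT_DISPLAY_SIZE 1024

/* Hypothetical names for the literals hardcoded in the first draft. */
#define MAX_MEMORY_CONTEXT_STATS 30     /* entries in the shared array */
#define MAX_PATH_DISPLAY_LENGTH 128     /* maximum depth stored in path[] */
#define MAX_TYPE_STRING_LENGTH 128      /* room for "AllocSet", "Generation", ... */

typedef struct MemoryContextInfoSketch
{
    char        name[MEMORY_CONTEXT_IDENT_DISPLAY_SIZE];
    char        ident[MEMORY_CONTEXT_IDENT_DISPLAY_SIZE];
    int32_t     path[MAX_PATH_DISPLAY_LENGTH];  /* Datum in the patch */
    char        type[MAX_TYPE_STRING_LENGTH];
    int         path_length;
    int64_t     totalspace;
    int64_t     nblocks;
    int64_t     freespace;
    int64_t     freechunks;
} MemoryContextInfoSketch;

/* Size of the shared array, derived from the constant, not from a literal. */
static size_t
memctx_infos_size(void)
{
    return MAX_MEMORY_CONTEXT_STATS * sizeof(MemoryContextInfoSketch);
}

/* Reset all slots; infos must point to MAX_MEMORY_CONTEXT_STATS entries. */
static void
memctx_infos_reset(MemoryContextInfoSketch *infos)
{
    assert(infos != NULL);
    memset(infos, 0, memctx_infos_size());
}
```

With this arrangement, changing the upper bound means editing one #define, and the memset and shared memory size calculations follow automatically.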
Any additional statistics that exceed the shared memory capacity
are written to a file per backend in the PG_TEMP_FILES_DIR. The client
backend
first reads from the shared memory, and if necessary, retrieves the
remaining data from the file,
combining everything into a unified view. The files are cleaned up
automatically
if a backend crashes or during server restarts.
Is the addition of the file to write any remaining stats really that
useful? This comes with a heavy cost in the patch with the "in_use"
flag, the various tweaks around the LWLock release/acquire protecting
the shmem area and the extra cleanup steps required after even a clean
restart. That's a lot of facility for this kind of information.
The rationale behind using the file is to cater to the unbounded
number of memory contexts.
The "in_use" flag is used to govern the access to shared memory
as I am reserving enough memory for only one backend.
It ensures that another backend does not overwrite the statistics
in the shared memory, before it is read by a client backend.
Another thing that may be worth considering is to put this information
in a DSM per the variable-size nature of the information, perhaps cap
it to a max to make the memory footprint cheaper, and avoid all
on-disk footprint because we don't need it to begin with as this is
information that makes sense only while the server is running.
Thank you for the suggestion. I will look into using DSMs especially
if there is a way to limit the statistics dump, while still providing a
user
with enough information to debug memory consumption.
In this draft, I preferred using a file over DSMs, as a file can provide
ample space for dumping a large number of memory context statistics
without the risk of DSM creation failure due to insufficient memory.
Also, why the single-backend limitation?
To reduce the memory footprint, the shared memory is
created for only one backend.
Each backend has to wait for previous operation
to finish before it can write.
I think a good use case for this would be a background process
periodically running the monitoring function on each of the
backends sequentially to fetch the statistics.
This way there will be little contention for shared memory.
In case a shared memory is not available, a backend immediately
returns from the interrupt handler without blocking its normal
operations.
One could imagine a shared
memory area indexed similarly to pgproc entries, that includes
auxiliary processes as much as backends, so as it can be possible to
get more memory footprints through SQL for more than one single
process at one moment in time. If each backend has its own area of
shmem to deal with, they could use a shared LWLock on the shmem area
with an extra spinlock while the context data is dumped into memory as
the copy is short-lived. Each one of them could save the information
in a DSM created only when a dump of the shmem is requested for a
given PID, for example.
I agree that such an infrastructure would be useful for fetching memory
statistics concurrently without significant synchronization overhead.
However, a drawback of this approach is reserving shared
memory slots up to MAX_BACKENDS without utilizing them
when no concurrent monitoring is happening.
As you mentioned, creating a DSM on the fly when a dump
request is received could help avoid over-allocating shared memory.
I will look into this suggestion.
Thank you for your feedback!
Rahila Syed
Attachments:
0002-Function-to-report-memory-context-stats-of-any-backe.patch (application/octet-stream)
From aa855cadaa1b3ce222ba0d806585016d48c93d53 Mon Sep 17 00:00:00 2001
From: Rahila Syed <rahilasyed.90@gmail.com>
Date: Sun, 15 Sep 2024 17:56:06 +0530
Subject: [PATCH 1/1] Function to report memory context stats of any backend
This function sends a signal to a backend to publish
statistics of all its memory contexts. Signal handler
sets a flag, which causes the relevant backend to copy its
MemoryContextStats to a fixed shared memory area, as part
of next CHECK_FOR_INTERRUPTS().
If there are more statistics to be shared, it creates
a file and copies remaining stats to that file.
Once it is done, it signals the
client backend using a condition variable. The client backend
which was waiting on that condition variable, then wakes up,
reads the shared memory and returns these values in the form of
set of records, one for each memory context, to the user.
The client backend tries to read the remaining
statistics from the file if it exists. The client backend
is responsible for deleting the file when it finishes
reading and also marking the shared memory area as not in
use, in order to allow the other backends to fill it with
new statistics.
---
src/backend/postmaster/autovacuum.c | 4 +
src/backend/postmaster/checkpointer.c | 4 +
src/backend/postmaster/interrupt.c | 4 +
src/backend/postmaster/pgarch.c | 4 +
src/backend/postmaster/startup.c | 4 +
src/backend/postmaster/walsummarizer.c | 4 +
src/backend/storage/ipc/ipci.c | 2 +
src/backend/storage/ipc/procsignal.c | 3 +
src/backend/tcop/postgres.c | 3 +
.../utils/activity/wait_event_names.txt | 1 +
src/backend/utils/adt/mcxtfuncs.c | 274 ++++++++++++-
src/backend/utils/init/globals.c | 1 +
src/backend/utils/mmgr/mcxt.c | 373 +++++++++++++++++-
src/include/catalog/pg_proc.dat | 10 +
src/include/miscadmin.h | 1 +
src/include/storage/procsignal.h | 1 +
src/include/utils/memutils.h | 62 +++
17 files changed, 743 insertions(+), 12 deletions(-)
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index dc3cf87aba..5d01497ada 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -768,6 +768,10 @@ HandleAutoVacLauncherInterrupts(void)
/* Process sinval catchup interrupts that happened while sleeping */
ProcessCatchupInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index 9087e3f8db..4551ff2183 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -616,6 +616,10 @@ HandleCheckpointerInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/postmaster/interrupt.c b/src/backend/postmaster/interrupt.c
index eedc0980cf..1107ff6d45 100644
--- a/src/backend/postmaster/interrupt.c
+++ b/src/backend/postmaster/interrupt.c
@@ -48,6 +48,10 @@ HandleMainLoopInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 02f91431f5..467a253ccd 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -865,6 +865,10 @@ HandlePgArchInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
if (ConfigReloadPending)
{
char *archiveLib = pstrdup(XLogArchiveLibrary);
diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c
index ef6f98ebcd..17beb8737d 100644
--- a/src/backend/postmaster/startup.c
+++ b/src/backend/postmaster/startup.c
@@ -192,6 +192,10 @@ HandleStartupProcInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
diff --git a/src/backend/postmaster/walsummarizer.c b/src/backend/postmaster/walsummarizer.c
index ee6f1afc9a..b4b56142cd 100644
--- a/src/backend/postmaster/walsummarizer.c
+++ b/src/backend/postmaster/walsummarizer.c
@@ -876,6 +876,10 @@ HandleWalSummarizerInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 10fc18f252..ff4a607fb3 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -53,6 +53,7 @@
#include "storage/spin.h"
#include "utils/guc.h"
#include "utils/injection_point.h"
+#include "utils/memutils.h"
/* GUCs */
int shared_memory_type = DEFAULT_SHARED_MEMORY_TYPE;
@@ -345,6 +346,7 @@ CreateOrAttachShmemStructs(void)
WaitEventCustomShmemInit();
InjectionPointShmemInit();
WaitLSNShmemInit();
+ MemCtxShmemInit();
}
/*
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index 87027f27eb..621726cf03 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -688,6 +688,9 @@ procsignal_sigusr1_handler(SIGNAL_ARGS)
if (CheckProcSignal(PROCSIG_LOG_MEMORY_CONTEXT))
HandleLogMemoryContextInterrupt();
+ if (CheckProcSignal(PROCSIG_GET_MEMORY_CONTEXT))
+ HandleGetMemoryContextInterrupt();
+
if (CheckProcSignal(PROCSIG_PARALLEL_APPLY_MESSAGE))
HandleParallelApplyMessageInterrupt();
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 7f5eada9d4..eb0316442c 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3501,6 +3501,9 @@ ProcessInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
if (ParallelApplyMessagePending)
HandleParallelApplyMessages();
}
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 8efb4044d6..95b1e36303 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -159,6 +159,7 @@ WAL_RECEIVER_EXIT "Waiting for the WAL receiver to exit."
WAL_RECEIVER_WAIT_START "Waiting for startup process to send initial data for streaming replication."
WAL_SUMMARY_READY "Waiting for a new WAL summary to be generated."
XACT_GROUP_UPDATE "Waiting for the group leader to update transaction status at transaction end."
+MEM_CTX_PUBLISH "Waiting for backend to publish memory information."
ABI_compatibility:
diff --git a/src/backend/utils/adt/mcxtfuncs.c b/src/backend/utils/adt/mcxtfuncs.c
index 6a6634e1cd..040097e613 100644
--- a/src/backend/utils/adt/mcxtfuncs.c
+++ b/src/backend/utils/adt/mcxtfuncs.c
@@ -17,28 +17,23 @@
#include "funcapi.h"
#include "mb/pg_wchar.h"
+#include "miscadmin.h"
+#include "nodes/pg_list.h"
#include "storage/proc.h"
#include "storage/procarray.h"
#include "utils/array.h"
#include "utils/builtins.h"
#include "utils/hsearch.h"
+#include "utils/memutils.h"
+#include "utils/wait_event_types.h"
+#include "common/file_utils.h"
/* ----------
* The max bytes for showing identifiers of MemoryContext.
* ----------
*/
-#define MEMORY_CONTEXT_IDENT_DISPLAY_SIZE 1024
-/*
- * MemoryContextId
- * Used for storage of transient identifiers for
- * pg_get_backend_memory_contexts.
- */
-typedef struct MemoryContextId
-{
- MemoryContext context;
- int context_id;
-} MemoryContextId;
+struct MemoryContextState *memCtxState = NULL;
/*
* int_list_to_array
@@ -305,3 +300,260 @@ pg_log_backend_memory_contexts(PG_FUNCTION_ARGS)
PG_RETURN_BOOL(true);
}
+
+/*
+ * pg_get_remote_backend_memory_contexts
+ * Signal a backend or an auxiliary process to send its memory contexts.
+ *
+ * On receipt of this signal, a backend or an auxiliary process sets the flag
+ * in the signal handler, which causes the next CHECK_FOR_INTERRUPTS()
+ * or process-specific interrupt handler to copy the memory context statistics
+ * in a shared memory space. The statistics that do not fit in shared
+ * memory area are copied to a file by the backend.
+ *
+ * Wait for the backend to signal the condition variable after writing
+ * statistics to shared memory and, if needed, to a temp file. After
+ * waking from the sleep, check whether the requested backend's
+ * statistics are available to read and display.
+ */
+Datum
+pg_get_remote_backend_memory_contexts(PG_FUNCTION_ARGS)
+{
+ int pid = PG_GETARG_INT32(0);
+ bool get_summary = PG_GETARG_BOOL(1);
+ PGPROC *proc;
+ ProcNumber procNumber = INVALID_PROC_NUMBER;
+ int i;
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ MemoryContextParams *mem_stat = NULL;
+ char tmpfilename[MAXPGPATH];
+ FILE *fp = NULL;
+
+ InitMaterializedSRF(fcinfo, 0);
+
+ /*
+ * See if the process with given pid is a backend or an auxiliary process.
+ */
+ proc = BackendPidGetProc(pid);
+ if (proc == NULL)
+ proc = AuxiliaryPidGetProc(pid);
+
+ /*
+ * BackendPidGetProc() and AuxiliaryPidGetProc() return NULL if the pid
+ * isn't valid; but by the time we reach kill(), a process for which we
+ * get a valid proc here might have terminated on its own. There's no way
+ * to acquire a lock on an arbitrary process to prevent that. But since
+ * this mechanism is usually used to debug a backend or an auxiliary
+ * process running and consuming lots of memory, that it might end on its
+ * own first and its memory contexts are not reported is not a problem.
+ */
+ if (proc == NULL)
+ {
+ /*
+ * This is just a warning so a loop-through-resultset will not abort
+ * if one backend terminated on its own during the run.
+ */
+ ereport(WARNING,
+ (errmsg("PID %d is not a PostgreSQL server process", pid)));
+ PG_RETURN_BOOL(false);
+ }
+
+ procNumber = GetNumberFromPGProc(proc);
+
+ /* Only request the statistics that fit in memory, if get_summary is true. */
+ LWLockAcquire(&memCtxState->lw_lock, LW_EXCLUSIVE);
+ memCtxState->get_summary = get_summary;
+ LWLockRelease(&memCtxState->lw_lock);
+ if (SendProcSignal(pid, PROCSIG_GET_MEMORY_CONTEXT, procNumber) < 0)
+ {
+ ereport(WARNING,
+ (errmsg("could not send signal to process %d: %m", pid)));
+ PG_RETURN_BOOL(false);
+ }
+
+ /*
+ * Wait for a backend to publish stats, indicated when in_use is set true
+ * by the backend
+ */
+ while (1)
+ {
+ LWLockAcquire(&memCtxState->lw_lock, LW_EXCLUSIVE);
+
+ /*
+ * We expect to come out of sleep only when at least one backend has
+ * published some memory context information.
+ *
+ * Make sure that all the stats have been published and that the
+ * information belongs to the pid we requested it for. Otherwise loop
+ * back and wait for the correct backend to publish the information.
+ */
+ if (memCtxState->in_use == true && memCtxState->proc_id == pid)
+ break;
+ else
+ LWLockRelease(&memCtxState->lw_lock);
+
+ if (ConditionVariableTimedSleep(&memCtxState->memctx_cv, 120000,
+ WAIT_EVENT_MEM_CTX_PUBLISH))
+ {
+ ereport(WARNING,
+ (errmsg("Wait for %d process to publish stats timed out, try again", pid)));
+ return (Datum) 0;
+ }
+ }
+ /* Backend has finished publishing the stats, read them */
+ for (i = 0; i < memCtxState->in_memory_stats; i++)
+ {
+ ArrayType *path_array;
+ int path_length;
+ Datum values[10];
+ bool nulls[10];
+
+ memset(values, 0, sizeof(values));
+ memset(nulls, 0, sizeof(nulls));
+
+ if (strlen(memCtxState->memctx_infos[i].name) != 0)
+ values[0] = CStringGetTextDatum(memCtxState->memctx_infos[i].name);
+ else
+ nulls[0] = true;
+ if (strlen(memCtxState->memctx_infos[i].ident) != 0)
+ values[1] = CStringGetTextDatum(memCtxState->memctx_infos[i].ident);
+ else
+ nulls[1] = true;
+
+ values[2] = CStringGetTextDatum(memCtxState->memctx_infos[i].type);
+ path_length = memCtxState->memctx_infos[i].path_length;
+ path_array = construct_array_builtin(memCtxState->memctx_infos[i].path, path_length, INT4OID);
+ values[3] = PointerGetDatum(path_array);
+ values[4] = Int64GetDatum(memCtxState->memctx_infos[i].totalspace);
+ values[5] = Int64GetDatum(memCtxState->memctx_infos[i].nblocks);
+ values[6] = Int64GetDatum(memCtxState->memctx_infos[i].freespace);
+ values[7] = Int64GetDatum(memCtxState->memctx_infos[i].freechunks);
+ values[8] = Int64GetDatum(memCtxState->memctx_infos[i].totalspace - memCtxState->memctx_infos[i].freespace);
+ values[9] = Int32GetDatum(memCtxState->proc_id);
+
+ tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc, values, nulls);
+ }
+ /* No more stats to read, return */
+ if (memCtxState->total_stats == i)
+ {
+ /*
+ * Clear the in_use flag after we have finished reading the stats, so
+ * another backend can use the shared space
+ */
+ memCtxState->in_use = false;
+ memset(&memCtxState->memctx_infos, 0, MEM_CONTEXT_SHMEM_STATS_SIZE * sizeof(MemoryContextInfo));
+ LWLockRelease(&memCtxState->lw_lock);
+ return (Datum) 0;
+ }
+ /* Compute name for temp mem stat file */
+ snprintf(tmpfilename, MAXPGPATH, "%s/%s.memstats.%d",
+ PG_TEMP_FILES_DIR, PG_TEMP_FILE_PREFIX,
+ memCtxState->proc_id);
+ LWLockRelease(&memCtxState->lw_lock);
+ ConditionVariableCancelSleep();
+
+ /* Open file */
+ fp = AllocateFile(tmpfilename, PG_BINARY_R);
+ if (!fp)
+ {
+ ereport(WARNING,
+ (errcode_for_file_access(),
+ errmsg("could not read from the file")));
+ LWLockAcquire(&memCtxState->lw_lock, LW_EXCLUSIVE);
+ memCtxState->in_use = false;
+ memset(&memCtxState->memctx_infos, 0, MEM_CONTEXT_SHMEM_STATS_SIZE * sizeof(MemoryContextInfo));
+ LWLockRelease(&memCtxState->lw_lock);
+ return (Datum) 0;
+ }
+ mem_stat = palloc0(sizeof(MemoryContextParams));
+ while (!feof(fp))
+ {
+ int path_length;
+ ArrayType *path_array;
+ Datum values[10];
+ bool nulls[10];
+
+ memset(values, 0, sizeof(values));
+ memset(nulls, 0, sizeof(nulls));
+
+ /* Read stats from file */
+ if (fread(mem_stat, sizeof(MemoryContextParams), 1, fp) != 1)
+ {
+ if (ferror(fp))
+ {
+ elog(WARNING, "File read error");
+ break;
+ }
+ /* EOF reached */
+ break;
+ }
+ path_length = mem_stat->path_length;
+ if (strlen(mem_stat->name) != 0)
+ values[0] = CStringGetTextDatum(mem_stat->name);
+ else
+ nulls[0] = true;
+
+ if (strlen(mem_stat->ident) != 0)
+ values[1] = CStringGetTextDatum(mem_stat->ident);
+ else
+ nulls[1] = true;
+
+ values[2] = CStringGetTextDatum(mem_stat->type);
+
+ path_array = construct_array_builtin(mem_stat->path, path_length, INT4OID);
+ values[3] = PointerGetDatum(path_array);
+ values[4] = Int64GetDatum(mem_stat->totalspace);
+ values[5] = Int64GetDatum(mem_stat->nblocks);
+ values[6] = Int64GetDatum(mem_stat->freespace);
+ values[7] = Int64GetDatum(mem_stat->freechunks);
+ values[8] = Int64GetDatum(mem_stat->totalspace - mem_stat->freespace);
+ LWLockAcquire(&memCtxState->lw_lock, LW_EXCLUSIVE);
+ values[9] = Int32GetDatum(memCtxState->proc_id);
+ LWLockRelease(&memCtxState->lw_lock);
+
+ tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc, values, nulls);
+ }
+
+ /*
+ * Clear the in_use flag after we have finished reading the stats, so
+ * another backend can use the shared space; also reset the contents.
+ */
+ LWLockAcquire(&memCtxState->lw_lock, LW_EXCLUSIVE);
+ memCtxState->in_use = false;
+ memset(&memCtxState->memctx_infos, 0, MEM_CONTEXT_SHMEM_STATS_SIZE * sizeof(MemoryContextInfo));
+ LWLockRelease(&memCtxState->lw_lock);
+ pfree(mem_stat);
+ FreeFile(fp);
+ /* Delete the temp file that stores memory stats */
+ unlink(tmpfilename);
+
+ return (Datum) 0;
+}
+
+static Size
+MemCtxShmemSize(void)
+{
+ Size size;
+
+ size = offsetof(MemoryContextState, memctx_infos);
+ size = add_size(size, mul_size(MEM_CONTEXT_SHMEM_STATS_SIZE, sizeof(MemoryContextInfo)));
+ return size;
+}
+
+void
+MemCtxShmemInit(void)
+{
+ bool found;
+
+ memCtxState = (MemoryContextState *) ShmemInitStruct("MemoryContextState",
+ MemCtxShmemSize(),
+ &found);
+ if (!found)
+ {
+ ConditionVariableInit(&memCtxState->memctx_cv);
+ memCtxState->in_use = false;
+ memset(&memCtxState->memctx_infos, 0, MEM_CONTEXT_SHMEM_STATS_SIZE * sizeof(MemoryContextInfo));
+ LWLockInitialize(&memCtxState->lw_lock, LWLockNewTrancheId());
+ LWLockRegisterTranche(memCtxState->lw_lock.tranche, "mem_context_stats_reporting");
+ }
+}
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index 03a54451ac..7fc600ff7b 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -42,6 +42,7 @@ volatile sig_atomic_t IdleStatsUpdateTimeoutPending = false;
volatile uint32 InterruptHoldoffCount = 0;
volatile uint32 QueryCancelHoldoffCount = 0;
volatile uint32 CritSectionCount = 0;
+volatile sig_atomic_t PublishMemoryContextPending = false;
int MyProcPid;
pg_time_t MyStartTime;
diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index bde54326c6..f915d8130c 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -21,18 +21,23 @@
#include "postgres.h"
+#include "common/file_utils.h"
#include "mb/pg_wchar.h"
#include "miscadmin.h"
+#include "nodes/pg_list.h"
+#include "storage/fd.h"
+#include "storage/lwlock.h"
+#include "utils/hsearch.h"
#include "utils/memdebug.h"
#include "utils/memutils.h"
#include "utils/memutils_internal.h"
#include "utils/memutils_memorychunk.h"
-
static void BogusFree(void *pointer);
static void *BogusRealloc(void *pointer, Size size, int flags);
static MemoryContext BogusGetChunkContext(void *pointer);
static Size BogusGetChunkSpace(void *pointer);
+static int PublishMemoryContextToFile(MemoryContext context, FILE *fp, List *path, char *clipped_ident);
/*****************************************************************************
* GLOBAL MEMORY *
@@ -166,6 +171,7 @@ static void MemoryContextStatsInternal(MemoryContext context, int level,
static void MemoryContextStatsPrint(MemoryContext context, void *passthru,
const char *stats_string,
bool print_to_stderr);
+static void PublishMemoryContext(MemoryContext context, int64 counter, List *path, char *clipped_ident);
/*
* You should not do memory allocations within a critical section, because
@@ -1276,6 +1282,21 @@ HandleLogMemoryContextInterrupt(void)
/* latch will be set by procsignal_sigusr1_handler */
}
+/*
+ * HandleGetMemoryContextInterrupt
+ * Handle receipt of an interrupt indicating publishing of memory
+ * contexts.
+ *
+ * All the actual work is deferred to ProcessGetMemoryContextInterrupt()
+ */
+void
+HandleGetMemoryContextInterrupt(void)
+{
+ InterruptPending = true;
+ PublishMemoryContextPending = true;
+ /* latch will be set by procsignal_sigusr1_handler */
+}
+
/*
* ProcessLogMemoryContextInterrupt
* Perform logging of memory contexts of this backend process.
@@ -1313,6 +1334,356 @@ ProcessLogMemoryContextInterrupt(void)
MemoryContextStatsDetail(TopMemoryContext, 100, 100, false);
}
+/*
+ * Run by each backend to publish their memory context
+ * statistics. It performs a breadth first search
+ * on the memory context tree, so that the parents
+ * get a chance to report stats before their children.
+ *
+ * Statistics are shared via fixed shared memory which
+ * can hold statistics for 29 contexts. The rest of the
+ * statistics are stored in a file. This file is created
+ * in PG_TEMP_FILES_DIR and deleted by the client after
+ * reading the stats.
+ */
+void
+ProcessGetMemoryContextInterrupt(void)
+{
+ /* Store the memory context details in shared memory */
+
+ List *contexts;
+ FILE *fp = NULL;
+ char tmpfilename[MAXPGPATH];
+
+ HASHCTL ctl;
+ HTAB *context_id_lookup;
+ int context_id = 0;
+ bool found;
+ MemoryContext stat_cxt;
+ bool get_summary = false;
+
+ PublishMemoryContextPending = false;
+
+ /*
+ * Shared memory is not available to be written, return. The waiting
+ * client backend will timeout with a warning.
+ */
+ LWLockAcquire(&memCtxState->lw_lock, LW_EXCLUSIVE);
+ if (memCtxState->in_use)
+ {
+ LWLockRelease(&memCtxState->lw_lock);
+ return;
+ }
+ LWLockRelease(&memCtxState->lw_lock);
+
+ /*
+ * The hash table is used for constructing the "path" column of the
+ * pg_get_remote_backend_memory_contexts output, similar to its local
+ * backend counterpart.
+ */
+
+ /*
+ * Make a new context that will contain the hash table, to ease the
+ * cleanup
+ */
+
+ stat_cxt = AllocSetContextCreate(CurrentMemoryContext,
+ "Memory context statistics",
+ ALLOCSET_DEFAULT_SIZES);
+
+ ctl.keysize = sizeof(MemoryContext);
+ ctl.entrysize = sizeof(MemoryContextId);
+ ctl.hcxt = stat_cxt;
+
+ context_id_lookup = hash_create("pg_get_remote_backend_memory_contexts",
+ 256,
+ &ctl,
+ HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+
+ contexts = list_make1(TopMemoryContext);
+
+ /*
+ * As in OpenTemporaryFileInTablespace, try to make the temp-file
+ * directory, ignoring errors.
+ */
+ (void) MakePGDirectory(PG_TEMP_FILES_DIR);
+
+ LWLockAcquire(&memCtxState->lw_lock, LW_EXCLUSIVE);
+ memCtxState->proc_id = MyProcPid;
+
+ foreach_ptr(MemoryContextData, cur, contexts)
+ {
+ MemoryContextId *entry;
+ List *path = NIL;
+ char clipped_ident[MEMORY_CONTEXT_IDENT_DISPLAY_SIZE];
+
+ entry = (MemoryContextId *) hash_search(context_id_lookup, &cur,
+ HASH_ENTER, &found);
+ entry->context_id = context_id;
+
+ /*
+ * Figure out the transient context_id of this context and each of its
+ * ancestors.
+ */
+ for (MemoryContext cur_context = cur; cur_context != NULL; cur_context = cur_context->parent)
+ {
+ MemoryContextId *cur_entry;
+
+ cur_entry = hash_search(context_id_lookup, &cur_context, HASH_FIND, &found);
+
+ if (!found)
+ {
+ elog(LOG, "hash table corrupted, can't construct path value");
+ break;
+ }
+ path = lcons_int(cur_entry->context_id, path);
+ }
+ /* Trim and copy the identifier if it is not set to NULL */
+ if (cur->ident != NULL)
+ {
+ int idlen = strlen(cur->ident);
+
+ /*
+ * Some identifiers such as SQL query string can be very long,
+ * truncate oversize identifiers.
+ */
+ if (idlen >= MEMORY_CONTEXT_IDENT_DISPLAY_SIZE)
+ idlen = pg_mbcliplen(cur->ident, idlen, MEMORY_CONTEXT_IDENT_DISPLAY_SIZE - 1);
+
+ memcpy(clipped_ident, cur->ident, idlen);
+ clipped_ident[idlen] = '\0';
+ }
+ if (context_id <= 28)
+ {
+ /* Copy statistics to shared memory */
+ PublishMemoryContext(cur, context_id, path, (cur->ident != NULL ? clipped_ident : NULL));
+ }
+ else
+ {
+ if (PublishMemoryContextToFile(cur, fp, path, (cur->ident != NULL ? clipped_ident : NULL)) == -1)
+ break;
+ }
+ /* Append the children of the current context to the main list */
+ for (MemoryContext c = cur->firstchild; c != NULL; c = c->nextchild)
+ contexts = lappend(contexts, c);
+
+ /*
+ * Shared memory is full, release lock and write to file from next
+ * iteration
+ */
+ context_id++;
+ if (context_id == 29)
+ {
+ memCtxState->in_memory_stats = context_id;
+ get_summary = memCtxState->get_summary;
+ LWLockRelease(&memCtxState->lw_lock);
+ /* Construct name for temp file */
+ snprintf(tmpfilename, MAXPGPATH, "%s/%s.memstats.%d",
+ PG_TEMP_FILES_DIR, PG_TEMP_FILE_PREFIX,
+ MyProcPid);
+ /* Open file to copy rest of the stats in the file */
+ fp = AllocateFile(tmpfilename, PG_BINARY_A);
+
+ /*
+ * Only in-memory stats(summary) are requested, so do not write to
+ * file
+ */
+ if (fp == NULL || get_summary)
+ break;
+ }
+ }
+ if (context_id < 29)
+ {
+ memCtxState->in_memory_stats = context_id;
+ LWLockRelease(&memCtxState->lw_lock);
+ }
+
+ /* Delete the hash table memory context */
+ MemoryContextDelete(stat_cxt);
+
+ /*
+ * Signal the waiting client backend after setting the exit condition flag
+ */
+ LWLockAcquire(&memCtxState->lw_lock, LW_EXCLUSIVE);
+ memCtxState->in_use = true;
+ memCtxState->total_stats = context_id;
+ LWLockRelease(&memCtxState->lw_lock);
+ ConditionVariableBroadcast(&memCtxState->memctx_cv);
+
+ /* Release file */
+ if (fp && FreeFile(fp))
+ {
+ ereport(LOG,
+ (errcode_for_file_access(),
+ errmsg("could not free file \"%s\": %m", tmpfilename)));
+ }
+}
+
+static void
+PublishMemoryContext(MemoryContext context, int64 counter, List *path, char *clipped_ident)
+{
+ MemoryContextCounters stat;
+ char *type;
+
+ if (context->name != NULL)
+ {
+ Assert(strlen(context->name) < MEMORY_CONTEXT_IDENT_DISPLAY_SIZE);
+ strncpy(memCtxState->memctx_infos[counter].name, context->name, strlen(context->name));
+ }
+ else
+ memCtxState->memctx_infos[counter].name[0] = '\0';
+
+ if (clipped_ident != NULL)
+ {
+ /*
+ * To be consistent with logging output, we label dynahash contexts
+ * with just the hash table name as with MemoryContextStatsPrint().
+ */
+ if (!strncmp(context->name, "dynahash", 8))
+ {
+ strncpy(memCtxState->memctx_infos[counter].name, clipped_ident, strlen(clipped_ident));
+ memCtxState->memctx_infos[counter].ident[0] = '\0';
+ }
+ else
+ strncpy(memCtxState->memctx_infos[counter].ident, clipped_ident, strlen(clipped_ident));
+ }
+ else
+ memCtxState->memctx_infos[counter].ident[0] = '\0';
+
+ memCtxState->memctx_infos[counter].path_length = list_length(path);
+ foreach_int(i, path)
+ memCtxState->memctx_infos[counter].path[foreach_current_index(i)] = Int32GetDatum(i);
+
+ /* Examine the context stats */
+ memset(&stat, 0, sizeof(stat));
+ (*context->methods->stats) (context, NULL, NULL, &stat, true);
+
+ switch (context->type)
+ {
+ case T_AllocSetContext:
+ type = "AllocSet";
+ strncpy(memCtxState->memctx_infos[counter].type, type, strlen(type));
+ break;
+ case T_GenerationContext:
+ type = "Generation";
+ strncpy(memCtxState->memctx_infos[counter].type, type, strlen(type));
+ break;
+ case T_SlabContext:
+ type = "Slab";
+ strncpy(memCtxState->memctx_infos[counter].type, type, strlen(type));
+ break;
+ case T_BumpContext:
+ type = "Bump";
+ strncpy(memCtxState->memctx_infos[counter].type, type, strlen(type));
+ break;
+ default:
+ type = "???";
+ strncpy(memCtxState->memctx_infos[counter].type, type, strlen(type));
+ break;
+ }
+ memCtxState->memctx_infos[counter].totalspace = stat.totalspace;
+ memCtxState->memctx_infos[counter].nblocks = stat.nblocks;
+ memCtxState->memctx_infos[counter].freespace = stat.freespace;
+ memCtxState->memctx_infos[counter].freechunks = stat.freechunks;
+}
+
+static int
+PublishMemoryContextToFile(MemoryContext context, FILE *fp, List *path, char *clipped_ident)
+{
+ MemoryContextCounters stat;
+ MemoryContextParams *mem_stat;
+ char *type;
+
+ mem_stat = palloc0(sizeof(MemoryContextParams));
+
+ /*
+ * Assuming the context name will not exceed context identifier display
+ * size.
+ * XXX Reduce the limit for name length to correctly reflect practical
+ * examples.
+ * XXX Add handling similar to clipped_ident if name exceeds the size
+ * limit.
+ */
+ if (context->name != NULL)
+ {
+ Assert(strlen(context->name) < MEMORY_CONTEXT_IDENT_DISPLAY_SIZE);
+ strncpy(mem_stat->name, context->name, strlen(context->name));
+ }
+ else
+ mem_stat->name[0] = '\0';
+
+ if (clipped_ident != NULL)
+ {
+ /*
+ * To be consistent with logging output, we label dynahash contexts
+ * with just the hash table name as with MemoryContextStatsPrint().
+ */
+ if (!strncmp(context->name, "dynahash", 8))
+ {
+ strncpy(mem_stat->name, clipped_ident, strlen(clipped_ident));
+ mem_stat->ident[0] = '\0';
+ }
+ else
+ strncpy(mem_stat->ident, clipped_ident, strlen(clipped_ident));
+ }
+ else
+ mem_stat->ident[0] = '\0';
+
+ mem_stat->path_length = list_length(path);
+ foreach_int(i, path)
+ mem_stat->path[foreach_current_index(i)] = Int32GetDatum(i);
+
+ /* Examine the context itself */
+ memset(&stat, 0, sizeof(stat));
+ (*context->methods->stats) (context, NULL, NULL, &stat, true);
+
+ switch (context->type)
+ {
+ case T_AllocSetContext:
+ type = "AllocSet";
+ strncpy(mem_stat->type, type, strlen(type));
+ break;
+ case T_GenerationContext:
+ type = "Generation";
+ strncpy(mem_stat->type, type, strlen(type));
+ break;
+ case T_SlabContext:
+ type = "Slab";
+ strncpy(mem_stat->type, type, strlen(type));
+ break;
+ case T_BumpContext:
+ type = "Bump";
+ strncpy(mem_stat->type, type, strlen(type));
+ break;
+ default:
+ type = "???";
+ strncpy(mem_stat->type, type, strlen(type));
+ break;
+ }
+ mem_stat->totalspace = stat.totalspace;
+ mem_stat->nblocks = stat.nblocks;
+ mem_stat->freespace = stat.freespace;
+ mem_stat->freechunks = stat.freechunks;
+
+ if (!fp)
+ {
+ ereport(LOG,
+ (errcode_for_file_access(),
+ errmsg("could not create file")));
+ pfree(mem_stat);
+ return -1;
+ }
+ if (fwrite(mem_stat, sizeof(MemoryContextParams), 1, fp) != 1)
+ {
+ ereport(LOG,
+ (errcode_for_file_access(),
+ errmsg("could not write to file")));
+ pfree(mem_stat);
+ return -1;
+ }
+ pfree(mem_stat);
+
+ return 0;
+}
+
void *
palloc(Size size)
{
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 7c0b74fe05..5d7d0bcbf5 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8429,6 +8429,16 @@
prorettype => 'bool', proargtypes => 'int4',
prosrc => 'pg_log_backend_memory_contexts' },
+# publishing memory contexts of the specified backend
+{ oid => '2173', descr => 'publish memory contexts of the specified backend',
+ proname => 'pg_get_remote_backend_memory_contexts', provolatile => 'v',
+ prorows => '100', proretset => 't', proparallel => 'r',
+ prorettype => 'record', proargtypes => 'int4 bool',
+ proallargtypes => '{int4,bool,text,text,text,_int4,int4,int4,int4,int4,int4,int4}',
+ proargmodes => '{i,i,o,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{oid, summary, name, ident, type, path, total_bytes, total_nblocks, free_bytes, free_chunks, used_bytes, pid}',
+ prosrc => 'pg_get_remote_backend_memory_contexts' },
+
# non-persistent series generator
{ oid => '1066', descr => 'non-persistent series generator',
proname => 'generate_series', prorows => '1000',
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index e26d108a47..da07f99d7d 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -96,6 +96,7 @@ extern PGDLLIMPORT volatile sig_atomic_t IdleSessionTimeoutPending;
extern PGDLLIMPORT volatile sig_atomic_t ProcSignalBarrierPending;
extern PGDLLIMPORT volatile sig_atomic_t LogMemoryContextPending;
extern PGDLLIMPORT volatile sig_atomic_t IdleStatsUpdateTimeoutPending;
+extern PGDLLIMPORT volatile sig_atomic_t PublishMemoryContextPending;
extern PGDLLIMPORT volatile sig_atomic_t CheckClientConnectionPending;
extern PGDLLIMPORT volatile sig_atomic_t ClientConnectionLost;
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index 221073def3..8cbf6e201c 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -35,6 +35,7 @@ typedef enum
PROCSIG_WALSND_INIT_STOPPING, /* ask walsenders to prepare for shutdown */
PROCSIG_BARRIER, /* global barrier interrupt */
PROCSIG_LOG_MEMORY_CONTEXT, /* ask backend to log the memory contexts */
+ PROCSIG_GET_MEMORY_CONTEXT, /* ask backend to publish its memory contexts */
PROCSIG_PARALLEL_APPLY_MESSAGE, /* Message from parallel apply workers */
/* Recovery conflict reasons */
diff --git a/src/include/utils/memutils.h b/src/include/utils/memutils.h
index cd9596ff21..e7efbd0e07 100644
--- a/src/include/utils/memutils.h
+++ b/src/include/utils/memutils.h
@@ -18,6 +18,8 @@
#define MEMUTILS_H
#include "nodes/memnodes.h"
+#include "storage/condition_variable.h"
+#include "storage/lmgr.h"
/*
@@ -48,7 +50,11 @@
#define AllocHugeSizeIsValid(size) ((Size) (size) <= MaxAllocHugeSize)
+#define MEMORY_CONTEXT_IDENT_DISPLAY_SIZE 1024
+#define MEM_CONTEXT_SHMEM_STATS_SIZE 30
+#define MEM_CONTEXT_MAX_LEVEL 64
+#define MAX_TYPE_STRING_LENGTH 64
/*
* Standard top-level memory contexts.
*
@@ -115,6 +121,62 @@ extern MemoryContext AllocSetContextCreateInternal(MemoryContext parent,
Size initBlockSize,
Size maxBlockSize);
+/* Shared memory state for Memory Context Statistics reporting */
+typedef struct MemoryContextInfo
+{
+ char name[MEMORY_CONTEXT_IDENT_DISPLAY_SIZE];
+ char ident[MEMORY_CONTEXT_IDENT_DISPLAY_SIZE];
+ Datum path[MEM_CONTEXT_MAX_LEVEL];
+ char type[MAX_TYPE_STRING_LENGTH];
+ int path_length;
+ int64 totalspace;
+ int64 nblocks;
+ int64 freespace;
+ int64 freechunks;
+} MemoryContextInfo;
+
+typedef struct MemoryContextState
+{
+ ConditionVariable memctx_cv;
+ LWLock lw_lock;
+ bool in_use;
+ int proc_id;
+ int in_memory_stats;
+ int total_stats;
+ bool get_summary;
+ MemoryContextInfo memctx_infos[MEM_CONTEXT_SHMEM_STATS_SIZE];
+} MemoryContextState;
+
+/* Backend local struct used to write statistics to a file */
+typedef struct MemoryContextParams
+{
+ char name[MEMORY_CONTEXT_IDENT_DISPLAY_SIZE];
+ char ident[MEMORY_CONTEXT_IDENT_DISPLAY_SIZE];
+ char type[MAX_TYPE_STRING_LENGTH];
+ Datum path[MEM_CONTEXT_MAX_LEVEL];
+ int path_length;
+ int64 totalspace;
+ int64 nblocks;
+ int64 freespace;
+ int64 freechunks;
+} MemoryContextParams;
+
+/*
+ * MemoryContextId
+ * Used for storage of transient identifiers for
+ * pg_get_backend_memory_contexts.
+ */
+typedef struct MemoryContextId
+{
+ MemoryContext context;
+ int context_id;
+} MemoryContextId;
+
+extern PGDLLIMPORT MemoryContextState * memCtxState;
+extern void ProcessGetMemoryContextInterrupt(void);
+extern void HandleGetMemoryContextInterrupt(void);
+extern void MemCtxShmemInit(void);
+
/*
* This wrapper macro exists to check for non-constant strings used as context
* names; that's no longer supported. (Use MemoryContextSetIdentifier if you
--
2.34.1
Hi Torikoshia,
Thank you for reviewing the patch!
On Wed, Oct 23, 2024 at 9:28 AM torikoshia <torikoshia@oss.nttdata.com>
wrote:
On 2024-10-22 03:24, Rahila Syed wrote:
Hi,
PostgreSQL provides following capabilities for reporting memory
contexts statistics.
1. pg_get_backend_memory_contexts(); [1]
2. pg_log_backend_memory_contexts(pid); [2]
[1] provides a view of memory context statistics for a local backend,
while [2] prints the memory context statistics of any backend or
auxiliary process to the PostgreSQL logs. Although [1] offers detailed
statistics, it is limited to the local backend, restricting its use to
PostgreSQL client backends only.
On the other hand, [2] provides the statistics for all backends but
logs them in a file, which may not be convenient for quick access.
I propose enhancing memory context statistics reporting by combining
these capabilities and offering a view of memory statistics for all
PostgreSQL backends and auxiliary processes.
Thanks for working on this!
I originally tried to develop something like your proposal in [2], but
there were some difficulties and settled down to implement
pg_log_backend_memory_contexts().
Yes. I am revisiting this problem :)
Attached is a patch that implements this functionality. It introduces
a SQL function that takes the PID of a backend as an argument,
returning a set of records, each containing statistics for a single
memory context. The underlying C function sends a signal to the
backend and waits for it to publish its memory context statistics
before returning them to the user. The publishing backend copies
these statistics during the next CHECK_FOR_INTERRUPTS call.
I remember waiting for dumping memory contexts stats could cause trouble
considering some erroneous cases.
For example, just after the target process finished dumping stats,
pg_get_remote_backend_memory_contexts() caller is terminated before
reading the stats, calling pg_get_remote_backend_memory_contexts() has
no response any more:
[session1]$ psql
(40699)=#
$ kill -s SIGSTOP 40699
[session2] psql
(40866)=# select * FROM pg_get_remote_backend_memory_contexts('40699', false); -- waiting
$ kill -s SIGSTOP 40866
$ kill -s SIGCONT 40699
[session3] psql
(47656) $ select pg_terminate_backend(40866);
$ kill -s SIGCONT 40866 -- session2 terminated
[session3] (47656)=# select * FROM pg_get_remote_backend_memory_contexts('47656', false); -- no response
It seems the reason is memCtxState->in_use is now true and
memCtxState->proc_id is 40699.
We can continue to use pg_get_remote_backend_memory_contexts() after
specifying 40699, but it'd be hard to understand for users.
Thanks for testing and reporting. While I am not able to reproduce this
problem, I think this may be happening because the requesting
backend/caller is terminated before it gets a chance to mark
memCtxState->in_use as false.
In this case memCtxState->in_use should be marked as
'false' possibly during the processing of ProcDiePending in
ProcessInterrupts().
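A minimal sketch of the cleanup idea above, not taken from the patch. It models the shared slot with a plain C struct and adds a hypothetical requester_pid field, so that an exiting requester can recognize a slot it was waiting on and release it. SlotModel and the slot_* helpers are illustrative names only. In PostgreSQL the release step would presumably run from an exit path (the ProcDiePending handling mentioned above, or a before_shmem_exit() callback) while holding the patch's LWLock; that placement is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the patch's MemoryContextState (not the real struct). */
typedef struct SlotModel
{
    bool in_use;        /* set by the publisher, cleared by the reader */
    int  proc_id;       /* PID whose statistics currently occupy the slot */
    int  requester_pid; /* hypothetical: PID that asked for the statistics */
} SlotModel;

/* Requester records itself before signalling the target process. */
void
slot_register_requester(SlotModel *slot, int my_pid)
{
    assert(slot != NULL);
    slot->requester_pid = my_pid;
}

/* Target publishes; as in the patch, it gives up if the slot is still in use. */
bool
slot_publish(SlotModel *slot, int publisher_pid)
{
    assert(slot != NULL);
    if (slot->in_use)
        return false;
    slot->in_use = true;
    slot->proc_id = publisher_pid;
    return true;
}

/* Exit-time cleanup: release the slot only if this process requested it. */
void
slot_release_on_exit(SlotModel *slot, int my_pid)
{
    assert(slot != NULL);
    if (slot->requester_pid != my_pid)
        return;
    slot->in_use = false;
    slot->proc_id = 0;
    slot->requester_pid = 0;
}
```

With this model, the scenario reported above (requester 40866 terminated after 40699 published) leaves the slot free for the next request once slot_release_on_exit() runs for 40866.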
This approach facilitates on-demand publication of memory statistics
for a specific backend, rather than collecting them at regular
intervals.
Since past memory context statistics may no longer be relevant,
there is little value in retaining historical data. Any collected
statistics
can be discarded once read by the client backend.
A fixed-size shared memory block, currently accommodating 30 records,
is used to store the statistics. This number was chosen arbitrarily,
as it covers all parent contexts at level 1 (i.e., direct children of
the top memory context)
based on my tests.
Further experiments are needed to determine the optimal number
for summarizing memory statistics.
Any additional statistics that exceed the shared memory capacity
are written to a file per backend in the PG_TEMP_FILES_DIR. The client
backend
first reads from the shared memory, and if necessary, retrieves the
remaining data from the file,
combining everything into a unified view. The files are cleaned up
automatically
if a backend crashes or during server restarts.
The statistics are reported in a breadth-first search order of the
memory context tree,
with parent contexts reported before their children. This provides a
cumulative summary
before diving into the details of each child context's consumption.
The rationale behind the shared memory chunk is to ensure that the
majority of contexts which are the direct children of
TopMemoryContext
fit into memory.
This allows a client to request a summary of memory statistics,
which can be served from memory without the overhead of file access,
unless necessary.
A publishing backend signals waiting client backends using a condition
variable when it has finished writing its statistics to memory.
The client backend checks whether the statistics belong to the
requested backend.
If not, it continues waiting on the condition variable, timing out
after 2 minutes.
This timeout is an arbitrary choice, and further work is required to
determine
a more practical value.
All backends use the same memory space to publish their statistics.
Before publishing, a backend checks whether the previous statistics
have been
successfully read by a client using a shared flag, "in_use."
This flag is set by the publishing backend and cleared by the client
backend once the data is read. If a backend cannot publish due to
shared
memory being occupied, it exits the interrupt processing code,
and the client backend times out with a warning.
Please find below an example query to fetch memory contexts from the
backend
with id '116292'. The second argument, 'get_summary', is 'false',
indicating a request for statistics of all the contexts.
postgres=#
select * FROM pg_get_remote_backend_memory_contexts('116292', false)
LIMIT 2;
-[ RECORD 1 ]-+----------------------
name | TopMemoryContext
ident |
type | AllocSet
path | {0}
total_bytes | 97696
total_nblocks | 5
free_bytes | 15376
free_chunks | 11
used_bytes | 82320
pid | 116292
-[ RECORD 2 ]-+----------------------
name | RowDescriptionContext
ident |
type | AllocSet
path | {0,1}
total_bytes | 8192
total_nblocks | 1
free_bytes | 6912
free_chunks | 0
used_bytes | 1280
pid | 116292
32d3ed8165f821f introduced 1-based path to pg_backend_memory_contexts,
but pg_get_remote_backend_memory_contexts() seems to have 0-based path.
Right. I will change it to match this commit.
pg_backend_memory_contexts has "level" column, but
pg_get_remote_backend_memory_contexts doesn't.
Are there any reasons for these?
No particular reason, I can add this column as well.
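To make the path and level discussion concrete, here is a small standalone sketch of a breadth-first walk that emits parents before children and assigns 1-based ids, levels, and paths. It is an illustration under assumptions, not code from the patch: DemoContext, DemoRow, DEMO_MAX_DEPTH, and demo_walk_bfs are hypothetical, and the 1-based level numbering is assumed to follow the same convention as the path.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_DEPTH 16       /* illustrative bound, not from the patch */

/* Hypothetical stand-in for a memory context node. */
typedef struct DemoContext
{
    const char *name;
    struct DemoContext *firstchild;
    struct DemoContext *nextchild;
} DemoContext;

/* One output row: 1-based id, 1-based level, and ids from the root down. */
typedef struct DemoRow
{
    const DemoContext *ctx;
    int id;
    int level;
    int path[DEMO_MAX_DEPTH];
} DemoRow;

/*
 * Walk the tree breadth-first, so every parent is emitted before its
 * children. The rows array doubles as the queue. Returns the number of
 * rows written; the walk stops early if either bound would be exceeded.
 */
static int
demo_walk_bfs(const DemoContext *root, DemoRow *rows, int max_rows)
{
    int head = 0;
    int tail = 0;

    assert(root != NULL && rows != NULL && max_rows > 0);

    rows[tail].ctx = root;
    rows[tail].id = 1;
    rows[tail].level = 1;
    rows[tail].path[0] = 1;
    tail++;

    while (head < tail)
    {
        const DemoRow *parent = &rows[head++];

        for (const DemoContext *c = parent->ctx->firstchild;
             c != NULL; c = c->nextchild)
        {
            DemoRow *row;

            if (tail >= max_rows || parent->level >= DEMO_MAX_DEPTH)
                return tail;

            row = &rows[tail];
            row->ctx = c;
            row->id = tail + 1;
            row->level = parent->level + 1;
            /* A child's path is its parent's path plus its own id. */
            for (int i = 0; i < parent->level; i++)
                row->path[i] = parent->path[i];
            row->path[parent->level] = row->id;
            tail++;
        }
    }
    return tail;
}
```

With this numbering the root gets path {1} and its first child {1,2}, instead of the {0} and {0,1} shown in the example output above, and the level of a row always equals the length of its path.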
Thank you,
Rahila Syed
On 2024-10-24 14:59, Rahila Syed wrote:
Hi Torikoshia,
Thank you for reviewing the patch!
On Wed, Oct 23, 2024 at 9:28 AM torikoshia
<torikoshia@oss.nttdata.com> wrote:
On 2024-10-22 03:24, Rahila Syed wrote:
Hi,
PostgreSQL provides following capabilities for reporting memory
contexts statistics.
1. pg_get_backend_memory_contexts(); [1]
2. pg_log_backend_memory_contexts(pid); [2]
[1] provides a view of memory context statistics for a local
backend,
while [2] prints the memory context statistics of any backend or
auxiliary
process to the PostgreSQL logs. Although [1] offers detailed
statistics,
it is limited to the local backend, restricting its use to PostgreSQL
client backends only.
On the other hand, [2] provides the statistics for all backends but
logs them in a file,
which may not be convenient for quick access.
I propose enhancing memory context statistics reporting by
combining
these
capabilities and offering a view of memory statistics for all
PostgreSQL backends
and auxiliary processes.
Thanks for working on this!
I originally tried to develop something like your proposal in [2],
but
there were some difficulties, so I settled on implementing
pg_log_backend_memory_contexts().
Yes. I am revisiting this problem :)
Attached is a patch that implements this functionality. It
introduces
a SQL function
that takes the PID of a backend as an argument, returning a set of
records,
each containing statistics for a single memory context. The underlying
C function
sends a signal to the backend and waits for it to publish its memory
context statistics
before returning them to the user. The publishing backend copies
these statistics
during the next CHECK_FOR_INTERRUPTS call.
I remember that waiting for the memory context stats to be dumped could
cause trouble in some error cases.
For example, if the caller of pg_get_remote_backend_memory_contexts() is
terminated just after the target process has finished dumping stats but
before reading them, later calls to
pg_get_remote_backend_memory_contexts() get no response any more:
[session1]$ psql
(40699)=#
$ kill -s SIGSTOP 40699
[session2] psql
(40866)=# select * FROM
pg_get_remote_backend_memory_contexts('40699', false); -- waiting
$ kill -s SIGSTOP 40866
$ kill -s SIGCONT 40699
[session3] psql
(47656)=# select pg_terminate_backend(40866);
$ kill -s SIGCONT 40866 -- session2 terminated
[session3] (47656)=# select * FROM
pg_get_remote_backend_memory_contexts('47656', false); -- no response
It seems the reason is that memCtxState->in_use is now true and
memCtxState->proc_id is 40699.
We can continue to use pg_get_remote_backend_memory_contexts() after
specifying 40699, but it'd be hard to understand for users.
Thanks for testing and reporting. While I am not able to reproduce
this problem,
I think this may be happening because the requesting backend/caller is
terminated
before it gets a chance to mark memCtxState->in_use as false.
Yeah, when I attached a debugger to 47656 when it was waiting on
pg_get_remote_backend_memory_contexts('47656', false),
memCtxState->in_use was true as you suspected:
(lldb) p memCtxState->in_use
(bool) $1 = true
(lldb) p memCtxState->proc_id
(int) $2 = 40699
(lldb) p pid
(int) $3 = 47656
In this case memCtxState->in_use should be marked as
'false' possibly during the processing of ProcDiePending in
ProcessInterrupts().
This approach facilitates on-demand publication of memory
statistics
for a specific backend, rather than collecting them at regular
intervals.
Since past memory context statistics may no longer be relevant,
there is little value in retaining historical data. Any collected
statistics
can be discarded once read by the client backend.
A fixed-size shared memory block, currently accommodating 30
records,
is used to store the statistics. This number was chosen
arbitrarily,
as it covers all parent contexts at level 1 (i.e., direct
children of
the top memory context)
based on my tests.
Further experiments are needed to determine the optimal number
for summarizing memory statistics.
Any additional statistics that exceed the shared memory capacity
are written to a file per backend in the PG_TEMP_FILES_DIR. The client
backend
first reads from the shared memory, and if necessary, retrieves the
remaining data from the file,
combining everything into a unified view. The files are cleaned up
automatically
if a backend crashes or during server restarts.
The statistics are reported in a breadth-first search order of the
memory context tree,
with parent contexts reported before their children. This provides a
cumulative summary
before diving into the details of each child context's consumption.
The rationale behind the shared memory chunk is to ensure that the
majority of contexts which are the direct children of
TopMemoryContext
fit into memory.
This allows a client to request a summary of memory statistics,
which can be served from memory without the overhead of file access,
unless necessary.
A publishing backend signals waiting client backends using a
condition
variable when it has finished writing its statistics to memory.
The client backend checks whether the statistics belong to the
requested backend.
If not, it continues waiting on the condition variable, timing out
after 2 minutes.
This timeout is an arbitrary choice, and further work is required to
determine
a more practical value.
All backends use the same memory space to publish their
statistics.
Before publishing, a backend checks whether the previous
statistics
have been
successfully read by a client using a shared flag, "in_use."
This flag is set by the publishing backend and cleared by the client
backend once the data is read. If a backend cannot publish due to
shared
memory being occupied, it exits the interrupt processing code,
and the client backend times out with a warning.
Please find below an example query to fetch memory contexts from
the
backend
with id '116292'. The second argument, 'get_summary', is 'false',
indicating a request for statistics of all the contexts.
postgres=#
select * FROM pg_get_remote_backend_memory_contexts('116292', false)
LIMIT 2;
-[ RECORD 1 ]-+----------------------
name | TopMemoryContext
ident |
type | AllocSet
path | {0}
total_bytes | 97696
total_nblocks | 5
free_bytes | 15376
free_chunks | 11
used_bytes | 82320
pid | 116292
-[ RECORD 2 ]-+----------------------
name | RowDescriptionContext
ident |
type | AllocSet
path | {0,1}
total_bytes | 8192
total_nblocks | 1
free_bytes | 6912
free_chunks | 0
used_bytes | 1280
pid | 116292
32d3ed8165f821f introduced 1-based path to
pg_backend_memory_contexts,
but pg_get_remote_backend_memory_contexts() seems to have 0-based
path.
Right. I will change it to match this commit.
pg_backend_memory_contexts has "level" column, but
pg_get_remote_backend_memory_contexts doesn't.
Are there any reasons for these?
No particular reason, I can add this column as well.
Thank you,
Rahila Syed
--
Regards,
--
Atsushi Torikoshi
Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K.
Hi Rahila,
I’ve spent some time reviewing the patch, and the review is still
ongoing. Here are the comments I’ve found so far.
1.
The tests are currently missing. Could you please add them?
2.
I have some concerns regarding the function name
‘pg_get_remote_backend_memory_contexts’. Specifically, the term
‘remote’ doesn’t seem appropriate to me. The function retrieves data
from other processes running on the same machine, which might give the
impression that it deals with processes on different machines. This
could be misleading or unclear in this context. The argument ‘pid’
already indicates that it can get data from different processes.
Additionally, the term ‘backend’ also seems inappropriate since we are
obtaining data from processes that are different from backend
processes.
3.
+ Datum values[10];
+ bool nulls[10];
Please consider #defining the column count, or you could reuse the
existing one ‘PG_GET_BACKEND_MEMORY_CONTEXTS_COLS’.
4.
if (context_id <= 28)
if (context_id == 29)
if (context_id < 29)
#define these
5.
for (MemoryContext cur_context = cur; cur_context != NULL; cur_context = cur_context->parent)
{
MemoryContextId *cur_entry;
cur_entry = hash_search(context_id_lookup, &cur_context, HASH_FIND, &found);
if (!found)
{
elog(LOG, "hash table corrupted, can't construct path value");
break;
}
path = lcons_int(cur_entry->context_id, path);
}
Similar code already exists in PutMemoryContextsStatsTupleStore().
Could you create a separate function to handle this?
6.
/*
* Shared memory is full, release lock and write to file from next
* iteration
*/
context_id++;
if (context_id == 29)
{
What if there are exactly 29 entries in the memory context? In that
case, creating the file would be unnecessary.
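Comments 4 and 6 can be illustrated together. The sketch below is hypothetical and only assumes, as stated elsewhere in this thread, that shared memory holds statistics for 29 contexts; DEMO_MAX_SHMEM_CONTEXTS and demo_spill_count are illustrative names, not identifiers from the patch.

```c
#include <assert.h>

/* Hypothetical name; the patch hardcodes 28, 29 and 30 at these points. */
#define DEMO_MAX_SHMEM_CONTEXTS 29

/*
 * Number of contexts that have to be written to the spill file.
 * Zero means everything fits in shared memory and no file is needed.
 */
static int
demo_spill_count(int total_contexts)
{
    assert(total_contexts >= 0);
    if (total_contexts <= DEMO_MAX_SHMEM_CONTEXTS)
        return 0;
    return total_contexts - DEMO_MAX_SHMEM_CONTEXTS;
}
```

Deciding on the total count rather than on "the counter just reached 29" means a process with exactly 29 contexts does not create a file, and the limit can be changed in one place.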
Best Regards,
Nitin Jadhav
Azure Database for PostgreSQL
Microsoft
On Wed, Oct 23, 2024 at 10:20 AM Rahila Syed <rahilasyed90@gmail.com> wrote:
Hi Michael,
Thank you for the review.
On Tue, Oct 22, 2024 at 12:18 PM Michael Paquier <michael@paquier.xyz> wrote:
On Mon, Oct 21, 2024 at 11:54:21PM +0530, Rahila Syed wrote:
On the other hand, [2] provides the statistics for all backends but logs
them in a file, which may not be convenient for quick access.
To be precise, pg_log_backend_memory_contexts() pushes the memory
context stats to LOG_SERVER_ONLY or stderr, hence this is appended to
the server logs.
A fixed-size shared memory block, currently accommodating 30 records,
is used to store the statistics. This number was chosen arbitrarily,
as it covers all parent contexts at level 1 (i.e., direct children of the
top memory context)
based on my tests.
Further experiments are needed to determine the optimal number
for summarizing memory statistics.
+ * Statistics are shared via fixed shared memory which
+ * can hold statistics for 29 contexts. The rest of the
[...]
+ MemoryContextInfo memctx_infos[30];
[...]
+ memset(&memCtxState->memctx_infos, 0, 30 * sizeof(MemoryContextInfo));
[...]
+ size = add_size(size, mul_size(30, sizeof(MemoryContextInfo)));
[...]
+ memset(&memCtxState->memctx_infos, 0, 30 * sizeof(MemoryContextInfo));
[...]
+ memset(&memCtxState->memctx_infos, 0, 30 * sizeof(MemoryContextInfo));
This number is tied to MemoryContextState added by the patch. Sounds
like this would be better as a constant properly defined rather than
hardcoded in all these places. This would make the upper-bound more
easily switchable in the patch.
Makes sense. Fixed in the attached patch.
+ Datum path[128];
+ char type[128];
[...]
+ char name[1024];
+ char ident[1024];
+ char type[128];
+ Datum path[128];
Again, constants. Why these values? You may want to use more
#defines here.
I added the #defines for these in the attached patch.
Size of the path array should match the number of levels in the memory
context tree and type is a constant string.
For the name and ident, I have used the existing #define
MEMORY_CONTEXT_IDENT_DISPLAY_SIZE as the size limit.
Any additional statistics that exceed the shared memory capacity
are written to a file per backend in the PG_TEMP_FILES_DIR. The client
backend
first reads from the shared memory, and if necessary, retrieves the
remaining data from the file,
combining everything into a unified view. The files are cleaned up
automatically
if a backend crashes or during server restarts.
Is the addition of the file to write any remaining stats really that
useful? This comes with a heavy cost in the patch with the "in_use"
flag, the various tweaks around the LWLock release/acquire protecting
the shmem area and the extra cleanup steps required after even a clean
restart. That's a lot of facility for this kind of information.
The rationale behind using the file is to cater to the unbounded
number of memory contexts.
The "in_use" flag is used to govern the access to shared memory
as I am reserving enough memory for only one backend.
It ensures that another backend does not overwrite the statistics
in the shared memory, before it is read by a client backend.
Another thing that may be worth considering is to put this information
in a DSM per the variable-size nature of the information, perhaps cap
it to a max to make the memory footprint cheaper, and avoid all
on-disk footprint because we don't need it to begin with as this is
information that makes sense only while the server is running.
Thank you for the suggestion. I will look into using DSMs especially
if there is a way to limit the statistics dump, while still providing a user
with enough information to debug memory consumption.
In this draft, I preferred using a file over DSMs, as a file can provide
ample space for dumping a large number of memory context statistics
without the risk of DSM creation failure due to insufficient memory.
Also, why the single-backend limitation?
To reduce the memory footprint, the shared memory is
created for only one backend.
Each backend has to wait for previous operation
to finish before it can write.
I think a good use case for this would be a background process
periodically running the monitoring function on each of the
backends sequentially to fetch the statistics.
This way there will be little contention for shared memory.
In case the shared memory is not available, a backend immediately
returns from the interrupt handler without blocking its normal
operations.
One could imagine a shared
memory area indexed similarly to pgproc entries, that includes
auxiliary processes as much as backends, so as it can be possible to
get more memory footprints through SQL for more than one single
process at one moment in time. If each backend has its own area of
shmem to deal with, they could use a shared LWLock on the shmem area
with an extra spinlock while the context data is dumped into memory as
the copy is short-lived. Each one of them could save the information
in a DSM created only when a dump of the shmem is requested for a
given PID, for example.
I agree that such an infrastructure would be useful for fetching memory
statistics concurrently without significant synchronization overhead.
However, a drawback of this approach is reserving shared
memory slots up to MAX_BACKENDS without utilizing them
when no concurrent monitoring is happening.
As you mentioned, creating a DSM on the fly when a dump
request is received could help avoid over-allocating shared memory.
I will look into this suggestion.
Thank you for your feedback!
Rahila Syed
On 2024-Oct-21, Rahila Syed wrote:
I propose enhancing memory context statistics reporting by combining
these capabilities and offering a view of memory statistics for all
PostgreSQL backends and auxiliary processes.
Sounds good.
A fixed-size shared memory block, currently accommodating 30 records,
is used to store the statistics.
Hmm, would it make sense to use dynamic shared memory for this? The
publishing backend could dsm_create one DSM chunk of the exact size that
it needs, pass the dsm_handle to the consumer, and then have it be
destroyed once it's been read. That way you don't have to define an
arbitrary limit of any size. (Maybe you could keep a limit to how much
is published in shared memory and spill the rest to disk, but I think
such a limit should be very high[1], so that it's unlikely to take
effect in normal cases.)
[1]: This is very arbitrary of course, but 1 MB gives enough room for
some 7000 contexts, which should cover normal cases.
--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"Find a bug in a program, and fix it, and the program will work today.
Show the program how to find and fix a bug, and the program
will work forever" (Oliver Silfridge)
Hi,
On 2024-10-26 16:14:25 +0200, Alvaro Herrera wrote:
A fixed-size shared memory block, currently accommodating 30 records,
is used to store the statistics.
Hmm, would it make sense to use dynamic shared memory for this?
+1
The publishing backend could dsm_create one DSM chunk of the exact size that
it needs, pass the dsm_handle to the consumer, and then have it be destroyed
once it's been read.
I'd probably just make it a dshash table or such, keyed by the pid, pointing
to a dsa allocation with the stats.
That way you don't have to define an arbitrary limit
of any size. (Maybe you could keep a limit to how much is published in
shared memory and spill the rest to disk, but I think such a limit should be
very high[1], so that it's unlikely to take effect in normal cases.)
[1] This is very arbitrary of course, but 1 MB gives enough room for
some 7000 contexts, which should cover normal cases.
Agreed. I can see a point in a limit for extreme cases, but spilling to disk
doesn't seem particularly useful.
Greetings,
Andres Freund
Hi,
Thank you for the review.
Hmm, would it make sense to use dynamic shared memory for this? The
publishing backend could dsm_create one DSM chunk of the exact size that
it needs, pass the dsm_handle to the consumer, and then have it be
destroyed once it's been read. That way you don't have to define an
arbitrary limit of any size. (Maybe you could keep a limit to how much
is published in shared memory and spill the rest to disk, but I think
such a limit should be very high[1], so that it's unlikely to take
effect in normal cases.)
[1] This is very arbitrary of course, but 1 MB gives enough room for
some 7000 contexts, which should cover normal cases.
I used one DSA area per process to share statistics. Currently,
the size limit for each DSA is 16 MB, which can accommodate
approximately 6,700 MemoryContextInfo structs. Any additional
statistics will spill over to a file. I opted for DSAs over DSMs to
enable memory reuse by freeing segments for subsequent
statistics copies of the same backend, without needing to
recreate DSMs for each request.
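As a quick sanity check on the capacity figures quoted in this thread, the per-context size each estimate implies can be computed directly. This is only arithmetic on the numbers above; demo_bytes_per_entry is a hypothetical helper and says nothing about the actual layout of MemoryContextInfo.

```c
#include <assert.h>
#include <stddef.h>

/* Average bytes available per entry if `entries` records share the limit. */
static size_t
demo_bytes_per_entry(size_t limit_bytes, size_t entries)
{
    assert(entries > 0);
    return limit_bytes / entries;
}
```

With a 16 MB limit and 6,700 entries this gives 2504 bytes per context, while the earlier estimate of some 7000 contexts in 1 MB corresponds to 149 bytes per context. The two estimates therefore assume quite different per-context record sizes.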
The dsa_handle for each process is stored in an array,
indexed by the procNumber, within the shared memory.
The maximum size of this array is defined as the sum of
MaxBackends and the number of auxiliary processes.
As requested earlier, I have renamed the function to
pg_get_process_memory_contexts(pid, get_summary).
Suggestions for a better name are welcome.
When the get_summary argument is set to true, the function provides
statistics for memory contexts up to level 2—that is, the
top memory context and all its children.
Please find attached a rebased patch that includes these changes.
I will work on adding a test for the function and some code refactoring
suggestions.
Thank you,
Rahila Syed
Attachments:
v2-0001-Function-to-report-memory-context-stats-of-any-backe.patch (application/octet-stream)
From da1e991ac34ad026eebf4d6feeddcadf7f0348d9 Mon Sep 17 00:00:00 2001
From: Rahila Syed <rahilasyed.90@gmail.com>
Date: Sun, 15 Sep 2024 17:56:06 +0530
Subject: [PATCH] Function to report memory context stats of any backend
This function sends a signal to a backend to publish
statistics of all its memory contexts. Signal handler
sets a flag, which causes the relevant backend to copy its
MemoryContextStats to a DSA, as part
of next CHECK_FOR_INTERRUPTS().
If there are more than 16MB worth of statistics, it creates
a file and copies remaining stats to that file.
Once it's done, it signals the
client backend using a condition variable. The client backend
then wakes up, reads the shared memory and
returns these values in the form of set of records,
one for each memory context, to the user.
The client backend reads the remaining
statistics from the file if it exists. The client backend
is responsible for deleting the file when it finishes
the reading or the file will get deleted during restarts.
Each backend and auxiliary process has its own slot
for reporting the stats. There is an array of such
memory slots of size MaxBackends + number of auxiliary
processes in fixed shared memory. Each of these slots points
to a DSA, which contains the stats to be shared by the corresponding
process.
Each slot has its own LW lock and condition variable for
synchronization and communication between the
publishing process and the client backend.
---
src/backend/postmaster/autovacuum.c | 4 +
src/backend/postmaster/checkpointer.c | 4 +
src/backend/postmaster/interrupt.c | 4 +
src/backend/postmaster/pgarch.c | 4 +
src/backend/postmaster/startup.c | 4 +
src/backend/postmaster/walsummarizer.c | 4 +
src/backend/storage/ipc/ipci.c | 2 +
src/backend/storage/ipc/procsignal.c | 3 +
src/backend/tcop/postgres.c | 3 +
.../utils/activity/wait_event_names.txt | 1 +
src/backend/utils/adt/mcxtfuncs.c | 310 ++++++++++++-
src/backend/utils/init/globals.c | 1 +
src/backend/utils/mmgr/mcxt.c | 427 +++++++++++++++++-
src/include/access/session.h | 1 +
src/include/catalog/pg_proc.dat | 10 +
src/include/miscadmin.h | 1 +
src/include/storage/procsignal.h | 1 +
src/include/utils/memutils.h | 64 +++
18 files changed, 836 insertions(+), 12 deletions(-)
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index dc3cf87aba..5d01497ada 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -768,6 +768,10 @@ HandleAutoVacLauncherInterrupts(void)
/* Process sinval catchup interrupts that happened while sleeping */
ProcessCatchupInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index 982572a75d..9caf8fa018 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -616,6 +616,10 @@ HandleCheckpointerInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/postmaster/interrupt.c b/src/backend/postmaster/interrupt.c
index eedc0980cf..1107ff6d45 100644
--- a/src/backend/postmaster/interrupt.c
+++ b/src/backend/postmaster/interrupt.c
@@ -48,6 +48,10 @@ HandleMainLoopInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 02f91431f5..467a253ccd 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -865,6 +865,10 @@ HandlePgArchInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
if (ConfigReloadPending)
{
char *archiveLib = pstrdup(XLogArchiveLibrary);
diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c
index ef6f98ebcd..17beb8737d 100644
--- a/src/backend/postmaster/startup.c
+++ b/src/backend/postmaster/startup.c
@@ -192,6 +192,10 @@ HandleStartupProcInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
diff --git a/src/backend/postmaster/walsummarizer.c b/src/backend/postmaster/walsummarizer.c
index 48350bec52..b3e6c2b5f0 100644
--- a/src/backend/postmaster/walsummarizer.c
+++ b/src/backend/postmaster/walsummarizer.c
@@ -876,6 +876,10 @@ HandleWalSummarizerInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 7783ba854f..8816ef6903 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -50,6 +50,7 @@
#include "storage/sinvaladt.h"
#include "utils/guc.h"
#include "utils/injection_point.h"
+#include "utils/memutils.h"
/* GUCs */
int shared_memory_type = DEFAULT_SHARED_MEMORY_TYPE;
@@ -340,6 +341,7 @@ CreateOrAttachShmemStructs(void)
StatsShmemInit();
WaitEventCustomShmemInit();
InjectionPointShmemInit();
+ MemCtxShmemInit();
}
/*
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index 87027f27eb..621726cf03 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -688,6 +688,9 @@ procsignal_sigusr1_handler(SIGNAL_ARGS)
if (CheckProcSignal(PROCSIG_LOG_MEMORY_CONTEXT))
HandleLogMemoryContextInterrupt();
+ if (CheckProcSignal(PROCSIG_GET_MEMORY_CONTEXT))
+ HandleGetMemoryContextInterrupt();
+
if (CheckProcSignal(PROCSIG_PARALLEL_APPLY_MESSAGE))
HandleParallelApplyMessageInterrupt();
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index aac0b96bbc..97368f6b6a 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3500,6 +3500,9 @@ ProcessInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
if (ParallelApplyMessagePending)
HandleParallelApplyMessages();
}
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 16144c2b72..7a27b5f680 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -158,6 +158,7 @@ WAL_RECEIVER_EXIT "Waiting for the WAL receiver to exit."
WAL_RECEIVER_WAIT_START "Waiting for startup process to send initial data for streaming replication."
WAL_SUMMARY_READY "Waiting for a new WAL summary to be generated."
XACT_GROUP_UPDATE "Waiting for the group leader to update transaction status at transaction end."
+MEM_CTX_PUBLISH "Waiting for backend to publish memory information."
ABI_compatibility:
diff --git a/src/backend/utils/adt/mcxtfuncs.c b/src/backend/utils/adt/mcxtfuncs.c
index 6a6634e1cd..7ab435f70f 100644
--- a/src/backend/utils/adt/mcxtfuncs.c
+++ b/src/backend/utils/adt/mcxtfuncs.c
@@ -17,28 +17,26 @@
#include "funcapi.h"
#include "mb/pg_wchar.h"
+#include "miscadmin.h"
+#include "access/twophase.h"
+#include "nodes/pg_list.h"
#include "storage/proc.h"
#include "storage/procarray.h"
#include "utils/array.h"
#include "utils/builtins.h"
#include "utils/hsearch.h"
+#include "utils/memutils.h"
+#include "utils/wait_event_types.h"
+#include "common/file_utils.h"
/* ----------
* The max bytes for showing identifiers of MemoryContext.
* ----------
*/
-#define MEMORY_CONTEXT_IDENT_DISPLAY_SIZE 1024
-/*
- * MemoryContextId
- * Used for storage of transient identifiers for
- * pg_get_backend_memory_contexts.
- */
-typedef struct MemoryContextId
-{
- MemoryContext context;
- int context_id;
-} MemoryContextId;
+struct MemoryContextState *memCtxState = NULL;
+dsa_area *memstats_area = NULL; /* The session-scoped DSA area for memory
+ * stats, (created in this session) */
/*
* int_list_to_array
@@ -305,3 +303,293 @@ pg_log_backend_memory_contexts(PG_FUNCTION_ARGS)
PG_RETURN_BOOL(true);
}
+
+/*
+ * pg_get_process_memory_contexts
+ * Signal a backend or an auxiliary process to send its memory contexts.
+ *
+ * On receipt of this signal, a backend or an auxiliary process sets the flag
+ * in the signal handler, which causes the next CHECK_FOR_INTERRUPTS()
+ * or process-specific interrupt handler to copy the memory context statistics
+ * in a shared memory space. The statistics that do not fit in shared
+ * memory area are copied to a file by the backend.
+ *
+ * Wait for the backend to send signal on the condition variable after
+ * writing statistics to a shared memory and if needed to a temp file.
+ * Once condition variable comes out of sleep check if the required
+ * backends statistics are available to read and display.
+ */
+Datum
+pg_get_process_memory_contexts(PG_FUNCTION_ARGS)
+{
+ int pid = PG_GETARG_INT32(0);
+ bool get_summary = PG_GETARG_BOOL(1);
+ PGPROC *proc;
+ ProcNumber procNumber = INVALID_PROC_NUMBER;
+ int i;
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ MemoryContextParams *mem_stat = NULL;
+ char tmpfilename[MAXPGPATH];
+ FILE *fp = NULL;
+ dsa_area *area;
+ dsa_handle handle;
+ MemoryContextInfo *memctx_info;
+ MemoryContext oldContext;
+
+ InitMaterializedSRF(fcinfo, 0);
+
+ /*
+ * See if the process with given pid is a backend or an auxiliary process.
+ */
+ proc = BackendPidGetProc(pid);
+ if (proc == NULL)
+ proc = AuxiliaryPidGetProc(pid);
+
+ /*
+ * BackendPidGetProc() and AuxiliaryPidGetProc() return NULL if the pid
+ * isn't valid; but by the time we reach kill(), a process for which we
+ * get a valid proc here might have terminated on its own. There's no way
+ * to acquire a lock on an arbitrary process to prevent that. But since
+ * this mechanism is usually used to debug a backend or an auxiliary
+ * process running and consuming lots of memory, that it might end on its
+ * own first and its memory contexts are not logged is not a problem.
+ */
+ if (proc == NULL)
+ {
+ /*
+ * This is just a warning so a loop-through-resultset will not abort
+ * if one backend terminated on its own during the run.
+ */
+ ereport(WARNING,
+ (errmsg("PID %d is not a PostgreSQL server process", pid)));
+ PG_RETURN_BOOL(false);
+ }
+
+ procNumber = GetNumberFromPGProc(proc);
+
+ /* Only request the statistics that fit in memory, if get_summary is true. */
+ LWLockAcquire(&memCtxState[procNumber].lw_lock, LW_EXCLUSIVE);
+ memCtxState[procNumber].get_summary = get_summary;
+
+ /*
+ * Create a DSA segment with maximum size of 16MB, send handle to the
+ * publishing process for storing the stats. The statistics exceeding 16MB
+ * are written to a file
+ */
+ if (memCtxState[procNumber].memstats_dsa_handle == DSA_HANDLE_INVALID)
+ {
+ oldContext = MemoryContextSwitchTo(TopMemoryContext);
+ area = dsa_create_ext(memCtxState[procNumber].lw_lock.tranche, DSA_DEFAULT_INIT_SEGMENT_SIZE,
+ 16 * DSA_DEFAULT_INIT_SEGMENT_SIZE);
+ MemoryContextSwitchTo(oldContext);
+ handle = dsa_get_handle(area);
+ memCtxState[procNumber].memstats_dsa_handle = handle;
+ /* Pin the mapping so that it doesn't throw a warning */
+ dsa_pin(area);
+ dsa_pin_mapping(area);
+ memstats_area = area;
+ }
+ /* Querying stats from a new client backend */
+ else if (memstats_area == NULL)
+ {
+ area = dsa_attach(memCtxState[procNumber].memstats_dsa_handle);
+ }
+ else
+ {
+ area = memstats_area;
+ }
+ LWLockRelease(&memCtxState[procNumber].lw_lock);
+ if (SendProcSignal(pid, PROCSIG_GET_MEMORY_CONTEXT, procNumber) < 0)
+ {
+ ereport(WARNING,
+ (errmsg("could not send signal to process %d: %m", pid)));
+ PG_RETURN_BOOL(false);
+ }
+
+ /*
+ * Wait for a backend to publish stats, indicated by a valid dsa pointer
+ * set by the backend.
+ */
+ while (1)
+ {
+ LWLockAcquire(&memCtxState[procNumber].lw_lock, LW_EXCLUSIVE);
+
+ /*
+ * We expect to come out of sleep when the requested process has
+ * finished publishing the statistics, verified using a valid dsa
+ * pointer.
+ *
+ * Make sure that the information belongs to the pid we requested
+ * information for; otherwise loop back and wait for the server
+ * process to finish publishing statistics.
+ */
+ if (memCtxState[procNumber].proc_id == pid && DsaPointerIsValid(memCtxState[procNumber].memstats_dsa_pointer))
+ break;
+ else
+ LWLockRelease(&memCtxState[procNumber].lw_lock);
+
+ if (ConditionVariableTimedSleep(&memCtxState[procNumber].memctx_cv, 120000,
+ WAIT_EVENT_MEM_CTX_PUBLISH))
+ {
+ ereport(WARNING,
+ (errmsg("Wait for %d process to publish stats timed out, try again", pid)));
+ if (DsaPointerIsValid(memCtxState[procNumber].memstats_dsa_pointer))
+ {
+ dsa_free(area, memCtxState[procNumber].memstats_dsa_pointer);
+ memCtxState[procNumber].memstats_dsa_pointer = InvalidDsaPointer;
+ }
+ return (Datum) 0;
+ }
+ }
+ if (DsaPointerIsValid(memCtxState[procNumber].memstats_dsa_pointer))
+ memctx_info = (MemoryContextInfo *) dsa_get_address(area, memCtxState[procNumber].memstats_dsa_pointer);
+ /* Backend has finished publishing the stats, read them */
+ for (i = 0; i < memCtxState[procNumber].in_memory_stats; i++)
+ {
+ ArrayType *path_array;
+ int path_length;
+ Datum values[PG_GET_BACKEND_MEMORY_CONTEXTS_COLS];
+ bool nulls[PG_GET_BACKEND_MEMORY_CONTEXTS_COLS];
+
+ memset(values, 0, sizeof(values));
+ memset(nulls, 0, sizeof(nulls));
+
+ if (strlen(memctx_info[i].name) != 0)
+ values[0] = CStringGetTextDatum(memctx_info[i].name);
+ else
+ nulls[0] = true;
+ if (strlen(memctx_info[i].ident) != 0)
+ values[1] = CStringGetTextDatum(memctx_info[i].ident);
+ else
+ nulls[1] = true;
+
+ values[2] = CStringGetTextDatum(memctx_info[i].type);
+ path_length = memctx_info[i].path_length;
+ path_array = construct_array_builtin(memctx_info[i].path, path_length, INT4OID);
+ values[3] = PointerGetDatum(path_array);
+ values[4] = Int64GetDatum(memctx_info[i].totalspace);
+ values[5] = Int64GetDatum(memctx_info[i].nblocks);
+ values[6] = Int64GetDatum(memctx_info[i].freespace);
+ values[7] = Int64GetDatum(memctx_info[i].freechunks);
+ values[8] = Int64GetDatum(memctx_info[i].totalspace - memctx_info[i].freespace);
+ values[9] = Int32GetDatum(memCtxState[procNumber].proc_id);
+
+ tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc, values, nulls);
+ }
+ /* DSA free allocation for this client */
+ if (DsaPointerIsValid(memCtxState[procNumber].memstats_dsa_pointer))
+ {
+ /* dsa_free(area, memCtxState[procNumber].memstats_dsa_pointer); */
+ memCtxState[procNumber].memstats_dsa_pointer = InvalidDsaPointer;
+ }
+ /* No more stats to read return */
+ if (memCtxState[procNumber].total_stats == i)
+ {
+ LWLockRelease(&memCtxState[procNumber].lw_lock);
+ ConditionVariableCancelSleep();
+ return (Datum) 0;
+ }
+ /* Compute name for temp mem stat file */
+ snprintf(tmpfilename, MAXPGPATH, "%s/%s.memstats.%d",
+ PG_TEMP_FILES_DIR, PG_TEMP_FILE_PREFIX,
+ memCtxState[procNumber].proc_id);
+ LWLockRelease(&memCtxState[procNumber].lw_lock);
+ ConditionVariableCancelSleep();
+
+ /* Open file */
+ fp = AllocateFile(tmpfilename, PG_BINARY_R);
+ if (!fp)
+ {
+ ereport(WARNING,
+ (errcode_for_file_access(),
+ errmsg("could not read from the file")));
+ return (Datum) 0;
+ }
+ mem_stat = palloc0(sizeof(MemoryContextParams));
+ while (!feof(fp))
+ {
+ int path_length;
+ ArrayType *path_array;
+ Datum values[10];
+ bool nulls[10];
+
+ memset(values, 0, sizeof(values));
+ memset(nulls, 0, sizeof(nulls));
+
+ /* Read stats from file */
+ if (fread(mem_stat, sizeof(MemoryContextParams), 1, fp) != 1)
+ {
+ if (ferror(fp))
+ {
+ elog(WARNING, "File read error");
+ break;
+ }
+ /* EOF reached */
+ break;
+ }
+ path_length = mem_stat->path_length;
+ if (strlen(mem_stat->name) != 0)
+ values[0] = CStringGetTextDatum(mem_stat->name);
+ else
+ nulls[0] = true;
+
+ if (strlen(mem_stat->ident) != 0)
+ values[1] = CStringGetTextDatum(mem_stat->ident);
+ else
+ nulls[1] = true;
+
+ values[2] = CStringGetTextDatum(mem_stat->type);
+
+ path_array = construct_array_builtin(mem_stat->path, path_length, INT4OID);
+ values[3] = PointerGetDatum(path_array);
+ values[4] = Int64GetDatum(mem_stat->totalspace);
+ values[5] = Int64GetDatum(mem_stat->nblocks);
+ values[6] = Int64GetDatum(mem_stat->freespace);
+ values[7] = Int64GetDatum(mem_stat->freechunks);
+ values[8] = Int64GetDatum(mem_stat->totalspace - mem_stat->freespace);
+ LWLockAcquire(&memCtxState->lw_lock, LW_EXCLUSIVE);
+ values[9] = Int32GetDatum(memCtxState->proc_id);
+ LWLockRelease(&memCtxState->lw_lock);
+
+ tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc, values, nulls);
+ }
+
+ pfree(mem_stat);
+ FreeFile(fp);
+ /* Delete the temp file that stores memory stats */
+ unlink(tmpfilename);
+
+ return (Datum) 0;
+}
+
+static Size
+MemCtxShmemSize(void)
+{
+ Size size;
+ Size TotalProcs = add_size(MaxBackends, add_size(NUM_AUXILIARY_PROCS, max_prepared_xacts));
+
+ size = TotalProcs * sizeof(MemoryContextState);
+ return size;
+}
+
+void
+MemCtxShmemInit(void)
+{
+ bool found;
+ Size TotalProcs = add_size(MaxBackends, add_size(NUM_AUXILIARY_PROCS, max_prepared_xacts));
+
+ memCtxState = (MemoryContextState *) ShmemInitStruct("MemoryContextState",
+ MemCtxShmemSize(),
+ &found);
+ if (!found)
+ {
+ for (int i = 0; i < TotalProcs; i++)
+ {
+ ConditionVariableInit(&memCtxState[i].memctx_cv);
+ LWLockInitialize(&memCtxState[i].lw_lock, LWLockNewTrancheId());
+ LWLockRegisterTranche(memCtxState[i].lw_lock.tranche, "mem_context_stats_reporting");
+ memCtxState[i].memstats_dsa_handle = DSA_HANDLE_INVALID;
+ memCtxState[i].memstats_dsa_pointer = InvalidDsaPointer;
+ }
+ }
+}
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index 03a54451ac..7fc600ff7b 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -42,6 +42,7 @@ volatile sig_atomic_t IdleStatsUpdateTimeoutPending = false;
volatile uint32 InterruptHoldoffCount = 0;
volatile uint32 QueryCancelHoldoffCount = 0;
volatile uint32 CritSectionCount = 0;
+volatile sig_atomic_t PublishMemoryContextPending = false;
int MyProcPid;
pg_time_t MyStartTime;
diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index bde54326c6..2505d6b992 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -19,20 +19,28 @@
*-------------------------------------------------------------------------
*/
+#include <math.h>
#include "postgres.h"
+#include "common/file_utils.h"
#include "mb/pg_wchar.h"
#include "miscadmin.h"
+#include "nodes/pg_list.h"
+#include "storage/fd.h"
+#include "storage/lwlock.h"
+#include "storage/dsm.h"
+#include "utils/dsa.h"
+#include "utils/hsearch.h"
#include "utils/memdebug.h"
#include "utils/memutils.h"
#include "utils/memutils_internal.h"
#include "utils/memutils_memorychunk.h"
-
static void BogusFree(void *pointer);
static void *BogusRealloc(void *pointer, Size size, int flags);
static MemoryContext BogusGetChunkContext(void *pointer);
static Size BogusGetChunkSpace(void *pointer);
+static int PublishMemoryContextToFile(MemoryContext context, FILE *fp, List *path, char *clipped_ident);
/*****************************************************************************
* GLOBAL MEMORY *
@@ -166,6 +174,7 @@ static void MemoryContextStatsInternal(MemoryContext context, int level,
static void MemoryContextStatsPrint(MemoryContext context, void *passthru,
const char *stats_string,
bool print_to_stderr);
+static void PublishMemoryContext(MemoryContextInfo * memctx_infos, int curr_id, MemoryContext context, List *path, char *clipped_ident);
/*
* You should not do memory allocations within a critical section, because
@@ -1276,6 +1285,21 @@ HandleLogMemoryContextInterrupt(void)
/* latch will be set by procsignal_sigusr1_handler */
}
+/*
+ * HandleGetMemoryContextInterrupt
+ * Handle receipt of an interrupt indicating publishing of memory
+ * contexts.
+ *
+ * All the actual work is deferred to ProcessGetMemoryContextInterrupt()
+ */
+void
+HandleGetMemoryContextInterrupt(void)
+{
+ InterruptPending = true;
+ PublishMemoryContextPending = true;
+ /* latch will be set by procsignal_sigusr1_handler */
+}
+
/*
* ProcessLogMemoryContextInterrupt
* Perform logging of memory contexts of this backend process.
@@ -1313,6 +1337,407 @@ ProcessLogMemoryContextInterrupt(void)
MemoryContextStatsDetail(TopMemoryContext, 100, 100, false);
}
+/*
+ * Run by each backend to publish their memory context
+ * statistics. It performs a breadth first search
+ * on the memory context tree, so that the parents
+ * get a chance to report stats before their children.
+ *
+ * Statistics are shared via fixed shared memory which
+ * can hold statistics for 29 contexts. The rest of the
+ * statistics are stored in a file. This file is created
+ * in PG_TEMP_FILES_DIR and deleted by the client after
+ * reading the stats.
+ */
+void
+ProcessGetMemoryContextInterrupt(void)
+{
+ /* Store the memory context details in shared memory */
+
+ List *contexts;
+ FILE *fp = NULL;
+ char tmpfilename[MAXPGPATH];
+
+ HASHCTL ctl;
+ HTAB *context_id_lookup;
+ int context_id = 0;
+ bool found;
+ MemoryContext stat_cxt;
+ MemoryContextInfo *meminfo;
+ bool get_summary = false;
+ dsa_area *area;
+ int num_stats;
+ int idx = MyProcNumber;
+ int stats_count = 0;
+
+ PublishMemoryContextPending = false;
+
+ /*
+ * The hash table is used for constructing "path" column of
+ * pg_get_remote_backend_memory_contexts view, similar to its local
+ * backend counterpart.
+ */
+
+ /*
+ * Make a new context that will contain the hash table, to ease the
+ * cleanup
+ */
+
+ stat_cxt = AllocSetContextCreate(CurrentMemoryContext,
+ "Memory context statistics",
+ ALLOCSET_DEFAULT_SIZES);
+
+ ctl.keysize = sizeof(MemoryContext);
+ ctl.entrysize = sizeof(MemoryContextId);
+ ctl.hcxt = stat_cxt;
+
+ context_id_lookup = hash_create("pg_get_remote_backend_memory_contexts",
+ 256,
+ &ctl,
+ HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+
+ contexts = list_make1(TopMemoryContext);
+
+ /*
+ * As in OpenTemporaryFileInTablespace, try to make the temp-file
+ * directory, ignoring errors.
+ */
+ (void) MakePGDirectory(PG_TEMP_FILES_DIR);
+
+ /* Compute the number of stats that can fit in the DSM seg */
+
+ num_stats = floor(16 * DSA_DEFAULT_INIT_SEGMENT_SIZE / sizeof(MemoryContextInfo));
+ /* Attach to DSA segment */
+ LWLockAcquire(&memCtxState[idx].lw_lock, LW_EXCLUSIVE);
+ area = dsa_attach(memCtxState[idx].memstats_dsa_handle);
+ memCtxState[idx].proc_id = MyProcPid;
+ get_summary = memCtxState[idx].get_summary;
+
+ /*
+ * Traverse the memory context tree to find total number of contexts. If
+ * summary is requested find the total number of contexts at level 1 and
+ * 2.
+ */
+ foreach_ptr(MemoryContextData, cur, contexts)
+ {
+ MemoryContextId *entry;
+ List *path = NIL;
+
+ entry = (MemoryContextId *) hash_search(context_id_lookup, &cur,
+ HASH_ENTER, &found);
+ entry->context_id = context_id;
+
+ stats_count = stats_count + 1;
+ /* Append the children of the current context to the main list */
+ for (MemoryContext c = cur->firstchild; c != NULL; c = c->nextchild)
+ contexts = lappend(contexts, c);
+
+ if (!get_summary)
+ continue;
+
+ /*
+ * Figure out the transient context_id of this context and each of its
+ * ancestors.
+ */
+ for (MemoryContext cur_context = cur; cur_context != NULL; cur_context = cur_context->parent)
+ {
+ MemoryContextId *cur_entry;
+
+ cur_entry = hash_search(context_id_lookup, &cur_context, HASH_FIND, &found);
+
+ if (!found)
+ {
+ elog(LOG, "hash table corrupted, can't construct path value");
+ break;
+ }
+ path = lcons_int(cur_entry->context_id, path);
+ }
+ if (list_length(path) == 3)
+ {
+ stats_count = stats_count - 1;
+ break;
+ }
+ }
+
+ /*
+ * Allocate memory in this process's dsa for storing statistics for all
+ * the memory contexts.
+ */
+ memCtxState[idx].memstats_dsa_pointer = dsa_allocate(area, stats_count * sizeof(MemoryContextInfo));
+ meminfo = (MemoryContextInfo *) dsa_get_address(area, memCtxState[idx].memstats_dsa_pointer);
+
+ foreach_ptr(MemoryContextData, cur, contexts)
+ {
+ MemoryContextId *entry;
+ List *path = NIL;
+ char clipped_ident[MEMORY_CONTEXT_IDENT_DISPLAY_SIZE];
+
+ entry = (MemoryContextId *) hash_search(context_id_lookup, &cur,
+ HASH_ENTER, &found);
+ entry->context_id = context_id;
+
+ /*
+ * Figure out the transient context_id of this context and each of its
+ * ancestors.
+ */
+ for (MemoryContext cur_context = cur; cur_context != NULL; cur_context = cur_context->parent)
+ {
+ MemoryContextId *cur_entry;
+
+ cur_entry = hash_search(context_id_lookup, &cur_context, HASH_FIND, &found);
+
+ if (!found)
+ {
+ elog(LOG, "hash table corrupted, can't construct path value");
+ break;
+ }
+ path = lcons_int(cur_entry->context_id, path);
+ }
+ /* Trim and copy the identifier if it is not set to NULL */
+ if (cur->ident != NULL)
+ {
+ int idlen = strlen(cur->ident);
+
+ /*
+ * Some identifiers such as SQL query string can be very long,
+ * truncate oversize identifiers.
+ */
+ if (idlen >= MEMORY_CONTEXT_IDENT_DISPLAY_SIZE)
+ idlen = pg_mbcliplen(cur->ident, idlen, MEMORY_CONTEXT_IDENT_DISPLAY_SIZE - 1);
+
+ memcpy(clipped_ident, cur->ident, idlen);
+ clipped_ident[idlen] = '\0';
+ }
+ if (context_id <= (num_stats - 1))
+ {
+ /* Copy statistics to DSM memory */
+ PublishMemoryContext(meminfo, context_id, cur, path, (cur->ident != NULL ? clipped_ident : NULL));
+ }
+ else
+ {
+ if (PublishMemoryContextToFile(cur, fp, path, (cur->ident != NULL ? clipped_ident : NULL)) == -1)
+ break;
+ }
+ /* Display information up to level 2 for summary */
+ if (get_summary && list_length(path) == 3)
+ {
+ memCtxState[idx].in_memory_stats = context_id;
+ LWLockRelease(&memCtxState[idx].lw_lock);
+ break;
+ }
+
+ /*
+ * DSA max limit is reached, release lock and write to file from next
+ * iteration if there are more statistics to report.
+ */
+ context_id++;
+ if (context_id == (num_stats - 1) && context_id < stats_count)
+ {
+ memCtxState[idx].in_memory_stats = context_id;
+ LWLockRelease(&memCtxState[idx].lw_lock);
+ /* Construct name for temp file */
+ snprintf(tmpfilename, MAXPGPATH, "%s/%s.memstats.%d",
+ PG_TEMP_FILES_DIR, PG_TEMP_FILE_PREFIX,
+ MyProcPid);
+ /* Open file to copy rest of the stats in the file */
+ fp = AllocateFile(tmpfilename, PG_BINARY_A);
+
+ if (fp == NULL)
+ break;
+ }
+ }
+ if (context_id < (num_stats - 1) && !get_summary)
+ {
+ memCtxState[idx].in_memory_stats = context_id;
+ LWLockRelease(&memCtxState[idx].lw_lock);
+ }
+
+ /* Delete the hash table memory context */
+ MemoryContextDelete(stat_cxt);
+
+ /*
+ * Signal the waiting client backend after setting the exit condition flag
+ */
+ LWLockAcquire(&memCtxState[idx].lw_lock, LW_EXCLUSIVE);
+ memCtxState[idx].total_stats = context_id;
+ LWLockRelease(&memCtxState[idx].lw_lock);
+ ConditionVariableBroadcast(&memCtxState[idx].memctx_cv);
+
+ /* Release file */
+ if (fp && FreeFile(fp))
+ {
+ ereport(LOG,
+ (errcode_for_file_access(),
+ errmsg("could not free file \"%s\": %m", tmpfilename)));
+ }
+ dsa_detach(area);
+}
+
+static void
+PublishMemoryContext(MemoryContextInfo * memctx_info, int curr_id, MemoryContext context, List *path, char *clipped_ident)
+{
+ MemoryContextCounters stat;
+ char *type;
+
+ if (context->name != NULL)
+ {
+ Assert(strlen(context->name) < MEMORY_CONTEXT_IDENT_DISPLAY_SIZE);
+ strncpy(memctx_info[curr_id].name, context->name, strlen(context->name));
+ }
+ else
+ memctx_info[curr_id].name[0] = '\0';
+
+ if (clipped_ident != NULL)
+ {
+ /*
+ * To be consistent with logging output, we label dynahash contexts
+ * with just the hash table name as with MemoryContextStatsPrint().
+ */
+ if (!strncmp(context->name, "dynahash", 8))
+ {
+ strncpy(memctx_info[curr_id].name, clipped_ident, strlen(clipped_ident));
+ memctx_info[curr_id].ident[0] = '\0';
+ }
+ else
+ strncpy(memctx_info[curr_id].ident, clipped_ident, strlen(clipped_ident));
+ }
+ else
+ memctx_info[curr_id].ident[0] = '\0';
+
+ memctx_info[curr_id].path_length = list_length(path);
+ foreach_int(i, path)
+ memctx_info[curr_id].path[foreach_current_index(i)] = Int32GetDatum(i);
+
+ /* Examine the context stats */
+ memset(&stat, 0, sizeof(stat));
+ (*context->methods->stats) (context, NULL, NULL, &stat, true);
+
+ switch (context->type)
+ {
+ case T_AllocSetContext:
+ type = "AllocSet";
+ strncpy(memctx_info[curr_id].type, type, strlen(type));
+ break;
+ case T_GenerationContext:
+ type = "Generation";
+ strncpy(memctx_info[curr_id].type, type, strlen(type));
+ break;
+ case T_SlabContext:
+ type = "Slab";
+ strncpy(memctx_info[curr_id].type, type, strlen(type));
+ break;
+ case T_BumpContext:
+ type = "Bump";
+ strncpy(memctx_info[curr_id].type, type, strlen(type));
+ break;
+ default:
+ type = "???";
+ strncpy(memctx_info[curr_id].type, type, strlen(type));
+ break;
+ }
+ memctx_info[curr_id].totalspace = stat.totalspace;
+ memctx_info[curr_id].nblocks = stat.nblocks;
+ memctx_info[curr_id].freespace = stat.freespace;
+ memctx_info[curr_id].freechunks = stat.freechunks;
+}
+
+static int
+PublishMemoryContextToFile(MemoryContext context, FILE *fp, List *path, char *clipped_ident)
+{
+ MemoryContextCounters stat;
+ MemoryContextParams *mem_stat;
+ char *type;
+
+ mem_stat = palloc0(sizeof(MemoryContextParams));
+
+ /*
+ * Assuming the context name will not exceed context identifier display
+ * size XXX Reduce the limit for name length to correctly reflect
+ * practical examples XXX Add handling similar to clipped_ident of name
+ * exceeds the size limit
+ */
+ if (context->name != NULL)
+ {
+ Assert(strlen(context->name) < MEMORY_CONTEXT_IDENT_DISPLAY_SIZE);
+ strncpy(mem_stat->name, context->name, strlen(context->name));
+ }
+ else
+ mem_stat->name[0] = '\0';
+
+ if (clipped_ident != NULL)
+ {
+ /*
+ * To be consistent with logging output, we label dynahash contexts
+ * with just the hash table name as with MemoryContextStatsPrint().
+ */
+ if (!strncmp(context->name, "dynahash", 8))
+ {
+ strncpy(mem_stat->name, clipped_ident, strlen(clipped_ident));
+ mem_stat->ident[0] = '\0';
+ }
+ else
+ strncpy(mem_stat->ident, clipped_ident, strlen(clipped_ident));
+ }
+ else
+ mem_stat->ident[0] = '\0';
+
+ mem_stat->path_length = list_length(path);
+ foreach_int(i, path)
+ mem_stat->path[foreach_current_index(i)] = Int32GetDatum(i);
+
+ /* Examine the context itself */
+ memset(&stat, 0, sizeof(stat));
+ (*context->methods->stats) (context, NULL, NULL, &stat, true);
+
+ switch (context->type)
+ {
+ case T_AllocSetContext:
+ type = "AllocSet";
+ strncpy(mem_stat->type, type, strlen(type));
+ break;
+ case T_GenerationContext:
+ type = "Generation";
+ strncpy(mem_stat->type, type, strlen(type));
+ break;
+ case T_SlabContext:
+ type = "Slab";
+ strncpy(mem_stat->type, type, strlen(type));
+ break;
+ case T_BumpContext:
+ type = "Bump";
+ strncpy(mem_stat->type, type, strlen(type));
+ break;
+ default:
+ type = "???";
+ strncpy(mem_stat->type, type, strlen(type));
+ break;
+ }
+ mem_stat->totalspace = stat.totalspace;
+ mem_stat->nblocks = stat.nblocks;
+ mem_stat->freespace = stat.freespace;
+ mem_stat->freechunks = stat.freechunks;
+
+ if (!fp)
+ {
+ ereport(LOG,
+ (errcode_for_file_access(),
+ errmsg("could not create file")));
+ pfree(mem_stat);
+ return -1;
+ }
+ if (fwrite(mem_stat, sizeof(MemoryContextParams), 1, fp) != 1)
+ {
+ ereport(LOG,
+ (errcode_for_file_access(),
+ errmsg("could not write to file")));
+ pfree(mem_stat);
+ return -1;
+ }
+ pfree(mem_stat);
+
+ return 0;
+}
+
void *
palloc(Size size)
{
diff --git a/src/include/access/session.h b/src/include/access/session.h
index ce48449a87..f1b08555fa 100644
--- a/src/include/access/session.h
+++ b/src/include/access/session.h
@@ -31,6 +31,7 @@ typedef struct Session
struct SharedRecordTypmodRegistry *shared_typmod_registry;
dshash_table *shared_record_table;
dshash_table *shared_typmod_table;
+
} Session;
extern void InitializeSession(void);
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index cbbe8acd38..b205c54710 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8436,6 +8436,16 @@
prorettype => 'bool', proargtypes => 'int4',
prosrc => 'pg_log_backend_memory_contexts' },
+# publishing memory contexts of the specified backend
+{ oid => '2173', descr => 'publish memory contexts of the specified backend',
+ proname => 'pg_get_process_memory_contexts', provolatile => 'v',
+ prorows => '100', proretset => 't', proparallel => 'r',
+ prorettype => 'record', proargtypes => 'int4 bool',
+ proallargtypes => '{int4,bool,text,text,text,_int4,int4,int4,int4,int4,int4,int4}',
+ proargmodes => '{i,i,o,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{oid, summary, name, ident, type, path, total_bytes, total_nblocks, free_bytes, free_chunks, used_bytes, pid}',
+ prosrc => 'pg_get_process_memory_contexts' },
+
# non-persistent series generator
{ oid => '1066', descr => 'non-persistent series generator',
proname => 'generate_series', prorows => '1000',
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 8ca98f65b2..0835d8d552 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -96,6 +96,7 @@ extern PGDLLIMPORT volatile sig_atomic_t IdleSessionTimeoutPending;
extern PGDLLIMPORT volatile sig_atomic_t ProcSignalBarrierPending;
extern PGDLLIMPORT volatile sig_atomic_t LogMemoryContextPending;
extern PGDLLIMPORT volatile sig_atomic_t IdleStatsUpdateTimeoutPending;
+extern PGDLLIMPORT volatile sig_atomic_t PublishMemoryContextPending;
extern PGDLLIMPORT volatile sig_atomic_t CheckClientConnectionPending;
extern PGDLLIMPORT volatile sig_atomic_t ClientConnectionLost;
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index 221073def3..8cbf6e201c 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -35,6 +35,7 @@ typedef enum
PROCSIG_WALSND_INIT_STOPPING, /* ask walsenders to prepare for shutdown */
PROCSIG_BARRIER, /* global barrier interrupt */
PROCSIG_LOG_MEMORY_CONTEXT, /* ask backend to log the memory contexts */
+ PROCSIG_GET_MEMORY_CONTEXT, /* ask backend to publish the memory contexts */
PROCSIG_PARALLEL_APPLY_MESSAGE, /* Message from parallel apply workers */
/* Recovery conflict reasons */
diff --git a/src/include/utils/memutils.h b/src/include/utils/memutils.h
index 3590c8bad9..196da8623f 100644
--- a/src/include/utils/memutils.h
+++ b/src/include/utils/memutils.h
@@ -18,6 +18,9 @@
#define MEMUTILS_H
#include "nodes/memnodes.h"
+#include "storage/condition_variable.h"
+#include "storage/lmgr.h"
+#include "utils/dsa.h"
/*
@@ -48,7 +51,11 @@
#define AllocHugeSizeIsValid(size) ((Size) (size) <= MaxAllocHugeSize)
+#define MEMORY_CONTEXT_IDENT_DISPLAY_SIZE 1024
+#define MEM_CONTEXT_SHMEM_STATS_SIZE 30
+#define MEM_CONTEXT_MAX_LEVEL 64
+#define MAX_TYPE_STRING_LENGTH 64
/*
* Standard top-level memory contexts.
*
@@ -115,6 +122,62 @@ extern MemoryContext AllocSetContextCreateInternal(MemoryContext parent,
Size initBlockSize,
Size maxBlockSize);
+/* Dynamic shared memory state for Memory Context Statistics reporting */
+typedef struct MemoryContextInfo
+{
+ char name[MEMORY_CONTEXT_IDENT_DISPLAY_SIZE];
+ char ident[MEMORY_CONTEXT_IDENT_DISPLAY_SIZE];
+ Datum path[MEM_CONTEXT_MAX_LEVEL];
+ char type[MAX_TYPE_STRING_LENGTH];
+ int path_length;
+ int64 totalspace;
+ int64 nblocks;
+ int64 freespace;
+ int64 freechunks;
+} MemoryContextInfo;
+
+typedef struct MemoryContextState
+{
+ ConditionVariable memctx_cv;
+ LWLock lw_lock;
+ int proc_id;
+ int in_memory_stats;
+ int total_stats;
+ bool get_summary;
+ dsa_handle memstats_dsa_handle;
+ dsa_pointer memstats_dsa_pointer;
+} MemoryContextState;
+
+/* Backend local struct used to write statistics to a file */
+typedef struct MemoryContextParams
+{
+ char name[MEMORY_CONTEXT_IDENT_DISPLAY_SIZE];
+ char ident[MEMORY_CONTEXT_IDENT_DISPLAY_SIZE];
+ char type[MAX_TYPE_STRING_LENGTH];
+ Datum path[MEM_CONTEXT_MAX_LEVEL];
+ int path_length;
+ int64 totalspace;
+ int64 nblocks;
+ int64 freespace;
+ int64 freechunks;
+} MemoryContextParams;
+
+/*
+ * MemoryContextId
+ * Used for storage of transient identifiers for
+ * pg_get_backend_memory_contexts.
+ */
+typedef struct MemoryContextId
+{
+ MemoryContext context;
+ int context_id;
+} MemoryContextId;
+
+extern PGDLLIMPORT MemoryContextState * memCtxState;
+extern void ProcessGetMemoryContextInterrupt(void);
+extern void HandleGetMemoryContextInterrupt(void);
+extern void MemCtxShmemInit(void);
+
/*
* This wrapper macro exists to check for non-constant strings used as context
* names; that's no longer supported. (Use MemoryContextSetIdentifier if you
@@ -205,5 +268,6 @@ pg_memory_is_all_zeros(const void *ptr, size_t len)
}
return true;
}
+#define MAX_NUM_MEM_STATS (DSA_MAX_SEGMENT_SIZE / sizeof(MemoryContextInfo))
#endif /* MEMUTILS_H */
--
2.34.1
On Wed, Nov 13, 2024 at 01:00:52PM +0530, Rahila Syed wrote:
I used one DSA area per process to share statistics. Currently,
the size limit for each DSA is 16 MB, which can accommodate
approximately 6,700 MemoryContextInfo structs. Any additional
statistics will spill over to a file. I opted for DSAs over DSMs to
enable memory reuse by freeing segments for subsequent
statistics copies of the same backend, without needing to
recreate DSMs for each request.
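As a rough check on the capacity figure quoted above, the following sketch mirrors the MemoryContextInfo layout from the patch and divides a 16 MB area by the struct size. The names MemoryContextInfoSketch and memstats_capacity are hypothetical, uintptr_t is an assumed stand-in for Datum, and DSA bookkeeping overhead is ignored, so the real number would be somewhat lower. On a typical 64-bit build the struct is a little over 2.6 kB, which puts the result in the low six thousands.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the patch's constants; values copied from the diff. */
#define IDENT_DISPLAY_SIZE 1024
#define MAX_LEVEL 64
#define TYPE_STRING_LENGTH 64

/*
 * Hypothetical mirror of MemoryContextInfo. uintptr_t is assumed to have
 * the same width as PostgreSQL's Datum.
 */
typedef struct MemoryContextInfoSketch
{
    char        name[IDENT_DISPLAY_SIZE];
    char        ident[IDENT_DISPLAY_SIZE];
    uintptr_t   path[MAX_LEVEL];
    char        type[TYPE_STRING_LENGTH];
    int         path_length;
    int64_t     totalspace;
    int64_t     nblocks;
    int64_t     freespace;
    int64_t     freechunks;
} MemoryContextInfoSketch;

/* Number of whole entries that fit in an area of area_bytes bytes. */
static size_t
memstats_capacity(size_t area_bytes)
{
    return area_bytes / sizeof(MemoryContextInfoSketch);
}
```

Calling memstats_capacity((size_t) 16 * 1024 * 1024) gives the per-process entry limit under these assumptions; most of each entry is taken up by the two 1024-byte string buffers.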
Already mentioned previously at [1] and echoing with some surrounding
arguments, but I'd suggest to keep it simple and just remove entirely
the part of the patch where the stats information gets spilled into
disk. With more than 6000-ish context information available with a
hard limit in place, there should be plenty enough to know what's
going on anyway.
[1]: /messages/by-id/ZxdKx0DywUTAvkEF@paquier.xyz
--
Michael
On 2024-Nov-14, Michael Paquier wrote:
Already mentioned previously at [1] and echoing with some surrounding
arguments, but I'd suggest to keep it simple and just remove entirely
the part of the patch where the stats information gets spilled into
disk. With more than 6000-ish context information available with a
hard limit in place, there should be plenty enough to know what's
going on anyway.
Functionally-wise I don't necessarily agree with _removing_ the spill
code, considering that production systems with thousands of tables would
easily reach that number of contexts (each index gets its own index info
context, each regexp gets its own memcxt); and I don't think silently
omitting a fraction of people's memory situation (or erroring out if the
case is hit) is going to make us any friends.
That said, it worries me that we choose a shared memory size so large
that it becomes impractical to hit the spill-to-disk code in regression
testing. Maybe we can choose a much smaller limit size when
USE_ASSERT_CHECKING is enabled, and use a test that hits that number?
That way, we know the code is being hit and tested, without imposing a
huge memory consumption on test machines.
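The idea of a smaller limit in assert-enabled builds could be sketched as below. This is only an illustration: MEMSTATS_AREA_LIMIT, the 64 kB test value, and both helper functions are hypothetical names and numbers, not part of the patch; only USE_ASSERT_CHECKING and the 16 MB production limit come from the thread.

```c
#include <stddef.h>

/*
 * Hypothetical limit for the shared statistics area. In assert-enabled
 * builds a small value is assumed so that regression tests reach the
 * overflow path with only a few dozen contexts.
 */
#ifdef USE_ASSERT_CHECKING
#define MEMSTATS_AREA_LIMIT ((size_t) 64 * 1024)
#else
#define MEMSTATS_AREA_LIMIT ((size_t) 16 * 1024 * 1024)
#endif

/* Number of whole entries of entry_size bytes that fit under the limit. */
static size_t
memstats_entries_in_area(size_t entry_size)
{
    return MEMSTATS_AREA_LIMIT / entry_size;
}

/* Returns 1 if ncontexts entries would overflow the shared area, else 0. */
static int
memstats_needs_spill(size_t ncontexts, size_t entry_size)
{
    return ncontexts > memstats_entries_in_area(entry_size);
}
```

With an entry size of roughly 2.6 kB, the assumed 64 kB test limit holds only about two dozen entries, so an ordinary backend would exceed it without any special setup.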
--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"Tiene valor aquel que admite que es un cobarde" (Fernandel)
Hi,
On Thu, Nov 14, 2024 at 5:18 PM Alvaro Herrera <alvherre@alvh.no-ip.org>
wrote:
On 2024-Nov-14, Michael Paquier wrote:
Already mentioned previously at [1] and echoing with some surrounding
arguments, but I'd suggest to keep it simple and just remove entirely
the part of the patch where the stats information gets spilled into
disk. With more than 6000-ish context information available with a
hard limit in place, there should be plenty enough to know what's
going on anyway.
Functionally-wise I don't necessarily agree with _removing_ the spill
code, considering that production systems with thousands of tables would
easily reach that number of contexts (each index gets its own index info
context, each regexp gets its own memcxt); and I don't think silently
omitting a fraction of people's memory situation (or erroring out if the
case is hit) is going to make us any friends.
While I agree that removing the spill-to-file logic will simplify the code,
I also understand the rationale for retaining it to ensure completeness.
To achieve both completeness and avoid writing to a file, I can consider
displaying the numbers for the remaining contexts as a cumulative total
at the end of the output.
Something like follows:
```
postgres=# select * from pg_get_process_memory_contexts('237244', false);
             name             | ident |   type   | path  | total_bytes | total_nblocks | free_bytes | free_chunks | used_bytes |  pid
------------------------------+-------+----------+-------+-------------+---------------+------------+-------------+------------+--------
 TopMemoryContext             |       | AllocSet | {0}   |       97696 |             5 |      14288 |          11 |      83408 | 237244
 search_path processing cache |       | AllocSet | {0,1} |        8192 |             1 |       5328 |           7 |       2864 | 237244
*Remaining contexts total: 23456 bytes (total_bytes), 12345 (used_bytes), 11,111 (free_bytes)*
```
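The cumulative-total idea can be sketched as follows: entries up to a fixed limit are reported individually, and everything past the limit is folded into one summary. CtxStats, RemainingTotals, and summarize_remaining are hypothetical names for illustration and do not appear in the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-context counters, reduced to the fields being summed. */
typedef struct CtxStats
{
    int64_t     totalspace;
    int64_t     freespace;
} CtxStats;

/* Aggregate for all contexts that did not fit under the limit. */
typedef struct RemainingTotals
{
    size_t      ncontexts;
    int64_t     total_bytes;
    int64_t     free_bytes;
    int64_t     used_bytes;
} RemainingTotals;

/*
 * Sum the counters of stats[limit .. nstats-1]. Entries below limit are
 * assumed to have been reported individually by the caller.
 */
static RemainingTotals
summarize_remaining(const CtxStats *stats, size_t nstats, size_t limit)
{
    RemainingTotals r = {0, 0, 0, 0};

    for (size_t i = limit; i < nstats; i++)
    {
        r.ncontexts++;
        r.total_bytes += stats[i].totalspace;
        r.free_bytes += stats[i].freespace;
    }
    r.used_bytes = r.total_bytes - r.free_bytes;
    return r;
}
```

A single aggregate keeps the output bounded without a temp file, at the cost of losing per-context detail for the overflow entries.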
That said, it worries me that we choose a shared memory size so large
that it becomes impractical to hit the spill-to-disk code in regression
testing. Maybe we can choose a much smaller limit size when
USE_ASSERT_CHECKING is enabled, and use a test that hits that number?
That way, we know the code is being hit and tested, without imposing a
huge memory consumption on test machines.
Makes sense. I will look into writing such a test, if we finalize the
approach of spill-to-file.
Please find attached a rebased and updated patch with a basic test
and some fixes. Kindly let me know your thoughts.
Thank you,
Rahila Syed
Attachments:
v3-0001-Function-to-report-memory-context-stats-of-any-backe.patch (application/octet-stream)
From 51c5467bab94c3f2d49e9eccbab86b3f7da74bb2 Mon Sep 17 00:00:00 2001
From: Rahila Syed <rahilasyed.90@gmail.com>
Date: Sun, 15 Sep 2024 17:56:06 +0530
Subject: [PATCH] Function to report memory context stats of any backend
This function sends a signal to a backend to publish
statistics of all its memory contexts. The signal handler
sets a flag, which causes the relevant backend to copy its
MemoryContextStats to a DSA as part of the next
CHECK_FOR_INTERRUPTS().
If there are more than 16MB worth of statistics, it creates
a file and copies the remaining stats to that file.
Once it is done, it signals the
client backend using a condition variable. The client backend
then wakes up, reads the shared memory, and
returns these values to the user as a set of records,
one for each memory context.
The client backend reads the remaining
statistics from the file if it exists. The client backend
is responsible for deleting the file when it finishes
reading; otherwise the file is deleted during restarts.
Each backend and auxiliary process has its own slot
for reporting the stats. There is an array of such
memory slots, of size MaxBackends plus the number of auxiliary
processes, in fixed shared memory. Each of these slots points
to a DSA, which contains the stats to be shared by the corresponding
process.
Each slot has its own LWLock and condition variable for
synchronization and communication between the
publishing process and the client backend.
---
src/backend/postmaster/autovacuum.c | 4 +
src/backend/postmaster/checkpointer.c | 4 +
src/backend/postmaster/interrupt.c | 4 +
src/backend/postmaster/pgarch.c | 4 +
src/backend/postmaster/startup.c | 4 +
src/backend/postmaster/walsummarizer.c | 4 +
src/backend/storage/ipc/ipci.c | 2 +
src/backend/storage/ipc/procsignal.c | 3 +
src/backend/tcop/postgres.c | 3 +
.../utils/activity/wait_event_names.txt | 1 +
src/backend/utils/adt/mcxtfuncs.c | 318 ++++++++++++-
src/backend/utils/init/globals.c | 1 +
src/backend/utils/mmgr/mcxt.c | 428 +++++++++++++++++-
src/include/access/session.h | 1 +
src/include/catalog/pg_proc.dat | 10 +
src/include/miscadmin.h | 1 +
src/include/storage/procsignal.h | 1 +
src/include/utils/memutils.h | 65 +++
src/test/regress/expected/sysviews.out | 12 +
src/test/regress/sql/sysviews.sql | 12 +
20 files changed, 870 insertions(+), 12 deletions(-)
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index dc3cf87aba..5d01497ada 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -768,6 +768,10 @@ HandleAutoVacLauncherInterrupts(void)
/* Process sinval catchup interrupts that happened while sleeping */
ProcessCatchupInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index 982572a75d..9caf8fa018 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -616,6 +616,10 @@ HandleCheckpointerInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/postmaster/interrupt.c b/src/backend/postmaster/interrupt.c
index eedc0980cf..1107ff6d45 100644
--- a/src/backend/postmaster/interrupt.c
+++ b/src/backend/postmaster/interrupt.c
@@ -48,6 +48,10 @@ HandleMainLoopInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 02f91431f5..467a253ccd 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -865,6 +865,10 @@ HandlePgArchInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
if (ConfigReloadPending)
{
char *archiveLib = pstrdup(XLogArchiveLibrary);
diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c
index ef6f98ebcd..17beb8737d 100644
--- a/src/backend/postmaster/startup.c
+++ b/src/backend/postmaster/startup.c
@@ -192,6 +192,10 @@ HandleStartupProcInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
diff --git a/src/backend/postmaster/walsummarizer.c b/src/backend/postmaster/walsummarizer.c
index 48350bec52..b3e6c2b5f0 100644
--- a/src/backend/postmaster/walsummarizer.c
+++ b/src/backend/postmaster/walsummarizer.c
@@ -876,6 +876,10 @@ HandleWalSummarizerInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 7783ba854f..8816ef6903 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -50,6 +50,7 @@
#include "storage/sinvaladt.h"
#include "utils/guc.h"
#include "utils/injection_point.h"
+#include "utils/memutils.h"
/* GUCs */
int shared_memory_type = DEFAULT_SHARED_MEMORY_TYPE;
@@ -340,6 +341,7 @@ CreateOrAttachShmemStructs(void)
StatsShmemInit();
WaitEventCustomShmemInit();
InjectionPointShmemInit();
+ MemCtxShmemInit();
}
/*
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index 87027f27eb..621726cf03 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -688,6 +688,9 @@ procsignal_sigusr1_handler(SIGNAL_ARGS)
if (CheckProcSignal(PROCSIG_LOG_MEMORY_CONTEXT))
HandleLogMemoryContextInterrupt();
+ if (CheckProcSignal(PROCSIG_GET_MEMORY_CONTEXT))
+ HandleGetMemoryContextInterrupt();
+
if (CheckProcSignal(PROCSIG_PARALLEL_APPLY_MESSAGE))
HandleParallelApplyMessageInterrupt();
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 184b830168..4436b885d9 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3500,6 +3500,9 @@ ProcessInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
if (ParallelApplyMessagePending)
HandleParallelApplyMessages();
}
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 16144c2b72..7a27b5f680 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -158,6 +158,7 @@ WAL_RECEIVER_EXIT "Waiting for the WAL receiver to exit."
WAL_RECEIVER_WAIT_START "Waiting for startup process to send initial data for streaming replication."
WAL_SUMMARY_READY "Waiting for a new WAL summary to be generated."
XACT_GROUP_UPDATE "Waiting for the group leader to update transaction status at transaction end."
+MEM_CTX_PUBLISH "Waiting for backend to publish memory information."
ABI_compatibility:
diff --git a/src/backend/utils/adt/mcxtfuncs.c b/src/backend/utils/adt/mcxtfuncs.c
index 6a6634e1cd..6176f18be3 100644
--- a/src/backend/utils/adt/mcxtfuncs.c
+++ b/src/backend/utils/adt/mcxtfuncs.c
@@ -17,28 +17,26 @@
#include "funcapi.h"
#include "mb/pg_wchar.h"
+#include "miscadmin.h"
+#include "access/twophase.h"
+#include "nodes/pg_list.h"
#include "storage/proc.h"
#include "storage/procarray.h"
#include "utils/array.h"
#include "utils/builtins.h"
#include "utils/hsearch.h"
+#include "utils/memutils.h"
+#include "utils/wait_event_types.h"
+#include "common/file_utils.h"
/* ----------
* The max bytes for showing identifiers of MemoryContext.
* ----------
*/
-#define MEMORY_CONTEXT_IDENT_DISPLAY_SIZE 1024
-/*
- * MemoryContextId
- * Used for storage of transient identifiers for
- * pg_get_backend_memory_contexts.
- */
-typedef struct MemoryContextId
-{
- MemoryContext context;
- int context_id;
-} MemoryContextId;
+struct MemoryContextState *memCtxState = NULL;
+dsa_area *memstats_area = NULL; /* The session-scoped DSA area for memory
+ * stats, (created in this session) */
/*
* int_list_to_array
@@ -305,3 +303,301 @@ pg_log_backend_memory_contexts(PG_FUNCTION_ARGS)
PG_RETURN_BOOL(true);
}
+
+/*
+ * pg_get_process_memory_contexts
+ * Signal a backend or an auxiliary process to send its memory contexts.
+ *
+ * On receipt of this signal, a backend or an auxiliary process sets the flag
+ * in the signal handler, which causes the next CHECK_FOR_INTERRUPTS()
+ * or process-specific interrupt handler to copy the memory context statistics
+ * into a shared memory space. The statistics that do not fit in the shared
+ * memory area are copied to a file by the backend.
+ *
+ * Wait for the backend to signal the condition variable after writing
+ * statistics to shared memory and, if needed, to a temp file.
+ * Once the wait on the condition variable ends, check whether the requested
+ * backend's statistics are available to read and display.
+ */
+Datum
+pg_get_process_memory_contexts(PG_FUNCTION_ARGS)
+{
+ int pid = PG_GETARG_INT32(0);
+ bool get_summary = PG_GETARG_BOOL(1);
+ PGPROC *proc;
+ ProcNumber procNumber = INVALID_PROC_NUMBER;
+ int i;
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ MemoryContextParams *mem_stat = NULL;
+ char tmpfilename[MAXPGPATH];
+ FILE *fp = NULL;
+ dsa_area *area;
+ dsa_handle handle;
+ MemoryContextInfo *memctx_info;
+ MemoryContext oldContext;
+
+ InitMaterializedSRF(fcinfo, 0);
+
+ /*
+ * See if the process with given pid is a backend or an auxiliary process.
+ */
+ proc = BackendPidGetProc(pid);
+ if (proc == NULL)
+ proc = AuxiliaryPidGetProc(pid);
+
+ /*
+ * BackendPidGetProc() and AuxiliaryPidGetProc() return NULL if the pid
+ * isn't valid; but by the time we reach kill(), a process for which we
+ * get a valid proc here might have terminated on its own. There's no way
+ * to acquire a lock on an arbitrary process to prevent that. But since
+ * this mechanism is usually used to debug a backend or an auxiliary
+ * process running and consuming lots of memory, that it might end on its
+ * own first and its memory contexts are not logged is not a problem.
+ */
+ if (proc == NULL)
+ {
+ /*
+ * This is just a warning so a loop-through-resultset will not abort
+ * if one backend terminated on its own during the run.
+ */
+ ereport(WARNING,
+ (errmsg("PID %d is not a PostgreSQL server process", pid)));
+ return (Datum) 0;
+ }
+
+ procNumber = GetNumberFromPGProc(proc);
+ if (procNumber == MyProcNumber)
+ {
+ ereport(WARNING,
+ (errmsg("cannot return statistics for local backend"),
+ errhint("Use pg_get_backend_memory_contexts instead")));
+ PG_RETURN_NULL();
+ }
+ /* Return statistics of top level 1 and 2 contexts, if get_summary is true. */
+ LWLockAcquire(&memCtxState[procNumber].lw_lock, LW_EXCLUSIVE);
+ memCtxState[procNumber].get_summary = get_summary;
+
+ /*
+ * Create a DSA segment with maximum size of 16MB, send handle to the
+ * publishing process for storing the stats. The statistics exceeding 16MB
+ * are written to a file
+ */
+ if (memCtxState[procNumber].memstats_dsa_handle == DSA_HANDLE_INVALID)
+ {
+ oldContext = MemoryContextSwitchTo(TopMemoryContext);
+ area = dsa_create_ext(memCtxState[procNumber].lw_lock.tranche, DSA_DEFAULT_INIT_SEGMENT_SIZE,
+ 16 * DSA_DEFAULT_INIT_SEGMENT_SIZE);
+ MemoryContextSwitchTo(oldContext);
+ handle = dsa_get_handle(area);
+ memCtxState[procNumber].memstats_dsa_handle = handle;
+ /* Pin the mapping so that it doesn't throw a warning */
+ dsa_pin(area);
+ dsa_pin_mapping(area);
+ memstats_area = area;
+ }
+ /* Querying stats from a new client backend */
+ else if (memstats_area == NULL)
+ {
+ area = dsa_attach(memCtxState[procNumber].memstats_dsa_handle);
+ dsa_pin_mapping(area);
+ memstats_area = area;
+ }
+ else
+ {
+ area = memstats_area;
+ }
+ LWLockRelease(&memCtxState[procNumber].lw_lock);
+ if (SendProcSignal(pid, PROCSIG_GET_MEMORY_CONTEXT, procNumber) < 0)
+ {
+ ereport(WARNING,
+ (errmsg("could not send signal to process %d: %m", pid)));
+ return (Datum) 0;
+ }
+
+ /*
+ * Wait for a backend to publish stats, indicated by a valid dsa pointer
+ * set by the backend.
+ */
+ while (1)
+ {
+ LWLockAcquire(&memCtxState[procNumber].lw_lock, LW_EXCLUSIVE);
+
+ /*
+ * We expect to come out of sleep when the requested process has
+ * finished publishing the statistics, verified using a valid dsa
+ * pointer.
+ *
+ * Make sure that the information belongs to the pid we requested
+ * information for; otherwise loop back and wait for the server
+ * process to finish publishing statistics.
+ */
+ if (memCtxState[procNumber].proc_id == pid && DsaPointerIsValid(memCtxState[procNumber].memstats_dsa_pointer))
+ break;
+ else
+ LWLockRelease(&memCtxState[procNumber].lw_lock);
+
+ if (ConditionVariableTimedSleep(&memCtxState[procNumber].memctx_cv, 120000,
+ WAIT_EVENT_MEM_CTX_PUBLISH))
+ {
+ ereport(WARNING,
+ (errmsg("Wait for %d process to publish stats timed out, try again", pid)));
+ if (DsaPointerIsValid(memCtxState[procNumber].memstats_dsa_pointer))
+ {
+ dsa_free(area, memCtxState[procNumber].memstats_dsa_pointer);
+ memCtxState[procNumber].memstats_dsa_pointer = InvalidDsaPointer;
+ }
+ return (Datum) 0;
+ }
+ }
+ if (DsaPointerIsValid(memCtxState[procNumber].memstats_dsa_pointer))
+ memctx_info = (MemoryContextInfo *) dsa_get_address(area, memCtxState[procNumber].memstats_dsa_pointer);
+ /* Backend has finished publishing the stats, read them */
+ for (i = 0; i < memCtxState[procNumber].in_memory_stats; i++)
+ {
+ ArrayType *path_array;
+ int path_length;
+ Datum values[PG_GET_BACKEND_MEMORY_CONTEXTS_COLS];
+ bool nulls[PG_GET_BACKEND_MEMORY_CONTEXTS_COLS];
+
+ memset(values, 0, sizeof(values));
+ memset(nulls, 0, sizeof(nulls));
+
+ if (strlen(memctx_info[i].name) != 0)
+ values[0] = CStringGetTextDatum(memctx_info[i].name);
+ else
+ nulls[0] = true;
+ if (strlen(memctx_info[i].ident) != 0)
+ values[1] = CStringGetTextDatum(memctx_info[i].ident);
+ else
+ nulls[1] = true;
+
+ values[2] = CStringGetTextDatum(memctx_info[i].type);
+ path_length = memctx_info[i].path_length;
+ path_array = construct_array_builtin(memctx_info[i].path, path_length, INT4OID);
+ values[3] = PointerGetDatum(path_array);
+ values[4] = Int64GetDatum(memctx_info[i].totalspace);
+ values[5] = Int64GetDatum(memctx_info[i].nblocks);
+ values[6] = Int64GetDatum(memctx_info[i].freespace);
+ values[7] = Int64GetDatum(memctx_info[i].freechunks);
+ values[8] = Int64GetDatum(memctx_info[i].totalspace - memctx_info[i].freespace);
+ values[9] = Int32GetDatum(memCtxState[procNumber].proc_id);
+
+ tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc, values, nulls);
+ }
+ /* DSA free allocation for this client */
+ if (DsaPointerIsValid(memCtxState[procNumber].memstats_dsa_pointer))
+ {
+ dsa_free(area, memCtxState[procNumber].memstats_dsa_pointer);
+ memCtxState[procNumber].memstats_dsa_pointer = InvalidDsaPointer;
+ }
+ /* No more stats to read return */
+ if (memCtxState[procNumber].total_stats == i)
+ {
+ LWLockRelease(&memCtxState[procNumber].lw_lock);
+ ConditionVariableCancelSleep();
+ return (Datum) 0;
+ }
+ /* Compute name for temp mem stat file */
+ snprintf(tmpfilename, MAXPGPATH, "%s/%s.memstats.%d",
+ PG_TEMP_FILES_DIR, PG_TEMP_FILE_PREFIX,
+ memCtxState[procNumber].proc_id);
+ LWLockRelease(&memCtxState[procNumber].lw_lock);
+ ConditionVariableCancelSleep();
+
+ /* Open file */
+ fp = AllocateFile(tmpfilename, PG_BINARY_R);
+ if (!fp)
+ {
+ ereport(WARNING,
+ (errcode_for_file_access(),
+ errmsg("could not read from the file")));
+ return (Datum) 0;
+ }
+ mem_stat = palloc0(sizeof(MemoryContextParams));
+ while (!feof(fp))
+ {
+ int path_length;
+ ArrayType *path_array;
+ Datum values[10];
+ bool nulls[10];
+
+ memset(values, 0, sizeof(values));
+ memset(nulls, 0, sizeof(nulls));
+
+ /* Read stats from file */
+ if (fread(mem_stat, sizeof(MemoryContextParams), 1, fp) != 1)
+ {
+ if (ferror(fp))
+ {
+ elog(WARNING, "File read error");
+ break;
+ }
+ /* EOF reached */
+ break;
+ }
+ path_length = mem_stat->path_length;
+ if (strlen(mem_stat->name) != 0)
+ values[0] = CStringGetTextDatum(mem_stat->name);
+ else
+ nulls[0] = true;
+
+ if (strlen(mem_stat->ident) != 0)
+ values[1] = CStringGetTextDatum(mem_stat->ident);
+ else
+ nulls[1] = true;
+
+ values[2] = CStringGetTextDatum(mem_stat->type);
+
+ path_array = construct_array_builtin(mem_stat->path, path_length, INT4OID);
+ values[3] = PointerGetDatum(path_array);
+ values[4] = Int64GetDatum(mem_stat->totalspace);
+ values[5] = Int64GetDatum(mem_stat->nblocks);
+ values[6] = Int64GetDatum(mem_stat->freespace);
+ values[7] = Int64GetDatum(mem_stat->freechunks);
+ values[8] = Int64GetDatum(mem_stat->totalspace - mem_stat->freespace);
+ LWLockAcquire(&memCtxState->lw_lock, LW_EXCLUSIVE);
+ values[9] = Int32GetDatum(memCtxState->proc_id);
+ LWLockRelease(&memCtxState->lw_lock);
+
+ tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc, values, nulls);
+ }
+
+ pfree(mem_stat);
+ FreeFile(fp);
+ /* Delete the temp file that stores memory stats */
+ unlink(tmpfilename);
+
+ return (Datum) 0;
+}
+
+static Size
+MemCtxShmemSize(void)
+{
+ Size size;
+ Size TotalProcs = add_size(MaxBackends, add_size(NUM_AUXILIARY_PROCS, max_prepared_xacts));
+
+ size = TotalProcs * sizeof(MemoryContextState);
+ return size;
+}
+
+void
+MemCtxShmemInit(void)
+{
+ bool found;
+ Size TotalProcs = add_size(MaxBackends, add_size(NUM_AUXILIARY_PROCS, max_prepared_xacts));
+
+ memCtxState = (MemoryContextState *) ShmemInitStruct("MemoryContextState",
+ MemCtxShmemSize(),
+ &found);
+ if (!found)
+ {
+ for (int i = 0; i < TotalProcs; i++)
+ {
+ ConditionVariableInit(&memCtxState[i].memctx_cv);
+ LWLockInitialize(&memCtxState[i].lw_lock, LWLockNewTrancheId());
+ LWLockRegisterTranche(memCtxState[i].lw_lock.tranche, "mem_context_stats_reporting");
+ memCtxState[i].memstats_dsa_handle = DSA_HANDLE_INVALID;
+ memCtxState[i].memstats_dsa_pointer = InvalidDsaPointer;
+ }
+ }
+}
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index 03a54451ac..7fc600ff7b 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -42,6 +42,7 @@ volatile sig_atomic_t IdleStatsUpdateTimeoutPending = false;
volatile uint32 InterruptHoldoffCount = 0;
volatile uint32 QueryCancelHoldoffCount = 0;
volatile uint32 CritSectionCount = 0;
+volatile sig_atomic_t PublishMemoryContextPending = false;
int MyProcPid;
pg_time_t MyStartTime;
diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index bde54326c6..532e622f27 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -19,20 +19,28 @@
*-------------------------------------------------------------------------
*/
+#include <math.h>
#include "postgres.h"
+#include "common/file_utils.h"
#include "mb/pg_wchar.h"
#include "miscadmin.h"
+#include "nodes/pg_list.h"
+#include "storage/fd.h"
+#include "storage/lwlock.h"
+#include "storage/dsm.h"
+#include "utils/dsa.h"
+#include "utils/hsearch.h"
#include "utils/memdebug.h"
#include "utils/memutils.h"
#include "utils/memutils_internal.h"
#include "utils/memutils_memorychunk.h"
-
static void BogusFree(void *pointer);
static void *BogusRealloc(void *pointer, Size size, int flags);
static MemoryContext BogusGetChunkContext(void *pointer);
static Size BogusGetChunkSpace(void *pointer);
+static int PublishMemoryContextToFile(MemoryContext context, FILE *fp, List *path, char *clipped_ident);
/*****************************************************************************
* GLOBAL MEMORY *
@@ -166,6 +174,7 @@ static void MemoryContextStatsInternal(MemoryContext context, int level,
static void MemoryContextStatsPrint(MemoryContext context, void *passthru,
const char *stats_string,
bool print_to_stderr);
+static void PublishMemoryContext(MemoryContextInfo * memctx_infos, int curr_id, MemoryContext context, List *path, char *clipped_ident);
/*
* You should not do memory allocations within a critical section, because
@@ -1276,6 +1285,21 @@ HandleLogMemoryContextInterrupt(void)
/* latch will be set by procsignal_sigusr1_handler */
}
+/*
+ * HandleGetMemoryContextInterrupt
+ * Handle receipt of an interrupt indicating publishing of memory
+ * contexts.
+ *
+ * All the actual work is deferred to ProcessGetMemoryContextInterrupt()
+ */
+void
+HandleGetMemoryContextInterrupt(void)
+{
+ InterruptPending = true;
+ PublishMemoryContextPending = true;
+ /* latch will be set by procsignal_sigusr1_handler */
+}
+
/*
* ProcessLogMemoryContextInterrupt
* Perform logging of memory contexts of this backend process.
@@ -1313,6 +1337,408 @@ ProcessLogMemoryContextInterrupt(void)
MemoryContextStatsDetail(TopMemoryContext, 100, 100, false);
}
+/*
+ * Run by each backend to publish their memory context
+ * statistics. It performs a breadth first search
+ * on the memory context tree, so that the parents
+ * get a chance to report stats before their children.
+ *
+ * Statistics are shared via a DSA area, which can hold
+ * up to 16MB worth of statistics. The rest of the
+ * statistics are stored in a file. This file is created
+ * in PG_TEMP_FILES_DIR and deleted by the client after
+ * reading the stats.
+ */
+void
+ProcessGetMemoryContextInterrupt(void)
+{
+ /* Store the memory context details in shared memory */
+
+ List *contexts;
+ FILE *fp = NULL;
+ char tmpfilename[MAXPGPATH];
+
+ HASHCTL ctl;
+ HTAB *context_id_lookup;
+ int context_id = 0;
+ bool found;
+ MemoryContext stat_cxt;
+ MemoryContextInfo *meminfo;
+ bool get_summary = false;
+ dsa_area *area;
+ int num_stats;
+ int idx = MyProcNumber;
+ int stats_count = 0;
+
+ PublishMemoryContextPending = false;
+
+ /*
+ * The hash table is used for constructing the "path" column of the
+ * pg_get_process_memory_contexts output, similar to its local
+ * backend counterpart.
+ */
+
+ /*
+ * Make a new context that will contain the hash table, to ease the
+ * cleanup
+ */
+
+ stat_cxt = AllocSetContextCreate(CurrentMemoryContext,
+ "Memory context statistics",
+ ALLOCSET_DEFAULT_SIZES);
+
+ ctl.keysize = sizeof(MemoryContext);
+ ctl.entrysize = sizeof(MemoryContextId);
+ ctl.hcxt = stat_cxt;
+
+ context_id_lookup = hash_create("pg_get_remote_backend_memory_contexts",
+ 256,
+ &ctl,
+ HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+
+ contexts = list_make1(TopMemoryContext);
+
+ /*
+ * As in OpenTemporaryFileInTablespace, try to make the temp-file
+ * directory, ignoring errors.
+ */
+ (void) MakePGDirectory(PG_TEMP_FILES_DIR);
+
+ /* Compute the number of stats that can fit in the DSM seg */
+
+ num_stats = floor(16 * DSA_DEFAULT_INIT_SEGMENT_SIZE / sizeof(MemoryContextInfo));
+ /* Attach to DSA segment */
+ LWLockAcquire(&memCtxState[idx].lw_lock, LW_EXCLUSIVE);
+ if (memstats_area == NULL)
+ area = dsa_attach(memCtxState[idx].memstats_dsa_handle);
+ memCtxState[idx].proc_id = MyProcPid;
+ get_summary = memCtxState[idx].get_summary;
+
+ /*
+ * Traverse the memory context tree to find total number of contexts. If
+ * summary is requested find the total number of contexts at level 1 and
+ * 2.
+ */
+ foreach_ptr(MemoryContextData, cur, contexts)
+ {
+ MemoryContextId *entry;
+ List *path = NIL;
+
+ entry = (MemoryContextId *) hash_search(context_id_lookup, &cur,
+ HASH_ENTER, &found);
+ entry->context_id = context_id;
+
+ stats_count = stats_count + 1;
+ /* Append the children of the current context to the main list */
+ for (MemoryContext c = cur->firstchild; c != NULL; c = c->nextchild)
+ contexts = lappend(contexts, c);
+
+ if (!get_summary)
+ continue;
+
+ /*
+ * Figure out the transient context_id of this context and each of its
+ * ancestors.
+ */
+ for (MemoryContext cur_context = cur; cur_context != NULL; cur_context = cur_context->parent)
+ {
+ MemoryContextId *cur_entry;
+
+ cur_entry = hash_search(context_id_lookup, &cur_context, HASH_FIND, &found);
+
+ if (!found)
+ {
+ elog(LOG, "hash table corrupted, can't construct path value");
+ break;
+ }
+ path = lcons_int(cur_entry->context_id, path);
+ }
+ if (list_length(path) == 3)
+ {
+ stats_count = stats_count - 1;
+ break;
+ }
+ }
+
+ /*
+ * Allocate memory in this process's dsa for storing statistics for all
+ * the memory contexts.
+ */
+ memCtxState[idx].memstats_dsa_pointer = dsa_allocate0(area, stats_count * sizeof(MemoryContextInfo));
+ meminfo = (MemoryContextInfo *) dsa_get_address(area, memCtxState[idx].memstats_dsa_pointer);
+
+ foreach_ptr(MemoryContextData, cur, contexts)
+ {
+ MemoryContextId *entry;
+ List *path = NIL;
+ char clipped_ident[MEMORY_CONTEXT_IDENT_DISPLAY_SIZE];
+
+ entry = (MemoryContextId *) hash_search(context_id_lookup, &cur,
+ HASH_ENTER, &found);
+ entry->context_id = context_id;
+
+ /*
+ * Figure out the transient context_id of this context and each of its
+ * ancestors.
+ */
+ for (MemoryContext cur_context = cur; cur_context != NULL; cur_context = cur_context->parent)
+ {
+ MemoryContextId *cur_entry;
+
+ cur_entry = hash_search(context_id_lookup, &cur_context, HASH_FIND, &found);
+
+ if (!found)
+ {
+ elog(LOG, "hash table corrupted, can't construct path value");
+ break;
+ }
+ path = lcons_int(cur_entry->context_id, path);
+ }
+ /* Trim and copy the identifier if it is not set to NULL */
+ if (cur->ident != NULL)
+ {
+ int idlen = strlen(cur->ident);
+
+ /*
+ * Some identifiers such as SQL query string can be very long,
+ * truncate oversize identifiers.
+ */
+ if (idlen >= MEMORY_CONTEXT_IDENT_DISPLAY_SIZE)
+ idlen = pg_mbcliplen(cur->ident, idlen, MEMORY_CONTEXT_IDENT_DISPLAY_SIZE - 1);
+
+ memcpy(clipped_ident, cur->ident, idlen);
+ clipped_ident[idlen] = '\0';
+ }
+ if (context_id <= (num_stats - 1))
+ {
+ /* Copy statistics to DSM memory */
+ PublishMemoryContext(meminfo, context_id, cur, path, (cur->ident != NULL ? clipped_ident : NULL));
+ }
+ else
+ {
+ if (PublishMemoryContextToFile(cur, fp, path, (cur->ident != NULL ? clipped_ident : NULL)) == -1)
+ break;
+ }
+ /* Display information up to level 2 for summary */
+ if (get_summary && list_length(path) == 3)
+ {
+ memCtxState[idx].in_memory_stats = context_id;
+ LWLockRelease(&memCtxState[idx].lw_lock);
+ break;
+ }
+
+ /*
+ * DSA max limit is reached, release lock and write to file from next
+ * iteration if there are more statistics to report.
+ */
+ context_id++;
+ if (context_id == (num_stats - 1) && context_id < stats_count)
+ {
+ memCtxState[idx].in_memory_stats = context_id;
+ LWLockRelease(&memCtxState[idx].lw_lock);
+ /* Construct name for temp file */
+ snprintf(tmpfilename, MAXPGPATH, "%s/%s.memstats.%d",
+ PG_TEMP_FILES_DIR, PG_TEMP_FILE_PREFIX,
+ MyProcPid);
+ /* Open file to copy rest of the stats in the file */
+ fp = AllocateFile(tmpfilename, PG_BINARY_A);
+
+ if (fp == NULL)
+ break;
+ }
+ }
+ if (context_id < (num_stats - 1) && !get_summary)
+ {
+ memCtxState[idx].in_memory_stats = context_id;
+ LWLockRelease(&memCtxState[idx].lw_lock);
+ }
+
+ /* Delete the hash table memory context */
+ MemoryContextDelete(stat_cxt);
+
+ /*
+ * Signal the waiting client backend after setting the exit condition flag
+ */
+ LWLockAcquire(&memCtxState[idx].lw_lock, LW_EXCLUSIVE);
+ memCtxState[idx].total_stats = context_id;
+ LWLockRelease(&memCtxState[idx].lw_lock);
+ ConditionVariableBroadcast(&memCtxState[idx].memctx_cv);
+
+ /* Release file */
+ if (fp && FreeFile(fp))
+ {
+ ereport(LOG,
+ (errcode_for_file_access(),
+ errmsg("could not free file \"%s\": %m", tmpfilename)));
+ }
+ dsa_detach(area);
+}
+
+static void
+PublishMemoryContext(MemoryContextInfo * memctx_info, int curr_id, MemoryContext context, List *path, char *clipped_ident)
+{
+ MemoryContextCounters stat;
+ char *type;
+
+ if (context->name != NULL)
+ {
+ Assert(strlen(context->name) < MEMORY_CONTEXT_IDENT_DISPLAY_SIZE);
+ strncpy(memctx_info[curr_id].name, context->name, strlen(context->name));
+ }
+ else
+ memctx_info[curr_id].name[0] = '\0';
+
+ if (clipped_ident != NULL)
+ {
+ /*
+ * To be consistent with logging output, we label dynahash contexts
+ * with just the hash table name as with MemoryContextStatsPrint().
+ */
+ if (!strncmp(context->name, "dynahash", 8))
+ {
+ strncpy(memctx_info[curr_id].name, clipped_ident, strlen(clipped_ident));
+ memctx_info[curr_id].ident[0] = '\0';
+ }
+ else
+ strncpy(memctx_info[curr_id].ident, clipped_ident, strlen(clipped_ident));
+ }
+ else
+ memctx_info[curr_id].ident[0] = '\0';
+
+ memctx_info[curr_id].path_length = list_length(path);
+ foreach_int(i, path)
+ memctx_info[curr_id].path[foreach_current_index(i)] = Int32GetDatum(i);
+
+ /* Examine the context stats */
+ memset(&stat, 0, sizeof(stat));
+ (*context->methods->stats) (context, NULL, NULL, &stat, true);
+
+ switch (context->type)
+ {
+ case T_AllocSetContext:
+ type = "AllocSet";
+ strncpy(memctx_info[curr_id].type, type, strlen(type));
+ break;
+ case T_GenerationContext:
+ type = "Generation";
+ strncpy(memctx_info[curr_id].type, type, strlen(type));
+ break;
+ case T_SlabContext:
+ type = "Slab";
+ strncpy(memctx_info[curr_id].type, type, strlen(type));
+ break;
+ case T_BumpContext:
+ type = "Bump";
+ strncpy(memctx_info[curr_id].type, type, strlen(type));
+ break;
+ default:
+ type = "???";
+ strncpy(memctx_info[curr_id].type, type, strlen(type));
+ break;
+ }
+ memctx_info[curr_id].totalspace = stat.totalspace;
+ memctx_info[curr_id].nblocks = stat.nblocks;
+ memctx_info[curr_id].freespace = stat.freespace;
+ memctx_info[curr_id].freechunks = stat.freechunks;
+}
+
+static int
+PublishMemoryContextToFile(MemoryContext context, FILE *fp, List *path, char *clipped_ident)
+{
+ MemoryContextCounters stat;
+ MemoryContextParams *mem_stat;
+ char *type;
+
+ mem_stat = palloc0(sizeof(MemoryContextParams));
+
+ /*
+ * Assuming the context name will not exceed context identifier display
+ * size. XXX Reduce the limit for name length to correctly reflect
+ * practical examples. XXX Add handling similar to clipped_ident if name
+ * exceeds the size limit.
+ */
+ if (context->name != NULL)
+ {
+ Assert(strlen(context->name) < MEMORY_CONTEXT_IDENT_DISPLAY_SIZE);
+ strncpy(mem_stat->name, context->name, strlen(context->name));
+ }
+ else
+ mem_stat->name[0] = '\0';
+
+ if (clipped_ident != NULL)
+ {
+ /*
+ * To be consistent with logging output, we label dynahash contexts
+ * with just the hash table name as with MemoryContextStatsPrint().
+ */
+ if (!strncmp(context->name, "dynahash", 8))
+ {
+ strncpy(mem_stat->name, clipped_ident, strlen(clipped_ident));
+ mem_stat->ident[0] = '\0';
+ }
+ else
+ strncpy(mem_stat->ident, clipped_ident, strlen(clipped_ident));
+ }
+ else
+ mem_stat->ident[0] = '\0';
+
+ mem_stat->path_length = list_length(path);
+ foreach_int(i, path)
+ mem_stat->path[foreach_current_index(i)] = Int32GetDatum(i);
+
+ /* Examine the context itself */
+ memset(&stat, 0, sizeof(stat));
+ (*context->methods->stats) (context, NULL, NULL, &stat, true);
+
+ switch (context->type)
+ {
+ case T_AllocSetContext:
+ type = "AllocSet";
+ strncpy(mem_stat->type, type, strlen(type));
+ break;
+ case T_GenerationContext:
+ type = "Generation";
+ strncpy(mem_stat->type, type, strlen(type));
+ break;
+ case T_SlabContext:
+ type = "Slab";
+ strncpy(mem_stat->type, type, strlen(type));
+ break;
+ case T_BumpContext:
+ type = "Bump";
+ strncpy(mem_stat->type, type, strlen(type));
+ break;
+ default:
+ type = "???";
+ strncpy(mem_stat->type, type, strlen(type));
+ break;
+ }
+ mem_stat->totalspace = stat.totalspace;
+ mem_stat->nblocks = stat.nblocks;
+ mem_stat->freespace = stat.freespace;
+ mem_stat->freechunks = stat.freechunks;
+
+ if (!fp)
+ {
+ ereport(LOG,
+ (errcode_for_file_access(),
+ errmsg("could not create file")));
+ pfree(mem_stat);
+ return -1;
+ }
+ if (fwrite(mem_stat, sizeof(MemoryContextParams), 1, fp) != 1)
+ {
+ ereport(LOG,
+ (errcode_for_file_access(),
+ errmsg("could not write to file")));
+ pfree(mem_stat);
+ return -1;
+ }
+ pfree(mem_stat);
+
+ return 0;
+}
+
void *
palloc(Size size)
{
diff --git a/src/include/access/session.h b/src/include/access/session.h
index ce48449a87..f1b08555fa 100644
--- a/src/include/access/session.h
+++ b/src/include/access/session.h
@@ -31,6 +31,7 @@ typedef struct Session
struct SharedRecordTypmodRegistry *shared_typmod_registry;
dshash_table *shared_record_table;
dshash_table *shared_typmod_table;
+
} Session;
extern void InitializeSession(void);
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index cbbe8acd38..b205c54710 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8436,6 +8436,16 @@
prorettype => 'bool', proargtypes => 'int4',
prosrc => 'pg_log_backend_memory_contexts' },
+# publishing memory contexts of the specified backend
+{ oid => '2173', descr => 'publish memory contexts of the specified backend',
+ proname => 'pg_get_process_memory_contexts', provolatile => 'v',
+ prorows => '100', proretset => 't', proparallel => 'r',
+ prorettype => 'record', proargtypes => 'int4 bool',
+ proallargtypes => '{int4,bool,text,text,text,_int4,int4,int4,int4,int4,int4,int4}',
+ proargmodes => '{i,i,o,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{pid, summary, name, ident, type, path, total_bytes, total_nblocks, free_bytes, free_chunks, used_bytes, pid}',
+ prosrc => 'pg_get_process_memory_contexts' },
+
# non-persistent series generator
{ oid => '1066', descr => 'non-persistent series generator',
proname => 'generate_series', prorows => '1000',
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 42a2b38cac..7184727cf1 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -96,6 +96,7 @@ extern PGDLLIMPORT volatile sig_atomic_t IdleSessionTimeoutPending;
extern PGDLLIMPORT volatile sig_atomic_t ProcSignalBarrierPending;
extern PGDLLIMPORT volatile sig_atomic_t LogMemoryContextPending;
extern PGDLLIMPORT volatile sig_atomic_t IdleStatsUpdateTimeoutPending;
+extern PGDLLIMPORT volatile sig_atomic_t PublishMemoryContextPending;
extern PGDLLIMPORT volatile sig_atomic_t CheckClientConnectionPending;
extern PGDLLIMPORT volatile sig_atomic_t ClientConnectionLost;
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index 221073def3..8cbf6e201c 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -35,6 +35,7 @@ typedef enum
PROCSIG_WALSND_INIT_STOPPING, /* ask walsenders to prepare for shutdown */
PROCSIG_BARRIER, /* global barrier interrupt */
PROCSIG_LOG_MEMORY_CONTEXT, /* ask backend to log the memory contexts */
+ PROCSIG_GET_MEMORY_CONTEXT, /* ask backend to publish the memory contexts */
PROCSIG_PARALLEL_APPLY_MESSAGE, /* Message from parallel apply workers */
/* Recovery conflict reasons */
diff --git a/src/include/utils/memutils.h b/src/include/utils/memutils.h
index 3590c8bad9..6fa6df30d0 100644
--- a/src/include/utils/memutils.h
+++ b/src/include/utils/memutils.h
@@ -18,6 +18,9 @@
#define MEMUTILS_H
#include "nodes/memnodes.h"
+#include "storage/condition_variable.h"
+#include "storage/lmgr.h"
+#include "utils/dsa.h"
/*
@@ -48,7 +51,11 @@
#define AllocHugeSizeIsValid(size) ((Size) (size) <= MaxAllocHugeSize)
+#define MEMORY_CONTEXT_IDENT_DISPLAY_SIZE 1024
+#define MEM_CONTEXT_SHMEM_STATS_SIZE 30
+#define MEM_CONTEXT_MAX_LEVEL 64
+#define MAX_TYPE_STRING_LENGTH 64
/*
* Standard top-level memory contexts.
*
@@ -115,6 +122,62 @@ extern MemoryContext AllocSetContextCreateInternal(MemoryContext parent,
Size initBlockSize,
Size maxBlockSize);
+/* Dynamic shared memory state for Memory Context Statistics reporting */
+typedef struct MemoryContextInfo
+{
+ char name[MEMORY_CONTEXT_IDENT_DISPLAY_SIZE];
+ char ident[MEMORY_CONTEXT_IDENT_DISPLAY_SIZE];
+ Datum path[MEM_CONTEXT_MAX_LEVEL];
+ char type[MAX_TYPE_STRING_LENGTH];
+ int path_length;
+ int64 totalspace;
+ int64 nblocks;
+ int64 freespace;
+ int64 freechunks;
+} MemoryContextInfo;
+
+typedef struct MemoryContextState
+{
+ ConditionVariable memctx_cv;
+ LWLock lw_lock;
+ int proc_id;
+ int in_memory_stats;
+ int total_stats;
+ bool get_summary;
+ dsa_handle memstats_dsa_handle;
+ dsa_pointer memstats_dsa_pointer;
+} MemoryContextState;
+
+/* Backend local struct used to write statistics to a file */
+typedef struct MemoryContextParams
+{
+ char name[MEMORY_CONTEXT_IDENT_DISPLAY_SIZE];
+ char ident[MEMORY_CONTEXT_IDENT_DISPLAY_SIZE];
+ char type[MAX_TYPE_STRING_LENGTH];
+ Datum path[MEM_CONTEXT_MAX_LEVEL];
+ int path_length;
+ int64 totalspace;
+ int64 nblocks;
+ int64 freespace;
+ int64 freechunks;
+} MemoryContextParams;
+
+/*
+ * MemoryContextId
+ * Used for storage of transient identifiers for
+ * pg_get_backend_memory_contexts.
+ */
+typedef struct MemoryContextId
+{
+ MemoryContext context;
+ int context_id;
+} MemoryContextId;
+
+extern PGDLLIMPORT MemoryContextState * memCtxState;
+extern void ProcessGetMemoryContextInterrupt(void);
+extern void HandleGetMemoryContextInterrupt(void);
+extern void MemCtxShmemInit(void);
+
/*
* This wrapper macro exists to check for non-constant strings used as context
* names; that's no longer supported. (Use MemoryContextSetIdentifier if you
@@ -205,5 +268,7 @@ pg_memory_is_all_zeros(const void *ptr, size_t len)
}
return true;
}
+#define MAX_NUM_MEM_STATS (DSA_MAX_SEGMENT_SIZE / sizeof(MemoryContextInfo))
+extern dsa_area *memstats_area;
#endif /* MEMUTILS_H */
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index fad7fc3a7e..eecba122c3 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -222,3 +222,15 @@ select count(distinct utc_offset) >= 24 as ok from pg_timezone_abbrevs;
t
(1 row)
+DO $$
+DECLARE
+ checkpointer_pid int;
+ r RECORD;
+BEGIN
+ SELECT pid from pg_stat_activity where backend_type='checkpointer' INTO checkpointer_pid;
+
+ select type, name, ident, total_bytes >= free_bytes
+ from pg_get_process_memory_contexts(checkpointer_pid, false) where path = '{0}' into r;
+ RAISE NOTICE '%', r;
+END $$;
+NOTICE: (AllocSet,TopMemoryContext,,t)
diff --git a/src/test/regress/sql/sysviews.sql b/src/test/regress/sql/sysviews.sql
index b2a7923754..13a4a0bfe2 100644
--- a/src/test/regress/sql/sysviews.sql
+++ b/src/test/regress/sql/sysviews.sql
@@ -98,3 +98,15 @@ set timezone_abbreviations = 'Australia';
select count(distinct utc_offset) >= 24 as ok from pg_timezone_abbrevs;
set timezone_abbreviations = 'India';
select count(distinct utc_offset) >= 24 as ok from pg_timezone_abbrevs;
+
+DO $$
+DECLARE
+ checkpointer_pid int;
+ r RECORD;
+BEGIN
+ SELECT pid from pg_stat_activity where backend_type='checkpointer' INTO checkpointer_pid;
+
+ select type, name, ident, total_bytes >= free_bytes
+ from pg_get_process_memory_contexts(checkpointer_pid, false) where path = '{0}' into r;
+ RAISE NOTICE '%', r;
+END $$;
--
2.34.1
Hi,
To achieve completeness while avoiding writing to a file, I can consider
displaying the numbers for the remaining contexts as a cumulative total
at the end of the output. Something like the following:
```
postgres=# select * from pg_get_process_memory_contexts('237244', false);
             name             | ident |   type   | path  | total_bytes | total_nblocks | free_bytes | free_chunks | used_bytes |  pid
------------------------------+-------+----------+-------+-------------+---------------+------------+-------------+------------+--------
 TopMemoryContext             |       | AllocSet | {0}   |       97696 |             5 |      14288 |          11 |      83408 | 237244
 search_path processing cache |       | AllocSet | {0,1} |        8192 |             1 |       5328 |           7 |       2864 | 237244
Remaining contexts total: 23456 bytes (total_bytes), 12345 (used_bytes), 11,111 (free_bytes)
```
Please find attached an updated patch with this change. The file
previously used to store spilled statistics has been removed. Instead,
a cumulative total of the remaining/spilled context statistics is now
stored in the DSM segment, which is displayed as follows.
postgres=# select * from pg_get_process_memory_contexts('352966', false);
       name       | ident |   type   |  path  | total_bytes | total_nblocks | free_bytes | free_chunks | used_bytes |  pid
------------------+-------+----------+--------+-------------+---------------+------------+-------------+------------+--------
 TopMemoryContext |       | AllocSet | {0}    |       97696 |             5 |      14288 |          11 |      83408 | 352966
 .
 .
 .
 MdSmgr           |       | AllocSet | {0,18} |        8192 |             1 |       7424 |           0 |        768 | 352966
 Remaining Totals |       |          |        |     1756016 |           188 |     658584 |         132 |    1097432 | 352966
(7129 rows)
-----
I believe this serves as a good compromise between completeness
and avoiding the overhead of file handling. However, I am open to
reintroducing file handling if displaying the complete statistics of the
remaining contexts proves to be more important.
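
To illustrate the compromise described above, here is a minimal,
self-contained sketch in plain C. It is not the code from the patch:
ContextStats is a simplified, hypothetical stand-in for the patch's
MemoryContextInfo, publish_with_overflow() is an invented helper, and
reserving the last slot of the destination array for the total is an
assumption made for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the patch's MemoryContextInfo. */
typedef struct ContextStats
{
	char		name[64];
	int64_t		totalspace;
	int64_t		nblocks;
	int64_t		freespace;
	int64_t		freechunks;
} ContextStats;

/*
 * Copy per-context entries from src into dst, which has room for
 * max_entries entries.  If everything fits, copy it all.  Otherwise copy
 * the first max_entries - 1 entries and fold the rest into a single
 * "Remaining Totals" entry stored in the last slot.
 *
 * Returns the number of entries written to dst.
 */
static size_t
publish_with_overflow(const ContextStats *src, size_t nsrc,
					  ContextStats *dst, size_t max_entries)
{
	size_t		ncopy;
	ContextStats *total;

	assert(src != NULL && dst != NULL);

	if (max_entries == 0)
		return 0;

	if (nsrc <= max_entries)
	{
		memcpy(dst, src, nsrc * sizeof(ContextStats));
		return nsrc;
	}

	/* Keep one slot free for the cumulative total. */
	ncopy = max_entries - 1;
	memcpy(dst, src, ncopy * sizeof(ContextStats));

	total = &dst[ncopy];
	memset(total, 0, sizeof(*total));
	strncpy(total->name, "Remaining Totals", sizeof(total->name) - 1);

	for (size_t i = ncopy; i < nsrc; i++)
	{
		total->totalspace += src[i].totalspace;
		total->nblocks += src[i].nblocks;
		total->freespace += src[i].freespace;
		total->freechunks += src[i].freechunks;
	}

	return max_entries;
}
```

The reader can then derive used_bytes for the total row the same way
as for any other row, as totalspace minus freespace.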
All the known bugs in the patch have been fixed.
In summary, one DSA per PostgreSQL process is used to share
the statistics of that process. A DSA is created by the first client
backend that requests memory context statistics, and it is pinned
for all future requests to that process.
A handle to this DSA is shared between the client and the publishing
process using fixed shared memory. The fixed shared memory consists
of an array of size MaxBackends + auxiliary processes, indexed
by procno. Each element in this array is less than 100 bytes in size.
A PostgreSQL process uses a condition variable to signal a waiting client
backend once it has finished publishing the statistics. If, for some
reason, the signal is not sent, the waiting client backend will time out.
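
The wait-with-timeout handshake described above can be sketched outside
PostgreSQL with POSIX threads. This is only an analogy under stated
assumptions: the patch uses PostgreSQL's ConditionVariableTimedSleep()
and an LWLock, whereas this sketch uses pthread_cond_timedwait() and a
mutex. StatsSlot, slot_init(), slot_publish() and slot_wait() are
hypothetical names that do not appear in the patch.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical per-process slot; loosely mirrors MemoryContextState. */
typedef struct StatsSlot
{
	pthread_mutex_t lock;
	pthread_cond_t cv;
	bool		published;	/* set by the publishing side */
} StatsSlot;

static void
slot_init(StatsSlot *slot)
{
	pthread_mutex_init(&slot->lock, NULL);
	pthread_cond_init(&slot->cv, NULL);
	slot->published = false;
}

/* Publisher side: mark the statistics as ready and wake any waiter. */
static void
slot_publish(StatsSlot *slot)
{
	pthread_mutex_lock(&slot->lock);
	slot->published = true;
	pthread_cond_broadcast(&slot->cv);
	pthread_mutex_unlock(&slot->lock);
}

/*
 * Client side: wait until the statistics are published or timeout_ms
 * elapses.  Returns true if the statistics are available, false on
 * timeout.
 */
static bool
slot_wait(StatsSlot *slot, long timeout_ms)
{
	struct timespec deadline;
	bool		ok;

	assert(timeout_ms >= 0);

	/* pthread_cond_timedwait() takes an absolute CLOCK_REALTIME deadline */
	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L)
	{
		deadline.tv_sec += 1;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&slot->lock);
	/* Re-check the predicate after every wakeup; wakeups can be spurious */
	while (!slot->published)
	{
		if (pthread_cond_timedwait(&slot->cv, &slot->lock, &deadline) == ETIMEDOUT)
			break;
	}
	ok = slot->published;
	pthread_mutex_unlock(&slot->lock);

	return ok;
}
```

As in the patch, the waiter checks a shared predicate rather than
trusting the wakeup alone, so a missed or spurious signal ends in a
timeout and never in a read of unpublished data.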
When statistics of the local backend are requested, this function returns
the following WARNING and exits, since this case is handled by an existing
function that doesn't require a DSA.
WARNING: cannot return statistics for local backend
HINT: Use pg_get_backend_memory_contexts instead
Looking forward to your review.
Thank you,
Rahila Syed
Attachments:
v4-Function-to-report-memory-context-stats-of-a-process.patch (application/octet-stream)
From 9fd3e1017b21853a11542bc3e72bf68ea6ed9013 Mon Sep 17 00:00:00 2001
From: Rahila Syed <rahilasyed.90@gmail.com>
Date: Sun, 15 Sep 2024 17:56:06 +0530
Subject: [PATCH] Function to report memory context stats of any backend
This function sends a signal to a backend to publish
statistics of all its memory contexts. Signal handler
sets a flag, which causes the relevant backend to copy its
MemoryContextStats to a DSA, as part
of next CHECK_FOR_INTERRUPTS().
If there is more than 16MB worth of statistics, the
remaining statistics are copied as a cumulative
total of the remaining contexts.
Once it is done, it signals the client backend using
a condition variable. The client backend
then wakes up, reads the shared memory and
returns these values in the form of set of records,
one for each memory context, to the user, followed
by a cumulative total of the remaining contexts,
if any.
Each backend and auxiliary process has its own slot
for reporting the stats. There is an array of such
memory slots of size MaxBackends + number of auxiliary
processes in fixed shared memory. Each of these slots points
to a DSA, which contains the stats to be shared by the
corresponding process.
Each slot has its own LW lock and condition variable for
synchronization and communication between the
publishing process and the client backend.
---
src/backend/postmaster/autovacuum.c | 4 +
src/backend/postmaster/checkpointer.c | 4 +
src/backend/postmaster/interrupt.c | 4 +
src/backend/postmaster/pgarch.c | 4 +
src/backend/postmaster/startup.c | 4 +
src/backend/postmaster/walsummarizer.c | 4 +
src/backend/storage/ipc/ipci.c | 2 +
src/backend/storage/ipc/procsignal.c | 3 +
src/backend/tcop/postgres.c | 3 +
.../utils/activity/wait_event_names.txt | 1 +
src/backend/utils/adt/mcxtfuncs.c | 254 ++++++++++++++-
src/backend/utils/init/globals.c | 1 +
src/backend/utils/mmgr/mcxt.c | 302 +++++++++++++++++-
src/include/access/session.h | 1 +
src/include/catalog/pg_proc.dat | 10 +
src/include/miscadmin.h | 1 +
src/include/storage/procsignal.h | 1 +
src/include/utils/memutils.h | 51 +++
src/test/regress/expected/sysviews.out | 12 +
src/test/regress/sql/sysviews.sql | 12 +
20 files changed, 666 insertions(+), 12 deletions(-)
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index dc3cf87aba..5d01497ada 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -768,6 +768,10 @@ HandleAutoVacLauncherInterrupts(void)
/* Process sinval catchup interrupts that happened while sleeping */
ProcessCatchupInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index 982572a75d..9caf8fa018 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -616,6 +616,10 @@ HandleCheckpointerInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/postmaster/interrupt.c b/src/backend/postmaster/interrupt.c
index eedc0980cf..1107ff6d45 100644
--- a/src/backend/postmaster/interrupt.c
+++ b/src/backend/postmaster/interrupt.c
@@ -48,6 +48,10 @@ HandleMainLoopInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 02f91431f5..467a253ccd 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -865,6 +865,10 @@ HandlePgArchInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
if (ConfigReloadPending)
{
char *archiveLib = pstrdup(XLogArchiveLibrary);
diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c
index ef6f98ebcd..17beb8737d 100644
--- a/src/backend/postmaster/startup.c
+++ b/src/backend/postmaster/startup.c
@@ -192,6 +192,10 @@ HandleStartupProcInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
diff --git a/src/backend/postmaster/walsummarizer.c b/src/backend/postmaster/walsummarizer.c
index 48350bec52..b3e6c2b5f0 100644
--- a/src/backend/postmaster/walsummarizer.c
+++ b/src/backend/postmaster/walsummarizer.c
@@ -876,6 +876,10 @@ HandleWalSummarizerInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 7783ba854f..8816ef6903 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -50,6 +50,7 @@
#include "storage/sinvaladt.h"
#include "utils/guc.h"
#include "utils/injection_point.h"
+#include "utils/memutils.h"
/* GUCs */
int shared_memory_type = DEFAULT_SHARED_MEMORY_TYPE;
@@ -340,6 +341,7 @@ CreateOrAttachShmemStructs(void)
StatsShmemInit();
WaitEventCustomShmemInit();
InjectionPointShmemInit();
+ MemCtxShmemInit();
}
/*
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index 87027f27eb..621726cf03 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -688,6 +688,9 @@ procsignal_sigusr1_handler(SIGNAL_ARGS)
if (CheckProcSignal(PROCSIG_LOG_MEMORY_CONTEXT))
HandleLogMemoryContextInterrupt();
+ if (CheckProcSignal(PROCSIG_GET_MEMORY_CONTEXT))
+ HandleGetMemoryContextInterrupt();
+
if (CheckProcSignal(PROCSIG_PARALLEL_APPLY_MESSAGE))
HandleParallelApplyMessageInterrupt();
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 184b830168..4436b885d9 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3500,6 +3500,9 @@ ProcessInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
if (ParallelApplyMessagePending)
HandleParallelApplyMessages();
}
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 16144c2b72..7a27b5f680 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -158,6 +158,7 @@ WAL_RECEIVER_EXIT "Waiting for the WAL receiver to exit."
WAL_RECEIVER_WAIT_START "Waiting for startup process to send initial data for streaming replication."
WAL_SUMMARY_READY "Waiting for a new WAL summary to be generated."
XACT_GROUP_UPDATE "Waiting for the group leader to update transaction status at transaction end."
+MEM_CTX_PUBLISH "Waiting for backend to publish memory information."
ABI_compatibility:
diff --git a/src/backend/utils/adt/mcxtfuncs.c b/src/backend/utils/adt/mcxtfuncs.c
index 6a6634e1cd..da70b9ee4c 100644
--- a/src/backend/utils/adt/mcxtfuncs.c
+++ b/src/backend/utils/adt/mcxtfuncs.c
@@ -17,28 +17,23 @@
#include "funcapi.h"
#include "mb/pg_wchar.h"
+#include "miscadmin.h"
+#include "access/twophase.h"
+#include "nodes/pg_list.h"
#include "storage/proc.h"
#include "storage/procarray.h"
#include "utils/array.h"
#include "utils/builtins.h"
#include "utils/hsearch.h"
+#include "utils/memutils.h"
+#include "utils/wait_event_types.h"
/* ----------
* The max bytes for showing identifiers of MemoryContext.
* ----------
*/
-#define MEMORY_CONTEXT_IDENT_DISPLAY_SIZE 1024
-/*
- * MemoryContextId
- * Used for storage of transient identifiers for
- * pg_get_backend_memory_contexts.
- */
-typedef struct MemoryContextId
-{
- MemoryContext context;
- int context_id;
-} MemoryContextId;
+struct MemoryContextState *memCtxState = NULL;
/*
* int_list_to_array
@@ -305,3 +300,240 @@ pg_log_backend_memory_contexts(PG_FUNCTION_ARGS)
PG_RETURN_BOOL(true);
}
+
+/*
+ * pg_get_process_memory_contexts
+ * Signal a backend or an auxiliary process to send its memory contexts.
+ *
+ * On receipt of this signal, a backend or an auxiliary process sets the flag
+ * in the signal handler, which causes the next CHECK_FOR_INTERRUPTS()
+ * or process-specific interrupt handler to copy the memory context statistics
+ * into a dynamic shared memory area. The statistics for contexts that do not
+ * fit in the shared memory area are stored as a cumulative total of those
+ * contexts, at the end of the dynamic shared memory.
+ * Wait for the backend to signal the condition variable after writing the
+ * statistics to shared memory.
+ * Once the condition variable sleep ends, check whether the requested
+ * backend's statistics are available to read and display.
+ */
+Datum
+pg_get_process_memory_contexts(PG_FUNCTION_ARGS)
+{
+ int pid = PG_GETARG_INT32(0);
+ bool get_summary = PG_GETARG_BOOL(1);
+ PGPROC *proc;
+ ProcNumber procNumber = INVALID_PROC_NUMBER;
+ int i;
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ dsa_area *area;
+ dsa_handle handle;
+ MemoryContextInfo *memctx_info;
+ MemoryContext oldContext;
+
+ InitMaterializedSRF(fcinfo, 0);
+
+ /*
+ * See if the process with given pid is a backend or an auxiliary process.
+ */
+ proc = BackendPidGetProc(pid);
+ if (proc == NULL)
+ proc = AuxiliaryPidGetProc(pid);
+
+ /*
+ * BackendPidGetProc() and AuxiliaryPidGetProc() return NULL if the pid
+ * isn't valid; but by the time we reach kill(), a process for which we
+ * get a valid proc here might have terminated on its own. There's no way
+ * to acquire a lock on an arbitrary process to prevent that. But since
+ * this mechanism is usually used to debug a backend or an auxiliary
+ * process running and consuming lots of memory, that it might end on its
+ * own first and its memory contexts are not logged is not a problem.
+ */
+ if (proc == NULL)
+ {
+ /*
+ * This is just a warning so a loop-through-resultset will not abort
+ * if one backend terminated on its own during the run.
+ */
+ ereport(WARNING,
+ (errmsg("PID %d is not a PostgreSQL server process", pid)));
+ return (Datum) 0;
+ }
+
+ procNumber = GetNumberFromPGProc(proc);
+ if (procNumber == MyProcNumber)
+ {
+ ereport(WARNING,
+ (errmsg("cannot return statistics for local backend"),
+ errhint("Use pg_get_backend_memory_contexts instead")));
+ PG_RETURN_NULL();
+ }
+
+ /*
+ * Return statistics of top level 1 and 2 contexts, if get_summary is
+ * true.
+ */
+ LWLockAcquire(&memCtxState[procNumber].lw_lock, LW_EXCLUSIVE);
+ memCtxState[procNumber].get_summary = get_summary;
+
+ /*
+ * Create a DSA segment with a maximum size of 16MB and send the handle to
+ * the publishing process for storing the stats. If the statistics exceed
+ * 16MB, a cumulative total is stored for the remaining contexts.
+ */
+ if (memCtxState[procNumber].memstats_dsa_handle == DSA_HANDLE_INVALID)
+ {
+ oldContext = MemoryContextSwitchTo(TopMemoryContext);
+ area = dsa_create_ext(memCtxState[procNumber].lw_lock.tranche, DSA_DEFAULT_INIT_SEGMENT_SIZE,
+ 16 * DSA_DEFAULT_INIT_SEGMENT_SIZE);
+ MemoryContextSwitchTo(oldContext);
+ handle = dsa_get_handle(area);
+ memCtxState[procNumber].memstats_dsa_handle = handle;
+ /* Pin the mapping so that it doesn't throw a warning */
+ dsa_pin(area);
+ dsa_pin_mapping(area);
+ }
+ else
+ {
+ area = dsa_attach(memCtxState[procNumber].memstats_dsa_handle);
+ dsa_pin_mapping(area);
+ }
+ LWLockRelease(&memCtxState[procNumber].lw_lock);
+ if (SendProcSignal(pid, PROCSIG_GET_MEMORY_CONTEXT, procNumber) < 0)
+ {
+ ereport(WARNING,
+ (errmsg("could not send signal to process %d: %m", pid)));
+ return (Datum) 0;
+ }
+
+ /*
+ * Wait for a backend to publish stats, indicated by a valid dsa pointer
+ * set by the backend.
+ */
+ while (1)
+ {
+ LWLockAcquire(&memCtxState[procNumber].lw_lock, LW_EXCLUSIVE);
+
+ /*
+ * We expect to come out of sleep when the requested process has
+ * finished publishing the statistics, verified using a valid dsa
+ * pointer.
+ *
+ * Make sure that the information belongs to the pid we requested
+ * information for; otherwise loop back and wait for the server
+ * process to finish publishing statistics.
+ */
+ if (memCtxState[procNumber].proc_id == pid && DsaPointerIsValid(memCtxState[procNumber].memstats_dsa_pointer))
+ break;
+ else
+ LWLockRelease(&memCtxState[procNumber].lw_lock);
+
+ if (ConditionVariableTimedSleep(&memCtxState[procNumber].memctx_cv, 120000,
+ WAIT_EVENT_MEM_CTX_PUBLISH))
+ {
+ ereport(WARNING,
+ (errmsg("Wait for %d process to publish stats timed out, try again", pid)));
+ if (DsaPointerIsValid(memCtxState[procNumber].memstats_dsa_pointer))
+ {
+ dsa_free(area, memCtxState[procNumber].memstats_dsa_pointer);
+ memCtxState[procNumber].memstats_dsa_pointer = InvalidDsaPointer;
+ }
+ return (Datum) 0;
+ }
+ }
+ if (DsaPointerIsValid(memCtxState[procNumber].memstats_dsa_pointer))
+ memctx_info = (MemoryContextInfo *) dsa_get_address(area, memCtxState[procNumber].memstats_dsa_pointer);
+ /* Backend has finished publishing the stats, read them */
+ for (i = 0; i < memCtxState[procNumber].in_memory_stats; i++)
+ {
+ ArrayType *path_array;
+ int path_length;
+ Datum values[PG_GET_BACKEND_MEMORY_CONTEXTS_COLS];
+ bool nulls[PG_GET_BACKEND_MEMORY_CONTEXTS_COLS];
+
+ memset(values, 0, sizeof(values));
+ memset(nulls, 0, sizeof(nulls));
+
+ if (strlen(memctx_info[i].name) != 0)
+ values[0] = CStringGetTextDatum(memctx_info[i].name);
+ else
+ nulls[0] = true;
+ if (strlen(memctx_info[i].ident) != 0)
+ values[1] = CStringGetTextDatum(memctx_info[i].ident);
+ else
+ nulls[1] = true;
+
+ values[2] = CStringGetTextDatum(memctx_info[i].type);
+ path_length = memctx_info[i].path_length;
+ path_array = construct_array_builtin(memctx_info[i].path, path_length, INT4OID);
+ values[3] = PointerGetDatum(path_array);
+ values[4] = Int64GetDatum(memctx_info[i].totalspace);
+ values[5] = Int64GetDatum(memctx_info[i].nblocks);
+ values[6] = Int64GetDatum(memctx_info[i].freespace);
+ values[7] = Int64GetDatum(memctx_info[i].freechunks);
+ values[8] = Int64GetDatum(memctx_info[i].totalspace - memctx_info[i].freespace);
+ values[9] = Int32GetDatum(memCtxState[procNumber].proc_id);
+
+ tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc, values, nulls);
+ }
+ /* If there are more contexts, display a cumulative total of those */
+ if (memCtxState[procNumber].total_stats > i)
+ {
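+ Datum values[PG_GET_BACKEND_MEMORY_CONTEXTS_COLS];
+ bool nulls[PG_GET_BACKEND_MEMORY_CONTEXTS_COLS];
+
+ /* Zero both arrays so that columns not set below are well defined */
+ memset(values, 0, sizeof(values));
+ memset(nulls, 0, sizeof(nulls));
+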
+ values[0] = CStringGetTextDatum(memctx_info[i].name);
+ nulls[1] = true;
+ nulls[2] = true;
+ nulls[3] = true;
+ values[4] = Int64GetDatum(memctx_info[i].totalspace);
+ values[5] = Int64GetDatum(memctx_info[i].nblocks);
+ values[6] = Int64GetDatum(memctx_info[i].freespace);
+ values[7] = Int64GetDatum(memctx_info[i].freechunks);
+ values[8] = Int64GetDatum(memctx_info[i].totalspace - memctx_info[i].freespace);
+ values[9] = Int32GetDatum(memCtxState[procNumber].proc_id);
+ tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc, values, nulls);
+ }
+
+ /* DSA free allocation for this client */
+ if (DsaPointerIsValid(memCtxState[procNumber].memstats_dsa_pointer))
+ {
+ dsa_free(area, memCtxState[procNumber].memstats_dsa_pointer);
+ memCtxState[procNumber].memstats_dsa_pointer = InvalidDsaPointer;
+ }
+ LWLockRelease(&memCtxState[procNumber].lw_lock);
+ ConditionVariableCancelSleep();
+ dsa_detach(area);
+ return (Datum) 0;
+}
+
+static Size
+MemCtxShmemSize(void)
+{
+ Size size;
+ Size TotalProcs = add_size(MaxBackends, add_size(NUM_AUXILIARY_PROCS, max_prepared_xacts));
+
+ size = TotalProcs * sizeof(MemoryContextState);
+ return size;
+}
+
+void
+MemCtxShmemInit(void)
+{
+ bool found;
+ Size TotalProcs = add_size(MaxBackends, add_size(NUM_AUXILIARY_PROCS, max_prepared_xacts));
+
+ memCtxState = (MemoryContextState *) ShmemInitStruct("MemoryContextState",
+ MemCtxShmemSize(),
+ &found);
+ if (!found)
+ {
+ for (int i = 0; i < TotalProcs; i++)
+ {
+ ConditionVariableInit(&memCtxState[i].memctx_cv);
+ LWLockInitialize(&memCtxState[i].lw_lock, LWLockNewTrancheId());
+ LWLockRegisterTranche(memCtxState[i].lw_lock.tranche, "mem_context_stats_reporting");
+ memCtxState[i].memstats_dsa_handle = DSA_HANDLE_INVALID;
+ memCtxState[i].memstats_dsa_pointer = InvalidDsaPointer;
+ }
+ }
+}
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index 03a54451ac..7fc600ff7b 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -42,6 +42,7 @@ volatile sig_atomic_t IdleStatsUpdateTimeoutPending = false;
volatile uint32 InterruptHoldoffCount = 0;
volatile uint32 QueryCancelHoldoffCount = 0;
volatile uint32 CritSectionCount = 0;
+volatile sig_atomic_t PublishMemoryContextPending = false;
int MyProcPid;
pg_time_t MyStartTime;
diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index bde54326c6..7f1f159bda 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -19,16 +19,22 @@
*-------------------------------------------------------------------------
*/
+#include <math.h>
#include "postgres.h"
#include "mb/pg_wchar.h"
#include "miscadmin.h"
+#include "nodes/pg_list.h"
+#include "storage/fd.h"
+#include "storage/lwlock.h"
+#include "storage/dsm.h"
+#include "utils/dsa.h"
+#include "utils/hsearch.h"
#include "utils/memdebug.h"
#include "utils/memutils.h"
#include "utils/memutils_internal.h"
#include "utils/memutils_memorychunk.h"
-
static void BogusFree(void *pointer);
static void *BogusRealloc(void *pointer, Size size, int flags);
static MemoryContext BogusGetChunkContext(void *pointer);
@@ -166,6 +172,7 @@ static void MemoryContextStatsInternal(MemoryContext context, int level,
static void MemoryContextStatsPrint(MemoryContext context, void *passthru,
const char *stats_string,
bool print_to_stderr);
+static void PublishMemoryContext(MemoryContextInfo * memctx_infos, int curr_id, MemoryContext context, List *path, char *clipped_ident);
/*
* You should not do memory allocations within a critical section, because
@@ -1276,6 +1283,21 @@ HandleLogMemoryContextInterrupt(void)
/* latch will be set by procsignal_sigusr1_handler */
}
+/*
+ * HandleGetMemoryContextInterrupt
+ * Handle receipt of an interrupt indicating publishing of memory
+ * contexts.
+ *
+ * All the actual work is deferred to ProcessGetMemoryContextInterrupt()
+ */
+void
+HandleGetMemoryContextInterrupt(void)
+{
+ InterruptPending = true;
+ PublishMemoryContextPending = true;
+ /* latch will be set by procsignal_sigusr1_handler */
+}
+
/*
* ProcessLogMemoryContextInterrupt
* Perform logging of memory contexts of this backend process.
@@ -1313,6 +1335,284 @@ ProcessLogMemoryContextInterrupt(void)
MemoryContextStatsDetail(TopMemoryContext, 100, 100, false);
}
+/*
+ * Run by each backend to publish its memory context
+ * statistics. It performs a breadth-first search
+ * on the memory context tree, so that the parents
+ * get a chance to report stats before their children.
+ *
+ * Statistics are shared via dynamic shared memory, which
+ * can hold statistics of approx 6700 contexts. The statistics
+ * of the remaining contexts are captured as a cumulative total.
+ */
+void
+ProcessGetMemoryContextInterrupt(void)
+{
+ /* Store the memory context details in shared memory */
+
+ List *contexts;
+
+ HASHCTL ctl;
+ HTAB *context_id_lookup;
+ int context_id = 0;
+ bool found;
+ MemoryContext stat_cxt;
+ MemoryContextInfo *meminfo;
+ bool get_summary = false;
+ dsa_area *area;
+ int num_stats;
+ int idx = MyProcNumber;
+ int stats_count = 0;
+ MemoryContextCounters stat;
+
+ PublishMemoryContextPending = false;
+
+ /*
+ * The hash table is used for constructing the "path" column of the
+ * pg_get_process_memory_contexts output, similar to its local
+ * backend counterpart.
+ */
+
+ /*
+ * Make a new context that will contain the hash table, to ease the
+ * cleanup
+ */
+
+ stat_cxt = AllocSetContextCreate(CurrentMemoryContext,
+ "Memory context statistics",
+ ALLOCSET_DEFAULT_SIZES);
+
+ ctl.keysize = sizeof(MemoryContext);
+ ctl.entrysize = sizeof(MemoryContextId);
+ ctl.hcxt = stat_cxt;
+
+ context_id_lookup = hash_create("pg_get_remote_backend_memory_contexts",
+ 256,
+ &ctl,
+ HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+
+ contexts = list_make1(TopMemoryContext);
+
+ /* Compute the number of stats that can fit in the DSM seg */
+ num_stats = floor(16 * DSA_DEFAULT_INIT_SEGMENT_SIZE / sizeof(MemoryContextInfo));
+ /* Attach to DSA segment */
+ LWLockAcquire(&memCtxState[idx].lw_lock, LW_EXCLUSIVE);
+ area = dsa_attach(memCtxState[idx].memstats_dsa_handle);
+ memCtxState[idx].proc_id = MyProcPid;
+ get_summary = memCtxState[idx].get_summary;
+
+ /*
+ * Traverse the memory context tree to find total number of contexts. If
+ * summary is requested find the total number of contexts at level 1 and
+ * 2.
+ */
+ foreach_ptr(MemoryContextData, cur, contexts)
+ {
+ MemoryContextId *entry;
+ List *path = NIL;
+
+ entry = (MemoryContextId *) hash_search(context_id_lookup, &cur,
+ HASH_ENTER, &found);
+ entry->context_id = context_id;
+
+ stats_count = stats_count + 1;
+ /* Append the children of the current context to the main list */
+ for (MemoryContext c = cur->firstchild; c != NULL; c = c->nextchild)
+ contexts = lappend(contexts, c);
+
+ if (!get_summary)
+ continue;
+
+ /*
+ * Figure out the transient context_id of this context and each of its
+ * ancestors.
+ */
+ for (MemoryContext cur_context = cur; cur_context != NULL; cur_context = cur_context->parent)
+ {
+ MemoryContextId *cur_entry;
+
+ cur_entry = hash_search(context_id_lookup, &cur_context, HASH_FIND, &found);
+
+ if (!found)
+ {
+ elog(LOG, "hash table corrupted, can't construct path value");
+ break;
+ }
+ path = lcons_int(cur_entry->context_id, path);
+ }
+ if (list_length(path) == 3)
+ {
+ stats_count = stats_count - 1;
+ break;
+ }
+ }
+
+ /*
+ * Allocate memory in this process's dsa for storing statistics for all
+ * the memory contexts.
+ */
+ memCtxState[idx].memstats_dsa_pointer = dsa_allocate0(area, stats_count * sizeof(MemoryContextInfo));
+ meminfo = (MemoryContextInfo *) dsa_get_address(area, memCtxState[idx].memstats_dsa_pointer);
+
+ foreach_ptr(MemoryContextData, cur, contexts)
+ {
+ MemoryContextId *entry;
+ List *path = NIL;
+ char clipped_ident[MEMORY_CONTEXT_IDENT_DISPLAY_SIZE];
+
+ entry = (MemoryContextId *) hash_search(context_id_lookup, &cur,
+ HASH_ENTER, &found);
+ entry->context_id = context_id;
+
+ /*
+ * Figure out the transient context_id of this context and each of its
+ * ancestors.
+ */
+ for (MemoryContext cur_context = cur; cur_context != NULL; cur_context = cur_context->parent)
+ {
+ MemoryContextId *cur_entry;
+
+ cur_entry = hash_search(context_id_lookup, &cur_context, HASH_FIND, &found);
+
+ if (!found)
+ {
+ elog(LOG, "hash table corrupted, can't construct path value");
+ break;
+ }
+ path = lcons_int(cur_entry->context_id, path);
+ }
+ /* Trim and copy the identifier if it is not set to NULL */
+ if (cur->ident != NULL)
+ {
+ int idlen = strlen(cur->ident);
+
+ /*
+ * Some identifiers such as SQL query string can be very long,
+ * truncate oversize identifiers.
+ */
+ if (idlen >= MEMORY_CONTEXT_IDENT_DISPLAY_SIZE)
+ idlen = pg_mbcliplen(cur->ident, idlen, MEMORY_CONTEXT_IDENT_DISPLAY_SIZE - 1);
+
+ memcpy(clipped_ident, cur->ident, idlen);
+ clipped_ident[idlen] = '\0';
+ }
+ if (context_id <= (num_stats - 2))
+ {
+ /* Copy statistics to DSM memory */
+ PublishMemoryContext(meminfo, context_id, cur, path, (cur->ident != NULL ? clipped_ident : NULL));
+ }
+ {
+ /* Examine the context stats */
+ memset(&stat, 0, sizeof(stat));
+ (*cur->methods->stats) (cur, NULL, NULL, &stat, true);
+
+ meminfo[num_stats - 1].totalspace += stat.totalspace;
+ meminfo[num_stats - 1].nblocks += stat.nblocks;
+ meminfo[num_stats - 1].freespace += stat.freespace;
+ meminfo[num_stats - 1].freechunks += stat.freechunks;
+ }
+ /* Display information up to level 2 for summary */
+ if (get_summary && list_length(path) == 3)
+ {
+ memCtxState[idx].in_memory_stats = context_id;
+ break;
+ }
+
+ /*
+ * DSA max limit is reached, write total of the remaining statistics.
+ */
+ if (context_id == (num_stats - 2) && context_id < stats_count)
+ {
+ memCtxState[idx].in_memory_stats = context_id + 1;
+ strncpy(meminfo[num_stats - 1].name, "Remaining Totals", 16);
+ }
+ context_id++;
+ }
+ if (context_id < (num_stats - 2) && !get_summary)
+ {
+ memCtxState[idx].in_memory_stats = context_id;
+ }
+
+ /*
+ * Signal the waiting client backend after setting the exit condition flag
+ */
+ memCtxState[idx].total_stats = context_id;
+ LWLockRelease(&memCtxState[idx].lw_lock);
+ ConditionVariableBroadcast(&memCtxState[idx].memctx_cv);
+ /* Delete the hash table memory context */
+ MemoryContextDelete(stat_cxt);
+
+ dsa_detach(area);
+}
+
+static void
+PublishMemoryContext(MemoryContextInfo * memctx_info, int curr_id, MemoryContext context, List *path, char *clipped_ident)
+{
+ MemoryContextCounters stat;
+ char *type;
+
+ if (context->name != NULL)
+ {
+ Assert(strlen(context->name) < MEMORY_CONTEXT_IDENT_DISPLAY_SIZE);
+ strncpy(memctx_info[curr_id].name, context->name, strlen(context->name));
+ }
+ else
+ memctx_info[curr_id].name[0] = '\0';
+
+ if (clipped_ident != NULL)
+ {
+ /*
+ * To be consistent with logging output, we label dynahash contexts
+ * with just the hash table name as with MemoryContextStatsPrint().
+ */
+ if (!strncmp(context->name, "dynahash", 8))
+ {
+ strncpy(memctx_info[curr_id].name, clipped_ident, strlen(clipped_ident));
+ memctx_info[curr_id].ident[0] = '\0';
+ }
+ else
+ strncpy(memctx_info[curr_id].ident, clipped_ident, strlen(clipped_ident));
+ }
+ else
+ memctx_info[curr_id].ident[0] = '\0';
+
+ memctx_info[curr_id].path_length = list_length(path);
+ foreach_int(i, path)
+ memctx_info[curr_id].path[foreach_current_index(i)] = Int32GetDatum(i);
+
+ /* Examine the context stats */
+ memset(&stat, 0, sizeof(stat));
+ (*context->methods->stats) (context, NULL, NULL, &stat, true);
+
+ switch (context->type)
+ {
+ case T_AllocSetContext:
+ type = "AllocSet";
+ strncpy(memctx_info[curr_id].type, type, strlen(type));
+ break;
+ case T_GenerationContext:
+ type = "Generation";
+ strncpy(memctx_info[curr_id].type, type, strlen(type));
+ break;
+ case T_SlabContext:
+ type = "Slab";
+ strncpy(memctx_info[curr_id].type, type, strlen(type));
+ break;
+ case T_BumpContext:
+ type = "Bump";
+ strncpy(memctx_info[curr_id].type, type, strlen(type));
+ break;
+ default:
+ type = "???";
+ strncpy(memctx_info[curr_id].type, type, strlen(type));
+ break;
+ }
+ memctx_info[curr_id].totalspace = stat.totalspace;
+ memctx_info[curr_id].nblocks = stat.nblocks;
+ memctx_info[curr_id].freespace = stat.freespace;
+ memctx_info[curr_id].freechunks = stat.freechunks;
+}
+
void *
palloc(Size size)
{
diff --git a/src/include/access/session.h b/src/include/access/session.h
index ce48449a87..f1b08555fa 100644
--- a/src/include/access/session.h
+++ b/src/include/access/session.h
@@ -31,6 +31,7 @@ typedef struct Session
struct SharedRecordTypmodRegistry *shared_typmod_registry;
dshash_table *shared_record_table;
dshash_table *shared_typmod_table;
+
} Session;
extern void InitializeSession(void);
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index cbbe8acd38..b205c54710 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8436,6 +8436,16 @@
prorettype => 'bool', proargtypes => 'int4',
prosrc => 'pg_log_backend_memory_contexts' },
+# publishing memory contexts of the specified backend
+{ oid => '2173', descr => 'publish memory contexts of the specified backend',
+ proname => 'pg_get_process_memory_contexts', provolatile => 'v',
+ prorows => '100', proretset => 't', proparallel => 'r',
+ prorettype => 'record', proargtypes => 'int4 bool',
+ proallargtypes => '{int4,bool,text,text,text,_int4,int4,int4,int4,int4,int4,int4}',
+ proargmodes => '{i,i,o,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{oid, summary, name, ident, type, path, total_bytes, total_nblocks, free_bytes, free_chunks, used_bytes, pid}',
+ prosrc => 'pg_get_process_memory_contexts' },
+
# non-persistent series generator
{ oid => '1066', descr => 'non-persistent series generator',
proname => 'generate_series', prorows => '1000',
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 42a2b38cac..7184727cf1 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -96,6 +96,7 @@ extern PGDLLIMPORT volatile sig_atomic_t IdleSessionTimeoutPending;
extern PGDLLIMPORT volatile sig_atomic_t ProcSignalBarrierPending;
extern PGDLLIMPORT volatile sig_atomic_t LogMemoryContextPending;
extern PGDLLIMPORT volatile sig_atomic_t IdleStatsUpdateTimeoutPending;
+extern PGDLLIMPORT volatile sig_atomic_t PublishMemoryContextPending;
extern PGDLLIMPORT volatile sig_atomic_t CheckClientConnectionPending;
extern PGDLLIMPORT volatile sig_atomic_t ClientConnectionLost;
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index 221073def3..8cbf6e201c 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -35,6 +35,7 @@ typedef enum
PROCSIG_WALSND_INIT_STOPPING, /* ask walsenders to prepare for shutdown */
PROCSIG_BARRIER, /* global barrier interrupt */
PROCSIG_LOG_MEMORY_CONTEXT, /* ask backend to log the memory contexts */
+ PROCSIG_GET_MEMORY_CONTEXT, /* ask backend to publish the memory contexts */
PROCSIG_PARALLEL_APPLY_MESSAGE, /* Message from parallel apply workers */
/* Recovery conflict reasons */
diff --git a/src/include/utils/memutils.h b/src/include/utils/memutils.h
index bf93433b78..35152d28d9 100644
--- a/src/include/utils/memutils.h
+++ b/src/include/utils/memutils.h
@@ -18,6 +18,9 @@
#define MEMUTILS_H
#include "nodes/memnodes.h"
+#include "storage/condition_variable.h"
+#include "storage/lmgr.h"
+#include "utils/dsa.h"
/*
@@ -48,7 +51,11 @@
#define AllocHugeSizeIsValid(size) ((Size) (size) <= MaxAllocHugeSize)
+#define MEMORY_CONTEXT_IDENT_DISPLAY_SIZE 1024
+#define MEM_CONTEXT_SHMEM_STATS_SIZE 30
+#define MEM_CONTEXT_MAX_LEVEL 64
+#define MAX_TYPE_STRING_LENGTH 64
/*
* Standard top-level memory contexts.
*
@@ -115,6 +122,49 @@ extern MemoryContext AllocSetContextCreateInternal(MemoryContext parent,
Size initBlockSize,
Size maxBlockSize);
+/* Dynamic shared memory state for Memory Context Statistics reporting */
+typedef struct MemoryContextInfo
+{
+ char name[MEMORY_CONTEXT_IDENT_DISPLAY_SIZE];
+ char ident[MEMORY_CONTEXT_IDENT_DISPLAY_SIZE];
+ Datum path[MEM_CONTEXT_MAX_LEVEL];
+ char type[MAX_TYPE_STRING_LENGTH];
+ int path_length;
+ int64 totalspace;
+ int64 nblocks;
+ int64 freespace;
+ int64 freechunks;
+} MemoryContextInfo;
+
+typedef struct MemoryContextState
+{
+ ConditionVariable memctx_cv;
+ LWLock lw_lock;
+ int proc_id;
+ int in_memory_stats;
+ int total_stats;
+ bool get_summary;
+ dsa_handle memstats_dsa_handle;
+ dsa_pointer memstats_dsa_pointer;
+
+} MemoryContextState;
+
+/*
+ * MemoryContextId
+ * Used for storage of transient identifiers for
+ * pg_get_backend_memory_contexts.
+ */
+typedef struct MemoryContextId
+{
+ MemoryContext context;
+ int context_id;
+} MemoryContextId;
+
+extern PGDLLIMPORT MemoryContextState * memCtxState;
+extern void ProcessGetMemoryContextInterrupt(void);
+extern void HandleGetMemoryContextInterrupt(void);
+extern void MemCtxShmemInit(void);
+
/*
* This wrapper macro exists to check for non-constant strings used as context
* names; that's no longer supported. (Use MemoryContextSetIdentifier if you
@@ -318,5 +368,6 @@ pg_memory_is_all_zeros(const void *ptr, size_t len)
return true;
}
+#define MAX_NUM_MEM_STATS DSA_MAX_SEGMENT_SIZE / sizeof(MemoryContextInfo)
#endif /* MEMUTILS_H */
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index fad7fc3a7e..eecba122c3 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -222,3 +222,15 @@ select count(distinct utc_offset) >= 24 as ok from pg_timezone_abbrevs;
t
(1 row)
+DO $$
+DECLARE
+ checkpointer_pid int;
+ r RECORD;
+BEGIN
+ SELECT pid from pg_stat_activity where backend_type='checkpointer' INTO checkpointer_pid;
+
+ select type, name, ident, total_bytes >= free_bytes
+ from pg_get_process_memory_contexts(checkpointer_pid, false) where path = '{0}' into r;
+ RAISE NOTICE '%', r;
+END $$;
+NOTICE: (AllocSet,TopMemoryContext,,t)
diff --git a/src/test/regress/sql/sysviews.sql b/src/test/regress/sql/sysviews.sql
index b2a7923754..13a4a0bfe2 100644
--- a/src/test/regress/sql/sysviews.sql
+++ b/src/test/regress/sql/sysviews.sql
@@ -98,3 +98,15 @@ set timezone_abbreviations = 'Australia';
select count(distinct utc_offset) >= 24 as ok from pg_timezone_abbrevs;
set timezone_abbreviations = 'India';
select count(distinct utc_offset) >= 24 as ok from pg_timezone_abbrevs;
+
+DO $$
+DECLARE
+ checkpointer_pid int;
+ r RECORD;
+BEGIN
+ SELECT pid from pg_stat_activity where backend_type='checkpointer' INTO checkpointer_pid;
+
+ select type, name, ident, total_bytes >= free_bytes
+ from pg_get_process_memory_contexts(checkpointer_pid, false) where path = '{0}' into r;
+ RAISE NOTICE '%', r;
+END $$;
--
2.34.1
On Wed, Nov 20, 2024 at 2:39 PM Rahila Syed <rahilasyed90@gmail.com> wrote:
Hi,
To achieve both completeness and avoid writing to a file, I can consider
displaying the numbers for the remaining contexts as a cumulative total
at the end of the output. Something like the following:
```
postgres=# select * from pg_get_process_memory_contexts('237244', false);
             name             | ident |   type   | path  | total_bytes | total_nblocks | free_bytes | free_chunks | used_bytes |  pid
------------------------------+-------+----------+-------+-------------+---------------+------------+-------------+------------+--------
 TopMemoryContext             |       | AllocSet | {0}   |       97696 |             5 |      14288 |          11 |      83408 | 237244
 search_path processing cache |       | AllocSet | {0,1} |        8192 |             1 |       5328 |           7 |       2864 | 237244
Remaining contexts total: 23456 bytes (total_bytes) , 12345(used_bytes), 11,111(free_bytes)
```
Please find attached an updated patch with this change. The file previously used to
store spilled statistics has been removed. Instead, a cumulative total of the
remaining/spilled context statistics is now stored in the DSM segment, which is
displayed as follows.
postgres=# select * from pg_get_process_memory_contexts('352966', false);
       name       | ident |   type   |  path  | total_bytes | total_nblocks | free_bytes | free_chunks | used_bytes |  pid
------------------+-------+----------+--------+-------------+---------------+------------+-------------+------------+--------
 TopMemoryContext |       | AllocSet | {0}    |       97696 |             5 |      14288 |          11 |      83408 | 352966
 .
 .
 .
 MdSmgr           |       | AllocSet | {0,18} |        8192 |             1 |       7424 |           0 |        768 | 352966
 Remaining Totals |       |          |        |     1756016 |           188 |     658584 |         132 |    1097432 | 352966
(7129 rows)
I believe this serves as a good compromise between completeness
and avoiding the overhead of file handling. However, I am open to
reintroducing file handling if displaying the complete statistics of the
remaining contexts proves to be more important.
All the known bugs in the patch have been fixed.
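To make the "Remaining Totals" behaviour concrete, here is a minimal, self-contained C sketch of the idea. It is not the code from the patch: CtxStat and publish_with_overflow() are hypothetical names used only for illustration, and the struct is a trimmed-down stand-in for MemoryContextInfo. When everything fits, each context gets its own row; otherwise the last slot of the buffer is reserved for a cumulative total of the contexts that did not fit.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-in for the patch's MemoryContextInfo. */
typedef struct CtxStat
{
	char		name[32];
	long		totalspace;
	long		freespace;
} CtxStat;

/*
 * Copy the contexts in "in" to "out", which has room for "capacity" rows.
 * If all n contexts fit, copy them one to one.  Otherwise copy the first
 * capacity - 1 contexts individually and fold the rest into a single
 * "Remaining Totals" row stored in the last slot.
 * Returns the number of rows written to "out".
 */
static size_t
publish_with_overflow(const CtxStat *in, size_t n, CtxStat *out, size_t capacity)
{
	size_t		individual;
	size_t		i;
	CtxStat    *total;

	if (capacity == 0)
		return 0;

	individual = (n <= capacity) ? n : capacity - 1;
	for (i = 0; i < individual; i++)
		out[i] = in[i];

	/* Everything fit, so no totals row is needed. */
	if (individual == n)
		return n;

	total = &out[capacity - 1];
	memset(total, 0, sizeof(*total));
	strcpy(total->name, "Remaining Totals");	/* fits in name[32] */
	for (i = individual; i < n; i++)
	{
		total->totalspace += in[i].totalspace;
		total->freespace += in[i].freespace;
	}
	return capacity;
}
```

With four contexts and room for three rows, the first two contexts are reported individually and the last two are summed into the third row. The reader can then tell that a totals row is present by comparing the number of contexts seen with the number of rows stored, which is roughly the role that total_stats and in_memory_stats play in the patch.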
In summary, one DSA per PostgreSQL process is used to share
the statistics of that process. A DSA is created by the first client
backend that requests memory context statistics, and it is pinned
for all future requests to that process.
A handle to this DSA is shared between the client and the publishing
process using fixed shared memory. The fixed shared memory consists
of an array of size MaxBackends + auxiliary processes, indexed
by procno. Each element in this array is less than 100 bytes in size.
A PostgreSQL process uses a condition variable to signal a waiting client
backend once it has finished publishing the statistics. If, for some reason,
the signal is not sent, the waiting client backend will time out.
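The signal-or-time-out handshake described above can be sketched outside PostgreSQL with plain POSIX threads primitives. This is only an analogy under stated assumptions: the patch uses PostgreSQL's own ConditionVariable and LWLock, whereas the sketch below swaps in pthread_mutex_t and pthread_cond_t, and PublishSignal, signal_published() and wait_for_publish() are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical per-process slot: a flag protected by a lock, plus a CV. */
typedef struct PublishSignal
{
	pthread_mutex_t lock;
	pthread_cond_t cv;
	bool		published;
} PublishSignal;

/* Publisher side: set the exit condition, then wake up all waiters. */
static void
signal_published(PublishSignal *s)
{
	pthread_mutex_lock(&s->lock);
	s->published = true;
	pthread_mutex_unlock(&s->lock);
	pthread_cond_broadcast(&s->cv);
}

/*
 * Client side: wait until the flag is set or timeout_ms elapses.
 * Returns true if the statistics were published, false on timeout.
 */
static bool
wait_for_publish(PublishSignal *s, long timeout_ms)
{
	struct timespec deadline;
	bool		ok;

	/* pthread_cond_timedwait() takes an absolute deadline. */
	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L)
	{
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&s->lock);
	/* Re-check the flag in a loop to cope with spurious wakeups. */
	while (!s->published)
	{
		if (pthread_cond_timedwait(&s->cv, &s->lock, &deadline) == ETIMEDOUT)
			break;
	}
	ok = s->published;
	pthread_mutex_unlock(&s->lock);
	return ok;
}
```

The point of the loop is that a wakeup alone proves nothing: the waiter trusts only the flag it re-reads under the lock, and gives up once the deadline passes.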
How does the process know that the client backend has finished reading
stats and it can be refreshed? What happens, if the next request for
memory context stats comes before first requester has consumed the
statistics it requested?
Does the shared memory get deallocated when the backend which
allocated it exits?
When the statistics of the local backend are requested, this function returns the following
WARNING and exits, since this can be handled by an existing function which
doesn't require a DSA.
WARNING: cannot return statistics for local backend
HINT: Use pg_get_backend_memory_contexts instead
How about using pg_get_backend_memory_contexts() for both - local as
well as other backend? Let PID argument default to NULL which would
indicate local backend, otherwise some other backend?
--
Best Wishes,
Ashutosh Bapat
Hi,
How does the process know that the client backend has finished reading
stats and it can be refreshed? What happens, if the next request for
memory context stats comes before first requester has consumed the
statistics it requested?
A process that's copying its statistics does not need to know that.
Whenever it receives a signal to copy statistics, it goes ahead and
copies the latest statistics to the DSA after acquiring an exclusive
lwlock.
A requester takes a lock before it starts consuming the statistics.
If the next request comes while the first requester is consuming the
statistics, the publishing process will wait on lwlock to be released
by the consuming process before it can write the statistics.
If the next request arrives before the first requester begins consuming
the statistics, the publishing process will acquire the lock and overwrite
the earlier statistics with the most recent ones.
As a result, both the first and second requesters will consume the
updated statistics.
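The overwrite behaviour can be illustrated with a small, self-contained C sketch. It is an analogy rather than the patch's code: a pthread mutex stands in for the per-slot LWLock, and StatsSlot, slot_publish() and slot_consume() are hypothetical names. The generation counter is not part of the patch; it is added here only to make it visible which publish a reader ended up seeing.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical slot shared by one publishing process and its readers. */
typedef struct StatsSlot
{
	pthread_mutex_t lock;
	int			generation;		/* bumped on every publish */
	long		totalspace;		/* stand-in for the published statistics */
} StatsSlot;

/* Publisher: overwrite whatever is in the slot with the latest numbers. */
static void
slot_publish(StatsSlot *slot, long totalspace)
{
	pthread_mutex_lock(&slot->lock);
	slot->totalspace = totalspace;
	slot->generation++;
	pthread_mutex_unlock(&slot->lock);
}

/* Reader: take the lock, then read whatever was published most recently. */
static long
slot_consume(StatsSlot *slot, int *generation)
{
	long		value;

	pthread_mutex_lock(&slot->lock);
	value = slot->totalspace;
	*generation = slot->generation;
	pthread_mutex_unlock(&slot->lock);
	return value;
}
```

If two requests are served back to back before either requester reads, both readers observe the second publish. A publish that arrives while a reader holds the lock simply blocks until the reader is done.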
Does the shared memory get deallocated when the backend which
allocated it exits?
Memory in the DSA is allocated by a postgres process and deallocated
by the client backend for each request. Both the publishing postgres process
and the client backend detach from the DSA at the end of each request.
However, the DSM segment(s) persist even after all the processes exit
and are only destroyed upon a server restart. Each DSA is associated
with the procNumber of a postgres process and
can be re-used by any future process with the same procNumber.
When the statistics of the local backend are requested, this function returns
the following WARNING and exits, since this can be handled by an existing
function which doesn't require a DSA.
WARNING: cannot return statistics for local backend
HINT: Use pg_get_backend_memory_contexts instead
How about using pg_get_backend_memory_contexts() for both - local as
well as other backend? Let PID argument default to NULL which would
indicate local backend, otherwise some other backend?
I don't see much value in combining the two, especially since
pg_get_process_memory_contexts() can query both a postgres backend and a
background process. The name pg_get_backend_memory_contexts() would then be
inaccurate, and I am not sure whether a change to rename the existing
function would be welcome.
Please find an updated patch which fixes an issue seen in CI runs.
Thank you,
Rahila Syed
Attachments:
v5-Function-to-report-memory-context-stats-of-a-process.patch (application/octet-stream)
From 076d06506d77f20de6f8d185698da613d4789b65 Mon Sep 17 00:00:00 2001
From: Rahila Syed <rahilasyed.90@gmail.com>
Date: Sun, 15 Sep 2024 17:56:06 +0530
Subject: [PATCH] Function to report memory context stats of any backend
This function sends a signal to a backend to publish
statistics of all its memory contexts. Signal handler
sets a flag, which causes the relevant backend to copy its
MemoryContextStats to a DSA, as part
of next CHECK_FOR_INTERRUPTS().
If there are more than 16MB worth of statistics, the
remaining statistics are copied as a cumulative
total of the remaining contexts.
Once it is done, it signals the client backend using
a condition variable. The client backend
then wakes up, reads the shared memory and
returns these values in the form of set of records,
one for each memory context, to the user, followed
by a cumulative total of the remaining contexts,
if any.
Each backend and auxiliary process has its own slot
for reporting the stats. There is an array of such
memory slots of size MaxBackends+NumofAuxiliary
processes in fixed shared memory. Each of these slots point
to a DSA, which contains the stats to be shared by the
corresponding process.
Each slot has its own LW lock and condition variable for
synchronization and communication between the
publishing process and the client backend.
---
src/backend/postmaster/autovacuum.c | 4 +
src/backend/postmaster/checkpointer.c | 4 +
src/backend/postmaster/interrupt.c | 4 +
src/backend/postmaster/pgarch.c | 4 +
src/backend/postmaster/startup.c | 4 +
src/backend/postmaster/walsummarizer.c | 4 +
src/backend/storage/ipc/ipci.c | 2 +
src/backend/storage/ipc/procsignal.c | 3 +
src/backend/tcop/postgres.c | 3 +
.../utils/activity/wait_event_names.txt | 1 +
src/backend/utils/adt/mcxtfuncs.c | 254 ++++++++++++++-
src/backend/utils/init/globals.c | 1 +
src/backend/utils/mmgr/mcxt.c | 306 +++++++++++++++++-
src/include/catalog/pg_proc.dat | 10 +
src/include/miscadmin.h | 1 +
src/include/storage/procsignal.h | 1 +
src/include/utils/memutils.h | 50 +++
src/test/regress/expected/sysviews.out | 12 +
src/test/regress/sql/sysviews.sql | 12 +
19 files changed, 668 insertions(+), 12 deletions(-)
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index dc3cf87aba..5d01497ada 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -768,6 +768,10 @@ HandleAutoVacLauncherInterrupts(void)
/* Process sinval catchup interrupts that happened while sleeping */
ProcessCatchupInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index 982572a75d..9caf8fa018 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -616,6 +616,10 @@ HandleCheckpointerInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/postmaster/interrupt.c b/src/backend/postmaster/interrupt.c
index eedc0980cf..1107ff6d45 100644
--- a/src/backend/postmaster/interrupt.c
+++ b/src/backend/postmaster/interrupt.c
@@ -48,6 +48,10 @@ HandleMainLoopInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 02f91431f5..467a253ccd 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -865,6 +865,10 @@ HandlePgArchInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
if (ConfigReloadPending)
{
char *archiveLib = pstrdup(XLogArchiveLibrary);
diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c
index ef6f98ebcd..17beb8737d 100644
--- a/src/backend/postmaster/startup.c
+++ b/src/backend/postmaster/startup.c
@@ -192,6 +192,10 @@ HandleStartupProcInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
diff --git a/src/backend/postmaster/walsummarizer.c b/src/backend/postmaster/walsummarizer.c
index 48350bec52..b3e6c2b5f0 100644
--- a/src/backend/postmaster/walsummarizer.c
+++ b/src/backend/postmaster/walsummarizer.c
@@ -876,6 +876,10 @@ HandleWalSummarizerInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 7783ba854f..8816ef6903 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -50,6 +50,7 @@
#include "storage/sinvaladt.h"
#include "utils/guc.h"
#include "utils/injection_point.h"
+#include "utils/memutils.h"
/* GUCs */
int shared_memory_type = DEFAULT_SHARED_MEMORY_TYPE;
@@ -340,6 +341,7 @@ CreateOrAttachShmemStructs(void)
StatsShmemInit();
WaitEventCustomShmemInit();
InjectionPointShmemInit();
+ MemCtxShmemInit();
}
/*
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index 87027f27eb..621726cf03 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -688,6 +688,9 @@ procsignal_sigusr1_handler(SIGNAL_ARGS)
if (CheckProcSignal(PROCSIG_LOG_MEMORY_CONTEXT))
HandleLogMemoryContextInterrupt();
+ if (CheckProcSignal(PROCSIG_GET_MEMORY_CONTEXT))
+ HandleGetMemoryContextInterrupt();
+
if (CheckProcSignal(PROCSIG_PARALLEL_APPLY_MESSAGE))
HandleParallelApplyMessageInterrupt();
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 184b830168..4436b885d9 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3500,6 +3500,9 @@ ProcessInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
if (ParallelApplyMessagePending)
HandleParallelApplyMessages();
}
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 16144c2b72..7a27b5f680 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -158,6 +158,7 @@ WAL_RECEIVER_EXIT "Waiting for the WAL receiver to exit."
WAL_RECEIVER_WAIT_START "Waiting for startup process to send initial data for streaming replication."
WAL_SUMMARY_READY "Waiting for a new WAL summary to be generated."
XACT_GROUP_UPDATE "Waiting for the group leader to update transaction status at transaction end."
+MEM_CTX_PUBLISH "Waiting for backend to publish memory information."
ABI_compatibility:
diff --git a/src/backend/utils/adt/mcxtfuncs.c b/src/backend/utils/adt/mcxtfuncs.c
index 6a6634e1cd..163de2d2d2 100644
--- a/src/backend/utils/adt/mcxtfuncs.c
+++ b/src/backend/utils/adt/mcxtfuncs.c
@@ -17,28 +17,23 @@
#include "funcapi.h"
#include "mb/pg_wchar.h"
+#include "miscadmin.h"
+#include "access/twophase.h"
+#include "nodes/pg_list.h"
#include "storage/proc.h"
#include "storage/procarray.h"
#include "utils/array.h"
#include "utils/builtins.h"
#include "utils/hsearch.h"
+#include "utils/memutils.h"
+#include "utils/wait_event_types.h"
/* ----------
* The max bytes for showing identifiers of MemoryContext.
* ----------
*/
-#define MEMORY_CONTEXT_IDENT_DISPLAY_SIZE 1024
-/*
- * MemoryContextId
- * Used for storage of transient identifiers for
- * pg_get_backend_memory_contexts.
- */
-typedef struct MemoryContextId
-{
- MemoryContext context;
- int context_id;
-} MemoryContextId;
+struct MemoryContextState *memCtxState = NULL;
/*
* int_list_to_array
@@ -305,3 +300,240 @@ pg_log_backend_memory_contexts(PG_FUNCTION_ARGS)
PG_RETURN_BOOL(true);
}
+
+/*
+ * pg_get_process_memory_contexts
+ * Signal a backend or an auxiliary process to send its memory contexts.
+ *
+ * On receipt of this signal, a backend or an auxiliary process sets the flag
+ * in the signal handler, which causes the next CHECK_FOR_INTERRUPTS()
+ * or process-specific interrupt handler to copy the memory context statistics
+ * in a dynamic shared memory space. The statistics for contexts that do not fit in
+ * shared memory area are stored as a cumulative total of those contexts,
+ * at the end in the dynamic shared memory.
+ * Wait for the backend to send signal on the condition variable after
+ * writing statistics to a shared memory.
+ * Once condition variable comes out of sleep, check if the required
+ * backends statistics are available to read and display.
+ */
+Datum
+pg_get_process_memory_contexts(PG_FUNCTION_ARGS)
+{
+ int pid = PG_GETARG_INT32(0);
+ bool get_summary = PG_GETARG_BOOL(1);
+ PGPROC *proc;
+ ProcNumber procNumber = INVALID_PROC_NUMBER;
+ int i;
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ dsa_area *area;
+ dsa_handle handle;
+ MemoryContextInfo *memctx_info;
+ MemoryContext oldContext;
+
+ InitMaterializedSRF(fcinfo, 0);
+
+ /*
+ * See if the process with given pid is a backend or an auxiliary process.
+ */
+ proc = BackendPidGetProc(pid);
+ if (proc == NULL)
+ proc = AuxiliaryPidGetProc(pid);
+
+ /*
+ * BackendPidGetProc() and AuxiliaryPidGetProc() return NULL if the pid
+ * isn't valid; but by the time we reach kill(), a process for which we
+ * get a valid proc here might have terminated on its own. There's no way
+ * to acquire a lock on an arbitrary process to prevent that. But since
+ * this mechanism is usually used to debug a backend or an auxiliary
+ * process running and consuming lots of memory, that it might end on its
+ * own first and its memory contexts are not logged is not a problem.
+ */
+ if (proc == NULL)
+ {
+ /*
+ * This is just a warning so a loop-through-resultset will not abort
+ * if one backend terminated on its own during the run.
+ */
+ ereport(WARNING,
+ (errmsg("PID %d is not a PostgreSQL server process", pid)));
+ return (Datum) 0;
+ }
+
+ procNumber = GetNumberFromPGProc(proc);
+ if (procNumber == MyProcNumber)
+ {
+ ereport(WARNING,
+ (errmsg("cannot return statistics for local backend"),
+ errhint("Use pg_get_backend_memory_contexts instead")));
+ PG_RETURN_NULL();
+ }
+
+ /*
+ * Return statistics of top level 1 and 2 contexts, if get_summary is
+ * true.
+ */
+ LWLockAcquire(&memCtxState[procNumber].lw_lock, LW_EXCLUSIVE);
+ memCtxState[procNumber].get_summary = get_summary;
+
+ /*
+ * Create a DSA segment with maximum size of 16MB, send handle to the
+ * publishing process for storing the stats. If number of contexts exceed
+ * 16MB, a cumulative total is stored for such contexts.
+ */
+ if (memCtxState[procNumber].memstats_dsa_handle == DSA_HANDLE_INVALID)
+ {
+ oldContext = MemoryContextSwitchTo(TopMemoryContext);
+ area = dsa_create_ext(memCtxState[procNumber].lw_lock.tranche, DSA_DEFAULT_INIT_SEGMENT_SIZE,
+ 16 * DSA_DEFAULT_INIT_SEGMENT_SIZE);
+ MemoryContextSwitchTo(oldContext);
+ handle = dsa_get_handle(area);
+ memCtxState[procNumber].memstats_dsa_handle = handle;
+ /* Pin the mapping so that it doesn't throw a warning */
+ dsa_pin(area);
+ dsa_pin_mapping(area);
+ }
+ else
+ {
+ area = dsa_attach(memCtxState[procNumber].memstats_dsa_handle);
+ dsa_pin_mapping(area);
+ }
+ LWLockRelease(&memCtxState[procNumber].lw_lock);
+ if (SendProcSignal(pid, PROCSIG_GET_MEMORY_CONTEXT, procNumber) < 0)
+ {
+ ereport(WARNING,
+ (errmsg("could not send signal to process %d: %m", pid)));
+ return (Datum) 0;
+ }
+
+ /*
+ * Wait for a backend to publish stats, indicated by a valid dsa pointer
+ * set by the backend.
+ */
+ while (1)
+ {
+ LWLockAcquire(&memCtxState[procNumber].lw_lock, LW_EXCLUSIVE);
+
+ /*
+ * We expect to come out of sleep when the requested process has
+ * finished publishing the statistics, verified using a valid dsa
+ * pointer.
+ *
+ * Make sure that the information belongs to the pid we requested
+ * information for; otherwise loop back and wait for the server
+ * process to finish publishing statistics.
+ */
+ if (memCtxState[procNumber].proc_id == pid && DsaPointerIsValid(memCtxState[procNumber].memstats_dsa_pointer))
+ break;
+ else
+ LWLockRelease(&memCtxState[procNumber].lw_lock);
+
+ if (ConditionVariableTimedSleep(&memCtxState[procNumber].memctx_cv, 120000,
+ WAIT_EVENT_MEM_CTX_PUBLISH))
+ {
+ ereport(WARNING,
+ (errmsg("Wait for %d process to publish stats timed out, try again", pid)));
+ if (DsaPointerIsValid(memCtxState[procNumber].memstats_dsa_pointer))
+ {
+ dsa_free(area, memCtxState[procNumber].memstats_dsa_pointer);
+ memCtxState[procNumber].memstats_dsa_pointer = InvalidDsaPointer;
+ }
+ return (Datum) 0;
+ }
+ }
+ if (DsaPointerIsValid(memCtxState[procNumber].memstats_dsa_pointer))
+ memctx_info = (MemoryContextInfo *) dsa_get_address(area, memCtxState[procNumber].memstats_dsa_pointer);
+ /* Backend has finished publishing the stats, read them */
+ for (i = 0; i < memCtxState[procNumber].in_memory_stats; i++)
+ {
+ ArrayType *path_array;
+ int path_length;
+ Datum values[PG_GET_BACKEND_MEMORY_CONTEXTS_COLS];
+ bool nulls[PG_GET_BACKEND_MEMORY_CONTEXTS_COLS];
+
+ memset(values, 0, sizeof(values));
+ memset(nulls, 0, sizeof(nulls));
+
+ if (strlen(memctx_info[i].name) != 0)
+ values[0] = CStringGetTextDatum(memctx_info[i].name);
+ else
+ nulls[0] = true;
+ if (strlen(memctx_info[i].ident) != 0)
+ values[1] = CStringGetTextDatum(memctx_info[i].ident);
+ else
+ nulls[1] = true;
+
+ values[2] = CStringGetTextDatum(memctx_info[i].type);
+ path_length = memctx_info[i].path_length;
+ path_array = construct_array_builtin(memctx_info[i].path, path_length, INT4OID);
+ values[3] = PointerGetDatum(path_array);
+ values[4] = Int64GetDatum(memctx_info[i].totalspace);
+ values[5] = Int64GetDatum(memctx_info[i].nblocks);
+ values[6] = Int64GetDatum(memctx_info[i].freespace);
+ values[7] = Int64GetDatum(memctx_info[i].freechunks);
+ values[8] = Int64GetDatum(memctx_info[i].totalspace - memctx_info[i].freespace);
+ values[9] = Int32GetDatum(memCtxState[procNumber].proc_id);
+
+ tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc, values, nulls);
+ }
+ /* If there are more contexts, display a cumulative total of those */
+ if (memCtxState[procNumber].total_stats > i)
+ {
+ Datum values[PG_GET_BACKEND_MEMORY_CONTEXTS_COLS];
+ bool nulls[PG_GET_BACKEND_MEMORY_CONTEXTS_COLS];
+
+ values[0] = CStringGetTextDatum(memctx_info[i].name);
+ nulls[1] = true;
+ nulls[2] = true;
+ nulls[3] = true;
+ values[4] = Int64GetDatum(memctx_info[i].totalspace);
+ values[5] = Int64GetDatum(memctx_info[i].nblocks);
+ values[6] = Int64GetDatum(memctx_info[i].freespace);
+ values[7] = Int64GetDatum(memctx_info[i].freechunks);
+ values[8] = Int64GetDatum(memctx_info[i].totalspace - memctx_info[i].freespace);
+ values[9] = Int32GetDatum(memCtxState[procNumber].proc_id);
+ tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc, values, nulls);
+ }
+
+ /* DSA free allocation for this client */
+ if (DsaPointerIsValid(memCtxState[procNumber].memstats_dsa_pointer))
+ {
+ dsa_free(area, memCtxState[procNumber].memstats_dsa_pointer);
+ memCtxState[procNumber].memstats_dsa_pointer = InvalidDsaPointer;
+ }
+ LWLockRelease(&memCtxState[procNumber].lw_lock);
+ ConditionVariableCancelSleep();
+ dsa_detach(area);
+ return (Datum) 0;
+}
+
+static Size
+MemCtxShmemSize(void)
+{
+ Size size;
+ Size TotalProcs = add_size(MaxBackends, add_size(NUM_AUXILIARY_PROCS, max_prepared_xacts));
+
+ size = TotalProcs * sizeof(MemoryContextState);
+ return size;
+}
+
+void
+MemCtxShmemInit(void)
+{
+ bool found;
+ Size TotalProcs = add_size(MaxBackends, add_size(NUM_AUXILIARY_PROCS, max_prepared_xacts));
+
+ memCtxState = (MemoryContextState *) ShmemInitStruct("MemoryContextState",
+ MemCtxShmemSize(),
+ &found);
+ if (!found)
+ {
+ for (int i = 0; i < TotalProcs; i++)
+ {
+ ConditionVariableInit(&memCtxState[i].memctx_cv);
+ LWLockInitialize(&memCtxState[i].lw_lock, LWLockNewTrancheId());
+ LWLockRegisterTranche(memCtxState[i].lw_lock.tranche, "mem_context_stats_reporting");
+ memCtxState[i].memstats_dsa_handle = DSA_HANDLE_INVALID;
+ memCtxState[i].memstats_dsa_pointer = InvalidDsaPointer;
+ }
+ }
+}
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index 03a54451ac..7fc600ff7b 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -42,6 +42,7 @@ volatile sig_atomic_t IdleStatsUpdateTimeoutPending = false;
volatile uint32 InterruptHoldoffCount = 0;
volatile uint32 QueryCancelHoldoffCount = 0;
volatile uint32 CritSectionCount = 0;
+volatile sig_atomic_t PublishMemoryContextPending = false;
int MyProcPid;
pg_time_t MyStartTime;
diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index bde54326c6..5bdeacb74a 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -19,16 +19,22 @@
*-------------------------------------------------------------------------
*/
+#include <math.h>
#include "postgres.h"
#include "mb/pg_wchar.h"
#include "miscadmin.h"
+#include "nodes/pg_list.h"
+#include "storage/fd.h"
+#include "storage/lwlock.h"
+#include "storage/dsm.h"
+#include "utils/dsa.h"
+#include "utils/hsearch.h"
#include "utils/memdebug.h"
#include "utils/memutils.h"
#include "utils/memutils_internal.h"
#include "utils/memutils_memorychunk.h"
-
static void BogusFree(void *pointer);
static void *BogusRealloc(void *pointer, Size size, int flags);
static MemoryContext BogusGetChunkContext(void *pointer);
@@ -166,6 +172,7 @@ static void MemoryContextStatsInternal(MemoryContext context, int level,
static void MemoryContextStatsPrint(MemoryContext context, void *passthru,
const char *stats_string,
bool print_to_stderr);
+static void PublishMemoryContext(MemoryContextInfo * memctx_infos, int curr_id, MemoryContext context, List *path, char *clipped_ident);
/*
* You should not do memory allocations within a critical section, because
@@ -1276,6 +1283,21 @@ HandleLogMemoryContextInterrupt(void)
/* latch will be set by procsignal_sigusr1_handler */
}
+/*
+ * HandleGetMemoryContextInterrupt
+ * Handle receipt of an interrupt indicating publishing of memory
+ * contexts.
+ *
+ * All the actual work is deferred to ProcessGetMemoryContextInterrupt()
+ */
+void
+HandleGetMemoryContextInterrupt(void)
+{
+ InterruptPending = true;
+ PublishMemoryContextPending = true;
+ /* latch will be set by procsignal_sigusr1_handler */
+}
+
/*
* ProcessLogMemoryContextInterrupt
* Perform logging of memory contexts of this backend process.
@@ -1313,6 +1335,288 @@ ProcessLogMemoryContextInterrupt(void)
MemoryContextStatsDetail(TopMemoryContext, 100, 100, false);
}
+/*
+ * Run by each backend to publish their memory context
+ * statistics. It performs a breadth first search
+ * on the memory context tree, so that the parents
+ * get a chance to report stats before their children.
+ *
+ * Statistics are shared via dynamic shared memory which
+ * can hold statistics of approx 6700 contexts. Remaining
+ * contexts statistics is captured as a cumulative total.
+ */
+void
+ProcessGetMemoryContextInterrupt(void)
+{
+ /* Store the memory context details in shared memory */
+
+ List *contexts;
+
+ HASHCTL ctl;
+ HTAB *context_id_lookup;
+ int context_id = 0;
+ bool found;
+ MemoryContext stat_cxt;
+ MemoryContextInfo *meminfo;
+ bool get_summary = false;
+ dsa_area *area;
+ int num_stats;
+ int idx = MyProcNumber;
+ int stats_count = 0;
+ MemoryContextCounters stat;
+
+ PublishMemoryContextPending = false;
+
+ /*
+ * The hash table is used for constructing "path" column of
+ * pg_get_remote_backend_memory_contexts view, similar to its local
+ * backend counterpart.
+ */
+
+ /*
+ * Make a new context that will contain the hash table, to ease the
+ * cleanup
+ */
+
+ stat_cxt = AllocSetContextCreate(CurrentMemoryContext,
+ "Memory context statistics",
+ ALLOCSET_DEFAULT_SIZES);
+
+ ctl.keysize = sizeof(MemoryContext);
+ ctl.entrysize = sizeof(MemoryContextId);
+ ctl.hcxt = stat_cxt;
+
+ context_id_lookup = hash_create("pg_get_remote_backend_memory_contexts",
+ 256,
+ &ctl,
+ HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+
+ contexts = list_make1(TopMemoryContext);
+
+ /* Compute the number of stats that can fit in the DSM seg */
+ num_stats = floor(16 * DSA_DEFAULT_INIT_SEGMENT_SIZE / sizeof(MemoryContextInfo));
+ /* Attach to DSA segment */
+ LWLockAcquire(&memCtxState[idx].lw_lock, LW_EXCLUSIVE);
+ area = dsa_attach(memCtxState[idx].memstats_dsa_handle);
+ memCtxState[idx].proc_id = MyProcPid;
+ get_summary = memCtxState[idx].get_summary;
+
+ /*
+ * Traverse the memory context tree to find total number of contexts. If
+ * summary is requested find the total number of contexts at level 1 and
+ * 2.
+ */
+ foreach_ptr(MemoryContextData, cur, contexts)
+ {
+ MemoryContextId *entry;
+ List *path = NIL;
+
+ entry = (MemoryContextId *) hash_search(context_id_lookup, &cur,
+ HASH_ENTER, &found);
+ entry->context_id = context_id;
+
+ stats_count = stats_count + 1;
+ /* Append the children of the current context to the main list */
+ for (MemoryContext c = cur->firstchild; c != NULL; c = c->nextchild)
+ contexts = lappend(contexts, c);
+
+ if (!get_summary)
+ continue;
+
+ /*
+ * Figure out the transient context_id of this context and each of its
+ * ancestors.
+ */
+ for (MemoryContext cur_context = cur; cur_context != NULL; cur_context = cur_context->parent)
+ {
+ MemoryContextId *cur_entry;
+
+ cur_entry = hash_search(context_id_lookup, &cur_context, HASH_FIND, &found);
+
+ if (!found)
+ {
+ elog(LOG, "hash table corrupted, can't construct path value");
+ break;
+ }
+ path = lcons_int(cur_entry->context_id, path);
+ }
+ if (list_length(path) == 3)
+ {
+ stats_count = stats_count - 1;
+ break;
+ }
+ }
+
+ /*
+ * Allocate memory in this process's dsa for storing statistics of the
+ * memory contexts up to num_stats; for contexts that don't fit in the DSA
+ * segment, a cumulative total is written as the last record in the DSA
+ * segment.
+ */
+ stats_count = stats_count > num_stats ? num_stats : stats_count;
+ memCtxState[idx].memstats_dsa_pointer = dsa_allocate0(area, stats_count * sizeof(MemoryContextInfo));
+ meminfo = (MemoryContextInfo *) dsa_get_address(area, memCtxState[idx].memstats_dsa_pointer);
+
+ foreach_ptr(MemoryContextData, cur, contexts)
+ {
+ MemoryContextId *entry;
+ List *path = NIL;
+ char clipped_ident[MEMORY_CONTEXT_IDENT_DISPLAY_SIZE];
+
+ entry = (MemoryContextId *) hash_search(context_id_lookup, &cur,
+ HASH_ENTER, &found);
+ entry->context_id = context_id;
+
+ /*
+ * Figure out the transient context_id of this context and each of its
+ * ancestors.
+ */
+ for (MemoryContext cur_context = cur; cur_context != NULL; cur_context = cur_context->parent)
+ {
+ MemoryContextId *cur_entry;
+
+ cur_entry = hash_search(context_id_lookup, &cur_context, HASH_FIND, &found);
+
+ if (!found)
+ {
+ elog(LOG, "hash table corrupted, can't construct path value");
+ break;
+ }
+ path = lcons_int(cur_entry->context_id, path);
+ }
+ /* Trim and copy the identifier if it is not set to NULL */
+ if (cur->ident != NULL)
+ {
+ int idlen = strlen(cur->ident);
+
+ /*
+ * Some identifiers such as SQL query string can be very long,
+ * truncate oversize identifiers.
+ */
+ if (idlen >= MEMORY_CONTEXT_IDENT_DISPLAY_SIZE)
+ idlen = pg_mbcliplen(cur->ident, idlen, MEMORY_CONTEXT_IDENT_DISPLAY_SIZE - 1);
+
+ memcpy(clipped_ident, cur->ident, idlen);
+ clipped_ident[idlen] = '\0';
+ }
+ if (context_id <= (num_stats - 2))
+ {
+ /* Copy statistics to DSM memory */
+ PublishMemoryContext(meminfo, context_id, cur, path, (cur->ident != NULL ? clipped_ident : NULL));
+ }
+ else
+ {
+ /* Examine the context stats */
+ memset(&stat, 0, sizeof(stat));
+ (*cur->methods->stats) (cur, NULL, NULL, &stat, true);
+
+ meminfo[num_stats - 1].totalspace += stat.totalspace;
+ meminfo[num_stats - 1].nblocks += stat.nblocks;
+ meminfo[num_stats - 1].freespace += stat.freespace;
+ meminfo[num_stats - 1].freechunks += stat.freechunks;
+ }
+ /* Display information up to level 2 for summary */
+ if (get_summary && list_length(path) == 3)
+ {
+ memCtxState[idx].in_memory_stats = context_id;
+ break;
+ }
+
+ /*
+ * DSA max limit is reached, write total of the remaining statistics.
+ */
+ if (context_id == (num_stats - 2) && context_id < (stats_count - 1))
+ {
+ memCtxState[idx].in_memory_stats = context_id + 1;
+ strncpy(meminfo[num_stats - 1].name, "Remaining Totals", 16);
+ }
+ context_id++;
+ }
+ if (context_id < (num_stats - 2) && !get_summary)
+ {
+ memCtxState[idx].in_memory_stats = context_id;
+ }
+
+ /*
+ * Signal the waiting client backend after setting the exit condition flag
+ */
+ memCtxState[idx].total_stats = context_id;
+ LWLockRelease(&memCtxState[idx].lw_lock);
+ ConditionVariableBroadcast(&memCtxState[idx].memctx_cv);
+ /* Delete the hash table memory context */
+ MemoryContextDelete(stat_cxt);
+
+ dsa_detach(area);
+}
+
+static void
+PublishMemoryContext(MemoryContextInfo * memctx_info, int curr_id, MemoryContext context, List *path, char *clipped_ident)
+{
+ MemoryContextCounters stat;
+ char *type;
+
+ if (context->name != NULL)
+ {
+ Assert(strlen(context->name) < MEMORY_CONTEXT_IDENT_DISPLAY_SIZE);
+ strncpy(memctx_info[curr_id].name, context->name, strlen(context->name));
+ }
+ else
+ memctx_info[curr_id].name[0] = '\0';
+
+ if (clipped_ident != NULL)
+ {
+ /*
+ * To be consistent with logging output, we label dynahash contexts
+ * with just the hash table name as with MemoryContextStatsPrint().
+ */
+ if (!strncmp(context->name, "dynahash", 8))
+ {
+ strncpy(memctx_info[curr_id].name, clipped_ident, strlen(clipped_ident));
+ memctx_info[curr_id].ident[0] = '\0';
+ }
+ else
+ strncpy(memctx_info[curr_id].ident, clipped_ident, strlen(clipped_ident));
+ }
+ else
+ memctx_info[curr_id].ident[0] = '\0';
+
+ memctx_info[curr_id].path_length = list_length(path);
+ foreach_int(i, path)
+ memctx_info[curr_id].path[foreach_current_index(i)] = Int32GetDatum(i);
+
+ /* Examine the context stats */
+ memset(&stat, 0, sizeof(stat));
+ (*context->methods->stats) (context, NULL, NULL, &stat, true);
+
+ switch (context->type)
+ {
+ case T_AllocSetContext:
+ type = "AllocSet";
+ strncpy(memctx_info[curr_id].type, type, strlen(type));
+ break;
+ case T_GenerationContext:
+ type = "Generation";
+ strncpy(memctx_info[curr_id].type, type, strlen(type));
+ break;
+ case T_SlabContext:
+ type = "Slab";
+ strncpy(memctx_info[curr_id].type, type, strlen(type));
+ break;
+ case T_BumpContext:
+ type = "Bump";
+ strncpy(memctx_info[curr_id].type, type, strlen(type));
+ break;
+ default:
+ type = "???";
+ strncpy(memctx_info[curr_id].type, type, strlen(type));
+ break;
+ }
+ memctx_info[curr_id].totalspace = stat.totalspace;
+ memctx_info[curr_id].nblocks = stat.nblocks;
+ memctx_info[curr_id].freespace = stat.freespace;
+ memctx_info[curr_id].freechunks = stat.freechunks;
+}
+
void *
palloc(Size size)
{
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index cbbe8acd38..b205c54710 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8436,6 +8436,16 @@
prorettype => 'bool', proargtypes => 'int4',
prosrc => 'pg_log_backend_memory_contexts' },
+# publishing memory contexts of the specified backend
+{ oid => '2173', descr => 'publish memory contexts of the specified backend',
+ proname => 'pg_get_process_memory_contexts', provolatile => 'v',
+ prorows => '100', proretset => 't', proparallel => 'r',
+ prorettype => 'record', proargtypes => 'int4 bool',
+ proallargtypes => '{int4,bool,text,text,text,_int4,int4,int4,int4,int4,int4,int4}',
+ proargmodes => '{i,i,o,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{oid, summary, name, ident, type, path, total_bytes, total_nblocks, free_bytes, free_chunks, used_bytes, pid}',
+ prosrc => 'pg_get_process_memory_contexts' },
+
# non-persistent series generator
{ oid => '1066', descr => 'non-persistent series generator',
proname => 'generate_series', prorows => '1000',
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 42a2b38cac..7184727cf1 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -96,6 +96,7 @@ extern PGDLLIMPORT volatile sig_atomic_t IdleSessionTimeoutPending;
extern PGDLLIMPORT volatile sig_atomic_t ProcSignalBarrierPending;
extern PGDLLIMPORT volatile sig_atomic_t LogMemoryContextPending;
extern PGDLLIMPORT volatile sig_atomic_t IdleStatsUpdateTimeoutPending;
+extern PGDLLIMPORT volatile sig_atomic_t PublishMemoryContextPending;
extern PGDLLIMPORT volatile sig_atomic_t CheckClientConnectionPending;
extern PGDLLIMPORT volatile sig_atomic_t ClientConnectionLost;
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index 221073def3..8cbf6e201c 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -35,6 +35,7 @@ typedef enum
PROCSIG_WALSND_INIT_STOPPING, /* ask walsenders to prepare for shutdown */
PROCSIG_BARRIER, /* global barrier interrupt */
PROCSIG_LOG_MEMORY_CONTEXT, /* ask backend to log the memory contexts */
+ PROCSIG_GET_MEMORY_CONTEXT, /* ask backend to publish the memory contexts */
PROCSIG_PARALLEL_APPLY_MESSAGE, /* Message from parallel apply workers */
/* Recovery conflict reasons */
diff --git a/src/include/utils/memutils.h b/src/include/utils/memutils.h
index bf93433b78..d1d3b9da93 100644
--- a/src/include/utils/memutils.h
+++ b/src/include/utils/memutils.h
@@ -18,6 +18,9 @@
#define MEMUTILS_H
#include "nodes/memnodes.h"
+#include "storage/condition_variable.h"
+#include "storage/lmgr.h"
+#include "utils/dsa.h"
/*
@@ -48,7 +51,11 @@
#define AllocHugeSizeIsValid(size) ((Size) (size) <= MaxAllocHugeSize)
+#define MEMORY_CONTEXT_IDENT_DISPLAY_SIZE 1024
+#define MEM_CONTEXT_SHMEM_STATS_SIZE 30
+#define MEM_CONTEXT_MAX_LEVEL 64
+#define MAX_TYPE_STRING_LENGTH 64
/*
* Standard top-level memory contexts.
*
@@ -115,6 +122,49 @@ extern MemoryContext AllocSetContextCreateInternal(MemoryContext parent,
Size initBlockSize,
Size maxBlockSize);
+/* Dynamic shared memory state for Memory Context Statistics reporting */
+typedef struct MemoryContextInfo
+{
+ char name[MEMORY_CONTEXT_IDENT_DISPLAY_SIZE];
+ char ident[MEMORY_CONTEXT_IDENT_DISPLAY_SIZE];
+ Datum path[MEM_CONTEXT_MAX_LEVEL];
+ char type[MAX_TYPE_STRING_LENGTH];
+ int path_length;
+ int64 totalspace;
+ int64 nblocks;
+ int64 freespace;
+ int64 freechunks;
+} MemoryContextInfo;
+
+typedef struct MemoryContextState
+{
+ ConditionVariable memctx_cv;
+ LWLock lw_lock;
+ int proc_id;
+ int in_memory_stats;
+ int total_stats;
+ bool get_summary;
+ dsa_handle memstats_dsa_handle;
+ dsa_pointer memstats_dsa_pointer;
+
+} MemoryContextState;
+
+/*
+ * MemoryContextId
+ * Used for storage of transient identifiers for
+ * pg_get_backend_memory_contexts.
+ */
+typedef struct MemoryContextId
+{
+ MemoryContext context;
+ int context_id;
+} MemoryContextId;
+
+extern PGDLLIMPORT MemoryContextState * memCtxState;
+extern void ProcessGetMemoryContextInterrupt(void);
+extern void HandleGetMemoryContextInterrupt(void);
+extern void MemCtxShmemInit(void);
+
/*
* This wrapper macro exists to check for non-constant strings used as context
* names; that's no longer supported. (Use MemoryContextSetIdentifier if you
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index fad7fc3a7e..eecba122c3 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -222,3 +222,15 @@ select count(distinct utc_offset) >= 24 as ok from pg_timezone_abbrevs;
t
(1 row)
+DO $$
+DECLARE
+ checkpointer_pid int;
+ r RECORD;
+BEGIN
+ SELECT pid from pg_stat_activity where backend_type='checkpointer' INTO checkpointer_pid;
+
+ select type, name, ident, total_bytes >= free_bytes
+ from pg_get_process_memory_contexts(checkpointer_pid, false) where path = '{0}' into r;
+ RAISE NOTICE '%', r;
+END $$;
+NOTICE: (AllocSet,TopMemoryContext,,t)
diff --git a/src/test/regress/sql/sysviews.sql b/src/test/regress/sql/sysviews.sql
index b2a7923754..13a4a0bfe2 100644
--- a/src/test/regress/sql/sysviews.sql
+++ b/src/test/regress/sql/sysviews.sql
@@ -98,3 +98,15 @@ set timezone_abbreviations = 'Australia';
select count(distinct utc_offset) >= 24 as ok from pg_timezone_abbrevs;
set timezone_abbreviations = 'India';
select count(distinct utc_offset) >= 24 as ok from pg_timezone_abbrevs;
+
+DO $$
+DECLARE
+ checkpointer_pid int;
+ r RECORD;
+BEGIN
+ SELECT pid from pg_stat_activity where backend_type='checkpointer' INTO checkpointer_pid;
+
+ select type, name, ident, total_bytes >= free_bytes
+ from pg_get_process_memory_contexts(checkpointer_pid, false) where path = '{0}' into r;
+ RAISE NOTICE '%', r;
+END $$;
--
2.34.1
On Fri, Nov 22, 2024 at 6:33 PM Rahila Syed <rahilasyed90@gmail.com> wrote:
Hi,
How does the process know that the client backend has finished reading
stats and it can be refreshed? What happens, if the next request for
memory context stats comes before first requester has consumed the
statistics it requested?
A process that's copying its statistics does not need to know that.
Whenever it receives a signal to copy statistics, it goes ahead and
copies the latest statistics to the DSA after acquiring an exclusive
lwlock.
A requestor takes a lock before it starts consuming the statistics.
If the next request comes while the first requestor is consuming the
statistics, the publishing process will wait on lwlock to be released
by the consuming process before it can write the statistics.
If the next request arrives before the first requester begins consuming
the statistics, the publishing process will acquire the lock and overwrite
the earlier statistics with the most recent ones.
As a result, both the first and second requesters will consume the
updated statistics.
IIUC, the publisher and the consumer processes, both, use the same
LWLock. Publisher acquires an exclusive lock. Does consumer acquire
SHARED lock?
The publisher process might be in a transaction, processing a query or
doing something else. If it has to wait for an LWLock, that may affect its
performance. This will become even more visible if the client backend
is trying to diagnose a slow running query. Have we tried to measure
how long the publisher might have to wait for an LWLock while the
consumer is consuming statistics OR what is the impact of this wait?
When statistics of a local backend are requested, this function returns the following
WARNING and exits, since this can be handled by an existing function which
doesn't require a DSA.
WARNING: cannot return statistics for local backend
HINT: Use pg_get_backend_memory_contexts instead
How about using pg_get_backend_memory_contexts() for both - local as
well as other backend? Let PID argument default to NULL which would
indicate local backend, otherwise some other backend?
I don't see much value in combining the two, especially since with
pg_get_process_memory_contexts() we can query both the postgres
backend and a background process, the name pg_get_backend_memory_contexts()
would be inaccurate and I am not sure whether a change to rename the
existing function would be welcome.
Having two separate functions for the same functionality isn't a
friendly user interface.
Playing a bit with pg_terminate_backend() which is another function
dealing with backends to understand a. what does it do to its own
backend and b. which processes are considered backends.
1. pg_terminate_backend() can terminate the backend from which
it is called.
#select pid, application_name, backend_type, pg_terminate_backend(pid)
from pg_stat_activity;
FATAL: terminating connection due to administrator command
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
2. It considers autovacuum launcher and logical replication launcher
as postgres backends but not checkpointer, background writer and
walwriter.
#select pid, application_name, backend_type, pg_terminate_backend(pid)
from pg_stat_activity where pid <> pg_backend_pid();
WARNING: PID 644887 is not a PostgreSQL backend process
WARNING: PID 644888 is not a PostgreSQL backend process
WARNING: PID 644890 is not a PostgreSQL backend process
pid | application_name | backend_type | pg_terminate_backend
--------+------------------+------------------------------+----------------------
645636 | | autovacuum launcher | t
645677 | | logical replication launcher | t
644887 | | checkpointer | f
644888 | | background writer | f
644890 | | walwriter | f
(5 rows)
In that sense you are correct that pg_get_backend_memory_contexts()
should not provide context information of WAL writer process for
example. But pg_get_process_memory_contexts() would be expected to
provide its own memory context information instead of redirecting to
another function through a WARNING. It could do that redirection
itself. That will also prevent the functions' output format going out
of sync.
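The redirection suggested above can be sketched as a single entry point that picks the local or the remote code path itself. This is only an illustration: the function names, the StatsPath enum and the stub bodies are hypothetical and are not part of the patch.

```c
#include <assert.h>

/* Which code path served a request; exists only for this sketch. */
typedef enum { PATH_LOCAL, PATH_REMOTE } StatsPath;

/* Hypothetical stand-in for walking this process's own context tree. */
static StatsPath
collect_local_contexts(void)
{
    return PATH_LOCAL;
}

/* Hypothetical stand-in for signalling another process and reading its reply. */
static StatsPath
collect_remote_contexts(int pid)
{
    (void) pid;
    return PATH_REMOTE;
}

/*
 * Single entry point: a request for our own PID is served locally
 * instead of being rejected with a WARNING.
 */
static StatsPath
get_process_memory_contexts(int my_pid, int target_pid)
{
    assert(target_pid > 0);
    if (target_pid == my_pid)
        return collect_local_contexts();
    return collect_remote_contexts(target_pid);
}
```

With both paths behind one function, the output columns are built in one place, which is the "going out of sync" concern raised above.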
--
Best Wishes,
Ashutosh Bapat
Hi,
I took a quick look at the patch today. Overall, I think this would be
very useful, I've repeatedly needed to inspect why a backend uses so
much memory, and I ended up triggering MemoryContextStats() from gdb.
This would be more convenient / safer. So +1 to the patch intent.
A couple review comments:
1) I read through the thread, and in general I agree with the reasoning
for removing the file part - it seems perfectly fine to just dump as
much as we can fit into a buffer, and then summarize the rest. But do we
need to invent a "new" limit here? The other places logging memory
contexts do something like this:
MemoryContextStatsDetail(TopMemoryContext, 100, 100, false);
Which means we only print the 100 memory contexts at the top, and that's
it. Wouldn't that give us a reasonable memory limit too?
2) I see the function got renamed to pg_get_process_memory_contexts(),
but the comment still says pg_get_remote_backend_memory_contexts().
3) I don't see any SGML docs for this new function. I was a bit unsure
what the "summary" argument is meant to do. The comment does not explain
that either.
4) I wonder if the function needs to return PID. I mean, the caller
knows which PID it is for, so it seems rather unnecessary.
5) In the "summary" mode, it might be useful to include info about how
many child contexts were aggregated. It's useful to know whether there
was 1 child or 10000 children. In the regular (non-summary) mode it'd
always be "1", probably, but maybe it'd interact with the limit in (1).
Not sure.
6) I feel a bit uneasy about the whole locking / communication scheme.
In particular, I'm worried about lockups, deadlocks, etc. So I decided
to do a trivial stress-test - just run the new function through pgbench
with many clients.
The memstats.sql script does just this:
SELECT * FROM pg_get_process_memory_contexts(
(SELECT pid FROM pg_stat_activity
WHERE pid != pg_backend_pid()
ORDER BY random() LIMIT 1)
, false);
where the inner query just picks a PID for some other backend, and asks
for memory context stats for that.
And just run it like this on a scale 1 pgbench database:
pgbench -n -f memstats.sql -c 10 test
And it gets stuck *immediately*. I've seen it wait for other client
backends and auxiliary processes like autovacuum launcher.
This is an absolutely idle system, there's no reason why a process would
not respond almost immediately. I wonder if e.g. autovacuum launcher may
not be handling these requests, or what if client backends can wait in a
cycle. IIRC condition variables are not covered by a deadlock detector,
so that would be an issue. But maybe I remember wrong?
7) I've also seen this error:
pgbench: error: client 6 script 0 aborted in command 0 query 0: \
ERROR: can't attach the same segment more than once
I haven't investigated it, but it seems like a problem handling errors,
where we fail to detach from a segment after a timeout. I may be wrong,
but it might be related to this:
I opted for DSAs over DSMs to enable memory reuse by freeing
segments for subsequent statistics copies of the same backend,
without needing to recreate DSMs for each request.
I feel like this might be a premature optimization - I don't have a
clear idea how expensive it is to create DSM per request, but my
intuition is that it's cheaper than processing the contexts and
generating the info.
I'd just remove that, unless someone demonstrates it really matters. I
don't really worry about how expensive it is to process a request
(within reason, of course) - it will happen only very rarely. It's more
important to make sure there's no overhead when no one asks the backend
for memory context info, and simplicity.
Also, how expensive is it to keep the DSA "just in case"? Imagine
someone asks for the memory context info once - isn't it a waste to still
keep the DSA? I don't recall how many resources that could take.
I don't have a clear opinion on that, I'm more asking for opinions.
8) Two minutes seems pretty arbitrary, and also quite high. If a timeout
is necessary, I think it should not be hard-coded.
regards
--
Tomas Vondra
Hi Tomas,
Thank you for the review.
1) I read through the thread, and in general I agree with the reasoning
for removing the file part - it seems perfectly fine to just dump as
much as we can fit into a buffer, and then summarize the rest. But do we
need to invent a "new" limit here? The other places logging memory
contexts do something like this:
MemoryContextStatsDetail(TopMemoryContext, 100, 100, false);
Which means we only print the 100 memory contexts at the top, and that's
it. Wouldn't that give us a reasonable memory limit too?
I think this prints more than 100 memory contexts, since 100 denotes the
max_level and contexts at each level could have up to 100 children. This
limit seems much higher than what I am currently storing in DSA, which is
approx. 7000 contexts. I will verify this again.
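The capacity figure can be checked with a small calculation. The sketch below mirrors the MemoryContextInfo layout from the patch, with uintptr_t standing in for Datum, and assumes a 1 MB initial DSA segment; that size and the helper names are assumptions made here, so the real value should be taken from dsa.h in the tree being built.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sizes copied from the patch's memutils.h hunk. */
#define MEMORY_CONTEXT_IDENT_DISPLAY_SIZE 1024
#define MEM_CONTEXT_MAX_LEVEL 64
#define MAX_TYPE_STRING_LENGTH 64

/* Assumption: 1 MB initial DSA segment; verify against dsa.h. */
#define ASSUMED_INIT_SEGMENT_SIZE ((size_t) 1024 * 1024)

/* Mirror of the patch's MemoryContextInfo; uintptr_t stands in for Datum. */
typedef struct MemoryContextInfoSketch
{
    char        name[MEMORY_CONTEXT_IDENT_DISPLAY_SIZE];
    char        ident[MEMORY_CONTEXT_IDENT_DISPLAY_SIZE];
    uintptr_t   path[MEM_CONTEXT_MAX_LEVEL];
    char        type[MAX_TYPE_STRING_LENGTH];
    int         path_length;
    int64_t     totalspace;
    int64_t     nblocks;
    int64_t     freespace;
    int64_t     freechunks;
} MemoryContextInfoSketch;

/* Number of whole entries that fit in the given number of bytes. */
static size_t
entries_that_fit(size_t bytes)
{
    assert(bytes > 0);
    /* Integer division already truncates, so no floor() is needed. */
    return bytes / sizeof(MemoryContextInfoSketch);
}

/* Entries available in the 16-segment area the patch requests. */
static size_t
max_published_contexts(void)
{
    return entries_that_fit(16 * ASSUMED_INIT_SEGMENT_SIZE);
}
```

Under these assumptions each entry is roughly 2.6 kB on a typical 64-bit build, so the area holds a little over 6000 entries, the same order of magnitude as the figures quoted in the thread.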
2) I see the function got renamed to pg_get_process_memory_contexts(),
but the comment still says pg_get_remote_backend_memory_contexts().
Fixed.
3) I don't see any SGML docs for this new function. I was a bit unsure
what the "summary" argument is meant to do. The comment does not explain
that either.
Added docs.
The intention behind adding a summary argument is to report statistics of
contexts at level 0 and 1, i.e., TopMemoryContext and its immediate children.
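A level-limited, breadth-first walk of a context tree, as described for the summary mode, might look like the sketch below. CtxNode is a simplified stand-in for MemoryContextData, and the function names and the fixed-size queue are assumptions made for illustration, not code from the patch.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES 64            /* capacity of the sketch's queue */

/* Simplified stand-in for MemoryContextData: only the tree links. */
typedef struct CtxNode
{
    const char *name;
    struct CtxNode *parent;
    struct CtxNode *firstchild;
    struct CtxNode *nextchild;
} CtxNode;

/* Depth of a node: 0 for the root, 1 for its children, and so on. */
static int
ctx_level(const CtxNode *node)
{
    int         level = 0;

    while (node->parent != NULL)
    {
        node = node->parent;
        level++;
    }
    return level;
}

/*
 * Breadth-first walk from root. out[] doubles as the queue and must have
 * room for MAX_NODES pointers. Nodes deeper than max_level are never
 * enqueued. Returns the number of nodes written to out[].
 */
static int
collect_up_to_level(CtxNode *root, int max_level, CtxNode **out)
{
    int         head = 0;
    int         tail = 0;

    assert(root != NULL);
    out[tail++] = root;
    while (head < tail)
    {
        CtxNode    *cur = out[head++];

        if (ctx_level(cur) >= max_level)
            continue;           /* children would exceed the cutoff */
        for (CtxNode *c = cur->firstchild; c != NULL; c = c->nextchild)
        {
            if (tail < MAX_NODES)
                out[tail++] = c;
        }
    }
    return tail;
}
```

Calling collect_up_to_level(root, 1, out) returns the root and its immediate children only, with parents always appearing before their children.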
4) I wonder if the function needs to return PID. I mean, the caller
knows which PID it is for, so it seems rather unnecessary.
Perhaps it can be used to ascertain that the information indeed belongs to
the requested pid.
5) In the "summary" mode, it might be useful to include info about how
many child contexts were aggregated. It's useful to know whether there
was 1 child or 10000 children. In the regular (non-summary) mode it'd
always be "1", probably, but maybe it'd interact with the limit in (1).
Not sure.
Sure, I will add this in the next iteration.
6) I feel a bit uneasy about the whole locking / communication scheme.
In particular, I'm worried about lockups, deadlocks, etc. So I decided
to do a trivial stress-test - just run the new function through pgbench
with many clients.
The memstats.sql script does just this:
SELECT * FROM pg_get_process_memory_contexts(
(SELECT pid FROM pg_stat_activity
WHERE pid != pg_backend_pid()
ORDER BY random() LIMIT 1)
, false);
where the inner query just picks a PID for some other backend, and asks
for memory context stats for that.
And just run it like this on a scale 1 pgbench database:
pgbench -n -f memstats.sql -c 10 test
And it gets stuck *immediately*. I've seen it wait for other client
backends and auxiliary processes like autovacuum launcher.
This is an absolutely idle system, there's no reason why a process would
not respond almost immediately.
In my reproduction, this issue occurred because the process was terminated
while the requesting backend was waiting on the condition variable to be
signaled by it. I don't see any solution other than having the waiting
client backend time out using ConditionVariableTimedSleep.
In the patch, since the timeout was set to a high value, pgbench ended up
stuck waiting for the timeout to occur. The failure happens less frequently
after I added an additional check for the process's existence, but it cannot
be entirely avoided. This is because a process can terminate after we check
for its existence but before it signals the client. In such cases, the client
will not receive any signal.
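The need for a bounded wait can be illustrated outside PostgreSQL with POSIX threads. The sketch below is not the ConditionVariable API used by the patch; PublishSlot, slot_init, slot_publish and slot_wait are hypothetical names, and pthread_cond_timedwait stands in for ConditionVariableTimedSleep. If the publisher exits before setting the flag, the waiter returns after the timeout instead of blocking forever.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

typedef struct PublishSlot
{
    pthread_mutex_t lock;
    pthread_cond_t cv;
    bool        published;      /* set by the publisher when stats are ready */
} PublishSlot;

static void
slot_init(PublishSlot *slot)
{
    pthread_mutex_init(&slot->lock, NULL);
    pthread_cond_init(&slot->cv, NULL);
    slot->published = false;
}

/* Publisher side: mark the stats ready and wake any waiter. */
static void
slot_publish(PublishSlot *slot)
{
    pthread_mutex_lock(&slot->lock);
    slot->published = true;
    pthread_mutex_unlock(&slot->lock);
    pthread_cond_broadcast(&slot->cv);
}

/*
 * Requester side: wait until the slot is published or timeout_ms elapses.
 * Returns true if stats are ready, false on timeout.
 */
static bool
slot_wait(PublishSlot *slot, long timeout_ms)
{
    struct timespec deadline;
    bool        ready;

    assert(timeout_ms >= 0);
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L)
    {
        deadline.tv_sec++;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&slot->lock);
    while (!slot->published)
    {
        /* Re-check the flag after every wakeup; give up at the deadline. */
        if (pthread_cond_timedwait(&slot->cv, &slot->lock, &deadline) == ETIMEDOUT)
            break;
    }
    ready = slot->published;
    pthread_mutex_unlock(&slot->lock);
    return ready;
}
```

A caller that gets false back from slot_wait() still has to release whatever it attached to before returning, which is the cleanup that was missing on the timeout path in point 7.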
I wonder if e.g. autovacuum launcher may
not be handling these requests, or what if client backends can wait in a
cycle.
Did not see a cyclic wait in client backends due to the pgbench stress test.
7) I've also seen this error:
pgbench: error: client 6 script 0 aborted in command 0 query 0: \
ERROR: can't attach the same segment more than once
I haven't investigated it, but it seems like a problem handling errors,
where we fail to detach from a segment after a timeout.
Thanks for the hint, fixed by adding a missing call to dsa_detach after
timeout.
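The shape of that fix can be sketched with a single cleanup path, so that every exit, including the timeout, detaches exactly once. This is only an illustration under stated assumptions: FakeSegment, fake_attach, fake_detach and read_stats are hypothetical names, and the one-attachment rule merely models the "can't attach the same segment more than once" error rather than the real dsa_attach/dsa_detach behaviour.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a shared segment that allows one attachment. */
typedef struct
{
    int attach_count;
} FakeSegment;

static FakeSegment *
fake_attach(FakeSegment *seg)
{
    /* Models "can't attach the same segment more than once". */
    if (seg->attach_count > 0)
        return NULL;
    seg->attach_count++;
    return seg;
}

static void
fake_detach(FakeSegment *seg)
{
    assert(seg->attach_count > 0);
    seg->attach_count--;
}

/*
 * Returns 0 on success, -1 if the wait timed out, -2 if attaching failed.
 * Every path that attached leaves through the same detach call.
 */
int
read_stats(FakeSegment *seg, bool wait_times_out)
{
    int          result = 0;
    FakeSegment *attached = fake_attach(seg);

    if (attached == NULL)
        return -2;

    if (wait_times_out)
    {
        result = -1;
        goto cleanup;   /* the early exit that previously skipped the detach */
    }

    /* ... the published statistics would be read here ... */

cleanup:
    fake_detach(attached);
    return result;
}
```

A call that times out leaves the attach count at zero, so the next call in the same session can attach again.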
I opted for DSAs over DSMs to enable memory reuse by freeing
segments for subsequent statistics copies of the same backend,
without needing to recreate DSMs for each request.
I feel like this might be a premature optimization - I don't have a
clear idea how expensive it is to create DSM per request, but my
intuition is that it's cheaper than processing the contexts and
generating the info.
I'd just remove that, unless someone demonstrates it really matters. I
don't really worry about how expensive it is to process a request
(within reason, of course) - it will happen only very rarely. It's more
important to make sure there's no overhead when no one asks the backend
for memory context info, and simplicity.
Also, how expensive is it to just keep the DSA "just in case"? Imagine
someone asks for the memory context info once - isn't it a waste to still
keep the DSA? I don't recall how much resources that could be.
I don't have a clear opinion on that, I'm more asking for opinions.
Imagining a tool that periodically queries the backends for statistics,
it would be beneficial to avoid recreating the DSAs for each call.
Currently, DSAs of size 1MB per process
(i.e., a maximum of 1MB * (MaxBackends + auxiliary processes))
would be created and pinned for subsequent reporting. This size does
not seem excessively high, even for approx 100 backends and
auxiliary processes.
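As a quick check of that estimate, the upper bound can be written out as below. This is a sketch only: the 1MB figure is the one quoted above, and pinned_dsa_upper_bound and DSA_SIZE_PER_PROCESS are hypothetical names, not identifiers from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* 1MB per process, the figure quoted in the text above. */
#define DSA_SIZE_PER_PROCESS ((size_t) 1024 * 1024)

/*
 * Worst case: every backend and every auxiliary process has been queried
 * at least once, so each one has a pinned DSA.
 */
size_t
pinned_dsa_upper_bound(size_t max_backends, size_t aux_processes)
{
    return DSA_SIZE_PER_PROCESS * (max_backends + aux_processes);
}
```

For about 100 backends and auxiliary processes combined this comes to 100MB in the worst case, and only for processes that were actually asked for statistics.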
8) Two minutes seems pretty arbitrary, and also quite high. If a timeout
is necessary, I think it should not be hard-coded.
Not sure which is the ideal value. Changed it to 15 secs and added a
#define as of now.
Something that gives enough time for the process to respond but
does not hold up the client for too long would be ideal. However, 15 seconds
does not seem to be enough for the GitHub CI tests, which fail with a
timeout error with this setting.
PFA an updated patch with the above changes.
Attachments:
v6-Function-to-report-memory-context-stats-of-any-backe.patch (application/octet-stream)
From c143854a77fa29c475e5103d386faef47e55194e Mon Sep 17 00:00:00 2001
From: Rahila Syed <rahilasyed.90@gmail.com>
Date: Sun, 15 Sep 2024 17:56:06 +0530
Subject: [PATCH] Function to report memory context stats of any backend
This function sends a signal to a backend to publish
statistics of all its memory contexts. Signal handler
sets a flag, which causes the relevant backend to copy its
MemoryContextStats to a DSA, as part
of next CHECK_FOR_INTERRUPTS().
If there are more than 16MB worth of statistics, the
remaining statistics are copied as a cumulative
total of the remaining contexts.
Once it's done, it signals the client backend using
a condition variable. The client backend
then wakes up, reads the shared memory and
returns these values in the form of a set of records,
one for each memory context, to the user, followed
by a cumulative total of the remaining contexts,
if any.
Each backend and auxiliary process has its own slot
for reporting the stats. There is an array of such
memory slots of size MaxBackends+NumofAuxiliary
processes in fixed shared memory. Each of these slots points
to a DSA, which contains the stats to be shared by the
corresponding process.
Each slot has its own LW lock and condition variable for
synchronization and communication between the
publishing process and the client backend.
---
doc/src/sgml/func.sgml | 19 ++
src/backend/postmaster/autovacuum.c | 4 +
src/backend/postmaster/checkpointer.c | 4 +
src/backend/postmaster/interrupt.c | 4 +
src/backend/postmaster/pgarch.c | 4 +
src/backend/postmaster/startup.c | 4 +
src/backend/postmaster/walsummarizer.c | 4 +
src/backend/storage/ipc/ipci.c | 2 +
src/backend/storage/ipc/procsignal.c | 3 +
src/backend/tcop/postgres.c | 3 +
.../utils/activity/wait_event_names.txt | 1 +
src/backend/utils/adt/mcxtfuncs.c | 266 ++++++++++++++-
src/backend/utils/init/globals.c | 1 +
src/backend/utils/mmgr/mcxt.c | 306 +++++++++++++++++-
src/include/catalog/pg_proc.dat | 10 +
src/include/miscadmin.h | 1 +
src/include/storage/procsignal.h | 1 +
src/include/utils/memutils.h | 50 +++
src/test/regress/expected/sysviews.out | 12 +
src/test/regress/sql/sysviews.sql | 12 +
20 files changed, 699 insertions(+), 12 deletions(-)
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 8b81106fa2..73fe64e700 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -28342,6 +28342,25 @@ acl | {postgres=arwdDxtm/postgres,foo=r/postgres}
</para></entry>
</row>
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>pg_get_process_memory_contexts</primary>
+ </indexterm>
+ <function>pg_get_process_memory_contexts</function> ( <parameter>pid</parameter> <type>integer</type>, <parameter>get_summary</parameter> <type>boolean</type> )
+ <returnvalue>setof record</returnvalue>
+ </para>
+ <para>
+ Requests to return the memory contexts of the backend with the
+ specified process ID. This function can send the request to
+ both the backends and auxiliary processes. After receiving the memory
+ contexts from the process, it returns the result as one row per
+ context. When get_summary is true, memory contexts at level 0
+ and level 1 are reported, along with cumulative results for the
+ remaining contexts.
+ </para></entry>
+ </row>
+
<row>
<entry role="func_table_entry"><para role="func_signature">
<indexterm>
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index dc3cf87aba..5d01497ada 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -768,6 +768,10 @@ HandleAutoVacLauncherInterrupts(void)
/* Process sinval catchup interrupts that happened while sleeping */
ProcessCatchupInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index 982572a75d..9caf8fa018 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -616,6 +616,10 @@ HandleCheckpointerInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/postmaster/interrupt.c b/src/backend/postmaster/interrupt.c
index eedc0980cf..1107ff6d45 100644
--- a/src/backend/postmaster/interrupt.c
+++ b/src/backend/postmaster/interrupt.c
@@ -48,6 +48,10 @@ HandleMainLoopInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 02f91431f5..467a253ccd 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -865,6 +865,10 @@ HandlePgArchInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
if (ConfigReloadPending)
{
char *archiveLib = pstrdup(XLogArchiveLibrary);
diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c
index ef6f98ebcd..17beb8737d 100644
--- a/src/backend/postmaster/startup.c
+++ b/src/backend/postmaster/startup.c
@@ -192,6 +192,10 @@ HandleStartupProcInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
diff --git a/src/backend/postmaster/walsummarizer.c b/src/backend/postmaster/walsummarizer.c
index 48350bec52..b3e6c2b5f0 100644
--- a/src/backend/postmaster/walsummarizer.c
+++ b/src/backend/postmaster/walsummarizer.c
@@ -876,6 +876,10 @@ HandleWalSummarizerInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 7783ba854f..8816ef6903 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -50,6 +50,7 @@
#include "storage/sinvaladt.h"
#include "utils/guc.h"
#include "utils/injection_point.h"
+#include "utils/memutils.h"
/* GUCs */
int shared_memory_type = DEFAULT_SHARED_MEMORY_TYPE;
@@ -340,6 +341,7 @@ CreateOrAttachShmemStructs(void)
StatsShmemInit();
WaitEventCustomShmemInit();
InjectionPointShmemInit();
+ MemCtxShmemInit();
}
/*
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index 87027f27eb..621726cf03 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -688,6 +688,9 @@ procsignal_sigusr1_handler(SIGNAL_ARGS)
if (CheckProcSignal(PROCSIG_LOG_MEMORY_CONTEXT))
HandleLogMemoryContextInterrupt();
+ if (CheckProcSignal(PROCSIG_GET_MEMORY_CONTEXT))
+ HandleGetMemoryContextInterrupt();
+
if (CheckProcSignal(PROCSIG_PARALLEL_APPLY_MESSAGE))
HandleParallelApplyMessageInterrupt();
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 4b985bd056..ba8ef72fc3 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3511,6 +3511,9 @@ ProcessInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
if (ParallelApplyMessagePending)
HandleParallelApplyMessages();
}
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 16144c2b72..7a27b5f680 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -158,6 +158,7 @@ WAL_RECEIVER_EXIT "Waiting for the WAL receiver to exit."
WAL_RECEIVER_WAIT_START "Waiting for startup process to send initial data for streaming replication."
WAL_SUMMARY_READY "Waiting for a new WAL summary to be generated."
XACT_GROUP_UPDATE "Waiting for the group leader to update transaction status at transaction end."
+MEM_CTX_PUBLISH "Waiting for backend to publish memory information."
ABI_compatibility:
diff --git a/src/backend/utils/adt/mcxtfuncs.c b/src/backend/utils/adt/mcxtfuncs.c
index 6a6634e1cd..c13ef820cb 100644
--- a/src/backend/utils/adt/mcxtfuncs.c
+++ b/src/backend/utils/adt/mcxtfuncs.c
@@ -17,28 +17,23 @@
#include "funcapi.h"
#include "mb/pg_wchar.h"
+#include "miscadmin.h"
+#include "access/twophase.h"
+#include "nodes/pg_list.h"
#include "storage/proc.h"
#include "storage/procarray.h"
#include "utils/array.h"
#include "utils/builtins.h"
#include "utils/hsearch.h"
+#include "utils/memutils.h"
+#include "utils/wait_event_types.h"
/* ----------
* The max bytes for showing identifiers of MemoryContext.
* ----------
*/
-#define MEMORY_CONTEXT_IDENT_DISPLAY_SIZE 1024
-/*
- * MemoryContextId
- * Used for storage of transient identifiers for
- * pg_get_backend_memory_contexts.
- */
-typedef struct MemoryContextId
-{
- MemoryContext context;
- int context_id;
-} MemoryContextId;
+struct MemoryContextState *memCtxState = NULL;
/*
* int_list_to_array
@@ -305,3 +300,252 @@ pg_log_backend_memory_contexts(PG_FUNCTION_ARGS)
PG_RETURN_BOOL(true);
}
+
+/*
+ * pg_get_process_memory_contexts
+ * Signal a backend or an auxiliary process to send its memory contexts.
+ *
+ * On receipt of this signal, a backend or an auxiliary process sets the flag
+ * in the signal handler, which causes the next CHECK_FOR_INTERRUPTS()
+ * or process-specific interrupt handler to copy the memory context statistics
+ * in a dynamic shared memory space. The statistics for contexts that do not fit in
+ * shared memory area are stored as a cumulative total of those contexts,
+ * at the end in the dynamic shared memory.
+ * Wait for the backend to send signal on the condition variable after
+ * writing statistics to a shared memory.
+ * Once condition variable comes out of sleep, check if the required
+ * backends statistics are available to read and display.
+ */
+Datum
+pg_get_process_memory_contexts(PG_FUNCTION_ARGS)
+{
+ int pid = PG_GETARG_INT32(0);
+ bool get_summary = PG_GETARG_BOOL(1);
+ PGPROC *proc;
+ ProcNumber procNumber = INVALID_PROC_NUMBER;
+ int i;
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ dsa_area *area;
+ dsa_handle handle;
+ MemoryContextInfo *memctx_info;
+ MemoryContext oldContext;
+
+ InitMaterializedSRF(fcinfo, 0);
+
+ /*
+ * See if the process with given pid is a backend or an auxiliary process.
+ */
+ proc = BackendPidGetProc(pid);
+ if (proc == NULL)
+ proc = AuxiliaryPidGetProc(pid);
+
+ /*
+ * BackendPidGetProc() and AuxiliaryPidGetProc() return NULL if the pid
+ * isn't valid; but by the time we reach kill(), a process for which we
+ * get a valid proc here might have terminated on its own. There's no way
+ * to acquire a lock on an arbitrary process to prevent that. But since
+ * this mechanism is usually used to debug a backend or an auxiliary
+ * process running and consuming lots of memory, that it might end on its
+ * own first and its memory contexts are not logged is not a problem.
+ */
+ if (proc == NULL)
+ {
+ /*
+ * This is just a warning so a loop-through-resultset will not abort
+ * if one backend terminated on its own during the run.
+ */
+ ereport(WARNING,
+ (errmsg("PID %d is not a PostgreSQL server process", pid)));
+ return (Datum) 0;
+ }
+
+ procNumber = GetNumberFromPGProc(proc);
+ if (procNumber == MyProcNumber)
+ {
+ ereport(WARNING,
+ (errmsg("cannot return statistics for local backend"),
+ errhint("Use pg_get_backend_memory_contexts instead")));
+ PG_RETURN_NULL();
+ }
+
+ /*
+ * Return statistics of top level 1 and 2 contexts, if get_summary is
+ * true.
+ */
+ LWLockAcquire(&memCtxState[procNumber].lw_lock, LW_EXCLUSIVE);
+ memCtxState[procNumber].get_summary = get_summary;
+
+ /*
+ * Create a DSA segment with maximum size of 16MB, send handle to the
+ * publishing process for storing the stats. If number of contexts exceed
+ * 16MB, a cumulative total is stored for such contexts.
+ */
+ if (memCtxState[procNumber].memstats_dsa_handle == DSA_HANDLE_INVALID)
+ {
+ oldContext = MemoryContextSwitchTo(TopMemoryContext);
+ area = dsa_create_ext(memCtxState[procNumber].lw_lock.tranche, DSA_DEFAULT_INIT_SEGMENT_SIZE,
+ 16 * DSA_DEFAULT_INIT_SEGMENT_SIZE);
+ MemoryContextSwitchTo(oldContext);
+ handle = dsa_get_handle(area);
+ memCtxState[procNumber].memstats_dsa_handle = handle;
+ /* Pin the mapping so that it doesn't throw a warning */
+ dsa_pin(area);
+ dsa_pin_mapping(area);
+ }
+ else
+ {
+ area = dsa_attach(memCtxState[procNumber].memstats_dsa_handle);
+ dsa_pin_mapping(area);
+ }
+ LWLockRelease(&memCtxState[procNumber].lw_lock);
+ if (SendProcSignal(pid, PROCSIG_GET_MEMORY_CONTEXT, procNumber) < 0)
+ {
+ ereport(WARNING,
+ (errmsg("could not send signal to process %d: %m", pid)));
+ return (Datum) 0;
+ }
+
+ /*
+ * Wait for a backend to publish stats, indicated by a valid dsa pointer
+ * set by the backend.
+ */
+ while (1)
+ {
+ LWLockAcquire(&memCtxState[procNumber].lw_lock, LW_EXCLUSIVE);
+
+ /*
+ * We expect to come out of sleep when the requested process has
+ * finished publishing the statistics, verified using a valid dsa
+ * pointer.
+ *
+ * Make sure that the information belongs to the pid we requested
+ * information for; otherwise loop back and wait for the server
+ * process to finish publishing statistics.
+ */
+ if (memCtxState[procNumber].proc_id == pid && DsaPointerIsValid(memCtxState[procNumber].memstats_dsa_pointer))
+ break;
+ else
+ LWLockRelease(&memCtxState[procNumber].lw_lock);
+
+ proc = BackendPidGetProc(pid);
+ if (proc == NULL)
+ proc = AuxiliaryPidGetProc(pid);
+ if (proc == NULL)
+ {
+ ereport(WARNING,
+ (errmsg("PID %d is not a PostgreSQL server process", pid)));
+ dsa_detach(area);
+ return (Datum) 0;
+ }
+#define MEMSTATS_WAIT_TIMEOUT 15000
+ if (ConditionVariableTimedSleep(&memCtxState[procNumber].memctx_cv, MEMSTATS_WAIT_TIMEOUT,
+ WAIT_EVENT_MEM_CTX_PUBLISH))
+ {
+ ereport(WARNING,
+ (errmsg("Wait for %d process to publish stats timed out, try again", pid)));
+ if (DsaPointerIsValid(memCtxState[procNumber].memstats_dsa_pointer))
+ {
+ dsa_free(area, memCtxState[procNumber].memstats_dsa_pointer);
+ memCtxState[procNumber].memstats_dsa_pointer = InvalidDsaPointer;
+ }
+ dsa_detach(area);
+ return (Datum) 0;
+ }
+ }
+ if (DsaPointerIsValid(memCtxState[procNumber].memstats_dsa_pointer))
+ memctx_info = (MemoryContextInfo *) dsa_get_address(area, memCtxState[procNumber].memstats_dsa_pointer);
+ /* Backend has finished publishing the stats, read them */
+ for (i = 0; i < memCtxState[procNumber].in_memory_stats; i++)
+ {
+ ArrayType *path_array;
+ int path_length;
+ Datum values[PG_GET_BACKEND_MEMORY_CONTEXTS_COLS];
+ bool nulls[PG_GET_BACKEND_MEMORY_CONTEXTS_COLS];
+
+ memset(values, 0, sizeof(values));
+ memset(nulls, 0, sizeof(nulls));
+
+ if (strlen(memctx_info[i].name) != 0)
+ values[0] = CStringGetTextDatum(memctx_info[i].name);
+ else
+ nulls[0] = true;
+ if (strlen(memctx_info[i].ident) != 0)
+ values[1] = CStringGetTextDatum(memctx_info[i].ident);
+ else
+ nulls[1] = true;
+
+ values[2] = CStringGetTextDatum(memctx_info[i].type);
+ path_length = memctx_info[i].path_length;
+ path_array = construct_array_builtin(memctx_info[i].path, path_length, INT4OID);
+ values[3] = PointerGetDatum(path_array);
+ values[4] = Int64GetDatum(memctx_info[i].totalspace);
+ values[5] = Int64GetDatum(memctx_info[i].nblocks);
+ values[6] = Int64GetDatum(memctx_info[i].freespace);
+ values[7] = Int64GetDatum(memctx_info[i].freechunks);
+ values[8] = Int64GetDatum(memctx_info[i].totalspace - memctx_info[i].freespace);
+ values[9] = Int32GetDatum(memCtxState[procNumber].proc_id);
+
+ tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc, values, nulls);
+ }
+ /* If there are more contexts, display a cumulative total of those */
+ if (memCtxState[procNumber].total_stats > i)
+ {
+ Datum values[PG_GET_BACKEND_MEMORY_CONTEXTS_COLS];
+ bool nulls[PG_GET_BACKEND_MEMORY_CONTEXTS_COLS];
+
+ values[0] = CStringGetTextDatum(memctx_info[i].name);
+ nulls[1] = true;
+ nulls[2] = true;
+ nulls[3] = true;
+ values[4] = Int64GetDatum(memctx_info[i].totalspace);
+ values[5] = Int64GetDatum(memctx_info[i].nblocks);
+ values[6] = Int64GetDatum(memctx_info[i].freespace);
+ values[7] = Int64GetDatum(memctx_info[i].freechunks);
+ values[8] = Int64GetDatum(memctx_info[i].totalspace - memctx_info[i].freespace);
+ values[9] = Int32GetDatum(memCtxState[procNumber].proc_id);
+ tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc, values, nulls);
+ }
+
+ /* DSA free allocation for this client */
+ if (DsaPointerIsValid(memCtxState[procNumber].memstats_dsa_pointer))
+ {
+ dsa_free(area, memCtxState[procNumber].memstats_dsa_pointer);
+ memCtxState[procNumber].memstats_dsa_pointer = InvalidDsaPointer;
+ }
+ LWLockRelease(&memCtxState[procNumber].lw_lock);
+ ConditionVariableCancelSleep();
+ dsa_detach(area);
+ return (Datum) 0;
+}
+
+static Size
+MemCtxShmemSize(void)
+{
+ Size size;
+ Size TotalProcs = add_size(MaxBackends, add_size(NUM_AUXILIARY_PROCS, max_prepared_xacts));
+
+ size = TotalProcs * sizeof(MemoryContextState);
+ return size;
+}
+
+void
+MemCtxShmemInit(void)
+{
+ bool found;
+ Size TotalProcs = add_size(MaxBackends, add_size(NUM_AUXILIARY_PROCS, max_prepared_xacts));
+
+ memCtxState = (MemoryContextState *) ShmemInitStruct("MemoryContextState",
+ MemCtxShmemSize(),
+ &found);
+ if (!found)
+ {
+ for (int i = 0; i < TotalProcs; i++)
+ {
+ ConditionVariableInit(&memCtxState[i].memctx_cv);
+ LWLockInitialize(&memCtxState[i].lw_lock, LWLockNewTrancheId());
+ LWLockRegisterTranche(memCtxState[i].lw_lock.tranche, "mem_context_stats_reporting");
+ memCtxState[i].memstats_dsa_handle = DSA_HANDLE_INVALID;
+ memCtxState[i].memstats_dsa_pointer = InvalidDsaPointer;
+ }
+ }
+}
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index 03a54451ac..7fc600ff7b 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -42,6 +42,7 @@ volatile sig_atomic_t IdleStatsUpdateTimeoutPending = false;
volatile uint32 InterruptHoldoffCount = 0;
volatile uint32 QueryCancelHoldoffCount = 0;
volatile uint32 CritSectionCount = 0;
+volatile sig_atomic_t PublishMemoryContextPending = false;
int MyProcPid;
pg_time_t MyStartTime;
diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index 70d33226cb..94078324f9 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -19,16 +19,22 @@
*-------------------------------------------------------------------------
*/
+#include <math.h>
#include "postgres.h"
#include "mb/pg_wchar.h"
#include "miscadmin.h"
+#include "nodes/pg_list.h"
+#include "storage/fd.h"
+#include "storage/lwlock.h"
+#include "storage/dsm.h"
+#include "utils/dsa.h"
+#include "utils/hsearch.h"
#include "utils/memdebug.h"
#include "utils/memutils.h"
#include "utils/memutils_internal.h"
#include "utils/memutils_memorychunk.h"
-
static void BogusFree(void *pointer);
static void *BogusRealloc(void *pointer, Size size, int flags);
static MemoryContext BogusGetChunkContext(void *pointer);
@@ -166,6 +172,7 @@ static void MemoryContextStatsInternal(MemoryContext context, int level,
static void MemoryContextStatsPrint(MemoryContext context, void *passthru,
const char *stats_string,
bool print_to_stderr);
+static void PublishMemoryContext(MemoryContextInfo * memctx_infos, int curr_id, MemoryContext context, List *path, char *clipped_ident);
/*
* You should not do memory allocations within a critical section, because
@@ -1276,6 +1283,21 @@ HandleLogMemoryContextInterrupt(void)
/* latch will be set by procsignal_sigusr1_handler */
}
+/*
+ * HandleGetMemoryContextInterrupt
+ * Handle receipt of an interrupt indicating publishing of memory
+ * contexts.
+ *
+ * All the actual work is deferred to ProcessGetMemoryContextInterrupt()
+ */
+void
+HandleGetMemoryContextInterrupt(void)
+{
+ InterruptPending = true;
+ PublishMemoryContextPending = true;
+ /* latch will be set by procsignal_sigusr1_handler */
+}
+
/*
* ProcessLogMemoryContextInterrupt
* Perform logging of memory contexts of this backend process.
@@ -1313,6 +1335,288 @@ ProcessLogMemoryContextInterrupt(void)
MemoryContextStatsDetail(TopMemoryContext, 100, 100, false);
}
+/*
+ * Run by each backend to publish their memory context
+ * statistics. It performs a breadth first search
+ * on the memory context tree, so that the parents
+ * get a chance to report stats before their children.
+ *
+ * Statistics are shared via dynamic shared memory which
+ * can hold statistics of approx 6700 contexts. Remaining
+ * contexts statistics is captured as a cumulative total.
+ */
+void
+ProcessGetMemoryContextInterrupt(void)
+{
+ /* Store the memory context details in shared memory */
+
+ List *contexts;
+
+ HASHCTL ctl;
+ HTAB *context_id_lookup;
+ int context_id = 0;
+ bool found;
+ MemoryContext stat_cxt;
+ MemoryContextInfo *meminfo;
+ bool get_summary = false;
+ dsa_area *area;
+ int num_stats;
+ int idx = MyProcNumber;
+ int stats_count = 0;
+ MemoryContextCounters stat;
+
+ PublishMemoryContextPending = false;
+
+ /*
+ * The hash table is used for constructing the "path" column of the
+ * pg_get_process_memory_contexts output, similar to its local backend
+ * counterpart.
+ */
+
+ /*
+ * Make a new context that will contain the hash table, to ease the
+ * cleanup
+ */
+
+ stat_cxt = AllocSetContextCreate(CurrentMemoryContext,
+ "Memory context statistics",
+ ALLOCSET_DEFAULT_SIZES);
+
+ ctl.keysize = sizeof(MemoryContext);
+ ctl.entrysize = sizeof(MemoryContextId);
+ ctl.hcxt = stat_cxt;
+
+ context_id_lookup = hash_create("pg_get_remote_backend_memory_contexts",
+ 256,
+ &ctl,
+ HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+
+ contexts = list_make1(TopMemoryContext);
+
+ /* Compute the number of stats that can fit in the DSM seg */
+ num_stats = floor(16 * DSA_DEFAULT_INIT_SEGMENT_SIZE / sizeof(MemoryContextInfo));
+ /* Attach to DSA segment */
+ LWLockAcquire(&memCtxState[idx].lw_lock, LW_EXCLUSIVE);
+ area = dsa_attach(memCtxState[idx].memstats_dsa_handle);
+ memCtxState[idx].proc_id = MyProcPid;
+ get_summary = memCtxState[idx].get_summary;
+
+ /*
+ * Traverse the memory context tree to find total number of contexts. If
+ * summary is requested find the total number of contexts at level 1 and
+ * 2.
+ */
+ foreach_ptr(MemoryContextData, cur, contexts)
+ {
+ MemoryContextId *entry;
+ List *path = NIL;
+
+ entry = (MemoryContextId *) hash_search(context_id_lookup, &cur,
+ HASH_ENTER, &found);
+ entry->context_id = context_id;
+
+ stats_count = stats_count + 1;
+ /* Append the children of the current context to the main list */
+ for (MemoryContext c = cur->firstchild; c != NULL; c = c->nextchild)
+ contexts = lappend(contexts, c);
+
+ if (!get_summary)
+ continue;
+
+ /*
+ * Figure out the transient context_id of this context and each of its
+ * ancestors.
+ */
+ for (MemoryContext cur_context = cur; cur_context != NULL; cur_context = cur_context->parent)
+ {
+ MemoryContextId *cur_entry;
+
+ cur_entry = hash_search(context_id_lookup, &cur_context, HASH_FIND, &found);
+
+ if (!found)
+ {
+ elog(LOG, "hash table corrupted, can't construct path value");
+ break;
+ }
+ path = lcons_int(cur_entry->context_id, path);
+ }
+ if (list_length(path) == 3)
+ {
+ stats_count = stats_count - 1;
+ break;
+ }
+ }
+
+ /*
+ * Allocate memory in this process's dsa for storing statistics of the
+ * memory contexts up to num_stats; for contexts that don't fit in the DSA
+ * segment, a cumulative total is written as the last record in the DSA
+ * segment.
+ */
+ stats_count = stats_count > num_stats ? num_stats : stats_count;
+ memCtxState[idx].memstats_dsa_pointer = dsa_allocate0(area, stats_count * sizeof(MemoryContextInfo));
+ meminfo = (MemoryContextInfo *) dsa_get_address(area, memCtxState[idx].memstats_dsa_pointer);
+
+ foreach_ptr(MemoryContextData, cur, contexts)
+ {
+ MemoryContextId *entry;
+ List *path = NIL;
+ char clipped_ident[MEMORY_CONTEXT_IDENT_DISPLAY_SIZE];
+
+ entry = (MemoryContextId *) hash_search(context_id_lookup, &cur,
+ HASH_ENTER, &found);
+ entry->context_id = context_id;
+
+ /*
+ * Figure out the transient context_id of this context and each of its
+ * ancestors.
+ */
+ for (MemoryContext cur_context = cur; cur_context != NULL; cur_context = cur_context->parent)
+ {
+ MemoryContextId *cur_entry;
+
+ cur_entry = hash_search(context_id_lookup, &cur_context, HASH_FIND, &found);
+
+ if (!found)
+ {
+ elog(LOG, "hash table corrupted, can't construct path value");
+ break;
+ }
+ path = lcons_int(cur_entry->context_id, path);
+ }
+ /* Trim and copy the identifier if it is not set to NULL */
+ if (cur->ident != NULL)
+ {
+ int idlen = strlen(cur->ident);
+
+ /*
+ * Some identifiers such as SQL query string can be very long,
+ * truncate oversize identifiers.
+ */
+ if (idlen >= MEMORY_CONTEXT_IDENT_DISPLAY_SIZE)
+ idlen = pg_mbcliplen(cur->ident, idlen, MEMORY_CONTEXT_IDENT_DISPLAY_SIZE - 1);
+
+ memcpy(clipped_ident, cur->ident, idlen);
+ clipped_ident[idlen] = '\0';
+ }
+ if (context_id <= (num_stats - 2))
+ {
+ /* Copy statistics to DSM memory */
+ PublishMemoryContext(meminfo, context_id, cur, path, (cur->ident != NULL ? clipped_ident : NULL));
+ }
+ else
+ {
+ /* Examine the context stats */
+ memset(&stat, 0, sizeof(stat));
+ (*cur->methods->stats) (cur, NULL, NULL, &stat, true);
+
+ meminfo[num_stats - 1].totalspace += stat.totalspace;
+ meminfo[num_stats - 1].nblocks += stat.nblocks;
+ meminfo[num_stats - 1].freespace += stat.freespace;
+ meminfo[num_stats - 1].freechunks += stat.freechunks;
+ }
+ /* Display information upto level 2 for summary */
+ if (get_summary && list_length(path) == 3)
+ {
+ memCtxState[idx].in_memory_stats = context_id;
+ break;
+ }
+
+ /*
+ * DSA max limit is reached, write total of the remaining statistics.
+ */
+ if (context_id == (num_stats - 2) && context_id < (stats_count - 1))
+ {
+ memCtxState[idx].in_memory_stats = context_id + 1;
+ strncpy(meminfo[num_stats - 1].name, "Remaining Totals", 16);
+ }
+ context_id++;
+ }
+ if (context_id < (num_stats - 2) && !get_summary)
+ {
+ memCtxState[idx].in_memory_stats = context_id;
+ }
+
+ /*
+ * Signal the waiting client backend after setting the exit condition flag
+ */
+ memCtxState[idx].total_stats = context_id;
+ LWLockRelease(&memCtxState[idx].lw_lock);
+ ConditionVariableBroadcast(&memCtxState[idx].memctx_cv);
+ /* Delete the hash table memory context */
+ MemoryContextDelete(stat_cxt);
+
+ dsa_detach(area);
+}
+
+static void
+PublishMemoryContext(MemoryContextInfo * memctx_info, int curr_id, MemoryContext context, List *path, char *clipped_ident)
+{
+ MemoryContextCounters stat;
+ char *type;
+
+ if (context->name != NULL)
+ {
+ Assert(strlen(context->name) < MEMORY_CONTEXT_IDENT_DISPLAY_SIZE);
+ strncpy(memctx_info[curr_id].name, context->name, strlen(context->name));
+ }
+ else
+ memctx_info[curr_id].name[0] = '\0';
+
+ if (clipped_ident != NULL)
+ {
+ /*
+ * To be consistent with logging output, we label dynahash contexts
+ * with just the hash table name as with MemoryContextStatsPrint().
+ */
+ if (!strncmp(context->name, "dynahash", 8))
+ {
+ strncpy(memctx_info[curr_id].name, clipped_ident, strlen(clipped_ident));
+ memctx_info[curr_id].ident[0] = '\0';
+ }
+ else
+ strncpy(memctx_info[curr_id].ident, clipped_ident, strlen(clipped_ident));
+ }
+ else
+ memctx_info[curr_id].ident[0] = '\0';
+
+ memctx_info[curr_id].path_length = list_length(path);
+ foreach_int(i, path)
+ memctx_info[curr_id].path[foreach_current_index(i)] = Int32GetDatum(i);
+
+ /* Examine the context stats */
+ memset(&stat, 0, sizeof(stat));
+ (*context->methods->stats) (context, NULL, NULL, &stat, true);
+
+ switch (context->type)
+ {
+ case T_AllocSetContext:
+ type = "AllocSet";
+ strncpy(memctx_info[curr_id].type, type, strlen(type));
+ break;
+ case T_GenerationContext:
+ type = "Generation";
+ strncpy(memctx_info[curr_id].type, type, strlen(type));
+ break;
+ case T_SlabContext:
+ type = "Slab";
+ strncpy(memctx_info[curr_id].type, type, strlen(type));
+ break;
+ case T_BumpContext:
+ type = "Bump";
+ strncpy(memctx_info[curr_id].type, type, strlen(type));
+ break;
+ default:
+ type = "???";
+ strncpy(memctx_info[curr_id].type, type, strlen(type));
+ break;
+ }
+ memctx_info[curr_id].totalspace = stat.totalspace;
+ memctx_info[curr_id].nblocks = stat.nblocks;
+ memctx_info[curr_id].freespace = stat.freespace;
+ memctx_info[curr_id].freechunks = stat.freechunks;
+}
+
void *
palloc(Size size)
{
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index cbbe8acd38..b205c54710 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8436,6 +8436,16 @@
prorettype => 'bool', proargtypes => 'int4',
prosrc => 'pg_log_backend_memory_contexts' },
+# publishing memory contexts of the specified backend
+{ oid => '2173', descr => 'publish memory contexts of the specified backend',
+ proname => 'pg_get_process_memory_contexts', provolatile => 'v',
+ prorows => '100', proretset => 't', proparallel => 'r',
+ prorettype => 'record', proargtypes => 'int4 bool',
+ proallargtypes => '{int4,bool,text,text,text,_int4,int4,int4,int4,int4,int4,int4}',
+ proargmodes => '{i,i,o,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{oid, summary, name, ident, type, path, total_bytes, total_nblocks, free_bytes, free_chunks, used_bytes, pid}',
+ prosrc => 'pg_get_process_memory_contexts' },
+
# non-persistent series generator
{ oid => '1066', descr => 'non-persistent series generator',
proname => 'generate_series', prorows => '1000',
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 42a2b38cac..7184727cf1 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -96,6 +96,7 @@ extern PGDLLIMPORT volatile sig_atomic_t IdleSessionTimeoutPending;
extern PGDLLIMPORT volatile sig_atomic_t ProcSignalBarrierPending;
extern PGDLLIMPORT volatile sig_atomic_t LogMemoryContextPending;
extern PGDLLIMPORT volatile sig_atomic_t IdleStatsUpdateTimeoutPending;
+extern PGDLLIMPORT volatile sig_atomic_t PublishMemoryContextPending;
extern PGDLLIMPORT volatile sig_atomic_t CheckClientConnectionPending;
extern PGDLLIMPORT volatile sig_atomic_t ClientConnectionLost;
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index 221073def3..8cbf6e201c 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -35,6 +35,7 @@ typedef enum
PROCSIG_WALSND_INIT_STOPPING, /* ask walsenders to prepare for shutdown */
PROCSIG_BARRIER, /* global barrier interrupt */
PROCSIG_LOG_MEMORY_CONTEXT, /* ask backend to log the memory contexts */
+ PROCSIG_GET_MEMORY_CONTEXT, /* ask backend to publish the memory contexts */
PROCSIG_PARALLEL_APPLY_MESSAGE, /* Message from parallel apply workers */
/* Recovery conflict reasons */
diff --git a/src/include/utils/memutils.h b/src/include/utils/memutils.h
index bf93433b78..d1d3b9da93 100644
--- a/src/include/utils/memutils.h
+++ b/src/include/utils/memutils.h
@@ -18,6 +18,9 @@
#define MEMUTILS_H
#include "nodes/memnodes.h"
+#include "storage/condition_variable.h"
+#include "storage/lmgr.h"
+#include "utils/dsa.h"
/*
@@ -48,7 +51,11 @@
#define AllocHugeSizeIsValid(size) ((Size) (size) <= MaxAllocHugeSize)
+#define MEMORY_CONTEXT_IDENT_DISPLAY_SIZE 1024
+#define MEM_CONTEXT_SHMEM_STATS_SIZE 30
+#define MEM_CONTEXT_MAX_LEVEL 64
+#define MAX_TYPE_STRING_LENGTH 64
/*
* Standard top-level memory contexts.
*
@@ -115,6 +122,49 @@ extern MemoryContext AllocSetContextCreateInternal(MemoryContext parent,
Size initBlockSize,
Size maxBlockSize);
+/* Dynamic shared memory state for Memory Context Statistics reporting */
+typedef struct MemoryContextInfo
+{
+ char name[MEMORY_CONTEXT_IDENT_DISPLAY_SIZE];
+ char ident[MEMORY_CONTEXT_IDENT_DISPLAY_SIZE];
+ Datum path[MEM_CONTEXT_MAX_LEVEL];
+ char type[MAX_TYPE_STRING_LENGTH];
+ int path_length;
+ int64 totalspace;
+ int64 nblocks;
+ int64 freespace;
+ int64 freechunks;
+} MemoryContextInfo;
+
+typedef struct MemoryContextState
+{
+ ConditionVariable memctx_cv;
+ LWLock lw_lock;
+ int proc_id;
+ int in_memory_stats;
+ int total_stats;
+ bool get_summary;
+ dsa_handle memstats_dsa_handle;
+ dsa_pointer memstats_dsa_pointer;
+
+} MemoryContextState;
+
+/*
+ * MemoryContextId
+ * Used for storage of transient identifiers for
+ * pg_get_backend_memory_contexts.
+ */
+typedef struct MemoryContextId
+{
+ MemoryContext context;
+ int context_id;
+} MemoryContextId;
+
+extern PGDLLIMPORT MemoryContextState * memCtxState;
+extern void ProcessGetMemoryContextInterrupt(void);
+extern void HandleGetMemoryContextInterrupt(void);
+extern void MemCtxShmemInit(void);
+
/*
* This wrapper macro exists to check for non-constant strings used as context
* names; that's no longer supported. (Use MemoryContextSetIdentifier if you
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 91089ac215..f864c75dbe 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -223,3 +223,15 @@ select count(distinct utc_offset) >= 24 as ok from pg_timezone_abbrevs;
t
(1 row)
+DO $$
+DECLARE
+ checkpointer_pid int;
+ r RECORD;
+BEGIN
+ SELECT pid from pg_stat_activity where backend_type='checkpointer' INTO checkpointer_pid;
+
+ select type, name, ident
+ from pg_get_process_memory_contexts(checkpointer_pid, false) where path = '{0}' into r;
+ RAISE NOTICE '%', r;
+END $$;
+NOTICE: (AllocSet,TopMemoryContext,)
diff --git a/src/test/regress/sql/sysviews.sql b/src/test/regress/sql/sysviews.sql
index b2a7923754..a56cc44eea 100644
--- a/src/test/regress/sql/sysviews.sql
+++ b/src/test/regress/sql/sysviews.sql
@@ -98,3 +98,15 @@ set timezone_abbreviations = 'Australia';
select count(distinct utc_offset) >= 24 as ok from pg_timezone_abbrevs;
set timezone_abbreviations = 'India';
select count(distinct utc_offset) >= 24 as ok from pg_timezone_abbrevs;
+
+DO $$
+DECLARE
+ checkpointer_pid int;
+ r RECORD;
+BEGIN
+ SELECT pid from pg_stat_activity where backend_type='checkpointer' INTO checkpointer_pid;
+
+ select type, name, ident
+ from pg_get_process_memory_contexts(checkpointer_pid, false) where path = '{0}' into r;
+ RAISE NOTICE '%', r;
+END $$;
--
2.34.1
On 11/29/24 00:23, Rahila Syed wrote:
Hi Tomas,
Thank you for the review.
1) I read through the thread, and in general I agree with the reasoning
for removing the file part - it seems perfectly fine to just dump as
much as we can fit into a buffer, and then summarize the rest. But do we
need to invent a "new" limit here? The other places logging memory
contexts do something like this:
MemoryContextStatsDetail(TopMemoryContext, 100, 100, false);
Which means we only print the 100 memory contexts at the top, and that's
it. Wouldn't that give us a reasonable memory limit too?
I think this prints more than 100 memory contexts, since 100 denotes the
max_level and contexts at each level could have up to 100 children. This
limit seems much higher than what I am currently storing in DSA, which is
approx. 7000 contexts. I will verify this again.
Yeah, you may be right. I don't remember what exactly that limit does.
2) I see the function got renamed to pg_get_process_memory_contexts(),
but the comment still says pg_get_remote_backend_memory_contexts().
Fixed.
3) I don't see any SGML docs for this new function. I was a bit unsure
what the "summary" argument is meant to do. The comment does not explain
that either.
Added docs.
The intention behind adding a summary argument is to report statistics of
contexts at levels 0 and 1, i.e., TopMemoryContext and its immediate
children.
OK
4) I wonder if the function needs to return PID. I mean, the caller
knows which PID it is for, so it seems rather unnecessary.
Perhaps it can be used to ascertain that the information indeed belongs to
the requested pid.
I find that a bit ... suspicious. By this logic we'd include the input
parameters in every result, but we don't. So why is this case different?
5) In the "summary" mode, it might be useful to include info about how
many child contexts were aggregated. It's useful to know whether there
was 1 child or 10000 children. In the regular (non-summary) mode it'd
always be "1", probably, but maybe it'd interact with the limit in (1).
Not sure.
Sure, I will add this in the next iteration.
OK
6) I feel a bit uneasy about the whole locking / communication scheme.
In particular, I'm worried about lockups, deadlocks, etc. So I decided
to do a trivial stress-test - just run the new function through pgbench
with many clients.
The memstats.sql script does just this:
SELECT * FROM pg_get_process_memory_contexts(
(SELECT pid FROM pg_stat_activity
WHERE pid != pg_backend_pid()
ORDER BY random() LIMIT 1)
, false);
where the inner query just picks a PID for some other backend, and asks
for memory context stats for that.
And just run it like this on a scale 1 pgbench database:
pgbench -n -f memstats.sql -c 10 test
And it gets stuck *immediately*. I've seen it wait for other client
backends and auxiliary processes like autovacuum launcher.
This is an absolutely idle system, there's no reason why a process would
not respond almost immediately.
In my reproduction, this issue occurred because the process was terminated
while the requesting backend was waiting on the condition variable to be
signaled by it. I don’t see any solution other than having the waiting
client backend time out using ConditionVariableTimedSleep.
In the patch, since the timeout was set to a high value, pgbench ended
up stuck waiting for the timeout to occur. The failure happens less
frequently after I added an additional check for the process's existence,
but it cannot be entirely avoided. This is because a process can terminate
after we check for its existence but before it signals the client. In such
cases, the client will not receive any signal.
Hmmm, I see. I guess there's no way to know if a process responds to us,
but I guess it should be possible to wake up regularly and check if the
process still exists? Wouldn't that solve the case you mentioned?
I wonder if e.g. autovacuum launcher may
not be handling these requests, or what if client backends can wait in a
cycle.
Did not see a cyclic wait in client backends due to the pgbench stress test.
Not sure, but if I modify the query to only request memory contexts from
non-client processes, i.e.
SELECT * FROM pg_get_process_memory_contexts(
(SELECT pid FROM pg_stat_activity
WHERE pid != pg_backend_pid()
AND backend_type != 'client backend'
ORDER BY random() LIMIT 1)
, false);
then it gets stuck and reports this:
pgbench -n -f select.sql -c 4 -T 10 test
pgbench (18devel)
WARNING: Wait for 105029 process to publish stats timed out, ...
But process 105029 still very much exists, and it's the checkpointer:
$ ps ax | grep 105029
105029 ? Ss 0:00 postgres: checkpointer
OTOH if I modify the script to only look at client backends, and wait
until the processes get "stuck" (i.e. waiting on the condition variable,
consuming 0% CPU), I get this:
$ pgbench -n -f select.sql -c 4 -T 10 test
pgbench (18devel)
WARNING: Wait for 107146 process to publish stats timed out, try again
WARNING: Wait for 107144 process to publish stats timed out, try again
WARNING: Wait for 107147 process to publish stats timed out, try again
transaction type: select.sql
...
but when it gets 'stuck', most of the processes are still very much
running (but waiting for contexts from some other process). In the above
example I see this:
107144 ? Ss 0:02 postgres: user test [local] SELECT
107145 ? Ss 0:01 postgres: user test [local] SELECT
107147 ? Ss 0:02 postgres: user test [local] SELECT
So yes, 107146 seems to be gone. But why would that block getting info
from 107144 and 107147?
Maybe that's acceptable, but couldn't this be an issue with short-lived
connections, making it hard to implement the kind of automated
collection of stats that you envision. If it hits this kind of timeouts
often, it'll make it hard to reliably collect info. No?
> I opted for DSAs over DSMs to enable memory reuse by freeing
> segments for subsequent statistics copies of the same backend,
> without needing to recreate DSMs for each request.
I feel like this might be a premature optimization - I don't have a
clear idea how expensive it is to create DSM per request, but my
intuition is that it's cheaper than processing the contexts and
generating the info.
I'd just remove that, unless someone demonstrates it really matters. I
don't really worry about how expensive it is to process a request
(within reason, of course) - it will happen only very rarely. It's more
important to make sure there's no overhead when no one asks the backend
for memory context info, and simplicity.
Also, how expensive is it to just keep the DSA "just in case"? Imagine
someone asks for the memory context info once - isn't it a waste to still
keep the DSA? I don't recall how much resources could that be.
I don't have a clear opinion on that, I'm more asking for opinions.
Imagining a tool that periodically queries the backends for statistics,
it would be beneficial to avoid recreating the DSAs for each call.
I think it would be nice if you backed this with some numbers. I mean,
how expensive is it to create/destroy the DSA? How does it compare to
the other stuff this function needs to do?
Currently, DSAs of size 1MB per process
(i.e., a maximum of 1MB * (MaxBackends + auxiliary processes))
would be created and pinned for subsequent reporting. This size does
not seem excessively high, even for approx 100 backends and
auxiliary processes.
That seems like a pretty substantial amount of memory reserved for each
connection. IMHO the benefits would have to be pretty significant to
justify this, especially considering it's kept "forever", even if you
run the function only once per day.
8) Two minutes seems pretty arbitrary, and also quite high. If a timeout
is necessary, I think it should not be hard-coded.
Not sure which is the ideal value. Changed it to 15 secs and added a
#define as of now.
Something that gives enough time for the process to respond but
does not hold up the client for too long would be ideal. 15 secs seems
not to be enough for github CI tests, which fail with a timeout error with
this setting.
PFA an updated patch with the above changes.
Why not make this a parameter of the function? With some sensible
default, but easy to override.
regards
--
Tomas Vondra
Hi,
4) I wonder if the function needs to return PID. I mean, the caller
knows which PID it is for, so it seems rather unnecessary.
Perhaps it can be used to ascertain that the information indeed belongs
to the requested pid.
I find that a bit ... suspicious. By this logic we'd include the input
parameters in every result, but we don't. So why is this case different?
This was added to address a review suggestion. I had left it in case anyone
found it useful
for verification.
Previously, I included a check for scenarios where multiple processes could
write to the same
shared memory. Now, each process has a separate shared memory space
identified by
pgprocno, making it highly unlikely for the receiving process to see
another process's memory
dump.
Such a situation could theoretically occur if another process were mapped
to the same
pgprocno, although I’m not sure how likely that is. That said, I’ve added a
check in the receiver
to ensure the PID written in the shared memory matches the PID for which
the dump is
requested.
This guarantees that a user will never see the memory dump of another
process.
Given this, I’m fine with removing the pid column if it helps to make the
output more readable.
5) In the "summary" mode, it might be useful to include info about how
many child contexts were aggregated. It's useful to know whether there
was 1 child or 10000 children. In the regular (non-summary) mode it'd
always be "1", probably, but maybe it'd interact with the limit in (1).
Not sure.
Sure, I will add this in the next iteration.
OK
I have added this information as a column named "num_agg_contexts", which
indicates
the number of contexts whose statistics have been aggregated/added for a
particular output.
In summary mode, all the child contexts of a given level-1 context are
aggregated, and
their statistics are presented as part of the parent context's statistics.
In this case,
num_agg_contexts provides the count of all child contexts under a given
level-1 context.
In regular (non-summary) mode, this column shows a value of 1, meaning the
statistics correspond to a single context, with all context statistics
displayed individually. In this mode an aggregate result is displayed only
if the number of contexts exceeds the DSA size limit. In this case
num_agg_contexts displays the number of remaining contexts.
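The overflow behaviour described above can be sketched roughly as follows. This is only an illustration, not code from the patch: the StatEntry struct, the publish_stats() function and the policy of reserving the last slot for the aggregate row are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for one published context entry. */
typedef struct StatEntry
{
    int64_t totalspace;
    int64_t freespace;
    int     num_agg_contexts;   /* 1 for a regular row, N for the overflow row */
} StatEntry;

/*
 * Copy contexts one by one while they fit. If there are more source
 * contexts than slots, the last slot holds the sum of everything that
 * did not fit. Returns the number of rows written to dst.
 */
static size_t
publish_stats(const StatEntry *src, size_t nsrc, StatEntry *dst, size_t capacity)
{
    size_t individual;
    size_t i;

    assert(capacity >= 1);

    /* Everything fits: all rows are individual. Otherwise keep one slot back. */
    individual = (nsrc <= capacity) ? nsrc : capacity - 1;

    for (i = 0; i < individual; i++)
    {
        dst[i] = src[i];
        dst[i].num_agg_contexts = 1;
    }

    if (individual == nsrc)
        return nsrc;

    /* Fold the remaining contexts into a single aggregate row. */
    dst[individual].totalspace = 0;
    dst[individual].freespace = 0;
    dst[individual].num_agg_contexts = 0;
    for (i = individual; i < nsrc; i++)
    {
        dst[individual].totalspace += src[i].totalspace;
        dst[individual].freespace += src[i].freespace;
        dst[individual].num_agg_contexts++;
    }
    return individual + 1;
}
```

With five source contexts and room for three rows, the first two rows carry num_agg_contexts = 1 and the third row carries the sum of the last three contexts with num_agg_contexts = 3.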
In the patch, since the timeout was set to a high value, pgbench ended
up stuck waiting for the timeout to occur. The failure happens less
frequently after I added an additional check for the process's existence,
but it cannot be entirely avoided. This is because a process can terminate
after we check for its existence but before it signals the client. In such
cases, the client will not receive any signal.
Hmmm, I see. I guess there's no way to know if a process responds to us,
but I guess it should be possible to wake up regularly and check if the
process still exists? Wouldn't that solve the case you mentioned?
I have fixed it accordingly in the attached patch by waking up every
5 seconds to check if the process exists and sleeping again if the wake-up
condition is not satisfied. The number of such tries is limited to 20, so
the total wait time can be up to 100 seconds. I will make the retries
configurable, in line with your suggestion to be able to override the
default waiting time.
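A minimal sketch of such a bounded retry loop, for illustration only. It does not use the PostgreSQL condition variable API: the timed wait is abstracted behind a hypothetical callback, process existence is probed with POSIX kill(pid, 0), and all names (wait_for_stats, timed_wait_fn, signaled_after) are invented for this example.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

typedef enum StatsWaitResult
{
    STATS_PUBLISHED,        /* publisher signaled that stats are ready */
    STATS_TARGET_GONE,      /* target exited before publishing */
    STATS_TIMED_OUT         /* retries exhausted, target still alive */
} StatsWaitResult;

/* Hypothetical callback: sleep up to timeout_ms, return true if woken by the publisher. */
typedef bool (*timed_wait_fn) (int timeout_ms, void *arg);

static bool
process_exists(pid_t pid)
{
    /* Signal 0 only performs error checking; EPERM still means the pid exists. */
    return kill(pid, 0) == 0 || errno == EPERM;
}

/* Wake up after every timeout, re-check the target, and give up after max_retries. */
static StatsWaitResult
wait_for_stats(pid_t target, timed_wait_fn timed_wait, void *arg,
               int timeout_ms, int max_retries)
{
    for (int attempt = 0; attempt < max_retries; attempt++)
    {
        if (timed_wait(timeout_ms, arg))
            return STATS_PUBLISHED;
        if (!process_exists(target))
            return STATS_TARGET_GONE;
    }
    return STATS_TIMED_OUT;
}

/* Demo callback: pretends the publisher answers on the N-th wake-up. */
static bool
signaled_after(int timeout_ms, void *arg)
{
    int *remaining = arg;

    (void) timeout_ms;          /* a real implementation would sleep here */
    return --(*remaining) <= 0;
}
```

With timeout_ms = 5000 and max_retries = 20 the worst-case wait is 100 seconds, matching the numbers above. Note that the race mentioned earlier remains: the target can still exit between the existence check and the next sleep, which only delays detection by one timeout interval.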
I wonder if e.g. autovacuum launcher may
not be handling these requests, or what if client backends can wait in a
cycle.
Did not see a cyclic wait in client backends due to the pgbench stress
test.
Not sure, but if I modify the query to only request memory contexts from
non-client processes, i.e.
SELECT * FROM pg_get_process_memory_contexts(
(SELECT pid FROM pg_stat_activity
WHERE pid != pg_backend_pid()
AND backend_type != 'client backend'
ORDER BY random() LIMIT 1)
, false);
then it gets stuck and reports this:
pgbench -n -f select.sql -c 4 -T 10 test
pgbench (18devel)
WARNING: Wait for 105029 process to publish stats timed out, ...
But process 105029 still very much exists, and it's the checkpointer:
In the case of checkpointer, I also see some wait time after running the
tests that you mentioned, but it eventually completes the request in my
runs.
$ ps ax | grep 105029
105029 ? Ss 0:00 postgres: checkpointer
OTOH if I modify the script to only look at client backends, and wait
until the processes get "stuck" (i.e. waiting on the condition variable,
consuming 0% CPU), I get this:
$ pgbench -n -f select.sql -c 4 -T 10 test
pgbench (18devel)
WARNING: Wait for 107146 process to publish stats timed out, try again
WARNING: Wait for 107144 process to publish stats timed out, try again
WARNING: Wait for 107147 process to publish stats timed out, try again
transaction type: select.sql
...
but when it gets 'stuck', most of the processes are still very much
running (but waiting for contexts from some other process). In the above
example I see this:
107144 ? Ss 0:02 postgres: user test [local] SELECT
107145 ? Ss 0:01 postgres: user test [local] SELECT
107147 ? Ss 0:02 postgres: user test [local] SELECT
So yes, 107146 seems to be gone. But why would that block getting info
from 107144 and 107147?
Most likely 107144 and/or 107147 must also be waiting for 107146, which is
gone. Something like 107144 -> 107147 -> 107146 (dead), or
107144 -> 107146 (dead) and 107147 -> 107146 (dead).
Maybe that's acceptable, but couldn't this be an issue with short-lived
connections, making it hard to implement the kind of automated
collection of stats that you envision. If it hits this kind of timeouts
often, it'll make it hard to reliably collect info. No?
Yes, if there is a chain of waiting clients due to a process no longer
existing, the waiting time to receive information will increase. However,
as long as a failed request caused by a non-existent process is detected
promptly, the wait time should remain manageable, allowing other waiting
clients to obtain the requested information from the existing processes.
In such cases, it might be necessary to experiment with the waiting times
at the receiving
client. Making the waiting time user-configurable, as you suggested, by
passing it as an
argument to the function, could help address this scenario.
Thanks for highlighting this, I will test this some more.
I opted for DSAs over DSMs to enable memory reuse by freeing
segments for subsequent statistics copies of the same backend,
without needing to recreate DSMs for each request.
I feel like this might be a premature optimization - I don't have a
clear idea how expensive it is to create DSM per request, but my
intuition is that it's cheaper than processing the contexts and
generating the info.
I'd just remove that, unless someone demonstrates it really matters. I
don't really worry about how expensive it is to process a request
(within reason, of course) - it will happen only very rarely. It's more
important to make sure there's no overhead when no one asks the backend
for memory context info, and simplicity.
Also, how expensive is it to just keep the DSA "just in case"? Imagine
someone asks for the memory context info once - isn't it a waste to still
keep the DSA? I don't recall how much resources could that be.
I don't have a clear opinion on that, I'm more asking for opinions.
Imagining a tool that periodically queries the backends for statistics,
it would be beneficial to avoid recreating the DSAs for each call.
I think it would be nice if you backed this with some numbers. I mean,
how expensive is it to create/destroy the DSA? How does it compare to
the other stuff this function needs to do?
After instrumenting the code with timestamps, I observed that DSA creation
accounts for approximately 17% to 26% of the total execution time of the
function pg_get_process_memory_contexts().
Currently, DSAs of size 1MB per process
(i.e., a maximum of 1MB * (MaxBackends + auxiliary processes))
would be created and pinned for subsequent reporting. This size does
not seem excessively high, even for approx 100 backends and
auxiliary processes.
That seems like a pretty substantial amount of memory reserved for each
connection. IMHO the benefits would have to be pretty significant to
justify this, especially considering it's kept "forever", even if you
run the function only once per day.
I can reduce the initial segment size to DSA_MIN_SEGMENT_SIZE, which is
256KB per process. If needed, this could grow up to 16MB based on the
current settings.
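To put the sizes under discussion side by side, here is a small back-of-the-envelope helper. It is purely illustrative: pinned_dsa_bytes() is not part of the patch, and the process counts used in the usage note (100 backends plus 6 auxiliary processes) are made-up inputs, not measurements.

```c
#include <assert.h>
#include <stdint.h>

#define KB ((uint64_t) 1024)
#define MB (1024 * KB)

/*
 * Upper bound on pinned shared memory if every process slot keeps one DSA
 * of the given size: per_process_bytes * (MaxBackends + auxiliary processes).
 */
static uint64_t
pinned_dsa_bytes(uint64_t per_process_bytes, int max_backends, int num_aux_procs)
{
    assert(max_backends >= 0 && num_aux_procs >= 0);
    return per_process_bytes * (uint64_t) (max_backends + num_aux_procs);
}
```

For the hypothetical 106 processes, a 1MB DSA per process pins at most 106MB, the 256KB minimum segment size pins at most 26.5MB, and the 16MB ceiling would allow up to 1696MB if every DSA grew to its limit.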
However, for the scenario you mentioned, it would be ideal to have a
mechanism
to mark a pinned DSA (using dsa_pin()) for deletion if it is not
used/attached within a
specified duration. Alternatively, I could avoid using dsa_pin()
altogether, allowing the
DSA to be automatically destroyed once all processes detach from it, and
recreate it
for a new request.
At the moment, I am unsure which approach is most feasible. Any suggestions
would be
greatly appreciated.
Thank you,
Rahila Syed
Attachments:
v7-Function-to-report-memory-context-stats-of-any-backe.patch (application/octet-stream)
From 9f882b110ef36fdc13716c9267130bcda7f453ec Mon Sep 17 00:00:00 2001
From: Rahila Syed <rahilasyed.90@gmail.com>
Date: Sun, 15 Sep 2024 17:56:06 +0530
Subject: [PATCH] Function to report memory context stats of any backend
This function sends a signal to a backend to publish
statistics of all its memory contexts. The signal handler
sets a flag, which causes the relevant backend to copy its
MemoryContextStats to a DSA as part of the next
CHECK_FOR_INTERRUPTS().
If there are more than 16MB worth of statistics, the
remaining statistics are copied as a cumulative
total of the remaining contexts.
Once it's done, it signals the client backend using
a condition variable. The client backend
then wakes up, reads the shared memory and
returns these values in the form of a set of records,
one for each memory context, to the user, followed
by a cumulative total of the remaining contexts,
if any.
Each backend and auxiliary process has its own slot
for reporting the stats. There is an array of such
memory slots of size MaxBackends + number of auxiliary
processes in fixed shared memory. Each of these slots points
to a DSA, which contains the stats to be shared by the
corresponding process.
Each slot has its own LWLock and condition variable for
synchronization and communication between the
publishing process and the client backend.
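The "handler sets a flag, work happens at the next safe point" pattern described above can be illustrated with a standalone sketch. This is not the patch's code: publish_pending, handle_publish_request() and check_for_interrupts() are hypothetical stand-ins for PublishMemoryContextPending, HandleGetMemoryContextInterrupt() and the CHECK_FOR_INTERRUPTS() path, and the actual copy to shared memory is replaced by a counter.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>

/* Set from the signal handler, consumed by the main loop. */
static volatile sig_atomic_t publish_pending = false;

/* Stand-in for "statistics were copied to shared memory". */
static int publish_count = 0;

/* Signal handler: only record the request, do no real work here. */
static void
handle_publish_request(int signo)
{
    (void) signo;
    publish_pending = true;
}

/* Called at safe points in the main loop; performs the deferred work. */
static void
check_for_interrupts(void)
{
    if (publish_pending)
    {
        publish_pending = false;
        publish_count++;
    }
}
```

Deferring the work this way keeps the handler async-signal-safe: allocating memory, taking locks or walking the context tree only happens once the process reaches a point where that is known to be safe.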
---
doc/src/sgml/func.sgml | 19 +
src/backend/postmaster/autovacuum.c | 4 +
src/backend/postmaster/checkpointer.c | 4 +
src/backend/postmaster/interrupt.c | 4 +
src/backend/postmaster/pgarch.c | 4 +
src/backend/postmaster/startup.c | 4 +
src/backend/postmaster/walsummarizer.c | 4 +
src/backend/storage/ipc/ipci.c | 2 +
src/backend/storage/ipc/procsignal.c | 3 +
src/backend/tcop/postgres.c | 3 +
.../utils/activity/wait_event_names.txt | 1 +
src/backend/utils/adt/mcxtfuncs.c | 275 +++++++++++-
src/backend/utils/init/globals.c | 1 +
src/backend/utils/mmgr/mcxt.c | 412 +++++++++++++++++-
src/include/catalog/pg_proc.dat | 10 +
src/include/miscadmin.h | 1 +
src/include/storage/procsignal.h | 1 +
src/include/utils/memutils.h | 51 +++
src/test/regress/expected/sysviews.out | 12 +
src/test/regress/sql/sysviews.sql | 12 +
20 files changed, 804 insertions(+), 23 deletions(-)
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 8b81106fa2..73fe64e700 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -28342,6 +28342,25 @@ acl | {postgres=arwdDxtm/postgres,foo=r/postgres}
</para></entry>
</row>
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>pg_get_process_memory_contexts</primary>
+ </indexterm>
+ <function>pg_get_process_memory_contexts</function> ( <parameter>pid</parameter> <type>integer</type>, <parameter>get_summary</parameter> <type>boolean</type> )
+ <returnvalue>setof record</returnvalue>
+ </para>
+ <para>
+ Requests to return the memory contexts of the backend with the
+ specified process ID. This function can send the request to
+ both the backends and auxiliary processes. After receiving the memory
+ contexts from the process, it returns the result as one row per
+ context. When get_summary is true, memory contexts at level 0
+ and level 1 are reported, along with cumulative results for the
+ remaining contexts.
+ </para></entry>
+ </row>
+
<row>
<entry role="func_table_entry"><para role="func_signature">
<indexterm>
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index dc3cf87aba..5d01497ada 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -768,6 +768,10 @@ HandleAutoVacLauncherInterrupts(void)
/* Process sinval catchup interrupts that happened while sleeping */
ProcessCatchupInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index 982572a75d..9caf8fa018 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -616,6 +616,10 @@ HandleCheckpointerInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/postmaster/interrupt.c b/src/backend/postmaster/interrupt.c
index eedc0980cf..1107ff6d45 100644
--- a/src/backend/postmaster/interrupt.c
+++ b/src/backend/postmaster/interrupt.c
@@ -48,6 +48,10 @@ HandleMainLoopInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 02f91431f5..467a253ccd 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -865,6 +865,10 @@ HandlePgArchInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
if (ConfigReloadPending)
{
char *archiveLib = pstrdup(XLogArchiveLibrary);
diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c
index ef6f98ebcd..17beb8737d 100644
--- a/src/backend/postmaster/startup.c
+++ b/src/backend/postmaster/startup.c
@@ -192,6 +192,10 @@ HandleStartupProcInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
diff --git a/src/backend/postmaster/walsummarizer.c b/src/backend/postmaster/walsummarizer.c
index 48350bec52..b3e6c2b5f0 100644
--- a/src/backend/postmaster/walsummarizer.c
+++ b/src/backend/postmaster/walsummarizer.c
@@ -876,6 +876,10 @@ HandleWalSummarizerInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 7783ba854f..8816ef6903 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -50,6 +50,7 @@
#include "storage/sinvaladt.h"
#include "utils/guc.h"
#include "utils/injection_point.h"
+#include "utils/memutils.h"
/* GUCs */
int shared_memory_type = DEFAULT_SHARED_MEMORY_TYPE;
@@ -340,6 +341,7 @@ CreateOrAttachShmemStructs(void)
StatsShmemInit();
WaitEventCustomShmemInit();
InjectionPointShmemInit();
+ MemCtxShmemInit();
}
/*
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index 87027f27eb..621726cf03 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -688,6 +688,9 @@ procsignal_sigusr1_handler(SIGNAL_ARGS)
if (CheckProcSignal(PROCSIG_LOG_MEMORY_CONTEXT))
HandleLogMemoryContextInterrupt();
+ if (CheckProcSignal(PROCSIG_GET_MEMORY_CONTEXT))
+ HandleGetMemoryContextInterrupt();
+
if (CheckProcSignal(PROCSIG_PARALLEL_APPLY_MESSAGE))
HandleParallelApplyMessageInterrupt();
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 4b985bd056..ba8ef72fc3 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3511,6 +3511,9 @@ ProcessInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
if (ParallelApplyMessagePending)
HandleParallelApplyMessages();
}
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 16144c2b72..7a27b5f680 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -158,6 +158,7 @@ WAL_RECEIVER_EXIT "Waiting for the WAL receiver to exit."
WAL_RECEIVER_WAIT_START "Waiting for startup process to send initial data for streaming replication."
WAL_SUMMARY_READY "Waiting for a new WAL summary to be generated."
XACT_GROUP_UPDATE "Waiting for the group leader to update transaction status at transaction end."
+MEM_CTX_PUBLISH "Waiting for backend to publish memory information."
ABI_compatibility:
diff --git a/src/backend/utils/adt/mcxtfuncs.c b/src/backend/utils/adt/mcxtfuncs.c
index 6a6634e1cd..43a4da264c 100644
--- a/src/backend/utils/adt/mcxtfuncs.c
+++ b/src/backend/utils/adt/mcxtfuncs.c
@@ -17,28 +17,23 @@
#include "funcapi.h"
#include "mb/pg_wchar.h"
+#include "miscadmin.h"
+#include "access/twophase.h"
+#include "nodes/pg_list.h"
#include "storage/proc.h"
#include "storage/procarray.h"
#include "utils/array.h"
#include "utils/builtins.h"
#include "utils/hsearch.h"
+#include "utils/memutils.h"
+#include "utils/wait_event_types.h"
/* ----------
* The max bytes for showing identifiers of MemoryContext.
* ----------
*/
-#define MEMORY_CONTEXT_IDENT_DISPLAY_SIZE 1024
-/*
- * MemoryContextId
- * Used for storage of transient identifiers for
- * pg_get_backend_memory_contexts.
- */
-typedef struct MemoryContextId
-{
- MemoryContext context;
- int context_id;
-} MemoryContextId;
+struct MemoryContextState *memCtxState = NULL;
/*
* int_list_to_array
@@ -71,7 +66,7 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
TupleDesc tupdesc, MemoryContext context,
HTAB *context_id_lookup)
{
-#define PG_GET_BACKEND_MEMORY_CONTEXTS_COLS 10
+#define PG_GET_BACKEND_MEMORY_CONTEXTS_COLS 11
Datum values[PG_GET_BACKEND_MEMORY_CONTEXTS_COLS];
bool nulls[PG_GET_BACKEND_MEMORY_CONTEXTS_COLS];
@@ -305,3 +300,259 @@ pg_log_backend_memory_contexts(PG_FUNCTION_ARGS)
PG_RETURN_BOOL(true);
}
+
+/*
+ * pg_get_process_memory_contexts
+ * Signal a backend or an auxiliary process to send its memory contexts.
+ *
+ * On receipt of this signal, a backend or an auxiliary process sets the flag
+ * in the signal handler, which causes the next CHECK_FOR_INTERRUPTS()
+ * or process-specific interrupt handler to copy the memory context statistics
+ * in a dynamic shared memory space. The statistics for contexts that do not fit in
+ * shared memory area are stored as a cumulative total of those contexts,
+ * at the end in the dynamic shared memory.
+ * Wait for the backend to signal the condition variable after
+ * writing the statistics to shared memory.
+ * Once the wait on the condition variable ends, check whether the
+ * requested backend's statistics are available to read and display.
+ */
+Datum
+pg_get_process_memory_contexts(PG_FUNCTION_ARGS)
+{
+ int pid = PG_GETARG_INT32(0);
+ bool get_summary = PG_GETARG_BOOL(1);
+ PGPROC *proc;
+ ProcNumber procNumber = INVALID_PROC_NUMBER;
+ int i;
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ dsa_area *area;
+ dsa_handle handle;
+ MemoryContextInfo *memctx_info;
+ MemoryContext oldContext;
+ int num_retries = 0;
+
+ InitMaterializedSRF(fcinfo, 0);
+
+ /*
+ * See if the process with given pid is a backend or an auxiliary process.
+ */
+ proc = BackendPidGetProc(pid);
+ if (proc == NULL)
+ proc = AuxiliaryPidGetProc(pid);
+
+ /*
+ * BackendPidGetProc() and AuxiliaryPidGetProc() return NULL if the pid
+ * isn't valid; but by the time we reach kill(), a process for which we
+ * get a valid proc here might have terminated on its own. There's no way
+ * to acquire a lock on an arbitrary process to prevent that. But since
+ * this mechanism is usually used to debug a backend or an auxiliary
+ * process running and consuming lots of memory, that it might end on its
+ * own first and its memory contexts are not logged is not a problem.
+ */
+ if (proc == NULL)
+ {
+ /*
+ * This is just a warning so a loop-through-resultset will not abort
+ * if one backend terminated on its own during the run.
+ */
+ ereport(WARNING,
+ (errmsg("PID %d is not a PostgreSQL server process", pid)));
+ return (Datum) 0;
+ }
+
+ procNumber = GetNumberFromPGProc(proc);
+ if (procNumber == MyProcNumber)
+ {
+ ereport(WARNING,
+ (errmsg("cannot return statistics for local backend"),
+ errhint("Use pg_get_backend_memory_contexts instead")));
+ PG_RETURN_NULL();
+ }
+
+ /*
+ * Return statistics of top level 1 and 2 contexts, if get_summary is
+ * true.
+ */
+ LWLockAcquire(&memCtxState[procNumber].lw_lock, LW_EXCLUSIVE);
+ memCtxState[procNumber].get_summary = get_summary;
+
+ /*
+ * Create a DSA segment with maximum size of 16MB, send handle to the
+ * publishing process for storing the stats. If the statistics of all
+ * contexts exceed 16MB, a cumulative total is stored for the rest.
+ */
+ if (memCtxState[procNumber].memstats_dsa_handle == DSA_HANDLE_INVALID)
+ {
+ oldContext = MemoryContextSwitchTo(TopMemoryContext);
+ area = dsa_create_ext(memCtxState[procNumber].lw_lock.tranche, DSA_DEFAULT_INIT_SEGMENT_SIZE,
+ 16 * DSA_DEFAULT_INIT_SEGMENT_SIZE);
+ MemoryContextSwitchTo(oldContext);
+ handle = dsa_get_handle(area);
+ memCtxState[procNumber].memstats_dsa_handle = handle;
+ /* Pin the mapping so that it doesn't throw a warning */
+ dsa_pin(area);
+ dsa_pin_mapping(area);
+ }
+ else
+ {
+ area = dsa_attach(memCtxState[procNumber].memstats_dsa_handle);
+ dsa_pin_mapping(area);
+ }
+ LWLockRelease(&memCtxState[procNumber].lw_lock);
+ if (SendProcSignal(pid, PROCSIG_GET_MEMORY_CONTEXT, procNumber) < 0)
+ {
+ ereport(WARNING,
+ (errmsg("could not send signal to process %d: %m", pid)));
+ return (Datum) 0;
+ }
+
+ /*
+ * Wait for a backend to publish stats, indicated by a valid dsa pointer
+ * set by the backend.
+ */
+ while (1)
+ {
+ LWLockAcquire(&memCtxState[procNumber].lw_lock, LW_EXCLUSIVE);
+
+ /*
+ * We expect to come out of sleep when the requested process has
+ * finished publishing the statistics, verified using a valid dsa
+ * pointer.
+ *
+ * Make sure that the information belongs to the pid we requested
+ * information for; otherwise loop back and wait for the server
+ * process to finish publishing statistics.
+ */
+ if (memCtxState[procNumber].proc_id == pid && DsaPointerIsValid(memCtxState[procNumber].memstats_dsa_pointer))
+ break;
+ else
+ LWLockRelease(&memCtxState[procNumber].lw_lock);
+
+ proc = BackendPidGetProc(pid);
+ if (proc == NULL)
+ proc = AuxiliaryPidGetProc(pid);
+ if (proc == NULL)
+ {
+ ereport(WARNING,
+ (errmsg("PID %d is not a PostgreSQL server process", pid)));
+ dsa_detach(area);
+ return (Datum) 0;
+ }
+#define MEMSTATS_WAIT_TIMEOUT 5000
+#define MAX_RETRIES 20
+ if (ConditionVariableTimedSleep(&memCtxState[procNumber].memctx_cv, MEMSTATS_WAIT_TIMEOUT,
+ WAIT_EVENT_MEM_CTX_PUBLISH))
+ {
+ ereport(LOG,
+ (errmsg("Wait for %d process to publish stats timed out, trying again", pid)));
+ if (num_retries > MAX_RETRIES)
+ {
+ if (DsaPointerIsValid(memCtxState[procNumber].memstats_dsa_pointer))
+ {
+ dsa_free(area, memCtxState[procNumber].memstats_dsa_pointer);
+ memCtxState[procNumber].memstats_dsa_pointer = InvalidDsaPointer;
+ }
+ dsa_detach(area);
+ return (Datum) 0;
+ }
+ num_retries = num_retries + 1;
+ }
+ }
+ if (DsaPointerIsValid(memCtxState[procNumber].memstats_dsa_pointer))
+ memctx_info = (MemoryContextInfo *) dsa_get_address(area, memCtxState[procNumber].memstats_dsa_pointer);
+ /* Backend has finished publishing the stats, read them */
+ for (i = 0; i < memCtxState[procNumber].in_memory_stats; i++)
+ {
+ ArrayType *path_array;
+ int path_length;
+ Datum values[PG_GET_BACKEND_MEMORY_CONTEXTS_COLS];
+ bool nulls[PG_GET_BACKEND_MEMORY_CONTEXTS_COLS];
+
+ memset(values, 0, sizeof(values));
+ memset(nulls, 0, sizeof(nulls));
+
+ if (strlen(memctx_info[i].name) != 0)
+ values[0] = CStringGetTextDatum(memctx_info[i].name);
+ else
+ nulls[0] = true;
+ if (strlen(memctx_info[i].ident) != 0)
+ values[1] = CStringGetTextDatum(memctx_info[i].ident);
+ else
+ nulls[1] = true;
+
+ values[2] = CStringGetTextDatum(memctx_info[i].type);
+ path_length = memctx_info[i].path_length;
+ path_array = construct_array_builtin(memctx_info[i].path, path_length, INT4OID);
+ values[3] = PointerGetDatum(path_array);
+ values[4] = Int64GetDatum(memctx_info[i].totalspace);
+ values[5] = Int64GetDatum(memctx_info[i].nblocks);
+ values[6] = Int64GetDatum(memctx_info[i].freespace);
+ values[7] = Int64GetDatum(memctx_info[i].freechunks);
+ values[8] = Int64GetDatum(memctx_info[i].totalspace - memctx_info[i].freespace);
+ values[9] = Int32GetDatum(memCtxState[procNumber].proc_id);
+ values[10] = Int32GetDatum(memctx_info[i].num_contexts);
+ tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc, values, nulls);
+ }
+ /* If there are more contexts, display a cumulative total of those */
+ if (memCtxState[procNumber].total_stats > i)
+ {
+ Datum values[PG_GET_BACKEND_MEMORY_CONTEXTS_COLS];
+ bool nulls[PG_GET_BACKEND_MEMORY_CONTEXTS_COLS];
+
+ values[0] = CStringGetTextDatum(memctx_info[i].name);
+ nulls[1] = true;
+ nulls[2] = true;
+ nulls[3] = true;
+ values[4] = Int64GetDatum(memctx_info[i].totalspace);
+ values[5] = Int64GetDatum(memctx_info[i].nblocks);
+ values[6] = Int64GetDatum(memctx_info[i].freespace);
+ values[7] = Int64GetDatum(memctx_info[i].freechunks);
+ values[8] = Int64GetDatum(memctx_info[i].totalspace - memctx_info[i].freespace);
+ values[9] = Int32GetDatum(memCtxState[procNumber].proc_id);
+ values[10] = Int32GetDatum(memctx_info[i].num_contexts);
+ tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc, values, nulls);
+ }
+
+ /* DSA free allocation for this client */
+ if (DsaPointerIsValid(memCtxState[procNumber].memstats_dsa_pointer))
+ {
+ dsa_free(area, memCtxState[procNumber].memstats_dsa_pointer);
+ memCtxState[procNumber].memstats_dsa_pointer = InvalidDsaPointer;
+ }
+ LWLockRelease(&memCtxState[procNumber].lw_lock);
+ ConditionVariableCancelSleep();
+ dsa_detach(area);
+ return (Datum) 0;
+}
+
+static Size
+MemCtxShmemSize(void)
+{
+ Size size;
+ Size TotalProcs = add_size(MaxBackends, add_size(NUM_AUXILIARY_PROCS, max_prepared_xacts));
+
+ size = TotalProcs * sizeof(MemoryContextState);
+ return size;
+}
+
+void
+MemCtxShmemInit(void)
+{
+ bool found;
+ Size TotalProcs = add_size(MaxBackends, add_size(NUM_AUXILIARY_PROCS, max_prepared_xacts));
+
+ memCtxState = (MemoryContextState *) ShmemInitStruct("MemoryContextState",
+ MemCtxShmemSize(),
+ &found);
+ if (!found)
+ {
+ for (int i = 0; i < TotalProcs; i++)
+ {
+ ConditionVariableInit(&memCtxState[i].memctx_cv);
+ LWLockInitialize(&memCtxState[i].lw_lock, LWLockNewTrancheId());
+ LWLockRegisterTranche(memCtxState[i].lw_lock.tranche, "mem_context_stats_reporting");
+ memCtxState[i].memstats_dsa_handle = DSA_HANDLE_INVALID;
+ memCtxState[i].memstats_dsa_pointer = InvalidDsaPointer;
+ }
+ }
+}
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index 03a54451ac..7fc600ff7b 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -42,6 +42,7 @@ volatile sig_atomic_t IdleStatsUpdateTimeoutPending = false;
volatile uint32 InterruptHoldoffCount = 0;
volatile uint32 QueryCancelHoldoffCount = 0;
volatile uint32 CritSectionCount = 0;
+volatile sig_atomic_t PublishMemoryContextPending = false;
int MyProcPid;
pg_time_t MyStartTime;
diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index 70d33226cb..fbb1a10243 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -19,16 +19,22 @@
*-------------------------------------------------------------------------
*/
+#include <math.h>
#include "postgres.h"
#include "mb/pg_wchar.h"
#include "miscadmin.h"
+#include "nodes/pg_list.h"
+#include "storage/fd.h"
+#include "storage/lwlock.h"
+#include "storage/dsm.h"
+#include "utils/dsa.h"
+#include "utils/hsearch.h"
#include "utils/memdebug.h"
#include "utils/memutils.h"
#include "utils/memutils_internal.h"
#include "utils/memutils_memorychunk.h"
-
static void BogusFree(void *pointer);
static void *BogusRealloc(void *pointer, Size size, int flags);
static MemoryContext BogusGetChunkContext(void *pointer);
@@ -135,6 +141,12 @@ static const MemoryContextMethods mcxt_methods[] = {
};
#undef BOGUS_MCTX
+typedef enum PrintDetails
+{
+ PRINT_STATS_TO_STDERR = 0,
+ PRINT_STATS_TO_LOGS,
+ PRINT_STATS_NONE
+} PrintDetails;
/*
* CurrentMemoryContext
@@ -162,10 +174,11 @@ static void MemoryContextCallResetCallbacks(MemoryContext context);
static void MemoryContextStatsInternal(MemoryContext context, int level,
int max_level, int max_children,
MemoryContextCounters *totals,
- bool print_to_stderr);
+ PrintDetails print_location, int *num_contexts);
static void MemoryContextStatsPrint(MemoryContext context, void *passthru,
const char *stats_string,
bool print_to_stderr);
+static void PublishMemoryContext(MemoryContextInfo * memctx_infos, int curr_id, MemoryContext context, List *path, char *clipped_ident, MemoryContextCounters stat, int num_contexts);
/*
* You should not do memory allocations within a critical section, because
@@ -831,11 +844,18 @@ MemoryContextStatsDetail(MemoryContext context,
bool print_to_stderr)
{
MemoryContextCounters grand_totals;
+ int num_contexts;
+ PrintDetails print_location;
memset(&grand_totals, 0, sizeof(grand_totals));
+ if (print_to_stderr)
+ print_location = PRINT_STATS_TO_STDERR;
+ else
+ print_location = PRINT_STATS_TO_LOGS;
+
MemoryContextStatsInternal(context, 0, max_level, max_children,
- &grand_totals, print_to_stderr);
+ &grand_totals, print_location, &num_contexts);
if (print_to_stderr)
fprintf(stderr,
@@ -876,18 +896,43 @@ static void
MemoryContextStatsInternal(MemoryContext context, int level,
int max_level, int max_children,
MemoryContextCounters *totals,
- bool print_to_stderr)
+ PrintDetails print_location, int *num_contexts)
{
MemoryContext child;
int ichild;
+ bool print_to_stderr = true;
Assert(MemoryContextIsValid(context));
- /* Examine the context itself */
- context->methods->stats(context,
- MemoryContextStatsPrint,
- &level,
- totals, print_to_stderr);
+ if (print_location == PRINT_STATS_TO_STDERR)
+ print_to_stderr = true;
+ else if (print_location == PRINT_STATS_TO_LOGS)
+ print_to_stderr = false;
+
+ if (print_location != PRINT_STATS_NONE)
+ {
+ /* Examine the context itself */
+ context->methods->stats(context,
+ MemoryContextStatsPrint,
+ &level,
+ totals, print_to_stderr);
+ }
+ /* Do not print the statistics */
+
+ /*
+ * print_to_stderr is a no-op if no statistics are going to be printed i.e
+ * print_location == PRINT_STATS_NONE
+ */
+ else
+ {
+ /* Examine the context itself */
+ context->methods->stats(context,
+ NULL,
+ NULL,
+ totals, print_to_stderr);
+ }
+ /* Increment the context count */
+ *num_contexts = *num_contexts + 1;
/*
* Examine children.
@@ -907,7 +952,7 @@ MemoryContextStatsInternal(MemoryContext context, int level,
MemoryContextStatsInternal(child, level + 1,
max_level, max_children,
totals,
- print_to_stderr);
+ print_location, num_contexts);
}
}
@@ -925,6 +970,7 @@ MemoryContextStatsInternal(MemoryContext context, int level,
ichild++;
child = MemoryContextTraverseNext(child, context);
}
+ *num_contexts = *num_contexts + ichild;
if (print_to_stderr)
{
@@ -939,7 +985,7 @@ MemoryContextStatsInternal(MemoryContext context, int level,
local_totals.freechunks,
local_totals.totalspace - local_totals.freespace);
}
- else
+ else if (print_location != PRINT_STATS_NONE)
ereport(LOG_SERVER_ONLY,
(errhidestmt(true),
errhidecontext(true),
@@ -1276,6 +1322,21 @@ HandleLogMemoryContextInterrupt(void)
/* latch will be set by procsignal_sigusr1_handler */
}
+/*
+ * HandleGetMemoryContextInterrupt
+ * Handle receipt of an interrupt indicating publishing of memory
+ * contexts.
+ *
+ * All the actual work is deferred to ProcessGetMemoryContextInterrupt()
+ */
+void
+HandleGetMemoryContextInterrupt(void)
+{
+ InterruptPending = true;
+ PublishMemoryContextPending = true;
+ /* latch will be set by procsignal_sigusr1_handler */
+}
+
/*
* ProcessLogMemoryContextInterrupt
* Perform logging of memory contexts of this backend process.
@@ -1313,6 +1374,335 @@ ProcessLogMemoryContextInterrupt(void)
MemoryContextStatsDetail(TopMemoryContext, 100, 100, false);
}
+/*
+ * Run by each backend to publish its memory context
+ * statistics. It performs a breadth first search
+ * on the memory context tree, so that the parents
+ * get a chance to report stats before their children.
+ *
+ * Statistics are shared via dynamic shared memory which
+ * can hold statistics of approx 6700 contexts. The remaining
+ * contexts' statistics are captured as a cumulative total.
+ */
+void
+ProcessGetMemoryContextInterrupt(void)
+{
+ /* Store the memory context details in shared memory */
+
+ List *contexts;
+
+ HASHCTL ctl;
+ HTAB *context_id_lookup;
+ int context_id = 0;
+ bool found;
+ MemoryContext stat_cxt;
+ MemoryContextInfo *meminfo;
+ bool get_summary = false;
+ dsa_area *area;
+ int num_stats;
+ int idx = MyProcNumber;
+ int stats_count = 0;
+ MemoryContextCounters stat;
+
+ PublishMemoryContextPending = false;
+
+ /*
+ * The hash table is used for constructing the "path" column of the
+ * pg_get_process_memory_contexts output, similar to its local backend
+ * counterpart.
+ */
+
+ /*
+ * Make a new context that will contain the hash table, to ease the
+ * cleanup
+ */
+
+ stat_cxt = AllocSetContextCreate(CurrentMemoryContext,
+ "Memory context statistics",
+ ALLOCSET_DEFAULT_SIZES);
+
+ ctl.keysize = sizeof(MemoryContext);
+ ctl.entrysize = sizeof(MemoryContextId);
+ ctl.hcxt = stat_cxt;
+
+ context_id_lookup = hash_create("pg_get_remote_backend_memory_contexts",
+ 256,
+ &ctl,
+ HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+
+ contexts = list_make1(TopMemoryContext);
+
+ /* Compute the number of stats that can fit in the DSM seg */
+ num_stats = floor(16 * DSA_DEFAULT_INIT_SEGMENT_SIZE / sizeof(MemoryContextInfo));
+ /* Attach to DSA segment */
+ LWLockAcquire(&memCtxState[idx].lw_lock, LW_EXCLUSIVE);
+ area = dsa_attach(memCtxState[idx].memstats_dsa_handle);
+ memCtxState[idx].proc_id = MyProcPid;
+ get_summary = memCtxState[idx].get_summary;
+
+ /*
+ * Traverse the memory context tree to find total number of contexts. If
+ * summary is requested find the total number of contexts at level 1 and
+ * 2.
+ */
+ foreach_ptr(MemoryContextData, cur, contexts)
+ {
+ MemoryContextId *entry;
+
+ entry = (MemoryContextId *) hash_search(context_id_lookup, &cur,
+ HASH_ENTER, &found);
+ entry->context_id = stats_count;
+
+ stats_count = stats_count + 1;
+ /* Append the children of the current context to the main list */
+ for (MemoryContext c = cur->firstchild; c != NULL; c = c->nextchild)
+ {
+ if (get_summary)
+ {
+ entry = (MemoryContextId *) hash_search(context_id_lookup, &c,
+ HASH_ENTER, &found);
+ entry->context_id = stats_count;
+ stats_count = stats_count + 1;
+ }
+ contexts = lappend(contexts, c);
+ }
+ /* In summary only the first level contexts are displayed */
+ if (get_summary)
+ break;
+ }
+
+ /*
+ * Allocate memory in this process's dsa for storing statistics of the
+ * memory contexts up to num_stats. For contexts that don't fit in the DSA
+ * segment, a cumulative total is written as the last record in the DSA
+ * segment.
+ */
+ stats_count = stats_count > num_stats ? num_stats : stats_count;
+ memCtxState[idx].memstats_dsa_pointer = dsa_allocate0(area, stats_count * sizeof(MemoryContextInfo));
+ meminfo = (MemoryContextInfo *) dsa_get_address(area, memCtxState[idx].memstats_dsa_pointer);
+
+ if (get_summary)
+ {
+ int ctx_id = 0;
+ List *path = NIL;
+
+ /* Copy TopMemoryContext statistics to DSA */
+ path = lcons_int(0, path);
+ memset(&stat, 0, sizeof(stat));
+ (*TopMemoryContext->methods->stats) (TopMemoryContext, NULL, NULL, &stat, true);
+ PublishMemoryContext(meminfo, ctx_id, TopMemoryContext, path, NULL, stat, 1);
+ ctx_id = ctx_id + 1;
+
+ /*
+ * Copy statistics for each of TopMemoryContext's children (capped at
+ * 100?). This includes statistics of all of their children up to level
+ * 100.
+ */
+ for (MemoryContext c = TopMemoryContext->firstchild; c != NULL; c = c->nextchild)
+ {
+ MemoryContextCounters grand_totals;
+ int num_contexts = 0;
+ char clipped_ident[MEMORY_CONTEXT_IDENT_DISPLAY_SIZE];
+
+ path = NIL;
+ memset(&grand_totals, 0, sizeof(grand_totals));
+
+ MemoryContextStatsInternal(c, 0, 100, 100, &grand_totals, PRINT_STATS_NONE, &num_contexts);
+
+ /*
+ * Figure out the transient context_id of this context and each of
+ * its ancestors.
+ */
+ for (MemoryContext cur_context = c; cur_context != NULL; cur_context = cur_context->parent)
+ {
+ MemoryContextId *cur_entry;
+
+ cur_entry = hash_search(context_id_lookup, &cur_context, HASH_FIND, &found);
+
+ if (!found)
+ {
+ elog(LOG, "hash table corrupted, can't construct path value");
+ break;
+ }
+ path = lcons_int(cur_entry->context_id, path);
+ }
+ /* Trim and copy the identifier if it is not set to NULL */
+ if (c->ident != NULL)
+ {
+ int idlen = strlen(c->ident);
+
+ /*
+ * Some identifiers such as SQL query string can be very long,
+ * truncate oversize identifiers.
+ */
+ if (idlen >= MEMORY_CONTEXT_IDENT_DISPLAY_SIZE)
+ idlen = pg_mbcliplen(c->ident, idlen, MEMORY_CONTEXT_IDENT_DISPLAY_SIZE - 1);
+
+ memcpy(clipped_ident, c->ident, idlen);
+ clipped_ident[idlen] = '\0';
+ }
+ PublishMemoryContext(meminfo, ctx_id, c, path, (c->ident != NULL ? clipped_ident : NULL), grand_totals, num_contexts);
+ ctx_id = ctx_id + 1;
+ }
+ memCtxState[idx].in_memory_stats = stats_count;
+ memCtxState[idx].total_stats = stats_count;
+ goto cleanup;
+ }
+
+ foreach_ptr(MemoryContextData, cur, contexts)
+ {
+ List *path = NIL;
+ char clipped_ident[MEMORY_CONTEXT_IDENT_DISPLAY_SIZE];
+
+ /*
+ * Figure out the transient context_id of this context and each of its
+ * ancestors.
+ */
+ for (MemoryContext cur_context = cur; cur_context != NULL; cur_context = cur_context->parent)
+ {
+ MemoryContextId *cur_entry;
+
+ cur_entry = hash_search(context_id_lookup, &cur_context, HASH_FIND, &found);
+
+ if (!found)
+ {
+ elog(LOG, "hash table corrupted, can't construct path value");
+ break;
+ }
+ path = lcons_int(cur_entry->context_id, path);
+ }
+ /* Trim and copy the identifier if it is not set to NULL */
+ if (cur->ident != NULL)
+ {
+ int idlen = strlen(cur->ident);
+
+ /*
+ * Some identifiers such as SQL query string can be very long,
+ * truncate oversize identifiers.
+ */
+ if (idlen >= MEMORY_CONTEXT_IDENT_DISPLAY_SIZE)
+ idlen = pg_mbcliplen(cur->ident, idlen, MEMORY_CONTEXT_IDENT_DISPLAY_SIZE - 1);
+
+ memcpy(clipped_ident, cur->ident, idlen);
+ clipped_ident[idlen] = '\0';
+ }
+ if (context_id <= (num_stats - 2))
+ {
+ /* Copy statistics to DSM memory */
+ /* Examine the context stats */
+ memset(&stat, 0, sizeof(stat));
+ (*cur->methods->stats) (cur, NULL, NULL, &stat, true);
+ PublishMemoryContext(meminfo, context_id, cur, path, (cur->ident != NULL ? clipped_ident : NULL), stat, 1);
+ }
+ else
+ {
+ /* Examine the context stats */
+ memset(&stat, 0, sizeof(stat));
+ (*cur->methods->stats) (cur, NULL, NULL, &stat, true);
+
+ meminfo[num_stats - 1].totalspace += stat.totalspace;
+ meminfo[num_stats - 1].nblocks += stat.nblocks;
+ meminfo[num_stats - 1].freespace += stat.freespace;
+ meminfo[num_stats - 1].freechunks += stat.freechunks;
+ }
+
+ /*
+ * DSA max limit is reached, write total of the remaining statistics.
+ */
+ if (context_id == (num_stats - 2) && context_id < (stats_count - 1))
+ {
+ memCtxState[idx].in_memory_stats = context_id + 1;
+ strncpy(meminfo[num_stats - 1].name, "Remaining Totals", 16);
+ }
+ context_id++;
+ }
+ if (context_id < (num_stats - 2))
+ {
+ memCtxState[idx].in_memory_stats = context_id;
+ }
+ /* Report number of aggregated memory contexts */
+ else
+ {
+ meminfo[num_stats - 1].num_contexts = context_id - memCtxState[idx].in_memory_stats;
+ }
+ memCtxState[idx].total_stats = context_id;
+cleanup:
+
+ /*
+ * Signal the waiting client backend after setting the exit condition flag
+ */
+ LWLockRelease(&memCtxState[idx].lw_lock);
+ ConditionVariableBroadcast(&memCtxState[idx].memctx_cv);
+ /* Delete the hash table memory context */
+ MemoryContextDelete(stat_cxt);
+
+ dsa_detach(area);
+}
+
+static void
+PublishMemoryContext(MemoryContextInfo * memctx_info, int curr_id, MemoryContext context, List *path, char *clipped_ident, MemoryContextCounters stat, int num_contexts)
+{
+ char *type;
+
+ if (context->name != NULL)
+ {
+ Assert(strlen(context->name) < MEMORY_CONTEXT_IDENT_DISPLAY_SIZE);
+ strncpy(memctx_info[curr_id].name, context->name, strlen(context->name));
+ }
+ else
+ memctx_info[curr_id].name[0] = '\0';
+
+ if (clipped_ident != NULL)
+ {
+ /*
+ * To be consistent with logging output, we label dynahash contexts
+ * with just the hash table name as with MemoryContextStatsPrint().
+ */
+ if (!strncmp(context->name, "dynahash", 8))
+ {
+ strncpy(memctx_info[curr_id].name, clipped_ident, strlen(clipped_ident));
+ memctx_info[curr_id].ident[0] = '\0';
+ }
+ else
+ strncpy(memctx_info[curr_id].ident, clipped_ident, strlen(clipped_ident));
+ }
+ else
+ memctx_info[curr_id].ident[0] = '\0';
+
+ memctx_info[curr_id].path_length = list_length(path);
+ foreach_int(i, path)
+ memctx_info[curr_id].path[foreach_current_index(i)] = Int32GetDatum(i);
+
+ switch (context->type)
+ {
+ case T_AllocSetContext:
+ type = "AllocSet";
+ strncpy(memctx_info[curr_id].type, type, strlen(type));
+ break;
+ case T_GenerationContext:
+ type = "Generation";
+ strncpy(memctx_info[curr_id].type, type, strlen(type));
+ break;
+ case T_SlabContext:
+ type = "Slab";
+ strncpy(memctx_info[curr_id].type, type, strlen(type));
+ break;
+ case T_BumpContext:
+ type = "Bump";
+ strncpy(memctx_info[curr_id].type, type, strlen(type));
+ break;
+ default:
+ type = "???";
+ strncpy(memctx_info[curr_id].type, type, strlen(type));
+ break;
+ }
+ memctx_info[curr_id].totalspace = stat.totalspace;
+ memctx_info[curr_id].nblocks = stat.nblocks;
+ memctx_info[curr_id].freespace = stat.freespace;
+ memctx_info[curr_id].freechunks = stat.freechunks;
+ memctx_info[curr_id].num_contexts = num_contexts;
+}
+
void *
palloc(Size size)
{
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 9575524007..dd01cba4a6 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8436,6 +8436,16 @@
prorettype => 'bool', proargtypes => 'int4',
prosrc => 'pg_log_backend_memory_contexts' },
+# publishing memory contexts of the specified backend
+{ oid => '2173', descr => 'publish memory contexts of the specified backend',
+ proname => 'pg_get_process_memory_contexts', provolatile => 'v',
+ prorows => '100', proretset => 't', proparallel => 'r',
+ prorettype => 'record', proargtypes => 'int4 bool',
+ proallargtypes => '{int4,bool,text,text,text,_int4,int8,int8,int8,int8,int8,int4,int4}',
+ proargmodes => '{i,i,o,o,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{oid, summary, name, ident, type, path, total_bytes, total_nblocks, free_bytes, free_chunks, used_bytes, pid, num_agg_contexts}',
+ prosrc => 'pg_get_process_memory_contexts' },
+
# non-persistent series generator
{ oid => '1066', descr => 'non-persistent series generator',
proname => 'generate_series', prorows => '1000',
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 42a2b38cac..7184727cf1 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -96,6 +96,7 @@ extern PGDLLIMPORT volatile sig_atomic_t IdleSessionTimeoutPending;
extern PGDLLIMPORT volatile sig_atomic_t ProcSignalBarrierPending;
extern PGDLLIMPORT volatile sig_atomic_t LogMemoryContextPending;
extern PGDLLIMPORT volatile sig_atomic_t IdleStatsUpdateTimeoutPending;
+extern PGDLLIMPORT volatile sig_atomic_t PublishMemoryContextPending;
extern PGDLLIMPORT volatile sig_atomic_t CheckClientConnectionPending;
extern PGDLLIMPORT volatile sig_atomic_t ClientConnectionLost;
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index 221073def3..8cbf6e201c 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -35,6 +35,7 @@ typedef enum
PROCSIG_WALSND_INIT_STOPPING, /* ask walsenders to prepare for shutdown */
PROCSIG_BARRIER, /* global barrier interrupt */
PROCSIG_LOG_MEMORY_CONTEXT, /* ask backend to log the memory contexts */
+ PROCSIG_GET_MEMORY_CONTEXT, /* ask backend to publish the memory contexts */
PROCSIG_PARALLEL_APPLY_MESSAGE, /* Message from parallel apply workers */
/* Recovery conflict reasons */
diff --git a/src/include/utils/memutils.h b/src/include/utils/memutils.h
index bf93433b78..a58a6b824f 100644
--- a/src/include/utils/memutils.h
+++ b/src/include/utils/memutils.h
@@ -18,6 +18,9 @@
#define MEMUTILS_H
#include "nodes/memnodes.h"
+#include "storage/condition_variable.h"
+#include "storage/lmgr.h"
+#include "utils/dsa.h"
/*
@@ -48,7 +51,11 @@
#define AllocHugeSizeIsValid(size) ((Size) (size) <= MaxAllocHugeSize)
+#define MEMORY_CONTEXT_IDENT_DISPLAY_SIZE 1024
+#define MEM_CONTEXT_SHMEM_STATS_SIZE 30
+#define MEM_CONTEXT_MAX_LEVEL 64
+#define MAX_TYPE_STRING_LENGTH 64
/*
* Standard top-level memory contexts.
*
@@ -115,6 +122,50 @@ extern MemoryContext AllocSetContextCreateInternal(MemoryContext parent,
Size initBlockSize,
Size maxBlockSize);
+/* Dynamic shared memory state for Memory Context Statistics reporting */
+typedef struct MemoryContextInfo
+{
+ char name[MEMORY_CONTEXT_IDENT_DISPLAY_SIZE];
+ char ident[MEMORY_CONTEXT_IDENT_DISPLAY_SIZE];
+ Datum path[MEM_CONTEXT_MAX_LEVEL];
+ char type[MAX_TYPE_STRING_LENGTH];
+ int path_length;
+ int64 totalspace;
+ int64 nblocks;
+ int64 freespace;
+ int64 freechunks;
+ int num_contexts;
+} MemoryContextInfo;
+
+typedef struct MemoryContextState
+{
+ ConditionVariable memctx_cv;
+ LWLock lw_lock;
+ int proc_id;
+ int in_memory_stats;
+ int total_stats;
+ bool get_summary;
+ dsa_handle memstats_dsa_handle;
+ dsa_pointer memstats_dsa_pointer;
+
+} MemoryContextState;
+
+/*
+ * MemoryContextId
+ * Used for storage of transient identifiers for
+ * pg_get_backend_memory_contexts.
+ */
+typedef struct MemoryContextId
+{
+ MemoryContext context;
+ int context_id;
+} MemoryContextId;
+
+extern PGDLLIMPORT MemoryContextState * memCtxState;
+extern void ProcessGetMemoryContextInterrupt(void);
+extern void HandleGetMemoryContextInterrupt(void);
+extern void MemCtxShmemInit(void);
+
/*
* This wrapper macro exists to check for non-constant strings used as context
* names; that's no longer supported. (Use MemoryContextSetIdentifier if you
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 91089ac215..f864c75dbe 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -223,3 +223,15 @@ select count(distinct utc_offset) >= 24 as ok from pg_timezone_abbrevs;
t
(1 row)
+DO $$
+DECLARE
+ checkpointer_pid int;
+ r RECORD;
+BEGIN
+ SELECT pid from pg_stat_activity where backend_type='checkpointer' INTO checkpointer_pid;
+
+ select type, name, ident
+ from pg_get_process_memory_contexts(checkpointer_pid, false) where path = '{0}' into r;
+ RAISE NOTICE '%', r;
+END $$;
+NOTICE: (AllocSet,TopMemoryContext,)
diff --git a/src/test/regress/sql/sysviews.sql b/src/test/regress/sql/sysviews.sql
index b2a7923754..a56cc44eea 100644
--- a/src/test/regress/sql/sysviews.sql
+++ b/src/test/regress/sql/sysviews.sql
@@ -98,3 +98,15 @@ set timezone_abbreviations = 'Australia';
select count(distinct utc_offset) >= 24 as ok from pg_timezone_abbrevs;
set timezone_abbreviations = 'India';
select count(distinct utc_offset) >= 24 as ok from pg_timezone_abbrevs;
+
+DO $$
+DECLARE
+ checkpointer_pid int;
+ r RECORD;
+BEGIN
+ SELECT pid from pg_stat_activity where backend_type='checkpointer' INTO checkpointer_pid;
+
+ select type, name, ident
+ from pg_get_process_memory_contexts(checkpointer_pid, false) where path = '{0}' into r;
+ RAISE NOTICE '%', r;
+END $$;
--
2.34.1
On 12/3/24 20:09, Rahila Syed wrote:
Hi,
4) I wonder if the function needs to return PID. I mean, the caller
knows which PID it is for, so it seems rather unnecessary.
Perhaps it can be used to ascertain that the information indeed belongs
to the requested pid.
I find that a bit ... suspicious. By this logic we'd include the input
parameters in every result, but we don't. So why is this case different?
This was added to address a review suggestion. I had left it in case
anyone found it useful for verification.
Previously, I included a check for scenarios where multiple processes
could write to the same shared memory. Now, each process has a separate
shared memory space identified by pgprocno, making it highly unlikely
for the receiving process to see another process's memory dump.
Such a situation could theoretically occur if another process were
mapped to the same pgprocno, although I’m not sure how likely that is.
That said, I’ve added a check in the receiver to ensure the PID written
in the shared memory matches the PID for which the dump is requested.
This guarantees that a user will never see the memory dump of another
process.
Given this, I’m fine with removing the pid column if it helps to make
the output more readable.
I'd just remove that. I agree it might have been useful with the single
chunk of shared memory, but I think with separate chunks it's not very
useful. And if we can end up with multiple processes getting the same
pgprocno I guess we have way bigger problems, this won't fix that.
5) In the "summary" mode, it might be useful to include info about how
many child contexts were aggregated. It's useful to know whether there
was 1 child or 10000 children. In the regular (non-summary) mode it'd
always be "1", probably, but maybe it'd interact with the limit in (1).
Not sure.
Sure, I will add this in the next iteration.
OK
I have added this information as a column named "num_agg_contexts",
which indicates the number of contexts whose statistics have been
aggregated/added for a particular output.
In summary mode, all the child contexts of a given level-1 context are
aggregated, and their statistics are presented as part of the parent
context's statistics. In this case, num_agg_contexts provides the count
of all child contexts under a given level-1 context.
In regular (non-summary) mode, this column shows a value of 1, meaning
the statistics correspond to a single context, with all context
statistics displayed individually. In this mode an aggregate result is
displayed if the number of contexts exceeds the DSA size limit. In this
case num_agg_contexts will display the number of the remaining
contexts.
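To illustrate the summary-mode count, here is a minimal standalone sketch. It is not code from the patch: the flattened parent-index array and count_descendants() are hypothetical, and the sketch assumes the count covers descendants only (whether the level-1 context itself is included is not stated above).

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical flattened context tree: parent[i] is the index of the parent
 * of context i, or -1 for the root (level 0). The tree is assumed acyclic.
 *
 * Returns how many contexts have `target` somewhere in their ancestor chain,
 * i.e. the number of contexts that would be folded into `target` when its
 * children are aggregated.
 */
static size_t
count_descendants(const int *parent, size_t n, size_t target)
{
	size_t		count = 0;

	assert(target < n);

	for (size_t i = 0; i < n; i++)
	{
		int			p = parent[i];

		/* Walk up until we hit the target or run past the root. */
		while (p >= 0 && (size_t) p != target)
			p = parent[p];

		if (p >= 0)
			count++;
	}

	return count;
}
```

For a tree where context 0 is the root, contexts 1 and 2 are at level 1, contexts 3 and 4 are children of 1, context 5 is a child of 3, and context 6 is a child of 2, the level-1 context 1 would aggregate three contexts and context 2 would aggregate one.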
OK
In the patch, since the timeout was set to a high value, pgbench ended
up stuck waiting for the timeout to occur. The failure happens less
frequently after I added an additional check for the process's
existence, but it cannot be entirely avoided. This is because a process
can terminate after we check for its existence but before it signals
the client. In such cases, the client will not receive any signal.
Hmmm, I see. I guess there's no way to know if a process responds to us,
but I guess it should be possible to wake up regularly and check if the
process still exists? Wouldn't that solve the case you mentioned?
I have fixed it accordingly in the attached patch by waking up after
every 5 seconds to check if the process exists and sleeping again if the
wake-up condition is not satisfied. The number of such tries is limited
to 20. So, the total wait time can be 100 seconds. I will make the
re-tries configurable, in line with your suggestion to be able to
override the default waiting time.
Makes sense, although 100 seconds seems a bit weird, it seems we usually
pick "natural" values like 60s, or multiples of that. But if it's
configurable, that's not a huge issue.
Could the process wake up earlier than the timeout, say if it gets an
EINTR from a signal? That'd break the "total timeout is 100 seconds", and it would be
better to check that explicitly. Not sure if this can happen, though.
One thing I'd maybe consider is starting with a short timeout, and
gradually increasing it until e.g. 5 seconds (or maybe just 1 second
would be perfectly fine, IMHO). With the current coding it means we
either get the response right away, or wait 5+ seconds. That's a
huge jump. If we start e.g. with 10ms, and then gradually multiply it by
1.2, it means we only wait "0-20% extra" on average.
But perhaps this is very unlikely and not worth the complexity.
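A minimal sketch of the backoff schedule described above, for illustration only. The 10ms start, the 1.2 multiplier and the 5 second cap come from the message; next_timeout_ms() and total_wait_ms() are hypothetical helpers and are not part of the patch.

```c
#include <assert.h>
#include <stddef.h>

#define BACKOFF_INITIAL_MS	10.0
#define BACKOFF_FACTOR		1.2
#define BACKOFF_CAP_MS		5000.0

/* Grow the sleep interval geometrically, but never beyond the cap. */
static double
next_timeout_ms(double current_ms)
{
	double		next;

	assert(current_ms > 0.0);

	next = current_ms * BACKOFF_FACTOR;
	return (next > BACKOFF_CAP_MS) ? BACKOFF_CAP_MS : next;
}

/*
 * Model a waiter that only notices the outcome when a sleep ends (for
 * example because the target process died and will never signal).
 * Returns the total time slept until the first wake-up at or after
 * event_at_ms; optionally reports the number of sleeps.
 */
static double
total_wait_ms(double event_at_ms, int *num_sleeps)
{
	double		timeout = BACKOFF_INITIAL_MS;
	double		waited = 0.0;
	int			sleeps = 0;

	while (waited < event_at_ms)
	{
		waited += timeout;
		timeout = next_timeout_ms(timeout);
		sleeps++;
	}

	if (num_sleeps != NULL)
		*num_sleeps = sleeps;
	return waited;
}
```

With these constants, an event at 1 second is noticed after 17 sleeps and roughly 6% of extra waiting, compared with up to 5 seconds of extra waiting under a fixed 5 second interval.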
I wonder if e.g. autovacuum launcher may
not be handling these requests, or what if client backends can wait in a
cycle.
Did not see a cyclic wait in client backends due to the pgbench stress test.
Not sure, but if I modify the query to only request memory contexts from
non-client processes, i.e.
SELECT * FROM pg_get_process_memory_contexts(
(SELECT pid FROM pg_stat_activity
WHERE pid != pg_backend_pid()
AND backend_type != 'client backend'
ORDER BY random() LIMIT 1)
, false);
then it gets stuck and reports this:
pgbench -n -f select.sql -c 4 -T 10 test
pgbench (18devel)
WARNING: Wait for 105029 process to publish stats timed out, ...
But process 105029 still very much exists, and it's the checkpointer:
In the case of checkpointer, I also see some wait time after running the
tests that you mentioned, but it eventually completes the request in my
runs.
OK, but why should it even wait that long? Surely the checkpointer
should be able to report memory contexts too?
$ ps ax | grep 105029
105029 ? Ss 0:00 postgres: checkpointer
OTOH if I modify the script to only look at client backends, and wait
until the processes get "stuck" (i.e. waiting on the condition variable,
consuming 0% CPU), I get this:
$ pgbench -n -f select.sql -c 4 -T 10 test
pgbench (18devel)
WARNING: Wait for 107146 process to publish stats timed out, try again
WARNING: Wait for 107144 process to publish stats timed out, try again
WARNING: Wait for 107147 process to publish stats timed out, try again
transaction type: select.sql
...
but when it gets 'stuck', most of the processes are still very much
running (but waiting for contexts from some other process). In the above
example I see this:
107144 ? Ss 0:02 postgres: user test [local] SELECT
107145 ? Ss 0:01 postgres: user test [local] SELECT
107147 ? Ss 0:02 postgres: user test [local] SELECT
So yes, 107146 seems to be gone. But why would that block getting info
from 107144 and 107147?
Most likely 107144 and/or 107147 must also be waiting for 107146 which is
gone. Something like 107144 -> 107147 -> 107146(dead) or 107144 -> 107146(dead)
and 107147 -> 107146(dead).
I think I forgot to mention only 107145 was waiting for 107146 (dead),
and the other processes were waiting for 107145 in some way. But yeah,
detecting the dead process would improve this, although it also shows
the issues can "spread" easily.
OTOH it's unlikely to have multiple pg_get_process_memory_contexts()
queries pointing at each other like this - monitoring will just do that
from one backend, and that's it. So not a huge issue.
Maybe that's acceptable, but couldn't this be an issue with short-lived
connections, making it hard to implement the kind of automated
collection of stats that you envision. If it hits this kind of timeouts
often, it'll make it hard to reliably collect info. No?
Yes, if there is a chain of waiting clients due to a process no longer
existing, the waiting time to receive information will increase.
However, as long as a failed request caused by a non-existent process
is detected promptly, the wait time should remain manageable, allowing
other waiting clients to obtain the requested information
from the existing processes.
In such cases, it might be necessary to experiment with the waiting
times at the receiving
client. Making the waiting time user-configurable, as you suggested, by
passing it as an
argument to the function, could help address this scenario.
Thanks for highlighting this, I will test this some more.
I think we should try very hard to make this work well without the user
having to mess with the timeouts. These are exceptional conditions that
happen only very rarely, which makes it hard to find good values.
> I opted for DSAs over DSMs to enable memory reuse by freeing
> segments for subsequent statistics copies of the same backend,
> without needing to recreate DSMs for each request.
I feel like this might be a premature optimization - I don't have a
clear idea how expensive it is to create DSM per request, but my
intuition is that it's cheaper than processing the contexts and
generating the info.
I'd just remove that, unless someone demonstrates it really matters. I
don't really worry about how expensive it is to process a request
(within reason, of course) - it will happen only very rarely.
It's more important to make sure there's no overhead when no one asks
the backend for memory context info, and simplicity.
Also, how expensive it is to just keep the DSA "just in case"? Imagine
someone asks for the memory context info once - isn't it a waste to
still keep the DSA? I don't recall how much resources could that be.
I don't have a clear opinion on that, I'm more asking for opinions.
Imagining a tool that periodically queries the backends for statistics,
it would be beneficial to avoid recreating the DSAs for each call.
I think it would be nice if you backed this with some numbers. I mean,
how expensive is it to create/destroy the DSA? How does it compare to
the other stuff this function needs to do?
After instrumenting the code with timestamps, I observed that DSA creation
accounts for approximately 17% to 26% of the total execution time of the
function pg_get_process_memory_contexts().
Currently, DSAs of size 1MB per process
(i.e., a maximum of 1MB * (MaxBackends + auxiliary processes))
would be created and pinned for subsequent reporting. This size does
not seem excessively high, even for approx 100 backends and
auxiliary processes.
That seems like a pretty substantial amount of memory reserved for each
connection. IMHO the benefits would have to be pretty significant to
justify this, especially considering it's kept "forever", even if you
run the function only once per day.
I can reduce the initial segment size to DSA_MIN_SEGMENT_SIZE, which is
256KB per process. If needed, this could grow up to 16MB based on the
current settings.
However, for the scenario you mentioned, it would be ideal to have a
mechanism to mark a pinned DSA (using dsa_pin()) for deletion if it is
not used/attached within a specified duration. Alternatively, I could
avoid using dsa_pin() altogether, allowing the DSA to be automatically
destroyed once all processes detach from it, and recreate it
for a new request.
At the moment, I am unsure which approach is most feasible. Any
suggestions would be
greatly appreciated.
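For a rough sense of scale, the upper bound quoted above can be worked out with a small sketch. This only restates the arithmetic from the message (per-process initial size multiplied by the number of backends plus auxiliary processes); pinned_dsa_bytes() is a hypothetical helper, not something in the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Per-process initial DSA sizes mentioned in the discussion. */
#define INIT_SEGMENT_1MB	((size_t) 1024 * 1024)
#define INIT_SEGMENT_256KB	((size_t) 256 * 1024)

/*
 * Worst-case memory kept pinned if every backend and auxiliary process
 * has been asked for its statistics at least once.
 */
static size_t
pinned_dsa_bytes(size_t per_process_bytes, int max_backends, int num_aux_procs)
{
	assert(max_backends >= 0 && num_aux_procs >= 0);

	return per_process_bytes *
		((size_t) max_backends + (size_t) num_aux_procs);
}
```

For about 100 processes in total this is 100MB at 1MB each, or 25MB at 256KB each, before any DSA grows beyond its initial segment.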
I'm entirely unconcerned about the pg_get_process_memory_contexts()
performance, within some reasonable limits. It's something executed
every now and then - no one is going to complain it takes 10ms extra,
measure tps with this function, etc.
17-26% seems surprisingly high, but even 256kB is too much, IMHO. I'd
just get rid of this optimization until someone complains and explains
why it's worth it.
Yes, let's make it fast, but I don't think we should optimize it at the
expense of "regular workload" ...
regards
--
Tomas Vondra
Hi Rahila,
Thanks for working on this. I've wanted something like this a number
of times to replace my current method of attaching gdb like everyone
else I suppose.
I have a question / suggestion about the interface.
+Datum
+pg_get_process_memory_contexts(PG_FUNCTION_ARGS)
+{
+ int pid = PG_GETARG_INT32(0);
+ bool get_summary = PG_GETARG_BOOL(1);
IIUC, this always returns all memory contexts starting from
TopMemoryContext, summarizing some child contexts if memory doesn't
suffice. Would it be helpful to allow users to specify a context other
than TopMemoryContext as the root? This could be particularly useful
in cases where the information a user is looking for would otherwise
be grouped under "Remaining Totals." Alternatively, is there a way to
achieve this with the current function, perhaps by specifying a
condition in the WHERE clause?
Hi,
Thanks for updating the patch and here are some comments:
'path' column of pg_get_process_memory_contexts() begins with 0, but
that column of pg_backend_memory_contexts view begins with 1:
=# select path FROM pg_get_process_memory_contexts('20271', false);
path
-------
{0}
{0,1}
{0,2}
..
=# select path from pg_backend_memory_contexts;
path
-------
{1}
{1,2}
{1,3}
..
Would it be better to begin with 1 to make them consistent?
pg_log_backend_memory_contexts() does not allow non-superusers to
execute by default since it can peek at other session information.
pg_get_process_memory_contexts() does not have this restriction, but
wouldn't it be necessary?
When the target pid is the local backend, the HINT suggests using
pg_get_backend_memory_contexts(), but this function is not described in
the manual.
How about suggesting pg_backend_memory_contexts view instead?
=# select pg_get_process_memory_contexts('27041', false);
WARNING: cannot return statistics for local backend
HINT: Use pg_get_backend_memory_contexts instead
There are no explanations about 'num_agg_contexts', but I thought the
explanation like below would be useful.
I have added this information as a column named "num_agg_contexts",
which indicates
the number of contexts whose statistics have been aggregated/added for
a particular output.
git apply caused some warnings:
$ git apply
v7-Function-to-report-memory-context-stats-of-any-backe.patch
v7-Function-to-report-memory-context-stats-of-any-backe.patch:71: space
before tab in indent.
Requests to return the memory contexts of the backend with the
v7-Function-to-report-memory-context-stats-of-any-backe.patch:72: space
before tab in indent.
specified process ID. This function can send the request to
v7-Function-to-report-memory-context-stats-of-any-backe.patch:73:
space before tab in indent.
both the backends and auxiliary processes. After receiving the
memory
v7-Function-to-report-memory-context-stats-of-any-backe.patch:74: space
before tab in indent.
contexts from the process, it returns the result as one row per
v7-Function-to-report-memory-context-stats-of-any-backe.patch:75: space
before tab in indent.
context. When get_summary is true, memory contexts at level 0
--
Regards,
--
Atsushi Torikoshi
Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K.
Hi Tomas,
I'd just remove that. I agree it might have been useful with the single
chunk of shared memory, but I think with separate chunks it's not very
useful. And if we can end up with multiple processes getting the same
pgprocno I guess we have way bigger problems, this won't fix that.
OK, fixed accordingly in the attached patch.
In the patch, since the timeout was set to a high value, pgbench ended
up stuck waiting for the timeout to occur. The failure happens less
frequently after I added an additional check for the process's
existence, but it cannot be entirely avoided. This is because a process
can terminate after we check for its existence but before it signals
the client. In such cases, the client will not receive any signal.
Hmmm, I see. I guess there's no way to know if a process responds to
us,
but I guess it should be possible to wake up regularly and check if
the
process still exists? Wouldn't that solve the case you mentioned?
I have fixed it accordingly in the attached patch by waking up after
every 5 seconds to check if the process exists and sleeping again if the
wake-up condition is not satisfied. The number of such tries is limited
to 20. So, the total wait time can be 100 seconds. I will make the
re-tries configurable, in line with your suggestion to be able to
override the default waiting time.
Makes sense, although 100 seconds seems a bit weird, it seems we usually
pick "natural" values like 60s, or multiples of that. But if it's
configurable, that's not a huge issue.
Could the process wake up earlier than the timeout, say if it gets an
EINTR from a signal? That'd break the "total timeout is 100 seconds", and
it would be better to check that explicitly. Not sure if this can happen,
though.
Not sure, I will check again. According to the comment on WaitLatch, a
process waiting on it should only wake up when timeout happens or
SetLatch is called.
One thing I'd maybe consider is starting with a short timeout, and
gradually increasing it until e.g. 5 seconds (or maybe just 1 second
would be perfectly fine, IMHO). With the current coding it means we
either get the response right away, or wait 5+ seconds. That's a
huge jump. If we start e.g. with 10ms, and then gradually multiply it by
1.2, it means we only wait "0-20% extra" on average.
But perhaps this is very unlikely and not worth the complexity.
OK, currently I have changed it to always wait for signal from backend or
timeout
before checking the exit condition. This is to ensure that a backend gets
a chance to publish the new statistics since I am retaining the old
statistics
due to reasons explained below. I will experiment with setting a shorter
timeout
and gradually increasing it.
In the case of checkpointer, I also see some wait time after running the
tests that you mentioned, but it eventually completes the request in my
runs.
OK, but why should it even wait that long? Surely the checkpointer
should be able to report memory contexts too?
The checkpointer responds to requests promptly when the requests are
sequential.
However, a timeout may occur if concurrent requests for memory statistics
are
sent to the checkpointer.
In this case, one client sends a GetMemoryContext signal to the
checkpointer.
The checkpointer sets the PublishMemoryContextPending flag to true in the
handler for this signal. This flag remains true until a
CHECK_FOR_INTERRUPTS
is called, which processes the interrupt and clears the flag.
If another process concurrently sends a GetMemoryContext signal to the
checkpointer
before the CHECK_FOR_INTERRUPTS is called for the previous signal, the
PublishMemoryContextPending flag will already be set to true. When the
CHECK_FOR_INTERRUPTS is eventually called by the checkpointer, it processes
both
requests and dumps its memory context statistics.
However, only one of the two waiting clients gets to read the statistics.
This is because
the first client that gains access to the shared statistics reads the data
and frees the
DSA memory after it is done. As a result, the second client keeps waiting
until it times out,
since the checkpointer has already processed its request and sent the
statistics
which the second client never gets to read.
I believe that retaining the DSAs with the latest statistics after each
request would
help resolve the issue of request timeouts in scenarios with concurrent
requests.
I have included this in the attached patch.
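The sequence described above can be modelled in a single process, as a rough sketch. SlotModel and the functions below are hypothetical stand-ins (a boolean for PublishMemoryContextPending, a boolean for "a valid DSA pointer exists"); they are not the patch's data structures and no real signals or shared memory are involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one per-process reporting slot. */
typedef struct SlotModel
{
	bool		publish_pending;	/* stands in for PublishMemoryContextPending */
	bool		stats_available;	/* stands in for a valid DSA pointer */
	int			publish_count;		/* how many times stats were published */
} SlotModel;

/* A client signals the target; the signal handler only sets a flag. */
static void
request_stats(SlotModel *slot)
{
	assert(slot != NULL);
	slot->publish_pending = true;	/* a second request is absorbed here */
}

/* The target reaches CHECK_FOR_INTERRUPTS(): publish once, clear the flag. */
static void
process_interrupts(SlotModel *slot)
{
	assert(slot != NULL);
	if (slot->publish_pending)
	{
		slot->publish_pending = false;
		slot->stats_available = true;
		slot->publish_count++;
	}
}

/* A client tries to read; optionally the stats are freed after reading. */
static bool
read_stats(SlotModel *slot, bool free_after_read)
{
	assert(slot != NULL);
	if (!slot->stats_available)
		return false;			/* this client would wait until it times out */
	if (free_after_read)
		slot->stats_available = false;
	return true;
}

/*
 * Two concurrent requests, one interrupt-processing pass, two readers.
 * Returns how many of the two clients obtained statistics.
 */
static int
run_two_clients(bool free_after_read)
{
	SlotModel	slot = {false, false, 0};
	int			served = 0;

	request_stats(&slot);
	request_stats(&slot);
	process_interrupts(&slot);

	served += read_stats(&slot, free_after_read) ? 1 : 0;
	served += read_stats(&slot, free_after_read) ? 1 : 0;
	return served;
}
```

In this model, freeing after the first read serves only one of the two clients, while retaining the last published statistics serves both, which matches the reasoning for keeping the DSA contents between requests.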
and the other processes were waiting for 107145 in some way. But yeah,
detecting the dead process would improve this, although it also shows
the issues can "spread" easily.
OTOH it's unlikely to have multiple pg_get_process_memory_contexts()
queries pointing at each other like this - monitoring will just do that
from one backend, and that's it. So not a huge issue.
Makes sense.
In such cases, it might be necessary to experiment with the waiting
times at the receiving
client. Making the waiting time user-configurable, as you suggested, by
passing it as an
argument to the function, could help address this scenario.
Thanks for highlighting this, I will test this some more.
I think we should try very hard to make this work well without the user
having to mess with the timeouts. These are exceptional conditions that
happen only very rarely, which makes it hard to find good values.
OK.
I'm entirely unconcerned about the pg_get_process_memory_contexts()
performance, within some reasonable limits. It's something executed
every now and then - no one is going to complain it takes 10ms extra,
measure tps with this function, etc.
17-26% seems surprisingly high, but even 256kB is too much, IMHO. I'd
just get rid of this optimization until someone complains and explains
why it's worth it.
Yes, let's make it fast, but I don't think we should optimize it at the
expense of "regular workload" ...
After debugging the concurrent requests timeout issue, it appears there is
yet another
argument in favor of avoiding the recreation of DSAs for every request: we
get to retain
the last reported statistics for a given postgres process, which can help
prevent certain
requests from failing in case of concurrent requests to the same process.
Thank you,
Rahila Syed
Attachments:
v8-0001-Function-to-report-memory-context-stats-of-any-backe.patch
From d0388a3a2eb8f882f7ae3b33c8dc0ec300520857 Mon Sep 17 00:00:00 2001
From: Rahila Syed <rahilasyed.90@gmail.com>
Date: Sun, 15 Sep 2024 17:56:06 +0530
Subject: [PATCH] Function to report memory context stats of any backend
This function sends a signal to a backend to publish
statistics of all its memory contexts. Signal handler
sets a flag, which causes the relevant backend to copy its
MemoryContextStats to a DSA, as part
of next CHECK_FOR_INTERRUPTS().
If there are more than 16MB worth of statistics, the
remaining statistics are copied as a cumulative
total of the remaining contexts.
Once it's done, it signals the client backend using
a condition variable. The client backend
then wakes up, reads the shared memory and
returns these values in the form of set of records,
one for each memory context, to the user, followed
by a cumulative total of the remaining contexts,
if any.
Each backend and auxiliary process has its own slot
for reporting the stats. There is an array of such
memory slots of size MaxBackends+NumofAuxiliary
processes in fixed shared memory. Each of these slots point
to a DSA, which contains the stats to be shared by the
corresponding process.
Each slot has its own LW lock and condition variable for
synchronization and communication between the
publishing process and the client backend.
---
doc/src/sgml/func.sgml | 19 +
src/backend/postmaster/autovacuum.c | 4 +
src/backend/postmaster/checkpointer.c | 4 +
src/backend/postmaster/interrupt.c | 4 +
src/backend/postmaster/pgarch.c | 4 +
src/backend/postmaster/startup.c | 4 +
src/backend/postmaster/walsummarizer.c | 4 +
src/backend/storage/ipc/ipci.c | 2 +
src/backend/storage/ipc/procsignal.c | 3 +
src/backend/tcop/postgres.c | 3 +
.../utils/activity/wait_event_names.txt | 1 +
src/backend/utils/adt/mcxtfuncs.c | 263 ++++++++++-
src/backend/utils/init/globals.c | 1 +
src/backend/utils/mmgr/mcxt.c | 422 +++++++++++++++++-
src/include/catalog/pg_proc.dat | 10 +
src/include/miscadmin.h | 1 +
src/include/storage/procsignal.h | 1 +
src/include/utils/memutils.h | 51 +++
src/test/regress/expected/sysviews.out | 12 +
src/test/regress/sql/sysviews.sql | 12 +
20 files changed, 802 insertions(+), 23 deletions(-)
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 47370e581a..d6a3799d7c 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -28358,6 +28358,25 @@ acl | {postgres=arwdDxtm/postgres,foo=r/postgres}
</para></entry>
</row>
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>pg_get_process_memory_contexts</primary>
+ </indexterm>
+ <function>pg_get_process_memory_contexts</function> ( <parameter>pid</parameter> <type>integer</type>, <parameter>get_summary</parameter> <type>boolean</type> )
+ <returnvalue>setof record</returnvalue>
+ </para>
+ <para>
+ Requests to display the memory contexts of a postgres process
+ with the specified process ID. This function can send the request to
+ both the backends and auxiliary processes. After receiving the memory
+ contexts from the process, it returns the result as one row per
+ context. When get_summary is true, memory contexts at level 0
+ and level 1 are reported, along with cumulative results for the
+ remaining contexts.
+ </para></entry>
+ </row>
+
<row>
<entry role="func_table_entry"><para role="func_signature">
<indexterm>
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 8078eeef62..b2670ca4fe 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -768,6 +768,10 @@ HandleAutoVacLauncherInterrupts(void)
/* Process sinval catchup interrupts that happened while sleeping */
ProcessCatchupInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index 982572a75d..9caf8fa018 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -616,6 +616,10 @@ HandleCheckpointerInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/postmaster/interrupt.c b/src/backend/postmaster/interrupt.c
index eedc0980cf..1107ff6d45 100644
--- a/src/backend/postmaster/interrupt.c
+++ b/src/backend/postmaster/interrupt.c
@@ -48,6 +48,10 @@ HandleMainLoopInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 02f91431f5..467a253ccd 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -865,6 +865,10 @@ HandlePgArchInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
if (ConfigReloadPending)
{
char *archiveLib = pstrdup(XLogArchiveLibrary);
diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c
index ef6f98ebcd..17beb8737d 100644
--- a/src/backend/postmaster/startup.c
+++ b/src/backend/postmaster/startup.c
@@ -192,6 +192,10 @@ HandleStartupProcInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
diff --git a/src/backend/postmaster/walsummarizer.c b/src/backend/postmaster/walsummarizer.c
index 48350bec52..b3e6c2b5f0 100644
--- a/src/backend/postmaster/walsummarizer.c
+++ b/src/backend/postmaster/walsummarizer.c
@@ -876,6 +876,10 @@ HandleWalSummarizerInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 7783ba854f..8816ef6903 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -50,6 +50,7 @@
#include "storage/sinvaladt.h"
#include "utils/guc.h"
#include "utils/injection_point.h"
+#include "utils/memutils.h"
/* GUCs */
int shared_memory_type = DEFAULT_SHARED_MEMORY_TYPE;
@@ -340,6 +341,7 @@ CreateOrAttachShmemStructs(void)
StatsShmemInit();
WaitEventCustomShmemInit();
InjectionPointShmemInit();
+ MemCtxShmemInit();
}
/*
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index 87027f27eb..621726cf03 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -688,6 +688,9 @@ procsignal_sigusr1_handler(SIGNAL_ARGS)
if (CheckProcSignal(PROCSIG_LOG_MEMORY_CONTEXT))
HandleLogMemoryContextInterrupt();
+ if (CheckProcSignal(PROCSIG_GET_MEMORY_CONTEXT))
+ HandleGetMemoryContextInterrupt();
+
if (CheckProcSignal(PROCSIG_PARALLEL_APPLY_MESSAGE))
HandleParallelApplyMessageInterrupt();
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 8590278818..cc433de2b1 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3497,6 +3497,9 @@ ProcessInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
if (ParallelApplyMessagePending)
HandleParallelApplyMessages();
}
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 16144c2b72..7a27b5f680 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -158,6 +158,7 @@ WAL_RECEIVER_EXIT "Waiting for the WAL receiver to exit."
WAL_RECEIVER_WAIT_START "Waiting for startup process to send initial data for streaming replication."
WAL_SUMMARY_READY "Waiting for a new WAL summary to be generated."
XACT_GROUP_UPDATE "Waiting for the group leader to update transaction status at transaction end."
+MEM_CTX_PUBLISH "Waiting for backend to publish memory information."
ABI_compatibility:
diff --git a/src/backend/utils/adt/mcxtfuncs.c b/src/backend/utils/adt/mcxtfuncs.c
index 6a6634e1cd..baafbf1b7b 100644
--- a/src/backend/utils/adt/mcxtfuncs.c
+++ b/src/backend/utils/adt/mcxtfuncs.c
@@ -17,28 +17,23 @@
#include "funcapi.h"
#include "mb/pg_wchar.h"
+#include "miscadmin.h"
+#include "access/twophase.h"
+#include "nodes/pg_list.h"
#include "storage/proc.h"
#include "storage/procarray.h"
#include "utils/array.h"
#include "utils/builtins.h"
#include "utils/hsearch.h"
+#include "utils/memutils.h"
+#include "utils/wait_event_types.h"
/* ----------
* The max bytes for showing identifiers of MemoryContext.
* ----------
*/
-#define MEMORY_CONTEXT_IDENT_DISPLAY_SIZE 1024
-/*
- * MemoryContextId
- * Used for storage of transient identifiers for
- * pg_get_backend_memory_contexts.
- */
-typedef struct MemoryContextId
-{
- MemoryContext context;
- int context_id;
-} MemoryContextId;
+struct MemoryContextState *memCtxState = NULL;
/*
* int_list_to_array
@@ -71,7 +66,7 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
TupleDesc tupdesc, MemoryContext context,
HTAB *context_id_lookup)
{
-#define PG_GET_BACKEND_MEMORY_CONTEXTS_COLS 10
+#define PG_GET_BACKEND_MEMORY_CONTEXTS_COLS 11
Datum values[PG_GET_BACKEND_MEMORY_CONTEXTS_COLS];
bool nulls[PG_GET_BACKEND_MEMORY_CONTEXTS_COLS];
@@ -305,3 +300,247 @@ pg_log_backend_memory_contexts(PG_FUNCTION_ARGS)
PG_RETURN_BOOL(true);
}
+
+/*
+ * pg_get_process_memory_contexts
+ * Signal a backend or an auxiliary process to send its memory contexts.
+ *
+ * On receipt of this signal, a backend or an auxiliary process sets the flag
+ * in the signal handler, which causes the next CHECK_FOR_INTERRUPTS()
+ * or process-specific interrupt handler to copy the memory context statistics
+ * in a dynamic shared memory space. The statistics for contexts that do not fit in
+ * shared memory area are stored as a cumulative total of those contexts,
+ * at the end in the dynamic shared memory.
+ * Wait for the backend to send signal on the condition variable after
+ * writing statistics to a shared memory.
+ * Once condition variable comes out of sleep, check if the required
+ * backends statistics are available to read and display.
+ */
+Datum
+pg_get_process_memory_contexts(PG_FUNCTION_ARGS)
+{
+ int pid = PG_GETARG_INT32(0);
+ bool get_summary = PG_GETARG_BOOL(1);
+ PGPROC *proc;
+ ProcNumber procNumber = INVALID_PROC_NUMBER;
+ int i;
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ dsa_area *area;
+ dsa_handle handle;
+ MemoryContextInfo *memctx_info;
+ MemoryContext oldContext;
+ int num_retries = 0;
+
+ InitMaterializedSRF(fcinfo, 0);
+
+ /*
+ * See if the process with given pid is a backend or an auxiliary process.
+ */
+ proc = BackendPidGetProc(pid);
+ if (proc == NULL)
+ proc = AuxiliaryPidGetProc(pid);
+
+ /*
+ * BackendPidGetProc() and AuxiliaryPidGetProc() return NULL if the pid
+ * isn't valid; but by the time we reach kill(), a process for which we
+ * get a valid proc here might have terminated on its own. There's no way
+ * to acquire a lock on an arbitrary process to prevent that. But since
+ * this mechanism is usually used to debug a backend or an auxiliary
+ * process running and consuming lots of memory, that it might end on its
+ * own first and its memory contexts are not logged is not a problem.
+ */
+ if (proc == NULL)
+ {
+ /*
+ * This is just a warning so a loop-through-resultset will not abort
+ * if one backend terminated on its own during the run.
+ */
+ ereport(WARNING,
+ (errmsg("PID %d is not a PostgreSQL server process", pid)));
+ PG_RETURN_NULL();
+ }
+
+ procNumber = GetNumberFromPGProc(proc);
+ if (procNumber == MyProcNumber)
+ {
+ ereport(WARNING,
+ (errmsg("cannot return statistics for local backend"),
+ errhint("Use pg_get_backend_memory_contexts instead")));
+ PG_RETURN_NULL();
+ }
+
+ /*
+ * Return statistics of top level 1 and 2 contexts, if get_summary is
+ * true.
+ */
+ LWLockAcquire(&memCtxState[procNumber].lw_lock, LW_EXCLUSIVE);
+ memCtxState[procNumber].get_summary = get_summary;
+
+ /*
+ * Create a DSA segment with maximum size of 16MB, send handle to the
+ * publishing process for storing the stats. If number of contexts exceed
+ * 16MB, a cumulative total is stored for such contexts.
+ */
+ if (memCtxState[procNumber].memstats_dsa_handle == DSA_HANDLE_INVALID)
+ {
+ oldContext = MemoryContextSwitchTo(TopMemoryContext);
+ area = dsa_create_ext(memCtxState[procNumber].lw_lock.tranche, DSA_DEFAULT_INIT_SEGMENT_SIZE,
+ 16 * DSA_DEFAULT_INIT_SEGMENT_SIZE);
+ MemoryContextSwitchTo(oldContext);
+ handle = dsa_get_handle(area);
+ memCtxState[procNumber].memstats_dsa_handle = handle;
+ /* Pin the mapping so that it doesn't throw a warning */
+ dsa_pin(area);
+ dsa_pin_mapping(area);
+ }
+ else
+ {
+ area = dsa_attach(memCtxState[procNumber].memstats_dsa_handle);
+ dsa_pin_mapping(area);
+ }
+ LWLockRelease(&memCtxState[procNumber].lw_lock);
+ if (SendProcSignal(pid, PROCSIG_GET_MEMORY_CONTEXT, procNumber) < 0)
+ {
+ ereport(WARNING,
+ (errmsg("could not send signal to process %d: %m", pid)));
+ dsa_detach(area);
+ PG_RETURN_NULL();
+ }
+
+ /*
+ * Wait for a backend to publish stats, indicated by a valid dsa pointer
+ * set by the backend.
+ */
+ ConditionVariablePrepareToSleep(&memCtxState[procNumber].memctx_cv);
+ while (1)
+ {
+ proc = BackendPidGetProc(pid);
+ if (proc == NULL)
+ proc = AuxiliaryPidGetProc(pid);
+ if (proc == NULL)
+ {
+ ereport(WARNING,
+ (errmsg("PID %d is not a PostgreSQL server process", pid)));
+ dsa_detach(area);
+ PG_RETURN_NULL();
+ }
+#define MEMSTATS_WAIT_TIMEOUT 5000
+#define MAX_RETRIES 20
+ if (ConditionVariableTimedSleep(&memCtxState[procNumber].memctx_cv, MEMSTATS_WAIT_TIMEOUT,
+ WAIT_EVENT_MEM_CTX_PUBLISH))
+ {
+ ereport(LOG,
+ (errmsg("Wait for %d process to publish stats timed out, trying again", pid)));
+ if (num_retries > MAX_RETRIES)
+ {
+ dsa_detach(area);
+ PG_RETURN_NULL();
+ }
+ num_retries = num_retries + 1;
+ }
+
+ /*
+ * We expect to come out of sleep when the requested process has
+ * finished publishing the statistics, verified using a valid dsa
+ * pointer.
+ *
+ * Make sure that the information belongs to the pid we requested
+ * information for; otherwise loop back and wait for the server
+ * process to finish publishing statistics.
+ */
+ LWLockAcquire(&memCtxState[procNumber].lw_lock, LW_EXCLUSIVE);
+
+ if (memCtxState[procNumber].proc_id == pid && DsaPointerIsValid(memCtxState[procNumber].memstats_dsa_pointer))
+ break;
+ else
+ LWLockRelease(&memCtxState[procNumber].lw_lock);
+ }
+ if (DsaPointerIsValid(memCtxState[procNumber].memstats_dsa_pointer))
+ memctx_info = (MemoryContextInfo *) dsa_get_address(area, memCtxState[procNumber].memstats_dsa_pointer);
+ /* Backend has finished publishing the stats, read them */
+ for (i = 0; i < memCtxState[procNumber].in_memory_stats; i++)
+ {
+ ArrayType *path_array;
+ int path_length;
+ Datum values[PG_GET_BACKEND_MEMORY_CONTEXTS_COLS];
+ bool nulls[PG_GET_BACKEND_MEMORY_CONTEXTS_COLS];
+
+ memset(values, 0, sizeof(values));
+ memset(nulls, 0, sizeof(nulls));
+
+ if (strlen(memctx_info[i].name) != 0)
+ values[0] = CStringGetTextDatum(memctx_info[i].name);
+ else
+ nulls[0] = true;
+ if (strlen(memctx_info[i].ident) != 0)
+ values[1] = CStringGetTextDatum(memctx_info[i].ident);
+ else
+ nulls[1] = true;
+
+ values[2] = CStringGetTextDatum(memctx_info[i].type);
+ path_length = memctx_info[i].path_length;
+ path_array = construct_array_builtin(memctx_info[i].path, path_length, INT4OID);
+ values[3] = PointerGetDatum(path_array);
+ values[4] = Int64GetDatum(memctx_info[i].totalspace);
+ values[5] = Int64GetDatum(memctx_info[i].nblocks);
+ values[6] = Int64GetDatum(memctx_info[i].freespace);
+ values[7] = Int64GetDatum(memctx_info[i].freechunks);
+ values[8] = Int64GetDatum(memctx_info[i].totalspace - memctx_info[i].freespace);
+ values[9] = Int32GetDatum(memctx_info[i].num_contexts);
+ tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc, values, nulls);
+ }
+ /* If there are more contexts, display a cumulative total of those */
+ if (memCtxState[procNumber].total_stats > i)
+ {
+ Datum values[PG_GET_BACKEND_MEMORY_CONTEXTS_COLS] = {0};
+ bool nulls[PG_GET_BACKEND_MEMORY_CONTEXTS_COLS] = {0};
+
+ values[0] = CStringGetTextDatum(memctx_info[i].name);
+ nulls[1] = true;
+ nulls[2] = true;
+ nulls[3] = true;
+ values[4] = Int64GetDatum(memctx_info[i].totalspace);
+ values[5] = Int64GetDatum(memctx_info[i].nblocks);
+ values[6] = Int64GetDatum(memctx_info[i].freespace);
+ values[7] = Int64GetDatum(memctx_info[i].freechunks);
+ values[8] = Int64GetDatum(memctx_info[i].totalspace - memctx_info[i].freespace);
+ values[9] = Int32GetDatum(memctx_info[i].num_contexts);
+ tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc, values, nulls);
+ }
+ LWLockRelease(&memCtxState[procNumber].lw_lock);
+ ConditionVariableCancelSleep();
+ dsa_detach(area);
+ PG_RETURN_NULL();
+}
+
+static Size
+MemCtxShmemSize(void)
+{
+ Size size;
+ Size TotalProcs = add_size(MaxBackends, add_size(NUM_AUXILIARY_PROCS, max_prepared_xacts));
+
+ size = TotalProcs * sizeof(MemoryContextState);
+ return size;
+}
+
+void
+MemCtxShmemInit(void)
+{
+ bool found;
+ Size TotalProcs = add_size(MaxBackends, add_size(NUM_AUXILIARY_PROCS, max_prepared_xacts));
+
+ memCtxState = (MemoryContextState *) ShmemInitStruct("MemoryContextState",
+ MemCtxShmemSize(),
+ &found);
+ if (!found)
+ {
+ for (int i = 0; i < TotalProcs; i++)
+ {
+ ConditionVariableInit(&memCtxState[i].memctx_cv);
+ LWLockInitialize(&memCtxState[i].lw_lock, LWLockNewTrancheId());
+ LWLockRegisterTranche(memCtxState[i].lw_lock.tranche, "mem_context_stats_reporting");
+ memCtxState[i].memstats_dsa_handle = DSA_HANDLE_INVALID;
+ memCtxState[i].memstats_dsa_pointer = InvalidDsaPointer;
+ }
+ }
+}
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index 03a54451ac..7fc600ff7b 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -42,6 +42,7 @@ volatile sig_atomic_t IdleStatsUpdateTimeoutPending = false;
volatile uint32 InterruptHoldoffCount = 0;
volatile uint32 QueryCancelHoldoffCount = 0;
volatile uint32 CritSectionCount = 0;
+volatile sig_atomic_t PublishMemoryContextPending = false;
int MyProcPid;
pg_time_t MyStartTime;
diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index 70d33226cb..863c200433 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -19,16 +19,22 @@
*-------------------------------------------------------------------------
*/
+#include <math.h>
#include "postgres.h"
#include "mb/pg_wchar.h"
#include "miscadmin.h"
+#include "nodes/pg_list.h"
+#include "storage/fd.h"
+#include "storage/lwlock.h"
+#include "storage/dsm.h"
+#include "utils/dsa.h"
+#include "utils/hsearch.h"
#include "utils/memdebug.h"
#include "utils/memutils.h"
#include "utils/memutils_internal.h"
#include "utils/memutils_memorychunk.h"
-
static void BogusFree(void *pointer);
static void *BogusRealloc(void *pointer, Size size, int flags);
static MemoryContext BogusGetChunkContext(void *pointer);
@@ -135,6 +141,12 @@ static const MemoryContextMethods mcxt_methods[] = {
};
#undef BOGUS_MCTX
+typedef enum PrintDetails
+{
+ PRINT_STATS_TO_STDERR = 0,
+ PRINT_STATS_TO_LOGS,
+ PRINT_STATS_NONE
+} PrintDetails;
/*
* CurrentMemoryContext
@@ -162,10 +174,11 @@ static void MemoryContextCallResetCallbacks(MemoryContext context);
static void MemoryContextStatsInternal(MemoryContext context, int level,
int max_level, int max_children,
MemoryContextCounters *totals,
- bool print_to_stderr);
+ PrintDetails print_location, int *num_contexts);
static void MemoryContextStatsPrint(MemoryContext context, void *passthru,
const char *stats_string,
bool print_to_stderr);
+static void PublishMemoryContext(MemoryContextInfo * memctx_infos, int curr_id, MemoryContext context, List *path, char *clipped_ident, MemoryContextCounters stat, int num_contexts);
/*
* You should not do memory allocations within a critical section, because
@@ -831,11 +844,18 @@ MemoryContextStatsDetail(MemoryContext context,
bool print_to_stderr)
{
MemoryContextCounters grand_totals;
+ int num_contexts;
+ PrintDetails print_location;
memset(&grand_totals, 0, sizeof(grand_totals));
+ if (print_to_stderr)
+ print_location = PRINT_STATS_TO_STDERR;
+ else
+ print_location = PRINT_STATS_TO_LOGS;
+
MemoryContextStatsInternal(context, 0, max_level, max_children,
- &grand_totals, print_to_stderr);
+ &grand_totals, print_location, &num_contexts);
if (print_to_stderr)
fprintf(stderr,
@@ -876,18 +896,43 @@ static void
MemoryContextStatsInternal(MemoryContext context, int level,
int max_level, int max_children,
MemoryContextCounters *totals,
- bool print_to_stderr)
+ PrintDetails print_location, int *num_contexts)
{
MemoryContext child;
int ichild;
+ bool print_to_stderr = true;
Assert(MemoryContextIsValid(context));
- /* Examine the context itself */
- context->methods->stats(context,
- MemoryContextStatsPrint,
- &level,
- totals, print_to_stderr);
+ if (print_location == PRINT_STATS_TO_STDERR)
+ print_to_stderr = true;
+ else if (print_location == PRINT_STATS_TO_LOGS)
+ print_to_stderr = false;
+
+ if (print_location != PRINT_STATS_NONE)
+ {
+ /* Examine the context itself */
+ context->methods->stats(context,
+ MemoryContextStatsPrint,
+ &level,
+ totals, print_to_stderr);
+ }
+ /* Do not print the statistics */
+
+ /*
+ * print_to_stderr is a no-op if no statistics are going to be printed i.e
+ * print_location == PRINT_STATS_NONE
+ */
+ else
+ {
+ /* Examine the context itself */
+ context->methods->stats(context,
+ NULL,
+ NULL,
+ totals, print_to_stderr);
+ }
+ /* Increment the context count */
+ *num_contexts = *num_contexts + 1;
/*
* Examine children.
@@ -907,7 +952,7 @@ MemoryContextStatsInternal(MemoryContext context, int level,
MemoryContextStatsInternal(child, level + 1,
max_level, max_children,
totals,
- print_to_stderr);
+ print_location, num_contexts);
}
}
@@ -925,6 +970,7 @@ MemoryContextStatsInternal(MemoryContext context, int level,
ichild++;
child = MemoryContextTraverseNext(child, context);
}
+ *num_contexts = *num_contexts + ichild;
if (print_to_stderr)
{
@@ -939,7 +985,7 @@ MemoryContextStatsInternal(MemoryContext context, int level,
local_totals.freechunks,
local_totals.totalspace - local_totals.freespace);
}
- else
+ else if (print_location != PRINT_STATS_NONE)
ereport(LOG_SERVER_ONLY,
(errhidestmt(true),
errhidecontext(true),
@@ -1276,6 +1322,21 @@ HandleLogMemoryContextInterrupt(void)
/* latch will be set by procsignal_sigusr1_handler */
}
+/*
+ * HandleGetMemoryContextInterrupt
+ * Handle receipt of an interrupt indicating publishing of memory
+ * contexts.
+ *
+ * All the actual work is deferred to ProcessGetMemoryContextInterrupt()
+ */
+void
+HandleGetMemoryContextInterrupt(void)
+{
+ InterruptPending = true;
+ PublishMemoryContextPending = true;
+ /* latch will be set by procsignal_sigusr1_handler */
+}
+
/*
* ProcessLogMemoryContextInterrupt
* Perform logging of memory contexts of this backend process.
@@ -1313,6 +1374,345 @@ ProcessLogMemoryContextInterrupt(void)
MemoryContextStatsDetail(TopMemoryContext, 100, 100, false);
}
+/*
+ * Run by each backend to publish their memory context
+ * statistics. It performs a breadth first search
+ * on the memory context tree, so that the parents
+ * get a chance to report stats before their children.
+ *
+ * Statistics are shared via dynamic shared memory which
+ * can hold statistics of approx 6700 contexts. Statistics of the
+ * remaining contexts are captured as a cumulative total.
+ */
+void
+ProcessGetMemoryContextInterrupt(void)
+{
+ /* Store the memory context details in shared memory */
+
+ List *contexts;
+
+ HASHCTL ctl;
+ HTAB *context_id_lookup;
+ int context_id = 0;
+ bool found;
+ MemoryContext stat_cxt;
+ MemoryContextInfo *meminfo;
+ bool get_summary = false;
+ dsa_area *area;
+ int num_stats;
+ int idx = MyProcNumber;
+ int stats_count = 0;
+ MemoryContextCounters stat;
+
+ PublishMemoryContextPending = false;
+
+ /*
+ * The hash table is used for constructing the "path" column of the
+ * pg_get_process_memory_contexts output, similar to its local backend
+ * counterpart.
+ */
+
+ /*
+ * Make a new context that will contain the hash table, to ease the
+ * cleanup
+ */
+
+ stat_cxt = AllocSetContextCreate(CurrentMemoryContext,
+ "Memory context statistics",
+ ALLOCSET_DEFAULT_SIZES);
+
+ ctl.keysize = sizeof(MemoryContext);
+ ctl.entrysize = sizeof(MemoryContextId);
+ ctl.hcxt = stat_cxt;
+
+ context_id_lookup = hash_create("pg_get_remote_backend_memory_contexts",
+ 256,
+ &ctl,
+ HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+
+ contexts = list_make1(TopMemoryContext);
+
+ /* Compute the number of stats that can fit in the DSM seg */
+ num_stats = floor(16 * DSA_DEFAULT_INIT_SEGMENT_SIZE / sizeof(MemoryContextInfo));
+
+ /*
+ * Traverse the memory context tree to find total number of contexts. If
+ * summary is requested find the total number of contexts at level 1 and
+ * 2.
+ */
+ foreach_ptr(MemoryContextData, cur, contexts)
+ {
+ MemoryContextId *entry;
+
+ entry = (MemoryContextId *) hash_search(context_id_lookup, &cur,
+ HASH_ENTER, &found);
+ entry->context_id = stats_count;
+
+ stats_count = stats_count + 1;
+ /* Append the children of the current context to the main list */
+ for (MemoryContext c = cur->firstchild; c != NULL; c = c->nextchild)
+ {
+ if (get_summary)
+ {
+ entry = (MemoryContextId *) hash_search(context_id_lookup, &c,
+ HASH_ENTER, &found);
+ entry->context_id = stats_count;
+ stats_count = stats_count + 1;
+ }
+ contexts = lappend(contexts, c);
+ }
+ /* In summary only the first level contexts are displayed */
+ if (get_summary)
+ break;
+ }
+
+ /*
+ * Allocate memory in this process's dsa for storing statistics of the
+ * memory contexts up to num_stats. For contexts that don't fit in the DSA
+ * segment, a cumulative total is written as the last record in the DSA
+ * segment.
+ */
+ stats_count = stats_count > num_stats ? num_stats : stats_count;
+
+ /* Attach to DSA segment */
+ LWLockAcquire(&memCtxState[idx].lw_lock, LW_EXCLUSIVE);
+ area = dsa_attach(memCtxState[idx].memstats_dsa_handle);
+ memCtxState[idx].proc_id = MyProcPid;
+ get_summary = memCtxState[idx].get_summary;
+
+ /* Free the memory allocated previously by the same process */
+ if (DsaPointerIsValid(memCtxState[idx].memstats_dsa_pointer))
+ {
+ dsa_free(area, memCtxState[idx].memstats_dsa_pointer);
+ memCtxState[idx].memstats_dsa_pointer = InvalidDsaPointer;
+ }
+ memCtxState[idx].memstats_dsa_pointer = dsa_allocate0(area, stats_count * sizeof(MemoryContextInfo));
+ meminfo = (MemoryContextInfo *) dsa_get_address(area, memCtxState[idx].memstats_dsa_pointer);
+
+ if (get_summary)
+ {
+ int ctx_id = 0;
+ List *path = NIL;
+
+ /* Copy TopMemoryContext statistics to DSA */
+ path = lcons_int(0, path);
+ memset(&stat, 0, sizeof(stat));
+ (*TopMemoryContext->methods->stats) (TopMemoryContext, NULL, NULL, &stat, true);
+ PublishMemoryContext(meminfo, ctx_id, TopMemoryContext, path, NULL, stat, 1);
+ ctx_id = ctx_id + 1;
+
+ /*
+ * Copy statistics for each of TopMemoryContext's children (XXX: make it
+ * capped at 100). This includes statistics of all of their children
+ * up to level 100.
+ */
+ for (MemoryContext c = TopMemoryContext->firstchild; c != NULL; c = c->nextchild)
+ {
+ MemoryContextCounters grand_totals;
+ int num_contexts = 0;
+ char clipped_ident[MEMORY_CONTEXT_IDENT_DISPLAY_SIZE];
+
+ path = NIL;
+ memset(&grand_totals, 0, sizeof(grand_totals));
+
+ MemoryContextStatsInternal(c, 0, 100, 100, &grand_totals, PRINT_STATS_NONE, &num_contexts);
+
+ /*
+ * Figure out the transient context_id of this context and each of
+ * its ancestors.
+ */
+ for (MemoryContext cur_context = c; cur_context != NULL; cur_context = cur_context->parent)
+ {
+ MemoryContextId *cur_entry;
+
+ cur_entry = hash_search(context_id_lookup, &cur_context, HASH_FIND, &found);
+
+ if (!found)
+ {
+ elog(LOG, "hash table corrupted, can't construct path value");
+ break;
+ }
+ path = lcons_int(cur_entry->context_id, path);
+ }
+ /* Trim and copy the identifier if it is not set to NULL */
+ if (c->ident != NULL)
+ {
+ int idlen = strlen(c->ident);
+
+ /*
+ * Some identifiers such as SQL query string can be very long,
+ * truncate oversize identifiers.
+ */
+ if (idlen >= MEMORY_CONTEXT_IDENT_DISPLAY_SIZE)
+ idlen = pg_mbcliplen(c->ident, idlen, MEMORY_CONTEXT_IDENT_DISPLAY_SIZE - 1);
+
+ memcpy(clipped_ident, c->ident, idlen);
+ clipped_ident[idlen] = '\0';
+ }
+ PublishMemoryContext(meminfo, ctx_id, c, path, (c->ident != NULL ? clipped_ident : NULL), grand_totals, num_contexts);
+ ctx_id = ctx_id + 1;
+ }
+ /* For summary mode, total_stats and in_memory_stats remain the same */
+ memCtxState[idx].in_memory_stats = ctx_id;
+ memCtxState[idx].total_stats = ctx_id;
+ goto cleanup;
+ }
+
+ foreach_ptr(MemoryContextData, cur, contexts)
+ {
+ List *path = NIL;
+ char clipped_ident[MEMORY_CONTEXT_IDENT_DISPLAY_SIZE];
+
+ /*
+ * Figure out the transient context_id of this context and each of its
+ * ancestors.
+ */
+ for (MemoryContext cur_context = cur; cur_context != NULL; cur_context = cur_context->parent)
+ {
+ MemoryContextId *cur_entry;
+
+ cur_entry = hash_search(context_id_lookup, &cur_context, HASH_FIND, &found);
+
+ if (!found)
+ {
+ elog(LOG, "hash table corrupted, can't construct path value");
+ break;
+ }
+ path = lcons_int(cur_entry->context_id, path);
+ }
+ /* Trim and copy the identifier if it is not set to NULL */
+ if (cur->ident != NULL)
+ {
+ int idlen = strlen(cur->ident);
+
+ /*
+ * Some identifiers such as SQL query string can be very long,
+ * truncate oversize identifiers.
+ */
+ if (idlen >= MEMORY_CONTEXT_IDENT_DISPLAY_SIZE)
+ idlen = pg_mbcliplen(cur->ident, idlen, MEMORY_CONTEXT_IDENT_DISPLAY_SIZE - 1);
+
+ memcpy(clipped_ident, cur->ident, idlen);
+ clipped_ident[idlen] = '\0';
+ }
+ if (context_id <= (num_stats - 2))
+ {
+ /* Copy statistics to DSM memory */
+ /* Examine the context stats */
+ memset(&stat, 0, sizeof(stat));
+ (*cur->methods->stats) (cur, NULL, NULL, &stat, true);
+ PublishMemoryContext(meminfo, context_id, cur, path, (cur->ident != NULL ? clipped_ident : NULL), stat, 1);
+ }
+ else
+ {
+ /* Examine the context stats */
+ memset(&stat, 0, sizeof(stat));
+ (*cur->methods->stats) (cur, NULL, NULL, &stat, true);
+
+ meminfo[num_stats - 1].totalspace += stat.totalspace;
+ meminfo[num_stats - 1].nblocks += stat.nblocks;
+ meminfo[num_stats - 1].freespace += stat.freespace;
+ meminfo[num_stats - 1].freechunks += stat.freechunks;
+ }
+
+ /*
+ * DSA max limit is reached, write total of the remaining statistics.
+ */
+ if (context_id == (num_stats - 2) && context_id < (stats_count - 1))
+ {
+ memCtxState[idx].in_memory_stats = context_id + 1;
+ strncpy(meminfo[num_stats - 1].name, "Remaining Totals", 16);
+ }
+ context_id++;
+ }
+ if (context_id < (num_stats - 2))
+ {
+ memCtxState[idx].in_memory_stats = context_id;
+ }
+ /* Report number of aggregated memory contexts */
+ else
+ {
+ meminfo[num_stats - 1].num_contexts = context_id - memCtxState[idx].in_memory_stats;
+ }
+ memCtxState[idx].total_stats = context_id;
+cleanup:
+
+ /*
+ * Signal all the waiting client backends after setting the exit condition
+ * flag
+ */
+ LWLockRelease(&memCtxState[idx].lw_lock);
+ ConditionVariableBroadcast(&memCtxState[idx].memctx_cv);
+ /* Delete the hash table memory context */
+ MemoryContextDelete(stat_cxt);
+
+ dsa_detach(area);
+}
+
+static void
+PublishMemoryContext(MemoryContextInfo * memctx_info, int curr_id, MemoryContext context, List *path, char *clipped_ident, MemoryContextCounters stat, int num_contexts)
+{
+ char *type;
+
+ if (context->name != NULL)
+ {
+ Assert(strlen(context->name) < MEMORY_CONTEXT_IDENT_DISPLAY_SIZE);
+ strncpy(memctx_info[curr_id].name, context->name, strlen(context->name));
+ }
+ else
+ memctx_info[curr_id].name[0] = '\0';
+
+ if (clipped_ident != NULL)
+ {
+ /*
+ * To be consistent with logging output, we label dynahash contexts
+ * with just the hash table name as with MemoryContextStatsPrint().
+ */
+ if (!strncmp(context->name, "dynahash", 8))
+ {
+ strncpy(memctx_info[curr_id].name, clipped_ident, strlen(clipped_ident));
+ memctx_info[curr_id].ident[0] = '\0';
+ }
+ else
+ strncpy(memctx_info[curr_id].ident, clipped_ident, strlen(clipped_ident));
+ }
+ else
+ memctx_info[curr_id].ident[0] = '\0';
+
+ memctx_info[curr_id].path_length = list_length(path);
+ foreach_int(i, path)
+ memctx_info[curr_id].path[foreach_current_index(i)] = Int32GetDatum(i);
+
+ switch (context->type)
+ {
+ case T_AllocSetContext:
+ type = "AllocSet";
+ strncpy(memctx_info[curr_id].type, type, strlen(type));
+ break;
+ case T_GenerationContext:
+ type = "Generation";
+ strncpy(memctx_info[curr_id].type, type, strlen(type));
+ break;
+ case T_SlabContext:
+ type = "Slab";
+ strncpy(memctx_info[curr_id].type, type, strlen(type));
+ break;
+ case T_BumpContext:
+ type = "Bump";
+ strncpy(memctx_info[curr_id].type, type, strlen(type));
+ break;
+ default:
+ type = "???";
+ strncpy(memctx_info[curr_id].type, type, strlen(type));
+ break;
+ }
+ memctx_info[curr_id].totalspace = stat.totalspace;
+ memctx_info[curr_id].nblocks = stat.nblocks;
+ memctx_info[curr_id].freespace = stat.freespace;
+ memctx_info[curr_id].freechunks = stat.freechunks;
+ memctx_info[curr_id].num_contexts = num_contexts;
+}
+
void *
palloc(Size size)
{
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 2dcc2d42da..2f682874d1 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8449,6 +8449,16 @@
prorettype => 'bool', proargtypes => 'int4',
prosrc => 'pg_log_backend_memory_contexts' },
+# publishing memory contexts of the specified postgres process
+{ oid => '2173', descr => 'publish memory contexts of the specified backend',
+ proname => 'pg_get_process_memory_contexts', provolatile => 'v',
+ prorows => '100', proretset => 't', proparallel => 'r',
+ prorettype => 'record', proargtypes => 'int4 bool',
+ proallargtypes => '{int4,bool,text,text,text,_int4,int8,int8,int8,int8,int8,int4}',
+ proargmodes => '{i,i,o,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{oid, summary, name, ident, type, path, total_bytes, total_nblocks, free_bytes, free_chunks, used_bytes, num_agg_contexts}',
+ prosrc => 'pg_get_process_memory_contexts' },
+
# non-persistent series generator
{ oid => '1066', descr => 'non-persistent series generator',
proname => 'generate_series', prorows => '1000',
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 3f97fcef80..f23750dd6c 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -96,6 +96,7 @@ extern PGDLLIMPORT volatile sig_atomic_t IdleSessionTimeoutPending;
extern PGDLLIMPORT volatile sig_atomic_t ProcSignalBarrierPending;
extern PGDLLIMPORT volatile sig_atomic_t LogMemoryContextPending;
extern PGDLLIMPORT volatile sig_atomic_t IdleStatsUpdateTimeoutPending;
+extern PGDLLIMPORT volatile sig_atomic_t PublishMemoryContextPending;
extern PGDLLIMPORT volatile sig_atomic_t CheckClientConnectionPending;
extern PGDLLIMPORT volatile sig_atomic_t ClientConnectionLost;
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index 221073def3..8cbf6e201c 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -35,6 +35,7 @@ typedef enum
PROCSIG_WALSND_INIT_STOPPING, /* ask walsenders to prepare for shutdown */
PROCSIG_BARRIER, /* global barrier interrupt */
PROCSIG_LOG_MEMORY_CONTEXT, /* ask backend to log the memory contexts */
+ PROCSIG_GET_MEMORY_CONTEXT, /* ask backend to publish the memory contexts */
PROCSIG_PARALLEL_APPLY_MESSAGE, /* Message from parallel apply workers */
/* Recovery conflict reasons */
diff --git a/src/include/utils/memutils.h b/src/include/utils/memutils.h
index bf93433b78..a58a6b824f 100644
--- a/src/include/utils/memutils.h
+++ b/src/include/utils/memutils.h
@@ -18,6 +18,9 @@
#define MEMUTILS_H
#include "nodes/memnodes.h"
+#include "storage/condition_variable.h"
+#include "storage/lmgr.h"
+#include "utils/dsa.h"
/*
@@ -48,7 +51,11 @@
#define AllocHugeSizeIsValid(size) ((Size) (size) <= MaxAllocHugeSize)
+#define MEMORY_CONTEXT_IDENT_DISPLAY_SIZE 1024
+#define MEM_CONTEXT_SHMEM_STATS_SIZE 30
+#define MEM_CONTEXT_MAX_LEVEL 64
+#define MAX_TYPE_STRING_LENGTH 64
/*
* Standard top-level memory contexts.
*
@@ -115,6 +122,50 @@ extern MemoryContext AllocSetContextCreateInternal(MemoryContext parent,
Size initBlockSize,
Size maxBlockSize);
+/* Dynamic shared memory state for Memory Context Statistics reporting */
+typedef struct MemoryContextInfo
+{
+ char name[MEMORY_CONTEXT_IDENT_DISPLAY_SIZE];
+ char ident[MEMORY_CONTEXT_IDENT_DISPLAY_SIZE];
+ Datum path[MEM_CONTEXT_MAX_LEVEL];
+ char type[MAX_TYPE_STRING_LENGTH];
+ int path_length;
+ int64 totalspace;
+ int64 nblocks;
+ int64 freespace;
+ int64 freechunks;
+ int num_contexts;
+} MemoryContextInfo;
+
+typedef struct MemoryContextState
+{
+ ConditionVariable memctx_cv;
+ LWLock lw_lock;
+ int proc_id;
+ int in_memory_stats;
+ int total_stats;
+ bool get_summary;
+ dsa_handle memstats_dsa_handle;
+ dsa_pointer memstats_dsa_pointer;
+
+} MemoryContextState;
+
+/*
+ * MemoryContextId
+ * Used for storage of transient identifiers for
+ * pg_get_backend_memory_contexts.
+ */
+typedef struct MemoryContextId
+{
+ MemoryContext context;
+ int context_id;
+} MemoryContextId;
+
+extern PGDLLIMPORT MemoryContextState * memCtxState;
+extern void ProcessGetMemoryContextInterrupt(void);
+extern void HandleGetMemoryContextInterrupt(void);
+extern void MemCtxShmemInit(void);
+
/*
* This wrapper macro exists to check for non-constant strings used as context
* names; that's no longer supported. (Use MemoryContextSetIdentifier if you
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 91089ac215..f864c75dbe 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -223,3 +223,15 @@ select count(distinct utc_offset) >= 24 as ok from pg_timezone_abbrevs;
t
(1 row)
+DO $$
+DECLARE
+ checkpointer_pid int;
+ r RECORD;
+BEGIN
+ SELECT pid from pg_stat_activity where backend_type='checkpointer' INTO checkpointer_pid;
+
+ select type, name, ident
+ from pg_get_process_memory_contexts(checkpointer_pid, false) where path = '{0}' into r;
+ RAISE NOTICE '%', r;
+END $$;
+NOTICE: (AllocSet,TopMemoryContext,)
diff --git a/src/test/regress/sql/sysviews.sql b/src/test/regress/sql/sysviews.sql
index b2a7923754..a56cc44eea 100644
--- a/src/test/regress/sql/sysviews.sql
+++ b/src/test/regress/sql/sysviews.sql
@@ -98,3 +98,15 @@ set timezone_abbreviations = 'Australia';
select count(distinct utc_offset) >= 24 as ok from pg_timezone_abbrevs;
set timezone_abbreviations = 'India';
select count(distinct utc_offset) >= 24 as ok from pg_timezone_abbrevs;
+
+DO $$
+DECLARE
+ checkpointer_pid int;
+ r RECORD;
+BEGIN
+ SELECT pid from pg_stat_activity where backend_type='checkpointer' INTO checkpointer_pid;
+
+ select type, name, ident
+ from pg_get_process_memory_contexts(checkpointer_pid, false) where path = '{0}' into r;
+ RAISE NOTICE '%', r;
+END $$;
--
2.34.1
Hi Torikoshia,
Thank you for the review.
=# select path FROM pg_get_process_memory_contexts('20271', false);
path
-------
{0}
{0,1}
{0,2}
..

=# select path from pg_backend_memory_contexts;
path
-------
{1}
{1,2}
{1,3}
..

Would it be better to begin with 1 to make them consistent?
Makes sense, fixed in the attached patch.
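To illustrate the numbering being discussed, here is a minimal standalone sketch of building a path from 1-based ids by walking parent pointers. It is not the code in the attached patch: DemoContext, demo_build_path and DEMO_MAX_LEVEL are hypothetical names, and the depth check is an assumption added for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a memory context node (not MemoryContextData). */
typedef struct DemoContext
{
	struct DemoContext *parent;	/* NULL for the root context */
	int			context_id;		/* transient id, assumed to start at 1 */
} DemoContext;

#define DEMO_MAX_LEVEL 64		/* assumed bound on nesting depth */

/*
 * Fill path[] with the ids from the root down to ctx and return the number
 * of entries, or -1 if ctx is nested deeper than DEMO_MAX_LEVEL.
 */
static int
demo_build_path(const DemoContext *ctx, int path[DEMO_MAX_LEVEL])
{
	const DemoContext *cur;
	int			depth = 0;
	int			i;

	assert(ctx != NULL);

	/* First pass: count how many ancestors there are, including ctx. */
	for (cur = ctx; cur != NULL; cur = cur->parent)
		depth++;
	if (depth > DEMO_MAX_LEVEL)
		return -1;

	/* Second pass: walk up again, filling the array from the end. */
	i = depth - 1;
	for (cur = ctx; cur != NULL; cur = cur->parent)
		path[i--] = cur->context_id;

	return depth;
}
```

With the root given id 1, the root's own path has the single entry 1 and every child's path begins with 1, which matches the shape of the path column in pg_backend_memory_contexts shown above.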
pg_log_backend_memory_contexts() does not allow non-superusers to
execute by default since it can peek at other session information.
pg_get_process_memory_contexts() does not have this restriction, but
wouldn't it be necessary?

Yes. I added the restriction to only allow superusers and
users with pg_read_all_stats privileges to query the memory context
statistics of another process.
When the target pid is the local backend, the HINT suggests using
pg_get_backend_memory_contexts(), but this function is not described in
the manual.
How about suggesting pg_backend_memory_contexts view instead?

=# select pg_get_process_memory_contexts('27041', false);
WARNING: cannot return statistics for local backend
HINT: Use pg_get_backend_memory_contexts instead

There are no explanations about 'num_agg_contexts', but I thought the
explanation like below would be useful.

Ok. I added an explanation of this column in the documentation.
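As a rough illustration of what num_agg_contexts conveys, the following standalone sketch copies per-context rows into a fixed number of slots and folds whatever does not fit into one final "Remaining Totals" row. It is a simplified model rather than the patch code: DemoStats and demo_publish are hypothetical names, and only two of the counters are carried.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, trimmed-down version of one published statistics row. */
typedef struct DemoStats
{
	char		name[32];
	int64_t		totalspace;
	int64_t		freespace;
	int			num_agg_contexts;	/* how many contexts this row covers */
} DemoStats;

/*
 * Copy rows from in[] to out[], which has room for "capacity" rows.
 * If everything fits, each row is copied with num_agg_contexts = 1.
 * Otherwise the first capacity - 1 rows are copied individually and the
 * rest are summed into the last slot. Returns the number of rows written.
 */
static int
demo_publish(const DemoStats *in, int n_in, DemoStats *out, int capacity)
{
	DemoStats  *rest;
	int			n_individual;
	int			i;

	assert(capacity >= 1 && n_in >= 0);

	n_individual = (n_in <= capacity) ? n_in : capacity - 1;
	for (i = 0; i < n_individual; i++)
	{
		out[i] = in[i];
		out[i].num_agg_contexts = 1;
	}
	if (n_in <= capacity)
		return n_in;

	/* Overflow: aggregate everything that did not get its own row. */
	rest = &out[n_individual];
	memset(rest, 0, sizeof(*rest));
	strcpy(rest->name, "Remaining Totals");	/* 16 chars + NUL fits in 32 */
	for (i = n_individual; i < n_in; i++)
	{
		rest->totalspace += in[i].totalspace;
		rest->freespace += in[i].freespace;
		rest->num_agg_contexts++;
	}
	return capacity;
}
```

In this model an individual row always reports 1, while the final row reports how many contexts were added together. That is the distinction the column is meant to expose.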
I have added this information as a column named "num_agg_contexts",
which indicates
the number of contexts whose statistics have been aggregated/added for
a particular output.

git apply caused some warnings:
Thank you for reporting. They should be gone now.
PFA the patch with above updates.
Thank you,
Rahila Syed
Attachments:
v9-0001-Function-to-report-memory-context-stats-of-any-backe.patch (application/octet-stream)
From e30e796f08e6fc16ebd5760ce24a4887e249116a Mon Sep 17 00:00:00 2001
From: Rahila Syed <rahilasyed.90@gmail.com>
Date: Sun, 15 Sep 2024 17:56:06 +0530
Subject: [PATCH] Function to report memory context stats of any backend
This function sends a signal to a backend to publish
statistics of all its memory contexts. The signal handler
sets a flag, which causes the relevant backend to copy its
MemoryContextStats to a DSA as part of the next
CHECK_FOR_INTERRUPTS().
If there are more than 16MB worth of statistics, the
remaining statistics are copied as a cumulative
total of the remaining contexts.
Once it is done, it signals the client backend using
a condition variable. The client backend
then wakes up, reads the shared memory and
returns these values in the form of a set of records,
one for each memory context, to the user, followed
by a cumulative total of the remaining contexts,
if any.
Each backend and auxiliary process has its own slot
for reporting the stats. There is an array of such
memory slots, sized MaxBackends plus the number of auxiliary
processes, in fixed shared memory. Each of these slots points
to a DSA, which contains the stats to be shared by the
corresponding process.
Each slot has its own LW lock and condition variable for
synchronization and communication between the
publishing process and the client backend.
---
doc/src/sgml/func.sgml | 26 ++
src/backend/postmaster/autovacuum.c | 4 +
src/backend/postmaster/checkpointer.c | 4 +
src/backend/postmaster/interrupt.c | 4 +
src/backend/postmaster/pgarch.c | 4 +
src/backend/postmaster/startup.c | 4 +
src/backend/postmaster/walsummarizer.c | 4 +
src/backend/storage/ipc/ipci.c | 2 +
src/backend/storage/ipc/procsignal.c | 3 +
src/backend/tcop/postgres.c | 3 +
.../utils/activity/wait_event_names.txt | 1 +
src/backend/utils/adt/mcxtfuncs.c | 274 ++++++++++-
src/backend/utils/init/globals.c | 1 +
src/backend/utils/mmgr/mcxt.c | 424 +++++++++++++++++-
src/include/catalog/pg_proc.dat | 10 +
src/include/miscadmin.h | 1 +
src/include/storage/procsignal.h | 1 +
src/include/utils/memutils.h | 51 +++
src/test/regress/expected/sysviews.out | 12 +
src/test/regress/sql/sysviews.sql | 12 +
20 files changed, 822 insertions(+), 23 deletions(-)
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 47370e581a..5d0399508e 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -28358,6 +28358,32 @@ acl | {postgres=arwdDxtm/postgres,foo=r/postgres}
</para></entry>
</row>
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>pg_get_process_memory_contexts</primary>
+ </indexterm>
+ <function>pg_get_process_memory_contexts</function> ( <parameter>pid</parameter> <type>integer</type>, <parameter>get_summary</parameter> <type>boolean</type> )
+ <returnvalue>setof record</returnvalue>
+ </para>
+ <para>
+ This function handles requests to display the memory contexts of a
+ PostgreSQL process with the specified process ID (PID). It takes two
+ arguments: PID and a boolean, get_summary. The function can send requests
+ to both backend and auxiliary processes.
+
+ After receiving memory context statistics from the target process, it
+ returns the results as one row per context. The num_agg_contexts column
+ indicates the number of contexts aggregated in the displayed statistics.
+
+ When get_summary is set to true, memory context statistics at levels 1 and 2
+ are displayed, with each level 2 context showing a cumulative total of all
+ its child contexts.
+ When get_summary is set to false, the num_agg_contexts value is 1, indicating
+ that individual statistics are being displayed.
+ </para></entry>
+ </row>
+
<row>
<entry role="func_table_entry"><para role="func_signature">
<indexterm>
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 3f826532b8..eb4c98a17b 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -768,6 +768,10 @@ HandleAutoVacLauncherInterrupts(void)
/* Process sinval catchup interrupts that happened while sleeping */
ProcessCatchupInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index 9bfd0fd665..ee8360ad6f 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -616,6 +616,10 @@ HandleCheckpointerInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/postmaster/interrupt.c b/src/backend/postmaster/interrupt.c
index be69e4c713..9481a5cd24 100644
--- a/src/backend/postmaster/interrupt.c
+++ b/src/backend/postmaster/interrupt.c
@@ -48,6 +48,10 @@ HandleMainLoopInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 12ee815a62..cd1ecb6b93 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -865,6 +865,10 @@ HandlePgArchInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
if (ConfigReloadPending)
{
char *archiveLib = pstrdup(XLogArchiveLibrary);
diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c
index 59d213031b..d670954c4e 100644
--- a/src/backend/postmaster/startup.c
+++ b/src/backend/postmaster/startup.c
@@ -192,6 +192,10 @@ HandleStartupProcInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
diff --git a/src/backend/postmaster/walsummarizer.c b/src/backend/postmaster/walsummarizer.c
index ffbf043935..b1a5e86a85 100644
--- a/src/backend/postmaster/walsummarizer.c
+++ b/src/backend/postmaster/walsummarizer.c
@@ -876,6 +876,10 @@ HandleWalSummarizerInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 174eed7036..4a70eabf7f 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -50,6 +50,7 @@
#include "storage/sinvaladt.h"
#include "utils/guc.h"
#include "utils/injection_point.h"
+#include "utils/memutils.h"
/* GUCs */
int shared_memory_type = DEFAULT_SHARED_MEMORY_TYPE;
@@ -340,6 +341,7 @@ CreateOrAttachShmemStructs(void)
StatsShmemInit();
WaitEventCustomShmemInit();
InjectionPointShmemInit();
+ MemCtxShmemInit();
}
/*
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index 7401b6e625..e425b9eeb0 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -688,6 +688,9 @@ procsignal_sigusr1_handler(SIGNAL_ARGS)
if (CheckProcSignal(PROCSIG_LOG_MEMORY_CONTEXT))
HandleLogMemoryContextInterrupt();
+ if (CheckProcSignal(PROCSIG_GET_MEMORY_CONTEXT))
+ HandleGetMemoryContextInterrupt();
+
if (CheckProcSignal(PROCSIG_PARALLEL_APPLY_MESSAGE))
HandleParallelApplyMessageInterrupt();
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index c01cff9d65..0eae9be122 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3497,6 +3497,9 @@ ProcessInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
if (ParallelApplyMessagePending)
HandleParallelApplyMessages();
}
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 0b53cba807..68a1769967 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -158,6 +158,7 @@ WAL_RECEIVER_EXIT "Waiting for the WAL receiver to exit."
WAL_RECEIVER_WAIT_START "Waiting for startup process to send initial data for streaming replication."
WAL_SUMMARY_READY "Waiting for a new WAL summary to be generated."
XACT_GROUP_UPDATE "Waiting for the group leader to update transaction status at transaction end."
+MEM_CTX_PUBLISH "Waiting for backend to publish memory information."
ABI_compatibility:
diff --git a/src/backend/utils/adt/mcxtfuncs.c b/src/backend/utils/adt/mcxtfuncs.c
index 396c2f223b..c067cdf870 100644
--- a/src/backend/utils/adt/mcxtfuncs.c
+++ b/src/backend/utils/adt/mcxtfuncs.c
@@ -17,28 +17,25 @@
#include "funcapi.h"
#include "mb/pg_wchar.h"
+#include "miscadmin.h"
+#include "access/twophase.h"
+#include "catalog/pg_authid_d.h"
+#include "nodes/pg_list.h"
#include "storage/proc.h"
#include "storage/procarray.h"
+#include "utils/acl.h"
#include "utils/array.h"
#include "utils/builtins.h"
#include "utils/hsearch.h"
+#include "utils/memutils.h"
+#include "utils/wait_event_types.h"
/* ----------
* The max bytes for showing identifiers of MemoryContext.
* ----------
*/
-#define MEMORY_CONTEXT_IDENT_DISPLAY_SIZE 1024
-/*
- * MemoryContextId
- * Used for storage of transient identifiers for
- * pg_get_backend_memory_contexts.
- */
-typedef struct MemoryContextId
-{
- MemoryContext context;
- int context_id;
-} MemoryContextId;
+struct MemoryContextState *memCtxState = NULL;
/*
* int_list_to_array
@@ -71,7 +68,7 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
TupleDesc tupdesc, MemoryContext context,
HTAB *context_id_lookup)
{
-#define PG_GET_BACKEND_MEMORY_CONTEXTS_COLS 10
+#define PG_GET_BACKEND_MEMORY_CONTEXTS_COLS 11
Datum values[PG_GET_BACKEND_MEMORY_CONTEXTS_COLS];
bool nulls[PG_GET_BACKEND_MEMORY_CONTEXTS_COLS];
@@ -305,3 +302,256 @@ pg_log_backend_memory_contexts(PG_FUNCTION_ARGS)
PG_RETURN_BOOL(true);
}
+
+/*
+ * pg_get_process_memory_contexts
+ * Signal a backend or an auxiliary process to send its memory contexts.
+ *
+ * On receipt of this signal, a backend or an auxiliary process sets a flag
+ * in the signal handler, which causes the next CHECK_FOR_INTERRUPTS()
+ * or process-specific interrupt handler to copy the memory context statistics
+ * into a dynamic shared memory area. Statistics for contexts that do not fit
+ * in the shared memory area are stored as a cumulative total of those
+ * contexts, at the end of the dynamic shared memory.
+ * The caller then waits on a condition variable, which the target process
+ * signals after writing the statistics to shared memory.
+ * Once woken up, the caller checks whether the statistics of the requested
+ * process are available to read and display.
+ */
+Datum
+pg_get_process_memory_contexts(PG_FUNCTION_ARGS)
+{
+ int pid = PG_GETARG_INT32(0);
+ bool get_summary = PG_GETARG_BOOL(1);
+ PGPROC *proc;
+ ProcNumber procNumber = INVALID_PROC_NUMBER;
+ int i;
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ dsa_area *area;
+ dsa_handle handle;
+ MemoryContextInfo *memctx_info;
+ MemoryContext oldContext;
+ int num_retries = 0;
+
+ InitMaterializedSRF(fcinfo, 0);
+
+ /*
+ * Only superusers or users with pg_read_all_stats privileges can view the
+ * memory context statistics of another process
+ */
+ if (!has_privs_of_role(GetUserId(), ROLE_PG_READ_ALL_STATS))
+ ereport(ERROR,
+ (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+ errmsg("permission denied to view memory context statistics of another process")));
+
+ /*
+ * See if the process with given pid is a backend or an auxiliary process.
+ */
+ proc = BackendPidGetProc(pid);
+ if (proc == NULL)
+ proc = AuxiliaryPidGetProc(pid);
+
+ /*
+ * BackendPidGetProc() and AuxiliaryPidGetProc() return NULL if the pid
+ * isn't valid; but by the time we reach kill(), a process for which we
+ * get a valid proc here might have terminated on its own. There's no way
+ * to acquire a lock on an arbitrary process to prevent that. But since
+ * this mechanism is usually used to debug a backend or an auxiliary
+ * process running and consuming lots of memory, it is not a problem if it
+ * ends on its own first and its memory contexts are not returned.
+ */
+ if (proc == NULL)
+ {
+ /*
+ * This is just a warning so a loop-through-resultset will not abort
+ * if one backend terminated on its own during the run.
+ */
+ ereport(WARNING,
+ (errmsg("PID %d is not a PostgreSQL server process", pid)));
+ PG_RETURN_NULL();
+ }
+
+ procNumber = GetNumberFromPGProc(proc);
+ if (procNumber == MyProcNumber)
+ {
+ ereport(WARNING,
+ (errmsg("cannot return statistics for local backend"),
+ errhint("Use pg_backend_memory_contexts view instead.")));
+ PG_RETURN_NULL();
+ }
+
+ /*
+ * Return statistics of top level 1 and 2 contexts, if get_summary is
+ * true.
+ */
+ LWLockAcquire(&memCtxState[procNumber].lw_lock, LW_EXCLUSIVE);
+ memCtxState[procNumber].get_summary = get_summary;
+
+ /*
+ * Create a DSA segment with a maximum size of 16MB and pass its handle to
+ * the publishing process for storing the stats. If the statistics exceed
+ * 16MB, a cumulative total is stored for the contexts that do not fit.
+ */
+ if (memCtxState[procNumber].memstats_dsa_handle == DSA_HANDLE_INVALID)
+ {
+ oldContext = MemoryContextSwitchTo(TopMemoryContext);
+ area = dsa_create_ext(memCtxState[procNumber].lw_lock.tranche, DSA_DEFAULT_INIT_SEGMENT_SIZE,
+ 16 * DSA_DEFAULT_INIT_SEGMENT_SIZE);
+ MemoryContextSwitchTo(oldContext);
+ handle = dsa_get_handle(area);
+ memCtxState[procNumber].memstats_dsa_handle = handle;
+ /* Pin the mapping so that it doesn't throw a warning */
+ dsa_pin(area);
+ dsa_pin_mapping(area);
+ }
+ else
+ {
+ area = dsa_attach(memCtxState[procNumber].memstats_dsa_handle);
+ dsa_pin_mapping(area);
+ }
+ LWLockRelease(&memCtxState[procNumber].lw_lock);
+ if (SendProcSignal(pid, PROCSIG_GET_MEMORY_CONTEXT, procNumber) < 0)
+ {
+ ereport(WARNING,
+ (errmsg("could not send signal to process %d: %m", pid)));
+ dsa_detach(area);
+ PG_RETURN_NULL();
+ }
+
+ /*
+ * Wait for a backend to publish stats, indicated by a valid dsa pointer
+ * set by the backend.
+ */
+ ConditionVariablePrepareToSleep(&memCtxState[procNumber].memctx_cv);
+ while (1)
+ {
+ proc = BackendPidGetProc(pid);
+ if (proc == NULL)
+ proc = AuxiliaryPidGetProc(pid);
+ if (proc == NULL)
+ {
+ ereport(WARNING,
+ (errmsg("PID %d is not a PostgreSQL server process", pid)));
+ dsa_detach(area);
+ PG_RETURN_NULL();
+ }
+#define MEMSTATS_WAIT_TIMEOUT 5000
+#define MAX_RETRIES 20
+ if (ConditionVariableTimedSleep(&memCtxState[procNumber].memctx_cv, MEMSTATS_WAIT_TIMEOUT,
+ WAIT_EVENT_MEM_CTX_PUBLISH))
+ {
+ ereport(LOG,
+ (errmsg("wait for process %d to publish statistics timed out, trying again", pid)));
+ if (num_retries > MAX_RETRIES)
+ {
+ dsa_detach(area);
+ PG_RETURN_NULL();
+ }
+ num_retries = num_retries + 1;
+ }
+
+ /*
+ * We expect to come out of sleep when the requested process has
+ * finished publishing the statistics, verified using a valid DSA
+ * pointer.
+ *
+ * Make sure that the information belongs to the pid we requested
+ * information for; otherwise loop back and wait for the server
+ * process to finish publishing statistics.
+ */
+ LWLockAcquire(&memCtxState[procNumber].lw_lock, LW_EXCLUSIVE);
+
+ if (memCtxState[procNumber].proc_id == pid && DsaPointerIsValid(memCtxState[procNumber].memstats_dsa_pointer))
+ break;
+ else
+ LWLockRelease(&memCtxState[procNumber].lw_lock);
+ }
+ if (DsaPointerIsValid(memCtxState[procNumber].memstats_dsa_pointer))
+ memctx_info = (MemoryContextInfo *) dsa_get_address(area, memCtxState[procNumber].memstats_dsa_pointer);
+ /* Backend has finished publishing the stats, read them */
+ for (i = 0; i < memCtxState[procNumber].in_memory_stats; i++)
+ {
+ ArrayType *path_array;
+ int path_length;
+ Datum values[PG_GET_BACKEND_MEMORY_CONTEXTS_COLS];
+ bool nulls[PG_GET_BACKEND_MEMORY_CONTEXTS_COLS];
+
+ memset(values, 0, sizeof(values));
+ memset(nulls, 0, sizeof(nulls));
+
+ if (strlen(memctx_info[i].name) != 0)
+ values[0] = CStringGetTextDatum(memctx_info[i].name);
+ else
+ nulls[0] = true;
+ if (strlen(memctx_info[i].ident) != 0)
+ values[1] = CStringGetTextDatum(memctx_info[i].ident);
+ else
+ nulls[1] = true;
+
+ values[2] = CStringGetTextDatum(memctx_info[i].type);
+ path_length = memctx_info[i].path_length;
+ path_array = construct_array_builtin(memctx_info[i].path, path_length, INT4OID);
+ values[3] = PointerGetDatum(path_array);
+ values[4] = Int64GetDatum(memctx_info[i].totalspace);
+ values[5] = Int64GetDatum(memctx_info[i].nblocks);
+ values[6] = Int64GetDatum(memctx_info[i].freespace);
+ values[7] = Int64GetDatum(memctx_info[i].freechunks);
+ values[8] = Int64GetDatum(memctx_info[i].totalspace - memctx_info[i].freespace);
+ values[9] = Int32GetDatum(memctx_info[i].num_contexts);
+ tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc, values, nulls);
+ }
+ /* If there are more contexts, display a cumulative total of those */
+ if (memCtxState[procNumber].total_stats > i)
+ {
+ Datum values[PG_GET_BACKEND_MEMORY_CONTEXTS_COLS] = {0};
+ bool nulls[PG_GET_BACKEND_MEMORY_CONTEXTS_COLS] = {0};
+
+ values[0] = CStringGetTextDatum(memctx_info[i].name);
+ nulls[1] = true;
+ nulls[2] = true;
+ nulls[3] = true;
+ values[4] = Int64GetDatum(memctx_info[i].totalspace);
+ values[5] = Int64GetDatum(memctx_info[i].nblocks);
+ values[6] = Int64GetDatum(memctx_info[i].freespace);
+ values[7] = Int64GetDatum(memctx_info[i].freechunks);
+ values[8] = Int64GetDatum(memctx_info[i].totalspace - memctx_info[i].freespace);
+ values[9] = Int32GetDatum(memctx_info[i].num_contexts);
+ tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc, values, nulls);
+ }
+ LWLockRelease(&memCtxState[procNumber].lw_lock);
+ ConditionVariableCancelSleep();
+ dsa_detach(area);
+ PG_RETURN_NULL();
+}
+
+static Size
+MemCtxShmemSize(void)
+{
+ Size size;
+ Size TotalProcs = add_size(MaxBackends, add_size(NUM_AUXILIARY_PROCS, max_prepared_xacts));
+
+ size = TotalProcs * sizeof(MemoryContextState);
+ return size;
+}
+
+void
+MemCtxShmemInit(void)
+{
+ bool found;
+ Size TotalProcs = add_size(MaxBackends, add_size(NUM_AUXILIARY_PROCS, max_prepared_xacts));
+
+ memCtxState = (MemoryContextState *) ShmemInitStruct("MemoryContextState",
+ MemCtxShmemSize(),
+ &found);
+ if (!found)
+ {
+ for (int i = 0; i < TotalProcs; i++)
+ {
+ ConditionVariableInit(&memCtxState[i].memctx_cv);
+ LWLockInitialize(&memCtxState[i].lw_lock, LWLockNewTrancheId());
+ LWLockRegisterTranche(memCtxState[i].lw_lock.tranche, "mem_context_stats_reporting");
+ memCtxState[i].memstats_dsa_handle = DSA_HANDLE_INVALID;
+ memCtxState[i].memstats_dsa_pointer = InvalidDsaPointer;
+ }
+ }
+}
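A note on how long pg_get_process_memory_contexts() can block with the constants above. The following is only a standalone sketch of the retry accounting in the wait loop, assuming every sleep times out and the target process stays alive; the function names are made up for illustration and are not part of the patch.

```c
#include <assert.h>

#define MEMSTATS_WAIT_TIMEOUT 5000	/* milliseconds, as in the patch */
#define MAX_RETRIES 20				/* as in the patch */

/*
 * Count the timed-out sleeps that occur before the loop gives up.
 * Mirrors the patch's order of operations: the retry counter is
 * compared against MAX_RETRIES first and incremented afterwards.
 */
static int
timeouts_before_giving_up(void)
{
	int			num_retries = 0;
	int			timeouts = 0;

	for (;;)
	{
		timeouts++;				/* stands in for a timed-out sleep */
		if (num_retries > MAX_RETRIES)
			return timeouts;
		num_retries = num_retries + 1;
	}
}

/* Worst-case blocking time in milliseconds under the assumption above. */
static long
worst_case_wait_ms(void)
{
	return (long) timeouts_before_giving_up() * MEMSTATS_WAIT_TIMEOUT;
}
```

Because the comparison uses ">" and happens before the increment, the loop returns on the 22nd timeout rather than the 20th, so a caller may wait close to two minutes before getting an empty result.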
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index b844f9fdae..6bc253da5d 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -42,6 +42,7 @@ volatile sig_atomic_t IdleStatsUpdateTimeoutPending = false;
volatile uint32 InterruptHoldoffCount = 0;
volatile uint32 QueryCancelHoldoffCount = 0;
volatile uint32 CritSectionCount = 0;
+volatile sig_atomic_t PublishMemoryContextPending = false;
int MyProcPid;
pg_time_t MyStartTime;
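For readers less familiar with the pending-flag convention used here, the sketch below shows the general pattern in plain C: the signal handler only records the request, and the real work happens later at a safe point in the main loop. It is a simplified illustration assuming a POSIX system (for SIGUSR1); all names in it are hypothetical stand-ins and not PostgreSQL code.

```c
#include <assert.h>
#include <signal.h>

/* Hypothetical stand-in for PublishMemoryContextPending. */
static volatile sig_atomic_t publish_pending = 0;

/* Counts how many requests were actually serviced. */
static int	publish_count = 0;

/* Signal handler: only sets the flag, which is async-signal-safe. */
static void
handle_publish_request(int signo)
{
	(void) signo;
	publish_pending = 1;
}

/* Called from the main loop at a safe point; does the deferred work. */
static void
process_publish_request(void)
{
	if (!publish_pending)
		return;
	publish_pending = 0;
	publish_count++;			/* placeholder for copying the statistics */
}

/* Install the handler, deliver one request to ourselves, service it. */
static int
demo_one_request(void)
{
	signal(SIGUSR1, handle_publish_request);
	raise(SIGUSR1);				/* returns after the handler has run */
	process_publish_request();
	process_publish_request();	/* no-op: the flag was already cleared */
	return publish_count;
}
```

The flag is cleared before the work is done, so a request arriving while the work is in progress sets it again and is picked up on the next pass.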
diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index aa6da0d035..245aba5987 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -19,16 +19,22 @@
*-------------------------------------------------------------------------
*/
+#include <math.h>
#include "postgres.h"
#include "mb/pg_wchar.h"
#include "miscadmin.h"
+#include "nodes/pg_list.h"
+#include "storage/fd.h"
+#include "storage/lwlock.h"
+#include "storage/dsm.h"
+#include "utils/dsa.h"
+#include "utils/hsearch.h"
#include "utils/memdebug.h"
#include "utils/memutils.h"
#include "utils/memutils_internal.h"
#include "utils/memutils_memorychunk.h"
-
static void BogusFree(void *pointer);
static void *BogusRealloc(void *pointer, Size size, int flags);
static MemoryContext BogusGetChunkContext(void *pointer);
@@ -135,6 +141,12 @@ static const MemoryContextMethods mcxt_methods[] = {
};
#undef BOGUS_MCTX
+typedef enum PrintDetails
+{
+ PRINT_STATS_TO_STDERR = 0,
+ PRINT_STATS_TO_LOGS,
+ PRINT_STATS_NONE
+} PrintDetails;
/*
* CurrentMemoryContext
@@ -162,10 +174,11 @@ static void MemoryContextCallResetCallbacks(MemoryContext context);
static void MemoryContextStatsInternal(MemoryContext context, int level,
int max_level, int max_children,
MemoryContextCounters *totals,
- bool print_to_stderr);
+ PrintDetails print_location, int *num_contexts);
static void MemoryContextStatsPrint(MemoryContext context, void *passthru,
const char *stats_string,
bool print_to_stderr);
+static void PublishMemoryContext(MemoryContextInfo * memctx_infos, int curr_id, MemoryContext context, List *path, char *clipped_ident, MemoryContextCounters stat, int num_contexts);
/*
* You should not do memory allocations within a critical section, because
@@ -831,11 +844,19 @@ MemoryContextStatsDetail(MemoryContext context,
bool print_to_stderr)
{
MemoryContextCounters grand_totals;
+ int num_contexts = 0;
+ PrintDetails print_location;
memset(&grand_totals, 0, sizeof(grand_totals));
+ if (print_to_stderr)
+ print_location = PRINT_STATS_TO_STDERR;
+ else
+ print_location = PRINT_STATS_TO_LOGS;
+
+ /* num_contexts reports the number of contexts aggregated in the output */
MemoryContextStatsInternal(context, 0, max_level, max_children,
- &grand_totals, print_to_stderr);
+ &grand_totals, print_location, &num_contexts);
if (print_to_stderr)
fprintf(stderr,
@@ -876,18 +897,43 @@ static void
MemoryContextStatsInternal(MemoryContext context, int level,
int max_level, int max_children,
MemoryContextCounters *totals,
- bool print_to_stderr)
+ PrintDetails print_location, int *num_contexts)
{
MemoryContext child;
int ichild;
+ bool print_to_stderr = true;
Assert(MemoryContextIsValid(context));
- /* Examine the context itself */
- context->methods->stats(context,
- MemoryContextStatsPrint,
- &level,
- totals, print_to_stderr);
+ if (print_location == PRINT_STATS_TO_STDERR)
+ print_to_stderr = true;
+ else if (print_location == PRINT_STATS_TO_LOGS)
+ print_to_stderr = false;
+
+ if (print_location != PRINT_STATS_NONE)
+ {
+ /* Examine the context itself */
+ context->methods->stats(context,
+ MemoryContextStatsPrint,
+ &level,
+ totals, print_to_stderr);
+ }
+ else
+ {
+ /*
+ * Do not print the statistics; only accumulate the totals.
+ *
+ * print_to_stderr is a no-op if no statistics are going to be printed,
+ * i.e. print_location == PRINT_STATS_NONE.
+ */
+ context->methods->stats(context,
+ NULL,
+ NULL,
+ totals, print_to_stderr);
+ }
+
+ /* Increment the context count */
+ *num_contexts = *num_contexts + 1;
/*
* Examine children.
@@ -907,7 +953,7 @@ MemoryContextStatsInternal(MemoryContext context, int level,
MemoryContextStatsInternal(child, level + 1,
max_level, max_children,
totals,
- print_to_stderr);
+ print_location, num_contexts);
}
}
@@ -925,6 +971,7 @@ MemoryContextStatsInternal(MemoryContext context, int level,
ichild++;
child = MemoryContextTraverseNext(child, context);
}
+ *num_contexts = *num_contexts + ichild;
if (print_to_stderr)
{
@@ -939,7 +986,7 @@ MemoryContextStatsInternal(MemoryContext context, int level,
local_totals.freechunks,
local_totals.totalspace - local_totals.freespace);
}
- else
+ else if (print_location != PRINT_STATS_NONE)
ereport(LOG_SERVER_ONLY,
(errhidestmt(true),
errhidecontext(true),
@@ -1276,6 +1323,21 @@ HandleLogMemoryContextInterrupt(void)
/* latch will be set by procsignal_sigusr1_handler */
}
+/*
+ * HandleGetMemoryContextInterrupt
+ * Handle receipt of an interrupt indicating publishing of memory
+ * contexts.
+ *
+ * All the actual work is deferred to ProcessGetMemoryContextInterrupt()
+ */
+void
+HandleGetMemoryContextInterrupt(void)
+{
+ InterruptPending = true;
+ PublishMemoryContextPending = true;
+ /* latch will be set by procsignal_sigusr1_handler */
+}
+
/*
* ProcessLogMemoryContextInterrupt
* Perform logging of memory contexts of this backend process.
@@ -1313,6 +1375,346 @@ ProcessLogMemoryContextInterrupt(void)
MemoryContextStatsDetail(TopMemoryContext, 100, 100, false);
}
+/*
+ * Run by each backend to publish its memory context
+ * statistics. It performs a breadth-first traversal
+ * of the memory context tree, so that the parents
+ * get a chance to report stats before their children.
+ *
+ * Statistics are shared via dynamic shared memory, which
+ * can hold statistics of approximately 6700 contexts. The
+ * remaining contexts' statistics are captured as a cumulative total.
+ */
+void
+ProcessGetMemoryContextInterrupt(void)
+{
+ /* Store the memory context details in shared memory */
+
+ List *contexts;
+
+ HASHCTL ctl;
+ HTAB *context_id_lookup;
+ int context_id = 0;
+ bool found;
+ MemoryContext stat_cxt;
+ MemoryContextInfo *meminfo;
+ bool get_summary = false;
+ dsa_area *area;
+ int num_stats;
+ int idx = MyProcNumber;
+ int stats_count = 0;
+ MemoryContextCounters stat;
+
+ PublishMemoryContextPending = false;
+
+ /*
+ * The hash table is used for constructing the "path" column of
+ * pg_get_process_memory_contexts, similar to its local backend
+ * counterpart.
+ */
+
+ /*
+ * Make a new context that will contain the hash table, to ease the
+ * cleanup
+ */
+
+ stat_cxt = AllocSetContextCreate(CurrentMemoryContext,
+ "Memory context statistics",
+ ALLOCSET_DEFAULT_SIZES);
+
+ ctl.keysize = sizeof(MemoryContext);
+ ctl.entrysize = sizeof(MemoryContextId);
+ ctl.hcxt = stat_cxt;
+
+ context_id_lookup = hash_create("pg_get_remote_backend_memory_contexts",
+ 256,
+ &ctl,
+ HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+
+ contexts = list_make1(TopMemoryContext);
+
+ /* Compute the number of stats that can fit in the DSM seg */
+ num_stats = floor(16 * DSA_DEFAULT_INIT_SEGMENT_SIZE / sizeof(MemoryContextInfo));
+
+ /*
+ * Traverse the memory context tree to find total number of contexts. If
+ * summary is requested find the total number of contexts at level 1 and
+ * 2.
+ */
+ foreach_ptr(MemoryContextData, cur, contexts)
+ {
+ MemoryContextId *entry;
+
+ entry = (MemoryContextId *) hash_search(context_id_lookup, &cur,
+ HASH_ENTER, &found);
+ stats_count = stats_count + 1;
+ /* context id starts with 1 */
+ entry->context_id = stats_count;
+
+ /* Append the children of the current context to the main list */
+ for (MemoryContext c = cur->firstchild; c != NULL; c = c->nextchild)
+ {
+ if (get_summary)
+ {
+ entry = (MemoryContextId *) hash_search(context_id_lookup, &c,
+ HASH_ENTER, &found);
+ stats_count = stats_count + 1;
+ entry->context_id = stats_count;
+ }
+ contexts = lappend(contexts, c);
+ }
+ /* In summary only the first level contexts are displayed */
+ if (get_summary)
+ break;
+ }
+
+ /*
+ * Allocate memory in this process's DSA for storing statistics of the
+ * memory contexts up to num_stats. For contexts that don't fit in the DSA
+ * segment, a cumulative total is written as the last record in the DSA
+ * segment.
+ */
+ stats_count = stats_count > num_stats ? num_stats : stats_count;
+
+ /* Attach to DSA segment */
+ LWLockAcquire(&memCtxState[idx].lw_lock, LW_EXCLUSIVE);
+ area = dsa_attach(memCtxState[idx].memstats_dsa_handle);
+ memCtxState[idx].proc_id = MyProcPid;
+ get_summary = memCtxState[idx].get_summary;
+
+ /* Free the memory allocated previously by the same process */
+ if (DsaPointerIsValid(memCtxState[idx].memstats_dsa_pointer))
+ {
+ dsa_free(area, memCtxState[idx].memstats_dsa_pointer);
+ memCtxState[idx].memstats_dsa_pointer = InvalidDsaPointer;
+ }
+ memCtxState[idx].memstats_dsa_pointer = dsa_allocate0(area, stats_count * sizeof(MemoryContextInfo));
+ meminfo = (MemoryContextInfo *) dsa_get_address(area, memCtxState[idx].memstats_dsa_pointer);
+
+ if (get_summary)
+ {
+ int ctx_id = 0;
+ List *path = NIL;
+
+ /* Copy TopMemoryContext statistics to DSA */
+ path = lcons_int(1, path);
+ memset(&stat, 0, sizeof(stat));
+ (*TopMemoryContext->methods->stats) (TopMemoryContext, NULL, NULL, &stat, true);
+ PublishMemoryContext(meminfo, ctx_id, TopMemoryContext, path, NULL, stat, 1);
+ ctx_id = ctx_id + 1;
+
+ /*
+ * Copy statistics for each of TopMemoryContext's children (XXX: make it
+ * capped at 100). This includes statistics of all of their children
+ * up to level 100.
+ */
+ for (MemoryContext c = TopMemoryContext->firstchild; c != NULL; c = c->nextchild)
+ {
+ MemoryContextCounters grand_totals;
+ int num_contexts = 0;
+ char clipped_ident[MEMORY_CONTEXT_IDENT_DISPLAY_SIZE];
+
+ path = NIL;
+ memset(&grand_totals, 0, sizeof(grand_totals));
+
+ MemoryContextStatsInternal(c, 0, 100, 100, &grand_totals, PRINT_STATS_NONE, &num_contexts);
+
+ /*
+ * Figure out the transient context_id of this context and each of
+ * its ancestors.
+ */
+ for (MemoryContext cur_context = c; cur_context != NULL; cur_context = cur_context->parent)
+ {
+ MemoryContextId *cur_entry;
+
+ cur_entry = hash_search(context_id_lookup, &cur_context, HASH_FIND, &found);
+
+ if (!found)
+ {
+ elog(LOG, "hash table corrupted, can't construct path value");
+ break;
+ }
+ path = lcons_int(cur_entry->context_id, path);
+ }
+ /* Trim and copy the identifier if it is not set to NULL */
+ if (c->ident != NULL)
+ {
+ int idlen = strlen(c->ident);
+
+ /*
+ * Some identifiers such as SQL query string can be very long,
+ * truncate oversize identifiers.
+ */
+ if (idlen >= MEMORY_CONTEXT_IDENT_DISPLAY_SIZE)
+ idlen = pg_mbcliplen(c->ident, idlen, MEMORY_CONTEXT_IDENT_DISPLAY_SIZE - 1);
+
+ memcpy(clipped_ident, c->ident, idlen);
+ clipped_ident[idlen] = '\0';
+ }
+ PublishMemoryContext(meminfo, ctx_id, c, path, (c->ident != NULL ? clipped_ident : NULL), grand_totals, num_contexts);
+ ctx_id = ctx_id + 1;
+ }
+ /* For summary mode, total_stats and in_memory_stats remain the same */
+ memCtxState[idx].in_memory_stats = ctx_id;
+ memCtxState[idx].total_stats = ctx_id;
+ goto cleanup;
+ }
+
+ foreach_ptr(MemoryContextData, cur, contexts)
+ {
+ List *path = NIL;
+ char clipped_ident[MEMORY_CONTEXT_IDENT_DISPLAY_SIZE];
+
+ /*
+ * Figure out the transient context_id of this context and each of its
+ * ancestors.
+ */
+ for (MemoryContext cur_context = cur; cur_context != NULL; cur_context = cur_context->parent)
+ {
+ MemoryContextId *cur_entry;
+
+ cur_entry = hash_search(context_id_lookup, &cur_context, HASH_FIND, &found);
+
+ if (!found)
+ {
+ elog(LOG, "hash table corrupted, can't construct path value");
+ break;
+ }
+ path = lcons_int(cur_entry->context_id, path);
+ }
+ /* Trim and copy the identifier if it is not set to NULL */
+ if (cur->ident != NULL)
+ {
+ int idlen = strlen(cur->ident);
+
+ /*
+ * Some identifiers such as SQL query string can be very long,
+ * truncate oversize identifiers.
+ */
+ if (idlen >= MEMORY_CONTEXT_IDENT_DISPLAY_SIZE)
+ idlen = pg_mbcliplen(cur->ident, idlen, MEMORY_CONTEXT_IDENT_DISPLAY_SIZE - 1);
+
+ memcpy(clipped_ident, cur->ident, idlen);
+ clipped_ident[idlen] = '\0';
+ }
+ if (context_id <= (num_stats - 2))
+ {
+ /* Examine the context stats */
+ memset(&stat, 0, sizeof(stat));
+ (*cur->methods->stats) (cur, NULL, NULL, &stat, true);
+ /* Copy statistics to DSM memory */
+ PublishMemoryContext(meminfo, context_id, cur, path, (cur->ident != NULL ? clipped_ident : NULL), stat, 1);
+ }
+ else
+ {
+ /* Examine the context stats */
+ memset(&stat, 0, sizeof(stat));
+ (*cur->methods->stats) (cur, NULL, NULL, &stat, true);
+
+ meminfo[num_stats - 1].totalspace += stat.totalspace;
+ meminfo[num_stats - 1].nblocks += stat.nblocks;
+ meminfo[num_stats - 1].freespace += stat.freespace;
+ meminfo[num_stats - 1].freechunks += stat.freechunks;
+ }
+
+ /*
+ * DSA max limit is reached, write total of the remaining statistics.
+ */
+ if (context_id == (num_stats - 2) && context_id < (stats_count - 1))
+ {
+ memCtxState[idx].in_memory_stats = context_id + 1;
+ strncpy(meminfo[num_stats - 1].name, "Remaining Totals", 16);
+ }
+ context_id++;
+ }
+ if (context_id < (num_stats - 2))
+ {
+ memCtxState[idx].in_memory_stats = context_id;
+ }
+ /* Report number of aggregated memory contexts */
+ else
+ {
+ meminfo[num_stats - 1].num_contexts = context_id - memCtxState[idx].in_memory_stats;
+ }
+ memCtxState[idx].total_stats = context_id;
+cleanup:
+
+ /*
+ * Signal all the waiting client backends after setting the exit condition
+ * flag
+ */
+ LWLockRelease(&memCtxState[idx].lw_lock);
+ ConditionVariableBroadcast(&memCtxState[idx].memctx_cv);
+ /* Delete the hash table memory context */
+ MemoryContextDelete(stat_cxt);
+
+ dsa_detach(area);
+}
+
+static void
+PublishMemoryContext(MemoryContextInfo * memctx_info, int curr_id, MemoryContext context, List *path, char *clipped_ident, MemoryContextCounters stat, int num_contexts)
+{
+ char *type;
+
+ if (context->name != NULL)
+ {
+ Assert(strlen(context->name) < MEMORY_CONTEXT_IDENT_DISPLAY_SIZE);
+ strncpy(memctx_info[curr_id].name, context->name, strlen(context->name));
+ }
+ else
+ memctx_info[curr_id].name[0] = '\0';
+
+ if (clipped_ident != NULL)
+ {
+ /*
+ * To be consistent with logging output, we label dynahash contexts
+ * with just the hash table name as with MemoryContextStatsPrint().
+ */
+ if (!strncmp(context->name, "dynahash", 8))
+ {
+ strncpy(memctx_info[curr_id].name, clipped_ident, strlen(clipped_ident) + 1);
+ memctx_info[curr_id].ident[0] = '\0';
+ }
+ else
+ strncpy(memctx_info[curr_id].ident, clipped_ident, strlen(clipped_ident));
+ }
+ else
+ memctx_info[curr_id].ident[0] = '\0';
+
+ memctx_info[curr_id].path_length = list_length(path);
+ foreach_int(i, path)
+ memctx_info[curr_id].path[foreach_current_index(i)] = Int32GetDatum(i);
+
+ switch (context->type)
+ {
+ case T_AllocSetContext:
+ type = "AllocSet";
+ strncpy(memctx_info[curr_id].type, type, strlen(type));
+ break;
+ case T_GenerationContext:
+ type = "Generation";
+ strncpy(memctx_info[curr_id].type, type, strlen(type));
+ break;
+ case T_SlabContext:
+ type = "Slab";
+ strncpy(memctx_info[curr_id].type, type, strlen(type));
+ break;
+ case T_BumpContext:
+ type = "Bump";
+ strncpy(memctx_info[curr_id].type, type, strlen(type));
+ break;
+ default:
+ type = "???";
+ strncpy(memctx_info[curr_id].type, type, strlen(type));
+ break;
+ }
+ memctx_info[curr_id].totalspace = stat.totalspace;
+ memctx_info[curr_id].nblocks = stat.nblocks;
+ memctx_info[curr_id].freespace = stat.freespace;
+ memctx_info[curr_id].freechunks = stat.freechunks;
+ memctx_info[curr_id].num_contexts = num_contexts;
+}
+
void *
palloc(Size size)
{
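To make the "Remaining Totals" behaviour easier to reason about, here is a standalone sketch of the idea: store individual records while there is room, and fold everything that does not fit into one cumulative record in the last slot. It is a simplification under stated assumptions, not the patch's code: the Stat struct and publish_with_overflow() are hypothetical, and the exact boundary handling in ProcessGetMemoryContextInterrupt() differs slightly.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-in for MemoryContextInfo. */
typedef struct Stat
{
	char		name[32];
	long		totalspace;
	int			num_contexts;
} Stat;

/*
 * Copy n input records into an output array of the given capacity.
 * If everything fits, each record is copied individually.  Otherwise
 * the first capacity - 1 records are copied and the rest are summed
 * into the last slot.  Returns the number of output slots used.
 */
static size_t
publish_with_overflow(const Stat *in, size_t n, Stat *out, size_t capacity)
{
	size_t		i;
	size_t		individual;

	assert(capacity >= 2);
	memset(out, 0, capacity * sizeof(Stat));

	individual = (n <= capacity) ? n : capacity - 1;
	for (i = 0; i < individual; i++)
	{
		out[i] = in[i];
		out[i].num_contexts = 1;
	}
	if (n <= capacity)
		return n;

	/* Fold the records that did not fit into one cumulative record. */
	strcpy(out[capacity - 1].name, "Remaining Totals");
	for (; i < n; i++)
	{
		out[capacity - 1].totalspace += in[i].totalspace;
		out[capacity - 1].num_contexts++;
	}
	return capacity;
}
```

With five input records and a capacity of three, the first two are reported as-is and the last slot carries the sum of the other three, with num_contexts set to 3, which matches how num_agg_contexts is meant to be read.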
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index b37e8a6f88..4d6ae0728a 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8449,6 +8449,16 @@
prorettype => 'bool', proargtypes => 'int4',
prosrc => 'pg_log_backend_memory_contexts' },
+# publishing memory contexts of the specified postgres process
+{ oid => '2173', descr => 'publish memory contexts of the specified backend',
+ proname => 'pg_get_process_memory_contexts', provolatile => 'v',
+ prorows => '100', proretset => 't', proparallel => 'r',
+ prorettype => 'record', proargtypes => 'int4 bool',
+ proallargtypes => '{int4,bool,text,text,text,_int4,int8,int8,int8,int8,int8,int4}',
+ proargmodes => '{i,i,o,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{pid, get_summary, name, ident, type, path, total_bytes, total_nblocks, free_bytes, free_chunks, used_bytes, num_agg_contexts}',
+ prosrc => 'pg_get_process_memory_contexts' },
+
# non-persistent series generator
{ oid => '1066', descr => 'non-persistent series generator',
proname => 'generate_series', prorows => '1000',
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index d016a9c924..fc75ea143c 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -96,6 +96,7 @@ extern PGDLLIMPORT volatile sig_atomic_t IdleSessionTimeoutPending;
extern PGDLLIMPORT volatile sig_atomic_t ProcSignalBarrierPending;
extern PGDLLIMPORT volatile sig_atomic_t LogMemoryContextPending;
extern PGDLLIMPORT volatile sig_atomic_t IdleStatsUpdateTimeoutPending;
+extern PGDLLIMPORT volatile sig_atomic_t PublishMemoryContextPending;
extern PGDLLIMPORT volatile sig_atomic_t CheckClientConnectionPending;
extern PGDLLIMPORT volatile sig_atomic_t ClientConnectionLost;
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index 022fd8ed93..477ab99338 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -35,6 +35,7 @@ typedef enum
PROCSIG_WALSND_INIT_STOPPING, /* ask walsenders to prepare for shutdown */
PROCSIG_BARRIER, /* global barrier interrupt */
PROCSIG_LOG_MEMORY_CONTEXT, /* ask backend to log the memory contexts */
+ PROCSIG_GET_MEMORY_CONTEXT, /* ask backend to publish the memory contexts */
PROCSIG_PARALLEL_APPLY_MESSAGE, /* Message from parallel apply workers */
/* Recovery conflict reasons */
diff --git a/src/include/utils/memutils.h b/src/include/utils/memutils.h
index 8abc26abce..9fac394aad 100644
--- a/src/include/utils/memutils.h
+++ b/src/include/utils/memutils.h
@@ -18,6 +18,9 @@
#define MEMUTILS_H
#include "nodes/memnodes.h"
+#include "storage/condition_variable.h"
+#include "storage/lmgr.h"
+#include "utils/dsa.h"
/*
@@ -48,7 +51,11 @@
#define AllocHugeSizeIsValid(size) ((Size) (size) <= MaxAllocHugeSize)
+#define MEMORY_CONTEXT_IDENT_DISPLAY_SIZE 1024
+#define MEM_CONTEXT_SHMEM_STATS_SIZE 30
+#define MEM_CONTEXT_MAX_LEVEL 64
+#define MAX_TYPE_STRING_LENGTH 64
/*
* Standard top-level memory contexts.
*
@@ -115,6 +122,50 @@ extern MemoryContext AllocSetContextCreateInternal(MemoryContext parent,
Size initBlockSize,
Size maxBlockSize);
+/* Dynamic shared memory state for Memory Context Statistics reporting */
+typedef struct MemoryContextInfo
+{
+ char name[MEMORY_CONTEXT_IDENT_DISPLAY_SIZE];
+ char ident[MEMORY_CONTEXT_IDENT_DISPLAY_SIZE];
+ Datum path[MEM_CONTEXT_MAX_LEVEL];
+ char type[MAX_TYPE_STRING_LENGTH];
+ int path_length;
+ int64 totalspace;
+ int64 nblocks;
+ int64 freespace;
+ int64 freechunks;
+ int num_contexts;
+} MemoryContextInfo;
+
+typedef struct MemoryContextState
+{
+ ConditionVariable memctx_cv;
+ LWLock lw_lock;
+ int proc_id;
+ int in_memory_stats;
+ int total_stats;
+ bool get_summary;
+ dsa_handle memstats_dsa_handle;
+ dsa_pointer memstats_dsa_pointer;
+
+} MemoryContextState;
+
+/*
+ * MemoryContextId
+ * Used for storage of transient identifiers for
+ * pg_get_backend_memory_contexts.
+ */
+typedef struct MemoryContextId
+{
+ MemoryContext context;
+ int context_id;
+} MemoryContextId;
+
+extern PGDLLIMPORT MemoryContextState * memCtxState;
+extern void ProcessGetMemoryContextInterrupt(void);
+extern void HandleGetMemoryContextInterrupt(void);
+extern void MemCtxShmemInit(void);
+
/*
* This wrapper macro exists to check for non-constant strings used as context
* names; that's no longer supported. (Use MemoryContextSetIdentifier if you
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 91089ac215..5e3382132c 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -223,3 +223,15 @@ select count(distinct utc_offset) >= 24 as ok from pg_timezone_abbrevs;
t
(1 row)
+DO $$
+DECLARE
+ checkpointer_pid int;
+ r RECORD;
+BEGIN
+ SELECT pid from pg_stat_activity where backend_type='checkpointer' INTO checkpointer_pid;
+
+ select type, name, ident
+ from pg_get_process_memory_contexts(checkpointer_pid, false) where path = '{0}' into r;
+ RAISE NOTICE '%', r;
+END $$;
+NOTICE: (AllocSet,TopMemoryContext,)
diff --git a/src/test/regress/sql/sysviews.sql b/src/test/regress/sql/sysviews.sql
index b2a7923754..f3127fea40 100644
--- a/src/test/regress/sql/sysviews.sql
+++ b/src/test/regress/sql/sysviews.sql
@@ -98,3 +98,15 @@ set timezone_abbreviations = 'Australia';
select count(distinct utc_offset) >= 24 as ok from pg_timezone_abbrevs;
set timezone_abbreviations = 'India';
select count(distinct utc_offset) >= 24 as ok from pg_timezone_abbrevs;
+
+DO $$
+DECLARE
+ checkpointer_pid int;
+ r RECORD;
+BEGIN
+ SELECT pid from pg_stat_activity where backend_type='checkpointer' INTO checkpointer_pid;
+
+ select type, name, ident
+ from pg_get_process_memory_contexts(checkpointer_pid, false) where path = '{0}' into r;
+ RAISE NOTICE '%', r;
+END $$;
--
2.34.1
On 2025/01/06 22:16, Rahila Syed wrote:
PFA the patch with above updates.
Thanks for updating the patch! I like this feature.
I tested this feature and encountered two issues:
Issue 1: Error with pg_get_process_memory_contexts()
When I used pg_get_process_memory_contexts() on the PID of a backend process
that had just caused an error but hadn’t rolled back yet,
the following error occurred:
Session 1 (PID=70011):
=# begin;
=# select 1/0;
ERROR: division by zero
Session 2:
=# select * from pg_get_process_memory_contexts(70011, false);
Session 1 terminated with:
ERROR: ResourceOwnerEnlarge called after release started
FATAL: terminating connection because protocol synchronization was lost
Issue 2: Segmentation Fault
When I ran pg_get_process_memory_contexts() every 0.1 seconds using
\watch command while running "make -j 4 installcheck-world",
I encountered a segmentation fault:
LOG: client backend (PID 97975) was terminated by signal 11: Segmentation fault: 11
DETAIL: Failed process was running: select infinite_recurse();
LOG: terminating any other active server processes
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
Hi Rahila,
Thanks for the updated and rebased patch. I've tried the pgbench test
again, to see if it gets stuck somewhere, and I'm observing this on a
new / idle cluster:
$ pgbench -n -f test.sql -P 1 test -T 60
pgbench (18devel)
progress: 1.0 s, 1647.9 tps, lat 0.604 ms stddev 0.438, 0 failed
progress: 2.0 s, 1374.3 tps, lat 0.727 ms stddev 0.386, 0 failed
progress: 3.0 s, 1514.4 tps, lat 0.661 ms stddev 0.330, 0 failed
progress: 4.0 s, 1563.4 tps, lat 0.639 ms stddev 0.212, 0 failed
progress: 5.0 s, 1665.0 tps, lat 0.600 ms stddev 0.177, 0 failed
progress: 6.0 s, 1538.0 tps, lat 0.650 ms stddev 0.192, 0 failed
progress: 7.0 s, 1491.4 tps, lat 0.670 ms stddev 0.261, 0 failed
progress: 8.0 s, 1539.5 tps, lat 0.649 ms stddev 0.443, 0 failed
progress: 9.0 s, 1517.0 tps, lat 0.659 ms stddev 0.167, 0 failed
progress: 10.0 s, 1594.0 tps, lat 0.627 ms stddev 0.227, 0 failed
progress: 11.0 s, 28.0 tps, lat 0.705 ms stddev 0.277, 0 failed
progress: 12.0 s, 0.0 tps, lat 0.000 ms stddev 0.000, 0 failed
progress: 13.0 s, 0.0 tps, lat 0.000 ms stddev 0.000, 0 failed
progress: 14.0 s, 0.0 tps, lat 0.000 ms stddev 0.000, 0 failed
progress: 15.0 s, 0.0 tps, lat 0.000 ms stddev 0.000, 0 failed
progress: 16.0 s, 1480.6 tps, lat 4.043 ms stddev 130.113, 0 failed
progress: 17.0 s, 1524.9 tps, lat 0.655 ms stddev 0.286, 0 failed
progress: 18.0 s, 1246.0 tps, lat 0.802 ms stddev 0.330, 0 failed
progress: 19.0 s, 1383.1 tps, lat 0.722 ms stddev 0.934, 0 failed
progress: 20.0 s, 1432.7 tps, lat 0.698 ms stddev 0.199, 0 failed
...
There's always a period of 10-15 seconds when everything seems to be
working fine, and then a couple seconds when it gets stuck, with the usual
LOG: Wait for 69454 process to publish stats timed out, trying again
The PIDs I've seen were for checkpointer, autovacuum launcher, ... all
of those are processes that should be handling the signal, so how come it
gets stuck every now and then? The system is entirely idle, there's no
contention for the shmem stuff, etc. Could it be forgetting about the
signal in some cases, or something like that?
The test.sql is super simple:
SELECT * FROM pg_get_process_memory_contexts(
(SELECT pid FROM pg_stat_activity
WHERE pid != pg_backend_pid()
ORDER BY random() LIMIT 1)
, false);
Aside from this, I went through the patch to do a regular review, so
here are the main comments in somewhat random order:
1) The SGML docs talk about "contexts at level" but I don't think that's
defined/explained anywhere, there are different ways to assign levels in
a tree-like structure, so it's unclear if levels are assigned from the
top or bottom.
2) volatile sig_atomic_t PublishMemoryContextPending = false;
I'd move this right after LogMemoryContextPending (to match the other
places that add new stuff).
3) typedef enum PrintDetails
I suppose this should have some comments, explaining what the typedef is
for. Also, "details" sounds pretty generic, perhaps "destination" or
maybe "target" would be better?
4) The strncpy here seems unnecessary - the string is going to be static
in the binary, no need to copy it. In which case the whole switch is
going to be the same as in PutMemoryContextsStatsTupleStore, so maybe
move that into a separate function?
+ switch (context->type)
+ {
+ case T_AllocSetContext:
+ type = "AllocSet";
+ strncpy(memctx_info[curr_id].type, type, strlen(type));
+ break;
+ case T_GenerationContext:
+ type = "Generation";
+ strncpy(memctx_info[curr_id].type, type, strlen(type));
+ break;
+ case T_SlabContext:
+ type = "Slab";
+ strncpy(memctx_info[curr_id].type, type, strlen(type));
+ break;
+ case T_BumpContext:
+ type = "Bump";
+ strncpy(memctx_info[curr_id].type, type, strlen(type));
+ break;
+ default:
+ type = "???";
+ strncpy(memctx_info[curr_id].type, type, strlen(type));
+ break;
+ }
5) The comment about hash table in ProcessGetMemoryContextInterrupt
seems pretty far from hash_create(), so maybe move it.
6) ProcessGetMemoryContextInterrupt seems pretty long / complex, with
multiple nested loops, it'd be good to split it into smaller parts that
are easier to understand.
7) I'm not sure if/why we need to move MemoryContextId to memutils.h.
8) The new stuff in memutils.h is added to the wrong place, into a
section labeled "Memory-context-type-specific functions" (which it
certainly is not)
9) autovacuum.c adds the ProcessGetMemoryContextInterrupt() call after
ProcessCatchupInterrupt() - that's not wrong, but I'd move it right
after ProcessLogMemoryContextInterrupt(), just like everywhere else.
10) The pg_get_process_memory_contexts comment says:
Signal a backend or an auxiliary process to send its ...
But this is not just about the signal, it also waits for the results and
produces the result set.
11) pg_get_process_memory_contexts - Wouldn't it be better to move the
InitMaterializedSRF() call until after the privilege check, etc.?
12) The pg_get_process_memory_contexts comment should explain why it's
superuser-only function. Presumably it has similar DoS risks as the
other functions, because if not why would we have the restriction?
13) I reworded and expanded the pg_get_process_memory_contexts comment a
bit, and re-wrapped it too. But I think it also needs to explain how it
communicates with the other process (sending signal, sending data
through a DSA, ...). And also how the timeouts work.
14) I'm a bit confused about the DSA allocations (but I also haven't
worked with DSA very much, so maybe it's fine). Presumably the 16MB is
upper limit, we won't use that all the time. We allocate 1MB, but allow
it to grow up to 16MB, correct? 16MB seems like a lot, certainly enough
for this purpose - if it's not, I don't think we can come up with a
better limit.
15) In any case, I don't think the 16 should be hardcoded as a magic
constant in multiple places. That's bound to be error-prone.
16) I've reformatted / reindented / wrapped the code in various places,
to make it easier to read and more consistent with the nearby code. I
also added a bunch of comments explaining what the block of code is
meant to do (I mean, what it aims to do).
16) A comment in pg_get_process_memory_contexts says:
Pin the mapping so that it doesn't throw a warning
That doesn't seem very useful. It's not clear what kind of warning this
hides, but more importantly - we're not doing stuff to hide some sort of
warning, we do it to prevent what the warning is about.
17) pg_get_process_memory_contexts has a bunch of error cases, where we
need to detach the DSA and return NULL. Would be better to do a label
with a goto, I think.
18) I think pg_get_process_memory_contexts will have issues if this
happens in the first loop:
if ((memCtxState[procNumber].proc_id == pid) &&
DsaPointerIsValid(memCtxState[procNumber].memstats_dsa_pointer))
break;
Because then we end up with memctx_info pointing to garbage after the
loop. I don't know how hard it is to hit this, I guess it can happen in
many processes calling pg_get_process_memory_contexts?
19) Minor comment and formatting of MemCtxShmemSize / MemCtxShmemInit.
20) MemoryContextInfo etc. need to be added to typedefs.list, so that
pgindent can do the right thing.
21) I think ProcessGetMemoryContextInterrupt has a bug because it uses
get_summary before reading it from the shmem.
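As an illustration of comment 4, here is a minimal standalone sketch of what a shared helper for the type-name switch might look like. It is not taken from either attached patch: DemoContextType, demo_context_type_name() and demo_store_type_name() are hypothetical names, and the enum is only a stand-in for the NodeTag values the real code tests. The helper returns a string literal, so callers that only need the name can use the pointer directly; where a copy into a fixed-size buffer is still wanted, the second function bounds the copy by the destination size and always writes a terminator.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the NodeTag values tested in the patch. */
typedef enum DemoContextType
{
	DEMO_ALLOCSET,
	DEMO_GENERATION,
	DEMO_SLAB,
	DEMO_BUMP,
	DEMO_OTHER
} DemoContextType;

/* Returns a string literal, so callers can use the pointer directly. */
static const char *
demo_context_type_name(DemoContextType type)
{
	switch (type)
	{
		case DEMO_ALLOCSET:
			return "AllocSet";
		case DEMO_GENERATION:
			return "Generation";
		case DEMO_SLAB:
			return "Slab";
		case DEMO_BUMP:
			return "Bump";
		default:
			break;
	}
	return "???";
}

/* Copy the name into a fixed-size buffer; truncates, always NUL-terminates. */
static void
demo_store_type_name(char *dst, size_t dstlen, DemoContextType type)
{
	const char *name = demo_context_type_name(type);
	size_t		len = strlen(name);

	assert(dst != NULL && dstlen > 0);
	if (len >= dstlen)
		len = dstlen - 1;
	memcpy(dst, name, len);
	dst[len] = '\0';
}
```

With one helper like this, both call sites would shrink to a single call and could not drift apart.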
Attached are two patches - 0001 is the original patch, 0002 has most of
my review comments (mentioned above), and a couple additional changes to
comments/formatting, etc. Those are suggestions rather than issues.
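To make comments 17 and 18 a bit more concrete, here is a rough standalone sketch of the shape I have in mind. It is an illustration only, not code from 0002: DemoArea, demo_attach(), demo_detach() and demo_read_stats() are hypothetical stand-ins for the DSA handling. Every failure path jumps to a single cleanup label that detaches, and the stats pointer starts out as NULL so that a path which never assigns it is caught instead of reading garbage.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a dsa_area attachment. */
typedef struct DemoArea
{
	int			unused;
} DemoArea;

/* Counts detach calls so the single-exit behaviour can be checked. */
static int	demo_detach_calls = 0;

static DemoArea *
demo_attach(void)
{
	return calloc(1, sizeof(DemoArea));
}

static void
demo_detach(DemoArea *area)
{
	demo_detach_calls++;
	free(area);
}

/*
 * Sum the published values, or return -1 on any failure.  All paths taken
 * after a successful attach leave through the cleanup label.
 */
static long
demo_read_stats(bool signal_ok, bool stats_ready,
				const int *published, int nstats)
{
	DemoArea   *area = demo_attach();
	const int  *stats = NULL;	/* NULL until the stats are known valid */
	long		result = -1;

	assert(nstats >= 0);
	if (area == NULL)
		return -1;				/* nothing to detach yet */

	if (!signal_ok)
		goto cleanup;			/* e.g. the signal could not be sent */

	if (stats_ready)
		stats = published;
	if (stats == NULL)
		goto cleanup;			/* never read through an unset pointer */

	result = 0;
	for (int i = 0; i < nstats; i++)
		result += stats[i];

cleanup:
	demo_detach(area);
	return result;
}
```

The point is only the control flow: one detach call at one label, and no read through a pointer that was never set.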
regards
--
Tomas Vondra
Attachments:
vtomas-0001-Function-to-report-memory-context-stats-of-an.patch (text/x-patch; charset=UTF-8)
From be12803e8dbc671595f7945693b3f70abb2b8745 Mon Sep 17 00:00:00 2001
From: Rahila Syed <rahilasyed.90@gmail.com>
Date: Sun, 15 Sep 2024 17:56:06 +0530
Subject: [PATCH vtomas 1/2] Function to report memory context stats of any
backend
This function sends a signal to a backend to publish
statistics of all its memory contexts. Signal handler
sets a flag, which causes the relevant backend to copy its
MemoryContextStats to a DSA, as part
of next CHECK_FOR_INTERRUPTS().
If there are more than 16MB worth of statistics, the
remaining statistics are copied as a cumulative
total of the remaining contexts.
Once it's done, it signals the client backend using
a condition variable. The client backend
then wakes up, reads the shared memory and
returns these values in the form of set of records,
one for each memory context, to the user, followed
by a cumulative total of the remaining contexts,
if any.
Each backend and auxiliary process has its own slot
for reporting the stats. There is an array of such
memory slots of size MaxBackends+NumofAuxiliary
processes in fixed shared memory. Each of these slots points
to a DSA, which contains the stats to be shared by the
corresponding process.
Each slot has its own LW lock and condition variable for
synchronization and communication between the
publishing process and the client backend.
---
doc/src/sgml/func.sgml | 26 ++
src/backend/postmaster/autovacuum.c | 4 +
src/backend/postmaster/checkpointer.c | 4 +
src/backend/postmaster/interrupt.c | 4 +
src/backend/postmaster/pgarch.c | 4 +
src/backend/postmaster/startup.c | 4 +
src/backend/postmaster/walsummarizer.c | 4 +
src/backend/storage/ipc/ipci.c | 2 +
src/backend/storage/ipc/procsignal.c | 3 +
src/backend/tcop/postgres.c | 3 +
.../utils/activity/wait_event_names.txt | 1 +
src/backend/utils/adt/mcxtfuncs.c | 274 ++++++++++-
src/backend/utils/init/globals.c | 1 +
src/backend/utils/mmgr/mcxt.c | 424 +++++++++++++++++-
src/include/catalog/pg_proc.dat | 10 +
src/include/miscadmin.h | 1 +
src/include/storage/procsignal.h | 1 +
src/include/utils/memutils.h | 51 +++
src/test/regress/expected/sysviews.out | 12 +
src/test/regress/sql/sysviews.sql | 12 +
20 files changed, 822 insertions(+), 23 deletions(-)
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 47370e581ae..5d0399508ea 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -28358,6 +28358,32 @@ acl | {postgres=arwdDxtm/postgres,foo=r/postgres}
</para></entry>
</row>
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>pg_get_process_memory_contexts</primary>
+ </indexterm>
+ <function>pg_get_process_memory_contexts</function> ( <parameter>pid</parameter> <type>integer</type>, <parameter>get_summary</parameter> <type>boolean</type> )
+ <returnvalue>setof record</returnvalue>
+ </para>
+ <para>
+ This function handles requests to display the memory contexts of a
+ PostgreSQL process with the specified process ID (PID). It takes two
+ arguments: PID and a boolean, get_summary. The function can send requests
+ to both backend and auxiliary processes.
+
+ After receiving memory context statistics from the target process, it
+ returns the results as one row per context. The num_agg_contexts column
+ indicates the number of contexts aggregated in the displayed statistics.
+
+ When get_summary is set to true, memory context statistics at levels 1 and 2,
+ are displayed with each level 2 context showing a cumulative total of all
+ its child contexts.
+ When get_summary is set to false, the num_agg_contexts value is 1, indicating
+ that individual statistics are being displayed.
+ </para></entry>
+ </row>
+
<row>
<entry role="func_table_entry"><para role="func_signature">
<indexterm>
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 3f826532b88..eb4c98a17bc 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -768,6 +768,10 @@ HandleAutoVacLauncherInterrupts(void)
/* Process sinval catchup interrupts that happened while sleeping */
ProcessCatchupInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index 9bfd0fd665c..ee8360ad6fa 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -616,6 +616,10 @@ HandleCheckpointerInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/postmaster/interrupt.c b/src/backend/postmaster/interrupt.c
index be69e4c7136..9481a5cd241 100644
--- a/src/backend/postmaster/interrupt.c
+++ b/src/backend/postmaster/interrupt.c
@@ -48,6 +48,10 @@ HandleMainLoopInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 12ee815a626..cd1ecb6b93d 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -865,6 +865,10 @@ HandlePgArchInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
if (ConfigReloadPending)
{
char *archiveLib = pstrdup(XLogArchiveLibrary);
diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c
index 59d213031b3..d670954c4e9 100644
--- a/src/backend/postmaster/startup.c
+++ b/src/backend/postmaster/startup.c
@@ -192,6 +192,10 @@ HandleStartupProcInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
diff --git a/src/backend/postmaster/walsummarizer.c b/src/backend/postmaster/walsummarizer.c
index ffbf0439358..b1a5e86a85c 100644
--- a/src/backend/postmaster/walsummarizer.c
+++ b/src/backend/postmaster/walsummarizer.c
@@ -876,6 +876,10 @@ HandleWalSummarizerInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 174eed70367..4a70eabf7f3 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -50,6 +50,7 @@
#include "storage/sinvaladt.h"
#include "utils/guc.h"
#include "utils/injection_point.h"
+#include "utils/memutils.h"
/* GUCs */
int shared_memory_type = DEFAULT_SHARED_MEMORY_TYPE;
@@ -340,6 +341,7 @@ CreateOrAttachShmemStructs(void)
StatsShmemInit();
WaitEventCustomShmemInit();
InjectionPointShmemInit();
+ MemCtxShmemInit();
}
/*
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index 7401b6e625e..e425b9eeb03 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -688,6 +688,9 @@ procsignal_sigusr1_handler(SIGNAL_ARGS)
if (CheckProcSignal(PROCSIG_LOG_MEMORY_CONTEXT))
HandleLogMemoryContextInterrupt();
+ if (CheckProcSignal(PROCSIG_GET_MEMORY_CONTEXT))
+ HandleGetMemoryContextInterrupt();
+
if (CheckProcSignal(PROCSIG_PARALLEL_APPLY_MESSAGE))
HandleParallelApplyMessageInterrupt();
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index c01cff9d650..0eae9be122b 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3497,6 +3497,9 @@ ProcessInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
if (ParallelApplyMessagePending)
HandleParallelApplyMessages();
}
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 0b53cba807d..68a17699675 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -158,6 +158,7 @@ WAL_RECEIVER_EXIT "Waiting for the WAL receiver to exit."
WAL_RECEIVER_WAIT_START "Waiting for startup process to send initial data for streaming replication."
WAL_SUMMARY_READY "Waiting for a new WAL summary to be generated."
XACT_GROUP_UPDATE "Waiting for the group leader to update transaction status at transaction end."
+MEM_CTX_PUBLISH "Waiting for backend to publish memory information."
ABI_compatibility:
diff --git a/src/backend/utils/adt/mcxtfuncs.c b/src/backend/utils/adt/mcxtfuncs.c
index 396c2f223b4..c067cdf8709 100644
--- a/src/backend/utils/adt/mcxtfuncs.c
+++ b/src/backend/utils/adt/mcxtfuncs.c
@@ -17,28 +17,25 @@
#include "funcapi.h"
#include "mb/pg_wchar.h"
+#include "miscadmin.h"
+#include "access/twophase.h"
+#include "catalog/pg_authid_d.h"
+#include "nodes/pg_list.h"
#include "storage/proc.h"
#include "storage/procarray.h"
+#include "utils/acl.h"
#include "utils/array.h"
#include "utils/builtins.h"
#include "utils/hsearch.h"
+#include "utils/memutils.h"
+#include "utils/wait_event_types.h"
/* ----------
* The max bytes for showing identifiers of MemoryContext.
* ----------
*/
-#define MEMORY_CONTEXT_IDENT_DISPLAY_SIZE 1024
-/*
- * MemoryContextId
- * Used for storage of transient identifiers for
- * pg_get_backend_memory_contexts.
- */
-typedef struct MemoryContextId
-{
- MemoryContext context;
- int context_id;
-} MemoryContextId;
+struct MemoryContextState *memCtxState = NULL;
/*
* int_list_to_array
@@ -71,7 +68,7 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
TupleDesc tupdesc, MemoryContext context,
HTAB *context_id_lookup)
{
-#define PG_GET_BACKEND_MEMORY_CONTEXTS_COLS 10
+#define PG_GET_BACKEND_MEMORY_CONTEXTS_COLS 11
Datum values[PG_GET_BACKEND_MEMORY_CONTEXTS_COLS];
bool nulls[PG_GET_BACKEND_MEMORY_CONTEXTS_COLS];
@@ -305,3 +302,256 @@ pg_log_backend_memory_contexts(PG_FUNCTION_ARGS)
PG_RETURN_BOOL(true);
}
+
+/*
+ * pg_get_process_memory_contexts
+ * Signal a backend or an auxiliary process to send its memory contexts.
+ *
+ * On receipt of this signal, a backend or an auxiliary process sets the flag
+ * in the signal handler, which causes the next CHECK_FOR_INTERRUPTS()
+ * or process-specific interrupt handler to copy the memory context statistics
+ * in a dynamic shared memory space. The statistics for contexts that do not fit in
+ * shared memory area are stored as a cumulative total of those contexts,
+ * at the end in the dynamic shared memory.
+ * Wait for the backend to send signal on the condition variable after
+ * writing statistics to a shared memory.
+ * Once condition variable comes out of sleep, check if the required
+ * backends statistics are available to read and display.
+ */
+Datum
+pg_get_process_memory_contexts(PG_FUNCTION_ARGS)
+{
+ int pid = PG_GETARG_INT32(0);
+ bool get_summary = PG_GETARG_BOOL(1);
+ PGPROC *proc;
+ ProcNumber procNumber = INVALID_PROC_NUMBER;
+ int i;
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ dsa_area *area;
+ dsa_handle handle;
+ MemoryContextInfo *memctx_info;
+ MemoryContext oldContext;
+ int num_retries = 0;
+
+ InitMaterializedSRF(fcinfo, 0);
+
+ /*
+ * Only superusers or users with pg_read_all_stats privileges can view the
+ * memory context statistics of another process
+ */
+ if (!has_privs_of_role(GetUserId(), ROLE_PG_READ_ALL_STATS))
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("memory context statistics privilege error")));
+
+ /*
+ * See if the process with given pid is a backend or an auxiliary process.
+ */
+ proc = BackendPidGetProc(pid);
+ if (proc == NULL)
+ proc = AuxiliaryPidGetProc(pid);
+
+ /*
+ * BackendPidGetProc() and AuxiliaryPidGetProc() return NULL if the pid
+ * isn't valid; but by the time we reach kill(), a process for which we
+ * get a valid proc here might have terminated on its own. There's no way
+ * to acquire a lock on an arbitrary process to prevent that. But since
+ * this mechanism is usually used to debug a backend or an auxiliary
+ * process running and consuming lots of memory, that it might end on its
+ * own first and its memory contexts are not logged is not a problem.
+ */
+ if (proc == NULL)
+ {
+ /*
+ * This is just a warning so a loop-through-resultset will not abort
+ * if one backend terminated on its own during the run.
+ */
+ ereport(WARNING,
+ (errmsg("PID %d is not a PostgreSQL server process", pid)));
+ PG_RETURN_NULL();
+ }
+
+ procNumber = GetNumberFromPGProc(proc);
+ if (procNumber == MyProcNumber)
+ {
+ ereport(WARNING,
+ (errmsg("cannot return statistics for local backend"),
+ errhint("Use pg_backend_memory_contexts view instead")));
+ PG_RETURN_NULL();
+ }
+
+ /*
+ * Return statistics of top level 1 and 2 contexts, if get_summary is
+ * true.
+ */
+ LWLockAcquire(&memCtxState[procNumber].lw_lock, LW_EXCLUSIVE);
+ memCtxState[procNumber].get_summary = get_summary;
+
+ /*
+ * Create a DSA segment with maximum size of 16MB, send handle to the
+ * publishing process for storing the stats. If number of contexts exceed
+ * 16MB, a cumulative total is stored for such contexts.
+ */
+ if (memCtxState[procNumber].memstats_dsa_handle == DSA_HANDLE_INVALID)
+ {
+ oldContext = MemoryContextSwitchTo(TopMemoryContext);
+ area = dsa_create_ext(memCtxState[procNumber].lw_lock.tranche, DSA_DEFAULT_INIT_SEGMENT_SIZE,
+ 16 * DSA_DEFAULT_INIT_SEGMENT_SIZE);
+ MemoryContextSwitchTo(oldContext);
+ handle = dsa_get_handle(area);
+ memCtxState[procNumber].memstats_dsa_handle = handle;
+ /* Pin the mapping so that it doesn't throw a warning */
+ dsa_pin(area);
+ dsa_pin_mapping(area);
+ }
+ else
+ {
+ area = dsa_attach(memCtxState[procNumber].memstats_dsa_handle);
+ dsa_pin_mapping(area);
+ }
+ LWLockRelease(&memCtxState[procNumber].lw_lock);
+ if (SendProcSignal(pid, PROCSIG_GET_MEMORY_CONTEXT, procNumber) < 0)
+ {
+ ereport(WARNING,
+ (errmsg("could not send signal to process %d: %m", pid)));
+ dsa_detach(area);
+ PG_RETURN_NULL();
+ }
+
+ /*
+ * Wait for a backend to publish stats, indicated by a valid dsa pointer
+ * set by the backend.
+ */
+ ConditionVariablePrepareToSleep(&memCtxState[procNumber].memctx_cv);
+ while (1)
+ {
+ proc = BackendPidGetProc(pid);
+ if (proc == NULL)
+ proc = AuxiliaryPidGetProc(pid);
+ if (proc == NULL)
+ {
+ ereport(WARNING,
+ (errmsg("PID %d is not a PostgreSQL server process", pid)));
+ dsa_detach(area);
+ PG_RETURN_NULL();
+ }
+#define MEMSTATS_WAIT_TIMEOUT 5000
+#define MAX_RETRIES 20
+ if (ConditionVariableTimedSleep(&memCtxState[procNumber].memctx_cv, MEMSTATS_WAIT_TIMEOUT,
+ WAIT_EVENT_MEM_CTX_PUBLISH))
+ {
+ ereport(LOG,
+ (errmsg("Wait for %d process to publish stats timed out, trying again", pid)));
+ if (num_retries > MAX_RETRIES)
+ {
+ dsa_detach(area);
+ PG_RETURN_NULL();
+ }
+ num_retries = num_retries + 1;
+ }
+
+ /*
+ * We expect to come out of sleep when the requested process has
+ * finished publishing the statistics, verified using a valid dsa
+ * pointer.
+ *
+ * Make sure that the information belongs to the pid we requested
+ * information for; otherwise loop back and wait for the server
+ * process to finish publishing statistics.
+ */
+ LWLockAcquire(&memCtxState[procNumber].lw_lock, LW_EXCLUSIVE);
+
+ if (memCtxState[procNumber].proc_id == pid && DsaPointerIsValid(memCtxState[procNumber].memstats_dsa_pointer))
+ break;
+ else
+ LWLockRelease(&memCtxState[procNumber].lw_lock);
+ }
+ if (DsaPointerIsValid(memCtxState[procNumber].memstats_dsa_pointer))
+ memctx_info = (MemoryContextInfo *) dsa_get_address(area, memCtxState[procNumber].memstats_dsa_pointer);
+ /* Backend has finished publishing the stats, read them */
+ for (i = 0; i < memCtxState[procNumber].in_memory_stats; i++)
+ {
+ ArrayType *path_array;
+ int path_length;
+ Datum values[PG_GET_BACKEND_MEMORY_CONTEXTS_COLS];
+ bool nulls[PG_GET_BACKEND_MEMORY_CONTEXTS_COLS];
+
+ memset(values, 0, sizeof(values));
+ memset(nulls, 0, sizeof(nulls));
+
+ if (strlen(memctx_info[i].name) != 0)
+ values[0] = CStringGetTextDatum(memctx_info[i].name);
+ else
+ nulls[0] = true;
+ if (strlen(memctx_info[i].ident) != 0)
+ values[1] = CStringGetTextDatum(memctx_info[i].ident);
+ else
+ nulls[1] = true;
+
+ values[2] = CStringGetTextDatum(memctx_info[i].type);
+ path_length = memctx_info[i].path_length;
+ path_array = construct_array_builtin(memctx_info[i].path, path_length, INT4OID);
+ values[3] = PointerGetDatum(path_array);
+ values[4] = Int64GetDatum(memctx_info[i].totalspace);
+ values[5] = Int64GetDatum(memctx_info[i].nblocks);
+ values[6] = Int64GetDatum(memctx_info[i].freespace);
+ values[7] = Int64GetDatum(memctx_info[i].freechunks);
+ values[8] = Int64GetDatum(memctx_info[i].totalspace - memctx_info[i].freespace);
+ values[9] = Int32GetDatum(memctx_info[i].num_contexts);
+ tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc, values, nulls);
+ }
+ /* If there are more contexts, display a cumulative total of those */
+ if (memCtxState[procNumber].total_stats > i)
+ {
+ Datum values[PG_GET_BACKEND_MEMORY_CONTEXTS_COLS];
+ bool nulls[PG_GET_BACKEND_MEMORY_CONTEXTS_COLS];
+
+ values[0] = CStringGetTextDatum(memctx_info[i].name);
+ nulls[1] = true;
+ nulls[2] = true;
+ nulls[3] = true;
+ values[4] = Int64GetDatum(memctx_info[i].totalspace);
+ values[5] = Int64GetDatum(memctx_info[i].nblocks);
+ values[6] = Int64GetDatum(memctx_info[i].freespace);
+ values[7] = Int64GetDatum(memctx_info[i].freechunks);
+ values[8] = Int64GetDatum(memctx_info[i].totalspace - memctx_info[i].freespace);
+ values[9] = Int32GetDatum(memctx_info[i].num_contexts);
+ tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc, values, nulls);
+ }
+ LWLockRelease(&memCtxState[procNumber].lw_lock);
+ ConditionVariableCancelSleep();
+ dsa_detach(area);
+ PG_RETURN_NULL();
+}
+
+static Size
+MemCtxShmemSize(void)
+{
+ Size size;
+ Size TotalProcs = add_size(MaxBackends, add_size(NUM_AUXILIARY_PROCS, max_prepared_xacts));
+
+ size = TotalProcs * sizeof(MemoryContextState);
+ return size;
+}
+
+void
+MemCtxShmemInit(void)
+{
+ bool found;
+ Size TotalProcs = add_size(MaxBackends, add_size(NUM_AUXILIARY_PROCS, max_prepared_xacts));
+
+ memCtxState = (MemoryContextState *) ShmemInitStruct("MemoryContextState",
+ MemCtxShmemSize(),
+ &found);
+ if (!found)
+ {
+ for (int i = 0; i < TotalProcs; i++)
+ {
+ ConditionVariableInit(&memCtxState[i].memctx_cv);
+ LWLockInitialize(&memCtxState[i].lw_lock, LWLockNewTrancheId());
+ LWLockRegisterTranche(memCtxState[i].lw_lock.tranche, "mem_context_stats_reporting");
+ memCtxState[i].memstats_dsa_handle = DSA_HANDLE_INVALID;
+ memCtxState[i].memstats_dsa_pointer = InvalidDsaPointer;
+ }
+ }
+}
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index b844f9fdaef..6bc253da5da 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -42,6 +42,7 @@ volatile sig_atomic_t IdleStatsUpdateTimeoutPending = false;
volatile uint32 InterruptHoldoffCount = 0;
volatile uint32 QueryCancelHoldoffCount = 0;
volatile uint32 CritSectionCount = 0;
+volatile sig_atomic_t PublishMemoryContextPending = false;
int MyProcPid;
pg_time_t MyStartTime;
diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index aa6da0d0352..245aba5987c 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -19,16 +19,22 @@
*-------------------------------------------------------------------------
*/
#include "postgres.h"
+#include <math.h>
#include "mb/pg_wchar.h"
#include "miscadmin.h"
+#include "nodes/pg_list.h"
+#include "storage/fd.h"
+#include "storage/lwlock.h"
+#include "storage/dsm.h"
+#include "utils/dsa.h"
+#include "utils/hsearch.h"
#include "utils/memdebug.h"
#include "utils/memutils.h"
#include "utils/memutils_internal.h"
#include "utils/memutils_memorychunk.h"
-
static void BogusFree(void *pointer);
static void *BogusRealloc(void *pointer, Size size, int flags);
static MemoryContext BogusGetChunkContext(void *pointer);
@@ -135,6 +141,12 @@ static const MemoryContextMethods mcxt_methods[] = {
};
#undef BOGUS_MCTX
+typedef enum PrintDetails
+{
+ PRINT_STATS_TO_STDERR = 0,
+ PRINT_STATS_TO_LOGS,
+ PRINT_STATS_NONE
+} PrintDetails;
/*
* CurrentMemoryContext
@@ -162,10 +174,11 @@ static void MemoryContextCallResetCallbacks(MemoryContext context);
static void MemoryContextStatsInternal(MemoryContext context, int level,
int max_level, int max_children,
MemoryContextCounters *totals,
- bool print_to_stderr);
+ PrintDetails print_location, int *num_contexts);
static void MemoryContextStatsPrint(MemoryContext context, void *passthru,
const char *stats_string,
bool print_to_stderr);
+static void PublishMemoryContext(MemoryContextInfo * memctx_infos, int curr_id, MemoryContext context, List *path, char *clipped_ident, MemoryContextCounters stat, int num_contexts);
/*
* You should not do memory allocations within a critical section, because
@@ -831,11 +844,19 @@ MemoryContextStatsDetail(MemoryContext context,
bool print_to_stderr)
{
MemoryContextCounters grand_totals;
+ int num_contexts = 0;
+ PrintDetails print_location;
memset(&grand_totals, 0, sizeof(grand_totals));
+ if (print_to_stderr)
+ print_location = PRINT_STATS_TO_STDERR;
+ else
+ print_location = PRINT_STATS_TO_LOGS;
+
+ /* num_contexts reports the number of contexts aggregated in the output */
MemoryContextStatsInternal(context, 0, max_level, max_children,
- &grand_totals, print_to_stderr);
+ &grand_totals, print_location, &num_contexts);
if (print_to_stderr)
fprintf(stderr,
@@ -876,18 +897,43 @@ static void
MemoryContextStatsInternal(MemoryContext context, int level,
int max_level, int max_children,
MemoryContextCounters *totals,
- bool print_to_stderr)
+ PrintDetails print_location, int *num_contexts)
{
MemoryContext child;
int ichild;
+ bool print_to_stderr = true;
Assert(MemoryContextIsValid(context));
- /* Examine the context itself */
- context->methods->stats(context,
- MemoryContextStatsPrint,
- &level,
- totals, print_to_stderr);
+ if (print_location == PRINT_STATS_TO_STDERR)
+ print_to_stderr = true;
+ else if (print_location == PRINT_STATS_TO_LOGS)
+ print_to_stderr = false;
+
+ if (print_location != PRINT_STATS_NONE)
+ {
+ /* Examine the context itself */
+ context->methods->stats(context,
+ MemoryContextStatsPrint,
+ &level,
+ totals, print_to_stderr);
+ }
+ else
+ {
+ /*
+ * Do not print the statistics, only accumulate the totals.
+ * print_to_stderr has no effect when nothing is printed, i.e. when
+ * print_location == PRINT_STATS_NONE.
+ */
+ context->methods->stats(context,
+ NULL,
+ NULL,
+ totals, print_to_stderr);
+ }
+ /* Increment the context count */
+ *num_contexts = *num_contexts + 1;
/*
* Examine children.
@@ -907,7 +953,7 @@ MemoryContextStatsInternal(MemoryContext context, int level,
MemoryContextStatsInternal(child, level + 1,
max_level, max_children,
totals,
- print_to_stderr);
+ print_location, num_contexts);
}
}
@@ -925,6 +971,7 @@ MemoryContextStatsInternal(MemoryContext context, int level,
ichild++;
child = MemoryContextTraverseNext(child, context);
}
+ *num_contexts = *num_contexts + ichild;
if (print_to_stderr)
{
@@ -939,7 +986,7 @@ MemoryContextStatsInternal(MemoryContext context, int level,
local_totals.freechunks,
local_totals.totalspace - local_totals.freespace);
}
- else
+ else if (print_location != PRINT_STATS_NONE)
ereport(LOG_SERVER_ONLY,
(errhidestmt(true),
errhidecontext(true),
@@ -1276,6 +1323,21 @@ HandleLogMemoryContextInterrupt(void)
/* latch will be set by procsignal_sigusr1_handler */
}
+/*
+ * HandleGetMemoryContextInterrupt
+ * Handle receipt of an interrupt requesting publication of memory
+ * context statistics.
+ *
+ * All the actual work is deferred to ProcessGetMemoryContextInterrupt()
+ */
+void
+HandleGetMemoryContextInterrupt(void)
+{
+ InterruptPending = true;
+ PublishMemoryContextPending = true;
+ /* latch will be set by procsignal_sigusr1_handler */
+}
+
/*
* ProcessLogMemoryContextInterrupt
* Perform logging of memory contexts of this backend process.
@@ -1313,6 +1375,346 @@ ProcessLogMemoryContextInterrupt(void)
MemoryContextStatsDetail(TopMemoryContext, 100, 100, false);
}
+/*
+ * Run by each backend to publish their memory context
+ * statistics. It performs a breadth first search
+ * on the memory context tree, so that the parents
+ * get a chance to report stats before their children.
+ *
+ * Statistics are shared via dynamic shared memory which
+ * can hold statistics of approx 6700 contexts. Remaining
+ * contexts statistics is captured as a cumulative total.
+ */
+void
+ProcessGetMemoryContextInterrupt(void)
+{
+ /* Store the memory context details in shared memory */
+
+ List *contexts;
+
+ HASHCTL ctl;
+ HTAB *context_id_lookup;
+ int context_id = 0;
+ bool found;
+ MemoryContext stat_cxt;
+ MemoryContextInfo *meminfo;
+ bool get_summary = false;
+ dsa_area *area;
+ int num_stats;
+ int idx = MyProcNumber;
+ int stats_count = 0;
+ MemoryContextCounters stat;
+
+ PublishMemoryContextPending = false;
+
+ /*
+ * The hash table is used for constructing "path" column of
+ * pg_get_process_memory_context is view, similar to its local backend
+ * couterpart.
+ */
+
+ /*
+ * Make a new context that will contain the hash table, to ease the
+ * cleanup
+ */
+
+ stat_cxt = AllocSetContextCreate(CurrentMemoryContext,
+ "Memory context statistics",
+ ALLOCSET_DEFAULT_SIZES);
+
+ ctl.keysize = sizeof(MemoryContext);
+ ctl.entrysize = sizeof(MemoryContextId);
+ ctl.hcxt = stat_cxt;
+
+ context_id_lookup = hash_create("pg_get_remote_backend_memory_contexts",
+ 256,
+ &ctl,
+ HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+
+ contexts = list_make1(TopMemoryContext);
+
+ /* Compute the number of stats that can fit in the DSM seg */
+ num_stats = floor(16 * DSA_DEFAULT_INIT_SEGMENT_SIZE / sizeof(MemoryContextInfo));
+
+ /*
+ * Traverse the memory context tree to find the total number of contexts.
+ * If a summary is requested, find the total number of contexts at levels 1
+ * and 2.
+ */
+ foreach_ptr(MemoryContextData, cur, contexts)
+ {
+ MemoryContextId *entry;
+
+ entry = (MemoryContextId *) hash_search(context_id_lookup, &cur,
+ HASH_ENTER, &found);
+ stats_count = stats_count + 1;
+ /* context id starts with 1 */
+ entry->context_id = stats_count;
+
+ /* Append the children of the current context to the main list */
+ for (MemoryContext c = cur->firstchild; c != NULL; c = c->nextchild)
+ {
+ if (get_summary)
+ {
+ entry = (MemoryContextId *) hash_search(context_id_lookup, &c,
+ HASH_ENTER, &found);
+ stats_count = stats_count + 1;
+ entry->context_id = stats_count;
+ }
+ contexts = lappend(contexts, c);
+ }
+ /* In summary only the first level contexts are displayed */
+ if (get_summary)
+ break;
+ }
+
+ /*
+ * Allocate memory in this process's DSA for storing statistics of the
+ * memory contexts, up to num_stats. For contexts that don't fit in the DSA
+ * segment, a cumulative total is written as the last record in the DSA
+ * segment.
+ */
+ stats_count = stats_count > num_stats ? num_stats : stats_count;
+
+ /* Attach to DSA segment */
+ LWLockAcquire(&memCtxState[idx].lw_lock, LW_EXCLUSIVE);
+ area = dsa_attach(memCtxState[idx].memstats_dsa_handle);
+ memCtxState[idx].proc_id = MyProcPid;
+ get_summary = memCtxState[idx].get_summary;
+
+ /* Free the memory allocated previously by the same process */
+ if (DsaPointerIsValid(memCtxState[idx].memstats_dsa_pointer))
+ {
+ dsa_free(area, memCtxState[idx].memstats_dsa_pointer);
+ memCtxState[idx].memstats_dsa_pointer = InvalidDsaPointer;
+ }
+ memCtxState[idx].memstats_dsa_pointer = dsa_allocate0(area, stats_count * sizeof(MemoryContextInfo));
+ meminfo = (MemoryContextInfo *) dsa_get_address(area, memCtxState[idx].memstats_dsa_pointer);
+
+ if (get_summary)
+ {
+ int ctx_id = 0;
+ List *path = NIL;
+
+ /* Copy TopMemoryContext statistics to DSA */
+ path = lcons_int(1, path);
+ memset(&stat, 0, sizeof(stat));
+ (*TopMemoryContext->methods->stats) (TopMemoryContext, NULL, NULL, &stat, true);
+ PublishMemoryContext(meminfo, ctx_id, TopMemoryContext, path, NULL, stat, 1);
+ ctx_id = ctx_id + 1;
+
+ /*
+ * Copy statistics for each of TopMemoryContext's children (XXX: make it
+ * capped at 100). This includes statistics of all of their children
+ * up to level 100.
+ */
+ for (MemoryContext c = TopMemoryContext->firstchild; c != NULL; c = c->nextchild)
+ {
+ MemoryContextCounters grand_totals;
+ int num_contexts = 0;
+ char clipped_ident[MEMORY_CONTEXT_IDENT_DISPLAY_SIZE];
+
+ path = NIL;
+ memset(&grand_totals, 0, sizeof(grand_totals));
+
+ MemoryContextStatsInternal(c, 0, 100, 100, &grand_totals, PRINT_STATS_NONE, &num_contexts);
+
+ /*
+ * Figure out the transient context_id of this context and each of
+ * its ancestors.
+ */
+ for (MemoryContext cur_context = c; cur_context != NULL; cur_context = cur_context->parent)
+ {
+ MemoryContextId *cur_entry;
+
+ cur_entry = hash_search(context_id_lookup, &cur_context, HASH_FIND, &found);
+
+ if (!found)
+ {
+ elog(LOG, "hash table corrupted, can't construct path value");
+ break;
+ }
+ path = lcons_int(cur_entry->context_id, path);
+ }
+ /* Trim and copy the identifier if it is not set to NULL */
+ if (c->ident != NULL)
+ {
+ int idlen = strlen(c->ident);
+
+ /*
+ * Some identifiers such as SQL query string can be very long,
+ * truncate oversize identifiers.
+ */
+ if (idlen >= MEMORY_CONTEXT_IDENT_DISPLAY_SIZE)
+ idlen = pg_mbcliplen(c->ident, idlen, MEMORY_CONTEXT_IDENT_DISPLAY_SIZE - 1);
+
+ memcpy(clipped_ident, c->ident, idlen);
+ clipped_ident[idlen] = '\0';
+ }
+ PublishMemoryContext(meminfo, ctx_id, c, path, (c->ident != NULL ? clipped_ident : NULL), grand_totals, num_contexts);
+ ctx_id = ctx_id + 1;
+ }
+ /* For summary mode, total_stats and in_memory_stats remain the same */
+ memCtxState[idx].in_memory_stats = ctx_id;
+ memCtxState[idx].total_stats = ctx_id;
+ goto cleanup;
+ }
+
+ foreach_ptr(MemoryContextData, cur, contexts)
+ {
+ List *path = NIL;
+ char clipped_ident[MEMORY_CONTEXT_IDENT_DISPLAY_SIZE];
+
+ /*
+ * Figure out the transient context_id of this context and each of its
+ * ancestors.
+ */
+ for (MemoryContext cur_context = cur; cur_context != NULL; cur_context = cur_context->parent)
+ {
+ MemoryContextId *cur_entry;
+
+ cur_entry = hash_search(context_id_lookup, &cur_context, HASH_FIND, &found);
+
+ if (!found)
+ {
+ elog(LOG, "hash table corrupted, can't construct path value");
+ break;
+ }
+ path = lcons_int(cur_entry->context_id, path);
+ }
+ /* Trim and copy the identifier if it is not set to NULL */
+ if (cur->ident != NULL)
+ {
+ int idlen = strlen(cur->ident);
+
+ /*
+ * Some identifiers such as SQL query string can be very long,
+ * truncate oversize identifiers.
+ */
+ if (idlen >= MEMORY_CONTEXT_IDENT_DISPLAY_SIZE)
+ idlen = pg_mbcliplen(cur->ident, idlen, MEMORY_CONTEXT_IDENT_DISPLAY_SIZE - 1);
+
+ memcpy(clipped_ident, cur->ident, idlen);
+ clipped_ident[idlen] = '\0';
+ }
+ if (context_id <= (num_stats - 2))
+ {
+ /* Examine the context stats */
+ memset(&stat, 0, sizeof(stat));
+ (*cur->methods->stats) (cur, NULL, NULL, &stat, true);
+ /* Copy statistics to DSM memory */
+ PublishMemoryContext(meminfo, context_id, cur, path, (cur->ident != NULL ? clipped_ident : NULL), stat, 1);
+ }
+ else
+ {
+ /* Examine the context stats */
+ memset(&stat, 0, sizeof(stat));
+ (*cur->methods->stats) (cur, NULL, NULL, &stat, true);
+
+ meminfo[num_stats - 1].totalspace += stat.totalspace;
+ meminfo[num_stats - 1].nblocks += stat.nblocks;
+ meminfo[num_stats - 1].freespace += stat.freespace;
+ meminfo[num_stats - 1].freechunks += stat.freechunks;
+ }
+
+ /*
+ * DSA max limit is reached, write total of the remaining statistics.
+ */
+ if (context_id == (num_stats - 2) && context_id < (stats_count - 1))
+ {
+ memCtxState[idx].in_memory_stats = context_id + 1;
+ strlcpy(meminfo[num_stats - 1].name, "Remaining Totals", sizeof(meminfo[num_stats - 1].name));
+ }
+ context_id++;
+ }
+ if (context_id < (num_stats - 2))
+ {
+ memCtxState[idx].in_memory_stats = context_id;
+ }
+ /* Report number of aggregated memory contexts */
+ else
+ {
+ meminfo[num_stats - 1].num_contexts = context_id - memCtxState[idx].in_memory_stats;
+ }
+ memCtxState[idx].total_stats = context_id;
+cleanup:
+
+ /*
+ * Signal all the waiting client backends after setting the exit condition
+ * flag
+ */
+ LWLockRelease(&memCtxState[idx].lw_lock);
+ ConditionVariableBroadcast(&memCtxState[idx].memctx_cv);
+ /* Delete the hash table memory context */
+ MemoryContextDelete(stat_cxt);
+
+ dsa_detach(area);
+}
+
+static void
+PublishMemoryContext(MemoryContextInfo * memctx_info, int curr_id, MemoryContext context, List *path, char *clipped_ident, MemoryContextCounters stat, int num_contexts)
+{
+ char *type;
+
+ if (context->name != NULL)
+ {
+ Assert(strlen(context->name) < MEMORY_CONTEXT_IDENT_DISPLAY_SIZE);
+ strlcpy(memctx_info[curr_id].name, context->name, sizeof(memctx_info[curr_id].name));
+ }
+ else
+ memctx_info[curr_id].name[0] = '\0';
+
+ if (clipped_ident != NULL)
+ {
+ /*
+ * To be consistent with logging output, we label dynahash contexts
+ * with just the hash table name as with MemoryContextStatsPrint().
+ */
+ if (context->name != NULL && strncmp(context->name, "dynahash", 8) == 0)
+ {
+ /* strlcpy always NUL-terminates, so no bytes of the old name remain */
+ strlcpy(memctx_info[curr_id].name, clipped_ident, sizeof(memctx_info[curr_id].name));
+ memctx_info[curr_id].ident[0] = '\0';
+ }
+ else
+ strlcpy(memctx_info[curr_id].ident, clipped_ident, sizeof(memctx_info[curr_id].ident));
+ }
+ else
+ memctx_info[curr_id].ident[0] = '\0';
+
+ memctx_info[curr_id].path_length = list_length(path);
+ foreach_int(i, path)
+ memctx_info[curr_id].path[foreach_current_index(i)] = Int32GetDatum(i);
+
+ switch (context->type)
+ {
+ case T_AllocSetContext:
+ type = "AllocSet";
+ strncpy(memctx_info[curr_id].type, type, strlen(type));
+ break;
+ case T_GenerationContext:
+ type = "Generation";
+ strncpy(memctx_info[curr_id].type, type, strlen(type));
+ break;
+ case T_SlabContext:
+ type = "Slab";
+ strncpy(memctx_info[curr_id].type, type, strlen(type));
+ break;
+ case T_BumpContext:
+ type = "Bump";
+ strncpy(memctx_info[curr_id].type, type, strlen(type));
+ break;
+ default:
+ type = "???";
+ strncpy(memctx_info[curr_id].type, type, strlen(type));
+ break;
+ }
+ memctx_info[curr_id].totalspace = stat.totalspace;
+ memctx_info[curr_id].nblocks = stat.nblocks;
+ memctx_info[curr_id].freespace = stat.freespace;
+ memctx_info[curr_id].freechunks = stat.freechunks;
+ memctx_info[curr_id].num_contexts = num_contexts;
+}
+
void *
palloc(Size size)
{
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index b37e8a6f882..4d6ae0728ac 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8449,6 +8449,16 @@
prorettype => 'bool', proargtypes => 'int4',
prosrc => 'pg_log_backend_memory_contexts' },
+# publishing memory contexts of the specified postgres process
+{ oid => '2173', descr => 'publish memory contexts of the specified backend',
+ proname => 'pg_get_process_memory_contexts', provolatile => 'v',
+ prorows => '100', proretset => 't', proparallel => 'r',
+ prorettype => 'record', proargtypes => 'int4 bool',
+ proallargtypes => '{int4,bool,text,text,text,_int4,int8,int8,int8,int8,int8,int4}',
+ proargmodes => '{i,i,o,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{oid, summary, name, ident, type, path, total_bytes, total_nblocks, free_bytes, free_chunks, used_bytes, num_agg_contexts}',
+ prosrc => 'pg_get_process_memory_contexts' },
+
# non-persistent series generator
{ oid => '1066', descr => 'non-persistent series generator',
proname => 'generate_series', prorows => '1000',
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index d016a9c9248..fc75ea143c3 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -96,6 +96,7 @@ extern PGDLLIMPORT volatile sig_atomic_t IdleSessionTimeoutPending;
extern PGDLLIMPORT volatile sig_atomic_t ProcSignalBarrierPending;
extern PGDLLIMPORT volatile sig_atomic_t LogMemoryContextPending;
extern PGDLLIMPORT volatile sig_atomic_t IdleStatsUpdateTimeoutPending;
+extern PGDLLIMPORT volatile sig_atomic_t PublishMemoryContextPending;
extern PGDLLIMPORT volatile sig_atomic_t CheckClientConnectionPending;
extern PGDLLIMPORT volatile sig_atomic_t ClientConnectionLost;
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index 022fd8ed933..477ab993386 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -35,6 +35,7 @@ typedef enum
PROCSIG_WALSND_INIT_STOPPING, /* ask walsenders to prepare for shutdown */
PROCSIG_BARRIER, /* global barrier interrupt */
PROCSIG_LOG_MEMORY_CONTEXT, /* ask backend to log the memory contexts */
+ PROCSIG_GET_MEMORY_CONTEXT, /* ask backend to publish the memory contexts */
PROCSIG_PARALLEL_APPLY_MESSAGE, /* Message from parallel apply workers */
/* Recovery conflict reasons */
diff --git a/src/include/utils/memutils.h b/src/include/utils/memutils.h
index 8abc26abce2..9fac394aad3 100644
--- a/src/include/utils/memutils.h
+++ b/src/include/utils/memutils.h
@@ -18,6 +18,9 @@
#define MEMUTILS_H
#include "nodes/memnodes.h"
+#include "storage/condition_variable.h"
+#include "storage/lmgr.h"
+#include "utils/dsa.h"
/*
@@ -48,7 +51,11 @@
#define AllocHugeSizeIsValid(size) ((Size) (size) <= MaxAllocHugeSize)
+#define MEMORY_CONTEXT_IDENT_DISPLAY_SIZE 1024
+#define MEM_CONTEXT_SHMEM_STATS_SIZE 30
+#define MEM_CONTEXT_MAX_LEVEL 64
+#define MAX_TYPE_STRING_LENGTH 64
/*
* Standard top-level memory contexts.
*
@@ -115,6 +122,50 @@ extern MemoryContext AllocSetContextCreateInternal(MemoryContext parent,
Size initBlockSize,
Size maxBlockSize);
+/* Dynamic shared memory state for Memory Context Statistics reporting */
+typedef struct MemoryContextInfo
+{
+ char name[MEMORY_CONTEXT_IDENT_DISPLAY_SIZE];
+ char ident[MEMORY_CONTEXT_IDENT_DISPLAY_SIZE];
+ Datum path[MEM_CONTEXT_MAX_LEVEL];
+ char type[MAX_TYPE_STRING_LENGTH];
+ int path_length;
+ int64 totalspace;
+ int64 nblocks;
+ int64 freespace;
+ int64 freechunks;
+ int num_contexts;
+} MemoryContextInfo;
+
+typedef struct MemoryContextState
+{
+ ConditionVariable memctx_cv;
+ LWLock lw_lock;
+ int proc_id;
+ int in_memory_stats;
+ int total_stats;
+ bool get_summary;
+ dsa_handle memstats_dsa_handle;
+ dsa_pointer memstats_dsa_pointer;
+
+} MemoryContextState;
+
+/*
+ * MemoryContextId
+ * Used for storage of transient identifiers for
+ * pg_get_backend_memory_contexts.
+ */
+typedef struct MemoryContextId
+{
+ MemoryContext context;
+ int context_id;
+} MemoryContextId;
+
+extern PGDLLIMPORT MemoryContextState * memCtxState;
+extern void ProcessGetMemoryContextInterrupt(void);
+extern void HandleGetMemoryContextInterrupt(void);
+extern void MemCtxShmemInit(void);
+
/*
* This wrapper macro exists to check for non-constant strings used as context
* names; that's no longer supported. (Use MemoryContextSetIdentifier if you
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 91089ac215f..5e3382132c3 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -223,3 +223,15 @@ select count(distinct utc_offset) >= 24 as ok from pg_timezone_abbrevs;
t
(1 row)
+DO $$
+DECLARE
+ checkpointer_pid int;
+ r RECORD;
+BEGIN
+ SELECT pid from pg_stat_activity where backend_type='checkpointer' INTO checkpointer_pid;
+
+ select type, name, ident
+ from pg_get_process_memory_contexts(checkpointer_pid, false) where path = '{0}' into r;
+ RAISE NOTICE '%', r;
+END $$;
+NOTICE: (AllocSet,TopMemoryContext,)
diff --git a/src/test/regress/sql/sysviews.sql b/src/test/regress/sql/sysviews.sql
index b2a79237543..f3127fea400 100644
--- a/src/test/regress/sql/sysviews.sql
+++ b/src/test/regress/sql/sysviews.sql
@@ -98,3 +98,15 @@ set timezone_abbreviations = 'Australia';
select count(distinct utc_offset) >= 24 as ok from pg_timezone_abbrevs;
set timezone_abbreviations = 'India';
select count(distinct utc_offset) >= 24 as ok from pg_timezone_abbrevs;
+
+DO $$
+DECLARE
+ checkpointer_pid int;
+ r RECORD;
+BEGIN
+ SELECT pid from pg_stat_activity where backend_type='checkpointer' INTO checkpointer_pid;
+
+ select type, name, ident
+ from pg_get_process_memory_contexts(checkpointer_pid, false) where path = '{0}' into r;
+ RAISE NOTICE '%', r;
+END $$;
--
2.47.1
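
For readers following the overflow handling in ProcessGetMemoryContextInterrupt() in the patch above, here is a minimal standalone sketch of the idea: a fixed-capacity buffer keeps individual rows while they fit and folds everything else into one cumulative "Remaining Totals" row at the end. CtxStat and publish_with_overflow() are hypothetical names invented for this illustration, and the struct is a much-reduced stand-in for MemoryContextInfo. It is not the code from the patch, only the same technique under those assumptions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the patch's MemoryContextInfo. */
typedef struct CtxStat
{
	char		name[32];
	int64_t		totalspace;
	int64_t		freespace;
	int			num_contexts;	/* number of contexts this row represents */
} CtxStat;

/*
 * Copy up to "capacity" rows from "in" to "out".
 *
 * If all n inputs fit, they are copied one-to-one.  Otherwise the first
 * capacity - 1 inputs are copied individually and the last output row holds
 * the cumulative totals of every remaining input.
 *
 * Returns the number of rows written to "out".
 */
static int
publish_with_overflow(const CtxStat *in, int n, CtxStat *out, int capacity)
{
	int			individual;
	CtxStat    *rest;

	assert(capacity >= 1);
	if (n <= 0)
		return 0;

	/* Reserve the last slot for the totals only when we actually overflow */
	individual = (n <= capacity) ? n : capacity - 1;

	for (int i = 0; i < individual; i++)
		out[i] = in[i];

	if (individual == n)
		return n;

	/* Aggregate everything that did not get a row of its own */
	rest = &out[individual];
	memset(rest, 0, sizeof(*rest));
	snprintf(rest->name, sizeof(rest->name), "%s", "Remaining Totals");

	for (int i = individual; i < n; i++)
	{
		rest->totalspace += in[i].totalspace;
		rest->freespace += in[i].freespace;
		rest->num_contexts += in[i].num_contexts;
	}

	return individual + 1;
}
```

With five input rows and a capacity of three, the caller gets two individual rows plus one "Remaining Totals" row covering the other three contexts. Deciding up front whether the last slot is reserved avoids the separate in_memory_stats/total_stats bookkeeping inside the loop, at the cost of needing the total count before copying starts.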
Attachment: vtomas-0002-review.patch (text/x-patch; charset=UTF-8)
From 0ef536a26403973b081fd14de79e047cd0cd1a13 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@vondra.me>
Date: Mon, 6 Jan 2025 19:22:14 +0100
Subject: [PATCH vtomas 2/2] review
---
src/backend/utils/adt/mcxtfuncs.c | 109 +++++++++++++++++++++++++-----
src/backend/utils/mmgr/mcxt.c | 93 +++++++++++++++++--------
src/include/utils/memutils.h | 5 +-
3 files changed, 159 insertions(+), 48 deletions(-)
diff --git a/src/backend/utils/adt/mcxtfuncs.c b/src/backend/utils/adt/mcxtfuncs.c
index c067cdf8709..f3ffd6937a5 100644
--- a/src/backend/utils/adt/mcxtfuncs.c
+++ b/src/backend/utils/adt/mcxtfuncs.c
@@ -307,16 +307,26 @@ pg_log_backend_memory_contexts(PG_FUNCTION_ARGS)
* pg_get_process_memory_contexts
* Signal a backend or an auxiliary process to send its memory contexts.
*
+ * By default, only superusers are allowed to signal to return the memory
+ * contexts because allowing any users to issue this request at an unbounded
+ * rate would cause lots of log messages and which can lead to denial of
+ * service. Additional roles can be permitted with GRANT.
+ *
* On receipt of this signal, a backend or an auxiliary process sets the flag
* in the signal handler, which causes the next CHECK_FOR_INTERRUPTS()
- * or process-specific interrupt handler to copy the memory context statistics
- * in a dynamic shared memory space. The statistics for contexts that do not fit in
- * shared memory area are stored as a cumulative total of those contexts,
- * at the end in the dynamic shared memory.
- * Wait for the backend to send signal on the condition variable after
- * writing statistics to a shared memory.
- * Once condition variable comes out of sleep, check if the required
- * backends statistics are available to read and display.
+ * or process-specific interrupt handler to copy the memory context details
+ * to a dynamic shared memory space.
+ *
+ * The shared memory buffer has a limited size - if the process has too many
+ * memory contexts, the contexts that do not fit are summarized and
+ * represented as a cumulative total at the end of the buffer.
+ *
+ * Once condition variable comes out of sleep, check if the memory context
+ * information is available for read and display.
+ *
+ * XXX Explain how the backends communicate through condition variables.
+ *
+ * XXX Explain what happens with timeouts, etc.
*/
Datum
pg_get_process_memory_contexts(PG_FUNCTION_ARGS)
@@ -333,6 +343,7 @@ pg_get_process_memory_contexts(PG_FUNCTION_ARGS)
MemoryContext oldContext;
int num_retries = 0;
+ /* XXX Shouldn't this be after the privilege check etc.? */
InitMaterializedSRF(fcinfo, 0);
/*
@@ -383,24 +394,50 @@ pg_get_process_memory_contexts(PG_FUNCTION_ARGS)
/*
* Return statistics of top level 1 and 2 contexts, if get_summary is
* true.
+ *
+ * XXX Is this comment still accurate? Or are we returning information
+ * about more contexts? Or more precisely, isn't that irrelevant here?
+ * We just process whatever the process puts into the DSA, right?
+ *
+ * XXX I'd move this comment until after we wake up and are ready to
+ * process the information, close to the comment:
+ *
+ * Backend has finished publishing the stats, read them
*/
- LWLockAcquire(&memCtxState[procNumber].lw_lock, LW_EXCLUSIVE);
- memCtxState[procNumber].get_summary = get_summary;
/*
* Create a DSA segment with maximum size of 16MB, send handle to the
* publishing process for storing the stats. If number of contexts exceed
* 16MB, a cumulative total is stored for such contexts.
+ *
+ * XXX 16MB seems like an awfully large amount of memory, particularly
+ * for small machines. Maybe it should be configurable as a parameter
+ * of the SQL function? In any case, it should not be hardcoded as a
+ * magic constant. Maybe add a #define constant?
*/
+ LWLockAcquire(&memCtxState[procNumber].lw_lock, LW_EXCLUSIVE);
+
+ memCtxState[procNumber].get_summary = get_summary;
+
if (memCtxState[procNumber].memstats_dsa_handle == DSA_HANDLE_INVALID)
{
oldContext = MemoryContextSwitchTo(TopMemoryContext);
- area = dsa_create_ext(memCtxState[procNumber].lw_lock.tranche, DSA_DEFAULT_INIT_SEGMENT_SIZE,
+
+ area = dsa_create_ext(memCtxState[procNumber].lw_lock.tranche,
+ DSA_DEFAULT_INIT_SEGMENT_SIZE,
16 * DSA_DEFAULT_INIT_SEGMENT_SIZE);
+
MemoryContextSwitchTo(oldContext);
+
handle = dsa_get_handle(area);
+
memCtxState[procNumber].memstats_dsa_handle = handle;
- /* Pin the mapping so that it doesn't throw a warning */
+
+ /* Pin the mapping so that it doesn't throw a warning
+ *
+ * XXX We don't pin stuff "so that it doesn't throw a warning". Surely
+ * the warning has a reason, so maybe mention that?
+ */
dsa_pin(area);
dsa_pin_mapping(area);
}
@@ -409,11 +446,20 @@ pg_get_process_memory_contexts(PG_FUNCTION_ARGS)
area = dsa_attach(memCtxState[procNumber].memstats_dsa_handle);
dsa_pin_mapping(area);
}
+
LWLockRelease(&memCtxState[procNumber].lw_lock);
+
+ /*
+ * Send a signal to the auxiliary process, informing it we want it to
+ * produce information about memory contexts.
+ */
if (SendProcSignal(pid, PROCSIG_GET_MEMORY_CONTEXT, procNumber) < 0)
{
ereport(WARNING,
(errmsg("could not send signal to process %d: %m", pid)));
+
+ /* XXX We do exactly this in a number of places. Maybe it'd be better
+ * to define an "error" label at the end, and goto to it? */
dsa_detach(area);
PG_RETURN_NULL();
}
@@ -461,13 +507,22 @@ pg_get_process_memory_contexts(PG_FUNCTION_ARGS)
*/
LWLockAcquire(&memCtxState[procNumber].lw_lock, LW_EXCLUSIVE);
- if (memCtxState[procNumber].proc_id == pid && DsaPointerIsValid(memCtxState[procNumber].memstats_dsa_pointer))
+ /*
+ * XXX Explain how could it happen that the PID does not match.
+ */
+ if ((memCtxState[procNumber].proc_id == pid) &&
+ DsaPointerIsValid(memCtxState[procNumber].memstats_dsa_pointer))
break;
else
LWLockRelease(&memCtxState[procNumber].lw_lock);
}
+
if (DsaPointerIsValid(memCtxState[procNumber].memstats_dsa_pointer))
memctx_info = (MemoryContextInfo *) dsa_get_address(area, memCtxState[procNumber].memstats_dsa_pointer);
+
+ /* XXX What if the memstats_dsa_pointer is not valid? Is it even possible?
+ * If it is, we have garbage in memctx_info. Maybe should be an Assert()? */
+
/* Backend has finished publishing the stats, read them */
for (i = 0; i < memCtxState[procNumber].in_memory_stats; i++)
{
@@ -489,17 +544,21 @@ pg_get_process_memory_contexts(PG_FUNCTION_ARGS)
nulls[1] = true;
values[2] = CStringGetTextDatum(memctx_info[i].type);
+
path_length = memctx_info[i].path_length;
path_array = construct_array_builtin(memctx_info[i].path, path_length, INT4OID);
values[3] = PointerGetDatum(path_array);
+
values[4] = Int64GetDatum(memctx_info[i].totalspace);
values[5] = Int64GetDatum(memctx_info[i].nblocks);
values[6] = Int64GetDatum(memctx_info[i].freespace);
values[7] = Int64GetDatum(memctx_info[i].freechunks);
values[8] = Int64GetDatum(memctx_info[i].totalspace - memctx_info[i].freespace);
values[9] = Int32GetDatum(memctx_info[i].num_contexts);
+
tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc, values, nulls);
}
+
/* If there are more contexts, display a cumulative total of those */
if (memCtxState[procNumber].total_stats > i)
{
@@ -516,29 +575,41 @@ pg_get_process_memory_contexts(PG_FUNCTION_ARGS)
values[7] = Int64GetDatum(memctx_info[i].freechunks);
values[8] = Int64GetDatum(memctx_info[i].totalspace - memctx_info[i].freespace);
values[9] = Int32GetDatum(memctx_info[i].num_contexts);
+
tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc, values, nulls);
}
+
LWLockRelease(&memCtxState[procNumber].lw_lock);
+
ConditionVariableCancelSleep();
+
dsa_detach(area);
PG_RETURN_NULL();
}
+/*
+ * Shared memory sizing for reporting memory context information.
+ */
static Size
MemCtxShmemSize(void)
{
- Size size;
- Size TotalProcs = add_size(MaxBackends, add_size(NUM_AUXILIARY_PROCS, max_prepared_xacts));
+ Size TotalProcs =
+ add_size(MaxBackends, add_size(NUM_AUXILIARY_PROCS, max_prepared_xacts));
- size = TotalProcs * sizeof(MemoryContextState);
- return size;
+ return mul_size(TotalProcs, sizeof(MemoryContextState));
}
+/*
+ * Init shared memory for reporting memory context information.
+ *
+ * XXX Should this check IsUnderPostmaster, similarly to e.g. CommitTsShmemInit?
+ */
void
MemCtxShmemInit(void)
{
bool found;
- Size TotalProcs = add_size(MaxBackends, add_size(NUM_AUXILIARY_PROCS, max_prepared_xacts));
+ Size TotalProcs =
+ add_size(MaxBackends, add_size(NUM_AUXILIARY_PROCS, max_prepared_xacts));
memCtxState = (MemoryContextState *) ShmemInitStruct("MemoryContextState",
MemCtxShmemSize(),
@@ -548,8 +619,10 @@ MemCtxShmemInit(void)
for (int i = 0; i < TotalProcs; i++)
{
ConditionVariableInit(&memCtxState[i].memctx_cv);
+
LWLockInitialize(&memCtxState[i].lw_lock, LWLockNewTrancheId());
LWLockRegisterTranche(memCtxState[i].lw_lock.tranche, "mem_context_stats_reporting");
+
memCtxState[i].memstats_dsa_handle = DSA_HANDLE_INVALID;
memCtxState[i].memstats_dsa_pointer = InvalidDsaPointer;
}
diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index 245aba5987c..8992a01ee32 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -178,7 +178,11 @@ static void MemoryContextStatsInternal(MemoryContext context, int level,
static void MemoryContextStatsPrint(MemoryContext context, void *passthru,
const char *stats_string,
bool print_to_stderr);
-static void PublishMemoryContext(MemoryContextInfo * memctx_infos, int curr_id, MemoryContext context, List *path, char *clipped_ident, MemoryContextCounters stat, int num_contexts);
+static void PublishMemoryContext(MemoryContextInfo * memctx_infos,
+ int curr_id, MemoryContext context,
+ List *path, char *clipped_ident,
+ MemoryContextCounters stat,
+ int num_contexts);
/*
* You should not do memory allocations within a critical section, because
@@ -1376,20 +1380,22 @@ ProcessLogMemoryContextInterrupt(void)
}
/*
- * Run by each backend to publish their memory context
- * statistics. It performs a breadth first search
- * on the memory context tree, so that the parents
- * get a chance to report stats before their children.
+ * ProcessGetMemoryContextInterrupt
+ * Generate information about memory contexts used by the process.
*
- * Statistics are shared via dynamic shared memory which
- * can hold statistics of approx 6700 contexts. Remaining
- * contexts statistics is captured as a cumulative total.
+ * Performs a breadth first search on the memory context tree, so that the
+ * parents get a chance to report stats before their children.
+ *
+ * Statistics are shared via dynamic shared memory which can hold statistics
+ * of approx 6700 contexts. Remaining contexts statistics is captured as a
+ * cumulative total.
+ *
+ * XXX Seems a bit fragile to mention the number of contexts here - if the
+ * DSA size changes (in mcxtfuncs.c), this will get stale.
*/
void
ProcessGetMemoryContextInterrupt(void)
{
- /* Store the memory context details in shared memory */
-
List *contexts;
HASHCTL ctl;
@@ -1407,21 +1413,18 @@ ProcessGetMemoryContextInterrupt(void)
PublishMemoryContextPending = false;
- /*
- * The hash table is used for constructing "path" column of
- * pg_get_process_memory_context is view, similar to its local backend
- * couterpart.
- */
-
/*
* Make a new context that will contain the hash table, to ease the
- * cleanup
+ * cleanup.
*/
-
stat_cxt = AllocSetContextCreate(CurrentMemoryContext,
"Memory context statistics",
ALLOCSET_DEFAULT_SIZES);
+ /*
+ * The hash table used for constructing "path" column of the view, similar
+ * to its local backend counterpart.
+ */
ctl.keysize = sizeof(MemoryContext);
ctl.entrysize = sizeof(MemoryContextId);
ctl.hcxt = stat_cxt;
@@ -1431,39 +1434,65 @@ ProcessGetMemoryContextInterrupt(void)
&ctl,
HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+ /* List of contexts to process in the next round - start at the top. */
contexts = list_make1(TopMemoryContext);
/* Compute the number of stats that can fit in the DSM seg */
+ /*
+ * XXX Don't hardcode the size in two places. Define a constant or
+ * something like that, so that we can change one place and it stays
+ * in sync. Or even better, define it just in mcxtfuncs.c, and store
+ * the size in the shmem.
+ *
+ * XXX The name is misleading - this is not the number of stats we're
+ * about to produce, it's the maximum number of entries we can fit into
+ * the shmem. I'd name this max_stats.
+ *
+ * XXX Also, what if we fill exactly this number of contexts? Won't we
+ * lose the last entry because it will be overwritten by the summary?
+ */
num_stats = floor(16 * DSA_DEFAULT_INIT_SEGMENT_SIZE / sizeof(MemoryContextInfo));
/*
* Traverse the memory context tree to find total number of contexts. If
* summary is requested find the total number of contexts at level 1 and
* 2.
+ *
+ * XXX I'm confused about how this interacts with the get_summary flag.
+ * In fact, this always uses get_summary=false, because we only read the
+ * flag from shmem later. Seems like a bug.
*/
foreach_ptr(MemoryContextData, cur, contexts)
{
- MemoryContextId *entry;
+ MemoryContextId *entry;
entry = (MemoryContextId *) hash_search(context_id_lookup, &cur,
HASH_ENTER, &found);
- stats_count = stats_count + 1;
+ Assert(!found);
+
/* context id starts with 1 */
- entry->context_id = stats_count;
+ entry->context_id = (++stats_count);
- /* Append the children of the current context to the main list */
+ /* Append the children of the current context to the main list. */
for (MemoryContext c = cur->firstchild; c != NULL; c = c->nextchild)
{
+ /* XXX I don't understand why we need to check get_summary here? */
if (get_summary)
{
entry = (MemoryContextId *) hash_search(context_id_lookup, &c,
HASH_ENTER, &found);
- stats_count = stats_count + 1;
- entry->context_id = stats_count;
+ Assert(!found);
+
+ entry->context_id = (++stats_count);
}
+
contexts = lappend(contexts, c);
}
- /* In summary only the first level contexts are displayed */
+
+ /* In summary only the first level contexts are displayed
+ *
+ * XXX Probably should say "first two levels"?
+ */
if (get_summary)
break;
}
@@ -1474,23 +1503,30 @@ ProcessGetMemoryContextInterrupt(void)
* segment, a cumulative total is written as the last record in the DSA
* segment.
*/
- stats_count = stats_count > num_stats ? num_stats : stats_count;
+ stats_count = (stats_count > num_stats) ? num_stats : stats_count;
/* Attach to DSA segment */
LWLockAcquire(&memCtxState[idx].lw_lock, LW_EXCLUSIVE);
area = dsa_attach(memCtxState[idx].memstats_dsa_handle);
+
memCtxState[idx].proc_id = MyProcPid;
+
+ /* XXX this is where we get the get_summary flag */
get_summary = memCtxState[idx].get_summary;
- /* Free the memory allocated previously by the same process */
+ /* Free the memory allocated previously by the same process. */
if (DsaPointerIsValid(memCtxState[idx].memstats_dsa_pointer))
{
dsa_free(area, memCtxState[idx].memstats_dsa_pointer);
memCtxState[idx].memstats_dsa_pointer = InvalidDsaPointer;
}
+
memCtxState[idx].memstats_dsa_pointer = dsa_allocate0(area, stats_count * sizeof(MemoryContextInfo));
meminfo = (MemoryContextInfo *) dsa_get_address(area, memCtxState[idx].memstats_dsa_pointer);
+ /* XXX I find this really hard to understand, with the nested loops etc.
+ * I suggest breaking this up into smaller functions, and calling them
+ * (easier to understand) than huge lump of code. */
if (get_summary)
{
int ctx_id = 0;
@@ -1506,7 +1542,7 @@ ProcessGetMemoryContextInterrupt(void)
/*
* Copy statistics for each of TopMemoryContexts children(XXX. Make it
* capped at 100). This includes statistics of all of their children
- * upto level 100
+ * upto level 100.
*/
for (MemoryContext c = TopMemoryContext->firstchild; c != NULL; c = c->nextchild)
{
@@ -1651,6 +1687,7 @@ cleanup:
dsa_detach(area);
}
+/* XXX this really needs some better formatting and comments */
static void
PublishMemoryContext(MemoryContextInfo * memctx_info, int curr_id, MemoryContext context, List *path, char *clipped_ident, MemoryContextCounters stat, int num_contexts)
{
diff --git a/src/include/utils/memutils.h b/src/include/utils/memutils.h
index 9fac394aad3..49c7bf5c376 100644
--- a/src/include/utils/memutils.h
+++ b/src/include/utils/memutils.h
@@ -123,8 +123,9 @@ extern MemoryContext AllocSetContextCreateInternal(MemoryContext parent,
Size maxBlockSize);
/* Dynamic shared memory state for Memory Context Statistics reporting */
-typedef struct MemoryContextInfo
+typedef struct MemoryContextInfo /* XXX I'd name this MemoryContextEntry */
{
+ /* XXX isn't 2 x 1kB for every context a bit too much? Maybe better to make it variable-length? */
char name[MEMORY_CONTEXT_IDENT_DISPLAY_SIZE];
char ident[MEMORY_CONTEXT_IDENT_DISPLAY_SIZE];
Datum path[MEM_CONTEXT_MAX_LEVEL];
@@ -135,7 +136,7 @@ typedef struct MemoryContextInfo
int64 freespace;
int64 freechunks;
int num_contexts;
-} MemoryContextInfo;
+} MemoryContextInfo; /* XXX needs to be added to typedefs, so that pgindent works */
typedef struct MemoryContextState
{
--
2.47.1
Hi Fujii-san,
Thank you for testing the feature.
Issue 1: Error with pg_get_process_memory_contexts()
When I used pg_get_process_memory_contexts() on the PID of a backend process
that had just caused an error but hadn’t rolled back yet,
the following error occurred:
Session 1 (PID=70011):
=# begin;
=# select 1/0;
ERROR: division by zero
Session 2:
=# select * from pg_get_process_memory_contexts(70011, false);
Session 1 terminated with:
ERROR: ResourceOwnerEnlarge called after release started
FATAL: terminating connection because protocol synchronization was lost
In this scenario, a DSM segment descriptor is created and associated with the
CurrentResourceOwner, which is set to the aborting transaction's resource owner.
This occurs when ProcessGetMemoryContextInterrupt is called by the backend
while a transaction is still open and about to be rolled back.
I believe this issue needs to be addressed in the DSA and DSM code by adding
a check to ensure that the CurrentResourceOwner is not about to be released
before creating a DSM under the CurrentResourceOwner.
The attached fix resolves this issue. However, for a more comprehensive solution,
I believe the same change should be extended to other parts of the DSA and DSM
code where CurrentResourceOwner is referenced.
Issue 2: Segmentation Fault
When I ran pg_get_process_memory_contexts() every 0.1 seconds using
\watch command while running "make -j 4 installcheck-world",
I encountered a segmentation fault:
LOG: client backend (PID 97975) was terminated by signal 11: Segmentation fault: 11
DETAIL: Failed process was running: select infinite_recurse();
LOG: terminating any other active server processes
I have not been able to reproduce this issue. Could you please clarify which process you ran
pg_get_process_memory_contexts() on, with the interval of 0.1? Was it a backend process
created by make installcheck-world, or some other process?
Thank you,
Rahila Syed
Attachments:
fix_for_resource_owner_error.patch
diff --git a/src/backend/storage/ipc/dsm.c b/src/backend/storage/ipc/dsm.c
index f92a52a00e..24f20be3bd 100644
--- a/src/backend/storage/ipc/dsm.c
+++ b/src/backend/storage/ipc/dsm.c
@@ -1202,7 +1202,7 @@ dsm_create_descriptor(void)
{
dsm_segment *seg;
- if (CurrentResourceOwner)
+ if (CurrentResourceOwner && !IsResourceOwnerReleasing(CurrentResourceOwner))
ResourceOwnerEnlarge(CurrentResourceOwner);
seg = MemoryContextAlloc(TopMemoryContext, sizeof(dsm_segment));
@@ -1213,9 +1213,11 @@ dsm_create_descriptor(void)
seg->impl_private = NULL;
seg->mapped_address = NULL;
seg->mapped_size = 0;
-
- seg->resowner = CurrentResourceOwner;
- if (CurrentResourceOwner)
+ if (CurrentResourceOwner && IsResourceOwnerReleasing(CurrentResourceOwner))
+ seg->resowner = NULL;
+ else
+ seg->resowner = CurrentResourceOwner;
+ if (CurrentResourceOwner && !IsResourceOwnerReleasing(CurrentResourceOwner))
ResourceOwnerRememberDSM(CurrentResourceOwner, seg);
slist_init(&seg->on_detach);
diff --git a/src/backend/utils/resowner/resowner.c b/src/backend/utils/resowner/resowner.c
index ac5ca4a765..134fbee59b 100644
--- a/src/backend/utils/resowner/resowner.c
+++ b/src/backend/utils/resowner/resowner.c
@@ -428,6 +428,11 @@ ResourceOwnerCreate(ResourceOwner parent, const char *name)
return owner;
}
+bool
+IsResourceOwnerReleasing(ResourceOwner owner)
+{
+ return(owner->releasing);
+}
/*
* Make sure there is room for at least one more resource in an array.
*
diff --git a/src/include/utils/resowner.h b/src/include/utils/resowner.h
index e8d452ca7e..69efaca46d 100644
--- a/src/include/utils/resowner.h
+++ b/src/include/utils/resowner.h
@@ -145,6 +145,7 @@ extern ResourceOwner ResourceOwnerGetParent(ResourceOwner owner);
extern void ResourceOwnerNewParent(ResourceOwner owner,
ResourceOwner newparent);
+extern bool IsResourceOwnerReleasing(ResourceOwner owner);
extern void ResourceOwnerEnlarge(ResourceOwner owner);
extern void ResourceOwnerRemember(ResourceOwner owner, Datum value, const ResourceOwnerDesc *kind);
extern void ResourceOwnerForget(ResourceOwner owner, Datum value, const ResourceOwnerDesc *kind);
On 2025/01/08 21:03, Rahila Syed wrote:
I have not been able to reproduce this issue. Could you please clarify which process you ran
pg_get_process_memory_contexts() on, with the interval of 0.1?
I used the following query for testing:
=# SELECT count(*) FROM pg_stat_activity, pg_get_process_memory_contexts(pid, false) WHERE pid <> pg_backend_pid();
=# \watch 0.1
Was it a backend process
created by make installcheck-world, or some other process?
Yes, the target backends were from make installcheck-world.
No other workloads were running.
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
Hi Tomas,
Thank you for the review.
On Tue, Jan 7, 2025 at 2:32 AM Tomas Vondra <tomas@vondra.me> wrote:
Hi Rahila,
Thanks for the updated and rebased patch. I've tried the pgbench test
again, to see if it gets stuck somewhere, and I'm observing this on a
new / idle cluster:
$ pgbench -n -f test.sql -P 1 test -T 60
pgbench (18devel)
progress: 1.0 s, 1647.9 tps, lat 0.604 ms stddev 0.438, 0 failed
progress: 2.0 s, 1374.3 tps, lat 0.727 ms stddev 0.386, 0 failed
progress: 3.0 s, 1514.4 tps, lat 0.661 ms stddev 0.330, 0 failed
progress: 4.0 s, 1563.4 tps, lat 0.639 ms stddev 0.212, 0 failed
progress: 5.0 s, 1665.0 tps, lat 0.600 ms stddev 0.177, 0 failed
progress: 6.0 s, 1538.0 tps, lat 0.650 ms stddev 0.192, 0 failed
progress: 7.0 s, 1491.4 tps, lat 0.670 ms stddev 0.261, 0 failed
progress: 8.0 s, 1539.5 tps, lat 0.649 ms stddev 0.443, 0 failed
progress: 9.0 s, 1517.0 tps, lat 0.659 ms stddev 0.167, 0 failed
progress: 10.0 s, 1594.0 tps, lat 0.627 ms stddev 0.227, 0 failed
progress: 11.0 s, 28.0 tps, lat 0.705 ms stddev 0.277, 0 failed
progress: 12.0 s, 0.0 tps, lat 0.000 ms stddev 0.000, 0 failed
progress: 13.0 s, 0.0 tps, lat 0.000 ms stddev 0.000, 0 failed
progress: 14.0 s, 0.0 tps, lat 0.000 ms stddev 0.000, 0 failed
progress: 15.0 s, 0.0 tps, lat 0.000 ms stddev 0.000, 0 failed
progress: 16.0 s, 1480.6 tps, lat 4.043 ms stddev 130.113, 0 failed
progress: 17.0 s, 1524.9 tps, lat 0.655 ms stddev 0.286, 0 failed
progress: 18.0 s, 1246.0 tps, lat 0.802 ms stddev 0.330, 0 failed
progress: 19.0 s, 1383.1 tps, lat 0.722 ms stddev 0.934, 0 failed
progress: 20.0 s, 1432.7 tps, lat 0.698 ms stddev 0.199, 0 failed
...
There's always a period of 10-15 seconds when everything seems to be
working fine, and then a couple seconds when it gets stuck, with the usual
LOG: Wait for 69454 process to publish stats timed out, trying again
The PIDs I've seen were for checkpointer, autovacuum launcher, ... all
of that are processes that should be handling the signal, so how come it
gets stuck every now and then? The system is entirely idle, there's no
contention for the shmem stuff, etc. Could it be forgetting about the
signal in some cases, or something like that?
I am not sure as of now, I will debug further. Meanwhile, I have addressed the
review comments. Please find the details and an updated patch below.
1) The SGML docs talk about "contexts at level" but I don't think that's
defined/explained anywhere, there are different ways to assign levels in
a tree-like structure, so it's unclear if levels are assigned from the
top or bottom.
Fixed.
2) volatile sig_atomic_t PublishMemoryContextPending = false;
I'd move this right after LogMemoryContextPending (to match the other
places that add new stuff).
Done.
3) typedef enum PrintDetails
I suppose this should have some comments, explaining what the typedef is
for. Also, "details" sounds pretty generic, perhaps "destination" or
maybe "target" would be better?
I added the comments above the typedef and changed the name to PrintDestination.
4) The memcpy here seems unnecessary - the string is going to be static
in the binary, no need to copy it. In which case the whole switch is
going to be the same as in PutMemoryContextsStatsTupleStore, so maybe
move that into a separate function?
+	switch (context->type)
+	{
+		case T_AllocSetContext:
+			type = "AllocSet";
+			strncpy(memctx_info[curr_id].type, type, strlen(type));
+			break;
+		case T_GenerationContext:
+			type = "Generation";
+			strncpy(memctx_info[curr_id].type, type, strlen(type));
+			break;
+		case T_SlabContext:
+			type = "Slab";
+			strncpy(memctx_info[curr_id].type, type, strlen(type));
+			break;
+		case T_BumpContext:
+			type = "Bump";
+			strncpy(memctx_info[curr_id].type, type, strlen(type));
+			break;
+		default:
+			type = "???";
+			strncpy(memctx_info[curr_id].type, type, strlen(type));
+			break;
+	}
Got rid of the copy and moved the switch to a separate function.
5) The comment about hash table in ProcessGetMemoryContextInterrupt
seems pretty far from hash_create(), so maybe move it.
This was fixed in your suggestions patch.
6) ProcessGetMemoryContextInterrupt seems pretty long / complex, with
multiple nested loops, it'd be good to split it into smaller parts that
are easier to understand.
Done. I refactored it to move certain parts into separate functions.
7) I'm not sure if/why we need to move MemoryContextId to memutils.h.
This is because I am referencing it from both mcxt.c and mcxtfuncs.c. I can
consider moving some of the code out of mcxt.c and consolidating
everything related to this patch in mcxtfuncs.c if mcxt.c is intended to
contain only the core memory context logic.
8) The new stuff in memutils.h is added to the wrong place, into a
section labeled "Memory-context-type-specific functions" (which it
certainly is not)
Fixed.
9) autovacuum.c adds the ProcessGetMemoryContextInterrupt() call after
ProcessCatchupInterrupt() - that's not wrong, but I'd move it right
after ProcessLogMemoryContextInterrupt(), just like everywhere else.
Fixed too.
10) The pg_get_process_memory_contexts comment says:
Signal a backend or an auxiliary process to send its ...
But this is not just about the signal, it also waits for the results and
produces the result set.
Makes sense, edited accordingly.
11) pg_get_process_memory_contexts - Wouldn't it be better to move the
InitMaterializedSRF() call until after the privilege check, etc.?
I have moved it after the superuser check but kept it before some other
checks that lead to a WARNING, after looking at how other functions do it.
12) The pg_get_process_memory_contexts comment should explain why it's
superuser-only function. Presumably it has similar DoS risks as the
other functions, because if not why would we have the restriction?
Edited accordingly.
13) I reworded and expanded the pg_get_process_memory_contexts comment a
bit, and re-wrapped it too. But I think it also needs to explain how it
communicates with the other process (sending signal, sending data
through a DSA, ...). And also how the timeouts work.
Thank you for improving the comments. Added remaining changes as requested.
14) I'm a bit confused about the DSA allocations (but I also haven't
worked with DSA very much, so maybe it's fine). Presumably the 16MB is
upper limit, we won't use that all the time. We allocate 1MB, but allow
it to grow up to 16MB, correct?
Yes.
16MB seems like a lot, certainly enough
for this purpose - if it's not, I don't think we can come up with a
better limit.
I can try reducing it to 8MB, although it's expected to be only allocated
when needed.
15) In any case, I don't think the 16 should be hardcoded as a magic
constant in multiple places. That's bound to be error-prone.
Done.
16) I've reformatted / reindented / wrapped the code in various places,
to make it easier to read and more consistent with the nearby code. I
also added a bunch of comments explaining what the block of code is
meant to do (I mean, what it aims to do).
Thank you.
16) A comment in pg_get_process_memory_contexts says:
Pin the mapping so that it doesn't throw a warning
That doesn't seem very useful. It's not clear what kind of warning this
hides, but more importantly - we're not doing stuff to hide some sort of
warning, we do it to prevent what the warning is about.
Makes sense, fixed.
17) pg_get_process_memory_contexts has a bunch of error cases, where we
need to detach the DSA and return NULL. Would be better to do a label
with a goto, I think.
Done.
18) I think pg_get_process_memory_contexts will have issues if this
happens in the first loop:
if ((memCtxState[procNumber].proc_id == pid) &&
DsaPointerIsValid(memCtxState[procNumber].memstats_dsa_pointer))
break;
Because then we end up with memctx_info pointing to garbage after the
loop. I don't know how hard is to hit this, I guess it can happen in
many processes calling pg_get_process_memory_contexts?
I think this is not possible since if the breaking condition is met, it means
memstats_dsa_pointer is valid and memctx_info which resides
at memstats_dsa_pointer will contain valid data. Am I missing something?
Regarding the proc_id == pid check, I have added a comment in the code as
requested.
19) Minor comment and formatting of MemCtxShmemSize / MemCtxShmemInit.
Ok.
20) MemoryContextInfo etc. need to be added to typedefs.list, so that
pgindent can do the right thing.
Done.
21) I think ProcessGetMemoryContextInterrupt has a bug because it uses
get_summary before reading it from the shmem.
Fixed. It did not show up in tests because the only effects of the bug were
some extra memory allocation in the DSA and some extra computation to populate
all the paths in the hash table, in spite of get_summary being true.
Attached are two patches - 0001 is the original patch, 0002 has most of
my review comments (mentioned above), and a couple additional changes to
comments/formatting, etc. Those are suggestions rather than issues.
Thank you, applied the 0002 patch and made the changes mentioned in XXX.
Answering some of your questions in the 0002 patch below:
Q. * XXX Also, what if we fill exactly this number of contexts? Won't we
* lose the last entry because it will be overwritten by the summary?
A. We fill slots 0 to max_stats - 2 with memory contexts in the loop
foreach_ptr(MemoryContextData, cur, contexts) in
ProcessGetMemoryContextInterrupt.
Slot max_stats - 1 is reserved for the summary statistics.
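As an illustration of the slot layout described in this answer, here is a minimal standalone sketch. It is not the patch code: Entry, publish_stats and their fields are hypothetical names, and plain arrays stand in for the DSA. It only shows the idea that slots 0 to max_stats - 2 hold individual contexts and slot max_stats - 1 holds the cumulative total of whatever did not fit.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for one published statistics record. */
typedef struct Entry
{
	long		totalspace;		/* bytes accounted to this record */
	int			num_contexts;	/* number of contexts the record covers */
} Entry;

/*
 * Copy per-context sizes into out[0 .. max_stats - 2].  If more contexts
 * exist than fit there, fold the remainder into out[max_stats - 1].
 * Returns the number of records written.
 */
static int
publish_stats(const long *sizes, int ncontexts, Entry *out, int max_stats)
{
	int			nindividual;
	int			i;

	if (max_stats < 1 || ncontexts < 0)
		return 0;

	/* The last slot is reserved, so at most max_stats - 1 individual records. */
	nindividual = (ncontexts < max_stats - 1) ? ncontexts : max_stats - 1;

	for (i = 0; i < nindividual; i++)
	{
		out[i].totalspace = sizes[i];
		out[i].num_contexts = 1;
	}

	/* Everything fit individually: no summary record is needed. */
	if (ncontexts == nindividual)
		return nindividual;

	/* Aggregate the remaining contexts into the reserved last slot. */
	out[max_stats - 1].totalspace = 0;
	out[max_stats - 1].num_contexts = 0;
	for (i = nindividual; i < ncontexts; i++)
	{
		out[max_stats - 1].totalspace += sizes[i];
		out[max_stats - 1].num_contexts++;
	}

	return max_stats;
}
```

With five contexts and max_stats = 4 in this sketch, the first three are reported individually and the fourth slot aggregates the remaining two, so no individual entry is overwritten by the summary.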
Q. /* XXX I don't understand why we need to check get_summary here? */
A. The get_summary check is there to ensure that the context_id is inserted in
the hash table if get_summary is true. If get_summary is true, the loop will
break after the first iteration, so the entire main list of contexts won't be
traversed and the context_ids of the children won't be inserted.
Hence it is handled separately inside a check for get_summary.
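The traversal described in this answer can be sketched in isolation as follows. This is a simplified model under stated assumptions, not the code from the patch: Node, assign_ids and the caller-supplied queue array are hypothetical, and an id field on the node stands in for the hash table entry. In full mode every node gets its id when it is dequeued, which yields breadth-first order; in summary mode the loop stops after the root, so the root's children must get their ids inside the inner loop.

```c
#include <stddef.h>

/* Hypothetical tree node using the same firstchild/nextchild linkage. */
typedef struct Node
{
	struct Node *firstchild;
	struct Node *nextchild;
	int			id;				/* 1-based id, 0 means "not assigned" */
} Node;

/*
 * Assign ids breadth first, parents before children.  In summary mode only
 * the root and its direct children receive ids.  The queue array must have
 * room for every node in the tree.  Returns the number of ids assigned.
 */
static int
assign_ids(Node *root, Node **queue, int summary)
{
	int			head = 0;
	int			tail = 0;
	int			count = 0;

	queue[tail++] = root;

	while (head < tail)
	{
		Node	   *cur = queue[head++];
		Node	   *c;

		cur->id = ++count;

		for (c = cur->firstchild; c != NULL; c = c->nextchild)
		{
			if (summary)
				c->id = ++count;	/* the outer loop will not visit c */
			else
				queue[tail++] = c;	/* visited, and numbered, later */
		}

		/* Summary mode stops after the root and its children. */
		if (summary)
			break;
	}

	return count;
}
```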
Q. /* XXX What if the memstats_dsa_pointer is not valid? Is it even possible?
 * If it is, we have garbage in memctx_info. Maybe it should be an Assert()? */
A. Agreed. Changed it to an assert.
Q. /*
 * XXX isn't 2 x 1kB for every context a bit too much? Maybe better to
 * make it variable-length?
 */
A. I don't know how to do this for a variable in shared memory, won't that mean
allocating from the heap and thus the pointer would become invalid in another
process?
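One common technique for variable-length data in shared memory, sketched here purely as an illustration and not as what the patch does, is to store offsets relative to the start of the shared chunk instead of raw pointers. An offset means the same thing in every process, whatever address the chunk is mapped at. (DSA works along similar lines: a dsa_pointer is translated to a local address with dsa_get_address().) All names in the sketch (Chunk, VarEntry, chunk_init, chunk_add_string, chunk_string) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical header placed at the start of the shared chunk. */
typedef struct Chunk
{
	size_t		size;			/* total bytes in the chunk, header included */
	size_t		used;			/* bytes consumed so far, header included */
} Chunk;

/* Hypothetical record: strings are referenced by offset, not by pointer. */
typedef struct VarEntry
{
	size_t		name_off;		/* offset of the NUL-terminated name */
	size_t		ident_off;		/* offset of the NUL-terminated ident */
} VarEntry;

/* Prepare a suitably aligned memory region for use as a chunk. */
static void
chunk_init(void *mem, size_t size)
{
	Chunk	   *c = mem;

	assert(size >= sizeof(Chunk));
	c->size = size;
	c->used = sizeof(Chunk);
}

/*
 * Append a string to the chunk.  Returns its offset from the chunk start,
 * or 0 if it does not fit (0 is never a valid string offset because the
 * header occupies the first bytes).
 */
static size_t
chunk_add_string(void *mem, const char *s)
{
	Chunk	   *c = mem;
	size_t		len = strlen(s) + 1;
	size_t		off = c->used;

	if (len > c->size - c->used)
		return 0;

	memcpy((char *) mem + off, s, len);
	c->used += len;
	return off;
}

/* Resolve an offset against wherever this process has the chunk mapped. */
static const char *
chunk_string(const void *mem, size_t off)
{
	return (const char *) mem + off;
}
```

Each string then occupies only its own length plus one byte instead of a fixed 1kB, at the cost of an extra lookup step when reading.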
Thank you,
Rahila Syed
Attachments:
v10-0001-Function-to-report-memory-context-stats-of-any-backe.patch
From 61cb31213b4e9e4acdad13d8466c05e8ea767b1b Mon Sep 17 00:00:00 2001
From: Rahila Syed <rahilasyed.90@gmail.com>
Date: Sun, 15 Sep 2024 17:56:06 +0530
Subject: [PATCH] Function to report memory context stats of any backend
This function sends a signal to a backend to publish
statistics of all its memory contexts. Signal handler
sets a flag, which causes the relevant backend to copy its
MemoryContextStats to a DSA, as part
of next CHECK_FOR_INTERRUPTS().
If there are more than 16MB worth of statistics, the
remaining statistics are copied as a cumulative
total of the remaining contexts.
Once it's done, it signals the client backend using
a condition variable. The client backend
then wakes up, reads the shared memory and
returns these values in the form of a set of records,
one for each memory context, to the user, followed
by a cumulative total of the remaining contexts,
if any.
Each backend and auxiliary process has its own slot
for reporting the stats. There is an array of such
memory slots of size MaxBackends + the number of auxiliary
processes in fixed shared memory. Each of these slots points
to a DSA, which contains the stats to be shared by the
corresponding process.
Each slot has its own LW lock and condition variable for
synchronization and communication between the
publishing process and the client backend.
---
doc/src/sgml/func.sgml | 29 ++
src/backend/postmaster/autovacuum.c | 4 +
src/backend/postmaster/checkpointer.c | 4 +
src/backend/postmaster/interrupt.c | 4 +
src/backend/postmaster/pgarch.c | 4 +
src/backend/postmaster/startup.c | 4 +
src/backend/postmaster/walsummarizer.c | 4 +
src/backend/storage/ipc/ipci.c | 2 +
src/backend/storage/ipc/procsignal.c | 3 +
src/backend/tcop/postgres.c | 3 +
.../utils/activity/wait_event_names.txt | 1 +
src/backend/utils/adt/mcxtfuncs.c | 374 ++++++++++++++--
src/backend/utils/init/globals.c | 1 +
src/backend/utils/mmgr/mcxt.c | 421 +++++++++++++++++-
src/include/catalog/pg_proc.dat | 10 +
src/include/miscadmin.h | 1 +
src/include/storage/procsignal.h | 1 +
src/include/utils/memutils.h | 57 +++
src/test/regress/expected/sysviews.out | 12 +
src/test/regress/sql/sysviews.sql | 12 +
src/tools/pgindent/typedefs.list | 2 +
21 files changed, 912 insertions(+), 41 deletions(-)
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 47370e581a..f9954c6d41 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -28358,6 +28358,35 @@ acl | {postgres=arwdDxtm/postgres,foo=r/postgres}
</para></entry>
</row>
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>pg_get_process_memory_contexts</primary>
+ </indexterm>
+ <function>pg_get_process_memory_contexts</function> ( <parameter>pid</parameter> <type>integer</type>, <parameter>get_summary</parameter> <type>boolean</type> )
+ <returnvalue>setof record</returnvalue>
+ </para>
+ <para>
+ This function handles requests to display the memory contexts of a
+ PostgreSQL process with the specified process ID (PID). It takes two
+ arguments: PID and a boolean, get_summary. The function can send requests
+ to both backend and auxiliary processes.
+
+ After receiving memory context statistics from the target process, it
+ returns the results as one row per context. The num_agg_contexts column
+ indicates the number of contexts aggregated in the displayed statistics.
+
+ When get_summary is set to true, statistics for memory contexts at levels
+ 1 and 2 are displayed, with level 1 representing the root node
+ (i.e., TopMemoryContext). Each level 2 context's statistics represent an
+ aggregate of all its child contexts' statistics, with num_agg_contexts
+ indicating the number of these aggregated child contexts.
+
+ When get_summary is set to false, the num_agg_contexts value is 1,
+ indicating that individual statistics are being displayed.
+ </para></entry>
+ </row>
+
<row>
<entry role="func_table_entry"><para role="func_signature">
<indexterm>
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 3f826532b8..4a0e319baa 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -766,6 +766,10 @@ HandleAutoVacLauncherInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
/* Process sinval catchup interrupts that happened while sleeping */
ProcessCatchupInterrupt();
}
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index 9bfd0fd665..ee8360ad6f 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -616,6 +616,10 @@ HandleCheckpointerInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/postmaster/interrupt.c b/src/backend/postmaster/interrupt.c
index be69e4c713..9481a5cd24 100644
--- a/src/backend/postmaster/interrupt.c
+++ b/src/backend/postmaster/interrupt.c
@@ -48,6 +48,10 @@ HandleMainLoopInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 12ee815a62..cd1ecb6b93 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -865,6 +865,10 @@ HandlePgArchInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
if (ConfigReloadPending)
{
char *archiveLib = pstrdup(XLogArchiveLibrary);
diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c
index 59d213031b..d670954c4e 100644
--- a/src/backend/postmaster/startup.c
+++ b/src/backend/postmaster/startup.c
@@ -192,6 +192,10 @@ HandleStartupProcInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
diff --git a/src/backend/postmaster/walsummarizer.c b/src/backend/postmaster/walsummarizer.c
index ffbf043935..b1a5e86a85 100644
--- a/src/backend/postmaster/walsummarizer.c
+++ b/src/backend/postmaster/walsummarizer.c
@@ -876,6 +876,10 @@ HandleWalSummarizerInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 174eed7036..4a70eabf7f 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -50,6 +50,7 @@
#include "storage/sinvaladt.h"
#include "utils/guc.h"
#include "utils/injection_point.h"
+#include "utils/memutils.h"
/* GUCs */
int shared_memory_type = DEFAULT_SHARED_MEMORY_TYPE;
@@ -340,6 +341,7 @@ CreateOrAttachShmemStructs(void)
StatsShmemInit();
WaitEventCustomShmemInit();
InjectionPointShmemInit();
+ MemCtxShmemInit();
}
/*
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index 7401b6e625..e425b9eeb0 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -688,6 +688,9 @@ procsignal_sigusr1_handler(SIGNAL_ARGS)
if (CheckProcSignal(PROCSIG_LOG_MEMORY_CONTEXT))
HandleLogMemoryContextInterrupt();
+ if (CheckProcSignal(PROCSIG_GET_MEMORY_CONTEXT))
+ HandleGetMemoryContextInterrupt();
+
if (CheckProcSignal(PROCSIG_PARALLEL_APPLY_MESSAGE))
HandleParallelApplyMessageInterrupt();
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index c01cff9d65..0eae9be122 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3497,6 +3497,9 @@ ProcessInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
if (ParallelApplyMessagePending)
HandleParallelApplyMessages();
}
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 0b53cba807..68a1769967 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -158,6 +158,7 @@ WAL_RECEIVER_EXIT "Waiting for the WAL receiver to exit."
WAL_RECEIVER_WAIT_START "Waiting for startup process to send initial data for streaming replication."
WAL_SUMMARY_READY "Waiting for a new WAL summary to be generated."
XACT_GROUP_UPDATE "Waiting for the group leader to update transaction status at transaction end."
+MEM_CTX_PUBLISH "Waiting for backend to publish memory information."
ABI_compatibility:
diff --git a/src/backend/utils/adt/mcxtfuncs.c b/src/backend/utils/adt/mcxtfuncs.c
index 396c2f223b..971de2ca3f 100644
--- a/src/backend/utils/adt/mcxtfuncs.c
+++ b/src/backend/utils/adt/mcxtfuncs.c
@@ -17,28 +17,25 @@
#include "funcapi.h"
#include "mb/pg_wchar.h"
+#include "miscadmin.h"
+#include "access/twophase.h"
+#include "catalog/pg_authid_d.h"
+#include "nodes/pg_list.h"
#include "storage/proc.h"
#include "storage/procarray.h"
+#include "utils/acl.h"
#include "utils/array.h"
#include "utils/builtins.h"
#include "utils/hsearch.h"
+#include "utils/memutils.h"
+#include "utils/wait_event_types.h"
/* ----------
* The max bytes for showing identifiers of MemoryContext.
* ----------
*/
-#define MEMORY_CONTEXT_IDENT_DISPLAY_SIZE 1024
-/*
- * MemoryContextId
- * Used for storage of transient identifiers for
- * pg_get_backend_memory_contexts.
- */
-typedef struct MemoryContextId
-{
- MemoryContext context;
- int context_id;
-} MemoryContextId;
+struct MemoryContextState *memCtxState = NULL;
/*
* int_list_to_array
@@ -71,7 +68,7 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
TupleDesc tupdesc, MemoryContext context,
HTAB *context_id_lookup)
{
-#define PG_GET_BACKEND_MEMORY_CONTEXTS_COLS 10
+#define PG_GET_BACKEND_MEMORY_CONTEXTS_COLS 11
Datum values[PG_GET_BACKEND_MEMORY_CONTEXTS_COLS];
bool nulls[PG_GET_BACKEND_MEMORY_CONTEXTS_COLS];
@@ -143,24 +140,7 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
else
nulls[1] = true;
- switch (context->type)
- {
- case T_AllocSetContext:
- type = "AllocSet";
- break;
- case T_GenerationContext:
- type = "Generation";
- break;
- case T_SlabContext:
- type = "Slab";
- break;
- case T_BumpContext:
- type = "Bump";
- break;
- default:
- type = "???";
- break;
- }
+ type = AssignContextType(context->type);
values[2] = CStringGetTextDatum(type);
values[3] = Int32GetDatum(list_length(path)); /* level */
@@ -175,6 +155,32 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
list_free(path);
}
+const char *
+AssignContextType(NodeTag type)
+{
+ const char *context_type;
+
+ switch (type)
+ {
+ case T_AllocSetContext:
+ context_type = "AllocSet";
+ break;
+ case T_GenerationContext:
+ context_type = "Generation";
+ break;
+ case T_SlabContext:
+ context_type = "Slab";
+ break;
+ case T_BumpContext:
+ context_type = "Bump";
+ break;
+ default:
+ context_type = "???";
+ break;
+ }
+ return (context_type);
+}
+
/*
* pg_get_backend_memory_contexts
* SQL SRF showing backend memory context.
@@ -305,3 +311,311 @@ pg_log_backend_memory_contexts(PG_FUNCTION_ARGS)
PG_RETURN_BOOL(true);
}
+
+/*
+ * pg_get_process_memory_contexts
+ * Signal a backend or an auxiliary process to send its memory contexts,
+ * wait for the results and display them.
+ *
+ * By default, only superusers and members of pg_read_all_stats may request
+ * the memory contexts of another process, because allowing any user to issue
+ * this request at an unbounded rate could flood the target with signals and
+ * lead to denial of service. Additional roles can be permitted with GRANT.
+ *
+ * On receipt of this signal, a backend or an auxiliary process sets the flag
+ * in the signal handler, which causes the next CHECK_FOR_INTERRUPTS()
+ * or process-specific interrupt handler to copy the memory context details
+ * to a dynamic shared memory space.
+ *
+ * The shared memory buffer has a limited size - if the process has too many
+ * memory contexts, the contexts that do not fit are summarized and
+ * represented as a cumulative total at the end of the buffer.
+ *
+ * After sending the signal, wait on a condition variable. The publishing backend,
+ * after copying the data to shared memory, broadcasts on that condition variable.
+ * Once the wait on the condition variable returns, check whether the memory
+ * context information is available to read and display.
+ *
+ * If the publishing backend does not respond before the condition variable times
+ * out, which is set to MEMSTATS_WAIT_TIMEOUT, retry up to MAX_RETRIES times
+ * before giving up and returning without statistics.
+ */
+Datum
+pg_get_process_memory_contexts(PG_FUNCTION_ARGS)
+{
+ int pid = PG_GETARG_INT32(0);
+ bool get_summary = PG_GETARG_BOOL(1);
+ PGPROC *proc;
+ ProcNumber procNumber = INVALID_PROC_NUMBER;
+ int i;
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ dsa_area *area;
+ dsa_handle handle;
+ MemoryContextEntry *memctx_info;
+ MemoryContext oldContext;
+ int num_retries = 0;
+
+ /*
+ * Only superusers or users with pg_read_all_stats privileges can view the
+ * memory context statistics of another process
+ */
+ if (!has_privs_of_role(GetUserId(), ROLE_PG_READ_ALL_STATS))
+ ereport(ERROR,
+ (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+ errmsg("permission denied to read memory context statistics of another process")));
+
+ InitMaterializedSRF(fcinfo, 0);
+
+ /*
+ * See if the process with given pid is a backend or an auxiliary process.
+ */
+ proc = BackendPidGetProc(pid);
+ if (proc == NULL)
+ proc = AuxiliaryPidGetProc(pid);
+
+ /*
+ * BackendPidGetProc() and AuxiliaryPidGetProc() return NULL if the pid
+ * isn't valid; but by the time we reach kill(), a process for which we
+ * get a valid proc here might have terminated on its own. There's no way
+ * to acquire a lock on an arbitrary process to prevent that. But since
+ * this mechanism is usually used to debug a backend or an auxiliary
+ * process running and consuming lots of memory, that it might end on its
+ * own first without its memory contexts being returned is not a problem.
+ */
+ if (proc == NULL)
+ {
+ /*
+ * This is just a warning so a loop-through-resultset will not abort
+ * if one backend terminated on its own during the run.
+ */
+ ereport(WARNING,
+ (errmsg("PID %d is not a PostgreSQL server process", pid)));
+ PG_RETURN_NULL();
+ }
+
+ procNumber = GetNumberFromPGProc(proc);
+ if (procNumber == MyProcNumber)
+ {
+ ereport(WARNING,
+ (errmsg("cannot return statistics for local backend"),
+ errhint("Use pg_backend_memory_contexts view instead")));
+ PG_RETURN_NULL();
+ }
+
+ /*
+ * Create a DSA area capped at MAX_NUM_DEFAULT_SEGMENTS *
+ * DSA_DEFAULT_INIT_SEGMENT_SIZE bytes and pass its handle to the publishing
+ * process. Contexts that do not fit are reported as a cumulative total.
+ */
+ LWLockAcquire(&memCtxState[procNumber].lw_lock, LW_EXCLUSIVE);
+ memCtxState[procNumber].get_summary = get_summary;
+
+ if (memCtxState[procNumber].memstats_dsa_handle == DSA_HANDLE_INVALID)
+ {
+ oldContext = MemoryContextSwitchTo(TopMemoryContext);
+
+ area = dsa_create_ext(memCtxState[procNumber].lw_lock.tranche,
+ DSA_DEFAULT_INIT_SEGMENT_SIZE,
+ MAX_NUM_DEFAULT_SEGMENTS * DSA_DEFAULT_INIT_SEGMENT_SIZE);
+
+ MemoryContextSwitchTo(oldContext);
+
+ handle = dsa_get_handle(area);
+
+ memCtxState[procNumber].memstats_dsa_handle = handle;
+
+ /*
+ * Pin the DSA area so that it remains attachable even after the
+ * backend that created it exits.
+ */
+ dsa_pin(area);
+ }
+ else
+ {
+ area = dsa_attach(memCtxState[procNumber].memstats_dsa_handle);
+ /* dsa_pin_mapping(area); */
+ }
+
+ LWLockRelease(&memCtxState[procNumber].lw_lock);
+
+ /*
+ * Send a signal to the target process, informing it we want it to
+ * produce information about memory contexts.
+ */
+ if (SendProcSignal(pid, PROCSIG_GET_MEMORY_CONTEXT, procNumber) < 0)
+ {
+ ereport(WARNING,
+ (errmsg("could not send signal to process %d: %m", pid)));
+
+ goto end;
+ }
+
+ /*
+ * Wait for a backend to publish stats, indicated by a valid dsa pointer
+ * set by the backend.
+ */
+ ConditionVariablePrepareToSleep(&memCtxState[procNumber].memctx_cv);
+ while (1)
+ {
+ proc = BackendPidGetProc(pid);
+ if (proc == NULL)
+ proc = AuxiliaryPidGetProc(pid);
+ if (proc == NULL)
+ {
+ ereport(WARNING,
+ (errmsg("PID %d is not a PostgreSQL server process", pid)));
+ goto end;
+ }
+#define MEMSTATS_WAIT_TIMEOUT 5000
+#define MAX_RETRIES 20
+ if (ConditionVariableTimedSleep(&memCtxState[procNumber].memctx_cv, MEMSTATS_WAIT_TIMEOUT,
+ WAIT_EVENT_MEM_CTX_PUBLISH))
+ {
+ ereport(LOG,
+ (errmsg("wait for process %d to publish memory context statistics timed out, trying again", pid)));
+ if (num_retries > MAX_RETRIES)
+ {
+ goto end;
+ }
+ num_retries = num_retries + 1;
+ }
+
+ /*
+ * We expect to come out of sleep when the requested process has
+ * finished publishing the statistics, verified using the valid dsa
+ * pointer.
+ *
+ * Make sure that the information belongs to the pid we requested
+ * information for; otherwise, loop back and wait for the server
+ * process to finish publishing statistics.
+ */
+ LWLockAcquire(&memCtxState[procNumber].lw_lock, LW_EXCLUSIVE);
+
+ /*
+ * A note in procnumber.h says that a procNumber can be re-used for
+ * a different backend immediately after a backend exits. In case an
+ * old process' data was there and not updated by the current process
+ * in the slot identified by the procNumber, the pid of the requested
+ * process and the proc_id might not match.
+ */
+ if ((memCtxState[procNumber].proc_id == pid) &&
+ DsaPointerIsValid(memCtxState[procNumber].memstats_dsa_pointer))
+ break;
+ else
+ LWLockRelease(&memCtxState[procNumber].lw_lock);
+ }
+
+ /* We should land here only with a valid memstats_dsa_pointer */
+ Assert(DsaPointerIsValid(memCtxState[procNumber].memstats_dsa_pointer));
+ memctx_info = (MemoryContextEntry *) dsa_get_address(area, memCtxState[procNumber].memstats_dsa_pointer);
+
+ /*
+ * Backend has finished publishing the stats, read them
+ *
+ * Read statistics of top level 1 and 2 contexts, if get_summary is true.
+ */
+ for (i = 0; i < memCtxState[procNumber].in_memory_stats; i++)
+ {
+ ArrayType *path_array;
+ int path_length;
+ Datum values[PG_GET_BACKEND_MEMORY_CONTEXTS_COLS];
+ bool nulls[PG_GET_BACKEND_MEMORY_CONTEXTS_COLS];
+
+ memset(values, 0, sizeof(values));
+ memset(nulls, 0, sizeof(nulls));
+
+ if (strlen(memctx_info[i].name) != 0)
+ values[0] = CStringGetTextDatum(memctx_info[i].name);
+ else
+ nulls[0] = true;
+ if (strlen(memctx_info[i].ident) != 0)
+ values[1] = CStringGetTextDatum(memctx_info[i].ident);
+ else
+ nulls[1] = true;
+
+ values[2] = CStringGetTextDatum(memctx_info[i].type);
+
+ path_length = memctx_info[i].path_length;
+ path_array = construct_array_builtin(memctx_info[i].path, path_length, INT4OID);
+ values[3] = PointerGetDatum(path_array);
+
+ values[4] = Int64GetDatum(memctx_info[i].totalspace);
+ values[5] = Int64GetDatum(memctx_info[i].nblocks);
+ values[6] = Int64GetDatum(memctx_info[i].freespace);
+ values[7] = Int64GetDatum(memctx_info[i].freechunks);
+ values[8] = Int64GetDatum(memctx_info[i].totalspace - memctx_info[i].freespace);
+ values[9] = Int32GetDatum(memctx_info[i].num_contexts);
+
+ tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc, values, nulls);
+ }
+
+ /* If there are more contexts, display a cumulative total of those */
+ if (memCtxState[procNumber].total_stats > i)
+ {
+ Datum values[PG_GET_BACKEND_MEMORY_CONTEXTS_COLS] = {0};
+ bool nulls[PG_GET_BACKEND_MEMORY_CONTEXTS_COLS] = {0};
+
+ values[0] = CStringGetTextDatum(memctx_info[i].name);
+ nulls[1] = true;
+ nulls[2] = true;
+ nulls[3] = true;
+ values[4] = Int64GetDatum(memctx_info[i].totalspace);
+ values[5] = Int64GetDatum(memctx_info[i].nblocks);
+ values[6] = Int64GetDatum(memctx_info[i].freespace);
+ values[7] = Int64GetDatum(memctx_info[i].freechunks);
+ values[8] = Int64GetDatum(memctx_info[i].totalspace - memctx_info[i].freespace);
+ values[9] = Int32GetDatum(memctx_info[i].num_contexts);
+
+ tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc, values, nulls);
+ }
+
+ LWLockRelease(&memCtxState[procNumber].lw_lock);
+
+ ConditionVariableCancelSleep();
+
+end:
+ dsa_detach(area);
+ PG_RETURN_NULL();
+}
+
+/*
+ * Shared memory sizing for reporting memory context information.
+ */
+static Size
+MemCtxShmemSize(void)
+{
+ Size TotalProcs =
+ add_size(MaxBackends, add_size(NUM_AUXILIARY_PROCS, max_prepared_xacts));
+
+ return mul_size(TotalProcs, sizeof(MemoryContextState));
+}
+
+/*
+ * Init shared memory for reporting memory context information.
+ *
+ * XXX Should this check IsUnderPostmaster, similarly to e.g. CommitTsShmemInit?
+ */
+void
+MemCtxShmemInit(void)
+{
+ bool found;
+ Size TotalProcs =
+ add_size(MaxBackends, add_size(NUM_AUXILIARY_PROCS, max_prepared_xacts));
+
+ memCtxState = (MemoryContextState *) ShmemInitStruct("MemoryContextState",
+ MemCtxShmemSize(),
+ &found);
+ if (!found)
+ {
+ for (int i = 0; i < TotalProcs; i++)
+ {
+ ConditionVariableInit(&memCtxState[i].memctx_cv);
+
+ LWLockInitialize(&memCtxState[i].lw_lock, LWLockNewTrancheId());
+ LWLockRegisterTranche(memCtxState[i].lw_lock.tranche, "mem_context_stats_reporting");
+
+ memCtxState[i].memstats_dsa_handle = DSA_HANDLE_INVALID;
+ memCtxState[i].memstats_dsa_pointer = InvalidDsaPointer;
+ }
+ }
+}
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index b844f9fdae..13938ccb0f 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -38,6 +38,7 @@ volatile sig_atomic_t TransactionTimeoutPending = false;
volatile sig_atomic_t IdleSessionTimeoutPending = false;
volatile sig_atomic_t ProcSignalBarrierPending = false;
volatile sig_atomic_t LogMemoryContextPending = false;
+volatile sig_atomic_t PublishMemoryContextPending = false;
volatile sig_atomic_t IdleStatsUpdateTimeoutPending = false;
volatile uint32 InterruptHoldoffCount = 0;
volatile uint32 QueryCancelHoldoffCount = 0;
diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index aa6da0d035..ba3641d017 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -19,16 +19,22 @@
*-------------------------------------------------------------------------
*/
+#include <math.h>
#include "postgres.h"
#include "mb/pg_wchar.h"
#include "miscadmin.h"
+#include "nodes/pg_list.h"
+#include "storage/fd.h"
+#include "storage/lwlock.h"
+#include "storage/dsm.h"
+#include "utils/dsa.h"
+#include "utils/hsearch.h"
#include "utils/memdebug.h"
#include "utils/memutils.h"
#include "utils/memutils_internal.h"
#include "utils/memutils_memorychunk.h"
-
static void BogusFree(void *pointer);
static void *BogusRealloc(void *pointer, Size size, int flags);
static MemoryContext BogusGetChunkContext(void *pointer);
@@ -135,6 +141,17 @@ static const MemoryContextMethods mcxt_methods[] = {
};
#undef BOGUS_MCTX
+/*
+ * This is passed to MemoryContextStatsInternal to determine whether
+ * to print context statistics and, if so, whether to print them to the
+ * server log or to stderr.
+ */
+typedef enum PrintDestination
+{
+ PRINT_STATS_TO_STDERR = 0,
+ PRINT_STATS_TO_LOGS,
+ PRINT_STATS_NONE
+} PrintDestination;
/*
* CurrentMemoryContext
@@ -162,10 +179,20 @@ static void MemoryContextCallResetCallbacks(MemoryContext context);
static void MemoryContextStatsInternal(MemoryContext context, int level,
int max_level, int max_children,
MemoryContextCounters *totals,
- bool print_to_stderr);
+ PrintDestination print_location, int *num_contexts);
static void MemoryContextStatsPrint(MemoryContext context, void *passthru,
const char *stats_string,
bool print_to_stderr);
+static void PublishMemoryContext(MemoryContextEntry *memctx_infos,
+ int curr_id, MemoryContext context,
+ List *path,
+ MemoryContextCounters stat,
+ int num_contexts);
+static void compute_num_of_contexts(List *contexts, HTAB *context_id_lookup,
+ int *stats_count,
+ bool get_summary);
+static List *compute_context_path(MemoryContext c, HTAB *context_id_lookup);
+
/*
* You should not do memory allocations within a critical section, because
@@ -831,11 +858,19 @@ MemoryContextStatsDetail(MemoryContext context,
bool print_to_stderr)
{
MemoryContextCounters grand_totals;
+ int num_contexts = 0;
+ PrintDestination print_location;
memset(&grand_totals, 0, sizeof(grand_totals));
+ if (print_to_stderr)
+ print_location = PRINT_STATS_TO_STDERR;
+ else
+ print_location = PRINT_STATS_TO_LOGS;
+
+ /* num_contexts reports the number of contexts aggregated in the output */
MemoryContextStatsInternal(context, 0, max_level, max_children,
- &grand_totals, print_to_stderr);
+ &grand_totals, print_location, &num_contexts);
if (print_to_stderr)
fprintf(stderr,
@@ -876,18 +911,42 @@ static void
MemoryContextStatsInternal(MemoryContext context, int level,
int max_level, int max_children,
MemoryContextCounters *totals,
- bool print_to_stderr)
+ PrintDestination print_location, int *num_contexts)
{
MemoryContext child;
int ichild;
+ bool print_to_stderr = true;
Assert(MemoryContextIsValid(context));
- /* Examine the context itself */
- context->methods->stats(context,
- MemoryContextStatsPrint,
- &level,
- totals, print_to_stderr);
+ if (print_location == PRINT_STATS_TO_STDERR)
+ print_to_stderr = true;
+ else if (print_location == PRINT_STATS_TO_LOGS)
+ print_to_stderr = false;
+
+ if (print_location != PRINT_STATS_NONE)
+ {
+ /* Examine the context itself */
+ context->methods->stats(context,
+ MemoryContextStatsPrint,
+ &level,
+ totals, print_to_stderr);
+ }
+
+ /*
+ * Do not print the statistics if print_location is PRINT_STATS_NONE,
+ * only compute totals.
+ */
+ else
+ {
+ /* Examine the context itself */
+ context->methods->stats(context,
+ NULL,
+ NULL,
+ totals, print_to_stderr);
+ }
+ /* Increment the context count */
+ *num_contexts = *num_contexts + 1;
/*
* Examine children.
@@ -907,7 +966,7 @@ MemoryContextStatsInternal(MemoryContext context, int level,
MemoryContextStatsInternal(child, level + 1,
max_level, max_children,
totals,
- print_to_stderr);
+ print_location, num_contexts);
}
}
@@ -925,6 +984,7 @@ MemoryContextStatsInternal(MemoryContext context, int level,
ichild++;
child = MemoryContextTraverseNext(child, context);
}
+ *num_contexts = *num_contexts + ichild;
if (print_to_stderr)
{
@@ -939,7 +999,7 @@ MemoryContextStatsInternal(MemoryContext context, int level,
local_totals.freechunks,
local_totals.totalspace - local_totals.freespace);
}
- else
+ else if (print_location != PRINT_STATS_NONE)
ereport(LOG_SERVER_ONLY,
(errhidestmt(true),
errhidecontext(true),
@@ -1276,6 +1336,21 @@ HandleLogMemoryContextInterrupt(void)
/* latch will be set by procsignal_sigusr1_handler */
}
+/*
+ * HandleGetMemoryContextInterrupt
+ * Handle receipt of an interrupt indicating publishing of memory
+ * contexts.
+ *
+ * All the actual work is deferred to ProcessGetMemoryContextInterrupt()
+ */
+void
+HandleGetMemoryContextInterrupt(void)
+{
+ InterruptPending = true;
+ PublishMemoryContextPending = true;
+ /* latch will be set by procsignal_sigusr1_handler */
+}
+
/*
* ProcessLogMemoryContextInterrupt
* Perform logging of memory contexts of this backend process.
@@ -1313,6 +1388,330 @@ ProcessLogMemoryContextInterrupt(void)
MemoryContextStatsDetail(TopMemoryContext, 100, 100, false);
}
+/*
+ * ProcessGetMemoryContextInterrupt
+ * Generate information about memory contexts used by the process.
+ *
+ * Performs a breadth first search on the memory context tree, so that the
+ * parents get a chance to report stats before their children.
+ *
+ * Statistics for individual contexts are shared via dynamic shared memory.
+ * The statistics for contexts that do not fit in the allocated size of the DSA
+ * are captured as a cumulative total.
+ *
+ * If get_summary is true, we traverse the memory context tree recursively to
+ * cover all the children of a parent context to be able to display a cumulative
+ * total of memory consumption by a parent.
+ */
+void
+ProcessGetMemoryContextInterrupt(void)
+{
+ List *contexts;
+
+ HASHCTL ctl;
+ HTAB *context_id_lookup;
+ int context_id = 0;
+ MemoryContext stat_cxt;
+ MemoryContextEntry *meminfo;
+ bool get_summary = false;
+ dsa_area *area;
+ int max_stats;
+ int idx = MyProcNumber;
+ int stats_count = 0;
+ MemoryContextCounters stat;
+
+ PublishMemoryContextPending = false;
+
+ /*
+ * Make a new context that will contain the hash table, to ease the
+ * cleanup.
+ */
+ stat_cxt = AllocSetContextCreate(CurrentMemoryContext,
+ "Memory context statistics",
+ ALLOCSET_DEFAULT_SIZES);
+
+ /*
+ * The hash table used for constructing "path" column of the view, similar
+ * to its local backend counterpart.
+ */
+ ctl.keysize = sizeof(MemoryContext);
+ ctl.entrysize = sizeof(MemoryContextId);
+ ctl.hcxt = stat_cxt;
+
+ context_id_lookup = hash_create("pg_get_remote_backend_memory_contexts",
+ 256,
+ &ctl,
+ HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+
+ /* List of contexts to process in the next round - start at the top. */
+ contexts = list_make1(TopMemoryContext);
+
+ /* Compute the number of stats that can fit in the DSM seg */
+ max_stats = floor(MAX_NUM_DEFAULT_SEGMENTS * DSA_DEFAULT_INIT_SEGMENT_SIZE / sizeof(MemoryContextEntry));
+
+ LWLockAcquire(&memCtxState[idx].lw_lock, LW_EXCLUSIVE);
+ get_summary = memCtxState[idx].get_summary;
+ LWLockRelease(&memCtxState[idx].lw_lock);
+
+ /*
+ * Traverse the memory context tree to find total number of contexts. If
+ * summary is requested report the total number of contexts at level 1 and
+ * 2 from the top.
+ */
+ compute_num_of_contexts(contexts, context_id_lookup, &stats_count, get_summary);
+
+ /*
+ * Allocate memory in this process's DSA for storing statistics of the
+ * memory contexts up to max_stats. For contexts that don't fit in the DSA
+ * segment, a cumulative total is written as the last record in the DSA
+ * segment.
+ */
+ stats_count = (stats_count > max_stats) ? max_stats : stats_count;
+
+ /* Attach to DSA segment */
+ LWLockAcquire(&memCtxState[idx].lw_lock, LW_EXCLUSIVE);
+ area = dsa_attach(memCtxState[idx].memstats_dsa_handle);
+
+ memCtxState[idx].proc_id = MyProcPid;
+
+ /* Free the memory allocated previously by the same process. */
+ if (DsaPointerIsValid(memCtxState[idx].memstats_dsa_pointer))
+ {
+ dsa_free(area, memCtxState[idx].memstats_dsa_pointer);
+ memCtxState[idx].memstats_dsa_pointer = InvalidDsaPointer;
+ }
+
+ memCtxState[idx].memstats_dsa_pointer = dsa_allocate0(area, stats_count * sizeof(MemoryContextEntry));
+ meminfo = (MemoryContextEntry *) dsa_get_address(area, memCtxState[idx].memstats_dsa_pointer);
+
+ if (get_summary)
+ {
+ int ctx_id = 0;
+ List *path = NIL;
+
+ /* Copy TopMemoryContext statistics to DSA */
+ memset(&stat, 0, sizeof(stat));
+ (*TopMemoryContext->methods->stats) (TopMemoryContext, NULL, NULL, &stat, true);
+ PublishMemoryContext(meminfo, ctx_id, TopMemoryContext, path, stat, 1);
+ ctx_id = ctx_id + 1;
+
+ /*
+ * Copy statistics for each of TopMemoryContext's children (XXX: cap
+ * this at 100). This includes statistics of all of their children
+ * up to level 100.
+ */
+ for (MemoryContext c = TopMemoryContext->firstchild; c != NULL; c = c->nextchild)
+ {
+ MemoryContextCounters grand_totals;
+ int num_contexts = 0;
+
+ path = NIL;
+ memset(&grand_totals, 0, sizeof(grand_totals));
+
+ MemoryContextStatsInternal(c, 0, 100, 100, &grand_totals, PRINT_STATS_NONE, &num_contexts);
+
+ path = compute_context_path(c, context_id_lookup);
+
+ PublishMemoryContext(meminfo, ctx_id, c, path, grand_totals, num_contexts);
+ ctx_id = ctx_id + 1;
+ }
+ /* For summary mode, total_stats and in_memory_stats remain the same */
+ memCtxState[idx].in_memory_stats = ctx_id;
+ memCtxState[idx].total_stats = ctx_id;
+ goto cleanup;
+ }
+
+ foreach_ptr(MemoryContextData, cur, contexts)
+ {
+ List *path = NIL;
+
+ /*
+ * Figure out the transient context_id of this context and each of its
+ * ancestors.
+ */
+ path = compute_context_path(cur, context_id_lookup);
+
+ if (context_id <= (max_stats - 2))
+ {
+ /* Examine the context stats */
+ memset(&stat, 0, sizeof(stat));
+ (*cur->methods->stats) (cur, NULL, NULL, &stat, true);
+ /* Copy statistics to DSA memory */
+ PublishMemoryContext(meminfo, context_id, cur, path, stat, 1);
+ }
+ else
+ {
+ /* Examine the context stats */
+ memset(&stat, 0, sizeof(stat));
+ (*cur->methods->stats) (cur, NULL, NULL, &stat, true);
+
+ meminfo[max_stats - 1].totalspace += stat.totalspace;
+ meminfo[max_stats - 1].nblocks += stat.nblocks;
+ meminfo[max_stats - 1].freespace += stat.freespace;
+ meminfo[max_stats - 1].freechunks += stat.freechunks;
+ }
+
+ /*
+ * DSA max limit is reached, write total of the remaining statistics.
+ */
+ if (context_id == (max_stats - 2) && context_id < (stats_count - 1))
+ {
+ memCtxState[idx].in_memory_stats = context_id + 1;
+ strncpy(meminfo[max_stats - 1].name, "Remaining Totals", 16);
+ }
+ context_id++;
+ }
+ if (context_id < (max_stats - 2))
+ {
+ memCtxState[idx].in_memory_stats = context_id;
+ }
+ /* Report number of aggregated memory contexts */
+ else
+ {
+ meminfo[max_stats - 1].num_contexts = context_id - memCtxState[idx].in_memory_stats;
+ }
+ memCtxState[idx].total_stats = context_id;
+cleanup:
+
+ /*
+ * Signal all the waiting client backends after setting the exit condition
+ * flag
+ */
+ LWLockRelease(&memCtxState[idx].lw_lock);
+ ConditionVariableBroadcast(&memCtxState[idx].memctx_cv);
+ /* Delete the hash table memory context */
+ MemoryContextDelete(stat_cxt);
+
+ dsa_detach(area);
+}
+
+/*
+ * Figure out the transient context_id of this context and each of
+ * its ancestors to compute a path.
+ */
+static List *
+compute_context_path(MemoryContext c, HTAB *context_id_lookup)
+{
+ bool found;
+ List *path = NIL;
+
+ for (MemoryContext cur_context = c; cur_context != NULL; cur_context = cur_context->parent)
+ {
+ MemoryContextId *cur_entry;
+
+ cur_entry = hash_search(context_id_lookup, &cur_context, HASH_FIND, &found);
+
+ if (!found)
+ {
+ elog(LOG, "hash table corrupted, can't construct path value");
+ break;
+ }
+ path = lcons_int(cur_entry->context_id, path);
+ }
+ return path;
+}
+
+static void
+compute_num_of_contexts(List *contexts, HTAB *context_id_lookup, int *stats_count, bool get_summary)
+{
+ foreach_ptr(MemoryContextData, cur, contexts)
+ {
+ MemoryContextId *entry;
+ bool found;
+
+ entry = (MemoryContextId *) hash_search(context_id_lookup, &cur,
+ HASH_ENTER, &found);
+ Assert(!found);
+
+ /* context id starts with 1 */
+ entry->context_id = (++(*stats_count));
+
+ /* Append the children of the current context to the main list. */
+ for (MemoryContext c = cur->firstchild; c != NULL; c = c->nextchild)
+ {
+ if (get_summary)
+ {
+ entry = (MemoryContextId *) hash_search(context_id_lookup, &c,
+ HASH_ENTER, &found);
+ Assert(!found);
+
+ entry->context_id = (++(*stats_count));
+ }
+
+ contexts = lappend(contexts, c);
+ }
+
+ /*
+ * In summary mode, only contexts in the first two levels (from the
+ * top) are displayed.
+ */
+ if (get_summary)
+ break;
+ }
+
+}
+
+/* Copy the memory context statistics of a single context to a dsa buffer */
+static void
+PublishMemoryContext(MemoryContextEntry *memctx_info, int curr_id,
+ MemoryContext context, List *path,
+ MemoryContextCounters stat, int num_contexts)
+{
+ char clipped_ident[MEMORY_CONTEXT_IDENT_DISPLAY_SIZE];
+
+ if (context->name != NULL)
+ {
+ Assert(strlen(context->name) < MEMORY_CONTEXT_IDENT_DISPLAY_SIZE);
+ strncpy(memctx_info[curr_id].name, context->name, strlen(context->name));
+ }
+ else
+ memctx_info[curr_id].name[0] = '\0';
+
+ /* Trim and copy the identifier if it is not set to NULL */
+ if (context->ident != NULL)
+ {
+ int idlen = strlen(context->ident);
+
+ /*
+ * Some identifiers such as SQL query string can be very long,
+ * truncate oversize identifiers.
+ */
+ if (idlen >= MEMORY_CONTEXT_IDENT_DISPLAY_SIZE)
+ idlen = pg_mbcliplen(context->ident, idlen,
+ MEMORY_CONTEXT_IDENT_DISPLAY_SIZE - 1);
+
+ memcpy(clipped_ident, context->ident, idlen);
+ clipped_ident[idlen] = '\0';
+
+ /*
+ * To be consistent with logging output, we label dynahash contexts
+ * with just the hash table name as with MemoryContextStatsPrint().
+ */
+ if (!strncmp(context->name, "dynahash", 8))
+ {
+ strncpy(memctx_info[curr_id].name,
+ clipped_ident, strlen(clipped_ident));
+ memctx_info[curr_id].ident[0] = '\0';
+ }
+ else
+ strncpy(memctx_info[curr_id].ident,
+ clipped_ident, strlen(clipped_ident));
+ }
+ else
+ memctx_info[curr_id].ident[0] = '\0';
+
+ memctx_info[curr_id].path_length = list_length(path);
+ foreach_int(i, path)
+ memctx_info[curr_id].path[foreach_current_index(i)] = Int32GetDatum(i);
+
+ memctx_info[curr_id].type = AssignContextType(context->type);
+ memctx_info[curr_id].totalspace = stat.totalspace;
+ memctx_info[curr_id].nblocks = stat.nblocks;
+ memctx_info[curr_id].freespace = stat.freespace;
+ memctx_info[curr_id].freechunks = stat.freechunks;
+ memctx_info[curr_id].num_contexts = num_contexts;
+}
+
void *
palloc(Size size)
{
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index b37e8a6f88..4d6ae0728a 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8449,6 +8449,16 @@
prorettype => 'bool', proargtypes => 'int4',
prosrc => 'pg_log_backend_memory_contexts' },
+# publishing memory contexts of the specified postgres process
+{ oid => '2173', descr => 'publish memory contexts of the specified backend',
+ proname => 'pg_get_process_memory_contexts', provolatile => 'v',
+ prorows => '100', proretset => 't', proparallel => 'r',
+ prorettype => 'record', proargtypes => 'int4 bool',
+ proallargtypes => '{int4,bool,text,text,text,_int4,int8,int8,int8,int8,int8,int4}',
+ proargmodes => '{i,i,o,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{pid, summary, name, ident, type, path, total_bytes, total_nblocks, free_bytes, free_chunks, used_bytes, num_agg_contexts}',
+ prosrc => 'pg_get_process_memory_contexts' },
+
# non-persistent series generator
{ oid => '1066', descr => 'non-persistent series generator',
proname => 'generate_series', prorows => '1000',
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index d016a9c924..fc75ea143c 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -96,6 +96,7 @@ extern PGDLLIMPORT volatile sig_atomic_t IdleSessionTimeoutPending;
extern PGDLLIMPORT volatile sig_atomic_t ProcSignalBarrierPending;
extern PGDLLIMPORT volatile sig_atomic_t LogMemoryContextPending;
extern PGDLLIMPORT volatile sig_atomic_t IdleStatsUpdateTimeoutPending;
+extern PGDLLIMPORT volatile sig_atomic_t PublishMemoryContextPending;
extern PGDLLIMPORT volatile sig_atomic_t CheckClientConnectionPending;
extern PGDLLIMPORT volatile sig_atomic_t ClientConnectionLost;
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index 022fd8ed93..477ab99338 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -35,6 +35,7 @@ typedef enum
PROCSIG_WALSND_INIT_STOPPING, /* ask walsenders to prepare for shutdown */
PROCSIG_BARRIER, /* global barrier interrupt */
PROCSIG_LOG_MEMORY_CONTEXT, /* ask backend to log the memory contexts */
+ PROCSIG_GET_MEMORY_CONTEXT, /* ask backend to publish the memory contexts */
PROCSIG_PARALLEL_APPLY_MESSAGE, /* Message from parallel apply workers */
/* Recovery conflict reasons */
diff --git a/src/include/utils/memutils.h b/src/include/utils/memutils.h
index 8abc26abce..4fa46e7225 100644
--- a/src/include/utils/memutils.h
+++ b/src/include/utils/memutils.h
@@ -18,6 +18,9 @@
#define MEMUTILS_H
#include "nodes/memnodes.h"
+#include "storage/condition_variable.h"
+#include "storage/lmgr.h"
+#include "utils/dsa.h"
/*
@@ -48,7 +51,12 @@
#define AllocHugeSizeIsValid(size) ((Size) (size) <= MaxAllocHugeSize)
+#define MEMORY_CONTEXT_IDENT_DISPLAY_SIZE 1024
+#define MEM_CONTEXT_SHMEM_STATS_SIZE 30
+#define MEM_CONTEXT_MAX_LEVEL 64
+#define MAX_TYPE_STRING_LENGTH 64
+#define MAX_NUM_DEFAULT_SEGMENTS 8
/*
* Standard top-level memory contexts.
*
@@ -319,4 +327,53 @@ pg_memory_is_all_zeros(const void *ptr, size_t len)
return true;
}
+/* Dynamic shared memory state for memory context statistics reporting */
+typedef struct MemoryContextEntry
+{
+ /*
+ * XXX isn't 2 x 1kB for every context a bit too much? Maybe better to
+ * make it variable-length?
+ */
+ char name[MEMORY_CONTEXT_IDENT_DISPLAY_SIZE];
+ char ident[MEMORY_CONTEXT_IDENT_DISPLAY_SIZE];
+ Datum path[MEM_CONTEXT_MAX_LEVEL];
+ const char *type;
+ int path_length;
+ int64 totalspace;
+ int64 nblocks;
+ int64 freespace;
+ int64 freechunks;
+ int num_contexts;
+} MemoryContextEntry;
+
+/* Shared memory state for memory context statistics reporting */
+typedef struct MemoryContextState
+{
+ ConditionVariable memctx_cv;
+ LWLock lw_lock;
+ int proc_id;
+ int in_memory_stats;
+ int total_stats;
+ bool get_summary;
+ dsa_handle memstats_dsa_handle;
+ dsa_pointer memstats_dsa_pointer;
+} MemoryContextState;
+
+/*
+ * MemoryContextId
+ * Used for storage of transient identifiers for
+ * pg_get_backend_memory_contexts.
+ */
+typedef struct MemoryContextId
+{
+ MemoryContext context;
+ int context_id;
+} MemoryContextId;
+
+extern PGDLLIMPORT MemoryContextState *memCtxState;
+extern void ProcessGetMemoryContextInterrupt(void);
+extern const char *AssignContextType(NodeTag type);
+extern void HandleGetMemoryContextInterrupt(void);
+extern void MemCtxShmemInit(void);
+
#endif /* MEMUTILS_H */
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 91089ac215..5e3382132c 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -223,3 +223,15 @@ select count(distinct utc_offset) >= 24 as ok from pg_timezone_abbrevs;
t
(1 row)
+DO $$
+DECLARE
+ checkpointer_pid int;
+ r RECORD;
+BEGIN
+ SELECT pid from pg_stat_activity where backend_type='checkpointer' INTO checkpointer_pid;
+
+ select type, name, ident
+ from pg_get_process_memory_contexts(checkpointer_pid, false) where path = '{0}' into r;
+ RAISE NOTICE '%', r;
+END $$;
+NOTICE: (AllocSet,TopMemoryContext,)
diff --git a/src/test/regress/sql/sysviews.sql b/src/test/regress/sql/sysviews.sql
index b2a7923754..f3127fea40 100644
--- a/src/test/regress/sql/sysviews.sql
+++ b/src/test/regress/sql/sysviews.sql
@@ -98,3 +98,15 @@ set timezone_abbreviations = 'Australia';
select count(distinct utc_offset) >= 24 as ok from pg_timezone_abbrevs;
set timezone_abbreviations = 'India';
select count(distinct utc_offset) >= 24 as ok from pg_timezone_abbrevs;
+
+DO $$
+DECLARE
+ checkpointer_pid int;
+ r RECORD;
+BEGIN
+ SELECT pid from pg_stat_activity where backend_type='checkpointer' INTO checkpointer_pid;
+
+ select type, name, ident
+ from pg_get_process_memory_contexts(checkpointer_pid, false) where path = '{0}' into r;
+ RAISE NOTICE '%', r;
+END $$;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index e1c4f913f8..e170c5b351 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1628,8 +1628,10 @@ MemoryContextCallback
MemoryContextCallbackFunction
MemoryContextCounters
MemoryContextData
+MemoryContextEntry
MemoryContextMethodID
MemoryContextMethods
+MemoryContextState
MemoryStatsPrintFunc
MergeAction
MergeActionState
--
2.34.1
Hi Tomas,
I've tried the pgbench test
again, to see if it gets stuck somewhere, and I'm observing this on a
new / idle cluster:
$ pgbench -n -f test.sql -P 1 test -T 60
pgbench (18devel)
progress: 1.0 s, 1647.9 tps, lat 0.604 ms stddev 0.438, 0 failed
progress: 2.0 s, 1374.3 tps, lat 0.727 ms stddev 0.386, 0 failed
progress: 3.0 s, 1514.4 tps, lat 0.661 ms stddev 0.330, 0 failed
progress: 4.0 s, 1563.4 tps, lat 0.639 ms stddev 0.212, 0 failed
progress: 5.0 s, 1665.0 tps, lat 0.600 ms stddev 0.177, 0 failed
progress: 6.0 s, 1538.0 tps, lat 0.650 ms stddev 0.192, 0 failed
progress: 7.0 s, 1491.4 tps, lat 0.670 ms stddev 0.261, 0 failed
progress: 8.0 s, 1539.5 tps, lat 0.649 ms stddev 0.443, 0 failed
progress: 9.0 s, 1517.0 tps, lat 0.659 ms stddev 0.167, 0 failed
progress: 10.0 s, 1594.0 tps, lat 0.627 ms stddev 0.227, 0 failed
progress: 11.0 s, 28.0 tps, lat 0.705 ms stddev 0.277, 0 failed
progress: 12.0 s, 0.0 tps, lat 0.000 ms stddev 0.000, 0 failed
progress: 13.0 s, 0.0 tps, lat 0.000 ms stddev 0.000, 0 failed
progress: 14.0 s, 0.0 tps, lat 0.000 ms stddev 0.000, 0 failed
progress: 15.0 s, 0.0 tps, lat 0.000 ms stddev 0.000, 0 failed
progress: 16.0 s, 1480.6 tps, lat 4.043 ms stddev 130.113, 0 failed
progress: 17.0 s, 1524.9 tps, lat 0.655 ms stddev 0.286, 0 failed
progress: 18.0 s, 1246.0 tps, lat 0.802 ms stddev 0.330, 0 failed
progress: 19.0 s, 1383.1 tps, lat 0.722 ms stddev 0.934, 0 failed
progress: 20.0 s, 1432.7 tps, lat 0.698 ms stddev 0.199, 0 failed
...
There's always a period of 10-15 seconds when everything seems to be
working fine, and then a couple seconds when it gets stuck, with the usual
LOG: Wait for 69454 process to publish stats timed out, trying again
The PIDs I've seen were for checkpointer, autovacuum launcher, ... all
of that are processes that should be handling the signal, so how come it
gets stuck every now and then? The system is entirely idle, there's no
contention for the shmem stuff, etc. Could it be forgetting about the
signal in some cases, or something like that?
Yes. This occurs when a backend receives concurrent signals: both
signals are processed together, and the stats are published only once.
Once the stats are read by the first client that gains access, they are
erased, causing the second client to wait until it times out.
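To illustrate the coalescing described above, here is a minimal sketch in plain C. It is not code from the patch: publish_pending, stats_available and run_two_concurrent_requests are hypothetical stand-ins for the pending flag, the shared slot and two clients probing the same backend. It assumes, as described above, that the first reader erases the published statistics.

```c
#include <signal.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the backend's pending flag and its shared slot. */
static volatile sig_atomic_t publish_pending = 0;
static bool stats_available = false;
static int publish_count = 0;

typedef struct ScenarioResult
{
    int  publish_count;     /* how many times stats were published */
    bool first_got_stats;   /* did the first client read stats? */
    bool second_got_stats;  /* did the second client read stats? */
} ScenarioResult;

/* Signal-handler side: only records that some request arrived. */
static void
handle_request(void)
{
    publish_pending = 1;
}

/* Interrupt-processing side: publishes once per set flag, not once per request. */
static void
process_interrupts(void)
{
    if (publish_pending)
    {
        publish_pending = 0;
        stats_available = true;
        publish_count++;
    }
}

/* Client side: the first reader consumes (erases) the published stats. */
static bool
client_read(void)
{
    if (!stats_available)
        return false;
    stats_available = false;
    return true;
}

/* Two requests arrive before the backend reaches its next interrupt check. */
static ScenarioResult
run_two_concurrent_requests(void)
{
    ScenarioResult result;

    publish_pending = 0;
    stats_available = false;
    publish_count = 0;

    handle_request();       /* request from client A */
    handle_request();       /* request from client B, flag is already set */
    process_interrupts();   /* a single publication serves both requests */

    result.first_got_stats = client_read();
    result.second_got_stats = client_read();
    result.publish_count = publish_count;
    return result;
}
```

Calling run_two_concurrent_requests() in this model yields a single publication, with the first client served and the second left with nothing to read, which is the situation that ends in a timeout.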
If we make clients wait for the latest stats, timeouts may occur during
concurrent operations. To avoid such timeouts, we can retain the previously
published memory statistics for every backend and avoid waiting for the
latest statistics when the previous statistics are newer than
STALE_STATS_LIMIT. This limit can be determined based on the server load
and how fast the memory statistics requests are being handled by the server.
For example, on a server running make -j 4 installcheck-world while
concurrently probing client backends for memory statistics using pgbench,
accepting statistics that were approximately 1 second old helped eliminate
timeouts. Conversely, on an idle system, waiting for new statistics when the
previous ones were older than 0.1 seconds was sufficient to avoid any
timeouts caused by concurrent requests.
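The staleness check discussed above can be sketched as a small predicate. This is an illustration under stated assumptions, not the patch's code: can_reuse_stats and its parameters are hypothetical names, timestamps are plain milliseconds, and the age is clamped at zero so that statistics published after the request was issued always count as fresh.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative default, in milliseconds; the right value depends on load. */
#define STALE_STATS_LIMIT_MS 500

/*
 * Decide whether previously published statistics are recent enough to
 * return without waiting for the target process to publish again.
 *
 * have_stats    - whether any statistics have been published at all
 * published_ms  - when the statistics were published
 * requested_ms  - when the current request was issued
 * limit_ms      - maximum acceptable age of the statistics
 */
static bool
can_reuse_stats(bool have_stats, int64_t published_ms,
                int64_t requested_ms, int64_t limit_ms)
{
    int64_t age_ms;

    if (!have_stats)
        return false;

    /* Statistics published after the request are treated as age zero. */
    age_ms = (requested_ms > published_ms) ? requested_ms - published_ms : 0;

    return age_ms < limit_ms;
}
```

With a 500 ms limit, statistics published 400 ms before the request are reused, while statistics published 600 ms before it force a wait for a fresh publication. A larger limit trades freshness for fewer timeouts under concurrent requests.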
PFA an updated and rebased patch that includes the capability to associate
timestamps with statistics. Additionally, I have made some minor fixes and
improved the indentation.
Currently, I have set STALE_STATS_LIMIT to 0.5 seconds in the code, which
means we do not wait for newer statistics if the previous statistics were
published within the 0.5 seconds preceding the current request.
In short, there are the following options for designing the wait for
statistics, depending on whether we expect concurrent requests to a backend
for memory statistics to be common.
1. Always get the latest statistics and time out if not able to.
This works fine for sequential probing, which is going to be the most common
use case.
This can lead to backend timeouts of up to MAX_RETRIES * MEMSTATS_WAIT_TIMEOUT.
2. Determine the appropriate STALE_STATS_LIMIT and do not wait for the latest
stats if the previous statistics are within that limit.
This will help avoid the timeouts in case of concurrent requests.
3. Do what the v10 patch on this thread does:
Wait for the latest statistics for up to MEMSTATS_WAIT_TIMEOUT;
otherwise, display the previous statistics, regardless of when they were
published.
Since timeouts are likely to occur only during concurrent requests, the
displayed statistics are unlikely to be very outdated.
However, in this scenario, we observe the behavior you mentioned, i.e.,
concurrent backends can get stuck for the duration of MEMSTATS_WAIT_TIMEOUT
(currently 5 seconds as per the current settings).
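For option 1, the worst-case wait can be worked out from the retry loop. The sketch below models only the retry counter of the wait loop in the attached patch, assuming every timed sleep expires and the target never publishes; count_timed_out_waits and worst_case_wait_ms are hypothetical helper names. Because the loop tests num_retries > MAX_RETRIES before incrementing the counter, the number of timed-out sleeps in this model comes out slightly above MAX_RETRIES.

```c
/* Values taken from the #defines in the attached patch. */
#define MEMSTATS_WAIT_TIMEOUT_MS 5000
#define MAX_RETRIES 20

/*
 * Count how many timed-out sleeps happen before the loop gives up,
 * mirroring the order "check num_retries > max_retries, then increment".
 */
static int
count_timed_out_waits(int max_retries)
{
    int num_retries = 0;
    int waits = 0;

    for (;;)
    {
        waits++;                        /* one sleep that timed out */
        if (num_retries > max_retries)
            break;                      /* give up, return without stats */
        num_retries = num_retries + 1;
    }
    return waits;
}

/* Total time spent waiting if every sleep runs to its full timeout. */
static long
worst_case_wait_ms(int max_retries, long timeout_ms)
{
    return (long) count_timed_out_waits(max_retries) * timeout_ms;
}
```

With the patch's values (MAX_RETRIES 20, MEMSTATS_WAIT_TIMEOUT 5000 ms), this model gives 22 timed-out sleeps, i.e. about 110 seconds before the function returns without statistics.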
I am inclined toward the third approach, as concurrent requests are not
expected to be a common use case for this feature. Moreover, with the second
approach, determining an appropriate value for STALE_STATS_LIMIT is
challenging, as it depends on the server's load.
Kindly let me know your preference. I have attached a patch which implements
the 2nd approach for testing, the 3rd approach being implemented in the v10
patch.
Thank you,
Rahila Syed
Attachments:
v11-0001-Function-to-report-memory-context-stats-of-any-backe.patch (application/octet-stream)
From e21fb5114a00b632edbba42e7a9c36cdc6e3d84f Mon Sep 17 00:00:00 2001
From: Rahila Syed <rahilasyed.90@gmail.com>
Date: Sun, 15 Sep 2024 17:56:06 +0530
Subject: [PATCH] Function to report memory context stats of any backend
This function sends a signal to a backend to publish
statistics of all its memory contexts. The signal handler
sets a flag, which causes the relevant backend to copy its
MemoryContextStats to a DSA as part of the next
CHECK_FOR_INTERRUPTS().
If there are more than 16MB worth of statistics, the
remaining statistics are copied as a cumulative
total of the remaining contexts.
Once it is done, it signals the client backend using
a condition variable. The client backend
then wakes up, reads the shared memory and
returns these values in the form of a set of records,
one for each memory context, to the user, followed
by a cumulative total of the remaining contexts,
if any.
If get_summary is true, return statistics of all children
of TopMemoryContext with aggregated statistics of their
children.
Each backend and auxiliary process has its own slot
for reporting the stats. There is an array of such
memory slots of size MaxBackends + number of auxiliary
processes in fixed shared memory. Each of these slots points
to a DSA, which contains the stats to be shared by the
corresponding process.
Each slot has its own LWLock and condition variable for
synchronization and communication between the
publishing process and the client backend.
---
doc/src/sgml/func.sgml | 31 ++
src/backend/postmaster/autovacuum.c | 4 +
src/backend/postmaster/checkpointer.c | 4 +
src/backend/postmaster/interrupt.c | 4 +
src/backend/postmaster/pgarch.c | 4 +
src/backend/postmaster/startup.c | 4 +
src/backend/postmaster/walsummarizer.c | 4 +
src/backend/storage/ipc/ipci.c | 2 +
src/backend/storage/ipc/procsignal.c | 3 +
src/backend/tcop/postgres.c | 3 +
.../utils/activity/wait_event_names.txt | 1 +
src/backend/utils/adt/mcxtfuncs.c | 406 +++++++++++++++--
src/backend/utils/init/globals.c | 1 +
src/backend/utils/mmgr/mcxt.c | 427 +++++++++++++++++-
src/include/catalog/pg_proc.dat | 10 +
src/include/miscadmin.h | 1 +
src/include/storage/procsignal.h | 1 +
src/include/utils/memutils.h | 58 +++
src/test/regress/expected/sysviews.out | 14 +
src/test/regress/sql/sysviews.sql | 14 +
src/tools/pgindent/typedefs.list | 2 +
21 files changed, 956 insertions(+), 42 deletions(-)
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 47370e581a..c56223e28a 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -28358,6 +28358,37 @@ acl | {postgres=arwdDxtm/postgres,foo=r/postgres}
</para></entry>
</row>
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>pg_get_process_memory_contexts</primary>
+ </indexterm>
+ <function>pg_get_process_memory_contexts</function> ( <parameter>pid</parameter> <type>integer</type>, <parameter>get_summary</parameter> <type>boolean</type> )
+ <returnvalue>setof record</returnvalue>
+ </para>
+ <para>
+ This function handles requests to display the memory contexts of a
+ PostgreSQL process with the specified process ID (PID). It takes two
+ arguments: PID and a boolean, get_summary. The function can send
+ requests to both backend and auxiliary processes.
+
+ After receiving memory context statistics from the target process, it
+ returns the results as one row per context. The num_agg_contexts
+ column indicates the number of contexts aggregated in the displayed
+ statistics.
+
+ When get_summary is set to true, statistics for memory contexts at
+ levels 1 and 2 are displayed, with level 1 representing the root node
+ (i.e., TopMemoryContext). Each level 2 context's statistics represent
+ an aggregate of all its child contexts' statistics, with
+ num_agg_contexts indicating the number of these aggregated child
+ contexts.
+
+ When get_summary is set to false, the num_agg_contexts value is 1,
+ indicating that individual statistics are being displayed.
+ </para></entry>
+ </row>
+
<row>
<entry role="func_table_entry"><para role="func_signature">
<indexterm>
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 0ab921a169..0c693cfa48 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -778,6 +778,10 @@ HandleAutoVacLauncherInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
/* Process sinval catchup interrupts that happened while sleeping */
ProcessCatchupInterrupt();
}
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index 9bfd0fd665..ee8360ad6f 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -616,6 +616,10 @@ HandleCheckpointerInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/postmaster/interrupt.c b/src/backend/postmaster/interrupt.c
index be69e4c713..9481a5cd24 100644
--- a/src/backend/postmaster/interrupt.c
+++ b/src/backend/postmaster/interrupt.c
@@ -48,6 +48,10 @@ HandleMainLoopInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 12ee815a62..cd1ecb6b93 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -865,6 +865,10 @@ HandlePgArchInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
if (ConfigReloadPending)
{
char *archiveLib = pstrdup(XLogArchiveLibrary);
diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c
index 59d213031b..d670954c4e 100644
--- a/src/backend/postmaster/startup.c
+++ b/src/backend/postmaster/startup.c
@@ -192,6 +192,10 @@ HandleStartupProcInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
diff --git a/src/backend/postmaster/walsummarizer.c b/src/backend/postmaster/walsummarizer.c
index ffbf043935..b1a5e86a85 100644
--- a/src/backend/postmaster/walsummarizer.c
+++ b/src/backend/postmaster/walsummarizer.c
@@ -876,6 +876,10 @@ HandleWalSummarizerInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 174eed7036..4a70eabf7f 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -50,6 +50,7 @@
#include "storage/sinvaladt.h"
#include "utils/guc.h"
#include "utils/injection_point.h"
+#include "utils/memutils.h"
/* GUCs */
int shared_memory_type = DEFAULT_SHARED_MEMORY_TYPE;
@@ -340,6 +341,7 @@ CreateOrAttachShmemStructs(void)
StatsShmemInit();
WaitEventCustomShmemInit();
InjectionPointShmemInit();
+ MemCtxShmemInit();
}
/*
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index 7401b6e625..e425b9eeb0 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -688,6 +688,9 @@ procsignal_sigusr1_handler(SIGNAL_ARGS)
if (CheckProcSignal(PROCSIG_LOG_MEMORY_CONTEXT))
HandleLogMemoryContextInterrupt();
+ if (CheckProcSignal(PROCSIG_GET_MEMORY_CONTEXT))
+ HandleGetMemoryContextInterrupt();
+
if (CheckProcSignal(PROCSIG_PARALLEL_APPLY_MESSAGE))
HandleParallelApplyMessageInterrupt();
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 5655348a2e..70587771d3 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3497,6 +3497,9 @@ ProcessInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
if (ParallelApplyMessagePending)
HandleParallelApplyMessages();
}
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 0b53cba807..68a1769967 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -158,6 +158,7 @@ WAL_RECEIVER_EXIT "Waiting for the WAL receiver to exit."
WAL_RECEIVER_WAIT_START "Waiting for startup process to send initial data for streaming replication."
WAL_SUMMARY_READY "Waiting for a new WAL summary to be generated."
XACT_GROUP_UPDATE "Waiting for the group leader to update transaction status at transaction end."
+MEM_CTX_PUBLISH "Waiting for backend to publish memory information."
ABI_compatibility:
diff --git a/src/backend/utils/adt/mcxtfuncs.c b/src/backend/utils/adt/mcxtfuncs.c
index 396c2f223b..edcda880a6 100644
--- a/src/backend/utils/adt/mcxtfuncs.c
+++ b/src/backend/utils/adt/mcxtfuncs.c
@@ -17,28 +17,25 @@
#include "funcapi.h"
#include "mb/pg_wchar.h"
+#include "miscadmin.h"
+#include "access/twophase.h"
+#include "catalog/pg_authid_d.h"
+#include "nodes/pg_list.h"
#include "storage/proc.h"
#include "storage/procarray.h"
+#include "utils/acl.h"
#include "utils/array.h"
#include "utils/builtins.h"
#include "utils/hsearch.h"
+#include "utils/memutils.h"
+#include "utils/wait_event_types.h"
/* ----------
* The max bytes for showing identifiers of MemoryContext.
* ----------
*/
-#define MEMORY_CONTEXT_IDENT_DISPLAY_SIZE 1024
-/*
- * MemoryContextId
- * Used for storage of transient identifiers for
- * pg_get_backend_memory_contexts.
- */
-typedef struct MemoryContextId
-{
- MemoryContext context;
- int context_id;
-} MemoryContextId;
+struct MemoryContextState *memCtxState = NULL;
/*
* int_list_to_array
@@ -71,7 +68,7 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
TupleDesc tupdesc, MemoryContext context,
HTAB *context_id_lookup)
{
-#define PG_GET_BACKEND_MEMORY_CONTEXTS_COLS 10
+#define PG_GET_BACKEND_MEMORY_CONTEXTS_COLS 11
Datum values[PG_GET_BACKEND_MEMORY_CONTEXTS_COLS];
bool nulls[PG_GET_BACKEND_MEMORY_CONTEXTS_COLS];
@@ -143,24 +140,7 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
else
nulls[1] = true;
- switch (context->type)
- {
- case T_AllocSetContext:
- type = "AllocSet";
- break;
- case T_GenerationContext:
- type = "Generation";
- break;
- case T_SlabContext:
- type = "Slab";
- break;
- case T_BumpContext:
- type = "Bump";
- break;
- default:
- type = "???";
- break;
- }
+ type = AssignContextType(context->type);
values[2] = CStringGetTextDatum(type);
values[3] = Int32GetDatum(list_length(path)); /* level */
@@ -175,6 +155,32 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
list_free(path);
}
+const char *
+AssignContextType(NodeTag type)
+{
+ const char *context_type;
+
+ switch (type)
+ {
+ case T_AllocSetContext:
+ context_type = "AllocSet";
+ break;
+ case T_GenerationContext:
+ context_type = "Generation";
+ break;
+ case T_SlabContext:
+ context_type = "Slab";
+ break;
+ case T_BumpContext:
+ context_type = "Bump";
+ break;
+ default:
+ context_type = "???";
+ break;
+ }
+ return (context_type);
+}
+
/*
* pg_get_backend_memory_contexts
* SQL SRF showing backend memory context.
@@ -249,7 +255,7 @@ pg_get_backend_memory_contexts(PG_FUNCTION_ARGS)
/*
* pg_log_backend_memory_contexts
- * Signal a backend or an auxiliary process to log its memory contexts.
+ * Signal a backend or an auxiliary process to log its memory contexts.
*
* By default, only superusers are allowed to signal to log the memory
* contexts because allowing any users to issue this request at an unbounded
@@ -305,3 +311,341 @@ pg_log_backend_memory_contexts(PG_FUNCTION_ARGS)
PG_RETURN_BOOL(true);
}
+
+/*
+ * pg_get_process_memory_contexts
+ * Signal a backend or an auxiliary process to send its memory contexts,
+ * wait for the results and display them.
+ *
+ * By default, only superusers are allowed to signal to return the memory
+ * contexts because allowing any users to issue this request at an unbounded
+ * rate would cause lots of requests to be sent and which can lead to denial of
+ * service. Additional roles can be permitted with GRANT.
+ *
+ * On receipt of this signal, a backend or an auxiliary process sets the flag
+ * in the signal handler, which causes the next CHECK_FOR_INTERRUPTS()
+ * or process-specific interrupt handler to copy the memory context details
+ * to a dynamic shared memory space.
+ *
+ * The shared memory buffer has a limited size - if the process has too many
+ * memory contexts, the memory contexts that do not fit are summarized
+ * and represented as cumulative total at the end of the buffer.
+ *
+ * After sending the signal, wait on a condition variable. The publishing
+ * backend, after copying the data to shared memory, sends signal on that
+ * condition variable.
+ * Once condition variable comes out of sleep, check if the memory context
+ * information is available for read and display.
+ *
+ * If the publishing backend does not respond before the condition variable
+ * times out, which is set to MEMSTATS_WAIT_TIMEOUT, retry for MAX_RETRIES
+ * number of times before giving up and returning without statistics.
+ */
+Datum
+pg_get_process_memory_contexts(PG_FUNCTION_ARGS)
+{
+ int pid = PG_GETARG_INT32(0);
+ bool get_summary = PG_GETARG_BOOL(1);
+ PGPROC *proc;
+ ProcNumber procNumber = INVALID_PROC_NUMBER;
+ int i;
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ dsa_area *area;
+ dsa_handle handle;
+ MemoryContextEntry *memctx_info;
+ MemoryContext oldContext;
+ int num_retries = 0;
+ TimestampTz curr_timestamp;
+
+ /*
+ * Only superusers or users with pg_read_all_stats privileges can view the
+ * memory context statistics of another process
+ */
+ if (!has_privs_of_role(GetUserId(), ROLE_PG_READ_ALL_STATS))
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("memory context statistics privilege error")));
+
+ InitMaterializedSRF(fcinfo, 0);
+
+ /*
+ * See if the process with given pid is a backend or an auxiliary process.
+ */
+ proc = BackendPidGetProc(pid);
+ if (proc == NULL)
+ proc = AuxiliaryPidGetProc(pid);
+
+ /*
+ * BackendPidGetProc() and AuxiliaryPidGetProc() return NULL if the pid
+ * isn't valid; but by the time we reach kill(), a process for which we
+ * get a valid proc here might have terminated on its own. There's no way
+ * to acquire a lock on an arbitrary process to prevent that. But since
+ * this mechanism is usually used to debug a backend or an auxiliary
+ * process running and consuming lots of memory, that it might end on its
+ * own first and its memory contexts are not logged is not a problem.
+ */
+ if (proc == NULL)
+ {
+ /*
+ * This is just a warning so a loop-through-resultset will not abort
+ * if one backend terminated on its own during the run.
+ */
+ ereport(WARNING,
+ (errmsg("PID %d is not a PostgreSQL server process", pid)));
+ PG_RETURN_NULL();
+ }
+
+ procNumber = GetNumberFromPGProc(proc);
+ if (procNumber == MyProcNumber)
+ {
+ ereport(WARNING,
+ (errmsg("cannot return statistics for local backend"),
+ errhint("Use pg_backend_memory_contexts view instead")));
+ PG_RETURN_NULL();
+ }
+
+ /*
+ * Create a DSA segment with maximum size of 16MB, send handle to the
+ * publishing process for storing the stats. If number of contexts exceed
+ * 16MB, a cumulative total is stored for such contexts.
+ */
+ LWLockAcquire(&memCtxState[procNumber].lw_lock, LW_EXCLUSIVE);
+ memCtxState[procNumber].get_summary = get_summary;
+
+ if (memCtxState[procNumber].memstats_dsa_handle == DSA_HANDLE_INVALID)
+ {
+ oldContext = MemoryContextSwitchTo(TopMemoryContext);
+
+ area = dsa_create_ext(memCtxState[procNumber].lw_lock.tranche,
+ DSA_DEFAULT_INIT_SEGMENT_SIZE,
+ MAX_NUM_DEFAULT_SEGMENTS *
+ DSA_DEFAULT_INIT_SEGMENT_SIZE);
+
+ MemoryContextSwitchTo(oldContext);
+
+ handle = dsa_get_handle(area);
+
+ memCtxState[procNumber].memstats_dsa_handle = handle;
+
+ /*
+ * Pin the dsa area even if the creating backend exits, this is to
+ * make sure the area remains attachable even if current client exits
+ */
+ dsa_pin(area);
+ }
+ else
+ {
+ area = dsa_attach(memCtxState[procNumber].memstats_dsa_handle);
+ /* dsa_pin_mapping(area); */
+ }
+
+ LWLockRelease(&memCtxState[procNumber].lw_lock);
+
+ curr_timestamp = GetCurrentTimestamp();
+
+ /*
+ * Send a signal to the auxiliary process, informing it we want it to
+ * produce information about memory contexts.
+ */
+ if (SendProcSignal(pid, PROCSIG_GET_MEMORY_CONTEXT, procNumber) < 0)
+ {
+ ereport(WARNING,
+ (errmsg("could not send signal to process %d: %m", pid)));
+
+ goto end;
+ }
+
+ /*
+ * Wait for a backend to publish stats, indicated by a valid dsa pointer
+ * set by the backend. A dsa pointer could be valid if statistics have
+ * previously been published by the backend. In which case, check if
+ * statistics are not older than STALE_STATS_LIMIT, if they are wait
+ * for newer statistics.
+ */
+ while (1)
+ {
+ long msecs;
+
+ /*
+ * We expect to come out of sleep when the requested process has
+ * finished publishing the statistics, verified using the valid dsa
+ * pointer.
+ *
+ * Make sure that the information belongs to pid we requested
+ * information for, Otherwise loop back and wait for the server
+ * process to finish publishing statistics.
+ */
+ LWLockAcquire(&memCtxState[procNumber].lw_lock, LW_EXCLUSIVE);
+ msecs =
+ TimestampDifferenceMilliseconds(memCtxState[procNumber].stats_timestamp,
+ curr_timestamp);
+
+ /*
+ * Note in procnumber.h file says that a procNumber can be re-used for
+ * a different backend immediately after a backend exits. In case an
+ * old process' data was there and not updated by the current process
+ * in the slot identified by the procNumber, the pid of the requested
+ * process and the proc_id might not match.
+ */
+#define STALE_STATS_LIMIT 500
+ if ((memCtxState[procNumber].proc_id == pid))
+ {
+ /*
+ * Break if the report is not older than STALE_STATS_LIMIT msecs.
+ * No need to wait for a more recent statistics, as such waits
+ * could cause timeouts in case of concurrent requests.
+ */
+ elog(LOG, "msecs value %ld", msecs);
+ if (DsaPointerIsValid(memCtxState[procNumber].memstats_dsa_pointer)
+ && msecs < STALE_STATS_LIMIT)
+ break;
+
+ }
+ LWLockRelease(&memCtxState[procNumber].lw_lock);
+
+ proc = BackendPidGetProc(pid);
+
+#define MEMSTATS_WAIT_TIMEOUT 5000
+#define MAX_RETRIES 20
+ if (proc == NULL)
+ proc = AuxiliaryPidGetProc(pid);
+ if (proc == NULL)
+ {
+ ereport(WARNING,
+ (errmsg("PID %d is not a PostgreSQL server process", pid)));
+ goto end;
+ }
+ if (ConditionVariableTimedSleep(&memCtxState[procNumber].memctx_cv,
+ MEMSTATS_WAIT_TIMEOUT,
+ WAIT_EVENT_MEM_CTX_PUBLISH))
+ {
+ ereport(LOG,
+ (errmsg("Wait for %d process to publish stats timed out, trying again",
+ pid)));
+ if (num_retries > MAX_RETRIES)
+ goto end;
+ num_retries = num_retries + 1;
+ }
+
+ }
+
+ /* We should land here only with a valid memstats_dsa_pointer */
+ Assert(DsaPointerIsValid(memCtxState[procNumber].memstats_dsa_pointer));
+ memctx_info = (MemoryContextEntry *) dsa_get_address(area, memCtxState[procNumber].memstats_dsa_pointer);
+
+ /*
+ * Backend has finished publishing the stats, read them
+ *
+ * Read statistics of top level 1 and 2 contexts, if get_summary is true.
+ */
+ for (i = 0; i < memCtxState[procNumber].in_memory_stats; i++)
+ {
+ ArrayType *path_array;
+ int path_length;
+ Datum values[PG_GET_BACKEND_MEMORY_CONTEXTS_COLS];
+ bool nulls[PG_GET_BACKEND_MEMORY_CONTEXTS_COLS];
+
+ memset(values, 0, sizeof(values));
+ memset(nulls, 0, sizeof(nulls));
+
+ if (strlen(memctx_info[i].name) != 0)
+ values[0] = CStringGetTextDatum(memctx_info[i].name);
+ else
+ nulls[0] = true;
+ if (strlen(memctx_info[i].ident) != 0)
+ values[1] = CStringGetTextDatum(memctx_info[i].ident);
+ else
+ nulls[1] = true;
+
+ values[2] = CStringGetTextDatum(memctx_info[i].type);
+
+ path_length = memctx_info[i].path_length;
+ path_array = construct_array_builtin(memctx_info[i].path,
+ path_length, INT4OID);
+ values[3] = PointerGetDatum(path_array);
+
+ values[4] = Int64GetDatum(memctx_info[i].totalspace);
+ values[5] = Int64GetDatum(memctx_info[i].nblocks);
+ values[6] = Int64GetDatum(memctx_info[i].freespace);
+ values[7] = Int64GetDatum(memctx_info[i].freechunks);
+ values[8] = Int64GetDatum(memctx_info[i].totalspace -
+ memctx_info[i].freespace);
+ values[9] = Int32GetDatum(memctx_info[i].num_contexts);
+ values[10] = TimestampTzGetDatum(memCtxState[procNumber].stats_timestamp);
+
+ tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc,
+ values, nulls);
+ }
+
+ /* If there are more contexts, display a cumulative total of those */
+ if (memCtxState[procNumber].total_stats > i)
+ {
+ Datum values[PG_GET_BACKEND_MEMORY_CONTEXTS_COLS];
+ bool nulls[PG_GET_BACKEND_MEMORY_CONTEXTS_COLS];
+
+ values[0] = CStringGetTextDatum(memctx_info[i].name);
+ nulls[1] = true;
+ nulls[2] = true;
+ nulls[3] = true;
+ values[4] = Int64GetDatum(memctx_info[i].totalspace);
+ values[5] = Int64GetDatum(memctx_info[i].nblocks);
+ values[6] = Int64GetDatum(memctx_info[i].freespace);
+ values[7] = Int64GetDatum(memctx_info[i].freechunks);
+ values[8] = Int64GetDatum(memctx_info[i].totalspace - memctx_info[i].freespace);
+ values[9] = Int32GetDatum(memctx_info[i].num_contexts);
+
+ tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc, values, nulls);
+ }
+
+ LWLockRelease(&memCtxState[procNumber].lw_lock);
+
+ ConditionVariableCancelSleep();
+
+end:
+ dsa_detach(area);
+ PG_RETURN_NULL();
+}
+
+/*
+ * Shared memory sizing for reporting memory context information.
+ */
+static Size
+MemCtxShmemSize(void)
+{
+ Size TotalProcs =
+ add_size(MaxBackends, add_size(NUM_AUXILIARY_PROCS, max_prepared_xacts));
+
+ return mul_size(TotalProcs, sizeof(MemoryContextState));
+}
+
+/*
+ * Init shared memory for reporting memory context information.
+ *
+ * XXX Should this check IsUnderPostmaster, similarly to e.g. CommitTsShmemInit?
+ */
+void
+MemCtxShmemInit(void)
+{
+ bool found;
+ Size TotalProcs =
+ add_size(MaxBackends, add_size(NUM_AUXILIARY_PROCS, max_prepared_xacts));
+
+ memCtxState = (MemoryContextState *) ShmemInitStruct("MemoryContextState",
+ MemCtxShmemSize(),
+ &found);
+ if (!found)
+ {
+ for (int i = 0; i < TotalProcs; i++)
+ {
+ ConditionVariableInit(&memCtxState[i].memctx_cv);
+
+ LWLockInitialize(&memCtxState[i].lw_lock,
+ LWLockNewTrancheId());
+ LWLockRegisterTranche(memCtxState[i].lw_lock.tranche,
+ "mem_context_stats_reporting");
+
+ memCtxState[i].memstats_dsa_handle = DSA_HANDLE_INVALID;
+ memCtxState[i].memstats_dsa_pointer = InvalidDsaPointer;
+ }
+ }
+}
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index b844f9fdae..13938ccb0f 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -38,6 +38,7 @@ volatile sig_atomic_t TransactionTimeoutPending = false;
volatile sig_atomic_t IdleSessionTimeoutPending = false;
volatile sig_atomic_t ProcSignalBarrierPending = false;
volatile sig_atomic_t LogMemoryContextPending = false;
+volatile sig_atomic_t PublishMemoryContextPending = false;
volatile sig_atomic_t IdleStatsUpdateTimeoutPending = false;
volatile uint32 InterruptHoldoffCount = 0;
volatile uint32 QueryCancelHoldoffCount = 0;
diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index aa6da0d035..fefc4690e8 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -19,16 +19,22 @@
*-------------------------------------------------------------------------
*/
+#include <math.h>
#include "postgres.h"
#include "mb/pg_wchar.h"
#include "miscadmin.h"
+#include "nodes/pg_list.h"
+#include "storage/fd.h"
+#include "storage/lwlock.h"
+#include "storage/dsm.h"
+#include "utils/dsa.h"
+#include "utils/hsearch.h"
#include "utils/memdebug.h"
#include "utils/memutils.h"
#include "utils/memutils_internal.h"
#include "utils/memutils_memorychunk.h"
-
static void BogusFree(void *pointer);
static void *BogusRealloc(void *pointer, Size size, int flags);
static MemoryContext BogusGetChunkContext(void *pointer);
@@ -135,6 +141,17 @@ static const MemoryContextMethods mcxt_methods[] = {
};
#undef BOGUS_MCTX
+/*
+ * This is passed to MemoryContextStatsInternal to determine whether
+ * to print context statistics or not, and where to print them: logs or
+ * stderr.
+ */
+typedef enum PrintDestination
+{
+ PRINT_STATS_TO_STDERR = 0,
+ PRINT_STATS_TO_LOGS,
+ PRINT_STATS_NONE
+} PrintDestination;
/*
* CurrentMemoryContext
@@ -162,10 +179,20 @@ static void MemoryContextCallResetCallbacks(MemoryContext context);
static void MemoryContextStatsInternal(MemoryContext context, int level,
int max_level, int max_children,
MemoryContextCounters *totals,
- bool print_to_stderr);
+ PrintDestination print_location, int *num_contexts);
static void MemoryContextStatsPrint(MemoryContext context, void *passthru,
const char *stats_string,
bool print_to_stderr);
+static void PublishMemoryContext(MemoryContextEntry *memctx_infos,
+ int curr_id, MemoryContext context,
+ List *path,
+ MemoryContextCounters stat,
+ int num_contexts);
+static void compute_num_of_contexts(List *contexts, HTAB *context_id_lookup,
+ int *stats_count,
+ bool get_summary);
+static List *compute_context_path(MemoryContext c, HTAB *context_id_lookup);
+
/*
* You should not do memory allocations within a critical section, because
@@ -831,11 +858,19 @@ MemoryContextStatsDetail(MemoryContext context,
bool print_to_stderr)
{
MemoryContextCounters grand_totals;
+ int num_contexts = 0;
+ PrintDestination print_location;
memset(&grand_totals, 0, sizeof(grand_totals));
+ if (print_to_stderr)
+ print_location = PRINT_STATS_TO_STDERR;
+ else
+ print_location = PRINT_STATS_TO_LOGS;
+
+ /* num_contexts reports the number of contexts aggregated in the output */
MemoryContextStatsInternal(context, 0, max_level, max_children,
- &grand_totals, print_to_stderr);
+ &grand_totals, print_location, &num_contexts);
if (print_to_stderr)
fprintf(stderr,
@@ -876,18 +911,42 @@ static void
MemoryContextStatsInternal(MemoryContext context, int level,
int max_level, int max_children,
MemoryContextCounters *totals,
- bool print_to_stderr)
+ PrintDestination print_location, int *num_contexts)
{
MemoryContext child;
int ichild;
+ bool print_to_stderr = true;
Assert(MemoryContextIsValid(context));
- /* Examine the context itself */
- context->methods->stats(context,
- MemoryContextStatsPrint,
- &level,
- totals, print_to_stderr);
+ if (print_location == PRINT_STATS_TO_STDERR)
+ print_to_stderr = true;
+ else if (print_location == PRINT_STATS_TO_LOGS)
+ print_to_stderr = false;
+
+ if (print_location != PRINT_STATS_NONE)
+ {
+ /* Examine the context itself */
+ context->methods->stats(context,
+ MemoryContextStatsPrint,
+ &level,
+ totals, print_to_stderr);
+ }
+
+ /*
+ * Do not print the statistics if print_location is PRINT_STATS_NONE,
+ * only compute totals.
+ */
+ else
+ {
+ /* Examine the context itself */
+ context->methods->stats(context,
+ NULL,
+ NULL,
+ totals, print_to_stderr);
+ }
+ /* Increment the context count */
+ *num_contexts = *num_contexts + 1;
/*
* Examine children.
@@ -907,7 +966,7 @@ MemoryContextStatsInternal(MemoryContext context, int level,
MemoryContextStatsInternal(child, level + 1,
max_level, max_children,
totals,
- print_to_stderr);
+ print_location, num_contexts);
}
}
@@ -925,6 +984,7 @@ MemoryContextStatsInternal(MemoryContext context, int level,
ichild++;
child = MemoryContextTraverseNext(child, context);
}
+ *num_contexts = *num_contexts + ichild;
if (print_to_stderr)
{
@@ -939,7 +999,7 @@ MemoryContextStatsInternal(MemoryContext context, int level,
local_totals.freechunks,
local_totals.totalspace - local_totals.freespace);
}
- else
+ else if (print_location != PRINT_STATS_NONE)
ereport(LOG_SERVER_ONLY,
(errhidestmt(true),
errhidecontext(true),
@@ -1276,6 +1336,21 @@ HandleLogMemoryContextInterrupt(void)
/* latch will be set by procsignal_sigusr1_handler */
}
+/*
+ * HandleGetMemoryContextInterrupt
+ * Handle receipt of an interrupt indicating publishing of memory
+ * contexts.
+ *
+ * All the actual work is deferred to ProcessGetMemoryContextInterrupt().
+ */
+void
+HandleGetMemoryContextInterrupt(void)
+{
+ InterruptPending = true;
+ PublishMemoryContextPending = true;
+ /* latch will be set by procsignal_sigusr1_handler */
+}
+
/*
* ProcessLogMemoryContextInterrupt
* Perform logging of memory contexts of this backend process.
@@ -1313,6 +1388,336 @@ ProcessLogMemoryContextInterrupt(void)
MemoryContextStatsDetail(TopMemoryContext, 100, 100, false);
}
+/*
+ * ProcessGetMemoryContextInterrupt
+ * Generate information about memory contexts used by the process.
+ *
+ * Performs a breadth first search on the memory context tree, so that the
+ * parents get a chance to report stats before their children.
+ *
+ * Statistics for individual contexts are shared via dynamic shared memory.
+ * The statistics for contexts that do not fit in the allocated size of the DSA,
+ * are captured as a cumulative total.
+ *
+ * If get_summary is true, we traverse the memory context tree recursively to
+ * cover all the children of a parent context to be able to display a cumulative
+ * total of memory consumption by a parent.
+ */
+void
+ProcessGetMemoryContextInterrupt(void)
+{
+ List *contexts;
+
+ HASHCTL ctl;
+ HTAB *context_id_lookup;
+ int context_id = 0;
+ MemoryContext stat_cxt;
+ MemoryContextEntry *meminfo;
+ bool get_summary = false;
+ dsa_area *area;
+ int max_stats;
+ int idx = MyProcNumber;
+ int stats_count = 0;
+ MemoryContextCounters stat;
+
+ PublishMemoryContextPending = false;
+
+ /*
+ * Make a new context that will contain the hash table, to ease the
+ * cleanup.
+ */
+ stat_cxt = AllocSetContextCreate(CurrentMemoryContext,
+ "Memory context statistics",
+ ALLOCSET_DEFAULT_SIZES);
+
+ /*
+ * The hash table used for constructing "path" column of the view, similar
+ * to its local backend counterpart.
+ */
+ ctl.keysize = sizeof(MemoryContext);
+ ctl.entrysize = sizeof(MemoryContextId);
+ ctl.hcxt = stat_cxt;
+
+ context_id_lookup = hash_create("pg_get_remote_backend_memory_contexts",
+ 256,
+ &ctl,
+ HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+
+ /* List of contexts to process in the next round - start at the top. */
+ contexts = list_make1(TopMemoryContext);
+
+ /* Compute the number of stats that can fit in the DSM seg */
+ max_stats = floor(MAX_NUM_DEFAULT_SEGMENTS * DSA_DEFAULT_INIT_SEGMENT_SIZE
+ / sizeof(MemoryContextEntry));
+
+ LWLockAcquire(&memCtxState[idx].lw_lock, LW_EXCLUSIVE);
+ get_summary = memCtxState[idx].get_summary;
+ LWLockRelease(&memCtxState[idx].lw_lock);
+
+ /*
+ * Traverse the memory context tree to find total number of contexts. If
+ * summary is requested report the total number of contexts at level 1 and
+ * 2 from the top.
+ */
+ compute_num_of_contexts(contexts, context_id_lookup, &stats_count,
+ get_summary);
+
+ /*
+ * Allocate memory in this process's dsa for storing statistics of the
+ * memory contexts up to max_stats. For contexts that don't fit in the DSA
+ * segment, a cumulative total is written as the last record in the DSA
+ * segment.
+ */
+ stats_count = (stats_count > max_stats) ? max_stats : stats_count;
+
+ /* Attach to DSA segment */
+ LWLockAcquire(&memCtxState[idx].lw_lock, LW_EXCLUSIVE);
+ area = dsa_attach(memCtxState[idx].memstats_dsa_handle);
+
+ memCtxState[idx].proc_id = MyProcPid;
+
+ /* Free the memory allocated previously by the same process. */
+ if (DsaPointerIsValid(memCtxState[idx].memstats_dsa_pointer))
+ {
+ dsa_free(area, memCtxState[idx].memstats_dsa_pointer);
+ memCtxState[idx].memstats_dsa_pointer = InvalidDsaPointer;
+ }
+
+ memCtxState[idx].memstats_dsa_pointer = dsa_allocate0(area, stats_count * sizeof(MemoryContextEntry));
+ meminfo = (MemoryContextEntry *) dsa_get_address(area, memCtxState[idx].memstats_dsa_pointer);
+
+ if (get_summary)
+ {
+ int ctx_id = 0;
+ List *path = NIL;
+
+ /* Copy TopMemoryContext statistics to DSA */
+ memset(&stat, 0, sizeof(stat));
+ (*TopMemoryContext->methods->stats) (TopMemoryContext, NULL, NULL, &stat, true);
+ path = lcons_int(1, path);
+ PublishMemoryContext(meminfo, ctx_id, TopMemoryContext, path, stat, 1);
+ ctx_id = ctx_id + 1;
+
+ /*
+ * Copy statistics for each of TopMemoryContext's children (XXX: make it
+ * capped at 100). This includes statistics of all of their children
+ * up to level 100.
+ */
+ for (MemoryContext c = TopMemoryContext->firstchild; c != NULL; c = c->nextchild)
+ {
+ MemoryContextCounters grand_totals;
+ int num_contexts = 0;
+
+ path = NIL;
+ memset(&grand_totals, 0, sizeof(grand_totals));
+
+ MemoryContextStatsInternal(c, 0, 100, 100, &grand_totals, PRINT_STATS_NONE, &num_contexts);
+
+ path = compute_context_path(c, context_id_lookup);
+
+ PublishMemoryContext(meminfo, ctx_id, c, path, grand_totals, num_contexts);
+ ctx_id = ctx_id + 1;
+ }
+ /* For summary mode, total_stats and in_memory_stats remain the same */
+ memCtxState[idx].in_memory_stats = ctx_id;
+ memCtxState[idx].total_stats = ctx_id;
+ goto cleanup;
+ }
+
+ foreach_ptr(MemoryContextData, cur, contexts)
+ {
+ List *path = NIL;
+
+ /*
+ * Figure out the transient context_id of this context and each of its
+ * ancestors.
+ */
+ path = compute_context_path(cur, context_id_lookup);
+
+ if (context_id <= (max_stats - 2))
+ {
+ /* Examine the context stats */
+ memset(&stat, 0, sizeof(stat));
+ (*cur->methods->stats) (cur, NULL, NULL, &stat, true);
+ /* Copy statistics to DSA memory */
+ PublishMemoryContext(meminfo, context_id, cur, path, stat, 1);
+ }
+ else
+ {
+ /* Examine the context stats */
+ memset(&stat, 0, sizeof(stat));
+ (*cur->methods->stats) (cur, NULL, NULL, &stat, true);
+
+ meminfo[max_stats - 1].totalspace += stat.totalspace;
+ meminfo[max_stats - 1].nblocks += stat.nblocks;
+ meminfo[max_stats - 1].freespace += stat.freespace;
+ meminfo[max_stats - 1].freechunks += stat.freechunks;
+ }
+
+ /*
+ * DSA max limit is reached, write total of the remaining statistics.
+ */
+ if (context_id == (max_stats - 2) && context_id < (stats_count - 1))
+ {
+ memCtxState[idx].in_memory_stats = context_id + 1;
+ strncpy(meminfo[max_stats - 1].name, "Remaining Totals", 16);
+ }
+ context_id++;
+ }
+ if (context_id < (max_stats - 2))
+ {
+ memCtxState[idx].in_memory_stats = context_id;
+ }
+ /* Report number of aggregated memory contexts */
+ else
+ {
+ meminfo[max_stats - 1].num_contexts = context_id - memCtxState[idx].in_memory_stats;
+ }
+ memCtxState[idx].total_stats = context_id;
+cleanup:
+
+ /*
+ * Signal all the waiting client backends after setting the exit condition
+ * flag
+ */
+ memCtxState[idx].stats_timestamp = GetCurrentTimestamp();
+ LWLockRelease(&memCtxState[idx].lw_lock);
+ ConditionVariableBroadcast(&memCtxState[idx].memctx_cv);
+ /* Delete the hash table memory context */
+ MemoryContextDelete(stat_cxt);
+
+ dsa_detach(area);
+}
+
+/*
+ * Append the transient context_id of this context and each of
+ * its ancestors to a list, in order to compute a path.
+ */
+static List *
+compute_context_path(MemoryContext c, HTAB *context_id_lookup)
+{
+ bool found;
+ List *path = NIL;
+
+ for (MemoryContext cur_context = c; cur_context != NULL; cur_context = cur_context->parent)
+ {
+ MemoryContextId *cur_entry;
+
+ cur_entry = hash_search(context_id_lookup, &cur_context, HASH_FIND, &found);
+
+ if (!found)
+ {
+ elog(LOG, "hash table corrupted, can't construct path value");
+ break;
+ }
+ path = lcons_int(cur_entry->context_id, path);
+ }
+ return path;
+}
+
+/* Count the contexts currently allocated by the backend into *stats_count */
+static void
+compute_num_of_contexts(List *contexts, HTAB *context_id_lookup,
+ int *stats_count, bool get_summary)
+{
+ foreach_ptr(MemoryContextData, cur, contexts)
+ {
+ MemoryContextId *entry;
+ bool found;
+
+ entry = (MemoryContextId *) hash_search(context_id_lookup, &cur,
+ HASH_ENTER, &found);
+ Assert(!found);
+
+ /* context id starts with 1 */
+ entry->context_id = (++(*stats_count));
+
+ /* Append the children of the current context to the main list. */
+ for (MemoryContext c = cur->firstchild; c != NULL; c = c->nextchild)
+ {
+ if (get_summary)
+ {
+ entry = (MemoryContextId *) hash_search(context_id_lookup, &c,
+ HASH_ENTER, &found);
+ Assert(!found);
+
+ entry->context_id = (++(*stats_count));
+ }
+
+ contexts = lappend(contexts, c);
+ }
+
+ /*
+ * In summary mode, only the contexts at the first two levels (from
+ * the top) are displayed
+ */
+ if (get_summary)
+ break;
+ }
+
+}
+
+/* Copy the memory context statistics of a single context to a dsa buffer */
+static void
+PublishMemoryContext(MemoryContextEntry *memctx_info, int curr_id,
+ MemoryContext context, List *path,
+ MemoryContextCounters stat, int num_contexts)
+{
+ char clipped_ident[MEMORY_CONTEXT_IDENT_DISPLAY_SIZE];
+
+ if (context->name != NULL)
+ {
+ Assert(strlen(context->name) < MEMORY_CONTEXT_IDENT_DISPLAY_SIZE);
+ strncpy(memctx_info[curr_id].name, context->name, strlen(context->name));
+ }
+ else
+ memctx_info[curr_id].name[0] = '\0';
+
+ /* Trim and copy the identifier if it is not set to NULL */
+ if (context->ident != NULL)
+ {
+ int idlen = strlen(context->ident);
+
+ /*
+ * Some identifiers such as SQL query string can be very long,
+ * truncate oversize identifiers.
+ */
+ if (idlen >= MEMORY_CONTEXT_IDENT_DISPLAY_SIZE)
+ idlen = pg_mbcliplen(context->ident, idlen,
+ MEMORY_CONTEXT_IDENT_DISPLAY_SIZE - 1);
+
+ memcpy(clipped_ident, context->ident, idlen);
+ clipped_ident[idlen] = '\0';
+
+ /*
+ * To be consistent with logging output, we label dynahash contexts
+ * with just the hash table name as with MemoryContextStatsPrint().
+ */
+ if (!strncmp(context->name, "dynahash", 8))
+ {
+ strncpy(memctx_info[curr_id].name,
+ clipped_ident, strlen(clipped_ident));
+ memctx_info[curr_id].ident[0] = '\0';
+ }
+ else
+ strncpy(memctx_info[curr_id].ident,
+ clipped_ident, strlen(clipped_ident));
+ }
+ else
+ memctx_info[curr_id].ident[0] = '\0';
+
+ memctx_info[curr_id].path_length = list_length(path);
+ foreach_int(i, path)
+ memctx_info[curr_id].path[foreach_current_index(i)] = Int32GetDatum(i);
+
+ memctx_info[curr_id].type = AssignContextType(context->type);
+ memctx_info[curr_id].totalspace = stat.totalspace;
+ memctx_info[curr_id].nblocks = stat.nblocks;
+ memctx_info[curr_id].freespace = stat.freespace;
+ memctx_info[curr_id].freechunks = stat.freechunks;
+ memctx_info[curr_id].num_contexts = num_contexts;
+}
+
void *
palloc(Size size)
{
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 18560755d2..88b8fe555a 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8455,6 +8455,16 @@
prorettype => 'bool', proargtypes => 'int4',
prosrc => 'pg_log_backend_memory_contexts' },
+# publishing memory contexts of the specified postgres process
+{ oid => '2173', descr => 'publish memory contexts of the specified backend',
+ proname => 'pg_get_process_memory_contexts', provolatile => 'v',
+ prorows => '100', proretset => 't', proparallel => 'r',
+ prorettype => 'record', proargtypes => 'int4 bool',
+ proallargtypes => '{int4,bool,text,text,text,_int4,int8,int8,int8,int8,int8,int4,timestamptz}',
+ proargmodes => '{i,i,o,o,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{oid, summary, name, ident, type, path, total_bytes, total_nblocks, free_bytes, free_chunks, used_bytes, num_agg_contexts, stats_timestamp}',
+ prosrc => 'pg_get_process_memory_contexts' },
+
# non-persistent series generator
{ oid => '1066', descr => 'non-persistent series generator',
proname => 'generate_series', prorows => '1000',
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index d016a9c924..fc75ea143c 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -96,6 +96,7 @@ extern PGDLLIMPORT volatile sig_atomic_t IdleSessionTimeoutPending;
extern PGDLLIMPORT volatile sig_atomic_t ProcSignalBarrierPending;
extern PGDLLIMPORT volatile sig_atomic_t LogMemoryContextPending;
extern PGDLLIMPORT volatile sig_atomic_t IdleStatsUpdateTimeoutPending;
+extern PGDLLIMPORT volatile sig_atomic_t PublishMemoryContextPending;
extern PGDLLIMPORT volatile sig_atomic_t CheckClientConnectionPending;
extern PGDLLIMPORT volatile sig_atomic_t ClientConnectionLost;
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index 022fd8ed93..477ab99338 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -35,6 +35,7 @@ typedef enum
PROCSIG_WALSND_INIT_STOPPING, /* ask walsenders to prepare for shutdown */
PROCSIG_BARRIER, /* global barrier interrupt */
PROCSIG_LOG_MEMORY_CONTEXT, /* ask backend to log the memory contexts */
+ PROCSIG_GET_MEMORY_CONTEXT, /* ask backend to publish the memory contexts */
PROCSIG_PARALLEL_APPLY_MESSAGE, /* Message from parallel apply workers */
/* Recovery conflict reasons */
diff --git a/src/include/utils/memutils.h b/src/include/utils/memutils.h
index 8abc26abce..8982f13893 100644
--- a/src/include/utils/memutils.h
+++ b/src/include/utils/memutils.h
@@ -18,6 +18,9 @@
#define MEMUTILS_H
#include "nodes/memnodes.h"
+#include "storage/condition_variable.h"
+#include "storage/lmgr.h"
+#include "utils/dsa.h"
/*
@@ -48,7 +51,12 @@
#define AllocHugeSizeIsValid(size) ((Size) (size) <= MaxAllocHugeSize)
+#define MEMORY_CONTEXT_IDENT_DISPLAY_SIZE 1024
+#define MEM_CONTEXT_SHMEM_STATS_SIZE 30
+#define MEM_CONTEXT_MAX_LEVEL 64
+#define MAX_TYPE_STRING_LENGTH 64
+#define MAX_NUM_DEFAULT_SEGMENTS 8
/*
* Standard top-level memory contexts.
*
@@ -319,4 +327,54 @@ pg_memory_is_all_zeros(const void *ptr, size_t len)
return true;
}
+/* Dynamic shared memory state for memory context statistics reporting */
+typedef struct MemoryContextEntry
+{
+ /*
+ * XXX isn't 2 x 1kB for every context a bit too much? Maybe better to
+ * make it variable-length?
+ */
+ char name[MEMORY_CONTEXT_IDENT_DISPLAY_SIZE];
+ char ident[MEMORY_CONTEXT_IDENT_DISPLAY_SIZE];
+ Datum path[MEM_CONTEXT_MAX_LEVEL];
+ const char *type;
+ int path_length;
+ int64 totalspace;
+ int64 nblocks;
+ int64 freespace;
+ int64 freechunks;
+ int num_contexts;
+} MemoryContextEntry;
+
+/* Shared memory state for memory context statistics reporting */
+typedef struct MemoryContextState
+{
+ ConditionVariable memctx_cv;
+ LWLock lw_lock;
+ int proc_id;
+ int in_memory_stats;
+ int total_stats;
+ bool get_summary;
+ dsa_handle memstats_dsa_handle;
+ dsa_pointer memstats_dsa_pointer;
+ TimestampTz stats_timestamp;
+} MemoryContextState;
+
+/*
+ * MemoryContextId
+ * Used for storage of transient identifiers for
+ * pg_get_backend_memory_contexts.
+ */
+typedef struct MemoryContextId
+{
+ MemoryContext context;
+ int context_id;
+} MemoryContextId;
+
+extern PGDLLIMPORT MemoryContextState *memCtxState;
+extern void ProcessGetMemoryContextInterrupt(void);
+extern const char *AssignContextType(NodeTag type);
+extern void HandleGetMemoryContextInterrupt(void);
+extern void MemCtxShmemInit(void);
+
#endif /* MEMUTILS_H */
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 352abc0bd4..831e0dead1 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -231,3 +231,17 @@ select * from pg_timezone_abbrevs where abbrev = 'LMT';
LMT | @ 7 hours 52 mins 58 secs ago | f
(1 row)
+DO $$
+DECLARE
+ checkpointer_pid int;
+ r RECORD;
+BEGIN
+ SELECT pid from pg_stat_activity where backend_type='checkpointer'
+ INTO checkpointer_pid;
+
+ select type, name, ident
+ from pg_get_process_memory_contexts(checkpointer_pid, false)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+END $$;
+NOTICE: (AllocSet,TopMemoryContext,)
diff --git a/src/test/regress/sql/sysviews.sql b/src/test/regress/sql/sysviews.sql
index 66179f026b..0a4cc3bf4d 100644
--- a/src/test/regress/sql/sysviews.sql
+++ b/src/test/regress/sql/sysviews.sql
@@ -101,3 +101,17 @@ select count(distinct utc_offset) >= 24 as ok from pg_timezone_abbrevs;
-- One specific case we can check without much fear of breakage
-- is the historical local-mean-time value used for America/Los_Angeles.
select * from pg_timezone_abbrevs where abbrev = 'LMT';
+
+DO $$
+DECLARE
+ checkpointer_pid int;
+ r RECORD;
+BEGIN
+ SELECT pid from pg_stat_activity where backend_type='checkpointer'
+ INTO checkpointer_pid;
+
+ select type, name, ident
+ from pg_get_process_memory_contexts(checkpointer_pid, false)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+END $$;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index d5aa5c295a..c59db4387b 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1632,8 +1632,10 @@ MemoryContextCallback
MemoryContextCallbackFunction
MemoryContextCounters
MemoryContextData
+MemoryContextEntry
MemoryContextMethodID
MemoryContextMethods
+MemoryContextState
MemoryStatsPrintFunc
MergeAction
MergeActionState
--
2.34.1
On 2025/01/21 20:27, Rahila Syed wrote:
Hi Tomas,
I've tried the pgbench test
again, to see if it gets stuck somewhere, and I'm observing this on a
new / idle cluster:
$ pgbench -n -f test.sql -P 1 test -T 60
pgbench (18devel)
progress: 1.0 s, 1647.9 tps, lat 0.604 ms stddev 0.438, 0 failed
progress: 2.0 s, 1374.3 tps, lat 0.727 ms stddev 0.386, 0 failed
progress: 3.0 s, 1514.4 tps, lat 0.661 ms stddev 0.330, 0 failed
progress: 4.0 s, 1563.4 tps, lat 0.639 ms stddev 0.212, 0 failed
progress: 5.0 s, 1665.0 tps, lat 0.600 ms stddev 0.177, 0 failed
progress: 6.0 s, 1538.0 tps, lat 0.650 ms stddev 0.192, 0 failed
progress: 7.0 s, 1491.4 tps, lat 0.670 ms stddev 0.261, 0 failed
progress: 8.0 s, 1539.5 tps, lat 0.649 ms stddev 0.443, 0 failed
progress: 9.0 s, 1517.0 tps, lat 0.659 ms stddev 0.167, 0 failed
progress: 10.0 s, 1594.0 tps, lat 0.627 ms stddev 0.227, 0 failed
progress: 11.0 s, 28.0 tps, lat 0.705 ms stddev 0.277, 0 failed
progress: 12.0 s, 0.0 tps, lat 0.000 ms stddev 0.000, 0 failed
progress: 13.0 s, 0.0 tps, lat 0.000 ms stddev 0.000, 0 failed
progress: 14.0 s, 0.0 tps, lat 0.000 ms stddev 0.000, 0 failed
progress: 15.0 s, 0.0 tps, lat 0.000 ms stddev 0.000, 0 failed
progress: 16.0 s, 1480.6 tps, lat 4.043 ms stddev 130.113, 0 failed
progress: 17.0 s, 1524.9 tps, lat 0.655 ms stddev 0.286, 0 failed
progress: 18.0 s, 1246.0 tps, lat 0.802 ms stddev 0.330, 0 failed
progress: 19.0 s, 1383.1 tps, lat 0.722 ms stddev 0.934, 0 failed
progress: 20.0 s, 1432.7 tps, lat 0.698 ms stddev 0.199, 0 failed
...
There's always a period of 10-15 seconds when everything seems to be
working fine, and then a couple seconds when it gets stuck, with the usual
LOG: Wait for 69454 process to publish stats timed out, trying again
The PIDs I've seen were for checkpointer, autovacuum launcher, ... all
of those are processes that should be handling the signal, so how come it
gets stuck every now and then? The system is entirely idle, there's no
contention for the shmem stuff, etc. Could it be forgetting about the
signal in some cases, or something like that?
Yes, this occurs when, due to concurrent signals received by a backend,
both signals are processed together, and stats are published only once.
Once the stats are read by the first client that gains access, they are erased,
causing the second client to wait until timeout.
If we make clients wait for the latest stats, timeouts may occur during concurrent
operations. To avoid such timeouts, we can retain the previously published memory
statistics for every backend and avoid waiting for the latest statistics when the
previous statistics are newer than STALE_STATS_LIMIT. This limit can be determined
based on the server load and how fast the memory statistics requests are being
handled by the server.
For example, on a server running make -j 4 installcheck-world while concurrently
probing client backends for memory statistics using pgbench, accepting statistics
that were approximately 1 second old helped eliminate timeouts. Conversely, on an
idle system, waiting for new statistics when the previous ones were older than 0.1
seconds was sufficient to avoid any timeouts caused by concurrent requests.
PFA an updated and rebased patch that includes the capability to associate
timestamps with statistics. Additionally, I have made some minor fixes and improved
the indentation.
Currently, I have set STALE_STATS_LIMIT to 0.5 seconds in code, which means do
not wait for newer statistics if previous statistics were published within the last
5 seconds of current request.
In short, there are the following options to design the wait for statistics, depending
on whether we expect concurrent requests to a backend for memory statistics to be common.
1. Always get the latest statistics and time out if not able to.
This works fine for sequential probing, which is going to be the most common use case.
This can lead to backend timeouts of up to MAX_TRIES * MEMSTATS_WAIT_TIMEOUT.
2. Determine the appropriate STALE_STATS_LIMIT and do not wait for the latest stats if
previous statistics are within that limit.
This will help avoid the timeouts in case of concurrent requests.
3. Do what v10 patch on this thread does -
Wait for the latest statistics for up to MEMSTATS_WAIT_TIMEOUT;
otherwise, display the previous statistics, regardless of when they were published.
Since timeouts are likely to occur only during concurrent requests, the displayed
statistics are unlikely to be very outdated.
However, in this scenario, we observe the behavior you mentioned, i.e., concurrent
backends can get stuck for the duration of MEMSTATS_WAIT_TIMEOUT
(currently 5 seconds as per the current settings).
I am inclined toward the third approach, as concurrent requests are not expected
to be a common use case for this feature. Moreover, with the second approach,
determining an appropriate value for STALE_STATS_LIMIT is challenging, as it
depends on the server's load.
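The freshness test behind option 2 can be sketched as follows. This is only an illustration under stated assumptions: timestamps are plain microsecond counters, and the names ts_usec, STALE_STATS_LIMIT_USEC and stats_fresh_enough() are hypothetical stand-ins, not identifiers from the patch. The 0.5 second value is the one the message says is set in code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a server timestamp, in microseconds. */
typedef int64_t ts_usec;

/* Assumed threshold: 0.5 seconds, expressed in microseconds. */
#define STALE_STATS_LIMIT_USEC ((ts_usec) 500000)

/*
 * Return true when previously published statistics are recent enough
 * that the requester may use them instead of waiting for new ones.
 * A stats_ts of 0 is treated here as "nothing published yet".
 */
bool
stats_fresh_enough(ts_usec stats_ts, ts_usec now, ts_usec limit)
{
    assert(limit >= 0);

    if (stats_ts == 0)
        return false;           /* nothing to reuse */

    return (now - stats_ts) <= limit;
}
```

With this shape, the difficulty described above is visible directly: the outcome depends entirely on the limit argument, and no single value suits both a loaded and an idle server.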
Just an idea: as another option, how about blocking new requests to
the target process (e.g., causing them to fail with an error or
returning NULL with a warning) if a previous request is still pending?
Users can simply retry the request if it fails. IMO failing quickly
seems preferable to getting stuck for a while in cases with concurrent
requests.
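A fail-fast guard of this kind could look roughly like the sketch below. It uses C11 atomics rather than any PostgreSQL primitive, and PendingRequestSlot, try_begin_request() and finish_request() are hypothetical names chosen for illustration; a real implementation would keep the flag in the per-process shared memory slot and would also need to clear it if the requester errors out.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-target slot; in a server this would live in shared memory. */
typedef struct PendingRequestSlot
{
    atomic_bool request_pending;
} PendingRequestSlot;

/*
 * Try to claim the slot for a new request. Returns false immediately if
 * another request to the same target is still in flight, so the caller
 * can report a warning and let the user retry instead of waiting.
 */
bool
try_begin_request(PendingRequestSlot *slot)
{
    bool expected = false;

    return atomic_compare_exchange_strong(&slot->request_pending,
                                          &expected, true);
}

/* Release the slot once the statistics have been read (or on failure). */
void
finish_request(PendingRequestSlot *slot)
{
    atomic_store(&slot->request_pending, false);
}
```

A second caller that arrives while the flag is set gets false straight away, which is the "fail quickly" behavior proposed above.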
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
Hi,
Just an idea: as another option, how about blocking new requests to
the target process (e.g., causing them to fail with an error or
returning NULL with a warning) if a previous request is still pending?
Users can simply retry the request if it fails. IMO failing quickly
seems preferable to getting stuck for a while in cases with concurrent
requests.
Thank you for the suggestion. I agree that it is better to fail early and
avoid waiting for a timeout in such cases. I will add a "pending request"
tracker for this in shared memory. This approach will help prevent sending
a concurrent request if a request for the same backend is still being
processed.
IMO, one downside of throwing an error in such cases is that the users might
wonder if they need to take corrective action, even though the issue is
actually going to resolve itself and they just need to retry. Therefore,
issuing a warning or displaying previously updated statistics might be a
better alternative to throwing an error.
Thank you,
Rahila Syed
On 1/24/25 14:47, Rahila Syed wrote:
Hi,
Just an idea: as another option, how about blocking new requests to
the target process (e.g., causing them to fail with an error or
returning NULL with a warning) if a previous request is still pending?
Users can simply retry the request if it fails. IMO failing quickly
seems preferable to getting stuck for a while in cases with concurrent
requests.
Thank you for the suggestion. I agree that it is better to fail
early and avoid waiting for a timeout in such cases. I will add a
"pending request" tracker for this in shared memory. This approach
will help prevent sending a concurrent request if a request for the
same backend is still being processed.
AFAIK these failures should be extremely rare - we're only talking about
that because the workload I used for testing is highly concurrent, i.e.
it requests memory context info extremely often. I doubt anyone sane is
going to do that in practice ...
IMO, one downside of throwing an error in such cases is that the
users might wonder if they need to take a corrective action, even
though the issue is actually going to solve itself and they just
need to retry. Therefore, issuing a warning or displaying previously
updated statistics might be a better alternative to throwing an
error.
Wouldn't this be mostly mitigated by adding proper detail/hint to the
error message? Sure, the user can always ignore that (especially when
calling this from a script), but well ... we can only do so much.
All this makes me think about how we shared pgstat data before the shmem
approach was introduced in PG15. Until then the process signaled pgstat
collector, and the collector wrote the statistics into a file, with a
timestamp. And the process used the timestamp to decide if it's fresh
enough ... Wouldn't the same approach work here?
I imagined it would work something like this:
requesting backend:
-------------------
* set request_ts to current timestamp
* signal the target process, to generate memory context info
* wait until the DSA gets filled with stats_ts > request_ts
* return the data, don't erase anything
target backend
--------------
* clear the signal
* generate the statistics
* set stats_ts to current timestamp
* wake all the backends waiting for the stats (through CV)
I see v11 does almost this, except that it accepts somewhat stale data.
But why would that be necessary? I don't think it's needed, and I don't
think we should accept data from before the process sends the signal.
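The acceptance rule in the outline above (return only data with stats_ts > request_ts, and never erase it) can be sketched in isolation. This is a simplified model, not the patch's code: PublishedStats, publish_stats() and stats_answer_request() are hypothetical names, timestamps are plain microsecond counters, and the condition-variable wait and wakeup are reduced to comments.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a server timestamp, in microseconds. */
typedef int64_t ts_usec;

/* Simplified model of the state a target process publishes. */
typedef struct PublishedStats
{
    ts_usec stats_ts;       /* when the target last published */
    int     num_entries;    /* placeholder for the actual statistics */
} PublishedStats;

/* Target side: generate the statistics, then stamp them. */
void
publish_stats(PublishedStats *shared, int num_entries, ts_usec now)
{
    shared->num_entries = num_entries;
    shared->stats_ts = now;
    /* a real implementation would wake all waiters here (through the CV) */
}

/*
 * Requester side: the published data answers this request only if it
 * was produced after the request was made. The data is left in place,
 * so other waiting requesters can read the same publication.
 */
bool
stats_answer_request(const PublishedStats *shared, ts_usec request_ts)
{
    return shared->stats_ts > request_ts;
}
```

Because nothing is erased, one publication can satisfy every requester whose request_ts precedes it, which covers the case where concurrent signals are processed together and the statistics are published only once.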
regards
--
Tomas Vondra
Hi,
On Sat, Jan 25, 2025 at 3:50 AM Tomas Vondra <tomas@vondra.me> wrote:
On 1/24/25 14:47, Rahila Syed wrote:
Hi,
Just an idea: as another option, how about blocking new requests to
the target process (e.g., causing them to fail with an error or
returning NULL with a warning) if a previous request is still pending?
Users can simply retry the request if it fails. IMO failing quickly
seems preferable to getting stuck for a while in cases with concurrent
requests.
Thank you for the suggestion. I agree that it is better to fail
early and avoid waiting for a timeout in such cases. I will add a
"pending request" tracker for this in shared memory. This approach
will help prevent sending a concurrent request if a request for the
same backend is still being processed.
AFAIK these failures should be extremely rare - we're only talking about
that because the workload I used for testing is highly concurrent, i.e.
it requests memory context info extremely often. I doubt anyone sane is
going to do that in practice ...
Yes, that makes sense.
IMO, one downside of throwing an error in such cases is that the
users might wonder if they need to take a corrective action, even
though the issue is actually going to solve itself and they just
need to retry. Therefore, issuing a warning or displaying previously
updated statistics might be a better alternative to throwing an
error.
Wouldn't this be mostly mitigated by adding proper detail/hint to the
error message? Sure, the user can always ignore that (especially when
calling this from a script), but well ... we can only do so much.
OK.
All this makes me think about how we shared pgstat data before the shmem
approach was introduced in PG15. Until then the process signaled pgstat
collector, and the collector wrote the statistics into a file, with a
timestamp. And the process used the timestamp to decide if it's fresh
enough ... Wouldn't the same approach work here?
I imagined it would work something like this:
requesting backend:
-------------------
* set request_ts to current timestamp
* signal the target process, to generate memory context info
* wait until the DSA gets filled with stats_ts > request_ts
* return the data, don't erase anything
target backend
--------------
* clear the signal
* generate the statistics
* set stats_ts to current timestamp
* wake all the backends waiting for the stats (through CV)
I see v11 does almost this, except that it accepts somewhat stale data.
That's correct.
But why would that be necessary? I don't think it's needed, and I don't
think we should accept data from before the process sends the signal.
This is done in an attempt to keep concurrent requests from timing out.
In such cases, data in response to another request is likely to already be
in the dynamic shared memory. Hence, instead of waiting for the latest data
and risking a timeout, the approach displays available statistics that are
newer than a defined threshold. Additionally, since we can't distinguish
between sequential and concurrent requests, we accept somewhat stale data
for all requests.
I realize this approach has some issues, mainly regarding how to determine
an appropriate threshold value or a limit for old data.
Therefore, I agree that it makes sense to display only the data that is
published after the request is made. If such data can't be published due to
concurrent requests or other delays, the function should detect this and
return as soon as possible.
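To make the difference between the two acceptance rules concrete, here is a minimal standalone C sketch. It is only an illustration under assumptions, not code from v11 or v12: the names accept_with_threshold, accept_strict, and the max_age parameter are hypothetical, and timestamps are plain integers rather than TimestampTz.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical microsecond timestamp type, standing in for TimestampTz. */
typedef int64_t ts_usec;

/*
 * Threshold rule (assumed shape of the "somewhat stale data" approach):
 * accept statistics published up to max_age before the request was made.
 */
bool
accept_with_threshold(ts_usec stats_ts, ts_usec request_ts, ts_usec max_age)
{
	assert(max_age >= 0);
	return stats_ts > request_ts - max_age;
}

/*
 * Strict rule: accept only statistics published after the request was made.
 */
bool
accept_strict(ts_usec stats_ts, ts_usec request_ts)
{
	return stats_ts > request_ts;
}
```

With a request at time 100 and statistics stamped 95, the threshold rule with max_age = 10 accepts the data while the strict rule rejects it, which is exactly the pre-request data discussed above. The strict rule also removes the need to pick a threshold value.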
Thank you,
Rahila Syed
Hi,
Just an idea: as another option, how about blocking new requests to
the target process (e.g., causing them to fail with an error or
returning NULL with a warning) if a previous request is still pending?
Users can simply retry the request if it fails. IMO failing quickly
seems preferable to getting stuck for a while in cases with concurrent
requests.
Thank you for the suggestion. I agree that it is better to fail
early and avoid waiting for a timeout in such cases. I will add a
"pending request" tracker for this in shared memory. This approach
will help prevent sending a concurrent request if a request for the
same backend is still being processed.
Please find attached a patch that adds a request_pending field in
shared memory. This allows us to detect concurrent requests early
and return a WARNING message immediately, avoiding unnecessary
waiting and potential timeouts. This is added in v12-0002* patch.
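As a rough illustration of the fail-fast idea, here is a minimal standalone C sketch. It is not the code in the patch: the patch guards request_pending with the slot's LWLock, whereas this sketch swaps in a C11 atomic compare-and-exchange so that it compiles on its own. StatsSlot, slot_init, slot_try_begin_request, and slot_end_request are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one per-process slot in shared memory. */
typedef struct StatsSlot
{
	atomic_bool request_pending;	/* true while a request is in flight */
} StatsSlot;

/* Start with no request pending. */
void
slot_init(StatsSlot *slot)
{
	assert(slot != NULL);
	atomic_init(&slot->request_pending, false);
}

/*
 * Try to claim the slot for a new request.  Returns false right away if
 * another request is still pending, so the caller can emit a warning and
 * return instead of waiting for a timeout.
 */
bool
slot_try_begin_request(StatsSlot *slot)
{
	bool		expected = false;

	assert(slot != NULL);
	return atomic_compare_exchange_strong(&slot->request_pending,
										  &expected, true);
}

/* Mark the request as finished so that the next caller can proceed. */
void
slot_end_request(StatsSlot *slot)
{
	assert(slot != NULL);
	atomic_store(&slot->request_pending, false);
}
```

A caller would invoke slot_try_begin_request() before signaling the target process and, on a false result, report something like the "Another request is pending, try again" hint. Every exit path after a successful claim needs to call slot_end_request(), otherwise later requests for that slot would keep failing.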
I imagined it would work something like this:
requesting backend:
-------------------
* set request_ts to current timestamp
* signal the target process, to generate memory context info
* wait until the DSA gets filled with stats_ts > request_ts
* return the data, don't erase anything
target backend
--------------
* clear the signal
* generate the statistics
* set stats_ts to current timestamp
* wake all the backends waiting for the stats (through CV)
The attached v12-0002* patch implements this. We decide whether the
statistics are up to date based on the stats timestamp: if it is later
than the timestamp at which the request was sent, the statistics are
considered up to date and are returned immediately. Otherwise,
the client waits for newer statistics to be published until the
timeout is reached.
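The check described above can be sketched in standalone C as follows. This is a simplified model under assumptions, not the patch itself: PublishedStats, stats_are_fresh, and first_fresh_snapshot are hypothetical names, timestamps are plain integers, and the sequence of wakeups on the condition variable is modeled as an array of snapshots.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical microsecond timestamp type, standing in for TimestampTz. */
typedef int64_t ts_usec;

/* Hypothetical view of what the requester sees in the slot on a wakeup. */
typedef struct PublishedStats
{
	bool		valid;			/* has anything been published yet? */
	ts_usec		stats_ts;		/* when the statistics were published */
} PublishedStats;

/* Statistics count as fresh only if published after the request was sent. */
bool
stats_are_fresh(const PublishedStats *stats, ts_usec request_ts)
{
	assert(stats != NULL);
	return stats->valid && stats->stats_ts > request_ts;
}

/*
 * Walk the snapshots observed on successive wakeups and return the index
 * of the first one that is fresh, or -1 if none is.  The -1 case models
 * the caller running out of retries and giving up.
 */
int
first_fresh_snapshot(const PublishedStats *seen, int nseen, ts_usec request_ts)
{
	for (int i = 0; i < nseen; i++)
	{
		if (stats_are_fresh(&seen[i], request_ts))
			return i;
	}
	return -1;
}
```

Note that the comparison is strict: statistics stamped exactly at the request time are treated as stale, so data published before the signal was sent is never returned.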
With the latest changes, I don't see a dip in tps even when
concurrent requests are run in a pgbench script.
pgbench -n -f monitoring.sql -P 1 postgres -T 60
pgbench (18devel)
progress: 1.0 s, 816.9 tps, lat 1.218 ms stddev 0.317, 0 failed
progress: 2.0 s, 821.9 tps, lat 1.216 ms stddev 0.177, 0 failed
progress: 3.0 s, 817.1 tps, lat 1.224 ms stddev 0.209, 0 failed
progress: 4.0 s, 791.0 tps, lat 1.262 ms stddev 0.292, 0 failed
progress: 5.0 s, 780.8 tps, lat 1.280 ms stddev 0.326, 0 failed
progress: 6.0 s, 675.2 tps, lat 1.482 ms stddev 0.503, 0 failed
progress: 7.0 s, 674.0 tps, lat 1.482 ms stddev 0.387, 0 failed
progress: 8.0 s, 821.0 tps, lat 1.217 ms stddev 0.272, 0 failed
progress: 9.0 s, 903.0 tps, lat 1.108 ms stddev 0.196, 0 failed
progress: 10.0 s, 886.9 tps, lat 1.128 ms stddev 0.160, 0 failed
progress: 11.0 s, 887.1 tps, lat 1.126 ms stddev 0.243, 0 failed
progress: 12.0 s, 871.0 tps, lat 1.147 ms stddev 0.227, 0 failed
progress: 13.0 s, 735.0 tps, lat 1.361 ms stddev 0.329, 0 failed
progress: 14.0 s, 655.9 tps, lat 1.522 ms stddev 0.331, 0 failed
progress: 15.0 s, 674.0 tps, lat 1.484 ms stddev 0.254, 0 failed
progress: 16.0 s, 659.0 tps, lat 1.517 ms stddev 0.289, 0 failed
progress: 17.0 s, 641.0 tps, lat 1.558 ms stddev 0.281, 0 failed
progress: 18.0 s, 707.8 tps, lat 1.412 ms stddev 0.324, 0 failed
progress: 19.0 s, 746.3 tps, lat 1.341 ms stddev 0.219, 0 failed
progress: 20.0 s, 659.9 tps, lat 1.513 ms stddev 0.372, 0 failed
progress: 21.0 s, 651.8 tps, lat 1.533 ms stddev 0.372, 0 failed
WARNING: cannot process the request at the moment
HINT: Another request is pending, try again
progress: 22.0 s, 635.2 tps, lat 1.574 ms stddev 0.519, 0 failed
WARNING: cannot process the request at the moment
HINT: Another request is pending, try again
progress: 23.0 s, 730.0 tps, lat 1.369 ms stddev 0.408, 0 failed
WARNING: cannot process the request at the moment
HINT: Another request is pending, try again
WARNING: cannot process the request at the moment
HINT: Another request is pending, try again
where monitoring.sql is as follows:
SELECT * FROM pg_get_process_memory_contexts(
(SELECT pid FROM pg_stat_activity
WHERE pid != pg_backend_pid()
ORDER BY random() LIMIT 1)
, false);
I have split the patch into two patches, with v12-0001* consisting of the
fixes needed to allow using MemoryContextStatsInternal for this feature, and
v12-0002* containing all the remaining changes for the feature.
A few outstanding issues are as follows:
1. Currently one DSA is created per backend when the first request for
statistics is made and remains for the lifetime of the server.
I think I should add logic to periodically destroy DSAs, when memory
context statistics are not being *actively* queried from the backend,
as determined by the statistics timestamp.
2. The two issues reported by Fujii-san here: [1].
i. I have proposed a fix for the first issue here: [2].
ii. I am able to reproduce the second issue. This happens when we try
to query statistics of a backend running infinite_recurse.sql. While I am
still working on finding the root cause, I think it happens because some
memory is being overwritten due to a stack-depth violation, as the issue is
not seen when I reduce max_stack_depth to 100kB.
[1]: /messages/by-id/a1a7e2b7-8f33-4313-baff-42e92ec14fd3@oss.nttdata.com
[2]: /messages/by-id/CAH2L28shr0j3JE5V3CXDFmDH-agTSnh2V8pR23X0UhRMbDQD9Q@mail.gmail.com
Attachments:
v12-0002-Function-to-report-memory-context-statistics.patch (application/octet-stream)
From 1e174f0dd888d9b89ecefd593cba648db1462086 Mon Sep 17 00:00:00 2001
From: Rahila Syed <rahilasyed.90@gmail.com>
Date: Mon, 3 Feb 2025 15:37:17 +0530
Subject: [PATCH 2/2] Function to report memory context statistics
This function sends a signal to a backend to publish
statistics of all its memory contexts. Signal handler
running in the backend process, sets a flag, which causes
it to copy its MemoryContextStats to a DSA, during the
next call to CHECK_FOR_INTERRUPTS().
If there are more statistics than fit in 16MB, the
remaining statistics are copied as a cumulative
total of the remaining contexts.
Once it's done, it signals the client backend using
a condition variable. The client backend wakes up, reads
the shared memory and returns these values in the form
of set of records, one for each memory context, to the
user, followed by a cumulative total of the remaining
contexts, if any.
If get_summary is true return statistics of all children
of TopMemoryContext with aggregated statistics of their
children.
Each backend and auxiliary process has its own slot for
reporting the stats. There is an array of such memory slots
of size MaxBackends+NumofAuxiliary
processes in fixed shared memory. Each of these slots point
to a DSA, which contains the stats to be shared by the
corresponding process. Thus 1 DSA area is created per backend
that is publishing the statistics.
Each slot has its own LW lock and condition variable for
synchronization and communication between the publishing process
and the client backend.
---
doc/src/sgml/func.sgml | 31 ++
src/backend/postmaster/autovacuum.c | 4 +
src/backend/postmaster/checkpointer.c | 4 +
src/backend/postmaster/interrupt.c | 4 +
src/backend/postmaster/pgarch.c | 4 +
src/backend/postmaster/startup.c | 4 +
src/backend/postmaster/walsummarizer.c | 4 +
src/backend/storage/ipc/ipci.c | 2 +
src/backend/storage/ipc/procsignal.c | 3 +
src/backend/tcop/postgres.c | 3 +
.../utils/activity/wait_event_names.txt | 1 +
src/backend/utils/adt/mcxtfuncs.c | 425 ++++++++++++++++--
src/backend/utils/init/globals.c | 1 +
src/backend/utils/mmgr/mcxt.c | 383 +++++++++++++++-
src/include/catalog/pg_proc.dat | 10 +
src/include/miscadmin.h | 1 +
src/include/storage/procsignal.h | 1 +
src/include/utils/memutils.h | 59 +++
src/test/regress/expected/sysviews.out | 14 +
src/test/regress/sql/sysviews.sql | 14 +
src/tools/pgindent/typedefs.list | 2 +
21 files changed, 942 insertions(+), 32 deletions(-)
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 7efc81936a..9d243df4e1 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -28409,6 +28409,37 @@ acl | {postgres=arwdDxtm/postgres,foo=r/postgres}
</para></entry>
</row>
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>pg_get_process_memory_contexts</primary>
+ </indexterm>
+ <function>pg_get_process_memory_contexts</function> ( <parameter>pid</parameter> <type>integer</type>, <parameter>get_summary</parameter> <type>boolean</type> )
+ <returnvalue>setof record</returnvalue>
+ </para>
+ <para>
+ This function handles requests to display the memory contexts of a
+ PostgreSQL process with the specified process ID (PID). It takes two
+ arguments: PID and a boolean, get_summary. The function can send
+ requests to both backend and auxiliary processes.
+
+ After receiving memory context statistics from the target process, it
+ returns the results as one row per context. The num_agg_contexts
+ column indicates the number of contexts aggregated in the displayed
+ statistics.
+
+ When get_summary is set to true, statistics for memory contexts at
+ levels 1 and 2 are displayed, with level 1 representing the root node
+ (i.e., TopMemoryContext). Each level 2 context's statistics represent
+ an aggregate of all its child contexts' statistics, with
+ num_agg_contexts indicating the number of these aggregated child
+ contexts.
+
+ When get_summary is set to false, the num_agg_contexts value is 1,
+ indicating that individual statistics are being displayed.
+ </para></entry>
+ </row>
+
<row>
<entry role="func_table_entry"><para role="func_signature">
<indexterm>
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 0ab921a169..0c693cfa48 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -778,6 +778,10 @@ HandleAutoVacLauncherInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
/* Process sinval catchup interrupts that happened while sleeping */
ProcessCatchupInterrupt();
}
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index b94f9cdff2..33c3c2d9c6 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -661,6 +661,10 @@ HandleCheckpointerInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/postmaster/interrupt.c b/src/backend/postmaster/interrupt.c
index be69e4c713..9481a5cd24 100644
--- a/src/backend/postmaster/interrupt.c
+++ b/src/backend/postmaster/interrupt.c
@@ -48,6 +48,10 @@ HandleMainLoopInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 12ee815a62..cd1ecb6b93 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -865,6 +865,10 @@ HandlePgArchInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
if (ConfigReloadPending)
{
char *archiveLib = pstrdup(XLogArchiveLibrary);
diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c
index 59d213031b..d670954c4e 100644
--- a/src/backend/postmaster/startup.c
+++ b/src/backend/postmaster/startup.c
@@ -192,6 +192,10 @@ HandleStartupProcInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
diff --git a/src/backend/postmaster/walsummarizer.c b/src/backend/postmaster/walsummarizer.c
index ffbf043935..b1a5e86a85 100644
--- a/src/backend/postmaster/walsummarizer.c
+++ b/src/backend/postmaster/walsummarizer.c
@@ -876,6 +876,10 @@ HandleWalSummarizerInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 174eed7036..4a70eabf7f 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -50,6 +50,7 @@
#include "storage/sinvaladt.h"
#include "utils/guc.h"
#include "utils/injection_point.h"
+#include "utils/memutils.h"
/* GUCs */
int shared_memory_type = DEFAULT_SHARED_MEMORY_TYPE;
@@ -340,6 +341,7 @@ CreateOrAttachShmemStructs(void)
StatsShmemInit();
WaitEventCustomShmemInit();
InjectionPointShmemInit();
+ MemCtxShmemInit();
}
/*
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index 7401b6e625..e425b9eeb0 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -688,6 +688,9 @@ procsignal_sigusr1_handler(SIGNAL_ARGS)
if (CheckProcSignal(PROCSIG_LOG_MEMORY_CONTEXT))
HandleLogMemoryContextInterrupt();
+ if (CheckProcSignal(PROCSIG_GET_MEMORY_CONTEXT))
+ HandleGetMemoryContextInterrupt();
+
if (CheckProcSignal(PROCSIG_PARALLEL_APPLY_MESSAGE))
HandleParallelApplyMessageInterrupt();
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 5655348a2e..70587771d3 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3497,6 +3497,9 @@ ProcessInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
if (ParallelApplyMessagePending)
HandleParallelApplyMessages();
}
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index e199f07162..3674b5b7b6 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -159,6 +159,7 @@ WAL_RECEIVER_EXIT "Waiting for the WAL receiver to exit."
WAL_RECEIVER_WAIT_START "Waiting for startup process to send initial data for streaming replication."
WAL_SUMMARY_READY "Waiting for a new WAL summary to be generated."
XACT_GROUP_UPDATE "Waiting for the group leader to update transaction status at transaction end."
+MEM_CTX_PUBLISH "Waiting for backend to publish memory information."
ABI_compatibility:
diff --git a/src/backend/utils/adt/mcxtfuncs.c b/src/backend/utils/adt/mcxtfuncs.c
index 396c2f223b..7c3f9a4f68 100644
--- a/src/backend/utils/adt/mcxtfuncs.c
+++ b/src/backend/utils/adt/mcxtfuncs.c
@@ -17,28 +17,25 @@
#include "funcapi.h"
#include "mb/pg_wchar.h"
+#include "miscadmin.h"
+#include "access/twophase.h"
+#include "catalog/pg_authid_d.h"
+#include "nodes/pg_list.h"
#include "storage/proc.h"
#include "storage/procarray.h"
+#include "utils/acl.h"
#include "utils/array.h"
#include "utils/builtins.h"
#include "utils/hsearch.h"
+#include "utils/memutils.h"
+#include "utils/wait_event_types.h"
/* ----------
* The max bytes for showing identifiers of MemoryContext.
* ----------
*/
-#define MEMORY_CONTEXT_IDENT_DISPLAY_SIZE 1024
-/*
- * MemoryContextId
- * Used for storage of transient identifiers for
- * pg_get_backend_memory_contexts.
- */
-typedef struct MemoryContextId
-{
- MemoryContext context;
- int context_id;
-} MemoryContextId;
+struct MemoryContextState *memCtxState = NULL;
/*
* int_list_to_array
@@ -71,7 +68,7 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
TupleDesc tupdesc, MemoryContext context,
HTAB *context_id_lookup)
{
-#define PG_GET_BACKEND_MEMORY_CONTEXTS_COLS 10
+#define PG_GET_BACKEND_MEMORY_CONTEXTS_COLS 11
Datum values[PG_GET_BACKEND_MEMORY_CONTEXTS_COLS];
bool nulls[PG_GET_BACKEND_MEMORY_CONTEXTS_COLS];
@@ -143,24 +140,7 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
else
nulls[1] = true;
- switch (context->type)
- {
- case T_AllocSetContext:
- type = "AllocSet";
- break;
- case T_GenerationContext:
- type = "Generation";
- break;
- case T_SlabContext:
- type = "Slab";
- break;
- case T_BumpContext:
- type = "Bump";
- break;
- default:
- type = "???";
- break;
- }
+ type = AssignContextType(context->type);
values[2] = CStringGetTextDatum(type);
values[3] = Int32GetDatum(list_length(path)); /* level */
@@ -175,6 +155,32 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
list_free(path);
}
+const char *
+AssignContextType(NodeTag type)
+{
+ const char *context_type;
+
+ switch (type)
+ {
+ case T_AllocSetContext:
+ context_type = "AllocSet";
+ break;
+ case T_GenerationContext:
+ context_type = "Generation";
+ break;
+ case T_SlabContext:
+ context_type = "Slab";
+ break;
+ case T_BumpContext:
+ context_type = "Bump";
+ break;
+ default:
+ context_type = "???";
+ break;
+ }
+ return (context_type);
+}
+
/*
* pg_get_backend_memory_contexts
* SQL SRF showing backend memory context.
@@ -249,7 +255,7 @@ pg_get_backend_memory_contexts(PG_FUNCTION_ARGS)
/*
* pg_log_backend_memory_contexts
- * Signal a backend or an auxiliary process to log its memory contexts.
+ * Signal a backend or an auxiliary process to log its memory contexts.
*
* By default, only superusers are allowed to signal to log the memory
* contexts because allowing any users to issue this request at an unbounded
@@ -305,3 +311,360 @@ pg_log_backend_memory_contexts(PG_FUNCTION_ARGS)
PG_RETURN_BOOL(true);
}
+
+/*
+ * pg_get_process_memory_contexts
+ * Signal a backend or an auxiliary process to send its memory contexts,
+ * wait for the results and display them.
+ *
+ * By default, only superusers are allowed to signal to return the memory
+ * contexts because allowing any users to issue this request at an unbounded
+ * rate would cause lots of requests to be sent and which can lead to denial of
+ * service. Additional roles can be permitted with GRANT.
+ *
+ * On receipt of this signal, a backend or an auxiliary process sets the flag
+ * in the signal handler, which causes the next CHECK_FOR_INTERRUPTS()
+ * or process-specific interrupt handler to copy the memory context details
+ * to a dynamic shared memory space.
+ *
+ * The shared memory buffer has a limited size - if the process has too many
+ * memory contexts, the memory contexts that do not fit are summarized
+ * and represented as cumulative total at the end of the buffer.
+ *
+ * After sending the signal, wait on a condition variable. The publishing
+ * backend, after copying the data to shared memory, sends signal on that
+ * condition variable.
+ * Once condition variable comes out of sleep, check if the memory context
+ * information is available for read and display.
+ *
+ * If the publishing backend does not respond before the condition variable
+ * times out, which is set to MEMSTATS_WAIT_TIMEOUT, retry for MAX_RETRIES
+ * number of times before giving up and returning without statistics.
+ */
+Datum
+pg_get_process_memory_contexts(PG_FUNCTION_ARGS)
+{
+ int pid = PG_GETARG_INT32(0);
+ bool get_summary = PG_GETARG_BOOL(1);
+ PGPROC *proc;
+ ProcNumber procNumber = INVALID_PROC_NUMBER;
+ int i;
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ dsa_area *area;
+ dsa_handle handle;
+ MemoryContextEntry *memctx_info;
+ int num_retries = 0;
+ TimestampTz curr_timestamp;
+
+ /*
+ * Only superusers or users with pg_read_all_stats privileges can view the
+ * memory context statistics of another process
+ */
+ if (!has_privs_of_role(GetUserId(), ROLE_PG_READ_ALL_STATS))
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("memory context statistics privilege error")));
+
+ InitMaterializedSRF(fcinfo, 0);
+
+ /*
+ * See if the process with given pid is a backend or an auxiliary process.
+ */
+ proc = BackendPidGetProc(pid);
+ if (proc == NULL)
+ proc = AuxiliaryPidGetProc(pid);
+
+ /*
+ * BackendPidGetProc() and AuxiliaryPidGetProc() return NULL if the pid
+ * isn't valid; but by the time we reach kill(), a process for which we
+ * get a valid proc here might have terminated on its own. There's no way
+ * to acquire a lock on an arbitrary process to prevent that. But since
+ * this mechanism is usually used to debug a backend or an auxiliary
+ * process running and consuming lots of memory, that it might end on its
+ * own first and its memory contexts are not logged is not a problem.
+ */
+ if (proc == NULL)
+ {
+ /*
+ * This is just a warning so a loop-through-resultset will not abort
+ * if one backend terminated on its own during the run.
+ */
+ ereport(WARNING,
+ (errmsg("PID %d is not a PostgreSQL server process",
+ pid)));
+ PG_RETURN_NULL();
+ }
+
+ procNumber = GetNumberFromPGProc(proc);
+ if (procNumber == MyProcNumber)
+ {
+ ereport(WARNING,
+ (errmsg("cannot return statistics for local backend"),
+ errhint("Use pg_backend_memory_contexts view instead")));
+ PG_RETURN_NULL();
+ }
+
+ /*
+ * Check whether another request is still pending for this process, as
+ * that may cause the current request to time out.
+ */
+ LWLockAcquire(&memCtxState[procNumber].lw_lock, LW_EXCLUSIVE);
+ if (memCtxState[procNumber].request_pending == true)
+ {
+ LWLockRelease(&memCtxState[procNumber].lw_lock);
+ ereport(WARNING,
+ (errmsg("cannot process the request at the moment"),
+ errhint("Another request is pending, try again")));
+ PG_RETURN_NULL();
+ }
+ memCtxState[procNumber].request_pending = true;
+ memCtxState[procNumber].get_summary = get_summary;
+
+ /*
+ * Create a DSA segment with maximum size of 16MB, send handle to the
+ * publishing process for storing the stats. If number of contexts exceed
+ * 16MB, a cumulative total is stored for such contexts.
+ */
+ if (memCtxState[procNumber].memstats_dsa_handle == DSA_HANDLE_INVALID)
+ {
+
+ LWLockRelease(&memCtxState[procNumber].lw_lock);
+ area = dsa_create_ext(memCtxState[procNumber].lw_lock.tranche,
+ DSA_DEFAULT_INIT_SEGMENT_SIZE,
+ MAX_NUM_DEFAULT_SEGMENTS *
+ DSA_DEFAULT_INIT_SEGMENT_SIZE);
+ handle = dsa_get_handle(area);
+
+ /*
+ * Pin the dsa area even if the creating backend exits, this is to
+ * make sure the area remains attachable even if current client exits
+ */
+ dsa_pin(area);
+ /* Set the handle in shared memory */
+ LWLockAcquire(&memCtxState[procNumber].lw_lock, LW_EXCLUSIVE);
+ memCtxState[procNumber].memstats_dsa_handle = handle;
+ LWLockRelease(&memCtxState[procNumber].lw_lock);
+ }
+ else
+ {
+ LWLockRelease(&memCtxState[procNumber].lw_lock);
+ area = dsa_attach(memCtxState[procNumber].memstats_dsa_handle);
+ }
+
+ curr_timestamp = GetCurrentTimestamp();
+
+ /*
+ * Send a signal to the auxiliary process, informing it we want it to
+ * produce information about memory contexts.
+ */
+ if (SendProcSignal(pid, PROCSIG_GET_MEMORY_CONTEXT, procNumber) < 0)
+ {
+ ereport(WARNING,
+ (errmsg("could not send signal to process %d: %m", pid)));
+
+ goto end;
+ }
+
+ /*
+ * Wait for a backend to publish stats, indicated by a valid dsa pointer
+ * set by the backend. A dsa pointer could be valid if statistics have
+ * previously been published by the backend. In which case, check if
+ * statistics are not older than curr_timestamp, if they are wait for
+ * newer statistics.
+ */
+ while (1)
+ {
+ long msecs;
+
+ /*
+ * We expect to come out of sleep when the requested process has
+ * finished publishing the statistics, verified using the valid dsa
+ * pointer.
+ *
+ * Make sure that the information belongs to pid we requested
+ * information for, Otherwise loop back and wait for the server
+ * process to finish publishing statistics.
+ */
+ LWLockAcquire(&memCtxState[procNumber].lw_lock, LW_EXCLUSIVE);
+ msecs =
+ TimestampDifferenceMilliseconds(curr_timestamp,
+ memCtxState[procNumber].stats_timestamp);
+
+ /*
+ * Note in procnumber.h file says that a procNumber can be re-used for
+ * a different backend immediately after a backend exits. In case an
+ * old process' data was there and not updated by the current process
+ * in the slot identified by the procNumber, the pid of the requested
+ * process and the proc_id might not match.
+ */
+ if (memCtxState[procNumber].proc_id == pid)
+ {
+ /*
+ * Break if the latest stats have been read, indicated by
+ * statistics timestamp being newer than the current request
+ * timestamp.
+ */
+ if (DsaPointerIsValid(memCtxState[procNumber].memstats_dsa_pointer)
+ && msecs > 0)
+ break;
+
+ }
+ LWLockRelease(&memCtxState[procNumber].lw_lock);
+
+ /*
+ * Recheck the state of the backend before sleeping on the condition
+ * variable
+ */
+ proc = BackendPidGetProc(pid);
+
+#define MEMSTATS_WAIT_TIMEOUT 5000
+#define MAX_RETRIES 20
+ if (proc == NULL)
+ proc = AuxiliaryPidGetProc(pid);
+ if (proc == NULL)
+ {
+ ereport(WARNING,
+ (errmsg("PID %d is not a PostgreSQL server process",
+ pid)));
+ goto end;
+ }
+ if (ConditionVariableTimedSleep(&memCtxState[procNumber].memctx_cv,
+ MEMSTATS_WAIT_TIMEOUT,
+ WAIT_EVENT_MEM_CTX_PUBLISH))
+ {
+ ereport(LOG,
+ (errmsg("Wait for %d process to publish stats timed out, trying again",
+ pid)));
+ if (num_retries > MAX_RETRIES)
+ goto end;
+ num_retries = num_retries + 1;
+ }
+
+ }
+
+ /* We should land here only with a valid memstats_dsa_pointer */
+ Assert(DsaPointerIsValid(memCtxState[procNumber].memstats_dsa_pointer));
+ memctx_info = (MemoryContextEntry *) dsa_get_address(area,
+ memCtxState[procNumber].memstats_dsa_pointer);
+
+ /*
+ * Backend has finished publishing the stats, read them
+ *
+ * Read statistics of top level 1 and 2 contexts, if get_summary is true.
+ */
+ for (i = 0; i < memCtxState[procNumber].num_individual_stats; i++)
+ {
+ ArrayType *path_array;
+ int path_length;
+ Datum values[PG_GET_BACKEND_MEMORY_CONTEXTS_COLS];
+ bool nulls[PG_GET_BACKEND_MEMORY_CONTEXTS_COLS];
+
+ memset(values, 0, sizeof(values));
+ memset(nulls, 0, sizeof(nulls));
+
+ if (strlen(memctx_info[i].name) != 0)
+ values[0] = CStringGetTextDatum(memctx_info[i].name);
+ else
+ nulls[0] = true;
+ if (strlen(memctx_info[i].ident) != 0)
+ values[1] = CStringGetTextDatum(memctx_info[i].ident);
+ else
+ nulls[1] = true;
+
+ values[2] = CStringGetTextDatum(memctx_info[i].type);
+
+ path_length = memctx_info[i].path_length;
+ path_array = construct_array_builtin(memctx_info[i].path,
+ path_length, INT4OID);
+ values[3] = PointerGetDatum(path_array);
+
+ values[4] = Int64GetDatum(memctx_info[i].totalspace);
+ values[5] = Int64GetDatum(memctx_info[i].nblocks);
+ values[6] = Int64GetDatum(memctx_info[i].freespace);
+ values[7] = Int64GetDatum(memctx_info[i].freechunks);
+ values[8] = Int64GetDatum(memctx_info[i].totalspace -
+ memctx_info[i].freespace);
+ values[9] = Int32GetDatum(memctx_info[i].num_agg_stats);
+ values[10] = TimestampTzGetDatum(memCtxState[procNumber].stats_timestamp);
+
+ tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc,
+ values, nulls);
+ }
+
+ /* If there are more contexts, display a cumulative total of those */
+ if (memCtxState[procNumber].total_stats > i)
+ {
+ Datum values[PG_GET_BACKEND_MEMORY_CONTEXTS_COLS];
+ bool nulls[PG_GET_BACKEND_MEMORY_CONTEXTS_COLS];
+
+ values[0] = CStringGetTextDatum(memctx_info[i].name);
+ nulls[1] = true;
+ nulls[2] = true;
+ nulls[3] = true;
+ values[4] = Int64GetDatum(memctx_info[i].totalspace);
+ values[5] = Int64GetDatum(memctx_info[i].nblocks);
+ values[6] = Int64GetDatum(memctx_info[i].freespace);
+ values[7] = Int64GetDatum(memctx_info[i].freechunks);
+ values[8] = Int64GetDatum(memctx_info[i].totalspace - memctx_info[i].freespace);
+ values[9] = Int32GetDatum(memctx_info[i].num_agg_stats);
+
+ tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc, values, nulls);
+ }
+
+ LWLockRelease(&memCtxState[procNumber].lw_lock);
+
+ ConditionVariableCancelSleep();
+
+end:
+ dsa_detach(area);
+ PG_RETURN_NULL();
+}
+
+/*
+ * Shared memory sizing for reporting memory context information.
+ */
+static Size
+MemCtxShmemSize(void)
+{
+ Size TotalProcs =
+ add_size(MaxBackends, add_size(NUM_AUXILIARY_PROCS, max_prepared_xacts));
+
+ return mul_size(TotalProcs, sizeof(MemoryContextState));
+}
+
+/*
+ * Init shared memory for reporting memory context information.
+ */
+void
+MemCtxShmemInit(void)
+{
+ bool found;
+ Size TotalProcs =
+ add_size(MaxBackends, add_size(NUM_AUXILIARY_PROCS, max_prepared_xacts));
+
+ memCtxState = (MemoryContextState *) ShmemInitStruct("MemoryContextState",
+ MemCtxShmemSize(),
+ &found);
+ if (!IsUnderPostmaster)
+ {
+ Assert(!found);
+
+ for (int i = 0; i < TotalProcs; i++)
+ {
+ ConditionVariableInit(&memCtxState[i].memctx_cv);
+
+ LWLockInitialize(&memCtxState[i].lw_lock,
+ LWLockNewTrancheId());
+ LWLockRegisterTranche(memCtxState[i].lw_lock.tranche,
+ "mem_context_stats_reporting");
+
+ memCtxState[i].memstats_dsa_handle = DSA_HANDLE_INVALID;
+ memCtxState[i].memstats_dsa_pointer = InvalidDsaPointer;
+ memCtxState[i].request_pending = false;
+ }
+ }
+ else
+ {
+ Assert(found);
+ }
+}
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index b844f9fdae..13938ccb0f 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -38,6 +38,7 @@ volatile sig_atomic_t TransactionTimeoutPending = false;
volatile sig_atomic_t IdleSessionTimeoutPending = false;
volatile sig_atomic_t ProcSignalBarrierPending = false;
volatile sig_atomic_t LogMemoryContextPending = false;
+volatile sig_atomic_t PublishMemoryContextPending = false;
volatile sig_atomic_t IdleStatsUpdateTimeoutPending = false;
volatile uint32 InterruptHoldoffCount = 0;
volatile uint32 QueryCancelHoldoffCount = 0;
diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index 946a3731fd..aa030bddaf 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -19,16 +19,22 @@
*-------------------------------------------------------------------------
*/
+#include <math.h>
#include "postgres.h"
#include "mb/pg_wchar.h"
#include "miscadmin.h"
+#include "nodes/pg_list.h"
+#include "storage/fd.h"
+#include "storage/lwlock.h"
+#include "storage/dsm.h"
+#include "utils/dsa.h"
+#include "utils/hsearch.h"
#include "utils/memdebug.h"
#include "utils/memutils.h"
#include "utils/memutils_internal.h"
#include "utils/memutils_memorychunk.h"
-
static void BogusFree(void *pointer);
static void *BogusRealloc(void *pointer, Size size, int flags);
static MemoryContext BogusGetChunkContext(void *pointer);
@@ -164,6 +170,7 @@ MemoryContext CacheMemoryContext = NULL;
MemoryContext MessageContext = NULL;
MemoryContext TopTransactionContext = NULL;
MemoryContext CurTransactionContext = NULL;
+static dsa_area *area = NULL;
/* This is a transient link to the active portal's memory context: */
MemoryContext PortalContext = NULL;
@@ -177,6 +184,16 @@ static void MemoryContextStatsInternal(MemoryContext context, int level,
static void MemoryContextStatsPrint(MemoryContext context, void *passthru,
const char *stats_string,
bool print_to_stderr);
+static void PublishMemoryContext(MemoryContextEntry *memctx_infos,
+ int curr_id, MemoryContext context,
+ List *path,
+ MemoryContextCounters stat,
+ int num_contexts);
+static void compute_num_of_contexts(List *contexts, HTAB *context_id_lookup,
+ int *stats_count,
+ bool get_summary);
+static List *compute_context_path(MemoryContext c, HTAB *context_id_lookup);
+
/*
* You should not do memory allocations within a critical section, because
@@ -1321,6 +1338,21 @@ HandleLogMemoryContextInterrupt(void)
/* latch will be set by procsignal_sigusr1_handler */
}
+/*
+ * HandleGetMemoryContextInterrupt
+ * Handle receipt of an interrupt indicating publishing of memory
+ * contexts.
+ *
+ * All the actual work is deferred to ProcessGetMemoryContextInterrupt()
+ */
+void
+HandleGetMemoryContextInterrupt(void)
+{
+ InterruptPending = true;
+ PublishMemoryContextPending = true;
+ /* latch will be set by procsignal_sigusr1_handler */
+}
+
/*
* ProcessLogMemoryContextInterrupt
* Perform logging of memory contexts of this backend process.
@@ -1358,6 +1390,355 @@ ProcessLogMemoryContextInterrupt(void)
MemoryContextStatsDetail(TopMemoryContext, 100, 100, false);
}
+/*
+ * ProcessGetMemoryContextInterrupt
+ * Generate information about memory contexts used by the process.
+ *
+ * Performs a breadth first search on the memory context tree, so that the
+ * parents get a chance to report stats before their children.
+ *
+ * Statistics for individual contexts are shared via dynamic shared memory.
+ * The statistics for contexts that do not fit in the allocated size of the DSA,
+ * are captured as a cumulative total.
+ *
+ * If get_summary is true, we traverse the memory context tree recursively to
+ * cover all the children of a parent context to be able to display a cumulative
+ * total of memory consumption by a parent.
+ */
+void
+ProcessGetMemoryContextInterrupt(void)
+{
+ List *contexts;
+
+ HASHCTL ctl;
+ HTAB *context_id_lookup;
+ int context_id = 0;
+ MemoryContext stat_cxt;
+ MemoryContextEntry *meminfo;
+ bool get_summary = false;
+
+ /* dsa_area *area = NULL; */
+ int max_stats;
+ int idx = MyProcNumber;
+ int stats_count = 0;
+ MemoryContextCounters stat;
+
+ check_stack_depth();
+ PublishMemoryContextPending = false;
+ LWLockAcquire(&memCtxState[idx].lw_lock, LW_EXCLUSIVE);
+ memCtxState[idx].request_pending = false;
+ LWLockRelease(&memCtxState[idx].lw_lock);
+
+ /*
+ * Make a new context that will contain the hash table, to ease the
+ * cleanup.
+ */
+ stat_cxt = AllocSetContextCreate(CurrentMemoryContext,
+ "Memory context statistics",
+ ALLOCSET_DEFAULT_SIZES);
+
+ /*
+ * The hash table used for constructing "path" column of the view, similar
+ * to its local backend counterpart.
+ */
+ ctl.keysize = sizeof(MemoryContext);
+ ctl.entrysize = sizeof(MemoryContextId);
+ ctl.hcxt = stat_cxt;
+
+ context_id_lookup = hash_create("pg_get_remote_backend_memory_contexts",
+ 256,
+ &ctl,
+ HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+
+ /* List of contexts to process in the next round - start at the top. */
+ contexts = list_make1(TopMemoryContext);
+
+ /* Compute the number of stats that can fit in the DSM seg */
+ max_stats = (MAX_NUM_DEFAULT_SEGMENTS * DSA_DEFAULT_INIT_SEGMENT_SIZE)
+ / sizeof(MemoryContextEntry);
+
+ LWLockAcquire(&memCtxState[idx].lw_lock, LW_EXCLUSIVE);
+ get_summary = memCtxState[idx].get_summary;
+ LWLockRelease(&memCtxState[idx].lw_lock);
+
+ /*
+ * Traverse the memory context tree to find total number of contexts. If
+ * summary is requested report the total number of contexts at level 1 and
+ * 2 from the top.
+ */
+ compute_num_of_contexts(contexts, context_id_lookup, &stats_count,
+ get_summary);
+
+ /*
+ * Allocate memory in this process's dsa for storing statistics of the the
+ * memory contexts up to max_stats; for contexts that don't fit in the DSA
+ * segment, a cumulative total is written as the last record in the DSA
+ * segment.
+ */
+ stats_count = (stats_count > max_stats) ? max_stats : stats_count;
+
+ /* Attach to DSA segment */
+ LWLockAcquire(&memCtxState[idx].lw_lock, LW_EXCLUSIVE);
+ if (area == NULL)
+ {
+ MemoryContext oldcontext = CurrentMemoryContext;
+
+ MemoryContextSwitchTo(TopMemoryContext);
+ area = dsa_attach(memCtxState[idx].memstats_dsa_handle);
+ dsa_pin_mapping(area);
+ MemoryContextSwitchTo(oldcontext);
+ }
+ memCtxState[idx].proc_id = MyProcPid;
+
+ /* Free the memory allocated previously by the same process. */
+ if (DsaPointerIsValid(memCtxState[idx].memstats_dsa_pointer))
+ {
+ dsa_free(area, memCtxState[idx].memstats_dsa_pointer);
+ memCtxState[idx].memstats_dsa_pointer = InvalidDsaPointer;
+ }
+ memCtxState[idx].memstats_dsa_pointer = dsa_allocate0(area,
+ stats_count * sizeof(MemoryContextEntry));
+ meminfo = (MemoryContextEntry *) dsa_get_address(area,
+ memCtxState[idx].memstats_dsa_pointer);
+
+ if (get_summary)
+ {
+ int ctx_id = 0;
+ List *path = NIL;
+
+ /* Copy TopMemoryContext statistics to DSA */
+ memset(&stat, 0, sizeof(stat));
+ (*TopMemoryContext->methods->stats) (TopMemoryContext, NULL, NULL,
+ &stat, true);
+ path = lcons_int(1, path);
+ PublishMemoryContext(meminfo, ctx_id, TopMemoryContext, path, stat, 1);
+ ctx_id = ctx_id + 1;
+
+ /*
+ * Copy statistics for each of TopMemoryContext's children (XXX: make it
+ * capped at 100). This includes statistics of all of their children
+ * up to level 100.
+ */
+ for (MemoryContext c = TopMemoryContext->firstchild; c != NULL;
+ c = c->nextchild)
+ {
+ MemoryContextCounters grand_totals;
+ int num_contexts = 0;
+
+ path = NIL;
+ memset(&grand_totals, 0, sizeof(grand_totals));
+
+ MemoryContextStatsInternal(c, 0, 100, 100, &grand_totals,
+ PRINT_STATS_NONE, &num_contexts);
+
+ path = compute_context_path(c, context_id_lookup);
+
+ PublishMemoryContext(meminfo, ctx_id, c, path,
+ grand_totals, num_contexts);
+ ctx_id = ctx_id + 1;
+ }
+ /* For summary mode, total_stats and num_individual_stats remain the same */
+ memCtxState[idx].num_individual_stats = ctx_id;
+ memCtxState[idx].total_stats = ctx_id;
+ goto cleanup;
+ }
+ foreach_ptr(MemoryContextData, cur, contexts)
+ {
+ List *path = NIL;
+
+ /*
+ * Figure out the transient context_id of this context and each of its
+ * ancestors.
+ */
+ path = compute_context_path(cur, context_id_lookup);
+
+ if (context_id <= (max_stats - 2))
+ {
+ /* Examine the context stats */
+ memset(&stat, 0, sizeof(stat));
+ (*cur->methods->stats) (cur, NULL, NULL, &stat, true);
+ /* Copy statistics to DSA memory */
+ PublishMemoryContext(meminfo, context_id, cur, path, stat, 1);
+ }
+ else
+ {
+ /* Examine the context stats */
+ memset(&stat, 0, sizeof(stat));
+ (*cur->methods->stats) (cur, NULL, NULL, &stat, true);
+
+ meminfo[max_stats - 1].totalspace += stat.totalspace;
+ meminfo[max_stats - 1].nblocks += stat.nblocks;
+ meminfo[max_stats - 1].freespace += stat.freespace;
+ meminfo[max_stats - 1].freechunks += stat.freechunks;
+ }
+
+ /*
+ * DSA max limit is reached, write aggregate of the remaining
+ * statistics.
+ */
+ if (context_id == (max_stats - 2) && context_id < (stats_count - 1))
+ {
+ memCtxState[idx].num_individual_stats = context_id + 1;
+ strncpy(meminfo[max_stats - 1].name, "Remaining Totals", 16);
+ }
+ context_id++;
+ }
+ /* No aggregated contexts, individual statistics reported */
+ if (context_id < (max_stats - 2))
+ {
+ memCtxState[idx].num_individual_stats = context_id;
+ }
+ /* Report number of aggregated memory contexts */
+ else
+ {
+ meminfo[max_stats - 1].num_agg_stats = context_id -
+ memCtxState[idx].num_individual_stats;
+ }
+ memCtxState[idx].total_stats = context_id;
+cleanup:
+
+ /*
+ * Signal all the waiting client backends after setting the exit condition
+ * flag
+ */
+ memCtxState[idx].stats_timestamp = GetCurrentTimestamp();
+ LWLockRelease(&memCtxState[idx].lw_lock);
+ ConditionVariableBroadcast(&memCtxState[idx].memctx_cv);
+ /* Delete the hash table memory context */
+ MemoryContextDelete(stat_cxt);
+
+/* dsa_detach(area); */
+}
+
+/*
+ * Append the transient context_id of this context and each of
+ * its ancestors to a list, in order to compute a path.
+ */
+static List *
+compute_context_path(MemoryContext c, HTAB *context_id_lookup)
+{
+ bool found;
+ List *path = NIL;
+
+ for (MemoryContext cur_context = c; cur_context != NULL; cur_context = cur_context->parent)
+ {
+ MemoryContextId *cur_entry;
+
+ cur_entry = hash_search(context_id_lookup, &cur_context, HASH_FIND, &found);
+
+ if (!found)
+ {
+ elog(LOG, "hash table corrupted, can't construct path value");
+ break;
+ }
+ path = lcons_int(cur_entry->context_id, path);
+ }
+ return path;
+}
+
+/* Count the contexts currently allocated by the backend into *stats_count */
+static void
+compute_num_of_contexts(List *contexts, HTAB *context_id_lookup,
+ int *stats_count, bool get_summary)
+{
+ foreach_ptr(MemoryContextData, cur, contexts)
+ {
+ MemoryContextId *entry;
+ bool found;
+
+ entry = (MemoryContextId *) hash_search(context_id_lookup, &cur,
+ HASH_ENTER, &found);
+ Assert(!found);
+
+ /* context id starts with 1 */
+ entry->context_id = (++(*stats_count));
+
+ /* Append the children of the current context to the main list. */
+ for (MemoryContext c = cur->firstchild; c != NULL; c = c->nextchild)
+ {
+ if (get_summary)
+ {
+ entry = (MemoryContextId *) hash_search(context_id_lookup, &c,
+ HASH_ENTER, &found);
+ Assert(!found);
+
+ entry->context_id = (++(*stats_count));
+ }
+
+ contexts = lappend(contexts, c);
+ }
+
+ /*
+ * In summary mode, only the contexts in the first two levels (from
+ * the top) are displayed
+ */
+ if (get_summary)
+ break;
+ }
+
+}
+
+/* Copy the memory context statistics of a single context to a dsa buffer */
+static void
+PublishMemoryContext(MemoryContextEntry *memctx_info, int curr_id,
+ MemoryContext context, List *path,
+ MemoryContextCounters stat, int num_contexts)
+{
+ char clipped_ident[MEMORY_CONTEXT_IDENT_DISPLAY_SIZE];
+
+ if (context->name != NULL)
+ {
+ Assert(strlen(context->name) < MEMORY_CONTEXT_IDENT_DISPLAY_SIZE);
+ strncpy(memctx_info[curr_id].name, context->name, strlen(context->name));
+ }
+ else
+ memctx_info[curr_id].name[0] = '\0';
+
+ /* Trim and copy the identifier if it is not set to NULL */
+ if (context->ident != NULL)
+ {
+ int idlen = strlen(context->ident);
+
+ /*
+ * Some identifiers such as SQL query string can be very long,
+ * truncate oversize identifiers.
+ */
+ if (idlen >= MEMORY_CONTEXT_IDENT_DISPLAY_SIZE)
+ idlen = pg_mbcliplen(context->ident, idlen,
+ MEMORY_CONTEXT_IDENT_DISPLAY_SIZE - 1);
+
+ memcpy(clipped_ident, context->ident, idlen);
+ clipped_ident[idlen] = '\0';
+
+ /*
+ * To be consistent with logging output, we label dynahash contexts
+ * with just the hash table name as with MemoryContextStatsPrint().
+ */
+ if (!strncmp(context->name, "dynahash", 8))
+ {
+ strncpy(memctx_info[curr_id].name,
+ clipped_ident, strlen(clipped_ident));
+ memctx_info[curr_id].ident[0] = '\0';
+ }
+ else
+ strncpy(memctx_info[curr_id].ident,
+ clipped_ident, strlen(clipped_ident));
+ }
+ else
+ memctx_info[curr_id].ident[0] = '\0';
+
+ memctx_info[curr_id].path_length = list_length(path);
+ foreach_int(i, path)
+ memctx_info[curr_id].path[foreach_current_index(i)] = Int32GetDatum(i);
+
+ memctx_info[curr_id].type = AssignContextType(context->type);
+ memctx_info[curr_id].totalspace = stat.totalspace;
+ memctx_info[curr_id].nblocks = stat.nblocks;
+ memctx_info[curr_id].freespace = stat.freespace;
+ memctx_info[curr_id].freechunks = stat.freechunks;
+ memctx_info[curr_id].num_agg_stats = num_contexts;
+}
+
void *
palloc(Size size)
{
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 5b8c2ad2a5..464eb7258d 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8474,6 +8474,16 @@
prorettype => 'bool', proargtypes => 'int4',
prosrc => 'pg_log_backend_memory_contexts' },
+# publishing memory contexts of the specified postgres process
+{ oid => '2173', descr => 'publish memory contexts of the specified backend',
+ proname => 'pg_get_process_memory_contexts', provolatile => 'v',
+ prorows => '100', proretset => 't', proparallel => 'r',
+ prorettype => 'record', proargtypes => 'int4 bool',
+ proallargtypes => '{int4,bool,text,text,text,_int4,int8,int8,int8,int8,int8,int4,timestamptz}',
+ proargmodes => '{i,i,o,o,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{oid, summary, name, ident, type, path, total_bytes, total_nblocks, free_bytes, free_chunks, used_bytes, num_agg_contexts, stats_timestamp}',
+ prosrc => 'pg_get_process_memory_contexts' },
+
# non-persistent series generator
{ oid => '1066', descr => 'non-persistent series generator',
proname => 'generate_series', prorows => '1000',
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index a2b63495ee..3dc3dcfb6c 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -96,6 +96,7 @@ extern PGDLLIMPORT volatile sig_atomic_t IdleSessionTimeoutPending;
extern PGDLLIMPORT volatile sig_atomic_t ProcSignalBarrierPending;
extern PGDLLIMPORT volatile sig_atomic_t LogMemoryContextPending;
extern PGDLLIMPORT volatile sig_atomic_t IdleStatsUpdateTimeoutPending;
+extern PGDLLIMPORT volatile sig_atomic_t PublishMemoryContextPending;
extern PGDLLIMPORT volatile sig_atomic_t CheckClientConnectionPending;
extern PGDLLIMPORT volatile sig_atomic_t ClientConnectionLost;
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index 022fd8ed93..477ab99338 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -35,6 +35,7 @@ typedef enum
PROCSIG_WALSND_INIT_STOPPING, /* ask walsenders to prepare for shutdown */
PROCSIG_BARRIER, /* global barrier interrupt */
PROCSIG_LOG_MEMORY_CONTEXT, /* ask backend to log the memory contexts */
+ PROCSIG_GET_MEMORY_CONTEXT, /* ask backend to publish the memory contexts */
PROCSIG_PARALLEL_APPLY_MESSAGE, /* Message from parallel apply workers */
/* Recovery conflict reasons */
diff --git a/src/include/utils/memutils.h b/src/include/utils/memutils.h
index 8abc26abce..efa40a14af 100644
--- a/src/include/utils/memutils.h
+++ b/src/include/utils/memutils.h
@@ -18,6 +18,9 @@
#define MEMUTILS_H
#include "nodes/memnodes.h"
+#include "storage/condition_variable.h"
+#include "storage/lmgr.h"
+#include "utils/dsa.h"
/*
@@ -48,7 +51,12 @@
#define AllocHugeSizeIsValid(size) ((Size) (size) <= MaxAllocHugeSize)
+#define MEMORY_CONTEXT_IDENT_DISPLAY_SIZE 1024
+#define MEM_CONTEXT_SHMEM_STATS_SIZE 30
+#define MEM_CONTEXT_MAX_LEVEL 64
+#define MAX_TYPE_STRING_LENGTH 64
+#define MAX_NUM_DEFAULT_SEGMENTS 8
/*
* Standard top-level memory contexts.
*
@@ -319,4 +327,55 @@ pg_memory_is_all_zeros(const void *ptr, size_t len)
return true;
}
+/* Dynamic shared memory state for memory context statistics reporting */
+typedef struct MemoryContextEntry
+{
+ /*
+ * XXX isn't 2 x 1kB for every context a bit too much? Maybe better to
+ * make it variable-length?
+ */
+ char name[MEMORY_CONTEXT_IDENT_DISPLAY_SIZE];
+ char ident[MEMORY_CONTEXT_IDENT_DISPLAY_SIZE];
+ Datum path[MEM_CONTEXT_MAX_LEVEL];
+ const char *type;
+ int path_length;
+ int64 totalspace;
+ int64 nblocks;
+ int64 freespace;
+ int64 freechunks;
+ int num_agg_stats;
+} MemoryContextEntry;
+
+/* Shared memory state for memory context statistics reporting */
+typedef struct MemoryContextState
+{
+ ConditionVariable memctx_cv;
+ LWLock lw_lock;
+ int proc_id;
+ int num_individual_stats;
+ int total_stats;
+ bool get_summary;
+ dsa_handle memstats_dsa_handle;
+ dsa_pointer memstats_dsa_pointer;
+ TimestampTz stats_timestamp;
+ bool request_pending;
+} MemoryContextState;
+
+/*
+ * MemoryContextId
+ * Used for storage of transient identifiers for
+ * pg_get_backend_memory_contexts.
+ */
+typedef struct MemoryContextId
+{
+ MemoryContext context;
+ int context_id;
+} MemoryContextId;
+
+extern PGDLLIMPORT MemoryContextState *memCtxState;
+extern void ProcessGetMemoryContextInterrupt(void);
+extern const char *AssignContextType(NodeTag type);
+extern void HandleGetMemoryContextInterrupt(void);
+extern void MemCtxShmemInit(void);
+
#endif /* MEMUTILS_H */
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 352abc0bd4..831e0dead1 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -231,3 +231,17 @@ select * from pg_timezone_abbrevs where abbrev = 'LMT';
LMT | @ 7 hours 52 mins 58 secs ago | f
(1 row)
+DO $$
+DECLARE
+ checkpointer_pid int;
+ r RECORD;
+BEGIN
+ SELECT pid from pg_stat_activity where backend_type='checkpointer'
+ INTO checkpointer_pid;
+
+ select type, name, ident
+ from pg_get_process_memory_contexts(checkpointer_pid, false)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+END $$;
+NOTICE: (AllocSet,TopMemoryContext,)
diff --git a/src/test/regress/sql/sysviews.sql b/src/test/regress/sql/sysviews.sql
index 66179f026b..0a4cc3bf4d 100644
--- a/src/test/regress/sql/sysviews.sql
+++ b/src/test/regress/sql/sysviews.sql
@@ -101,3 +101,17 @@ select count(distinct utc_offset) >= 24 as ok from pg_timezone_abbrevs;
-- One specific case we can check without much fear of breakage
-- is the historical local-mean-time value used for America/Los_Angeles.
select * from pg_timezone_abbrevs where abbrev = 'LMT';
+
+DO $$
+DECLARE
+ checkpointer_pid int;
+ r RECORD;
+BEGIN
+ SELECT pid from pg_stat_activity where backend_type='checkpointer'
+ INTO checkpointer_pid;
+
+ select type, name, ident
+ from pg_get_process_memory_contexts(checkpointer_pid, false)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+END $$;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 9a3bee93de..69089e03e5 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1632,8 +1632,10 @@ MemoryContextCallback
MemoryContextCallbackFunction
MemoryContextCounters
MemoryContextData
+MemoryContextEntry
MemoryContextMethodID
MemoryContextMethods
+MemoryContextState
MemoryStatsPrintFunc
MergeAction
MergeActionState
--
2.34.1
v12-0001-Preparatory-changes-for-reporting-memory-context-sta.patch (application/octet-stream)
From a1d0e1fe39cc59983396555f04cc879ff460340d Mon Sep 17 00:00:00 2001
From: Rahila Syed <rahilasyed.90@gmail.com>
Date: Mon, 3 Feb 2025 15:33:19 +0530
Subject: [PATCH 1/2] Preparatory changes for reporting memory context
statistics
Ensure that MemoryContextStatsInternal can return number of
contexts. Also, provide an option in MemoryContextStatsInternal
to return without printing stats to either stderr or logs.
---
src/backend/utils/mmgr/mcxt.c | 65 +++++++++++++++++++++++++++++------
1 file changed, 55 insertions(+), 10 deletions(-)
diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index aa6da0d035..946a3731fd 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -135,6 +135,17 @@ static const MemoryContextMethods mcxt_methods[] = {
};
#undef BOGUS_MCTX
+/*
+ * This is passed to MemoryContextStatsInternal to determine whether
+ * to print context statistics and, if so, whether to send them to the
+ * logs or to stderr.
+ */
+typedef enum PrintDestination
+{
+ PRINT_STATS_TO_STDERR = 0,
+ PRINT_STATS_TO_LOGS,
+ PRINT_STATS_NONE
+} PrintDestination;
/*
* CurrentMemoryContext
@@ -162,7 +173,7 @@ static void MemoryContextCallResetCallbacks(MemoryContext context);
static void MemoryContextStatsInternal(MemoryContext context, int level,
int max_level, int max_children,
MemoryContextCounters *totals,
- bool print_to_stderr);
+ PrintDestination print_location, int *num_contexts);
static void MemoryContextStatsPrint(MemoryContext context, void *passthru,
const char *stats_string,
bool print_to_stderr);
@@ -831,11 +842,19 @@ MemoryContextStatsDetail(MemoryContext context,
bool print_to_stderr)
{
MemoryContextCounters grand_totals;
+ int num_contexts = 0;
+ PrintDestination print_location;
memset(&grand_totals, 0, sizeof(grand_totals));
+ if (print_to_stderr)
+ print_location = PRINT_STATS_TO_STDERR;
+ else
+ print_location = PRINT_STATS_TO_LOGS;
+
+ /* num_contexts reports the number of contexts aggregated in the output */
MemoryContextStatsInternal(context, 0, max_level, max_children,
- &grand_totals, print_to_stderr);
+ &grand_totals, print_location, &num_contexts);
if (print_to_stderr)
fprintf(stderr,
@@ -876,18 +895,43 @@ static void
MemoryContextStatsInternal(MemoryContext context, int level,
int max_level, int max_children,
MemoryContextCounters *totals,
- bool print_to_stderr)
+ PrintDestination print_location, int *num_contexts)
{
MemoryContext child;
int ichild;
+ bool print_to_stderr = true;
+ check_stack_depth();
Assert(MemoryContextIsValid(context));
- /* Examine the context itself */
- context->methods->stats(context,
- MemoryContextStatsPrint,
- &level,
- totals, print_to_stderr);
+ if (print_location == PRINT_STATS_TO_STDERR)
+ print_to_stderr = true;
+ else if (print_location == PRINT_STATS_TO_LOGS)
+ print_to_stderr = false;
+
+ if (print_location != PRINT_STATS_NONE)
+ {
+ /* Examine the context itself */
+ context->methods->stats(context,
+ MemoryContextStatsPrint,
+ &level,
+ totals, print_to_stderr);
+ }
+
+ /*
+ * Do not print the statistics if print_location is PRINT_STATS_NONE,
+ * only compute totals.
+ */
+ else
+ {
+ /* Examine the context itself */
+ context->methods->stats(context,
+ NULL,
+ NULL,
+ totals, print_to_stderr);
+ }
+ /* Increment the context count */
+ *num_contexts = *num_contexts + 1;
/*
* Examine children.
@@ -907,7 +951,7 @@ MemoryContextStatsInternal(MemoryContext context, int level,
MemoryContextStatsInternal(child, level + 1,
max_level, max_children,
totals,
- print_to_stderr);
+ print_location, num_contexts);
}
}
@@ -925,6 +969,7 @@ MemoryContextStatsInternal(MemoryContext context, int level,
ichild++;
child = MemoryContextTraverseNext(child, context);
}
+ *num_contexts = *num_contexts + ichild;
if (print_to_stderr)
{
@@ -939,7 +984,7 @@ MemoryContextStatsInternal(MemoryContext context, int level,
local_totals.freechunks,
local_totals.totalspace - local_totals.freespace);
}
- else
+ else if (print_location != PRINT_STATS_NONE)
ereport(LOG_SERVER_ONLY,
(errhidestmt(true),
errhidecontext(true),
--
2.34.1
On 2025-02-03 21:47, Rahila Syed wrote:
Hi,
Just an idea: as another option, how about blocking new requests to
the target process (e.g., causing them to fail with an error or
returning NULL with a warning) if a previous request is still pending?
Users can simply retry the request if it fails. IMO failing quickly
seems preferable to getting stuck for a while in cases with concurrent
requests.
Thank you for the suggestion. I agree that it is better to fail
early and avoid waiting for a timeout in such cases. I will add a
"pending request" tracker for this in shared memory. This approach
will help prevent sending a concurrent request if a request for the
same backend is still being processed.
Please find attached a patch that adds a request_pending field in
shared memory. This allows us to detect concurrent requests early
and return a WARNING message immediately, avoiding unnecessary
waiting and potential timeouts. This is added in v12-0002* patch.
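The fail-fast idea described above can be sketched roughly as follows. This is only an illustration in standalone C11, not the code from the v12-0002 patch: the patch keeps request_pending in shared memory under an LWLock, whereas this sketch assumes an atomic flag, and the names StatsSlot, try_begin_request and finish_request are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one per-process slot in shared memory. */
typedef struct StatsSlot
{
	atomic_bool request_pending;	/* true while a request is in flight */
} StatsSlot;

/*
 * Requester side: claim the slot.  Returns false immediately if another
 * request for the same target is still pending, so the caller can emit a
 * WARNING and return instead of waiting for a timeout.
 */
static bool
try_begin_request(StatsSlot *slot)
{
	bool		expected = false;

	assert(slot != NULL);
	return atomic_compare_exchange_strong(&slot->request_pending,
										  &expected, true);
}

/* Target side: clear the flag once the request has been picked up. */
static void
finish_request(StatsSlot *slot)
{
	assert(slot != NULL);
	atomic_store(&slot->request_pending, false);
}
```

A second caller that sees try_begin_request() return false would report the warning and leave the retry to the user.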
Thanks for updating the patch!
The below comments would be a bit too detailed at this stage, but I’d
like to share the points I noticed.
76 + arguments: PID and a boolean, get_summary. The function can send
Since get_summary is a parameter, should we enclose it in <parameter>
tags, like <parameter>get_summary</parameter>?
387 + * The shared memory buffer has a limited size - it the process has too many
388 + * memory contexts,
Should 'it' be 'if'?
320 * By default, only superusers are allowed to signal to return the memory
321 * contexts because allowing any users to issue this request at an unbounded
322 * rate would cause lots of requests to be sent and which can lead to denial of
323 * service. Additional roles can be permitted with GRANT.
This comment seems to contradict the following code:
360 * Only superusers or users with pg_read_all_stats privileges can view the
361 * memory context statistics of another process
362 */
363 if (!has_privs_of_role(GetUserId(), ROLE_PG_READ_ALL_STATS))
364 ereport(ERROR,
365 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
366 errmsg("memory context statistics privilege error")));
485 + if (memCtxState[procNumber].memstats_dsa_handle == DSA_HANDLE_INVALID)
486 + {
487 +
488 + LWLockRelease(&memCtxState[procNumber].lw_lock);
505 + else
506 + {
507 + LWLockRelease(&memCtxState[procNumber].lw_lock);
The LWLockRelease() function appears in both the if and else branches.
Can we move it outside the conditional block to avoid duplication?
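One possible shape for that restructuring, shown as a standalone sketch: decide under the lock, release once, and act on the saved decision afterwards. The contents of the two branches in the patch are not visible in the quoted lines, so everything here is an assumption: FakeLock stands in for the LWLock, and Slot, HANDLE_INVALID and attach_or_create are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy lock that only records its state; a stand-in for an LWLock. */
typedef struct FakeLock
{
	int			held;
	int			releases;
} FakeLock;

static void
fake_acquire(FakeLock *lock)
{
	assert(lock->held == 0);
	lock->held = 1;
}

static void
fake_release(FakeLock *lock)
{
	assert(lock->held == 1);
	lock->held = 0;
	lock->releases++;
}

#define HANDLE_INVALID 0u		/* hypothetical "no handle yet" marker */

typedef struct Slot
{
	FakeLock	lock;
	unsigned	handle;
} Slot;

/*
 * Returns true if the handle had to be set up by this call.  The lock is
 * released exactly once, after the decision has been saved in a local.
 */
static bool
attach_or_create(Slot *slot, unsigned new_handle)
{
	bool		need_create;

	fake_acquire(&slot->lock);
	need_create = (slot->handle == HANDLE_INVALID);
	if (need_create)
		slot->handle = new_handle;
	fake_release(&slot->lock);

	/* Work that does not need the lock would branch on need_create here. */
	return need_create;
}
```

This only works when neither branch needs the lock after the check; if one branch must keep holding it, the duplicated release is the simpler option.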
486 + {
487 +
488 + LWLockRelease(&memCtxState[procNumber].lw_lock);
The blank line at 487 seems unnecessary. Should we remove it?
534 {
535 ereport(LOG,
536 (errmsg("Wait for %d process to publish stats timed out, trying again",
537 pid)));
538 if (num_retries > MAX_RETRIES)
539 goto end;
540 num_retries = num_retries + 1;
541 }
If the target process remains unresponsive, the logs will repeatedly show:
LOG: Wait for xxxx process to publish stats timed out, trying again
LOG: Wait for xxxx process to publish stats timed out, trying again
...
LOG: Wait for xxxx process to publish stats timed out, trying again
However, the final log message is misleading because it does not
actually try again. Should we adjust the last log message to reflect
the correct behavior?
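A minimal sketch of one way to make the last message accurate: log "trying again" only when another attempt will actually follow, and a different message once the retries are exhausted. This is standalone C rather than backend code; the value of MAX_RETRIES, the wait_fn callback and the exact message wording are assumptions, and fprintf stands in for ereport(LOG, ...).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define MAX_RETRIES 3			/* assumed value, for illustration only */

/* Hypothetical callback: returns true once the stats have been published. */
typedef bool (*wait_fn) (void *arg);

/*
 * Wait for the target to publish, retrying up to MAX_RETRIES times.
 * *n_retry_msgs reports how many "trying again" messages were emitted.
 */
static bool
wait_for_stats(int pid, wait_fn wait_once, void *arg, int *n_retry_msgs)
{
	assert(wait_once != NULL);
	*n_retry_msgs = 0;

	for (int attempt = 0; attempt <= MAX_RETRIES; attempt++)
	{
		if (wait_once(arg))
			return true;

		/* Only promise a retry when one is really going to happen. */
		if (attempt < MAX_RETRIES)
		{
			fprintf(stderr,
					"LOG:  wait for process %d to publish stats timed out, trying again\n",
					pid);
			(*n_retry_msgs)++;
		}
	}

	fprintf(stderr,
			"LOG:  wait for process %d to publish stats timed out, giving up\n",
			pid);
	return false;
}

/* Two toy wait functions for exercising the loop. */
static bool
never_ready(void *arg)
{
	(void) arg;
	return false;
}

static bool
ready_on_third(void *arg)
{
	int		   *calls = arg;

	return ++(*calls) >= 3;
}
```

With never_ready, the loop makes MAX_RETRIES + 1 attempts and ends with the "giving up" line instead of a final "trying again".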
541 }
542
543 }
The blank line at 542 seems unnecessary. Should we remove it?
874 + context_id_lookup = hash_create("pg_get_remote_backend_memory_contexts",
Should 'pg_get_remote_backend_memory_contexts' be renamed to
'pg_get_process_memory_contexts' now?
899 + * Allocate memory in this process's dsa for storing statistics of the the
'the the' is a duplicate.
--
Regards,
--
Atsushi Torikoshi
Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K.
Hi,
Thanks for updating the patch!
The below comments would be a bit too detailed at this stage, but I’d
like to share the points I noticed.
Thanks for sharing the detailed comments. I have incorporated some of them
into the new version of the patch. I will include the rest when I refine and
comment the code further.
Meanwhile, I have fixed the following outstanding issues:
1. Currently one DSA is created per backend when the first request for
statistics is made and remains for the lifetime of the server.
I think I should add logic to periodically destroy DSAs, when memory
context statistics are not being *actively* queried from the backend,
as determined by the statistics timestamp.
After an offline discussion with Andres and Tomas, I have fixed this to use
only one DSA for all the publishing backends/processes. Each backend
allocates smaller chunks of memory within the DSA while publishing
statistics.
These chunks are tracked independently by each backend, ensuring that two
publishing backends/processes do not block each other despite using the
same DSA. This approach eliminates the overhead of creating multiple DSAs,
one for each backend.
I am not destroying the DSA area because it stores the previously published
statistics for each process. This allows the system to display older
statistics when the latest data cannot be retrieved within a reasonable time.
Only the most recently updated statistics are kept, while all earlier ones
are freed using dsa_free by each backend when they are no longer needed.
2. The two issues reported by Fujii-san here: [1].
i. I have proposed a fix for the first issue here [2].
ii. I am able to reproduce the second issue. This happens when we try
to query statistics of a backend running infinite_recurse.sql. While I am
working on finding a root-cause, I think it happens due to some memory
being overwritten due to a stack-depth violation, as the issue is not seen
when I reduce the max_stack_depth to 100kb.
The second issue is also resolved by using smaller allocations within a
DSA. Previously, it occurred because a few statically allocated strings
were placed within a single large chunk of DSA allocation. I have changed
this to use dynamically allocated chunks with dsa_allocate0 within the
same DSA.
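The per-process bookkeeping described in the two points above can be illustrated with a standalone sketch in which calloc and free stand in for dsa_allocate0 and dsa_free. This is not the patch code: Entry, ProcSlot and publish_stats are hypothetical names. The sketch also allocates the new chunk before freeing the old one, so the previous statistics survive an allocation failure, whereas the v12 code frees first.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified statistics record; the real entry has many more fields. */
typedef struct Entry
{
	char		name[64];
	long		totalspace;
} Entry;

/* One slot per publishing process; each slot tracks only its own chunk. */
typedef struct ProcSlot
{
	Entry	   *stats;			/* most recently published chunk, or NULL */
	int			nstats;
} ProcSlot;

/*
 * Publish n entries into the slot.  Only the latest chunk is kept; the
 * previous one is released once the new one is in place.  Returns 0 on
 * success and -1 if the allocation fails (the old chunk is then kept).
 */
static int
publish_stats(ProcSlot *slot, const Entry *src, int n)
{
	Entry	   *fresh;

	assert(slot != NULL && src != NULL && n > 0);

	fresh = calloc((size_t) n, sizeof(Entry));	/* cf. dsa_allocate0 */
	if (fresh == NULL)
		return -1;
	memcpy(fresh, src, (size_t) n * sizeof(Entry));

	free(slot->stats);			/* cf. dsa_free on the previous chunk */
	slot->stats = fresh;
	slot->nstats = n;
	return 0;
}
```

Because each slot owns its chunk, republishing for one process leaves the chunk of every other process untouched, which is the property that lets a reader fall back to older statistics.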
Please find attached updated and rebased patches.
Thank you,
Rahila Syed
Attachments:
v13-0001-Preparatory-changes-for-reporting-memory-context-sta.patch (application/octet-stream)
From 0a394b68cf1cc6f30ebd67c9eb094f20a7b56c41 Mon Sep 17 00:00:00 2001
From: Rahila Syed <rahilasyed.90@gmail.com>
Date: Mon, 3 Feb 2025 15:33:19 +0530
Subject: [PATCH 1/2] Preparatory changes for reporting memory context
statistics
Ensure that MemoryContextStatsInternal can return number of
contexts. Also, provide an option in MemoryContextStatsInternal
to return without printing stats to either stderr or logs.
---
src/backend/utils/mmgr/mcxt.c | 65 +++++++++++++++++++++++++++++------
1 file changed, 55 insertions(+), 10 deletions(-)
diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index aa6da0d035..946a3731fd 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -135,6 +135,17 @@ static const MemoryContextMethods mcxt_methods[] = {
};
#undef BOGUS_MCTX
+/*
+ * This is passed to MemoryContextStatsInternal to determine whether
+ * to print context statistics and, if so, whether to send them to the
+ * logs or to stderr.
+ */
+typedef enum PrintDestination
+{
+ PRINT_STATS_TO_STDERR = 0,
+ PRINT_STATS_TO_LOGS,
+ PRINT_STATS_NONE
+} PrintDestination;
/*
* CurrentMemoryContext
@@ -162,7 +173,7 @@ static void MemoryContextCallResetCallbacks(MemoryContext context);
static void MemoryContextStatsInternal(MemoryContext context, int level,
int max_level, int max_children,
MemoryContextCounters *totals,
- bool print_to_stderr);
+ PrintDestination print_location, int *num_contexts);
static void MemoryContextStatsPrint(MemoryContext context, void *passthru,
const char *stats_string,
bool print_to_stderr);
@@ -831,11 +842,19 @@ MemoryContextStatsDetail(MemoryContext context,
bool print_to_stderr)
{
MemoryContextCounters grand_totals;
+ int num_contexts = 0;
+ PrintDestination print_location;
memset(&grand_totals, 0, sizeof(grand_totals));
+ if (print_to_stderr)
+ print_location = PRINT_STATS_TO_STDERR;
+ else
+ print_location = PRINT_STATS_TO_LOGS;
+
+ /* num_contexts reports the number of contexts aggregated in the output */
MemoryContextStatsInternal(context, 0, max_level, max_children,
- &grand_totals, print_to_stderr);
+ &grand_totals, print_location, &num_contexts);
if (print_to_stderr)
fprintf(stderr,
@@ -876,18 +895,43 @@ static void
MemoryContextStatsInternal(MemoryContext context, int level,
int max_level, int max_children,
MemoryContextCounters *totals,
- bool print_to_stderr)
+ PrintDestination print_location, int *num_contexts)
{
MemoryContext child;
int ichild;
+ bool print_to_stderr = true;
+ check_stack_depth();
Assert(MemoryContextIsValid(context));
- /* Examine the context itself */
- context->methods->stats(context,
- MemoryContextStatsPrint,
- &level,
- totals, print_to_stderr);
+ if (print_location == PRINT_STATS_TO_STDERR)
+ print_to_stderr = true;
+ else if (print_location == PRINT_STATS_TO_LOGS)
+ print_to_stderr = false;
+
+ if (print_location != PRINT_STATS_NONE)
+ {
+ /* Examine the context itself */
+ context->methods->stats(context,
+ MemoryContextStatsPrint,
+ &level,
+ totals, print_to_stderr);
+ }
+
+ /*
+ * Do not print the statistics if print_location is PRINT_STATS_NONE,
+ * only compute totals.
+ */
+ else
+ {
+ /* Examine the context itself */
+ context->methods->stats(context,
+ NULL,
+ NULL,
+ totals, print_to_stderr);
+ }
+ /* Increment the context count */
+ *num_contexts = *num_contexts + 1;
/*
* Examine children.
@@ -907,7 +951,7 @@ MemoryContextStatsInternal(MemoryContext context, int level,
MemoryContextStatsInternal(child, level + 1,
max_level, max_children,
totals,
- print_to_stderr);
+ print_location, num_contexts);
}
}
@@ -925,6 +969,7 @@ MemoryContextStatsInternal(MemoryContext context, int level,
ichild++;
child = MemoryContextTraverseNext(child, context);
}
+ *num_contexts = *num_contexts + ichild;
if (print_to_stderr)
{
@@ -939,7 +984,7 @@ MemoryContextStatsInternal(MemoryContext context, int level,
local_totals.freechunks,
local_totals.totalspace - local_totals.freespace);
}
- else
+ else if (print_location != PRINT_STATS_NONE)
ereport(LOG_SERVER_ONLY,
(errhidestmt(true),
errhidecontext(true),
--
2.34.1
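
The idea behind the 0001 patch above (one traversal that can print to stderr, print to the logs, or only accumulate totals, while counting the contexts it visits) can be sketched in isolation as follows. This is only an illustration under stated assumptions: SketchDestination, SketchContext and sketch_stats are hypothetical stand-ins, not PostgreSQL's real types or functions, and stdout stands in for the server log.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the patch's PrintDestination enum. */
typedef enum SketchDestination
{
    SKETCH_TO_STDERR = 0,
    SKETCH_TO_LOGS,
    SKETCH_NONE
} SketchDestination;

/* Hypothetical, much simplified memory context node. */
typedef struct SketchContext
{
    const char *name;
    size_t      totalspace;
    struct SketchContext *firstchild;
    struct SketchContext *nextchild;
} SketchContext;

/*
 * Walk the tree rooted at "context". Totals and the context count are
 * always accumulated; printing happens only when a destination is given.
 * The caller must zero *total and *num_contexts before the first call.
 */
static void
sketch_stats(const SketchContext *context, int level,
             SketchDestination dest, size_t *total, int *num_contexts)
{
    const SketchContext *child;

    assert(context != NULL);

    if (dest == SKETCH_TO_STDERR)
        fprintf(stderr, "%*s%s: %zu bytes\n",
                level * 2, "", context->name, context->totalspace);
    else if (dest == SKETCH_TO_LOGS)
        fprintf(stdout, "%*s%s: %zu bytes\n",
                level * 2, "", context->name, context->totalspace);
    /* SKETCH_NONE: print nothing, only compute totals */

    *total += context->totalspace;
    *num_contexts += 1;

    for (child = context->firstchild; child != NULL; child = child->nextchild)
        sketch_stats(child, level + 1, dest, total, num_contexts);
}
```

Because the counter is only ever incremented through the pointer, the caller has to start it at zero, which is the same requirement num_contexts has in MemoryContextStatsDetail.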
Attachment: v13-0002-Function-to-report-memory-context-statistics.patch (application/octet-stream)
From fd7332c374d0f22a5e6a0a5fd5da0b9244de690c Mon Sep 17 00:00:00 2001
From: Rahila Syed <rahilasyed.90@gmail.com>
Date: Mon, 3 Feb 2025 15:37:17 +0530
Subject: [PATCH 2/2] Function to report memory context statistics
This function sends a signal to a backend to publish
statistics of all its memory contexts. The signal handler
running in the backend process sets a flag, which causes
the backend to copy its MemoryContextStats to a DSA during
the next call to CHECK_FOR_INTERRUPTS().
If there are more statistics than fit in 16MB, the
remaining statistics are copied as a cumulative
total of the remaining contexts.
Once it is done, it signals the client backend using
a condition variable. The client backend wakes up, reads
the shared memory and returns these values to the user as
a set of records, one for each memory context, followed by
a cumulative total of the remaining contexts, if any.
If get_summary is true, return statistics of all children
of TopMemoryContext with aggregated statistics of their
children.
The user can pass num_of_tries, which determines the total
number of wait cycles in a client backend for the latest
statistics.
The wait timeout for each cycle is set to 1 second. Once
the cycles are exhausted, the client displays previously
published statistics or returns without results.
Each backend and auxiliary process has its own slot for
reporting the stats. There is an array of such memory slots
of size MaxBackends + number of auxiliary processes in
fixed shared memory. Each of these slots points to a
smaller DSA allocation within a single DSA, which contains
the stats to be shared by the corresponding process.
Each slot has its own LWLock and condition variable for
synchronization and communication between the publishing
process and the client backend.
---
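
The overflow rule described in the commit message above (statistics that do not fit in the buffer are folded into one cumulative total stored after the individual entries) might be sketched as follows. This is a minimal illustration, not the patch's code: SketchEntry and sketch_publish are hypothetical names, and the choice to reserve the last slot for the cumulative entry is an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced version of one published statistics entry. */
typedef struct SketchEntry
{
    size_t      totalspace;
    size_t      freespace;
    int         num_agg_contexts;
} SketchEntry;

/*
 * Copy entries from in[] to out[], which has room for max_slots entries.
 * If everything fits, entries are copied one to one. Otherwise the first
 * max_slots - 1 entries are copied individually and the rest are summed
 * into a single trailing entry. Returns the number of entries written.
 */
static int
sketch_publish(const SketchEntry *in, int n, SketchEntry *out, int max_slots)
{
    int         individual;
    int         i;

    assert(n >= 0 && max_slots > 0);

    if (n <= max_slots)
    {
        for (i = 0; i < n; i++)
            out[i] = in[i];
        return n;
    }

    /* Assumption: the last slot is reserved for the cumulative total. */
    individual = max_slots - 1;
    for (i = 0; i < individual; i++)
        out[i] = in[i];

    out[individual].totalspace = 0;
    out[individual].freespace = 0;
    out[individual].num_agg_contexts = 0;
    for (i = individual; i < n; i++)
    {
        out[individual].totalspace += in[i].totalspace;
        out[individual].freespace += in[i].freespace;
        out[individual].num_agg_contexts += in[i].num_agg_contexts;
    }

    return max_slots;
}
```

A reader of the buffer can then tell the cumulative row apart from the individual rows by comparing the number of individual entries with the total number written, which is how the client side distinguishes num_individual_stats from total_stats.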
doc/src/sgml/func.sgml | 41 ++
src/backend/postmaster/autovacuum.c | 4 +
src/backend/postmaster/checkpointer.c | 4 +
src/backend/postmaster/interrupt.c | 4 +
src/backend/postmaster/pgarch.c | 4 +
src/backend/postmaster/startup.c | 4 +
src/backend/postmaster/walsummarizer.c | 4 +
src/backend/storage/ipc/ipci.c | 3 +
src/backend/storage/ipc/procsignal.c | 3 +
src/backend/tcop/postgres.c | 3 +
.../utils/activity/wait_event_names.txt | 1 +
src/backend/utils/adt/mcxtfuncs.c | 455 ++++++++++++++--
src/backend/utils/init/globals.c | 1 +
src/backend/utils/mmgr/mcxt.c | 489 +++++++++++++++++-
src/include/catalog/pg_proc.dat | 10 +
src/include/miscadmin.h | 1 +
src/include/storage/procsignal.h | 1 +
src/include/utils/memutils.h | 74 +++
src/test/regress/expected/sysviews.out | 14 +
src/test/regress/sql/sysviews.sql | 14 +
src/tools/pgindent/typedefs.list | 2 +
21 files changed, 1104 insertions(+), 32 deletions(-)
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 7efc81936a..e70f95017b 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -28409,6 +28409,47 @@ acl | {postgres=arwdDxtm/postgres,foo=r/postgres}
</para></entry>
</row>
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>pg_get_process_memory_contexts</primary>
+ </indexterm>
+ <function>pg_get_process_memory_contexts</function> ( <parameter>pid</parameter> <type>integer</type>, <parameter>get_summary</parameter> <type>boolean</type>, <parameter>num_of_tries</parameter> <type>integer</type> )
+ <returnvalue>setof record</returnvalue>
+ </para>
+ <para>
+ This function handles requests to display the memory contexts of a
+ PostgreSQL process with the specified process ID (PID). It takes three
+ arguments: <parameter>pid</parameter>, <parameter>get_summary</parameter>
+ and <parameter>num_of_tries</parameter>. The function can send requests
+ to both backend and auxiliary processes.
+
+ After receiving memory context statistics from the target process, it
+ returns the results as one row per context. The num_agg_contexts
+ column indicates the number of contexts aggregated in the displayed
+ statistics.
+
+ When <parameter>get_summary</parameter> is set to true, statistics
+ for memory contexts at levels 1 and 2 are displayed, with level 1
+ representing the root node (i.e., TopMemoryContext).
+ Each level 2 context's statistics represent an aggregate of all its
+ child contexts' statistics, with num_agg_contexts indicating the number
+ of these aggregated child contexts.
+
+ When <parameter>get_summary</parameter> is set to false, the
+ num_agg_contexts value is 1, indicating that individual statistics are
+ being displayed.
+
+ <parameter>num_of_tries</parameter> indicates the number of times
+ the client will wait for the latest statistics. The wait per try is 1
+ second. This parameter can be increased if the user anticipates a delay
+ in the response from the reporting process. Conversely, if users are
+ frequently and periodically querying the process for statistics, or if
+ there are concurrent requests for statistics of the same process,
+ lowering the parameter might help achieve a faster response.
+ </para></entry>
+ </row>
+
<row>
<entry role="func_table_entry"><para role="func_signature">
<indexterm>
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index ade2708b59..a227b5e89f 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -779,6 +779,10 @@ HandleAutoVacLauncherInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
/* Process sinval catchup interrupts that happened while sleeping */
ProcessCatchupInterrupt();
}
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index b94f9cdff2..33c3c2d9c6 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -661,6 +661,10 @@ HandleCheckpointerInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/postmaster/interrupt.c b/src/backend/postmaster/interrupt.c
index be69e4c713..9481a5cd24 100644
--- a/src/backend/postmaster/interrupt.c
+++ b/src/backend/postmaster/interrupt.c
@@ -48,6 +48,10 @@ HandleMainLoopInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 12ee815a62..cd1ecb6b93 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -865,6 +865,10 @@ HandlePgArchInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
if (ConfigReloadPending)
{
char *archiveLib = pstrdup(XLogArchiveLibrary);
diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c
index 59d213031b..d670954c4e 100644
--- a/src/backend/postmaster/startup.c
+++ b/src/backend/postmaster/startup.c
@@ -192,6 +192,10 @@ HandleStartupProcInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
diff --git a/src/backend/postmaster/walsummarizer.c b/src/backend/postmaster/walsummarizer.c
index ffbf043935..b1a5e86a85 100644
--- a/src/backend/postmaster/walsummarizer.c
+++ b/src/backend/postmaster/walsummarizer.c
@@ -876,6 +876,10 @@ HandleWalSummarizerInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 174eed7036..5eee04d52a 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -50,6 +50,7 @@
#include "storage/sinvaladt.h"
#include "utils/guc.h"
#include "utils/injection_point.h"
+#include "utils/memutils.h"
/* GUCs */
int shared_memory_type = DEFAULT_SHARED_MEMORY_TYPE;
@@ -340,6 +341,8 @@ CreateOrAttachShmemStructs(void)
StatsShmemInit();
WaitEventCustomShmemInit();
InjectionPointShmemInit();
+ MemCtxShmemInit();
+ MemCtxBackendShmemInit();
}
/*
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index 7401b6e625..e425b9eeb0 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -688,6 +688,9 @@ procsignal_sigusr1_handler(SIGNAL_ARGS)
if (CheckProcSignal(PROCSIG_LOG_MEMORY_CONTEXT))
HandleLogMemoryContextInterrupt();
+ if (CheckProcSignal(PROCSIG_GET_MEMORY_CONTEXT))
+ HandleGetMemoryContextInterrupt();
+
if (CheckProcSignal(PROCSIG_PARALLEL_APPLY_MESSAGE))
HandleParallelApplyMessageInterrupt();
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 1149d89d7a..6d0b910660 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3497,6 +3497,9 @@ ProcessInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
if (ParallelApplyMessagePending)
HandleParallelApplyMessages();
}
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index e199f07162..3674b5b7b6 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -159,6 +159,7 @@ WAL_RECEIVER_EXIT "Waiting for the WAL receiver to exit."
WAL_RECEIVER_WAIT_START "Waiting for startup process to send initial data for streaming replication."
WAL_SUMMARY_READY "Waiting for a new WAL summary to be generated."
XACT_GROUP_UPDATE "Waiting for the group leader to update transaction status at transaction end."
+MEM_CTX_PUBLISH "Waiting for backend to publish memory information."
ABI_compatibility:
diff --git a/src/backend/utils/adt/mcxtfuncs.c b/src/backend/utils/adt/mcxtfuncs.c
index 396c2f223b..06efbc6b5d 100644
--- a/src/backend/utils/adt/mcxtfuncs.c
+++ b/src/backend/utils/adt/mcxtfuncs.c
@@ -17,28 +17,26 @@
#include "funcapi.h"
#include "mb/pg_wchar.h"
+#include "miscadmin.h"
+#include "access/twophase.h"
+#include "catalog/pg_authid_d.h"
+#include "nodes/pg_list.h"
#include "storage/proc.h"
#include "storage/procarray.h"
+#include "utils/acl.h"
#include "utils/array.h"
#include "utils/builtins.h"
#include "utils/hsearch.h"
+#include "utils/memutils.h"
+#include "utils/wait_event_types.h"
/* ----------
* The max bytes for showing identifiers of MemoryContext.
* ----------
*/
-#define MEMORY_CONTEXT_IDENT_DISPLAY_SIZE 1024
-/*
- * MemoryContextId
- * Used for storage of transient identifiers for
- * pg_get_backend_memory_contexts.
- */
-typedef struct MemoryContextId
-{
- MemoryContext context;
- int context_id;
-} MemoryContextId;
+struct MemoryContextBackendState *memCtxState = NULL;
+struct MemoryContextState *memCtxArea = NULL;
/*
* int_list_to_array
@@ -71,7 +69,7 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
TupleDesc tupdesc, MemoryContext context,
HTAB *context_id_lookup)
{
-#define PG_GET_BACKEND_MEMORY_CONTEXTS_COLS 10
+#define PG_GET_BACKEND_MEMORY_CONTEXTS_COLS 11
Datum values[PG_GET_BACKEND_MEMORY_CONTEXTS_COLS];
bool nulls[PG_GET_BACKEND_MEMORY_CONTEXTS_COLS];
@@ -143,24 +141,7 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
else
nulls[1] = true;
- switch (context->type)
- {
- case T_AllocSetContext:
- type = "AllocSet";
- break;
- case T_GenerationContext:
- type = "Generation";
- break;
- case T_SlabContext:
- type = "Slab";
- break;
- case T_BumpContext:
- type = "Bump";
- break;
- default:
- type = "???";
- break;
- }
+ type = AssignContextType(context->type);
values[2] = CStringGetTextDatum(type);
values[3] = Int32GetDatum(list_length(path)); /* level */
@@ -175,6 +156,32 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
list_free(path);
}
+const char *
+AssignContextType(NodeTag type)
+{
+ const char *context_type;
+
+ switch (type)
+ {
+ case T_AllocSetContext:
+ context_type = "AllocSet";
+ break;
+ case T_GenerationContext:
+ context_type = "Generation";
+ break;
+ case T_SlabContext:
+ context_type = "Slab";
+ break;
+ case T_BumpContext:
+ context_type = "Bump";
+ break;
+ default:
+ context_type = "???";
+ break;
+ }
+ return (context_type);
+}
+
/*
* pg_get_backend_memory_contexts
* SQL SRF showing backend memory context.
@@ -249,7 +256,7 @@ pg_get_backend_memory_contexts(PG_FUNCTION_ARGS)
/*
* pg_log_backend_memory_contexts
- * Signal a backend or an auxiliary process to log its memory contexts.
+ * Signal a backend or an auxiliary process to log its memory contexts.
*
* By default, only superusers are allowed to signal to log the memory
* contexts because allowing any users to issue this request at an unbounded
@@ -305,3 +312,389 @@ pg_log_backend_memory_contexts(PG_FUNCTION_ARGS)
PG_RETURN_BOOL(true);
}
+
+/*
+ * pg_get_process_memory_contexts
+ * Signal a backend or an auxiliary process to send its memory contexts,
+ * wait for the results and display them.
+ *
+ * By default, only superusers or users with PG_READ_ALL_STATS are allowed to
+ * signal to return the memory contexts because allowing any users to issue
+ * this request at an unbounded rate would cause lots of requests to be sent
+ * and which can lead to denial of service. Additional roles can be permitted
+ * with GRANT.
+ *
+ * On receipt of this signal, a backend or an auxiliary process sets the flag
+ * in the signal handler, which causes the next CHECK_FOR_INTERRUPTS()
+ * or process-specific interrupt handler to copy the memory context details
+ * to a dynamic shared memory space.
+ *
+ * The shared memory buffer has a limited size - if the process has too many
+ * memory contexts, the memory contexts that do not fit are summarized
+ * and represented as a cumulative total at the end of the buffer.
+ *
+ * After sending the signal, wait on a condition variable. The publishing
+ * backend, after copying the data to shared memory, signals that
+ * condition variable.
+ * Once the wait on the condition variable ends, check if the memory context
+ * information is available for read and display.
+ *
+ * If the publishing backend does not respond before the condition variable
+ * times out, which is set to MEMSTATS_WAIT_TIMEOUT, retry up to max_tries
+ * times, as defined by the user, before giving up and
+ * returning previously published statistics, if any.
+ */
+Datum
+pg_get_process_memory_contexts(PG_FUNCTION_ARGS)
+{
+ int pid = PG_GETARG_INT32(0);
+ bool get_summary = PG_GETARG_BOOL(1);
+ PGPROC *proc;
+ ProcNumber procNumber = INVALID_PROC_NUMBER;
+ int i;
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ dsa_area *area;
+ MemoryContextEntry *memctx_info;
+ int num_retries = 0;
+ TimestampTz curr_timestamp;
+ int max_tries = PG_GETARG_INT32(2);
+ bool prev_stats = false;
+
+ /*
+ * Only superusers or users with pg_read_all_stats privileges can view the
+ * memory context statistics of another process
+ */
+ if (!has_privs_of_role(GetUserId(), ROLE_PG_READ_ALL_STATS))
+ ereport(ERROR,
+ (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+ errmsg("memory context statistics privilege error")));
+
+ InitMaterializedSRF(fcinfo, 0);
+
+ /*
+ * See if the process with given pid is a backend or an auxiliary process.
+ */
+ proc = BackendPidGetProc(pid);
+ if (proc == NULL)
+ proc = AuxiliaryPidGetProc(pid);
+
+ /*
+ * BackendPidGetProc() and AuxiliaryPidGetProc() return NULL if the pid
+ * isn't valid; but by the time we reach kill(), a process for which we
+ * get a valid proc here might have terminated on its own. There's no way
+ * to acquire a lock on an arbitrary process to prevent that. But since
+ * this mechanism is usually used to debug a backend or an auxiliary
+ * process running and consuming lots of memory, that it might end on its
+ * own first and its memory contexts are not returned is not a problem.
+ */
+ if (proc == NULL)
+ {
+ /*
+ * This is just a warning so a loop-through-resultset will not abort
+ * if one backend terminated on its own during the run.
+ */
+ ereport(WARNING,
+ (errmsg("PID %d is not a PostgreSQL server process",
+ pid)));
+ PG_RETURN_NULL();
+ }
+
+ procNumber = GetNumberFromPGProc(proc);
+ if (procNumber == MyProcNumber)
+ {
+ ereport(WARNING,
+ (errmsg("cannot return statistics for local backend"),
+ errhint("Use pg_backend_memory_contexts view instead")));
+ PG_RETURN_NULL();
+ }
+
+ LWLockAcquire(&memCtxState[procNumber].lw_lock, LW_EXCLUSIVE);
+ memCtxState[procNumber].get_summary = get_summary;
+ LWLockRelease(&memCtxState[procNumber].lw_lock);
+
+ curr_timestamp = GetCurrentTimestamp();
+
+ /*
+ * Send a signal to a postgresql process, informing it we want it to
+ * produce information about memory contexts.
+ */
+ if (SendProcSignal(pid, PROCSIG_GET_MEMORY_CONTEXT, procNumber) < 0)
+ {
+ ereport(WARNING,
+ (errmsg("could not send signal to process %d: %m", pid)));
+
+ goto end;
+ }
+
+ /*
+ * Wait for a PostgreSQL process to publish stats, indicated by a valid
+ * dsa pointer set by the backend. A dsa pointer could be valid if
+ * statistics have previously been published by the backend. In that case,
+ * check that the statistics are not older than curr_timestamp; if they
+ * are, wait for newer statistics. Wait for max_tries * MEMSTATS_WAIT_TIMEOUT,
+ * following which display older statistics if available.
+ */
+ while (1)
+ {
+ long msecs;
+
+ /*
+ * We expect to come out of sleep when the requested process has
+ * finished publishing the statistics, verified using the valid dsa
+ * pointer.
+ *
+ * Make sure that the information belongs to the pid we requested
+ * information for; otherwise loop back and wait for the server
+ * process to finish publishing statistics.
+ */
+ LWLockAcquire(&memCtxState[procNumber].lw_lock, LW_EXCLUSIVE);
+ msecs =
+ TimestampDifferenceMilliseconds(curr_timestamp,
+ memCtxState[procNumber].stats_timestamp);
+
+ /*
+ * A note in procnumber.h says that a procNumber can be re-used for
+ * a different backend immediately after a backend exits. In case an
+ * old process' data was there and not updated by the current process
+ * in the slot identified by the procNumber, the pid of the requested
+ * process and the proc_id might not match.
+ */
+ if (memCtxState[procNumber].proc_id == pid)
+ {
+ /*
+ * Break if the latest stats have been read, indicated by
+ * statistics timestamp being newer than the current request
+ * timestamp.
+ */
+ if (DsaPointerIsValid(memCtxState[procNumber].memstats_dsa_pointer)
+ && msecs > 0)
+ {
+ LWLockRelease(&memCtxState[procNumber].lw_lock);
+ break;
+ }
+
+ }
+ LWLockRelease(&memCtxState[procNumber].lw_lock);
+
+ /*
+ * Recheck the state of the backend before sleeping on the condition
+ * variable
+ */
+ proc = BackendPidGetProc(pid);
+
+#define MEMSTATS_WAIT_TIMEOUT 1000
+ if (proc == NULL)
+ proc = AuxiliaryPidGetProc(pid);
+ if (proc == NULL)
+ {
+ ereport(WARNING,
+ (errmsg("PID %d is not a PostgreSQL server process",
+ pid)));
+ goto end;
+ }
+ if (ConditionVariableTimedSleep(&memCtxState[procNumber].memctx_cv,
+ MEMSTATS_WAIT_TIMEOUT,
+ WAIT_EVENT_MEM_CTX_PUBLISH))
+ {
+ ereport(LOG,
+ (errmsg("Wait for %d process to publish stats timed out, trying again",
+ pid)));
+
+ /*
+ * Wait for max_tries defined by user, display previously
+ * published statistics if any, when max_tries are over.
+ */
+ if (num_retries > max_tries)
+ {
+ LWLockAcquire(&memCtxState[procNumber].lw_lock, LW_EXCLUSIVE);
+ if (DsaPointerIsValid(memCtxState[procNumber].memstats_prev_dsa_pointer))
+ {
+ prev_stats = true;
+ LWLockRelease(&memCtxState[procNumber].lw_lock);
+ break;
+ }
+ else
+ {
+ LWLockRelease(&memCtxState[procNumber].lw_lock);
+ goto end;
+ }
+ }
+ num_retries = num_retries + 1;
+ }
+
+ }
+ /* XXX. Check if this lock is required */
+ LWLockAcquire(&memCtxArea->lw_lock, LW_EXCLUSIVE);
+ /* Assert for dsa_handle to be valid */
+ area = dsa_attach(memCtxArea->memstats_dsa_handle);
+ /* We should land here only with a valid memstats_dsa_pointer */
+
+ LWLockRelease(&memCtxArea->lw_lock);
+
+ /*
+ * Backend has finished publishing the stats, read them
+ *
+ * Read statistics of top level 1 and 2 contexts, if get_summary is true.
+ */
+ LWLockAcquire(&memCtxState[procNumber].lw_lock, LW_EXCLUSIVE);
+ if (prev_stats == true)
+ memctx_info = (MemoryContextEntry *) dsa_get_address(area,
+ memCtxState[procNumber].memstats_prev_dsa_pointer);
+ else
+ memctx_info = (MemoryContextEntry *) dsa_get_address(area,
+ memCtxState[procNumber].memstats_dsa_pointer);
+
+ for (i = 0; i < memCtxState[procNumber].num_individual_stats; i++)
+ {
+ ArrayType *path_array;
+ int path_length;
+ Datum values[PG_GET_BACKEND_MEMORY_CONTEXTS_COLS];
+ bool nulls[PG_GET_BACKEND_MEMORY_CONTEXTS_COLS];
+ char *name;
+ char *ident;
+ Datum *path_datum_array;
+
+ memset(values, 0, sizeof(values));
+ memset(nulls, 0, sizeof(nulls));
+
+ if (DsaPointerIsValid(memctx_info[i].name))
+ {
+ name = (char *) dsa_get_address(area, memctx_info[i].name);
+ values[0] = CStringGetTextDatum(name);
+ }
+ else
+ nulls[0] = true;
+ if (DsaPointerIsValid(memctx_info[i].ident))
+ {
+ ident = (char *) dsa_get_address(area, memctx_info[i].ident);
+ values[1] = CStringGetTextDatum(ident);
+ }
+ else
+ nulls[1] = true;
+
+ values[2] = CStringGetTextDatum(memctx_info[i].type);
+
+ path_length = memctx_info[i].path_length;
+
+ path_datum_array = (Datum *) dsa_get_address(area, memctx_info[i].path);
+ path_array = construct_array_builtin(path_datum_array,
+ path_length, INT4OID);
+ values[3] = PointerGetDatum(path_array);
+ values[4] = Int64GetDatum(memctx_info[i].totalspace);
+ values[5] = Int64GetDatum(memctx_info[i].nblocks);
+ values[6] = Int64GetDatum(memctx_info[i].freespace);
+ values[7] = Int64GetDatum(memctx_info[i].freechunks);
+ values[8] = Int64GetDatum(memctx_info[i].totalspace -
+ memctx_info[i].freespace);
+ values[9] = Int32GetDatum(memctx_info[i].num_agg_stats);
+ values[10] = TimestampTzGetDatum(memCtxState[procNumber].stats_timestamp);
+
+ tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc,
+ values, nulls);
+ }
+
+ /* If there are more contexts, display a cumulative total of those */
+ if (memCtxState[procNumber].total_stats > i)
+ {
+ Datum values[PG_GET_BACKEND_MEMORY_CONTEXTS_COLS] = {0};
+ bool nulls[PG_GET_BACKEND_MEMORY_CONTEXTS_COLS] = {0};
+ char *name;
+
+ name = (char *) dsa_get_address(area, memctx_info[i].name);
+ values[0] = CStringGetTextDatum(name);
+ nulls[1] = true;
+ nulls[2] = true;
+ nulls[3] = true;
+ values[4] = Int64GetDatum(memctx_info[i].totalspace);
+ values[5] = Int64GetDatum(memctx_info[i].nblocks);
+ values[6] = Int64GetDatum(memctx_info[i].freespace);
+ values[7] = Int64GetDatum(memctx_info[i].freechunks);
+ values[8] = Int64GetDatum(memctx_info[i].totalspace - memctx_info[i].freespace);
+ values[9] = Int32GetDatum(memctx_info[i].num_agg_stats);
+
+ tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc, values, nulls);
+ }
+
+ LWLockRelease(&memCtxState[procNumber].lw_lock);
+
+ ConditionVariableCancelSleep();
+ dsa_detach(area);
+
+end:
+ PG_RETURN_NULL();
+}
+
+/*
+ * Shared memory sizing for reporting memory context information.
+ */
+static Size
+MemCtxShmemSize(void)
+{
+ Size TotalProcs =
+ add_size(MaxBackends, add_size(NUM_AUXILIARY_PROCS, max_prepared_xacts));
+
+ return mul_size(TotalProcs, sizeof(MemoryContextBackendState));
+}
+
+/*
+ * Init shared memory for reporting memory context information.
+ */
+void
+MemCtxBackendShmemInit(void)
+{
+ bool found;
+ Size TotalProcs =
+ add_size(MaxBackends, add_size(NUM_AUXILIARY_PROCS, max_prepared_xacts));
+
+ memCtxState = (MemoryContextBackendState *) ShmemInitStruct("MemoryContextBackendState",
+ MemCtxShmemSize(),
+ &found);
+ if (!IsUnderPostmaster)
+ {
+ Assert(!found);
+
+ for (int i = 0; i < TotalProcs; i++)
+ {
+ ConditionVariableInit(&memCtxState[i].memctx_cv);
+
+ LWLockInitialize(&memCtxState[i].lw_lock,
+ LWLockNewTrancheId());
+ LWLockRegisterTranche(memCtxState[i].lw_lock.tranche,
+ "mem_context_backend_stats_reporting");
+
+ memCtxState[i].memstats_dsa_pointer = InvalidDsaPointer;
+ memCtxState[i].memstats_prev_dsa_pointer = InvalidDsaPointer;
+ }
+ }
+ else
+ {
+ Assert(found);
+ }
+}
+
+/*
+ * Initialize shared memory for displaying memory
+ * context statistics
+ */
+void
+MemCtxShmemInit(void)
+{
+ bool found;
+
+ memCtxArea = (MemoryContextState *) ShmemInitStruct("MemoryContextState", sizeof(MemoryContextState),
+ &found);
+ if (!IsUnderPostmaster)
+ {
+ Assert(!found);
+
+ LWLockInitialize(&memCtxArea->lw_lock,
+ LWLockNewTrancheId());
+ LWLockRegisterTranche(memCtxArea->lw_lock.tranche,
+ "mem_context_stats_reporting");
+ memCtxArea->memstats_dsa_handle = DSA_HANDLE_INVALID;
+ }
+ else
+ {
+ Assert(found);
+ }
+}
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index b844f9fdae..13938ccb0f 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -38,6 +38,7 @@ volatile sig_atomic_t TransactionTimeoutPending = false;
volatile sig_atomic_t IdleSessionTimeoutPending = false;
volatile sig_atomic_t ProcSignalBarrierPending = false;
volatile sig_atomic_t LogMemoryContextPending = false;
+volatile sig_atomic_t PublishMemoryContextPending = false;
volatile sig_atomic_t IdleStatsUpdateTimeoutPending = false;
volatile uint32 InterruptHoldoffCount = 0;
volatile uint32 QueryCancelHoldoffCount = 0;
diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index 946a3731fd..b6a5f148d0 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -19,16 +19,22 @@
*-------------------------------------------------------------------------
*/
+#include <math.h>
#include "postgres.h"
#include "mb/pg_wchar.h"
#include "miscadmin.h"
+#include "nodes/pg_list.h"
+#include "storage/fd.h"
+#include "storage/lwlock.h"
+#include "storage/dsm.h"
+#include "utils/dsa.h"
+#include "utils/hsearch.h"
#include "utils/memdebug.h"
#include "utils/memutils.h"
#include "utils/memutils_internal.h"
#include "utils/memutils_memorychunk.h"
-
static void BogusFree(void *pointer);
static void *BogusRealloc(void *pointer, Size size, int flags);
static MemoryContext BogusGetChunkContext(void *pointer);
@@ -164,6 +170,7 @@ MemoryContext CacheMemoryContext = NULL;
MemoryContext MessageContext = NULL;
MemoryContext TopTransactionContext = NULL;
MemoryContext CurTransactionContext = NULL;
+static dsa_area *area = NULL;
/* This is a transient link to the active portal's memory context: */
MemoryContext PortalContext = NULL;
@@ -177,6 +184,17 @@ static void MemoryContextStatsInternal(MemoryContext context, int level,
static void MemoryContextStatsPrint(MemoryContext context, void *passthru,
const char *stats_string,
bool print_to_stderr);
+static void PublishMemoryContext(MemoryContextEntry *memctx_infos,
+ int curr_id, MemoryContext context,
+ List *path,
+ MemoryContextCounters stat,
+ int num_contexts, dsa_area *area);
+static void compute_num_of_contexts(List *contexts, HTAB *context_id_lookup,
+ int *stats_count,
+ bool get_summary);
+static List *compute_context_path(MemoryContext c, HTAB *context_id_lookup);
+static void dsa_free_previous_stats(dsa_area *area, int total_stats, dsa_pointer prev_dsa_pointer);
+
/*
* You should not do memory allocations within a critical section, because
@@ -1321,6 +1339,21 @@ HandleLogMemoryContextInterrupt(void)
/* latch will be set by procsignal_sigusr1_handler */
}
+/*
+ * HandleGetMemoryContextInterrupt
+ * Handle receipt of an interrupt indicating publishing of memory
+ * contexts.
+ *
+ * All the actual work is deferred to ProcessGetMemoryContextInterrupt()
+ */
+void
+HandleGetMemoryContextInterrupt(void)
+{
+ InterruptPending = true;
+ PublishMemoryContextPending = true;
+ /* latch will be set by procsignal_sigusr1_handler */
+}
+
/*
* ProcessLogMemoryContextInterrupt
* Perform logging of memory contexts of this backend process.
@@ -1358,6 +1391,460 @@ ProcessLogMemoryContextInterrupt(void)
MemoryContextStatsDetail(TopMemoryContext, 100, 100, false);
}
+/*
+ * ProcessGetMemoryContextInterrupt
+ * Generate information about memory contexts used by the process.
+ *
+ * Performs a breadth-first search on the memory context tree, so a parent's
+ * statistics are reported before its children's in the monitoring function output.
+ *
+ * Statistics per context for all the processes are shared via the same dynamic
+ * shared area. The statistics for contexts that exceed the pre-determined size
+ * limit, are captured as a cumulative total at the end of individual statistics.
+ *
+ * If get_summary is true, we traverse the memory context tree recursively in
+ * depth first search manner to cover all the children of a parent context, to be
+ * able to display a cumulative total of memory consumption by a parent.
+ */
+void
+ProcessGetMemoryContextInterrupt(void)
+{
+ List *contexts;
+
+ HASHCTL ctl;
+ HTAB *context_id_lookup;
+ int context_id = 0;
+ MemoryContext stat_cxt;
+ MemoryContextEntry *meminfo;
+ bool get_summary = false;
+
+ /* dsa_area *area = NULL; */
+ int max_stats;
+ int idx = MyProcNumber;
+ int stats_count = 0;
+ MemoryContextCounters stat;
+
+ PublishMemoryContextPending = false;
+
+ /*
+ * Make a new context that will contain the hash table, to ease the
+ * cleanup.
+ */
+ stat_cxt = AllocSetContextCreate(CurrentMemoryContext,
+ "Memory context statistics",
+ ALLOCSET_DEFAULT_SIZES);
+
+ /*
+ * The hash table used for constructing "path" column of the view, similar
+ * to its local backend counterpart.
+ */
+ ctl.keysize = sizeof(MemoryContext);
+ ctl.entrysize = sizeof(MemoryContextId);
+ ctl.hcxt = stat_cxt;
+
+ context_id_lookup = hash_create("pg_get_remote_backend_memory_contexts",
+ 256,
+ &ctl,
+ HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+
+ /* List of contexts to process in the next round - start at the top. */
+ contexts = list_make1(TopMemoryContext);
+
+ /* Compute the number of stats that can fit in the defined limit */
+ max_stats = (MAX_NUM_DEFAULT_SEGMENTS * DSA_DEFAULT_INIT_SEGMENT_SIZE)
+ / (sizeof(MemoryContextEntry) + (MEM_CONTEXT_MAX_LEVEL
+ * sizeof(Datum)) + (2 * MEMORY_CONTEXT_IDENT_DISPLAY_SIZE));
+
+ elog(LOG, "Maximum statistics %d", max_stats);
+ LWLockAcquire(&memCtxState[idx].lw_lock, LW_EXCLUSIVE);
+ get_summary = memCtxState[idx].get_summary;
+ LWLockRelease(&memCtxState[idx].lw_lock);
+
+ /*
+ * Traverse the memory context tree to find total number of contexts. If
+ * summary is requested report the total number of contexts at level 1 and
+ * 2 from the top. Also, populate the hash table of context ids.
+ */
+ compute_num_of_contexts(contexts, context_id_lookup, &stats_count,
+ get_summary);
+
+ /*
+ * Allocate memory in this process's dsa for storing statistics of the
+ * memory contexts up to max_stats. For contexts that don't fit within the
+ * limit, a cumulative total is written as the last record in the DSA
+ * segment.
+ */
+ stats_count = (stats_count > max_stats) ? max_stats : stats_count;
+
+ LWLockAcquire(&memCtxArea->lw_lock, LW_EXCLUSIVE);
+
+ /*
+ * Create a DSA and send the handle to the client process after storing
+ * the statistics. If the number of contexts exceeds a predefined limit (8MB), a
+ * cumulative total is stored for such contexts.
+ */
+ if (memCtxArea->memstats_dsa_handle == DSA_HANDLE_INVALID)
+ {
+ MemoryContext oldcontext = CurrentMemoryContext;
+ dsa_handle handle;
+
+ MemoryContextSwitchTo(TopMemoryContext);
+
+ area = dsa_create(memCtxArea->lw_lock.tranche);
+
+ handle = dsa_get_handle(area);
+ MemoryContextSwitchTo(oldcontext);
+
+ dsa_pin_mapping(area);
+
+ /*
+ * Pin the dsa area, this is to make sure the area remains attachable
+ * even if current backend exits. This is done so that a waiting
+ * client gets the stats even after a process exits.
+ */
+ dsa_pin(area);
+
+ /* Set the handle in shared memory */
+ memCtxArea->memstats_dsa_handle = handle;
+ }
+
+ /*
+ * If DSA exists, created by another process publishing statistics, attach
+ * to it.
+ */
+ else if (area == NULL)
+ {
+ MemoryContext oldcontext = CurrentMemoryContext;
+
+ MemoryContextSwitchTo(TopMemoryContext);
+ area = dsa_attach(memCtxArea->memstats_dsa_handle);
+ MemoryContextSwitchTo(oldcontext);
+ dsa_pin_mapping(area);
+ }
+ LWLockRelease(&memCtxArea->lw_lock);
+
+ /*
+ * Hold the process specific lock to protect writes to process specific
+ * memory. This way two processes publishing statistics do not block each
+ * other.
+ */
+ LWLockAcquire(&memCtxState[idx].lw_lock, LW_EXCLUSIVE);
+ memCtxState[idx].proc_id = MyProcPid;
+
+ if (!DsaPointerIsValid(memCtxState[idx].memstats_dsa_pointer))
+ {
+ memCtxState[idx].memstats_dsa_pointer = dsa_allocate0(area, stats_count * sizeof(MemoryContextEntry));
+ }
+ else
+ {
+ /* Free any previous allocations */
+ if (DsaPointerIsValid(memCtxState[idx].memstats_prev_dsa_pointer))
+ {
+ /*
+ * Free the name, ident and path pointers before freeing the
+ * memory that contains them.
+ */
+ dsa_free_previous_stats(area, memCtxState[idx].prev_total_stats,
+ memCtxState[idx].memstats_prev_dsa_pointer);
+ dsa_free(area, memCtxState[idx].memstats_prev_dsa_pointer);
+ memCtxState[idx].memstats_prev_dsa_pointer = InvalidDsaPointer;
+ }
+ memCtxState[idx].memstats_prev_dsa_pointer = memCtxState[idx].memstats_dsa_pointer;
+ memCtxState[idx].prev_total_stats = memCtxState[idx].total_stats;
+ memCtxState[idx].memstats_dsa_pointer = dsa_allocate0(area,
+ stats_count * sizeof(MemoryContextEntry));
+ }
+ meminfo = (MemoryContextEntry *) dsa_get_address(area,
+ memCtxState[idx].memstats_dsa_pointer);
+
+ if (get_summary)
+ {
+ int ctx_id = 0;
+ List *path = NIL;
+
+ /* Copy TopMemoryContext statistics to DSA */
+ memset(&stat, 0, sizeof(stat));
+ (*TopMemoryContext->methods->stats) (TopMemoryContext, NULL, NULL,
+ &stat, true);
+ path = lcons_int(1, path);
+ PublishMemoryContext(meminfo, ctx_id, TopMemoryContext, path, stat,
+ 1, area);
+ ctx_id = ctx_id + 1;
+
+ /*
+ * Copy statistics for each of TopMemoryContext's children (XXX: make it
+ * capped at 100). This includes statistics of all of their children
+ * up to level 100.
+ */
+
+ for (MemoryContext c = TopMemoryContext->firstchild; c != NULL;
+ c = c->nextchild)
+ {
+ MemoryContextCounters grand_totals;
+ int num_contexts = 0;
+
+ path = NIL;
+ memset(&grand_totals, 0, sizeof(grand_totals));
+
+ MemoryContextStatsInternal(c, 0, 100, 100, &grand_totals,
+ PRINT_STATS_NONE, &num_contexts);
+
+ path = compute_context_path(c, context_id_lookup);
+
+ PublishMemoryContext(meminfo, ctx_id, c, path,
+ grand_totals, num_contexts, area);
+ ctx_id = ctx_id + 1;
+ }
+ /* For summary mode, total_stats and num_individual_stats remain the same */
+ memCtxState[idx].num_individual_stats = ctx_id;
+ memCtxState[idx].total_stats = ctx_id;
+ goto cleanup;
+ }
+ foreach_ptr(MemoryContextData, cur, contexts)
+ {
+ List *path = NIL;
+ char *name;
+
+ /*
+ * Figure out the transient context_id of this context and each of its
+ * ancestors.
+ */
+ path = compute_context_path(cur, context_id_lookup);
+
+ if (context_id <= (max_stats - 2))
+ {
+ /* Examine the context stats */
+ memset(&stat, 0, sizeof(stat));
+ (*cur->methods->stats) (cur, NULL, NULL, &stat, true);
+ /* Copy statistics to DSA memory */
+ PublishMemoryContext(meminfo, context_id, cur, path, stat, 1, area);
+ }
+ else
+ {
+ /* Examine the context stats */
+ memset(&stat, 0, sizeof(stat));
+ (*cur->methods->stats) (cur, NULL, NULL, &stat, true);
+
+ meminfo[max_stats - 1].totalspace += stat.totalspace;
+ meminfo[max_stats - 1].nblocks += stat.nblocks;
+ meminfo[max_stats - 1].freespace += stat.freespace;
+ meminfo[max_stats - 1].freechunks += stat.freechunks;
+ }
+
+ /*
+ * DSA max limit is reached, write aggregate of the remaining
+ * statistics.
+ */
+ if (context_id == (max_stats - 2) && context_id < (stats_count - 1))
+ {
+ memCtxState[idx].num_individual_stats = context_id + 1;
+ meminfo[max_stats - 1].name = dsa_allocate0(area, 17);
+ name = dsa_get_address(area, meminfo[max_stats - 1].name);
+ strncpy(name, "Remaining Totals", 16);
+ meminfo[max_stats - 1].ident = InvalidDsaPointer;
+ }
+ context_id++;
+ }
+ /* No aggregated contexts, individual statistics reported */
+ if (context_id < (max_stats - 2))
+ {
+ memCtxState[idx].num_individual_stats = context_id;
+ }
+ /* Report number of aggregated memory contexts */
+ else
+ {
+ meminfo[max_stats - 1].num_agg_stats = context_id -
+ memCtxState[idx].num_individual_stats;
+ }
+ memCtxState[idx].total_stats = context_id;
+cleanup:
+
+ /*
+ * Signal all the waiting client backends after setting the exit condition
+ * flag
+ */
+ memCtxState[idx].stats_timestamp = GetCurrentTimestamp();
+ LWLockRelease(&memCtxState[idx].lw_lock);
+ ConditionVariableBroadcast(&memCtxState[idx].memctx_cv);
+ /* Delete the hash table memory context */
+ MemoryContextDelete(stat_cxt);
+
+/* dsa_detach(area); */
+}
+
+/*
+ * Append the transient context_id of this context and each of
+ * its ancestors to a list, in order to compute a path.
+ */
+static List *
+compute_context_path(MemoryContext c, HTAB *context_id_lookup)
+{
+ bool found;
+ List *path = NIL;
+
+ for (MemoryContext cur_context = c; cur_context != NULL; cur_context = cur_context->parent)
+ {
+ MemoryContextId *cur_entry;
+
+ cur_entry = hash_search(context_id_lookup, &cur_context, HASH_FIND, &found);
+
+ if (!found)
+ {
+ elog(LOG, "hash table corrupted, can't construct path value");
+ break;
+ }
+ path = lcons_int(cur_entry->context_id, path);
+ }
+ return path;
+}
+
+/*
+ * Count the contexts currently allocated by the backend (via *stats_count)
+ * and assign a context id to each of them.
+ */
+static void
+compute_num_of_contexts(List *contexts, HTAB *context_id_lookup,
+ int *stats_count, bool get_summary)
+{
+ foreach_ptr(MemoryContextData, cur, contexts)
+ {
+ MemoryContextId *entry;
+ bool found;
+
+ entry = (MemoryContextId *) hash_search(context_id_lookup, &cur,
+ HASH_ENTER, &found);
+ Assert(!found);
+
+ /* context id starts with 1 */
+ entry->context_id = (++(*stats_count));
+
+ /* Append the children of the current context to the main list. */
+ for (MemoryContext c = cur->firstchild; c != NULL; c = c->nextchild)
+ {
+ if (get_summary)
+ {
+ entry = (MemoryContextId *) hash_search(context_id_lookup, &c,
+ HASH_ENTER, &found);
+ Assert(!found);
+
+ entry->context_id = (++(*stats_count));
+ }
+
+ contexts = lappend(contexts, c);
+ }
+
+ /*
+ * In summary mode, only the contexts in the first two levels (from the
+ * top) are displayed.
+ */
+ if (get_summary)
+ break;
+ }
+
+}
+
+/* Copy the memory context statistics of a single context to a dsa buffer */
+static void
+PublishMemoryContext(MemoryContextEntry *memctx_info, int curr_id,
+ MemoryContext context, List *path,
+ MemoryContextCounters stat, int num_contexts,
+ dsa_area *area)
+{
+ char clipped_ident[MEMORY_CONTEXT_IDENT_DISPLAY_SIZE];
+ char *name;
+ char *ident;
+ Datum *path_array;
+
+ if (context->name != NULL)
+ {
+ Assert(strlen(context->name) < MEMORY_CONTEXT_IDENT_DISPLAY_SIZE);
+ memctx_info[curr_id].name = dsa_allocate0(area, strlen(context->name) + 1);
+ name = (char *) dsa_get_address(area, memctx_info[curr_id].name);
+ strncpy(name, context->name, strlen(context->name));
+ }
+ else
+ memctx_info[curr_id].name = InvalidDsaPointer;
+
+ /* Trim and copy the identifier if it is not set to NULL */
+ if (context->ident != NULL)
+ {
+ int idlen = strlen(context->ident);
+
+ /*
+ * Some identifiers such as SQL query string can be very long,
+ * truncate oversize identifiers.
+ */
+ if (idlen >= MEMORY_CONTEXT_IDENT_DISPLAY_SIZE)
+ idlen = pg_mbcliplen(context->ident, idlen,
+ MEMORY_CONTEXT_IDENT_DISPLAY_SIZE - 1);
+
+ memcpy(clipped_ident, context->ident, idlen);
+ clipped_ident[idlen] = '\0';
+
+ /*
+ * To be consistent with logging output, we label dynahash contexts
+ * with just the hash table name as with MemoryContextStatsPrint().
+ */
+ if (!strncmp(context->name, "dynahash", 8))
+ {
+ dsa_free(area, memctx_info[curr_id].name);
+ memctx_info[curr_id].name = dsa_allocate0(area,
+ strlen(clipped_ident) + 1);
+ name = (char *) dsa_get_address(area,
+ memctx_info[curr_id].name);
+ strncpy(name,
+ clipped_ident, strlen(clipped_ident));
+ memctx_info[curr_id].ident = InvalidDsaPointer;
+ }
+ else
+ {
+
+ memctx_info[curr_id].ident = dsa_allocate0(area,
+ strlen(clipped_ident) + 1);
+ ident = (char *) dsa_get_address(area,
+ memctx_info[curr_id].ident);
+ strncpy(ident,
+ clipped_ident, strlen(clipped_ident));
+ }
+ }
+ else
+ memctx_info[curr_id].ident = InvalidDsaPointer;
+ /* Allocate dsa memory for storing path information */
+ memctx_info[curr_id].path_length = list_length(path);
+ memctx_info[curr_id].path = dsa_allocate0(area,
+ memctx_info[curr_id].path_length
+ * sizeof(Datum));
+ path_array = (Datum *) dsa_get_address(area, memctx_info[curr_id].path);
+ foreach_int(i, path)
+ path_array[foreach_current_index(i)] = Int32GetDatum(i);
+
+ memctx_info[curr_id].type = AssignContextType(context->type);
+ memctx_info[curr_id].totalspace = stat.totalspace;
+ memctx_info[curr_id].nblocks = stat.nblocks;
+ memctx_info[curr_id].freespace = stat.freespace;
+ memctx_info[curr_id].freechunks = stat.freechunks;
+ memctx_info[curr_id].num_agg_stats = num_contexts;
+}
+
+static void
+dsa_free_previous_stats(dsa_area *area, int total_stats,
+ dsa_pointer prev_dsa_pointer)
+{
+ MemoryContextEntry *meminfo;
+
+ meminfo = (MemoryContextEntry *) dsa_get_address(area, prev_dsa_pointer);
+ for (int i = 0; i < total_stats; i++)
+ {
+ if (DsaPointerIsValid(meminfo[i].name))
+ dsa_free(area, meminfo[i].name);
+
+ if (DsaPointerIsValid(meminfo[i].ident))
+ dsa_free(area, meminfo[i].ident);
+
+ if (DsaPointerIsValid(meminfo[i].path))
+ dsa_free(area, meminfo[i].path);
+ }
+}
void *
palloc(Size size)
{
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 9e803d610d..274c33a934 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8480,6 +8480,16 @@
prorettype => 'bool', proargtypes => 'int4',
prosrc => 'pg_log_backend_memory_contexts' },
+# publishing memory contexts of the specified postgres process
+{ oid => '2173', descr => 'publish memory contexts of the specified backend',
+ proname => 'pg_get_process_memory_contexts', provolatile => 'v',
+ prorows => '100', proretset => 't', proparallel => 'r',
+ prorettype => 'record', proargtypes => 'int4 bool int4',
+ proallargtypes => '{int4,bool,int4,text,text,text,_int4,int8,int8,int8,int8,int8,int4,timestamptz}',
+ proargmodes => '{i,i,i,o,o,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{oid, summary, num_of_tries, name, ident, type, path, total_bytes, total_nblocks, free_bytes, free_chunks, used_bytes, num_agg_contexts, stats_timestamp}',
+ prosrc => 'pg_get_process_memory_contexts' },
+
# non-persistent series generator
{ oid => '1066', descr => 'non-persistent series generator',
proname => 'generate_series', prorows => '1000',
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index a2b63495ee..3dc3dcfb6c 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -96,6 +96,7 @@ extern PGDLLIMPORT volatile sig_atomic_t IdleSessionTimeoutPending;
extern PGDLLIMPORT volatile sig_atomic_t ProcSignalBarrierPending;
extern PGDLLIMPORT volatile sig_atomic_t LogMemoryContextPending;
extern PGDLLIMPORT volatile sig_atomic_t IdleStatsUpdateTimeoutPending;
+extern PGDLLIMPORT volatile sig_atomic_t PublishMemoryContextPending;
extern PGDLLIMPORT volatile sig_atomic_t CheckClientConnectionPending;
extern PGDLLIMPORT volatile sig_atomic_t ClientConnectionLost;
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index 022fd8ed93..477ab99338 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -35,6 +35,7 @@ typedef enum
PROCSIG_WALSND_INIT_STOPPING, /* ask walsenders to prepare for shutdown */
PROCSIG_BARRIER, /* global barrier interrupt */
PROCSIG_LOG_MEMORY_CONTEXT, /* ask backend to log the memory contexts */
+ PROCSIG_GET_MEMORY_CONTEXT, /* ask backend to publish the memory contexts */
PROCSIG_PARALLEL_APPLY_MESSAGE, /* Message from parallel apply workers */
/* Recovery conflict reasons */
diff --git a/src/include/utils/memutils.h b/src/include/utils/memutils.h
index 8abc26abce..610e83e714 100644
--- a/src/include/utils/memutils.h
+++ b/src/include/utils/memutils.h
@@ -18,6 +18,9 @@
#define MEMUTILS_H
#include "nodes/memnodes.h"
+#include "storage/condition_variable.h"
+#include "storage/lmgr.h"
+#include "utils/dsa.h"
/*
@@ -48,7 +51,12 @@
#define AllocHugeSizeIsValid(size) ((Size) (size) <= MaxAllocHugeSize)
+#define MEMORY_CONTEXT_IDENT_DISPLAY_SIZE 128
+#define MEM_CONTEXT_SHMEM_STATS_SIZE 30
+#define MEM_CONTEXT_MAX_LEVEL 64
+#define MAX_TYPE_STRING_LENGTH 64
+#define MAX_NUM_DEFAULT_SEGMENTS 8
/*
* Standard top-level memory contexts.
*
@@ -319,4 +327,70 @@ pg_memory_is_all_zeros(const void *ptr, size_t len)
return true;
}
+/* Dynamic shared memory state for statistics per context */
+typedef struct MemoryContextEntry
+{
+ dsa_pointer name;
+ dsa_pointer ident;
+ dsa_pointer path;
+ const char *type;
+ int path_length;
+ int64 totalspace;
+ int64 nblocks;
+ int64 freespace;
+ int64 freechunks;
+ int num_agg_stats;
+} MemoryContextEntry;
+
+/*
+ * Per backend static shared memory state for memory
+ * context statistics reporting.
+ */
+typedef struct MemoryContextBackendState
+{
+ ConditionVariable memctx_cv;
+ LWLock lw_lock;
+ int proc_id;
+ int num_individual_stats;
+ int total_stats;
+ int prev_total_stats;
+ bool get_summary;
+ dsa_pointer memstats_dsa_pointer;
+ dsa_pointer memstats_prev_dsa_pointer;
+ TimestampTz stats_timestamp;
+} MemoryContextBackendState;
+
+/*
+ * Static shared memory state representing the DSA area
+ * created for memory context statistics reporting.
+ * A single DSA area is created and used by all the processes,
+ * each having its specific allocations for sharing memory
+ * stats, tracked by per backend static shared memory state
+ * above.
+ */
+typedef struct MemoryContextState
+{
+ dsa_handle memstats_dsa_handle;
+ LWLock lw_lock;
+} MemoryContextState;
+
+/*
+ * MemoryContextId
+ * Used for storage of transient identifiers for
+ * pg_get_backend_memory_contexts.
+ */
+typedef struct MemoryContextId
+{
+ MemoryContext context;
+ int context_id;
+} MemoryContextId;
+
+extern PGDLLIMPORT MemoryContextBackendState * memCtxState;
+extern PGDLLIMPORT MemoryContextState *memCtxArea;
+extern void ProcessGetMemoryContextInterrupt(void);
+extern const char *AssignContextType(NodeTag type);
+extern void HandleGetMemoryContextInterrupt(void);
+extern void MemCtxShmemInit(void);
+extern void MemCtxBackendShmemInit(void);
+
#endif /* MEMUTILS_H */
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 83228cfca2..124bd5b4e5 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -232,3 +232,17 @@ select * from pg_timezone_abbrevs where abbrev = 'LMT';
LMT | @ 7 hours 52 mins 58 secs ago | f
(1 row)
+DO $$
+DECLARE
+ checkpointer_pid int;
+ r RECORD;
+BEGIN
+ SELECT pid from pg_stat_activity where backend_type='checkpointer'
+ INTO checkpointer_pid;
+
+ select type, name, ident
+ from pg_get_process_memory_contexts(checkpointer_pid, false)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+END $$;
+NOTICE: (AllocSet,TopMemoryContext,)
diff --git a/src/test/regress/sql/sysviews.sql b/src/test/regress/sql/sysviews.sql
index 66179f026b..0a4cc3bf4d 100644
--- a/src/test/regress/sql/sysviews.sql
+++ b/src/test/regress/sql/sysviews.sql
@@ -101,3 +101,17 @@ select count(distinct utc_offset) >= 24 as ok from pg_timezone_abbrevs;
-- One specific case we can check without much fear of breakage
-- is the historical local-mean-time value used for America/Los_Angeles.
select * from pg_timezone_abbrevs where abbrev = 'LMT';
+
+DO $$
+DECLARE
+ checkpointer_pid int;
+ r RECORD;
+BEGIN
+ SELECT pid from pg_stat_activity where backend_type='checkpointer'
+ INTO checkpointer_pid;
+
+ select type, name, ident
+ from pg_get_process_memory_contexts(checkpointer_pid, false)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+END $$;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index bce4214503..84bf5c1d88 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1632,8 +1632,10 @@ MemoryContextCallback
MemoryContextCallbackFunction
MemoryContextCounters
MemoryContextData
+MemoryContextEntry
MemoryContextMethodID
MemoryContextMethods
+MemoryContextState
MemoryStatsPrintFunc
MergeAction
MergeActionState
--
2.34.1
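For readers following the size-limit logic in the patch above, here is a minimal standalone sketch, not the patch's code, of how a fixed byte budget caps the number of records and how the overflow is folded into a final "Remaining Totals" record. StatEntry, compute_max_stats and publish_with_overflow are hypothetical names, a plain array stands in for the DSA allocation, and the boundary handling is simplified compared to ProcessGetMemoryContextInterrupt.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for one published record (not MemoryContextEntry). */
typedef struct StatEntry
{
    char    name[32];
    int64_t totalspace;
    int     num_agg_stats;  /* number of contexts folded into this record */
} StatEntry;

/* How many records fit in a byte budget, given an assumed per-record cost. */
static int
compute_max_stats(size_t budget_bytes, size_t per_record_bytes)
{
    assert(per_record_bytes > 0);
    return (int) (budget_bytes / per_record_bytes);
}

/*
 * Write n_sizes context sizes into out[], which has room for max_stats
 * records. If everything fits, each context gets its own record. Otherwise
 * the first max_stats - 1 contexts are reported individually and the rest
 * are summed into the last record. Returns the number of records written.
 */
static int
publish_with_overflow(const int64_t *sizes, int n_sizes,
                      StatEntry *out, int max_stats)
{
    int n_individual = (n_sizes <= max_stats) ? n_sizes : max_stats - 1;

    assert(max_stats >= 2 && n_sizes >= 0);
    memset(out, 0, sizeof(StatEntry) * (size_t) max_stats);

    for (int i = 0; i < n_individual; i++)
    {
        snprintf(out[i].name, sizeof(out[i].name), "context %d", i + 1);
        out[i].totalspace = sizes[i];
        out[i].num_agg_stats = 1;
    }
    if (n_individual == n_sizes)
        return n_individual;

    /* Fold everything that did not fit into the last record. */
    StatEntry *agg = &out[max_stats - 1];

    strcpy(agg->name, "Remaining Totals");
    for (int i = n_individual; i < n_sizes; i++)
    {
        agg->totalspace += sizes[i];
        agg->num_agg_stats++;
    }
    return max_stats;
}
```

With room for three records and five contexts, the first two are reported individually and the third record carries the sum of the remaining three, with num_agg_stats set to 3.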
Hi,
Please find attached the updated patches after some cleanup and test
fixes.
Thank you,
Rahila Syed
On Tue, Feb 18, 2025 at 6:35 PM Rahila Syed <rahilasyed90@gmail.com> wrote:
Hi,
Thanks for updating the patch!
The below comments would be a bit too detailed at this stage, but I’d
like to share the points I noticed.
Thanks for sharing the detailed comments. I have incorporated some of them
into the new version of the patch. I will include the rest when I refine
and comment the code further.
Meanwhile, I have fixed the following outstanding issues:
1. Currently one DSA is created per backend when the first request for
statistics is made and remains for the lifetime of the server.
I think I should add logic to periodically destroy DSAs, when memory
context statistics are not being *actively* queried from the backend,
as determined by the statistics timestamp.
After an offline discussion with Andres and Tomas, I have fixed this to
use only one DSA for all the publishing backends/processes. Each backend
allocates smaller chunks of memory within the DSA while publishing
statistics.
These chunks are tracked independently by each backend, ensuring that two
publishing backends/processes do not block each other despite using the
same DSA. This approach eliminates the overhead of creating multiple DSAs,
one for each backend.
I am not destroying the DSA area because it stores the previously
published statistics for each process. This allows the system to display
older statistics when the latest data cannot be retrieved within a
reasonable time.
Only the most recently updated statistics are kept, while all earlier ones
are freed using dsa_free by each backend when they are no longer needed.
2. The two issues reported by Fujii-san here: [1].
i. I have proposed a fix for the first issue here [2].
ii. I am able to reproduce the second issue. This happens when we try
to query statistics of a backend running infinite_recurse.sql. While I am
working on finding a root-cause, I think it happens due to some memory
being overwritten due to a stack-depth violation, as the issue is not
seen when I reduce the max_stack_depth to 100kb.
The second issue is also resolved by using smaller allocations within a
DSA.
Previously, it occurred because a few statically allocated strings were
placed within a single large chunk of DSA allocation. I have changed this
to use dynamically allocated chunks with dsa_allocate0 within the same DSA.
Please find attached updated and rebased patches.
Thank you,
Rahila Syed
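The "keep only the latest and the previous statistics" scheme described in the message above can be sketched roughly as follows. This is an illustration under stated assumptions, not the patch's code: Slot, slot_publish and slot_reset are hypothetical names, and calloc/free stand in for dsa_allocate0/dsa_free.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-process slot; the patch keeps dsa_pointers instead. */
typedef struct Slot
{
    int *current;       /* latest published stats, or NULL */
    int *previous;      /* stats from the publish before that, or NULL */
    int  current_len;
    int  previous_len;
} Slot;

/*
 * Publish a new snapshot of n values. The old "previous" buffer is freed,
 * the old "current" buffer becomes "previous", and the fresh copy becomes
 * "current", so at most two snapshots are alive per slot.
 */
static void
slot_publish(Slot *slot, const int *values, int n)
{
    int *fresh;

    assert(n > 0);
    fresh = calloc((size_t) n, sizeof(int));
    assert(fresh != NULL);
    for (int i = 0; i < n; i++)
        fresh[i] = values[i];

    free(slot->previous);           /* free(NULL) is a no-op */
    slot->previous = slot->current;
    slot->previous_len = slot->current_len;
    slot->current = fresh;
    slot->current_len = n;
}

/* Release both snapshots and return the slot to its empty state. */
static void
slot_reset(Slot *slot)
{
    free(slot->current);
    free(slot->previous);
    slot->current = slot->previous = NULL;
    slot->current_len = slot->previous_len = 0;
}
```

Because each slot only touches its own two buffers, two publishers working on different slots never need to coordinate beyond the allocator itself, which is the property the message attributes to the single-DSA design.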
Attachments:
v14-0002-Function-to-report-memory-context-statistics.patch (application/octet-stream)
From 77f76d86e61a9bcfc4abe8a61a4441ba0f1afd76 Mon Sep 17 00:00:00 2001
From: Rahila Syed <rahilasyed.90@gmail.com>
Date: Mon, 3 Feb 2025 15:37:17 +0530
Subject: [PATCH 2/2] Function to report memory context statistics
This function sends a signal to a backend to publish
statistics of all its memory contexts. The signal handler
running in the backend process sets a flag, which causes
it to copy its MemoryContextStats to a DSA during the
next call to CHECK_FOR_INTERRUPTS().
If there are more statistics than fit in 16MB, the
remaining statistics are copied as a cumulative
total of the remaining contexts.
Once it's done, it signals the client backend using
a condition variable. The client backend wakes up, reads
the shared memory and returns these values in the form
of a set of records, one for each memory context, to the
user, followed by a cumulative total of the remaining
contexts, if any.
If get_summary is true, return statistics of all children
of TopMemoryContext with aggregated statistics of their
children.
The user can pass num_of_tries, which determines the total
number of wait cycles in a client backend for the latest
statistics.
Each cycle's wait timeout is set to 1 second. After this,
the client displays previously published statistics or
returns without results.
Each backend and auxiliary process has its own slot for
reporting the stats. There is an array of such memory slots,
of size MaxBackends + the number of auxiliary
processes, in fixed shared memory. Each of these slots points
to smaller dsa allocations within a single DSA,
which contain the stats to be shared by the corresponding
process.
Each slot has its own LW lock and condition variable for
synchronization and communication between the publishing
process and the client backend.
---
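Note on the retry behaviour described in the commit message: a minimal sketch, not the patch's code, of the client-side decision after up to num_of_tries timed waits. WaitResult, wait_for_stats and the wait_once callback are hypothetical; in the patch the wait is a condition-variable sleep with a 1 second timeout, which the callback merely stands in for.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum WaitResult
{
    RESULT_FRESH,   /* new statistics arrived within the allowed tries */
    RESULT_STALE,   /* timed out, fall back to previously published stats */
    RESULT_NONE     /* timed out and nothing was ever published */
} WaitResult;

/*
 * Hypothetical callback: blocks for at most one timeout period and returns
 * true if the target process published new statistics in that time.
 */
typedef bool (*wait_once_fn) (void *arg);

static WaitResult
wait_for_stats(int num_of_tries, bool have_previous,
               wait_once_fn wait_once, void *arg)
{
    assert(wait_once != NULL);

    for (int attempt = 0; attempt < num_of_tries; attempt++)
    {
        if (wait_once(arg))
            return RESULT_FRESH;
    }
    return have_previous ? RESULT_STALE : RESULT_NONE;
}

/* Demo callback: pretends the publisher responds on the N-th wait. */
static bool
responds_on_nth_wait(void *arg)
{
    int *waits_left = arg;

    return --(*waits_left) <= 0;
}
```

A larger num_of_tries trades latency for a better chance of fresh data, while a smaller value returns the fallback result sooner.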
doc/src/sgml/func.sgml | 41 ++
src/backend/postmaster/autovacuum.c | 4 +
src/backend/postmaster/checkpointer.c | 4 +
src/backend/postmaster/interrupt.c | 4 +
src/backend/postmaster/pgarch.c | 4 +
src/backend/postmaster/startup.c | 4 +
src/backend/postmaster/walsummarizer.c | 4 +
src/backend/storage/ipc/ipci.c | 3 +
src/backend/storage/ipc/procsignal.c | 3 +
src/backend/tcop/postgres.c | 3 +
.../utils/activity/wait_event_names.txt | 1 +
src/backend/utils/adt/mcxtfuncs.c | 451 +++++++++++++++-
src/backend/utils/init/globals.c | 1 +
src/backend/utils/mmgr/mcxt.c | 489 +++++++++++++++++-
src/include/catalog/pg_proc.dat | 10 +
src/include/miscadmin.h | 1 +
src/include/storage/procsignal.h | 1 +
src/include/utils/memutils.h | 74 +++
src/test/regress/expected/sysviews.out | 14 +
src/test/regress/sql/sysviews.sql | 14 +
src/tools/pgindent/typedefs.list | 2 +
21 files changed, 1103 insertions(+), 29 deletions(-)
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index df32ee0bf5..b6d50a6949 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -28414,6 +28414,47 @@ acl | {postgres=arwdDxtm/postgres,foo=r/postgres}
</para></entry>
</row>
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>pg_get_process_memory_contexts</primary>
+ </indexterm>
+ <function>pg_get_process_memory_contexts</function> ( <parameter>pid</parameter> <type>integer</type>, <parameter>get_summary</parameter> <type>boolean</type>, <parameter>num_of_tries</parameter> <type>integer</type> )
+ <returnvalue>setof record</returnvalue>
+ </para>
+ <para>
+ This function handles requests to display the memory contexts of a
+ PostgreSQL process with the specified process ID (PID). It takes three
+ arguments: <parameter>PID</parameter>, <parameter>get_summary</parameter>
+ and <parameter>num_of_tries</parameter>. The function can send requests
+ to both backend and auxiliary processes.
+
+ After receiving memory context statistics from the target process, it
+ returns the results as one row per context. The num_agg_contexts
+ column indicates the number of contexts aggregated in the displayed
+ statistics.
+
+ When <parameter>get_summary</parameter> is set to true, statistics
+ for memory contexts at levels 1 and 2 are displayed, with level 1
+ representing the root node (i.e., TopMemoryContext).
+ Each level 2 context's statistics represent an aggregate of all its
+ child contexts' statistics, with num_agg_contexts indicating the number
+ of these aggregated child contexts.
+
+ When <parameter>get_summary</parameter> is set to false, the
+ num_agg_contexts value is 1, indicating that individual statistics are
+ being displayed.
+
+ <parameter>num_of_tries</parameter> indicates the number of times
+ the client will wait for the latest statistics. The wait per try is 1
+ second. This parameter can be increased if the user anticipates a delay
+ in the response from the reporting process. Conversely, if users are
+ frequently and periodically querying the process for statistics, or if
+ there are concurrent requests for statistics of the same process,
+ lowering the parameter might help achieve a faster response.
+ </para></entry>
+ </row>
+
<row>
<entry role="func_table_entry"><para role="func_signature">
<indexterm>
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index ade2708b59..a227b5e89f 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -779,6 +779,10 @@ HandleAutoVacLauncherInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
/* Process sinval catchup interrupts that happened while sleeping */
ProcessCatchupInterrupt();
}
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index b94f9cdff2..33c3c2d9c6 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -661,6 +661,10 @@ HandleCheckpointerInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/postmaster/interrupt.c b/src/backend/postmaster/interrupt.c
index be69e4c713..9481a5cd24 100644
--- a/src/backend/postmaster/interrupt.c
+++ b/src/backend/postmaster/interrupt.c
@@ -48,6 +48,10 @@ HandleMainLoopInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 12ee815a62..cd1ecb6b93 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -865,6 +865,10 @@ HandlePgArchInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
if (ConfigReloadPending)
{
char *archiveLib = pstrdup(XLogArchiveLibrary);
diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c
index 59d213031b..d670954c4e 100644
--- a/src/backend/postmaster/startup.c
+++ b/src/backend/postmaster/startup.c
@@ -192,6 +192,10 @@ HandleStartupProcInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
diff --git a/src/backend/postmaster/walsummarizer.c b/src/backend/postmaster/walsummarizer.c
index ffbf043935..b1a5e86a85 100644
--- a/src/backend/postmaster/walsummarizer.c
+++ b/src/backend/postmaster/walsummarizer.c
@@ -876,6 +876,10 @@ HandleWalSummarizerInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 174eed7036..5eee04d52a 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -50,6 +50,7 @@
#include "storage/sinvaladt.h"
#include "utils/guc.h"
#include "utils/injection_point.h"
+#include "utils/memutils.h"
/* GUCs */
int shared_memory_type = DEFAULT_SHARED_MEMORY_TYPE;
@@ -340,6 +341,8 @@ CreateOrAttachShmemStructs(void)
StatsShmemInit();
WaitEventCustomShmemInit();
InjectionPointShmemInit();
+ MemCtxShmemInit();
+ MemCtxBackendShmemInit();
}
/*
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index 7401b6e625..e425b9eeb0 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -688,6 +688,9 @@ procsignal_sigusr1_handler(SIGNAL_ARGS)
if (CheckProcSignal(PROCSIG_LOG_MEMORY_CONTEXT))
HandleLogMemoryContextInterrupt();
+ if (CheckProcSignal(PROCSIG_GET_MEMORY_CONTEXT))
+ HandleGetMemoryContextInterrupt();
+
if (CheckProcSignal(PROCSIG_PARALLEL_APPLY_MESSAGE))
HandleParallelApplyMessageInterrupt();
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index f2f75aa0f8..8ae890e320 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3499,6 +3499,9 @@ ProcessInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
if (ParallelApplyMessagePending)
HandleParallelApplyMessages();
}
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index e199f07162..3674b5b7b6 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -159,6 +159,7 @@ WAL_RECEIVER_EXIT "Waiting for the WAL receiver to exit."
WAL_RECEIVER_WAIT_START "Waiting for startup process to send initial data for streaming replication."
WAL_SUMMARY_READY "Waiting for a new WAL summary to be generated."
XACT_GROUP_UPDATE "Waiting for the group leader to update transaction status at transaction end."
+MEM_CTX_PUBLISH "Waiting for backend to publish memory information."
ABI_compatibility:
diff --git a/src/backend/utils/adt/mcxtfuncs.c b/src/backend/utils/adt/mcxtfuncs.c
index 396c2f223b..cfbc6ee544 100644
--- a/src/backend/utils/adt/mcxtfuncs.c
+++ b/src/backend/utils/adt/mcxtfuncs.c
@@ -17,11 +17,18 @@
#include "funcapi.h"
#include "mb/pg_wchar.h"
+#include "miscadmin.h"
+#include "access/twophase.h"
+#include "catalog/pg_authid_d.h"
+#include "nodes/pg_list.h"
#include "storage/proc.h"
#include "storage/procarray.h"
+#include "utils/acl.h"
#include "utils/array.h"
#include "utils/builtins.h"
#include "utils/hsearch.h"
+#include "utils/memutils.h"
+#include "utils/wait_event_types.h"
/* ----------
* The max bytes for showing identifiers of MemoryContext.
@@ -29,16 +36,8 @@
*/
#define MEMORY_CONTEXT_IDENT_DISPLAY_SIZE 1024
-/*
- * MemoryContextId
- * Used for storage of transient identifiers for
- * pg_get_backend_memory_contexts.
- */
-typedef struct MemoryContextId
-{
- MemoryContext context;
- int context_id;
-} MemoryContextId;
+struct MemoryContextBackendState *memCtxState = NULL;
+struct MemoryContextState *memCtxArea = NULL;
/*
* int_list_to_array
@@ -143,24 +142,7 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
else
nulls[1] = true;
- switch (context->type)
- {
- case T_AllocSetContext:
- type = "AllocSet";
- break;
- case T_GenerationContext:
- type = "Generation";
- break;
- case T_SlabContext:
- type = "Slab";
- break;
- case T_BumpContext:
- type = "Bump";
- break;
- default:
- type = "???";
- break;
- }
+ type = AssignContextType(context->type);
values[2] = CStringGetTextDatum(type);
values[3] = Int32GetDatum(list_length(path)); /* level */
@@ -175,6 +157,32 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
list_free(path);
}
+const char *
+AssignContextType(NodeTag type)
+{
+ const char *context_type;
+
+ switch (type)
+ {
+ case T_AllocSetContext:
+ context_type = "AllocSet";
+ break;
+ case T_GenerationContext:
+ context_type = "Generation";
+ break;
+ case T_SlabContext:
+ context_type = "Slab";
+ break;
+ case T_BumpContext:
+ context_type = "Bump";
+ break;
+ default:
+ context_type = "???";
+ break;
+ }
+ return (context_type);
+}
+
/*
* pg_get_backend_memory_contexts
* SQL SRF showing backend memory context.
@@ -305,3 +313,390 @@ pg_log_backend_memory_contexts(PG_FUNCTION_ARGS)
PG_RETURN_BOOL(true);
}
+
+/*
+ * pg_get_process_memory_contexts
+ * Signal a backend or an auxiliary process to send its memory contexts,
+ * wait for the results and display them.
+ *
+ * By default, only superusers or users with PG_READ_ALL_STATS are allowed to
+ * request the memory contexts of another process, because allowing any user
+ * to issue this request at an unbounded rate could flood the target with
+ * signals and lead to denial of service. Additional roles can be permitted
+ * with GRANT.
+ *
+ * On receipt of this signal, a backend or an auxiliary process sets the flag
+ * in the signal handler, which causes the next CHECK_FOR_INTERRUPTS()
+ * or process-specific interrupt handler to copy the memory context details
+ * to a dynamic shared memory space.
+ *
+ * The shared memory buffer has a limited size - if the process has too many
+ * memory contexts, the memory contexts that do not fit are summarized
+ * and represented as a cumulative total at the end of the buffer.
+ *
+ * After sending the signal, wait on a condition variable. The publishing
+ * backend, after copying the data to shared memory, broadcasts on that
+ * condition variable.
+ * Once the wait on the condition variable returns, check if the memory context
+ * information is available for read and display.
+ *
+ * If the publishing backend does not respond before the condition variable
+ * times out, which is set to MEMSTATS_WAIT_TIMEOUT, retry for max_tries
+ * number of times, which is defined by user, before giving up and
+ * returning previously published statistics, if any.
+ */
+Datum
+pg_get_process_memory_contexts(PG_FUNCTION_ARGS)
+{
+ int pid = PG_GETARG_INT32(0);
+ bool get_summary = PG_GETARG_BOOL(1);
+ PGPROC *proc;
+ ProcNumber procNumber = INVALID_PROC_NUMBER;
+ int i;
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ dsa_area *area;
+ MemoryContextEntry *memctx_info;
+ int num_retries = 0;
+ TimestampTz curr_timestamp;
+ int max_tries = PG_GETARG_INT32(2);
+ bool prev_stats = false;
+
+ /*
+ * Only superusers or users with pg_read_all_stats privileges can view the
+ * memory context statistics of another process
+ */
+ if (!has_privs_of_role(GetUserId(), ROLE_PG_READ_ALL_STATS))
+ ereport(ERROR,
+ (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+ errmsg("memory context statistics privilege error")));
+
+ InitMaterializedSRF(fcinfo, 0);
+
+ /*
+ * See if the process with given pid is a backend or an auxiliary process.
+ */
+ proc = BackendPidGetProc(pid);
+ if (proc == NULL)
+ proc = AuxiliaryPidGetProc(pid);
+
+ /*
+ * BackendPidGetProc() and AuxiliaryPidGetProc() return NULL if the pid
+ * isn't valid; but by the time we reach kill(), a process for which we
+ * get a valid proc here might have terminated on its own. There's no way
+ * to acquire a lock on an arbitrary process to prevent that. But since
+ * this mechanism is usually used to debug a backend or an auxiliary
+ * process running and consuming lots of memory, that it might end on its
+ * own first and its memory contexts are not reported is not a problem.
+ */
+ if (proc == NULL)
+ {
+ /*
+ * This is just a warning so a loop-through-resultset will not abort
+ * if one backend terminated on its own during the run.
+ */
+ ereport(WARNING,
+ (errmsg("PID %d is not a PostgreSQL server process",
+ pid)));
+ PG_RETURN_NULL();
+ }
+
+ procNumber = GetNumberFromPGProc(proc);
+ if (procNumber == MyProcNumber)
+ {
+ ereport(WARNING,
+ (errmsg("cannot return statistics for local backend"),
+ errhint("Use pg_backend_memory_contexts view instead.")));
+ PG_RETURN_NULL();
+ }
+
+ LWLockAcquire(&memCtxState[procNumber].lw_lock, LW_EXCLUSIVE);
+ memCtxState[procNumber].get_summary = get_summary;
+ LWLockRelease(&memCtxState[procNumber].lw_lock);
+
+ curr_timestamp = GetCurrentTimestamp();
+
+ /*
+ * Send a signal to the PostgreSQL process, informing it that we want it to
+ * produce information about memory contexts.
+ */
+ if (SendProcSignal(pid, PROCSIG_GET_MEMORY_CONTEXT, procNumber) < 0)
+ {
+ ereport(WARNING,
+ (errmsg("could not send signal to process %d: %m", pid)));
+
+ goto end;
+ }
+
+ /*
+ * Wait for the PostgreSQL process to publish stats, indicated by a valid
+ * dsa pointer set by the backend. A dsa pointer could also be valid if
+ * statistics have previously been published by the backend. In that case,
+ * check whether the statistics are older than curr_timestamp; if they are,
+ * wait for newer statistics. Wait for max_tries * MEMSTATS_WAIT_TIMEOUT,
+ * after which older statistics are displayed, if available.
+ */
+ while (1)
+ {
+ long msecs;
+
+ /*
+ * We expect to come out of sleep when the requested process has
+ * finished publishing the statistics, verified using the valid dsa
+ * pointer.
+ *
+ * Make sure that the information belongs to the pid we requested
+ * information for; otherwise loop back and wait for the server
+ * process to finish publishing statistics.
+ */
+ LWLockAcquire(&memCtxState[procNumber].lw_lock, LW_EXCLUSIVE);
+ msecs =
+ TimestampDifferenceMilliseconds(curr_timestamp,
+ memCtxState[procNumber].stats_timestamp);
+
+ /*
+ * A note in procnumber.h says that a procNumber can be re-used for
+ * a different backend immediately after a backend exits. In case an
+ * old process' data was there and not updated by the current process
+ * in the slot identified by the procNumber, the pid of the requested
+ * process and the proc_id might not match.
+ */
+ if (memCtxState[procNumber].proc_id == pid)
+ {
+ /*
+ * Break if the latest stats have been read, indicated by
+ * statistics timestamp being newer than the current request
+ * timestamp.
+ */
+ if (DsaPointerIsValid(memCtxState[procNumber].memstats_dsa_pointer)
+ && msecs > 0)
+ {
+ LWLockRelease(&memCtxState[procNumber].lw_lock);
+ break;
+ }
+
+ }
+ LWLockRelease(&memCtxState[procNumber].lw_lock);
+
+ /*
+ * Recheck the state of the backend before sleeping on the condition
+ * variable
+ */
+ proc = BackendPidGetProc(pid);
+
+#define MEMSTATS_WAIT_TIMEOUT 1000
+ if (proc == NULL)
+ proc = AuxiliaryPidGetProc(pid);
+ if (proc == NULL)
+ {
+ ereport(WARNING,
+ (errmsg("PID %d is not a PostgreSQL server process",
+ pid)));
+ goto end;
+ }
+ if (ConditionVariableTimedSleep(&memCtxState[procNumber].memctx_cv,
+ MEMSTATS_WAIT_TIMEOUT,
+ WAIT_EVENT_MEM_CTX_PUBLISH))
+ {
+ ereport(LOG,
+ (errmsg("wait for process %d to publish stats timed out, trying again",
+ pid)));
+
+ /*
+ * Wait for max_tries defined by user, display previously
+ * published statistics if any, when max_tries are over.
+ */
+ if (num_retries > max_tries)
+ {
+ LWLockAcquire(&memCtxState[procNumber].lw_lock, LW_EXCLUSIVE);
+ if (DsaPointerIsValid(memCtxState[procNumber].memstats_prev_dsa_pointer))
+ {
+ prev_stats = true;
+ LWLockRelease(&memCtxState[procNumber].lw_lock);
+ break;
+ }
+ else
+ {
+ LWLockRelease(&memCtxState[procNumber].lw_lock);
+ goto end;
+ }
+ }
+ num_retries = num_retries + 1;
+ }
+
+ }
+ /* XXX. Check if this lock is required */
+ LWLockAcquire(&memCtxArea->lw_lock, LW_EXCLUSIVE);
+ /* Assert for dsa_handle to be valid */
+ area = dsa_attach(memCtxArea->memstats_dsa_handle);
+ /* We should land here only with a valid memstats_dsa_pointer */
+
+ LWLockRelease(&memCtxArea->lw_lock);
+
+ /*
+ * Backend has finished publishing the stats, read them
+ *
+ * Read statistics of top level 1 and 2 contexts, if get_summary is true.
+ */
+ LWLockAcquire(&memCtxState[procNumber].lw_lock, LW_EXCLUSIVE);
+ if (prev_stats == true)
+ memctx_info = (MemoryContextEntry *) dsa_get_address(area,
+ memCtxState[procNumber].memstats_prev_dsa_pointer);
+ else
+ memctx_info = (MemoryContextEntry *) dsa_get_address(area,
+ memCtxState[procNumber].memstats_dsa_pointer);
+
+#define PG_GET_PROCESS_MEMORY_CONTEXTS_COLS 11
+ for (i = 0; i < memCtxState[procNumber].num_individual_stats; i++)
+ {
+ ArrayType *path_array;
+ int path_length;
+ Datum values[PG_GET_PROCESS_MEMORY_CONTEXTS_COLS];
+ bool nulls[PG_GET_PROCESS_MEMORY_CONTEXTS_COLS];
+ char *name;
+ char *ident;
+ Datum *path_datum_array;
+
+ memset(values, 0, sizeof(values));
+ memset(nulls, 0, sizeof(nulls));
+
+ if (DsaPointerIsValid(memctx_info[i].name))
+ {
+ name = (char *) dsa_get_address(area, memctx_info[i].name);
+ values[0] = CStringGetTextDatum(name);
+ }
+ else
+ nulls[0] = true;
+ if (DsaPointerIsValid(memctx_info[i].ident))
+ {
+ ident = (char *) dsa_get_address(area, memctx_info[i].ident);
+ values[1] = CStringGetTextDatum(ident);
+ }
+ else
+ nulls[1] = true;
+
+ values[2] = CStringGetTextDatum(memctx_info[i].type);
+
+ path_length = memctx_info[i].path_length;
+
+ path_datum_array = (Datum *) dsa_get_address(area, memctx_info[i].path);
+ path_array = construct_array_builtin(path_datum_array,
+ path_length, INT4OID);
+ values[3] = PointerGetDatum(path_array);
+ values[4] = Int64GetDatum(memctx_info[i].totalspace);
+ values[5] = Int64GetDatum(memctx_info[i].nblocks);
+ values[6] = Int64GetDatum(memctx_info[i].freespace);
+ values[7] = Int64GetDatum(memctx_info[i].freechunks);
+ values[8] = Int64GetDatum(memctx_info[i].totalspace -
+ memctx_info[i].freespace);
+ values[9] = Int32GetDatum(memctx_info[i].num_agg_stats);
+ values[10] = TimestampTzGetDatum(memCtxState[procNumber].stats_timestamp);
+
+ tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc,
+ values, nulls);
+ }
+
+ /* If there are more contexts, display a cumulative total of those */
+ if (memCtxState[procNumber].total_stats > i)
+ {
+ Datum values[PG_GET_PROCESS_MEMORY_CONTEXTS_COLS] = {0};
+ bool nulls[PG_GET_PROCESS_MEMORY_CONTEXTS_COLS] = {0};
+ char *name;
+
+ name = (char *) dsa_get_address(area, memctx_info[i].name);
+ values[0] = CStringGetTextDatum(name);
+ nulls[1] = true;
+ nulls[2] = true;
+ nulls[3] = true;
+ values[4] = Int64GetDatum(memctx_info[i].totalspace);
+ values[5] = Int64GetDatum(memctx_info[i].nblocks);
+ values[6] = Int64GetDatum(memctx_info[i].freespace);
+ values[7] = Int64GetDatum(memctx_info[i].freechunks);
+ values[8] = Int64GetDatum(memctx_info[i].totalspace - memctx_info[i].freespace);
+ values[9] = Int32GetDatum(memctx_info[i].num_agg_stats);
+
+ tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc, values, nulls);
+ }
+
+ LWLockRelease(&memCtxState[procNumber].lw_lock);
+
+ ConditionVariableCancelSleep();
+ dsa_detach(area);
+
+end:
+ PG_RETURN_NULL();
+}
+
+/*
+ * Shared memory sizing for reporting memory context information.
+ */
+static Size
+MemCtxShmemSize(void)
+{
+ Size TotalProcs =
+ add_size(MaxBackends, add_size(NUM_AUXILIARY_PROCS, max_prepared_xacts));
+
+ return mul_size(TotalProcs, sizeof(MemoryContextBackendState));
+}
+
+/*
+ * Init shared memory for reporting memory context information.
+ */
+void
+MemCtxBackendShmemInit(void)
+{
+ bool found;
+ Size TotalProcs =
+ add_size(MaxBackends, add_size(NUM_AUXILIARY_PROCS, max_prepared_xacts));
+
+ memCtxState = (MemoryContextBackendState *) ShmemInitStruct("MemoryContextBackendState",
+ MemCtxShmemSize(),
+ &found);
+ if (!IsUnderPostmaster)
+ {
+ Assert(!found);
+
+ for (int i = 0; i < TotalProcs; i++)
+ {
+ ConditionVariableInit(&memCtxState[i].memctx_cv);
+
+ LWLockInitialize(&memCtxState[i].lw_lock,
+ LWLockNewTrancheId());
+ LWLockRegisterTranche(memCtxState[i].lw_lock.tranche,
+ "mem_context_backend_stats_reporting");
+
+ memCtxState[i].memstats_dsa_pointer = InvalidDsaPointer;
+ memCtxState[i].memstats_prev_dsa_pointer = InvalidDsaPointer;
+ }
+ }
+ else
+ {
+ Assert(found);
+ }
+}
+
+/*
+ * Initialize shared memory for displaying memory
+ * context statistics
+ */
+void
+MemCtxShmemInit(void)
+{
+ bool found;
+
+ memCtxArea = (MemoryContextState *) ShmemInitStruct("MemoryContextState", sizeof(MemoryContextState),
+ &found);
+ if (!IsUnderPostmaster)
+ {
+ Assert(!found);
+
+ LWLockInitialize(&memCtxArea->lw_lock,
+ LWLockNewTrancheId());
+ LWLockRegisterTranche(memCtxArea->lw_lock.tranche,
+ "mem_context_stats_reporting");
+ memCtxArea->memstats_dsa_handle = DSA_HANDLE_INVALID;
+ }
+ else
+ {
+ Assert(found);
+ }
+}
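Note on the wait loop in pg_get_process_memory_contexts() above: the
requester accepts a published result only when the slot belongs to the
requested pid, a result is present, and the result is newer than the request.
A minimal stand-alone sketch of that predicate follows. DemoSlot and
demo_stats_are_fresh are hypothetical names for illustration and are not part
of the patch; the sketch compares microsecond timestamps directly, whereas the
patch tests for a positive millisecond difference.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for one per-process slot. */
typedef struct DemoSlot
{
	int			proc_id;		/* pid that last published into this slot */
	bool		has_stats;		/* models a valid dsa pointer */
	int64_t		stats_time_us;	/* when the stats were published */
} DemoSlot;

/*
 * Return true only if the slot holds statistics published by requested_pid
 * after the request was issued.
 */
static bool
demo_stats_are_fresh(const DemoSlot *slot, int requested_pid,
					 int64_t request_time_us)
{
	assert(slot != NULL);

	/* The slot may have been reused by a different process. */
	if (slot->proc_id != requested_pid)
		return false;

	/* Nothing has been published yet. */
	if (!slot->has_stats)
		return false;

	/* Results from an earlier request do not count. */
	return slot->stats_time_us > request_time_us;
}
```

Checking the pid first mirrors the concern about procNumber reuse: a slot can
carry valid-looking data that was written by a process that has since exited.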
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index b844f9fdae..13938ccb0f 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -38,6 +38,7 @@ volatile sig_atomic_t TransactionTimeoutPending = false;
volatile sig_atomic_t IdleSessionTimeoutPending = false;
volatile sig_atomic_t ProcSignalBarrierPending = false;
volatile sig_atomic_t LogMemoryContextPending = false;
+volatile sig_atomic_t PublishMemoryContextPending = false;
volatile sig_atomic_t IdleStatsUpdateTimeoutPending = false;
volatile uint32 InterruptHoldoffCount = 0;
volatile uint32 QueryCancelHoldoffCount = 0;
diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index 946a3731fd..28bb6e2200 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -19,16 +19,22 @@
*-------------------------------------------------------------------------
*/
+#include <math.h>
#include "postgres.h"
#include "mb/pg_wchar.h"
#include "miscadmin.h"
+#include "nodes/pg_list.h"
+#include "storage/fd.h"
+#include "storage/lwlock.h"
+#include "storage/dsm.h"
+#include "utils/dsa.h"
+#include "utils/hsearch.h"
#include "utils/memdebug.h"
#include "utils/memutils.h"
#include "utils/memutils_internal.h"
#include "utils/memutils_memorychunk.h"
-
static void BogusFree(void *pointer);
static void *BogusRealloc(void *pointer, Size size, int flags);
static MemoryContext BogusGetChunkContext(void *pointer);
@@ -164,6 +170,7 @@ MemoryContext CacheMemoryContext = NULL;
MemoryContext MessageContext = NULL;
MemoryContext TopTransactionContext = NULL;
MemoryContext CurTransactionContext = NULL;
+static dsa_area *area = NULL;
/* This is a transient link to the active portal's memory context: */
MemoryContext PortalContext = NULL;
@@ -177,6 +184,17 @@ static void MemoryContextStatsInternal(MemoryContext context, int level,
static void MemoryContextStatsPrint(MemoryContext context, void *passthru,
const char *stats_string,
bool print_to_stderr);
+static void PublishMemoryContext(MemoryContextEntry *memctx_infos,
+ int curr_id, MemoryContext context,
+ List *path,
+ MemoryContextCounters stat,
+ int num_contexts, dsa_area *area);
+static void compute_num_of_contexts(List *contexts, HTAB *context_id_lookup,
+ int *stats_count,
+ bool get_summary);
+static List *compute_context_path(MemoryContext c, HTAB *context_id_lookup);
+static void dsa_free_previous_stats(dsa_area *area, int total_stats, dsa_pointer prev_dsa_pointer);
+
/*
* You should not do memory allocations within a critical section, because
@@ -1321,6 +1339,21 @@ HandleLogMemoryContextInterrupt(void)
/* latch will be set by procsignal_sigusr1_handler */
}
+/*
+ * HandleGetMemoryContextInterrupt
+ * Handle receipt of an interrupt indicating publishing of memory
+ * contexts.
+ *
+ * All the actual work is deferred to ProcessGetMemoryContextInterrupt()
+ */
+void
+HandleGetMemoryContextInterrupt(void)
+{
+ InterruptPending = true;
+ PublishMemoryContextPending = true;
+ /* latch will be set by procsignal_sigusr1_handler */
+}
+
/*
* ProcessLogMemoryContextInterrupt
* Perform logging of memory contexts of this backend process.
@@ -1358,6 +1391,460 @@ ProcessLogMemoryContextInterrupt(void)
MemoryContextStatsDetail(TopMemoryContext, 100, 100, false);
}
+/*
+ * ProcessGetMemoryContextInterrupt
+ * Generate information about memory contexts used by the process.
+ *
+ * Performs a breadth first search on the memory context tree, so a parent's
+ * statistics are reported before its children in the monitoring function output.
+ *
+ * Statistics per context for all the processes are shared via the same dynamic
+ * shared area. The statistics for contexts that exceed the pre-determined size
+ * limit are captured as a cumulative total at the end of individual statistics.
+ *
+ * If get_summary is true, we traverse the memory context tree recursively in
+ * depth first search manner to cover all the children of a parent context, to be
+ * able to display a cumulative total of memory consumption by a parent.
+ */
+void
+ProcessGetMemoryContextInterrupt(void)
+{
+ List *contexts;
+
+ HASHCTL ctl;
+ HTAB *context_id_lookup;
+ int context_id = 0;
+ MemoryContext stat_cxt;
+ MemoryContextEntry *meminfo;
+ bool get_summary = false;
+
+ /* dsa_area *area = NULL; */
+ int max_stats;
+ int idx = MyProcNumber;
+ int stats_count = 0;
+ MemoryContextCounters stat;
+
+ PublishMemoryContextPending = false;
+
+ /*
+ * Make a new context that will contain the hash table, to ease the
+ * cleanup.
+ */
+ stat_cxt = AllocSetContextCreate(CurrentMemoryContext,
+ "Memory context statistics",
+ ALLOCSET_DEFAULT_SIZES);
+
+ /*
+ * The hash table used for constructing "path" column of the view, similar
+ * to its local backend counterpart.
+ */
+ ctl.keysize = sizeof(MemoryContext);
+ ctl.entrysize = sizeof(MemoryContextId);
+ ctl.hcxt = stat_cxt;
+
+ context_id_lookup = hash_create("pg_get_remote_backend_memory_contexts",
+ 256,
+ &ctl,
+ HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+
+ /* List of contexts to process in the next round - start at the top. */
+ contexts = list_make1(TopMemoryContext);
+
+ /* Compute the number of stats that can fit in the defined limit */
+ max_stats = (MAX_NUM_DEFAULT_SEGMENTS * DSA_DEFAULT_INIT_SEGMENT_SIZE)
+ / (sizeof(MemoryContextEntry) + (MEM_CONTEXT_MAX_LEVEL
+ * sizeof(Datum)) + (2 * MEMORY_CONTEXT_IDENT_SHMEM_SIZE));
+
+ elog(LOG, "Maximum statistics %d", max_stats);
+ LWLockAcquire(&memCtxState[idx].lw_lock, LW_EXCLUSIVE);
+ get_summary = memCtxState[idx].get_summary;
+ LWLockRelease(&memCtxState[idx].lw_lock);
+
+ /*
+ * Traverse the memory context tree to find total number of contexts. If
+ * summary is requested report the total number of contexts at level 1 and
+ * 2 from the top. Also, populate the hash table of context ids.
+ */
+ compute_num_of_contexts(contexts, context_id_lookup, &stats_count,
+ get_summary);
+
+ /*
+ * Allocate memory in this process's dsa for storing statistics of the
+ * memory contexts up to max_stats; for contexts that don't fit within the
+ * limit, a cumulative total is written as the last record in the DSA
+ * segment.
+ */
+ stats_count = (stats_count > max_stats) ? max_stats : stats_count;
+
+ LWLockAcquire(&memCtxArea->lw_lock, LW_EXCLUSIVE);
+
+ /*
+ * Create a DSA and send the handle to the client process after storing
+ * the statistics. If the number of contexts exceeds a predefined limit (8MB),
+ * a cumulative total is stored for such contexts.
+ */
+ if (memCtxArea->memstats_dsa_handle == DSA_HANDLE_INVALID)
+ {
+ MemoryContext oldcontext = CurrentMemoryContext;
+ dsa_handle handle;
+
+ MemoryContextSwitchTo(TopMemoryContext);
+
+ area = dsa_create(memCtxArea->lw_lock.tranche);
+
+ handle = dsa_get_handle(area);
+ MemoryContextSwitchTo(oldcontext);
+
+ dsa_pin_mapping(area);
+
+ /*
+ * Pin the dsa area, this is to make sure the area remains attachable
+ * even if current backend exits. This is done so that a waiting
+ * client gets the stats even after a process exits.
+ */
+ dsa_pin(area);
+
+ /* Set the handle in shared memory */
+ memCtxArea->memstats_dsa_handle = handle;
+ }
+
+ /*
+ * If DSA exists, created by another process publishing statistics, attach
+ * to it.
+ */
+ else if (area == NULL)
+ {
+ MemoryContext oldcontext = CurrentMemoryContext;
+
+ MemoryContextSwitchTo(TopMemoryContext);
+ area = dsa_attach(memCtxArea->memstats_dsa_handle);
+ MemoryContextSwitchTo(oldcontext);
+ dsa_pin_mapping(area);
+ }
+ LWLockRelease(&memCtxArea->lw_lock);
+
+ /*
+ * Hold the process specific lock to protect writes to process specific
+ * memory. This way two processes publishing statistics do not block each
+ * other.
+ */
+ LWLockAcquire(&memCtxState[idx].lw_lock, LW_EXCLUSIVE);
+ memCtxState[idx].proc_id = MyProcPid;
+
+ if (!DsaPointerIsValid(memCtxState[idx].memstats_dsa_pointer))
+ {
+ memCtxState[idx].memstats_dsa_pointer = dsa_allocate0(area, stats_count * sizeof(MemoryContextEntry));
+ }
+ else
+ {
+ /* Free any previous allocations */
+ if (DsaPointerIsValid(memCtxState[idx].memstats_prev_dsa_pointer))
+ {
+ /*
+ * Free the name, ident and path pointers before freeing the
+ * memory that contains them.
+ */
+ dsa_free_previous_stats(area, memCtxState[idx].prev_total_stats,
+ memCtxState[idx].memstats_prev_dsa_pointer);
+ dsa_free(area, memCtxState[idx].memstats_prev_dsa_pointer);
+ memCtxState[idx].memstats_prev_dsa_pointer = InvalidDsaPointer;
+ }
+ memCtxState[idx].memstats_prev_dsa_pointer = memCtxState[idx].memstats_dsa_pointer;
+ memCtxState[idx].prev_total_stats = memCtxState[idx].total_stats;
+ memCtxState[idx].memstats_dsa_pointer = dsa_allocate0(area,
+ stats_count * sizeof(MemoryContextEntry));
+ }
+ meminfo = (MemoryContextEntry *) dsa_get_address(area,
+ memCtxState[idx].memstats_dsa_pointer);
+
+ if (get_summary)
+ {
+ int ctx_id = 0;
+ List *path = NIL;
+
+ /* Copy TopMemoryContext statistics to DSA */
+ memset(&stat, 0, sizeof(stat));
+ (*TopMemoryContext->methods->stats) (TopMemoryContext, NULL, NULL,
+ &stat, true);
+ path = lcons_int(1, path);
+ PublishMemoryContext(meminfo, ctx_id, TopMemoryContext, path, stat,
+ 1, area);
+ ctx_id = ctx_id + 1;
+
+ /*
+ * Copy statistics for each of TopMemoryContext's children (XXX. Make it
+ * capped at 100). This includes statistics of all of their children
+ * up to level 100.
+ */
+
+ for (MemoryContext c = TopMemoryContext->firstchild; c != NULL;
+ c = c->nextchild)
+ {
+ MemoryContextCounters grand_totals;
+ int num_contexts = 0;
+
+ path = NIL;
+ memset(&grand_totals, 0, sizeof(grand_totals));
+
+ MemoryContextStatsInternal(c, 0, 100, 100, &grand_totals,
+ PRINT_STATS_NONE, &num_contexts);
+
+ path = compute_context_path(c, context_id_lookup);
+
+ PublishMemoryContext(meminfo, ctx_id, c, path,
+ grand_totals, num_contexts, area);
+ ctx_id = ctx_id + 1;
+ }
+ /* For summary mode, total_stats and num_individual_stats are the same */
+ memCtxState[idx].num_individual_stats = ctx_id;
+ memCtxState[idx].total_stats = ctx_id;
+ goto cleanup;
+ }
+ foreach_ptr(MemoryContextData, cur, contexts)
+ {
+ List *path = NIL;
+ char *name;
+
+ /*
+ * Figure out the transient context_id of this context and each of its
+ * ancestors.
+ */
+ path = compute_context_path(cur, context_id_lookup);
+
+ if (context_id <= (max_stats - 2))
+ {
+ /* Examine the context stats */
+ memset(&stat, 0, sizeof(stat));
+ (*cur->methods->stats) (cur, NULL, NULL, &stat, true);
+ /* Copy statistics to DSA memory */
+ PublishMemoryContext(meminfo, context_id, cur, path, stat, 1, area);
+ }
+ else
+ {
+ /* Examine the context stats */
+ memset(&stat, 0, sizeof(stat));
+ (*cur->methods->stats) (cur, NULL, NULL, &stat, true);
+
+ meminfo[max_stats - 1].totalspace += stat.totalspace;
+ meminfo[max_stats - 1].nblocks += stat.nblocks;
+ meminfo[max_stats - 1].freespace += stat.freespace;
+ meminfo[max_stats - 1].freechunks += stat.freechunks;
+ }
+
+ /*
+ * DSA max limit is reached, write aggregate of the remaining
+ * statistics.
+ */
+ if (context_id == (max_stats - 2) && context_id < (stats_count - 1))
+ {
+ memCtxState[idx].num_individual_stats = context_id + 1;
+ meminfo[max_stats - 1].name = dsa_allocate0(area, 17);
+ name = dsa_get_address(area, meminfo[max_stats - 1].name);
+ strncpy(name, "Remaining Totals", 16);
+ meminfo[max_stats - 1].ident = InvalidDsaPointer;
+ }
+ context_id++;
+ }
+ /* No aggregated contexts, individual statistics reported */
+ if (context_id < (max_stats - 2))
+ {
+ memCtxState[idx].num_individual_stats = context_id;
+ }
+ /* Report number of aggregated memory contexts */
+ else
+ {
+ meminfo[max_stats - 1].num_agg_stats = context_id -
+ memCtxState[idx].num_individual_stats;
+ }
+ memCtxState[idx].total_stats = context_id;
+cleanup:
+
+ /*
+ * Signal all the waiting client backends after setting the exit condition
+ * flag
+ */
+ memCtxState[idx].stats_timestamp = GetCurrentTimestamp();
+ LWLockRelease(&memCtxState[idx].lw_lock);
+ ConditionVariableBroadcast(&memCtxState[idx].memctx_cv);
+ /* Delete the hash table memory context */
+ MemoryContextDelete(stat_cxt);
+
+/* dsa_detach(area); */
+}
+
+/*
+ * Append the transient context_id of this context and each of
+ * its ancestors to a list, in order to compute a path.
+ */
+static List *
+compute_context_path(MemoryContext c, HTAB *context_id_lookup)
+{
+ bool found;
+ List *path = NIL;
+
+ for (MemoryContext cur_context = c; cur_context != NULL; cur_context = cur_context->parent)
+ {
+ MemoryContextId *cur_entry;
+
+ cur_entry = hash_search(context_id_lookup, &cur_context, HASH_FIND, &found);
+
+ if (!found)
+ {
+ elog(LOG, "hash table corrupted, can't construct path value");
+ break;
+ }
+ path = lcons_int(cur_entry->context_id, path);
+ }
+ return path;
+}
+
+/*
+ * Count the contexts currently allocated by the backend into *stats_count
+ * and assign a context id to each of them.
+ */
+static void
+compute_num_of_contexts(List *contexts, HTAB *context_id_lookup,
+ int *stats_count, bool get_summary)
+{
+ foreach_ptr(MemoryContextData, cur, contexts)
+ {
+ MemoryContextId *entry;
+ bool found;
+
+ entry = (MemoryContextId *) hash_search(context_id_lookup, &cur,
+ HASH_ENTER, &found);
+ Assert(!found);
+
+ /* context id starts with 1 */
+ entry->context_id = (++(*stats_count));
+
+ /* Append the children of the current context to the main list. */
+ for (MemoryContext c = cur->firstchild; c != NULL; c = c->nextchild)
+ {
+ if (get_summary)
+ {
+ entry = (MemoryContextId *) hash_search(context_id_lookup, &c,
+ HASH_ENTER, &found);
+ Assert(!found);
+
+ entry->context_id = (++(*stats_count));
+ }
+
+ contexts = lappend(contexts, c);
+ }
+
+ /*
+ * In summary mode only the contexts in the first two levels (from the
+ * top) are displayed
+ */
+ if (get_summary)
+ break;
+ }
+
+}
+
+/* Copy the memory context statistics of a single context to a dsa memory */
+static void
+PublishMemoryContext(MemoryContextEntry *memctx_info, int curr_id,
+ MemoryContext context, List *path,
+ MemoryContextCounters stat, int num_contexts,
+ dsa_area *area)
+{
+ char clipped_ident[MEMORY_CONTEXT_IDENT_SHMEM_SIZE];
+ char *name;
+ char *ident;
+ Datum *path_array;
+
+ if (context->name != NULL)
+ {
+ Assert(strlen(context->name) < MEMORY_CONTEXT_IDENT_SHMEM_SIZE);
+ memctx_info[curr_id].name = dsa_allocate0(area, strlen(context->name) + 1);
+ name = (char *) dsa_get_address(area, memctx_info[curr_id].name);
+ strncpy(name, context->name, strlen(context->name));
+ }
+ else
+ memctx_info[curr_id].name = InvalidDsaPointer;
+
+ /* Trim and copy the identifier if it is not set to NULL */
+ if (context->ident != NULL)
+ {
+ int idlen = strlen(context->ident);
+
+ /*
+ * Some identifiers such as SQL query string can be very long,
+ * truncate oversize identifiers.
+ */
+ if (idlen >= MEMORY_CONTEXT_IDENT_SHMEM_SIZE)
+ idlen = pg_mbcliplen(context->ident, idlen,
+ MEMORY_CONTEXT_IDENT_SHMEM_SIZE - 1);
+
+ memcpy(clipped_ident, context->ident, idlen);
+ clipped_ident[idlen] = '\0';
+
+ /*
+ * To be consistent with logging output, we label dynahash contexts
+ * with just the hash table name as with MemoryContextStatsPrint().
+ */
+ if (!strncmp(context->name, "dynahash", 8))
+ {
+ dsa_free(area, memctx_info[curr_id].name);
+ memctx_info[curr_id].name = dsa_allocate0(area,
+ strlen(clipped_ident) + 1);
+ name = (char *) dsa_get_address(area,
+ memctx_info[curr_id].name);
+ strncpy(name,
+ clipped_ident, strlen(clipped_ident));
+ memctx_info[curr_id].ident = InvalidDsaPointer;
+ }
+ else
+ {
+
+ memctx_info[curr_id].ident = dsa_allocate0(area,
+ strlen(clipped_ident) + 1);
+ ident = (char *) dsa_get_address(area,
+ memctx_info[curr_id].ident);
+ strncpy(ident,
+ clipped_ident, strlen(clipped_ident));
+ }
+ }
+ else
+ memctx_info[curr_id].ident = InvalidDsaPointer;
+ /* Allocate dsa memory for storing path information */
+ memctx_info[curr_id].path_length = list_length(path);
+ memctx_info[curr_id].path = dsa_allocate0(area,
+ memctx_info[curr_id].path_length
+ * sizeof(Datum));
+ path_array = (Datum *) dsa_get_address(area, memctx_info[curr_id].path);
+ foreach_int(i, path)
+ path_array[foreach_current_index(i)] = Int32GetDatum(i);
+
+ memctx_info[curr_id].type = AssignContextType(context->type);
+ memctx_info[curr_id].totalspace = stat.totalspace;
+ memctx_info[curr_id].nblocks = stat.nblocks;
+ memctx_info[curr_id].freespace = stat.freespace;
+ memctx_info[curr_id].freechunks = stat.freechunks;
+ memctx_info[curr_id].num_agg_stats = num_contexts;
+}
+
+static void
+dsa_free_previous_stats(dsa_area *area, int total_stats,
+ dsa_pointer prev_dsa_pointer)
+{
+ MemoryContextEntry *meminfo;
+
+ meminfo = (MemoryContextEntry *) dsa_get_address(area, prev_dsa_pointer);
+ for (int i = 0; i < total_stats; i++)
+ {
+ if (DsaPointerIsValid(meminfo[i].name))
+ dsa_free(area, meminfo[i].name);
+
+ if (DsaPointerIsValid(meminfo[i].ident))
+ dsa_free(area, meminfo[i].ident);
+
+ if (DsaPointerIsValid(meminfo[i].path))
+ dsa_free(area, meminfo[i].path);
+ }
+}
void *
palloc(Size size)
{
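Note on the overflow handling in ProcessGetMemoryContextInterrupt() above:
when there are more contexts than slots, the first max_stats - 1 contexts are
stored individually and the rest are summed into the last slot. The
stand-alone sketch below illustrates that idea only. DemoStats and
demo_publish are hypothetical names, the struct is a reduced stand-in for
MemoryContextEntry, and the boundary handling is simplified relative to the
patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced stand-in for the per-context counters. */
typedef struct DemoStats
{
	int64_t		totalspace;
	int64_t		freespace;
	int			num_agg;		/* number of contexts this slot represents */
} DemoStats;

/*
 * Copy at most max_slots - 1 inputs individually into out[], then fold any
 * remaining inputs into one cumulative slot placed right after them.
 * Returns the number of slots of out[] that were written.
 */
static size_t
demo_publish(const DemoStats *in, size_t n_in, DemoStats *out,
			 size_t max_slots)
{
	size_t		n_individual;
	size_t		i;

	assert(max_slots >= 2);

	n_individual = (n_in < max_slots - 1) ? n_in : max_slots - 1;
	for (i = 0; i < n_individual; i++)
	{
		out[i] = in[i];
		out[i].num_agg = 1;
	}

	/* Everything fit; no cumulative slot is needed. */
	if (n_in == n_individual)
		return n_individual;

	/* Sum the contexts that did not fit into the last slot. */
	out[n_individual].totalspace = 0;
	out[n_individual].freespace = 0;
	out[n_individual].num_agg = 0;
	for (i = n_individual; i < n_in; i++)
	{
		out[n_individual].totalspace += in[i].totalspace;
		out[n_individual].freespace += in[i].freespace;
		out[n_individual].num_agg++;
	}
	return n_individual + 1;
}
```

With five inputs and three slots, the first two inputs are copied as-is and
the remaining three are reported as a single cumulative row, which corresponds
to the "Remaining Totals" row in the patch.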
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 9e803d610d..274c33a934 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8480,6 +8480,16 @@
prorettype => 'bool', proargtypes => 'int4',
prosrc => 'pg_log_backend_memory_contexts' },
+# publishing memory contexts of the specified postgres process
+{ oid => '2173', descr => 'publish memory contexts of the specified backend',
+ proname => 'pg_get_process_memory_contexts', provolatile => 'v',
+ prorows => '100', proretset => 't', proparallel => 'r',
+ prorettype => 'record', proargtypes => 'int4 bool int4',
+ proallargtypes => '{int4,bool,int4,text,text,text,_int4,int8,int8,int8,int8,int8,int4,timestamptz}',
+ proargmodes => '{i,i,i,o,o,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{oid, summary, num_of_tries, name, ident, type, path, total_bytes, total_nblocks, free_bytes, free_chunks, used_bytes, num_agg_contexts, stats_timestamp}',
+ prosrc => 'pg_get_process_memory_contexts' },
+
# non-persistent series generator
{ oid => '1066', descr => 'non-persistent series generator',
proname => 'generate_series', prorows => '1000',
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index a2b63495ee..3dc3dcfb6c 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -96,6 +96,7 @@ extern PGDLLIMPORT volatile sig_atomic_t IdleSessionTimeoutPending;
extern PGDLLIMPORT volatile sig_atomic_t ProcSignalBarrierPending;
extern PGDLLIMPORT volatile sig_atomic_t LogMemoryContextPending;
extern PGDLLIMPORT volatile sig_atomic_t IdleStatsUpdateTimeoutPending;
+extern PGDLLIMPORT volatile sig_atomic_t PublishMemoryContextPending;
extern PGDLLIMPORT volatile sig_atomic_t CheckClientConnectionPending;
extern PGDLLIMPORT volatile sig_atomic_t ClientConnectionLost;
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index 022fd8ed93..477ab99338 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -35,6 +35,7 @@ typedef enum
PROCSIG_WALSND_INIT_STOPPING, /* ask walsenders to prepare for shutdown */
PROCSIG_BARRIER, /* global barrier interrupt */
PROCSIG_LOG_MEMORY_CONTEXT, /* ask backend to log the memory contexts */
+ PROCSIG_GET_MEMORY_CONTEXT, /* ask backend to publish the memory contexts */
PROCSIG_PARALLEL_APPLY_MESSAGE, /* Message from parallel apply workers */
/* Recovery conflict reasons */
diff --git a/src/include/utils/memutils.h b/src/include/utils/memutils.h
index 8abc26abce..81df1d0163 100644
--- a/src/include/utils/memutils.h
+++ b/src/include/utils/memutils.h
@@ -18,6 +18,9 @@
#define MEMUTILS_H
#include "nodes/memnodes.h"
+#include "storage/condition_variable.h"
+#include "storage/lmgr.h"
+#include "utils/dsa.h"
/*
@@ -48,7 +51,12 @@
#define AllocHugeSizeIsValid(size) ((Size) (size) <= MaxAllocHugeSize)
+#define MEMORY_CONTEXT_IDENT_SHMEM_SIZE 128
+#define MEM_CONTEXT_SHMEM_STATS_SIZE 30
+#define MEM_CONTEXT_MAX_LEVEL 64
+#define MAX_TYPE_STRING_LENGTH 64
+#define MAX_NUM_DEFAULT_SEGMENTS 8
/*
* Standard top-level memory contexts.
*
@@ -319,4 +327,70 @@ pg_memory_is_all_zeros(const void *ptr, size_t len)
return true;
}
+/* Dynamic shared memory state for statistics per context */
+typedef struct MemoryContextEntry
+{
+ dsa_pointer name;
+ dsa_pointer ident;
+ dsa_pointer path;
+ const char *type;
+ int path_length;
+ int64 totalspace;
+ int64 nblocks;
+ int64 freespace;
+ int64 freechunks;
+ int num_agg_stats;
+} MemoryContextEntry;
+
+/*
+ * Per backend static shared memory state for memory
+ * context statistics reporting.
+ */
+typedef struct MemoryContextBackendState
+{
+ ConditionVariable memctx_cv;
+ LWLock lw_lock;
+ int proc_id;
+ int num_individual_stats;
+ int total_stats;
+ int prev_total_stats;
+ bool get_summary;
+ dsa_pointer memstats_dsa_pointer;
+ dsa_pointer memstats_prev_dsa_pointer;
+ TimestampTz stats_timestamp;
+} MemoryContextBackendState;
+
+/*
+ * Static shared memory state representing the DSA area
+ * created for memory context statistics reporting.
+ * Single DSA area is created and used by all the processes,
+ * each having its specific allocations for sharing memory
+ * stats, tracked by per backend static shared memory state
+ * above.
+ */
+typedef struct MemoryContextState
+{
+ dsa_handle memstats_dsa_handle;
+ LWLock lw_lock;
+} MemoryContextState;
+
+/*
+ * MemoryContextId
+ * Used for storage of transient identifiers for
+ * pg_get_backend_memory_contexts.
+ */
+typedef struct MemoryContextId
+{
+ MemoryContext context;
+ int context_id;
+} MemoryContextId;
+
+extern PGDLLIMPORT MemoryContextBackendState * memCtxState;
+extern PGDLLIMPORT MemoryContextState *memCtxArea;
+extern void ProcessGetMemoryContextInterrupt(void);
+extern const char *AssignContextType(NodeTag type);
+extern void HandleGetMemoryContextInterrupt(void);
+extern void MemCtxShmemInit(void);
+extern void MemCtxBackendShmemInit(void);
+
#endif /* MEMUTILS_H */
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 83228cfca2..dca20ae1a2 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -232,3 +232,17 @@ select * from pg_timezone_abbrevs where abbrev = 'LMT';
LMT | @ 7 hours 52 mins 58 secs ago | f
(1 row)
+DO $$
+DECLARE
+ launcher_pid int;
+ r RECORD;
+BEGIN
+ SELECT pid from pg_stat_activity where backend_type='autovacuum launcher'
+ INTO launcher_pid;
+
+ select type, name, ident
+ from pg_get_process_memory_contexts(launcher_pid, false, 20)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+END $$;
+NOTICE: (AllocSet,TopMemoryContext,)
diff --git a/src/test/regress/sql/sysviews.sql b/src/test/regress/sql/sysviews.sql
index 66179f026b..4767351d4e 100644
--- a/src/test/regress/sql/sysviews.sql
+++ b/src/test/regress/sql/sysviews.sql
@@ -101,3 +101,17 @@ select count(distinct utc_offset) >= 24 as ok from pg_timezone_abbrevs;
-- One specific case we can check without much fear of breakage
-- is the historical local-mean-time value used for America/Los_Angeles.
select * from pg_timezone_abbrevs where abbrev = 'LMT';
+
+DO $$
+DECLARE
+ launcher_pid int;
+ r RECORD;
+BEGIN
+ SELECT pid from pg_stat_activity where backend_type='autovacuum launcher'
+ INTO launcher_pid;
+
+ select type, name, ident
+ from pg_get_process_memory_contexts(launcher_pid, false, 20)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+END $$;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 98ab45adfa..e6926a4a18 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1633,8 +1633,10 @@ MemoryContextCallback
MemoryContextCallbackFunction
MemoryContextCounters
MemoryContextData
+MemoryContextEntry
MemoryContextMethodID
MemoryContextMethods
+MemoryContextState
MemoryStatsPrintFunc
MergeAction
MergeActionState
--
2.34.1
v14-0001-Preparatory-changes-for-reporting-memory-context-sta.patch
From 110ecf4f77324d82a66d0120c886c75209c6688c Mon Sep 17 00:00:00 2001
From: Rahila Syed <rahilasyed.90@gmail.com>
Date: Mon, 3 Feb 2025 15:33:19 +0530
Subject: [PATCH 1/2] Preparatory changes for reporting memory context
statistics
Ensure that MemoryContextStatsInternal can return number of
contexts. Also, provide an option in MemoryContextStatsInternal
to return without printing stats to either stderr or logs.
---
src/backend/utils/mmgr/mcxt.c | 65 +++++++++++++++++++++++++++++------
1 file changed, 55 insertions(+), 10 deletions(-)
diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index aa6da0d035..946a3731fd 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -135,6 +135,17 @@ static const MemoryContextMethods mcxt_methods[] = {
};
#undef BOGUS_MCTX
+/*
+ * This is passed to MemoryContextStatsInternal to determine whether
+ * to print context statistics and, if so, where to print them: logs or
+ * stderr.
+ */
+typedef enum PrintDestination
+{
+ PRINT_STATS_TO_STDERR = 0,
+ PRINT_STATS_TO_LOGS,
+ PRINT_STATS_NONE
+} PrintDestination;
/*
* CurrentMemoryContext
@@ -162,7 +173,7 @@ static void MemoryContextCallResetCallbacks(MemoryContext context);
static void MemoryContextStatsInternal(MemoryContext context, int level,
int max_level, int max_children,
MemoryContextCounters *totals,
- bool print_to_stderr);
+ PrintDestination print_location, int *num_contexts);
static void MemoryContextStatsPrint(MemoryContext context, void *passthru,
const char *stats_string,
bool print_to_stderr);
@@ -831,11 +842,19 @@ MemoryContextStatsDetail(MemoryContext context,
bool print_to_stderr)
{
MemoryContextCounters grand_totals;
+ int num_contexts = 0;
+ PrintDestination print_location;
memset(&grand_totals, 0, sizeof(grand_totals));
+ if (print_to_stderr)
+ print_location = PRINT_STATS_TO_STDERR;
+ else
+ print_location = PRINT_STATS_TO_LOGS;
+
+ /* num_contexts reports the number of contexts aggregated in the output */
MemoryContextStatsInternal(context, 0, max_level, max_children,
- &grand_totals, print_to_stderr);
+ &grand_totals, print_location, &num_contexts);
if (print_to_stderr)
fprintf(stderr,
@@ -876,18 +895,43 @@ static void
MemoryContextStatsInternal(MemoryContext context, int level,
int max_level, int max_children,
MemoryContextCounters *totals,
- bool print_to_stderr)
+ PrintDestination print_location, int *num_contexts)
{
MemoryContext child;
int ichild;
+ bool print_to_stderr = true;
+ check_stack_depth();
Assert(MemoryContextIsValid(context));
- /* Examine the context itself */
- context->methods->stats(context,
- MemoryContextStatsPrint,
- &level,
- totals, print_to_stderr);
+ if (print_location == PRINT_STATS_TO_STDERR)
+ print_to_stderr = true;
+ else if (print_location == PRINT_STATS_TO_LOGS)
+ print_to_stderr = false;
+
+ if (print_location != PRINT_STATS_NONE)
+ {
+ /* Examine the context itself */
+ context->methods->stats(context,
+ MemoryContextStatsPrint,
+ &level,
+ totals, print_to_stderr);
+ }
+
+ /*
+ * Do not print the statistics if print_location is PRINT_STATS_NONE,
+ * only compute totals.
+ */
+ else
+ {
+ /* Examine the context itself */
+ context->methods->stats(context,
+ NULL,
+ NULL,
+ totals, print_to_stderr);
+ }
+ /* Increment the context count */
+ *num_contexts = *num_contexts + 1;
/*
* Examine children.
@@ -907,7 +951,7 @@ MemoryContextStatsInternal(MemoryContext context, int level,
MemoryContextStatsInternal(child, level + 1,
max_level, max_children,
totals,
- print_to_stderr);
+ print_location, num_contexts);
}
}
@@ -925,6 +969,7 @@ MemoryContextStatsInternal(MemoryContext context, int level,
ichild++;
child = MemoryContextTraverseNext(child, context);
}
+ *num_contexts = *num_contexts + ichild;
if (print_to_stderr)
{
@@ -939,7 +984,7 @@ MemoryContextStatsInternal(MemoryContext context, int level,
local_totals.freechunks,
local_totals.totalspace - local_totals.freespace);
}
- else
+ else if (print_location != PRINT_STATS_NONE)
ereport(LOG_SERVER_ONLY,
(errhidestmt(true),
errhidecontext(true),
--
2.34.1
On 2/20/25 14:26, Rahila Syed wrote:
Hi,
Please find attached the updated patches after some cleanup and test
fixes.
Thank you,
Rahila Syed
On Tue, Feb 18, 2025 at 6:35 PM Rahila Syed <rahilasyed90@gmail.com> wrote:
Hi,
Thanks for updating the patch!
The below comments would be a bit too detailed at this stage, but I’d
like to share the points I noticed.
Thanks for sharing the detailed comments. I have incorporated some of them
into the new version of the patch. I will include the rest when I refine and
comment the code further.
Meanwhile, I have fixed the following outstanding issues:
1. Currently one DSA is created per backend when the first
request for
statistics is made and remains for the lifetime of the server.
I think I should add logic to periodically destroy DSAs, when memory
context statistics are not being *actively* queried from the
backend,
as determined by the statistics timestamp.
After an offline discussion with Andres and Tomas, I have fixed this to use
only one DSA for all the publishing backends/processes. Each backend
allocates smaller chunks of memory within the DSA while publishing statistics.
These chunks are tracked independently by each backend, ensuring that two
publishing backends/processes do not block each other despite using the same
DSA. This approach eliminates the overhead of creating multiple DSAs,
one for each backend.
I am not destroying the DSA area because it stores the previously published
statistics for each process. This allows the system to display older statistics
when the latest data cannot be retrieved within a reasonable time.
Only the most recently updated statistics are kept, while all earlier ones
are freed using dsa_free by each backend when they are no longer needed.
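To illustrate the scheme described above (one shared area, one independently tracked chunk per process, older chunks freed on each publish), here is a minimal sketch. It is not code from the patch: ToyEntry, ToySlot, toy_slots and toy_publish are hypothetical names, and plain malloc/free stand in for dsa_allocate0/dsa_free so that the example compiles on its own.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical slot count; stands in for backends plus auxiliary processes. */
#define TOY_NUM_SLOTS 4

/* Toy stand-in for the statistics of one memory context. */
typedef struct ToyEntry
{
    long totalspace;
    long freespace;
} ToyEntry;

/* Toy stand-in for a per-process slot; a plain pointer replaces dsa_pointer. */
typedef struct ToySlot
{
    ToyEntry *stats;        /* most recently published chunk, or NULL */
    int       total_stats;  /* number of entries in that chunk */
} ToySlot;

static ToySlot toy_slots[TOY_NUM_SLOTS];
static int     toy_live_chunks = 0;   /* chunks currently allocated */

/*
 * Publish n entries for one process: allocate a fresh chunk, copy the
 * statistics into it, then free the chunk published earlier (if any),
 * so that only the latest statistics stay around for this slot.
 */
static void
toy_publish(int slot_no, const ToyEntry *entries, int n)
{
    ToySlot  *slot;
    ToyEntry *chunk;

    assert(slot_no >= 0 && slot_no < TOY_NUM_SLOTS);
    assert(n > 0);

    slot = &toy_slots[slot_no];
    chunk = malloc((size_t) n * sizeof(ToyEntry));
    if (chunk == NULL)
        return;             /* keep the older statistics on failure */
    memcpy(chunk, entries, (size_t) n * sizeof(ToyEntry));
    toy_live_chunks++;

    if (slot->stats != NULL)
    {
        free(slot->stats);  /* the patch would use dsa_free here */
        toy_live_chunks--;
    }
    slot->stats = chunk;
    slot->total_stats = n;
}
```

Publishing twice from the same slot leaves a single live chunk for that slot, while other slots are untouched, which is the property that lets two publishing processes share one area without blocking each other.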
I think something is not quite right, because if I try running a simple
pgbench script that does pg_get_process_memory_contexts() on PIDs of
random postgres process (just like in the past), I immediately get this:
pgbench: error: client 28 script 0 aborted in command 0 query 0: ERROR:
can't attach the same segment more than once
pgbench: error: client 10 script 0 aborted in command 0 query 0: ERROR:
can't attach the same segment more than once
pgbench: error: client 5 script 0 aborted in command 0 query 0: ERROR:
can't attach the same segment more than once
pgbench: error: client 8 script 0 aborted in command 0 query 0: ERROR:
can't attach the same segment more than once
...
Perhaps the backends need to synchronize creation of the DSA?
2. The two issues reported by Fujii-san here: [1].
i. I have proposed a fix for the first issue here [2].
ii. I am able to reproduce the second issue. This happens when we try
to query statistics of a backend running infinite_recurse.sql. While I am
working on finding a root cause, I think it happens due to some memory
being overwritten due to a stack-depth violation, as the issue is not seen
when I reduce the max_stack_depth to 100kb.
The second issue is also resolved by using smaller allocations within a DSA.
Previously, it occurred because a few statically allocated strings were placed
within a single large chunk of DSA allocation. I have changed this to use
dynamically allocated chunks with dsa_allocate0 within the same DSA.
Sounds good. Do you have any measurements how much this reduced the size
of the entries written to the DSA? How many entries will fit into 1MB of
shared memory?
regards
--
Tomas Vondra
I think something is not quite right, because if I try running a simple
pgbench script that does pg_get_process_memory_contexts() on PIDs of
random postgres process (just like in the past), I immediately get this:
Thank you for testing. This issue occurs when a process that previously attached
to a DSA area for publishing its own context statistics tries to attach to
it again while querying statistics from another backend. Previously, I was not
detaching at the end of publishing the statistics. I have now changed it to
detach from the area after the statistics are published. The fix is included
in the updated patch.
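The failure mode and the fix can be modelled with a small sketch. This is only an illustration under simplifying assumptions, not the patch's code: ToyArea and the toy_* functions are hypothetical, and a boolean flag stands in for the per-process attachment that dsa_attach and dsa_detach manage in PostgreSQL.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one process's view of the shared area. */
typedef struct ToyArea
{
    bool attached;  /* is this process currently attached? */
} ToyArea;

/* Returns false where the real code would raise an error on a double attach. */
static bool
toy_attach(ToyArea *area)
{
    if (area->attached)
        return false;
    area->attached = true;
    return true;
}

static void
toy_detach(ToyArea *area)
{
    assert(area->attached);
    area->attached = false;
}

/* Publish this process's own statistics, optionally detaching afterwards. */
static bool
toy_publish_stats(ToyArea *area, bool detach_when_done)
{
    if (!toy_attach(area))
        return false;
    /* ... statistics would be copied into the area here ... */
    if (detach_when_done)
        toy_detach(area);
    return true;
}

/* Read statistics published by another backend. */
static bool
toy_query_other_backend(ToyArea *area)
{
    if (!toy_attach(area))
        return false;   /* models "can't attach the same segment more than once" */
    /* ... statistics would be read from the area here ... */
    toy_detach(area);
    return true;
}
```

With detach_when_done set to false, a later toy_query_other_backend call in the same process fails, which mirrors the reported error. With it set to true, the query succeeds.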
Perhaps the backends need to synchronize creation of the DSA?
This has been implemented in the patch.
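As a rough illustration of what synchronized creation can look like: the first process to arrive creates the area while holding a lock and stores its handle, and every later process finds the handle and attaches instead. This sketch is an assumption-laden stand-in, not the patch's code. ToyAreaState, toy_state_init and toy_create_or_attach are hypothetical names, a C11 atomic_flag spinlock replaces the LWLock, and an integer replaces the dsa_handle kept in MemoryContextState.

```c
#include <assert.h>
#include <stdatomic.h>

#define TOY_INVALID_HANDLE 0    /* hypothetical "no area created yet" marker */

/* Toy stand-in for the shared state that records the area's handle. */
typedef struct ToyAreaState
{
    atomic_flag lock;           /* stands in for the LWLock */
    int         handle;         /* stands in for the dsa_handle */
    int         areas_created;  /* bookkeeping for the example only */
} ToyAreaState;

static void
toy_state_init(ToyAreaState *state)
{
    atomic_flag_clear(&state->lock);
    state->handle = TOY_INVALID_HANDLE;
    state->areas_created = 0;
}

/*
 * Return the handle of the shared area, creating the area only if no
 * process has done so yet. The check and the creation happen under the
 * same lock, so two callers can never both decide to create.
 */
static int
toy_create_or_attach(ToyAreaState *state)
{
    int handle;

    while (atomic_flag_test_and_set(&state->lock))
        ;                       /* spin until the lock is free */

    if (state->handle == TOY_INVALID_HANDLE)
    {
        state->areas_created++;
        state->handle = 42;     /* arbitrary toy handle value */
    }
    handle = state->handle;

    atomic_flag_clear(&state->lock);
    return handle;
}
```

Every caller gets the same handle and areas_created never exceeds one, regardless of how many processes call the function.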
Sounds good. Do you have any measurements how much this reduced the size
of the entries written to the DSA? How many entries will fit into 1MB of
shared memory?
The size of the entries has approximately halved after dynamically allocating
the strings and a datum array.
Also, previously, I was allocating the entire memory for all contexts in one
large chunk from DSA. I have now separated them into smaller allocations
per context. The integer counters are still allocated at once for all contexts,
but the size of an allocated chunk will not exceed approximately
128 bytes * total_num_of_contexts.
Average total number of contexts is in the hundreds.
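For a back-of-the-envelope feel for these numbers, the arithmetic can be written out as below. This is only a sketch based on the approximate 128-byte bound quoted above, not a measurement: it covers the fixed-size counters chunk only and ignores the separately allocated strings and path arrays. The names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Approximate per-context bound quoted above; an estimate, not a measured value. */
#define TOY_BYTES_PER_CONTEXT 128

/* Upper bound on the size of the counters chunk for a given number of contexts. */
static size_t
toy_counter_chunk_bytes(size_t num_contexts)
{
    return TOY_BYTES_PER_CONTEXT * num_contexts;
}

/* Number of fixed-size entries that fit into a given byte budget. */
static size_t
toy_contexts_per_budget(size_t budget_bytes)
{
    return budget_bytes / TOY_BYTES_PER_CONTEXT;
}
```

Under this bound, a process with 500 contexts needs at most about 64000 bytes for the counters chunk, and a 1MB budget holds 8192 fixed-size entries before counting the strings and path arrays.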
PFA the updated and rebased patches.
Thank you,
Rahila Syed
Attachments:
v15-0001-Preparatory-changes-for-reporting-memory-context-sta.patch
From 3e8d8edb138391ddb183540a0f43f7a48456ac7a Mon Sep 17 00:00:00 2001
From: Rahila Syed <rahilasyed.90@gmail.com>
Date: Mon, 3 Feb 2025 15:33:19 +0530
Subject: [PATCH 1/2] Preparatory changes for reporting memory context
statistics
Ensure that MemoryContextStatsInternal can return number of
contexts. Also, provide an option in MemoryContextStatsInternal
to return without printing stats to either stderr or logs.
---
src/backend/utils/mmgr/mcxt.c | 65 +++++++++++++++++++++++++++++------
1 file changed, 55 insertions(+), 10 deletions(-)
diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index aa6da0d035..946a3731fd 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -135,6 +135,17 @@ static const MemoryContextMethods mcxt_methods[] = {
};
#undef BOGUS_MCTX
+/*
+ * This is passed to MemoryContextStatsInternal to determine whether
+ * to print context statistics and, if so, where to print them: logs or
+ * stderr.
+ */
+typedef enum PrintDestination
+{
+ PRINT_STATS_TO_STDERR = 0,
+ PRINT_STATS_TO_LOGS,
+ PRINT_STATS_NONE
+} PrintDestination;
/*
* CurrentMemoryContext
@@ -162,7 +173,7 @@ static void MemoryContextCallResetCallbacks(MemoryContext context);
static void MemoryContextStatsInternal(MemoryContext context, int level,
int max_level, int max_children,
MemoryContextCounters *totals,
- bool print_to_stderr);
+ PrintDestination print_location, int *num_contexts);
static void MemoryContextStatsPrint(MemoryContext context, void *passthru,
const char *stats_string,
bool print_to_stderr);
@@ -831,11 +842,19 @@ MemoryContextStatsDetail(MemoryContext context,
bool print_to_stderr)
{
MemoryContextCounters grand_totals;
+ int num_contexts = 0;
+ PrintDestination print_location;
memset(&grand_totals, 0, sizeof(grand_totals));
+ if (print_to_stderr)
+ print_location = PRINT_STATS_TO_STDERR;
+ else
+ print_location = PRINT_STATS_TO_LOGS;
+
+ /* num_contexts reports the number of contexts aggregated in the output */
MemoryContextStatsInternal(context, 0, max_level, max_children,
- &grand_totals, print_to_stderr);
+ &grand_totals, print_location, &num_contexts);
if (print_to_stderr)
fprintf(stderr,
@@ -876,18 +895,43 @@ static void
MemoryContextStatsInternal(MemoryContext context, int level,
int max_level, int max_children,
MemoryContextCounters *totals,
- bool print_to_stderr)
+ PrintDestination print_location, int *num_contexts)
{
MemoryContext child;
int ichild;
+ bool print_to_stderr = true;
+ check_stack_depth();
Assert(MemoryContextIsValid(context));
- /* Examine the context itself */
- context->methods->stats(context,
- MemoryContextStatsPrint,
- &level,
- totals, print_to_stderr);
+ if (print_location == PRINT_STATS_TO_STDERR)
+ print_to_stderr = true;
+ else if (print_location == PRINT_STATS_TO_LOGS)
+ print_to_stderr = false;
+
+ if (print_location != PRINT_STATS_NONE)
+ {
+ /* Examine the context itself */
+ context->methods->stats(context,
+ MemoryContextStatsPrint,
+ &level,
+ totals, print_to_stderr);
+ }
+
+ /*
+ * Do not print the statistics if print_location is PRINT_STATS_NONE,
+ * only compute totals.
+ */
+ else
+ {
+ /* Examine the context itself */
+ context->methods->stats(context,
+ NULL,
+ NULL,
+ totals, print_to_stderr);
+ }
+ /* Increment the context count */
+ *num_contexts = *num_contexts + 1;
/*
* Examine children.
@@ -907,7 +951,7 @@ MemoryContextStatsInternal(MemoryContext context, int level,
MemoryContextStatsInternal(child, level + 1,
max_level, max_children,
totals,
- print_to_stderr);
+ print_location, num_contexts);
}
}
@@ -925,6 +969,7 @@ MemoryContextStatsInternal(MemoryContext context, int level,
ichild++;
child = MemoryContextTraverseNext(child, context);
}
+ *num_contexts = *num_contexts + ichild;
if (print_to_stderr)
{
@@ -939,7 +984,7 @@ MemoryContextStatsInternal(MemoryContext context, int level,
local_totals.freechunks,
local_totals.totalspace - local_totals.freespace);
}
- else
+ else if (print_location != PRINT_STATS_NONE)
ereport(LOG_SERVER_ONLY,
(errhidestmt(true),
errhidecontext(true),
--
2.34.1
v15-0002-Function-to-report-memory-context-statistics.patch
From 31df99c69e7ae60707b32c30bd6ad986a30ea36a Mon Sep 17 00:00:00 2001
From: Rahila Syed <rahilasyed.90@gmail.com>
Date: Mon, 3 Feb 2025 15:37:17 +0530
Subject: [PATCH 2/2] Function to report memory context statistics
This function sends a signal to a backend to publish
statistics of all its memory contexts. The signal handler
running in the backend process sets a flag, which causes
it to copy its MemoryContextStats to a DSA during the
next call to CHECK_FOR_INTERRUPTS().
If there are more statistics than fit in 16MB, the
remaining statistics are copied as a cumulative
total of the remaining contexts.
Once it is done, it signals the client backend using
a condition variable. The client backend wakes up, reads
the shared memory and returns these values in the form
of a set of records, one for each memory context, to the
user, followed by a cumulative total of the remaining
contexts, if any.
If get_summary is true, return statistics of all children
of TopMemoryContext with aggregated statistics of their
children.
The user can pass num_of_tries, which determines the total
number of wait cycles in a client backend for the latest
statistics.
The wait timeout for each cycle is set to 1 second. After
this, the client displays previously published statistics
or returns without results.
Each backend and auxiliary process has its own slot for
reporting the stats. There is an array of such memory slots
of size MaxBackends + number of auxiliary processes
in fixed shared memory. Each of these slots points
to smaller DSA allocations within a single DSA,
which contain the stats to be shared by the corresponding
process.
Each slot has its own LWLock and condition variable for
synchronization and communication between the publishing
process and the client backend.
---
doc/src/sgml/func.sgml | 41 ++
src/backend/postmaster/autovacuum.c | 4 +
src/backend/postmaster/checkpointer.c | 4 +
src/backend/postmaster/interrupt.c | 4 +
src/backend/postmaster/pgarch.c | 4 +
src/backend/postmaster/startup.c | 4 +
src/backend/postmaster/walsummarizer.c | 4 +
src/backend/storage/ipc/ipci.c | 3 +
src/backend/storage/ipc/procsignal.c | 3 +
src/backend/tcop/postgres.c | 3 +
.../utils/activity/wait_event_names.txt | 1 +
src/backend/utils/adt/mcxtfuncs.c | 428 +++++++++++++++-
src/backend/utils/init/globals.c | 1 +
src/backend/utils/mmgr/mcxt.c | 484 +++++++++++++++++-
src/include/catalog/pg_proc.dat | 10 +
src/include/miscadmin.h | 1 +
src/include/storage/procsignal.h | 1 +
src/include/utils/memutils.h | 71 +++
src/test/regress/expected/sysviews.out | 14 +
src/test/regress/sql/sysviews.sql | 14 +
src/tools/pgindent/typedefs.list | 2 +
21 files changed, 1072 insertions(+), 29 deletions(-)
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 9f60a476eb..9bf019c2e5 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -28419,6 +28419,47 @@ acl | {postgres=arwdDxtm/postgres,foo=r/postgres}
</para></entry>
</row>
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>pg_get_process_memory_contexts</primary>
+ </indexterm>
+ <function>pg_get_process_memory_contexts</function> ( <parameter>pid</parameter> <type>integer</type>, <parameter>get_summary</parameter> <type>boolean</type>, <parameter>num_of_tries</parameter> <type>integer</type> )
+ <returnvalue>setof record</returnvalue>
+ </para>
+ <para>
+ This function handles requests to display the memory contexts of a
+ PostgreSQL process with the specified process ID (PID). It takes three
+ arguments: <parameter>pid</parameter>, <parameter>get_summary</parameter>
+ and <parameter>num_of_tries</parameter>. The function can send requests
+ to both backend and auxiliary processes.
+
+ After receiving memory context statistics from the target process, it
+ returns the results as one row per context. The num_agg_contexts
+ column indicates the number of contexts aggregated in the displayed
+ statistics.
+
+ When <parameter>get_summary</parameter> is set to true, statistics
+ for memory contexts at levels 1 and 2 are displayed, with level 1
+ representing the root node (i.e., TopMemoryContext).
+ Each level 2 context's statistics represent an aggregate of all its
+ child contexts' statistics, with num_agg_contexts indicating the number
+ of these aggregated child contexts.
+
+ When <parameter>get_summary</parameter> is set to false, the
+ num_agg_contexts value is 1, indicating that individual statistics are
+ being displayed.
+
+ <parameter>num_of_tries</parameter> indicates the number of times
+ the client will wait for the latest statistics. The wait per try is 1
+ second. This parameter can be increased if the user anticipates a delay
+ in the response from the reporting process. Conversely, if users are
+ frequently and periodically querying the process for statistics, or if
+ there are concurrent requests for statistics of the same process,
+ lowering the parameter might help achieve a faster response.
+ </para></entry>
+ </row>
+
<row>
<entry role="func_table_entry"><para role="func_signature">
<indexterm>
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index ddb303f520..f5e8050a75 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -779,6 +779,10 @@ HandleAutoVacLauncherInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
/* Process sinval catchup interrupts that happened while sleeping */
ProcessCatchupInterrupt();
}
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index 7acbbd3e26..f0f743ce7e 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -661,6 +661,10 @@ HandleCheckpointerInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/postmaster/interrupt.c b/src/backend/postmaster/interrupt.c
index be69e4c713..9481a5cd24 100644
--- a/src/backend/postmaster/interrupt.c
+++ b/src/backend/postmaster/interrupt.c
@@ -48,6 +48,10 @@ HandleMainLoopInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index e6cd78679c..bca7675ccd 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -865,6 +865,10 @@ HandlePgArchInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
if (ConfigReloadPending)
{
char *archiveLib = pstrdup(XLogArchiveLibrary);
diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c
index 88eab3d0ba..3be62084fd 100644
--- a/src/backend/postmaster/startup.c
+++ b/src/backend/postmaster/startup.c
@@ -192,6 +192,10 @@ HandleStartupProcInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
diff --git a/src/backend/postmaster/walsummarizer.c b/src/backend/postmaster/walsummarizer.c
index f4d61c1f3b..aebf3f96f5 100644
--- a/src/backend/postmaster/walsummarizer.c
+++ b/src/backend/postmaster/walsummarizer.c
@@ -876,6 +876,10 @@ HandleWalSummarizerInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 174eed7036..5eee04d52a 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -50,6 +50,7 @@
#include "storage/sinvaladt.h"
#include "utils/guc.h"
#include "utils/injection_point.h"
+#include "utils/memutils.h"
/* GUCs */
int shared_memory_type = DEFAULT_SHARED_MEMORY_TYPE;
@@ -340,6 +341,8 @@ CreateOrAttachShmemStructs(void)
StatsShmemInit();
WaitEventCustomShmemInit();
InjectionPointShmemInit();
+ MemCtxShmemInit();
+ MemCtxBackendShmemInit();
}
/*
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index 7401b6e625..e425b9eeb0 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -688,6 +688,9 @@ procsignal_sigusr1_handler(SIGNAL_ARGS)
if (CheckProcSignal(PROCSIG_LOG_MEMORY_CONTEXT))
HandleLogMemoryContextInterrupt();
+ if (CheckProcSignal(PROCSIG_GET_MEMORY_CONTEXT))
+ HandleGetMemoryContextInterrupt();
+
if (CheckProcSignal(PROCSIG_PARALLEL_APPLY_MESSAGE))
HandleParallelApplyMessageInterrupt();
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index f2f75aa0f8..8ae890e320 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3499,6 +3499,9 @@ ProcessInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
if (ParallelApplyMessagePending)
HandleParallelApplyMessages();
}
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index e199f07162..3674b5b7b6 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -159,6 +159,7 @@ WAL_RECEIVER_EXIT "Waiting for the WAL receiver to exit."
WAL_RECEIVER_WAIT_START "Waiting for startup process to send initial data for streaming replication."
WAL_SUMMARY_READY "Waiting for a new WAL summary to be generated."
XACT_GROUP_UPDATE "Waiting for the group leader to update transaction status at transaction end."
+MEM_CTX_PUBLISH "Waiting for backend to publish memory information."
ABI_compatibility:
diff --git a/src/backend/utils/adt/mcxtfuncs.c b/src/backend/utils/adt/mcxtfuncs.c
index 396c2f223b..f30647b4be 100644
--- a/src/backend/utils/adt/mcxtfuncs.c
+++ b/src/backend/utils/adt/mcxtfuncs.c
@@ -17,11 +17,18 @@
#include "funcapi.h"
#include "mb/pg_wchar.h"
+#include "miscadmin.h"
+#include "access/twophase.h"
+#include "catalog/pg_authid_d.h"
+#include "nodes/pg_list.h"
#include "storage/proc.h"
#include "storage/procarray.h"
+#include "utils/acl.h"
#include "utils/array.h"
#include "utils/builtins.h"
#include "utils/hsearch.h"
+#include "utils/memutils.h"
+#include "utils/wait_event_types.h"
/* ----------
* The max bytes for showing identifiers of MemoryContext.
@@ -29,16 +36,8 @@
*/
#define MEMORY_CONTEXT_IDENT_DISPLAY_SIZE 1024
-/*
- * MemoryContextId
- * Used for storage of transient identifiers for
- * pg_get_backend_memory_contexts.
- */
-typedef struct MemoryContextId
-{
- MemoryContext context;
- int context_id;
-} MemoryContextId;
+struct MemoryContextBackendState *memCtxState = NULL;
+struct MemoryContextState *memCtxArea = NULL;
/*
* int_list_to_array
@@ -143,24 +142,7 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
else
nulls[1] = true;
- switch (context->type)
- {
- case T_AllocSetContext:
- type = "AllocSet";
- break;
- case T_GenerationContext:
- type = "Generation";
- break;
- case T_SlabContext:
- type = "Slab";
- break;
- case T_BumpContext:
- type = "Bump";
- break;
- default:
- type = "???";
- break;
- }
+ type = AssignContextType(context->type);
values[2] = CStringGetTextDatum(type);
values[3] = Int32GetDatum(list_length(path)); /* level */
@@ -175,6 +157,32 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
list_free(path);
}
+const char *
+AssignContextType(NodeTag type)
+{
+ const char *context_type;
+
+ switch (type)
+ {
+ case T_AllocSetContext:
+ context_type = "AllocSet";
+ break;
+ case T_GenerationContext:
+ context_type = "Generation";
+ break;
+ case T_SlabContext:
+ context_type = "Slab";
+ break;
+ case T_BumpContext:
+ context_type = "Bump";
+ break;
+ default:
+ context_type = "???";
+ break;
+ }
+ return (context_type);
+}
+
/*
* pg_get_backend_memory_contexts
* SQL SRF showing backend memory context.
@@ -305,3 +313,367 @@ pg_log_backend_memory_contexts(PG_FUNCTION_ARGS)
PG_RETURN_BOOL(true);
}
+
+/*
+ * pg_get_process_memory_contexts
+ * Signal a backend or an auxiliary process to send its memory contexts,
+ * wait for the results and display them.
+ *
+ * By default, only superusers or users with PG_READ_ALL_STATS are allowed to
+ * request the memory contexts of another process, because allowing any user
+ * to issue this request at an unbounded rate could flood the target with
+ * signals and lead to denial of service. Additional roles can be permitted
+ * with GRANT.
+ *
+ * On receipt of this signal, a backend or an auxiliary process sets the flag
+ * in the signal handler, which causes the next CHECK_FOR_INTERRUPTS()
+ * or process-specific interrupt handler to copy the memory context details
+ * to a dynamic shared memory space.
+ *
+ * The shared memory buffer has a limited size - if the process has too many
+ * memory contexts, the contexts that do not fit are summarized and
+ * represented as a cumulative total at the end of the buffer.
+ *
+ * After sending the signal, wait on a condition variable. The publishing
+ * backend, after copying the data to shared memory, broadcasts on that
+ * condition variable. Once woken up, check whether the memory context
+ * information is available to read and display.
+ *
+ * If the publishing backend does not respond before the condition variable
+ * times out, which is set to MEMSTATS_WAIT_TIMEOUT, retry up to max_tries
+ * times, as specified by the user, before giving up and returning
+ * previously published statistics, if any.
+ */
+Datum
+pg_get_process_memory_contexts(PG_FUNCTION_ARGS)
+{
+ int pid = PG_GETARG_INT32(0);
+ bool get_summary = PG_GETARG_BOOL(1);
+ PGPROC *proc;
+ ProcNumber procNumber = INVALID_PROC_NUMBER;
+ int i;
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ dsa_area *area;
+ MemoryContextEntry *memctx_info;
+ int num_retries = 0;
+ TimestampTz curr_timestamp;
+ int max_tries = PG_GETARG_INT32(2);
+
+ /*
+ * Only superusers or users with pg_read_all_stats privileges can view the
+ * memory context statistics of another process
+ */
+ if (!has_privs_of_role(GetUserId(), ROLE_PG_READ_ALL_STATS))
+ ereport(ERROR,
+ (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+ errmsg("permission denied to read memory context statistics of another process")));
+
+ InitMaterializedSRF(fcinfo, 0);
+
+ /*
+ * See if the process with given pid is a backend or an auxiliary process.
+ */
+ proc = BackendPidGetProc(pid);
+ if (proc == NULL)
+ proc = AuxiliaryPidGetProc(pid);
+
+ /*
+ * BackendPidGetProc() and AuxiliaryPidGetProc() return NULL if the pid
+ * isn't valid; but by the time we send the signal, a process for which
+ * we get a valid proc here might have terminated on its own. There's no
+ * way to acquire a lock on an arbitrary process to prevent that. But
+ * since this mechanism is usually used to debug a backend or an auxiliary
+ * process that is running and consuming lots of memory, it is not a
+ * problem if the process exits first and its memory contexts are never
+ * returned.
+ */
+ if (proc == NULL)
+ {
+ /*
+ * This is just a warning so a loop-through-resultset will not abort
+ * if one backend terminated on its own during the run.
+ */
+ ereport(WARNING,
+ (errmsg("PID %d is not a PostgreSQL server process",
+ pid)));
+ PG_RETURN_NULL();
+ }
+
+ procNumber = GetNumberFromPGProc(proc);
+ if (procNumber == MyProcNumber)
+ {
+ ereport(WARNING,
+ (errmsg("cannot return statistics for local backend"),
+ errhint("Use pg_backend_memory_contexts view instead")));
+ PG_RETURN_NULL();
+ }
+
+ LWLockAcquire(&memCtxState[procNumber].lw_lock, LW_EXCLUSIVE);
+ memCtxState[procNumber].get_summary = get_summary;
+ LWLockRelease(&memCtxState[procNumber].lw_lock);
+
+ curr_timestamp = GetCurrentTimestamp();
+
+ /*
+ * Send a signal to the PostgreSQL process, informing it that we want it
+ * to produce information about its memory contexts.
+ */
+ if (SendProcSignal(pid, PROCSIG_GET_MEMORY_CONTEXT, procNumber) < 0)
+ {
+ ereport(WARNING,
+ (errmsg("could not send signal to process %d: %m", pid)));
+ PG_RETURN_NULL();
+ }
+
+ /*
+ * Wait for the PostgreSQL process to publish stats, indicated by a valid
+ * dsa pointer set by the backend. The dsa pointer may already be valid
+ * if statistics were previously published by the backend. In that case,
+ * check whether the statistics are older than curr_timestamp and, if
+ * so, wait for newer statistics. Wait for about max_tries *
+ * MEMSTATS_WAIT_TIMEOUT milliseconds, after which older statistics are
+ * displayed if available.
+ */
+ while (1)
+ {
+ long msecs;
+
+ /*
+ * We expect to come out of sleep when the requested process has
+ * finished publishing the statistics, verified using the valid dsa
+ * pointer.
+ *
+ * Make sure that the information belongs to the pid we requested
+ * information for; otherwise loop back and wait for the server
+ * process to finish publishing statistics.
+ */
+ LWLockAcquire(&memCtxState[procNumber].lw_lock, LW_EXCLUSIVE);
+ msecs =
+ TimestampDifferenceMilliseconds(curr_timestamp,
+ memCtxState[procNumber].stats_timestamp);
+
+ /*
+ * A note in procnumber.h says that a procNumber can be re-used for a
+ * different backend immediately after a backend exits. If the slot
+ * identified by procNumber still holds an old process' data that
+ * the current process has not updated, the pid of the requested
+ * process and proc_id will not match.
+ */
+ if (memCtxState[procNumber].proc_id == pid)
+ {
+ /*
+ * Break if the latest stats have been read, indicated by
+ * statistics timestamp being newer than the current request
+ * timestamp.
+ */
+ if (DsaPointerIsValid(memCtxState[procNumber].memstats_dsa_pointer)
+ && msecs > 0)
+ {
+ LWLockRelease(&memCtxState[procNumber].lw_lock);
+ break;
+ }
+
+ }
+ LWLockRelease(&memCtxState[procNumber].lw_lock);
+
+ /*
+ * Recheck the state of the backend before sleeping on the condition
+ * variable.
+ */
+#define MEMSTATS_WAIT_TIMEOUT 1000 /* milliseconds */
+ proc = BackendPidGetProc(pid);
+ if (proc == NULL)
+ proc = AuxiliaryPidGetProc(pid);
+ if (proc == NULL)
+ {
+ ereport(WARNING,
+ (errmsg("PID %d is not a PostgreSQL server process",
+ pid)));
+ PG_RETURN_NULL();
+ }
+ if (ConditionVariableTimedSleep(&memCtxState[procNumber].memctx_cv,
+ MEMSTATS_WAIT_TIMEOUT,
+ WAIT_EVENT_MEM_CTX_PUBLISH))
+ {
+ ereport(LOG,
+ (errmsg("wait for process %d to publish stats timed out, trying again",
+ pid)));
+
+ /*
+ * Retry up to the user-defined max_tries. Once the retries are
+ * exhausted, display previously published statistics, if any.
+ */
+ if (num_retries > max_tries)
+ {
+ LWLockAcquire(&memCtxState[procNumber].lw_lock, LW_EXCLUSIVE);
+ if (DsaPointerIsValid(memCtxState[procNumber].memstats_dsa_pointer))
+ {
+ LWLockRelease(&memCtxState[procNumber].lw_lock);
+ break;
+ }
+ else
+ {
+ LWLockRelease(&memCtxState[procNumber].lw_lock);
+ PG_RETURN_NULL();
+ }
+ }
+ num_retries = num_retries + 1;
+ }
+
+ }
+
+ LWLockAcquire(&memCtxArea->lw_lock, LW_EXCLUSIVE);
+ /*
+ * We land here only with a valid memstats_dsa_pointer, so the dsa handle
+ * must be valid as well.
+ */
+ area = dsa_attach(memCtxArea->memstats_dsa_handle);
+
+ LWLockRelease(&memCtxArea->lw_lock);
+
+ /*
+ * Backend has finished publishing the stats, read them
+ *
+ * Read statistics of top level 1 and 2 contexts, if get_summary is true.
+ */
+ LWLockAcquire(&memCtxState[procNumber].lw_lock, LW_EXCLUSIVE);
+ memctx_info = (MemoryContextEntry *) dsa_get_address(area,
+ memCtxState[procNumber].memstats_dsa_pointer);
+
+#define PG_GET_PROCESS_MEMORY_CONTEXTS_COLS 11
+ for (i = 0; i < memCtxState[procNumber].total_stats; i++)
+ {
+ ArrayType *path_array;
+ int path_length;
+ Datum values[PG_GET_PROCESS_MEMORY_CONTEXTS_COLS];
+ bool nulls[PG_GET_PROCESS_MEMORY_CONTEXTS_COLS];
+ char *name;
+ char *ident;
+ Datum *path_datum_array;
+
+ memset(values, 0, sizeof(values));
+ memset(nulls, 0, sizeof(nulls));
+
+ if (DsaPointerIsValid(memctx_info[i].name))
+ {
+ name = (char *) dsa_get_address(area, memctx_info[i].name);
+ values[0] = CStringGetTextDatum(name);
+ }
+ else
+ nulls[0] = true;
+ if (DsaPointerIsValid(memctx_info[i].ident))
+ {
+ ident = (char *) dsa_get_address(area, memctx_info[i].ident);
+ values[1] = CStringGetTextDatum(ident);
+ }
+ else
+ nulls[1] = true;
+
+ if (memctx_info[i].type != NULL)
+ values[2] = CStringGetTextDatum(memctx_info[i].type);
+ else
+ nulls[2] = true;
+
+ path_length = memctx_info[i].path_length;
+
+ if (DsaPointerIsValid(memctx_info[i].path))
+ {
+ path_datum_array = (Datum *) dsa_get_address(area, memctx_info[i].path);
+ path_array = construct_array_builtin(path_datum_array,
+ path_length, INT4OID);
+
+ values[3] = PointerGetDatum(path_array);
+ }
+ else
+ nulls[3] = true;
+ values[4] = Int64GetDatum(memctx_info[i].totalspace);
+ values[5] = Int64GetDatum(memctx_info[i].nblocks);
+ values[6] = Int64GetDatum(memctx_info[i].freespace);
+ values[7] = Int64GetDatum(memctx_info[i].freechunks);
+ values[8] = Int64GetDatum(memctx_info[i].totalspace -
+ memctx_info[i].freespace);
+ values[9] = Int32GetDatum(memctx_info[i].num_agg_stats);
+ values[10] = TimestampTzGetDatum(memCtxState[procNumber].stats_timestamp);
+
+ tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc,
+ values, nulls);
+ }
+ LWLockRelease(&memCtxState[procNumber].lw_lock);
+
+ ConditionVariableCancelSleep();
+ dsa_detach(area);
+
+ PG_RETURN_NULL();
+}
+
+/*
+ * Shared memory sizing for reporting memory context information.
+ */
+static Size
+MemCtxShmemSize(void)
+{
+ Size TotalProcs =
+ add_size(MaxBackends, add_size(NUM_AUXILIARY_PROCS, max_prepared_xacts));
+
+ return mul_size(TotalProcs, sizeof(MemoryContextBackendState));
+}
+
+/*
+ * Init shared memory for reporting memory context information.
+ */
+void
+MemCtxBackendShmemInit(void)
+{
+ bool found;
+ Size TotalProcs =
+ add_size(MaxBackends, add_size(NUM_AUXILIARY_PROCS, max_prepared_xacts));
+
+ memCtxState = (MemoryContextBackendState *) ShmemInitStruct("MemoryContextBackendState",
+ MemCtxShmemSize(),
+ &found);
+ if (!IsUnderPostmaster)
+ {
+ Assert(!found);
+
+ for (int i = 0; i < TotalProcs; i++)
+ {
+ ConditionVariableInit(&memCtxState[i].memctx_cv);
+
+ LWLockInitialize(&memCtxState[i].lw_lock,
+ LWLockNewTrancheId());
+ LWLockRegisterTranche(memCtxState[i].lw_lock.tranche,
+ "mem_context_backend_stats_reporting");
+
+ memCtxState[i].memstats_dsa_pointer = InvalidDsaPointer;
+ }
+ }
+ else
+ {
+ Assert(found);
+ }
+}
+
+/*
+ * Initialize shared memory for displaying memory
+ * context statistics
+ */
+void
+MemCtxShmemInit(void)
+{
+ bool found;
+
+ memCtxArea = (MemoryContextState *) ShmemInitStruct("MemoryContextState", sizeof(MemoryContextState),
+ &found);
+ if (!IsUnderPostmaster)
+ {
+ Assert(!found);
+
+ LWLockInitialize(&memCtxArea->lw_lock,
+ LWLockNewTrancheId());
+ LWLockRegisterTranche(memCtxArea->lw_lock.tranche,
+ "mem_context_stats_reporting");
+ memCtxArea->memstats_dsa_handle = DSA_HANDLE_INVALID;
+ }
+ else
+ {
+ Assert(found);
+ }
+}
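
Note on the wait loop in pg_get_process_memory_contexts() above: its decision logic can be sketched outside the server. This is a minimal simulation under stated assumptions, not PostgreSQL code. SlotState, WaitResult and wait_for_fresh_stats are hypothetical names, and the condition-variable sleep is replaced by stepping through an array of snapshots, where each pass without fresh statistics stands for one timed-out sleep.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one per-process slot in shared memory. */
typedef struct SlotState
{
    int     proc_id;         /* pid that last published into the slot */
    bool    has_stats;       /* models DsaPointerIsValid(memstats_dsa_pointer) */
    int64_t stats_timestamp; /* when the slot was last published */
} SlotState;

typedef enum WaitResult
{
    WAIT_FRESH, /* statistics newer than the request arrived */
    WAIT_STALE, /* retries exhausted, previously published stats shown */
    WAIT_NONE   /* retries exhausted, nothing to show */
} WaitResult;

/*
 * snapshots[i] is what the requester observes on its i-th pass through the
 * loop; the last snapshot repeats once the array is exhausted.  Every pass
 * that does not find fresh statistics counts as one timed-out sleep.
 * nsnapshots must be at least 1.
 */
WaitResult
wait_for_fresh_stats(const SlotState *snapshots, size_t nsnapshots,
                     int pid, int64_t request_ts, int max_tries)
{
    int    num_retries = 0;
    size_t pass = 0;

    for (;;)
    {
        size_t idx = (pass < nsnapshots) ? pass : nsnapshots - 1;
        const SlotState *slot = &snapshots[idx];

        /* Fresh: right pid, valid pointer, published after the request. */
        if (slot->proc_id == pid && slot->has_stats &&
            slot->stats_timestamp > request_ts)
            return WAIT_FRESH;

        /* Same comparison as the patch: gives up on timeout max_tries + 2. */
        if (num_retries > max_tries)
            return slot->has_stats ? WAIT_STALE : WAIT_NONE;

        num_retries++;
        pass++;
    }
}
```

With the comparison written as in the patch (num_retries > max_tries), the sketch gives up on the (max_tries + 2)-th timeout, and like the patch the fallback path does not recheck that the slot belongs to the requested pid. Both may be worth a second look.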
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index b844f9fdae..13938ccb0f 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -38,6 +38,7 @@ volatile sig_atomic_t TransactionTimeoutPending = false;
volatile sig_atomic_t IdleSessionTimeoutPending = false;
volatile sig_atomic_t ProcSignalBarrierPending = false;
volatile sig_atomic_t LogMemoryContextPending = false;
+volatile sig_atomic_t PublishMemoryContextPending = false;
volatile sig_atomic_t IdleStatsUpdateTimeoutPending = false;
volatile uint32 InterruptHoldoffCount = 0;
volatile uint32 QueryCancelHoldoffCount = 0;
diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index 946a3731fd..44fbd40af6 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -19,16 +19,22 @@
*-------------------------------------------------------------------------
*/
#include "postgres.h"
+#include <math.h>
#include "mb/pg_wchar.h"
#include "miscadmin.h"
+#include "nodes/pg_list.h"
+#include "storage/dsm.h"
+#include "storage/fd.h"
+#include "storage/lwlock.h"
+#include "utils/dsa.h"
+#include "utils/hsearch.h"
#include "utils/memdebug.h"
#include "utils/memutils.h"
#include "utils/memutils_internal.h"
#include "utils/memutils_memorychunk.h"
-
static void BogusFree(void *pointer);
static void *BogusRealloc(void *pointer, Size size, int flags);
static MemoryContext BogusGetChunkContext(void *pointer);
@@ -177,6 +183,17 @@ static void MemoryContextStatsInternal(MemoryContext context, int level,
static void MemoryContextStatsPrint(MemoryContext context, void *passthru,
const char *stats_string,
bool print_to_stderr);
+static void PublishMemoryContext(MemoryContextEntry *memctx_infos,
+ int curr_id, MemoryContext context,
+ List *path,
+ MemoryContextCounters stat,
+ int num_contexts, dsa_area *area);
+static void compute_num_of_contexts(List *contexts, HTAB *context_id_lookup,
+ int *stats_count,
+ bool get_summary);
+static List *compute_context_path(MemoryContext c, HTAB *context_id_lookup);
+static void dsa_free_previous_stats(dsa_area *area, int total_stats, dsa_pointer prev_dsa_pointer);
+
/*
* You should not do memory allocations within a critical section, because
@@ -1321,6 +1338,21 @@ HandleLogMemoryContextInterrupt(void)
/* latch will be set by procsignal_sigusr1_handler */
}
+/*
+ * HandleGetMemoryContextInterrupt
+ * Handle receipt of an interrupt requesting publication of memory
+ * contexts.
+ *
+ * All the actual work is deferred to ProcessGetMemoryContextInterrupt()
+ */
+void
+HandleGetMemoryContextInterrupt(void)
+{
+ InterruptPending = true;
+ PublishMemoryContextPending = true;
+ /* latch will be set by procsignal_sigusr1_handler */
+}
+
/*
* ProcessLogMemoryContextInterrupt
* Perform logging of memory contexts of this backend process.
@@ -1358,6 +1390,456 @@ ProcessLogMemoryContextInterrupt(void)
MemoryContextStatsDetail(TopMemoryContext, 100, 100, false);
}
+/*
+ * ProcessGetMemoryContextInterrupt
+ * Generate information about memory contexts used by the process.
+ *
+ * Performs a breadth-first search on the memory context tree, so that a
+ * parent's statistics are reported before its children's in the monitoring
+ * function output.
+ *
+ * Per-context statistics for all processes are shared via the same dynamic
+ * shared area. Statistics for contexts that exceed the pre-determined size
+ * limit are captured as a cumulative total at the end of the individual
+ * statistics.
+ *
+ * If get_summary is true, we traverse the memory context tree recursively in
+ * a depth-first manner to cover all the children of a parent context, to be
+ * able to display a cumulative total of memory consumption by a parent.
+ */
+void
+ProcessGetMemoryContextInterrupt(void)
+{
+ List *contexts;
+
+ HASHCTL ctl;
+ HTAB *context_id_lookup;
+ int context_id = 0;
+ MemoryContext stat_cxt;
+ MemoryContextEntry *meminfo;
+ bool get_summary = false;
+
+ dsa_area *area = NULL;
+ int max_stats;
+ int idx = MyProcNumber;
+ int stats_count = 0;
+ MemoryContextCounters stat;
+ int num_individual_stats = 0;
+
+ PublishMemoryContextPending = false;
+
+ /*
+ * Make a new context that will contain the hash table, to ease the
+ * cleanup.
+ */
+ stat_cxt = AllocSetContextCreate(CurrentMemoryContext,
+ "Memory context statistics",
+ ALLOCSET_DEFAULT_SIZES);
+
+ /*
+ * The hash table used for constructing "path" column of the view, similar
+ * to its local backend counterpart.
+ */
+ ctl.keysize = sizeof(MemoryContext);
+ ctl.entrysize = sizeof(MemoryContextId);
+ ctl.hcxt = stat_cxt;
+
+ context_id_lookup = hash_create("pg_get_remote_backend_memory_contexts",
+ 256,
+ &ctl,
+ HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+
+ /* List of contexts to process in the next round - start at the top. */
+ contexts = list_make1(TopMemoryContext);
+
+ /* Compute the number of stats that can fit in the defined limit */
+ max_stats = (MAX_NUM_DEFAULT_SEGMENTS * DSA_DEFAULT_INIT_SEGMENT_SIZE)
+ / (sizeof(MemoryContextEntry) + (MEM_CONTEXT_MAX_LEVEL
+ * sizeof(Datum)) + (2 * MEMORY_CONTEXT_IDENT_SHMEM_SIZE));
+
+ LWLockAcquire(&memCtxState[idx].lw_lock, LW_EXCLUSIVE);
+ get_summary = memCtxState[idx].get_summary;
+ LWLockRelease(&memCtxState[idx].lw_lock);
+
+ /*
+ * Traverse the memory context tree to find total number of contexts. If
+ * summary is requested report the total number of contexts at level 1 and
+ * 2 from the top. Also, populate the hash table of context ids.
+ */
+ compute_num_of_contexts(contexts, context_id_lookup, &stats_count,
+ get_summary);
+
+ /*
+ * Allocate memory in this process's dsa for storing statistics of the
+ * memory contexts, up to max_stats. For contexts that don't fit within
+ * the limit, a cumulative total is written as the last record in the DSA
+ * segment.
+ */
+ stats_count = (stats_count > max_stats) ? max_stats : stats_count;
+
+ LWLockAcquire(&memCtxArea->lw_lock, LW_EXCLUSIVE);
+
+ /*
+ * Create a DSA and send the handle to the client process after storing
+ * the statistics. If the number of contexts exceeds a predefined limit
+ * (8MB), a cumulative total is stored for such contexts.
+ */
+ if (memCtxArea->memstats_dsa_handle == DSA_HANDLE_INVALID)
+ {
+ MemoryContext oldcontext = CurrentMemoryContext;
+ dsa_handle handle;
+
+ MemoryContextSwitchTo(TopMemoryContext);
+
+ area = dsa_create(memCtxArea->lw_lock.tranche);
+
+ handle = dsa_get_handle(area);
+ MemoryContextSwitchTo(oldcontext);
+
+ dsa_pin_mapping(area);
+
+ /*
+ * Pin the dsa area to make sure it remains attachable even if the
+ * current backend exits, so that a waiting client gets the stats even
+ * after the process exits.
+ */
+ dsa_pin(area);
+
+ /* Set the handle in shared memory */
+ memCtxArea->memstats_dsa_handle = handle;
+ }
+
+ /*
+ * If DSA exists, created by another process publishing statistics, attach
+ * to it.
+ */
+ else if (area == NULL)
+ {
+ MemoryContext oldcontext = CurrentMemoryContext;
+
+ MemoryContextSwitchTo(TopMemoryContext);
+ area = dsa_attach(memCtxArea->memstats_dsa_handle);
+ MemoryContextSwitchTo(oldcontext);
+ dsa_pin_mapping(area);
+ }
+ LWLockRelease(&memCtxArea->lw_lock);
+
+ /*
+ * Hold the process specific lock to protect writes to process specific
+ * memory. This way two processes publishing statistics do not block each
+ * other.
+ */
+ LWLockAcquire(&memCtxState[idx].lw_lock, LW_EXCLUSIVE);
+ memCtxState[idx].proc_id = MyProcPid;
+
+ if (DsaPointerIsValid(memCtxState[idx].memstats_dsa_pointer))
+ {
+ /*
+ * Free any previous allocations, free the name, ident and path
+ * pointers before freeing the pointer that contains them.
+ */
+ dsa_free_previous_stats(area, memCtxState[idx].total_stats,
+ memCtxState[idx].memstats_dsa_pointer);
+ dsa_free(area, memCtxState[idx].memstats_dsa_pointer);
+ memCtxState[idx].memstats_dsa_pointer = InvalidDsaPointer;
+ }
+ memCtxState[idx].memstats_dsa_pointer = dsa_allocate0(area,
+ stats_count * sizeof(MemoryContextEntry));
+
+ meminfo = (MemoryContextEntry *) dsa_get_address(area,
+ memCtxState[idx].memstats_dsa_pointer);
+
+ if (get_summary)
+ {
+ int ctx_id = 0;
+ List *path = NIL;
+
+ /* Copy TopMemoryContext statistics to DSA */
+ memset(&stat, 0, sizeof(stat));
+ (*TopMemoryContext->methods->stats) (TopMemoryContext, NULL, NULL,
+ &stat, true);
+ path = lcons_int(1, path);
+ PublishMemoryContext(meminfo, ctx_id, TopMemoryContext, path, stat,
+ 1, area);
+ ctx_id = ctx_id + 1;
+
+ /*
+ * Copy statistics for each of TopMemoryContext's children (XXX: make
+ * it capped at 100). This includes statistics of all of their
+ * children up to level 100.
+ */
+
+ for (MemoryContext c = TopMemoryContext->firstchild; c != NULL;
+ c = c->nextchild)
+ {
+ MemoryContextCounters grand_totals;
+ int num_contexts = 0;
+
+ path = NIL;
+ memset(&grand_totals, 0, sizeof(grand_totals));
+
+ MemoryContextStatsInternal(c, 0, 100, 100, &grand_totals,
+ PRINT_STATS_NONE, &num_contexts);
+
+ path = compute_context_path(c, context_id_lookup);
+
+ PublishMemoryContext(meminfo, ctx_id, c, path,
+ grand_totals, num_contexts, area);
+ ctx_id = ctx_id + 1;
+ }
+ memCtxState[idx].total_stats = ctx_id;
+ goto cleanup;
+ }
+ foreach_ptr(MemoryContextData, cur, contexts)
+ {
+ List *path = NIL;
+ char *name;
+
+ /*
+ * Figure out the transient context_id of this context and each of its
+ * ancestors.
+ */
+ path = compute_context_path(cur, context_id_lookup);
+
+ if (context_id <= (max_stats - 2))
+ {
+ /* Examine the context stats */
+ memset(&stat, 0, sizeof(stat));
+ (*cur->methods->stats) (cur, NULL, NULL, &stat, true);
+ /* Copy statistics to DSA memory */
+ PublishMemoryContext(meminfo, context_id, cur, path, stat, 1, area);
+ }
+ else
+ {
+ /* Examine the context stats */
+ memset(&stat, 0, sizeof(stat));
+ (*cur->methods->stats) (cur, NULL, NULL, &stat, true);
+
+ meminfo[max_stats - 1].totalspace += stat.totalspace;
+ meminfo[max_stats - 1].nblocks += stat.nblocks;
+ meminfo[max_stats - 1].freespace += stat.freespace;
+ meminfo[max_stats - 1].freechunks += stat.freechunks;
+ }
+
+ /*
+ * DSA max limit is reached, write aggregate of the remaining
+ * statistics.
+ */
+ if (context_id == (max_stats - 2) && context_id < (stats_count - 1))
+ {
+ num_individual_stats = context_id + 1;
+ meminfo[max_stats - 1].name = dsa_allocate0(area, 17);
+ name = dsa_get_address(area, meminfo[max_stats - 1].name);
+ strncpy(name, "Remaining Totals", 16);
+ meminfo[max_stats - 1].ident = InvalidDsaPointer;
+ meminfo[max_stats - 1].path = InvalidDsaPointer;
+ meminfo[max_stats - 1].type = NULL;
+ }
+ context_id++;
+ }
+ /* No aggregated contexts, individual statistics reported */
+ if (context_id < (max_stats - 2))
+ {
+ memCtxState[idx].total_stats = context_id;
+ }
+ /* Report number of aggregated memory contexts */
+ else
+ {
+ meminfo[max_stats - 1].num_agg_stats = context_id -
+ num_individual_stats;
+
+ /*
+ * Total stats equals num_individual_stats + 1 record for cumulative
+ * statistics.
+ */
+ memCtxState[idx].total_stats = num_individual_stats + 1;
+ }
+cleanup:
+
+ /*
+ * Signal all the waiting client backends after setting the statistics
+ * timestamp, which is the exit condition the clients check.
+ */
+ memCtxState[idx].stats_timestamp = GetCurrentTimestamp();
+ LWLockRelease(&memCtxState[idx].lw_lock);
+ ConditionVariableBroadcast(&memCtxState[idx].memctx_cv);
+ /* Delete the hash table memory context */
+ MemoryContextDelete(stat_cxt);
+
+ dsa_detach(area);
+}
+
+/*
+ * Append the transient context_id of this context and each of
+ * its ancestors to a list, in order to compute a path.
+ */
+static List *
+compute_context_path(MemoryContext c, HTAB *context_id_lookup)
+{
+ bool found;
+ List *path = NIL;
+
+ for (MemoryContext cur_context = c; cur_context != NULL; cur_context = cur_context->parent)
+ {
+ MemoryContextId *cur_entry;
+
+ cur_entry = hash_search(context_id_lookup, &cur_context, HASH_FIND, &found);
+
+ if (!found)
+ {
+ elog(LOG, "hash table corrupted, can't construct path value");
+ break;
+ }
+ path = lcons_int(cur_entry->context_id, path);
+ }
+ return path;
+}
+
+/*
+ * Count the contexts currently allocated by the backend and assign a
+ * context id to each of them.
+ */
+static void
+compute_num_of_contexts(List *contexts, HTAB *context_id_lookup,
+ int *stats_count, bool get_summary)
+{
+ foreach_ptr(MemoryContextData, cur, contexts)
+ {
+ MemoryContextId *entry;
+ bool found;
+
+ entry = (MemoryContextId *) hash_search(context_id_lookup, &cur,
+ HASH_ENTER, &found);
+ Assert(!found);
+
+ /* context id starts with 1 */
+ entry->context_id = (++(*stats_count));
+
+ /* Append the children of the current context to the main list. */
+ for (MemoryContext c = cur->firstchild; c != NULL; c = c->nextchild)
+ {
+ if (get_summary)
+ {
+ entry = (MemoryContextId *) hash_search(context_id_lookup, &c,
+ HASH_ENTER, &found);
+ Assert(!found);
+
+ entry->context_id = (++(*stats_count));
+ }
+
+ contexts = lappend(contexts, c);
+ }
+
+ /*
+ * In summary mode, only contexts at the first two levels (from the
+ * top) are displayed.
+ */
+ if (get_summary)
+ break;
+ }
+
+}
+
+/* Copy the statistics of a single memory context to dsa memory */
+static void
+PublishMemoryContext(MemoryContextEntry *memctx_info, int curr_id,
+ MemoryContext context, List *path,
+ MemoryContextCounters stat, int num_contexts,
+ dsa_area *area)
+{
+ char clipped_ident[MEMORY_CONTEXT_IDENT_SHMEM_SIZE];
+ char *name;
+ char *ident;
+ Datum *path_array;
+
+ if (context->name != NULL)
+ {
+ Assert(strlen(context->name) < MEMORY_CONTEXT_IDENT_SHMEM_SIZE);
+ memctx_info[curr_id].name = dsa_allocate0(area, strlen(context->name) + 1);
+ name = (char *) dsa_get_address(area, memctx_info[curr_id].name);
+ strncpy(name, context->name, strlen(context->name));
+ }
+ else
+ memctx_info[curr_id].name = InvalidDsaPointer;
+
+ /* Trim and copy the identifier if it is not set to NULL */
+ if (context->ident != NULL)
+ {
+ int idlen = strlen(context->ident);
+
+ /*
+ * Some identifiers such as SQL query string can be very long,
+ * truncate oversize identifiers.
+ */
+ if (idlen >= MEMORY_CONTEXT_IDENT_SHMEM_SIZE)
+ idlen = pg_mbcliplen(context->ident, idlen,
+ MEMORY_CONTEXT_IDENT_SHMEM_SIZE - 1);
+
+ memcpy(clipped_ident, context->ident, idlen);
+ clipped_ident[idlen] = '\0';
+
+ /*
+ * To be consistent with logging output, we label dynahash contexts
+ * with just the hash table name as with MemoryContextStatsPrint().
+ */
+ if (!strncmp(context->name, "dynahash", 8))
+ {
+ dsa_free(area, memctx_info[curr_id].name);
+ memctx_info[curr_id].name = dsa_allocate0(area,
+ strlen(clipped_ident) + 1);
+ name = (char *) dsa_get_address(area,
+ memctx_info[curr_id].name);
+ strncpy(name,
+ clipped_ident, strlen(clipped_ident));
+ memctx_info[curr_id].ident = InvalidDsaPointer;
+ }
+ else
+ {
+
+ memctx_info[curr_id].ident = dsa_allocate0(area,
+ strlen(clipped_ident) + 1);
+ ident = (char *) dsa_get_address(area,
+ memctx_info[curr_id].ident);
+ strncpy(ident,
+ clipped_ident, strlen(clipped_ident));
+ }
+ }
+ else
+ memctx_info[curr_id].ident = InvalidDsaPointer;
+ /* Allocate dsa memory for storing path information */
+ memctx_info[curr_id].path_length = list_length(path);
+ memctx_info[curr_id].path = dsa_allocate0(area,
+ memctx_info[curr_id].path_length
+ * sizeof(Datum));
+ path_array = (Datum *) dsa_get_address(area, memctx_info[curr_id].path);
+ foreach_int(i, path)
+ path_array[foreach_current_index(i)] = Int32GetDatum(i);
+
+ memctx_info[curr_id].type = AssignContextType(context->type);
+ memctx_info[curr_id].totalspace = stat.totalspace;
+ memctx_info[curr_id].nblocks = stat.nblocks;
+ memctx_info[curr_id].freespace = stat.freespace;
+ memctx_info[curr_id].freechunks = stat.freechunks;
+ memctx_info[curr_id].num_agg_stats = num_contexts;
+}
+
+static void
+dsa_free_previous_stats(dsa_area *area, int total_stats,
+ dsa_pointer prev_dsa_pointer)
+{
+ MemoryContextEntry *meminfo;
+
+ meminfo = (MemoryContextEntry *) dsa_get_address(area, prev_dsa_pointer);
+ for (int i = 0; i < total_stats; i++)
+ {
+ if (DsaPointerIsValid(meminfo[i].name))
+ dsa_free(area, meminfo[i].name);
+
+ if (DsaPointerIsValid(meminfo[i].ident))
+ dsa_free(area, meminfo[i].ident);
+
+ if (DsaPointerIsValid(meminfo[i].path))
+ dsa_free(area, meminfo[i].path);
+ }
+}
void *
palloc(Size size)
{
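
Note on the capping logic in ProcessGetMemoryContextInterrupt() above (individual records until the buffer is nearly full, then one cumulative "Remaining Totals" record): the idea can be illustrated in isolation. This is a minimal sketch, not the patch's code. Counters, Entry and publish_capped are hypothetical names, plain arrays stand in for the DSA allocation, and the boundary case where the contexts exactly fill the buffer is simplified (here all of them are reported individually).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified counterparts of the patch's structures. */
typedef struct Counters
{
    int64_t totalspace;
    int64_t nblocks;
    int64_t freespace;
    int64_t freechunks;
} Counters;

typedef struct Entry
{
    const char *name;
    Counters    stat;
    int         num_agg_stats; /* number of contexts this record covers */
} Entry;

static void
add_counters(Counters *dst, const Counters *src)
{
    dst->totalspace += src->totalspace;
    dst->nblocks += src->nblocks;
    dst->freespace += src->freespace;
    dst->freechunks += src->freechunks;
}

/*
 * Write at most max_stats records into out[], which must have room for
 * min(n, max_stats) entries.  Returns the number of records written.
 */
size_t
publish_capped(const Counters *in, size_t n, Entry *out, size_t max_stats)
{
    size_t individual;
    size_t i;

    if (n == 0 || max_stats == 0)
        return 0;

    /* If everything fits, no cumulative record is needed. */
    individual = (n <= max_stats) ? n : max_stats - 1;

    for (i = 0; i < individual; i++)
    {
        out[i].name = "context"; /* placeholder; the patch copies real names */
        out[i].stat = in[i];
        out[i].num_agg_stats = 1;
    }
    if (individual == n)
        return n;

    /* Fold the remaining contexts into the last slot. */
    memset(&out[individual], 0, sizeof(Entry));
    out[individual].name = "Remaining Totals";
    for (i = individual; i < n; i++)
    {
        add_counters(&out[individual].stat, &in[i]);
        out[individual].num_agg_stats++;
    }
    return individual + 1;
}
```

For example, five contexts published into a three-record buffer yield two individual records followed by one cumulative record covering the remaining three contexts.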
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index af9546de23..cd71b96edd 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8480,6 +8480,16 @@
prorettype => 'bool', proargtypes => 'int4',
prosrc => 'pg_log_backend_memory_contexts' },
+# publishing memory contexts of the specified postgres process
+{ oid => '2173', descr => 'publish memory contexts of the specified backend',
+ proname => 'pg_get_process_memory_contexts', provolatile => 'v',
+ prorows => '100', proretset => 't', proparallel => 'r',
+ prorettype => 'record', proargtypes => 'int4 bool int4',
+ proallargtypes => '{int4,bool,int4,text,text,text,_int4,int8,int8,int8,int8,int8,int4,timestamptz}',
+ proargmodes => '{i,i,i,o,o,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{pid, summary, num_of_tries, name, ident, type, path, total_bytes, total_nblocks, free_bytes, free_chunks, used_bytes, num_agg_contexts, stats_timestamp}',
+ prosrc => 'pg_get_process_memory_contexts' },
+
# non-persistent series generator
{ oid => '1066', descr => 'non-persistent series generator',
proname => 'generate_series', prorows => '1000',
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index a2b63495ee..3dc3dcfb6c 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -96,6 +96,7 @@ extern PGDLLIMPORT volatile sig_atomic_t IdleSessionTimeoutPending;
extern PGDLLIMPORT volatile sig_atomic_t ProcSignalBarrierPending;
extern PGDLLIMPORT volatile sig_atomic_t LogMemoryContextPending;
extern PGDLLIMPORT volatile sig_atomic_t IdleStatsUpdateTimeoutPending;
+extern PGDLLIMPORT volatile sig_atomic_t PublishMemoryContextPending;
extern PGDLLIMPORT volatile sig_atomic_t CheckClientConnectionPending;
extern PGDLLIMPORT volatile sig_atomic_t ClientConnectionLost;
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index 022fd8ed93..477ab99338 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -35,6 +35,7 @@ typedef enum
PROCSIG_WALSND_INIT_STOPPING, /* ask walsenders to prepare for shutdown */
PROCSIG_BARRIER, /* global barrier interrupt */
PROCSIG_LOG_MEMORY_CONTEXT, /* ask backend to log the memory contexts */
+ PROCSIG_GET_MEMORY_CONTEXT, /* ask backend to publish the memory contexts */
PROCSIG_PARALLEL_APPLY_MESSAGE, /* Message from parallel apply workers */
/* Recovery conflict reasons */
diff --git a/src/include/utils/memutils.h b/src/include/utils/memutils.h
index 8abc26abce..8e223a152c 100644
--- a/src/include/utils/memutils.h
+++ b/src/include/utils/memutils.h
@@ -18,6 +18,9 @@
#define MEMUTILS_H
#include "nodes/memnodes.h"
+#include "storage/condition_variable.h"
+#include "storage/lmgr.h"
+#include "utils/dsa.h"
/*
@@ -48,7 +51,12 @@
#define AllocHugeSizeIsValid(size) ((Size) (size) <= MaxAllocHugeSize)
+#define MEMORY_CONTEXT_IDENT_SHMEM_SIZE 128
+#define MEM_CONTEXT_SHMEM_STATS_SIZE 30
+#define MEM_CONTEXT_MAX_LEVEL 64
+#define MAX_TYPE_STRING_LENGTH 64
+#define MAX_NUM_DEFAULT_SEGMENTS 8
/*
* Standard top-level memory contexts.
*
@@ -319,4 +327,67 @@ pg_memory_is_all_zeros(const void *ptr, size_t len)
return true;
}
+/* Dynamic shared memory state for statistics per context */
+typedef struct MemoryContextEntry
+{
+ dsa_pointer name;
+ dsa_pointer ident;
+ dsa_pointer path;
+ const char *type;
+ int path_length;
+ int64 totalspace;
+ int64 nblocks;
+ int64 freespace;
+ int64 freechunks;
+ int num_agg_stats;
+} MemoryContextEntry;
+
+/*
+ * Per backend static shared memory state for memory
+ * context statistics reporting.
+ */
+typedef struct MemoryContextBackendState
+{
+ ConditionVariable memctx_cv;
+ LWLock lw_lock;
+ int proc_id;
+ int total_stats;
+ bool get_summary;
+ dsa_pointer memstats_dsa_pointer;
+ TimestampTz stats_timestamp;
+} MemoryContextBackendState;
+
+/*
+ * Static shared memory state representing the DSA area
+ * created for memory context statistics reporting.
+ * A single DSA area is created and used by all the processes,
+ * each having its specific allocations for sharing memory
+ * stats, tracked by the per backend static shared memory state
+ * above.
+ */
+typedef struct MemoryContextState
+{
+ dsa_handle memstats_dsa_handle;
+ LWLock lw_lock;
+} MemoryContextState;
+
+/*
+ * MemoryContextId
+ * Used for storage of transient identifiers for
+ * pg_get_backend_memory_contexts.
+ */
+typedef struct MemoryContextId
+{
+ MemoryContext context;
+ int context_id;
+} MemoryContextId;
+
+extern PGDLLIMPORT MemoryContextBackendState *memCtxState;
+extern PGDLLIMPORT MemoryContextState *memCtxArea;
+extern void ProcessGetMemoryContextInterrupt(void);
+extern const char *AssignContextType(NodeTag type);
+extern void HandleGetMemoryContextInterrupt(void);
+extern void MemCtxShmemInit(void);
+extern void MemCtxBackendShmemInit(void);
+
#endif /* MEMUTILS_H */
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 83228cfca2..dca20ae1a2 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -232,3 +232,17 @@ select * from pg_timezone_abbrevs where abbrev = 'LMT';
LMT | @ 7 hours 52 mins 58 secs ago | f
(1 row)
+DO $$
+DECLARE
+ launcher_pid int;
+ r RECORD;
+BEGIN
+ SELECT pid from pg_stat_activity where backend_type='autovacuum launcher'
+ INTO launcher_pid;
+
+ select type, name, ident
+ from pg_get_process_memory_contexts(launcher_pid, false, 20)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+END $$;
+NOTICE: (AllocSet,TopMemoryContext,)
diff --git a/src/test/regress/sql/sysviews.sql b/src/test/regress/sql/sysviews.sql
index 66179f026b..4767351d4e 100644
--- a/src/test/regress/sql/sysviews.sql
+++ b/src/test/regress/sql/sysviews.sql
@@ -101,3 +101,17 @@ select count(distinct utc_offset) >= 24 as ok from pg_timezone_abbrevs;
-- One specific case we can check without much fear of breakage
-- is the historical local-mean-time value used for America/Los_Angeles.
select * from pg_timezone_abbrevs where abbrev = 'LMT';
+
+DO $$
+DECLARE
+ launcher_pid int;
+ r RECORD;
+BEGIN
+ SELECT pid from pg_stat_activity where backend_type='autovacuum launcher'
+ INTO launcher_pid;
+
+ select type, name, ident
+ from pg_get_process_memory_contexts(launcher_pid, false, 20)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+END $$;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index e3e09a2207..432e7fa161 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1636,8 +1636,10 @@ MemoryContextCallback
MemoryContextCallbackFunction
MemoryContextCounters
MemoryContextData
+MemoryContextEntry
MemoryContextMethodID
MemoryContextMethods
+MemoryContextState
MemoryStatsPrintFunc
MergeAction
MergeActionState
--
2.34.1
On 24 Feb 2025, at 13:46, Rahila Syed <rahilasyed90@gmail.com> wrote:
PFA the updated and rebased patches.
Thanks for the rebase, a few mostly superficial comments from a first
read-through. I'll do some more testing and playing around with it for
functional comments.
+ ...
+ child contexts' statistics, with num_agg_contexts indicating the number
+ of these aggregated child contexts.
The documentation refers to attributes in the return row but the format of that
row isn't displayed which makes following along hard. I think we should
include a table or a programlisting showing the return data before this
paragraph.
+const char *
+AssignContextType(NodeTag type)
This function doesn't actually assign anything so the name is a bit misleading,
it would be better with ContextTypeToString or something similar.
+ * By default, only superusers or users with PG_READ_ALL_STATS are allowed to
This sentence is really long and should probably be broken up.
+ * The shared memory buffer has a limited size - it the process has too many
s/it/if/
+ * If the publishing backend does not respond before the condition variable
+ * times out, which is set to MEMSTATS_WAIT_TIMEOUT, retry for max_tries
+ * number of times, which is defined by user, before giving up and
+ * returning previously published statistics, if any.
This comment should mention what happens if the process gives up and there are
no previously published stats.
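A minimal sketch of the three outcomes the comment needs to cover, assuming a hypothetical wait callback in place of the condition-variable sleep. None of the names below come from the patch; they only illustrate "fresh stats", "fall back to old stats" and "nothing to return".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcome of waiting for a publishing process. */
typedef enum SketchWaitResult
{
	SKETCH_FRESH_STATS,			/* publisher answered within max_tries */
	SKETCH_OLD_STATS,			/* gave up, older stats are available */
	SKETCH_NO_STATS				/* gave up, nothing was ever published */
} SketchWaitResult;

/*
 * Hypothetical callback: waits for one timeout period and returns true
 * if fresh statistics arrived during that period.
 */
typedef bool (*sketch_wait_fn) (void *arg);

/* Retry up to max_tries wait cycles, then fall back as described above. */
static SketchWaitResult
sketch_wait_for_stats(sketch_wait_fn wait_once, void *arg,
					  int max_tries, bool have_old_stats)
{
	assert(wait_once != NULL);

	for (int tries = 0; tries < max_tries; tries++)
	{
		if (wait_once(arg))
			return SKETCH_FRESH_STATS;
	}
	return have_old_stats ? SKETCH_OLD_STATS : SKETCH_NO_STATS;
}

/* Example wait function: reports success once the countdown reaches zero. */
static bool
sketch_countdown_wait(void *arg)
{
	int		   *remaining = arg;

	return --(*remaining) <= 0;
}
```

With a countdown of 3 and max_tries of 5 the sketch reports fresh statistics; with a countdown larger than max_tries it falls back to old statistics or to no result, depending on have_old_stats.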
+ int i;
...
+ for (i = 0; i < memCtxState[procNumber].total_stats; i++)
This can be rewritten as "for (int i = 0; .." since we allow C99.
+ * process running and consuming lots of memory, that it might end on its
+ * own first and its memory contexts are not logged is not a problem.
This comment is copy/pasted from pg_log_backend_memory_contexts, and while it
mostly still applies, it should at least be rewritten to not refer to logging as
this function doesn't do that.
+ ereport(WARNING,
+ (errmsg("PID %d is not a PostgreSQL server process",
No need to add the extra parenthesis around errmsg anymore, so I think new code
should omit those.
+ errhint("Use pg_backend_memory_contexts view instead")));
Super nitpick, but errhints should be complete sentences ending with a period.
+ * statitics have previously been published by the backend. In which case,
s/statitics/statistics/
+ * statitics have previously been published by the backend. In which case,
+ * check if statistics are not older than curr_timestamp, if they are wait
I think the sentence around the time check is needlessly confusing, could it be
rewritten into something like:
"A valid DSA pointer isn't proof that statistics are available, it can be
valid due to previously published stats. Check if the stats are updated by
comparing the timestamp, if the stats are newer than our previously
recorded timestamp from before sending the procsignal they must by
definition be updated."
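The check described in the suggested comment can be sketched as follows. This is only an illustration: the slot struct, the has_stats flag (standing in for a valid DSA pointer) and the plain integer timestamps are assumptions, not the patch's actual types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for PostgreSQL's TimestampTz; an assumption for this sketch. */
typedef int64_t SketchTimestamp;

/* Hypothetical per-process slot holding only the fields the check needs. */
typedef struct SketchStatsSlot
{
	bool		has_stats;		/* stand-in for "DSA pointer is valid" */
	SketchTimestamp stats_timestamp;	/* when stats were last published */
} SketchStatsSlot;

/*
 * Return true only if the slot holds statistics published after the
 * request was sent. A valid pointer alone may refer to an older
 * publication, so the timestamp comparison is what proves freshness.
 */
static bool
sketch_stats_are_fresh(const SketchStatsSlot *slot,
					   SketchTimestamp requested_at)
{
	assert(slot != NULL);
	return slot->has_stats && slot->stats_timestamp > requested_at;
}
```

The caller would record requested_at before sending the signal, so any publication with a later timestamp must be a response to that request or to a newer one.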
+ /* Assert for dsa_handle to be valid */
Was this intended to be turned into an Assert call? Else it seems better to remove.
+ if (print_location != PRINT_STATS_NONE)
This warrants a comment stating why it makes sense.
+ * Do not print the statistics if print_to_stderr is PRINT_STATS_NONE,
s/print_to_stderr/print_location/. Also, do we really need print_to_stderr in
this function at all? It seems a bit awkward to combine a boolean and a
parameter for a tri-state value when the parameter holds the tri-state already.
For readability I think just checking print_location will be better since the
value will be clear, where print_to_stderr=false is less clear in a tri-state
scenario.
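To illustrate the readability argument, here is a small sketch of a tri-state enum checked directly, without a companion boolean. Only PRINT_STATS_NONE appears in the quoted patch; the other two member names and both helper functions are assumptions made for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical tri-state; only the NONE member is named in the patch. */
typedef enum SketchPrintLocation
{
	SKETCH_PRINT_STATS_TO_STDERR,
	SKETCH_PRINT_STATS_TO_LOGS,
	SKETCH_PRINT_STATS_NONE
} SketchPrintLocation;

/* One check on the enum replaces the boolean plus enum combination. */
static bool
sketch_should_print(SketchPrintLocation loc)
{
	return loc != SKETCH_PRINT_STATS_NONE;
}

/* Every state is spelled out, so the intent is visible at the call site. */
static const char *
sketch_destination(SketchPrintLocation loc)
{
	switch (loc)
	{
		case SKETCH_PRINT_STATS_TO_STDERR:
			return "stderr";
		case SKETCH_PRINT_STATS_TO_LOGS:
			return "logs";
		case SKETCH_PRINT_STATS_NONE:
			return "none";
	}
	return "none";				/* not reached for valid enum values */
}
```

A caller reading sketch_destination(loc) sees which of the three states applies, which a bare print_to_stderr=false cannot express.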
+ * its ancestors to a list, inorder to compute a path.
s/inorder/in order/
+ elog(LOG, "hash table corrupted, can't construct path value");
+ break;
This will return either a NIL list or a partial path, but PublishMemoryContext
doesn't really take into consideration that it might be so. Is this really
benign to the point that we can blindly go on? Also, elog(LOG..) is mostly for
tracing or debugging as elog() isn't intended for user facing errors.
+static void
+compute_num_of_contexts(List *contexts, HTAB *context_id_lookup,
+ int *stats_count, bool get_summary)
This function does a lot more than compute the number of contexts so the name seems
a bit misleading. Perhaps a rename to compute_contexts() or something similar?
+ memctx_info[curr_id].name = dsa_allocate0(area,
+ strlen(clipped_ident) + 1);
These calls can use idlen instead of more strlen() calls, no? While there is no
performance benefit, it would increase readability IMHO if the code first
calculates a value and then uses it.
+ strncpy(name,
+ clipped_ident, strlen(clipped_ident));
Since clipped_ident has been nul terminated earlier there is no need to use
strncpy, we can instead use strlcpy and take the target buffer size into
consideration rather than the input string length.
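A short sketch of why bounding the copy by the destination size is safer than bounding it by the source length. PostgreSQL provides strlcpy itself; the sketch_strlcpy helper below is a stand-in with the same semantics, written only so the example compiles on platforms whose libc lacks strlcpy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Stand-in with strlcpy semantics: copy at most dstsize - 1 bytes,
 * always nul terminate when dstsize > 0, and return strlen(src) so the
 * caller can detect truncation.
 */
static size_t
sketch_strlcpy(char *dst, const char *src, size_t dstsize)
{
	size_t		srclen;

	assert(dst != NULL && src != NULL);
	srclen = strlen(src);

	if (dstsize > 0)
	{
		size_t		ncopy = (srclen < dstsize - 1) ? srclen : dstsize - 1;

		memcpy(dst, src, ncopy);
		dst[ncopy] = '\0';
	}
	return srclen;
}
```

By contrast, strncpy(dst, src, strlen(src)) never writes a terminating nul and ignores the destination size entirely, so it depends on the buffer having been zeroed and sized correctly elsewhere.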
PROCSIG_LOG_MEMORY_CONTEXT, /* ask backend to log the memory contexts */
+ PROCSIG_GET_MEMORY_CONTEXT, /* ask backend to log the memory contexts */
This comment should be different from the LOG_MEMORY_xx one.
+#define MEM_CONTEXT_SHMEM_STATS_SIZE 30
+#define MAX_TYPE_STRING_LENGTH 64
These are unused, from an earlier version of the patch perhaps?
+ * Singe DSA area is created and used by all the processes,
s/Singe/Since/
+typedef struct MemoryContextBackendState
This is only used in mcxtfuncs.c and can be moved there rather than being
exported in the header.
+} MemoryContextId;
This lacks an entry in the typedefs.list file.
--
Daniel Gustafsson
Hi Daniel,
Thanks for the rebase, a few mostly superficial comments from a first
read-through.
Thank you for your comments.
The documentation refers to attributes in the return row but the format of that
row isn't displayed which makes following along hard. I think we should
include a table or a programlisting showing the return data before this
paragraph.
I included the SQL function call and its output in programlisting format after
the function description. Since the description was part of a table, I added
this additional information at the end of that table.
+const char *
+AssignContextType(NodeTag type)
This function doesn't actually assign anything so the name is a bit misleading,
it would be better with ContextTypeToString or something similar.
Done.
+ * By default, only superusers or users with PG_READ_ALL_STATS are allowed to
This sentence is really long and should probably be broken up.
Fixed.
+ * The shared memory buffer has a limited size - it the process has too many
s/it/if/
Fixed.
+ * If the publishing backend does not respond before the condition variable
+ * times out, which is set to MEMSTATS_WAIT_TIMEOUT, retry for max_tries
+ * number of times, which is defined by user, before giving up and
+ * returning previously published statistics, if any.
This comment should mention what happens if the process gives up and there are
no previously published stats.
Done.
+ int i;
...
+ for (i = 0; i < memCtxState[procNumber].total_stats; i++)
This can be rewritten as "for (int i = 0; .." since we allow C99.
Done.
+ * process running and consuming lots of memory, that it might end on its
+ * own first and its memory contexts are not logged is not a problem.
This comment is copy/pasted from pg_log_backend_memory_contexts, and while it
mostly still applies, it should at least be rewritten to not refer to logging as
this function doesn't do that.
Fixed.
+ ereport(WARNING,
+ (errmsg("PID %d is not a PostgreSQL server process",
No need to add the extra parenthesis around errmsg anymore, so I think new code
should omit those.
Done.
+ errhint("Use pg_backend_memory_contexts view instead")));
Super nitpick, but errhints should be complete sentences ending with a period.
Done.
+ * statitics have previously been published by the backend. In which case,
s/statitics/statistics/
Fixed.
+ * statitics have previously been published by the backend. In which case,
+ * check if statistics are not older than curr_timestamp, if they are wait
I think the sentence around the time check is needlessly confusing, could it be
rewritten into something like:
"A valid DSA pointer isn't proof that statistics are available, it can be
valid due to previously published stats. Check if the stats are updated by
comparing the timestamp, if the stats are newer than our previously
recorded timestamp from before sending the procsignal they must by
definition be updated."
Replaced accordingly.
+ /* Assert for dsa_handle to be valid */
Was this intended to be turned into an Assert call? Else it seems better to remove.
Added an assert and removed the comment.
+ if (print_location != PRINT_STATS_NONE)
This warrants a comment stating why it makes sense.
+ * Do not print the statistics if print_to_stderr is PRINT_STATS_NONE,
s/print_to_stderr/print_location/. Also, do we really need print_to_stderr in
this function at all? It seems a bit awkward to combine a boolean and a
parameter for a tri-state value when the parameter holds the tri-state already.
For readability I think just checking print_location will be better since the
value will be clear, where print_to_stderr=false is less clear in a tri-state
scenario.
I removed the boolean print_to_stderr; the checks now use the tri-state enum
print_location.
+ * its ancestors to a list, inorder to compute a path.
s/inorder/in order/
Fixed.
+ elog(LOG, "hash table corrupted, can't construct path value");
+ break;
This will return either a NIL list or a partial path, but PublishMemoryContext
doesn't really take into consideration that it might be so. Is this really
benign to the point that we can blindly go on? Also, elog(LOG..) is mostly for
tracing or debugging as elog() isn't intended for user facing errors.
I agree that this should be addressed. I added a check for the path value before
storing it in shared memory. If the path is NIL, the path pointer in DSA will
be set to InvalidDsaPointer. When a client encounters an InvalidDsaPointer, it
will print NULL in the path column. A partial path scenario is unlikely IMO, and
I am not sure if it warrants additional checks.
+static void
+compute_num_of_contexts(List *contexts, HTAB *context_id_lookup,
+ int *stats_count, bool get_summary)
This function does a lot more than compute the number of contexts so the name seems
a bit misleading. Perhaps a rename to compute_contexts() or something similar?
Renamed to compute_contexts_count_and_ids.
+ memctx_info[curr_id].name = dsa_allocate0(area,
+ strlen(clipped_ident) + 1);
These calls can use idlen instead of more strlen() calls, no? While there is no
performance benefit, it would increase readability IMHO if the code first
calculates a value and then uses it.
Done.
+ strncpy(name,
+ clipped_ident, strlen(clipped_ident));
Since clipped_ident has been nul terminated earlier there is no need to use
strncpy, we can instead use strlcpy and take the target buffer size into
consideration rather than the input string length.
Replaced with the strlcpy calls.
PROCSIG_LOG_MEMORY_CONTEXT, /* ask backend to log the memory contexts */
+ PROCSIG_GET_MEMORY_CONTEXT, /* ask backend to log the memory contexts */
This comment should be different from the LOG_MEMORY_xx one.
Fixed.
+#define MEM_CONTEXT_SHMEM_STATS_SIZE 30
+#define MAX_TYPE_STRING_LENGTH 64
These are unused, from an earlier version of the patch perhaps?
Removed.
+ * Singe DSA area is created and used by all the processes,
s/Singe/Since/
Fixed.
+typedef struct MemoryContextBackendState
This is only used in mcxtfuncs.c and can be moved there rather than being
exported in the header.
This is being used in mcxt.c too in the form of the variable memCtxState.
+} MemoryContextId;
This lacks an entry in the typedefs.list file.
Added.
Please find attached the updated patches with the above fixes.
Thank you,
Rahila Syed
Attachments:
v16-0002-Function-to-report-memory-context-statistics.patch
From 89732aecfad3ba9f22f92cb15bc741e5dc218f56 Mon Sep 17 00:00:00 2001
From: Rahila Syed <rahilasyed.90@gmail.com>
Date: Mon, 3 Feb 2025 15:37:17 +0530
Subject: [PATCH 2/2] Function to report memory context statistics
This function sends a signal to a backend to publish
statistics of all its memory contexts. The signal handler
running in the backend process sets a flag, which causes
the process to copy its MemoryContextStats to a DSA during
the next call to CHECK_FOR_INTERRUPTS().
If there are more statistics than fit in 16MB, the
remaining statistics are copied as a cumulative
total of the remaining contexts.
Once it is done, it signals the client backend using
a condition variable. The client backend wakes up, reads
the shared memory and returns these values in the form
of a set of records, one for each memory context, to the
user, followed by a cumulative total of the remaining
contexts, if any.
If get_summary is true, return statistics of all children
of TopMemoryContext with aggregated statistics of their
children.
The user can pass num_of_tries, which determines the total
number of wait cycles in a client backend for the latest
statistics.
The wait timeout for each cycle is 1 second. After this,
the client displays previously published statistics or
returns without results.
Each backend and auxiliary process has its own slot for
reporting the stats. There is an array of such memory slots
of size MaxBackends plus the number of auxiliary
processes in fixed shared memory. Each of these slots points
to a smaller DSA allocation within a single DSA,
which contains the stats to be shared by the corresponding
process.
Each slot has its own LWLock and condition variable for
synchronization and communication between the publishing
process and the client backend.
---
doc/src/sgml/func.sgml | 61 +++
src/backend/postmaster/autovacuum.c | 4 +
src/backend/postmaster/checkpointer.c | 4 +
src/backend/postmaster/interrupt.c | 4 +
src/backend/postmaster/pgarch.c | 4 +
src/backend/postmaster/startup.c | 4 +
src/backend/postmaster/walsummarizer.c | 4 +
src/backend/storage/ipc/ipci.c | 3 +
src/backend/storage/ipc/procsignal.c | 3 +
src/backend/tcop/postgres.c | 3 +
.../utils/activity/wait_event_names.txt | 1 +
src/backend/utils/adt/mcxtfuncs.c | 435 +++++++++++++--
src/backend/utils/init/globals.c | 1 +
src/backend/utils/mmgr/mcxt.c | 517 +++++++++++++++++-
src/include/catalog/pg_proc.dat | 10 +
src/include/miscadmin.h | 1 +
src/include/storage/procsignal.h | 1 +
src/include/utils/memutils.h | 69 +++
src/test/regress/expected/sysviews.out | 14 +
src/test/regress/sql/sysviews.sql | 14 +
src/tools/pgindent/typedefs.list | 4 +
21 files changed, 1117 insertions(+), 44 deletions(-)
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index bf31b1f3ee..08e5ccd3eb 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -28529,6 +28529,50 @@ acl | {postgres=arwdDxtm/postgres,foo=r/postgres}
</para></entry>
</row>
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>pg_get_process_memory_contexts</primary>
+ </indexterm>
+ <function>pg_get_process_memory_contexts</function> ( <parameter>pid</parameter> <type>integer</type>, <parameter>get_summary</parameter> <type>boolean</type>, <parameter>num_of_tries</parameter> <type>integer</type> )
+ <returnvalue>setof record</returnvalue>
+ </para>
+ <para>
+ This function handles requests to display the memory contexts of a
+ PostgreSQL process with the specified process ID (PID). It takes three
+ arguments: <parameter>PID</parameter>, <parameter>get_summary</parameter>
+ and <parameter>num_of_tries</parameter>. The function can send requests
+ to both backend and auxiliary processes.
+
+ After receiving memory context statistics from the target process, it
+ returns the results as one row per context. If all the contexts don't
+ fit within the pre-determined size limit, the remaining context statistics
+ are aggregated and a cumulative total is displayed. The num_agg_contexts
+ column indicates the number of contexts aggregated in the displayed
+ statistics. The num_agg_contexts value is typically 1, meaning that each
+ context's statistics are displayed separately.
+
+ When <parameter>get_summary</parameter> is set to true, statistics
+ for memory contexts at levels 1 and 2 are displayed, with level 1
+ representing the root node (i.e., TopMemoryContext).
+ Each level 2 context's statistics represent an aggregate of all its
+ child contexts' statistics, with num_agg_contexts indicating the number
+ of these aggregated child contexts.
+
+ When <parameter>get_summary</parameter> is set to false, the
+ num_agg_contexts value is 1, indicating that individual statistics are
+ being displayed.
+
+ <parameter>num_of_tries</parameter> indicates the number of times
+ the client will wait for the latest statistics. The wait per try is 1
+ second. This parameter can be increased if the user anticipates a delay
+ in the response from the reporting process. Conversely, if users are
+ frequently and periodically querying the process for statistics, or if
+ there are concurrent requests for statistics of the same process,
+ lowering the parameter might help achieve a faster response.
+ </para></entry>
+ </row>
+
<row>
<entry role="func_table_entry"><para role="func_signature">
<indexterm>
@@ -28647,6 +28691,23 @@ LOG: Grand total: 1651920 bytes in 201 blocks; 622360 free (88 chunks); 1029560
because it may generate a large number of log messages.
</para>
+ <para>
+ <function>pg_get_process_memory_contexts</function> can be used
+ to request the memory contexts statistics of any postgres process. For example:
+<programlisting>
+postgres=# SELECT * FROM pg_get_process_memory_contexts(
+ (SELECT pid FROM pg_stat_activity
+ WHERE backend_type = 'checkpointer')
+ , false, 5) LIMIT 1;
+ name | ident | type | path | total_bytes | total_nblocks | free_bytes | free_chunks | used_bytes | num_
+agg_contexts | stats_timestamp
+------------------+-------+----------+------+-------------+---------------+------------+-------------+------------+-----
+-------------+----------------------------------
+ TopMemoryContext | | AllocSet | {1} | 102664 | 4 | 3008 | 2 | 99656 |
+ 1 | 2025-03-04 10:01:57.590543+05:30
+</programlisting>
+ </para>
+
</sect2>
<sect2 id="functions-admin-backup">
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index dfb8d068ec..f9d86de334 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -779,6 +779,10 @@ HandleAutoVacLauncherInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
/* Process sinval catchup interrupts that happened while sleeping */
ProcessCatchupInterrupt();
}
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index 7acbbd3e26..f0f743ce7e 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -661,6 +661,10 @@ HandleCheckpointerInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/postmaster/interrupt.c b/src/backend/postmaster/interrupt.c
index be69e4c713..9481a5cd24 100644
--- a/src/backend/postmaster/interrupt.c
+++ b/src/backend/postmaster/interrupt.c
@@ -48,6 +48,10 @@ HandleMainLoopInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index e6cd78679c..bca7675ccd 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -865,6 +865,10 @@ HandlePgArchInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
if (ConfigReloadPending)
{
char *archiveLib = pstrdup(XLogArchiveLibrary);
diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c
index 88eab3d0ba..3be62084fd 100644
--- a/src/backend/postmaster/startup.c
+++ b/src/backend/postmaster/startup.c
@@ -192,6 +192,10 @@ HandleStartupProcInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
diff --git a/src/backend/postmaster/walsummarizer.c b/src/backend/postmaster/walsummarizer.c
index f4d61c1f3b..aebf3f96f5 100644
--- a/src/backend/postmaster/walsummarizer.c
+++ b/src/backend/postmaster/walsummarizer.c
@@ -876,6 +876,10 @@ HandleWalSummarizerInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 174eed7036..5eee04d52a 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -50,6 +50,7 @@
#include "storage/sinvaladt.h"
#include "utils/guc.h"
#include "utils/injection_point.h"
+#include "utils/memutils.h"
/* GUCs */
int shared_memory_type = DEFAULT_SHARED_MEMORY_TYPE;
@@ -340,6 +341,8 @@ CreateOrAttachShmemStructs(void)
StatsShmemInit();
WaitEventCustomShmemInit();
InjectionPointShmemInit();
+ MemCtxShmemInit();
+ MemCtxBackendShmemInit();
}
/*
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index 7d20196550..b59034fdc3 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -690,6 +690,9 @@ procsignal_sigusr1_handler(SIGNAL_ARGS)
if (CheckProcSignal(PROCSIG_LOG_MEMORY_CONTEXT))
HandleLogMemoryContextInterrupt();
+ if (CheckProcSignal(PROCSIG_GET_MEMORY_CONTEXT))
+ HandleGetMemoryContextInterrupt();
+
if (CheckProcSignal(PROCSIG_PARALLEL_APPLY_MESSAGE))
HandleParallelApplyMessageInterrupt();
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index f2f75aa0f8..8ae890e320 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3499,6 +3499,9 @@ ProcessInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
if (ParallelApplyMessagePending)
HandleParallelApplyMessages();
}
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index e199f07162..3674b5b7b6 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -159,6 +159,7 @@ WAL_RECEIVER_EXIT "Waiting for the WAL receiver to exit."
WAL_RECEIVER_WAIT_START "Waiting for startup process to send initial data for streaming replication."
WAL_SUMMARY_READY "Waiting for a new WAL summary to be generated."
XACT_GROUP_UPDATE "Waiting for the group leader to update transaction status at transaction end."
+MEM_CTX_PUBLISH "Waiting for backend to publish memory information."
ABI_compatibility:
diff --git a/src/backend/utils/adt/mcxtfuncs.c b/src/backend/utils/adt/mcxtfuncs.c
index 396c2f223b..943399c937 100644
--- a/src/backend/utils/adt/mcxtfuncs.c
+++ b/src/backend/utils/adt/mcxtfuncs.c
@@ -17,28 +17,26 @@
#include "funcapi.h"
#include "mb/pg_wchar.h"
+#include "miscadmin.h"
+#include "access/twophase.h"
+#include "catalog/pg_authid_d.h"
+#include "nodes/pg_list.h"
#include "storage/proc.h"
#include "storage/procarray.h"
+#include "utils/acl.h"
#include "utils/array.h"
#include "utils/builtins.h"
#include "utils/hsearch.h"
+#include "utils/memutils.h"
+#include "utils/wait_event_types.h"
/* ----------
* The max bytes for showing identifiers of MemoryContext.
* ----------
*/
#define MEMORY_CONTEXT_IDENT_DISPLAY_SIZE 1024
-
-/*
- * MemoryContextId
- * Used for storage of transient identifiers for
- * pg_get_backend_memory_contexts.
- */
-typedef struct MemoryContextId
-{
- MemoryContext context;
- int context_id;
-} MemoryContextId;
+struct MemoryContextBackendState *memCtxState = NULL;
+struct MemoryContextState *memCtxArea = NULL;
/*
* int_list_to_array
@@ -143,24 +141,7 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
else
nulls[1] = true;
- switch (context->type)
- {
- case T_AllocSetContext:
- type = "AllocSet";
- break;
- case T_GenerationContext:
- type = "Generation";
- break;
- case T_SlabContext:
- type = "Slab";
- break;
- case T_BumpContext:
- type = "Bump";
- break;
- default:
- type = "???";
- break;
- }
+ type = ContextTypeToString(context->type);
values[2] = CStringGetTextDatum(type);
values[3] = Int32GetDatum(list_length(path)); /* level */
@@ -175,6 +156,32 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
list_free(path);
}
+const char *
+ContextTypeToString(NodeTag type)
+{
+ const char *context_type;
+
+ switch (type)
+ {
+ case T_AllocSetContext:
+ context_type = "AllocSet";
+ break;
+ case T_GenerationContext:
+ context_type = "Generation";
+ break;
+ case T_SlabContext:
+ context_type = "Slab";
+ break;
+ case T_BumpContext:
+ context_type = "Bump";
+ break;
+ default:
+ context_type = "???";
+ break;
+ }
+ return (context_type);
+}
+
/*
* pg_get_backend_memory_contexts
* SQL SRF showing backend memory context.
@@ -281,7 +288,7 @@ pg_log_backend_memory_contexts(PG_FUNCTION_ARGS)
* to acquire a lock on an arbitrary process to prevent that. But since
* this mechanism is usually used to debug a backend or an auxiliary
* process running and consuming lots of memory, that it might end on its
- * own first and its memory contexts are not logged is not a problem.
+ * own first and its memory contexts are not reported is not a problem.
*/
if (proc == NULL)
{
@@ -290,7 +297,7 @@ pg_log_backend_memory_contexts(PG_FUNCTION_ARGS)
* if one backend terminated on its own during the run.
*/
ereport(WARNING,
- (errmsg("PID %d is not a PostgreSQL server process", pid)));
+ errmsg("PID %d is not a PostgreSQL server process", pid));
PG_RETURN_BOOL(false);
}
@@ -299,9 +306,373 @@ pg_log_backend_memory_contexts(PG_FUNCTION_ARGS)
{
/* Again, just a warning to allow loops */
ereport(WARNING,
- (errmsg("could not send signal to process %d: %m", pid)));
+ errmsg("could not send signal to process %d: %m", pid));
PG_RETURN_BOOL(false);
}
PG_RETURN_BOOL(true);
}
+
+/*
+ * pg_get_process_memory_contexts
+ * Signal a backend or an auxiliary process to send its memory contexts,
+ * wait for the results and display them.
+ *
+ * By default, only superusers or users with PG_READ_ALL_STATS are allowed to
+ * signal a process to return the memory contexts. This is because allowing
+ * any users to issue this request at an unbounded rate would cause lots of
+ * requests to be sent, which can lead to denial of service. Additional roles
+ * can be permitted with GRANT.
+ *
+ * On receipt of this signal, a backend or an auxiliary process sets the flag
+ * in the signal handler, which causes the next CHECK_FOR_INTERRUPTS()
+ * or process-specific interrupt handler to copy the memory context details
+ * to a dynamic shared memory space.
+ *
+ * The shared memory buffer has a limited size - if the process has too many
+ * memory contexts, the memory contexts that do not fit are summarized
+ * and represented as cumulative total at the end of the buffer.
+ *
+ * After sending the signal, wait on a condition variable. The publishing
+ * backend, after copying the data to shared memory, sends signal on that
+ * condition variable. There is one condition variable per publishing
+ * backend.
+ * Once condition variable is signalled, check if the memory context
+ * information is available for reading and display.
+ *
+ * If the publishing backend does not respond before the condition variable
+ * times out, which is set to MEMSTATS_WAIT_TIMEOUT, retry for max_tries
+ * number of times, which is defined by user, before giving up and
+ * returning previously published statistics, if any. If previous statistics
+ * do not exist, return NULL.
+ */
+Datum
+pg_get_process_memory_contexts(PG_FUNCTION_ARGS)
+{
+ int pid = PG_GETARG_INT32(0);
+ bool get_summary = PG_GETARG_BOOL(1);
+ PGPROC *proc;
+ ProcNumber procNumber = INVALID_PROC_NUMBER;
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ dsa_area *area;
+ MemoryContextEntry *memctx_info;
+ int num_retries = 0;
+ TimestampTz curr_timestamp;
+ int max_tries = PG_GETARG_INT32(2);
+
+ /*
+ * Only superusers or users with pg_read_all_stats privileges can view the
+ * memory context statistics of another process
+ */
+ if (!has_privs_of_role(GetUserId(), ROLE_PG_READ_ALL_STATS))
+ ereport(ERROR,
+ errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("memory context statistics privilege error"));
+
+ InitMaterializedSRF(fcinfo, 0);
+
+ /*
+ * See if the process with given pid is a backend or an auxiliary process.
+ */
+ proc = BackendPidGetProc(pid);
+ if (proc == NULL)
+ proc = AuxiliaryPidGetProc(pid);
+
+ /*
+ * BackendPidGetProc() and AuxiliaryPidGetProc() return NULL if the pid
+ * isn't valid; but by the time we reach kill(), a process for which we
+ * get a valid proc here might have terminated on its own. There's no way
+ * to acquire a lock on an arbitrary process to prevent that. But since
+ * this mechanism is usually used to debug a backend or an auxiliary
+ * process running and consuming lots of memory, that it might end on its
+ * own first and its memory contexts are not logged is not a problem.
+ */
+ if (proc == NULL)
+ {
+ /*
+ * This is just a warning so a loop-through-resultset will not abort
+ * if one backend terminated on its own during the run.
+ */
+ ereport(WARNING,
+ (errmsg("PID %d is not a PostgreSQL server process",
+ pid)));
+ PG_RETURN_NULL();
+ }
+
+ procNumber = GetNumberFromPGProc(proc);
+ if (procNumber == MyProcNumber)
+ {
+ ereport(WARNING,
+ errmsg("cannot return statistics for local backend"),
+ errhint("Use pg_backend_memory_contexts view instead."));
+ PG_RETURN_NULL();
+ }
+
+ LWLockAcquire(&memCtxState[procNumber].lw_lock, LW_EXCLUSIVE);
+ memCtxState[procNumber].get_summary = get_summary;
+ LWLockRelease(&memCtxState[procNumber].lw_lock);
+
+ curr_timestamp = GetCurrentTimestamp();
+
+ /*
+ * Send a signal to the PostgreSQL process, asking it to publish
+ * information about its memory contexts.
+ */
+ if (SendProcSignal(pid, PROCSIG_GET_MEMORY_CONTEXT, procNumber) < 0)
+ {
+ ereport(WARNING,
+ errmsg("could not send signal to process %d: %m", pid));
+ PG_RETURN_NULL();
+ }
+
+ /*
+ * A valid DSA pointer isn't proof that statistics are available; it can
+ * be valid due to previously published stats. Check whether the stats
+ * are updated by comparing timestamps: if the stats are newer than the
+ * timestamp we recorded before sending the procsignal, they must by
+ * definition be updated. Wait for up to max_tries *
+ * MEMSTATS_WAIT_TIMEOUT, after which display old statistics if
+ * available or return NULL.
+ */
+ while (1)
+ {
+ long msecs;
+
+ /*
+ * We expect to come out of sleep when the requested process has
+ * finished publishing the statistics, verified using the valid DSA
+ * pointer.
+ *
+ * Make sure that the information belongs to the pid we requested
+ * information for; otherwise loop back and wait for the server
+ * process to finish publishing statistics.
+ */
+ LWLockAcquire(&memCtxState[procNumber].lw_lock, LW_EXCLUSIVE);
+ msecs =
+ TimestampDifferenceMilliseconds(curr_timestamp,
+ memCtxState[procNumber].stats_timestamp);
+
+ /*
+ * A note in procnumber.h says that a procNumber can be re-used for
+ * a different backend immediately after a backend exits. If the slot
+ * identified by procNumber still holds an old process's data that the
+ * current process has not yet updated, the pid of the requested
+ * process and the proc_id will not match.
+ */
+ if (memCtxState[procNumber].proc_id == pid)
+ {
+ /*
+ * Break if the latest stats have been read, indicated by
+ * statistics timestamp being newer than the current request
+ * timestamp.
+ */
+ if (DsaPointerIsValid(memCtxState[procNumber].memstats_dsa_pointer)
+ && msecs > 0)
+ {
+ LWLockRelease(&memCtxState[procNumber].lw_lock);
+ break;
+ }
+
+ }
+ LWLockRelease(&memCtxState[procNumber].lw_lock);
+
+ /*
+ * Recheck the state of the backend before sleeping on the condition
+ * variable
+ */
+ proc = BackendPidGetProc(pid);
+ if (proc == NULL)
+ proc = AuxiliaryPidGetProc(pid);
+
+#define MEMSTATS_WAIT_TIMEOUT 1000
+ if (proc == NULL)
+ {
+ ereport(WARNING,
+ errmsg("PID %d is not a PostgreSQL server process",
+ pid));
+ PG_RETURN_NULL();
+ }
+ if (ConditionVariableTimedSleep(&memCtxState[procNumber].memctx_cv,
+ MEMSTATS_WAIT_TIMEOUT,
+ WAIT_EVENT_MEM_CTX_PUBLISH))
+ {
+ /*
+ * Wait for max_tries defined by user, display previously
+ * published statistics if any, when max_tries are over.
+ */
+ if (num_retries > max_tries)
+ {
+ LWLockAcquire(&memCtxState[procNumber].lw_lock, LW_EXCLUSIVE);
+ /* Displaying previously published statistics */
+ if (DsaPointerIsValid(memCtxState[procNumber].memstats_dsa_pointer))
+ {
+ LWLockRelease(&memCtxState[procNumber].lw_lock);
+ break;
+ }
+ else
+ {
+ LWLockRelease(&memCtxState[procNumber].lw_lock);
+ PG_RETURN_NULL();
+ }
+ }
+ ereport(LOG,
+ errmsg("wait for process %d to publish stats timed out, trying again",
+ pid));
+ num_retries = num_retries + 1;
+ }
+
+ }
+ /* We should land here only with a valid DSA handle */
+ LWLockAcquire(&memCtxArea->lw_lock, LW_EXCLUSIVE);
+ Assert(memCtxArea->memstats_dsa_handle != DSA_HANDLE_INVALID);
+ area = dsa_attach(memCtxArea->memstats_dsa_handle);
+ LWLockRelease(&memCtxArea->lw_lock);
+
+ /*
+ * Backend has finished publishing the stats, read them
+ *
+ * Read statistics of top level 1 and 2 contexts, if get_summary is true.
+ */
+ LWLockAcquire(&memCtxState[procNumber].lw_lock, LW_EXCLUSIVE);
+ memctx_info = (MemoryContextEntry *) dsa_get_address(area,
+ memCtxState[procNumber].memstats_dsa_pointer);
+
+#define PG_GET_PROCESS_MEMORY_CONTEXTS_COLS 11
+ for (int i = 0; i < memCtxState[procNumber].total_stats; i++)
+ {
+ ArrayType *path_array;
+ int path_length;
+ Datum values[PG_GET_PROCESS_MEMORY_CONTEXTS_COLS];
+ bool nulls[PG_GET_PROCESS_MEMORY_CONTEXTS_COLS];
+ char *name;
+ char *ident;
+ Datum *path_datum_array;
+
+ memset(values, 0, sizeof(values));
+ memset(nulls, 0, sizeof(nulls));
+
+ if (DsaPointerIsValid(memctx_info[i].name))
+ {
+ name = (char *) dsa_get_address(area, memctx_info[i].name);
+ values[0] = CStringGetTextDatum(name);
+ }
+ else
+ nulls[0] = true;
+ if (DsaPointerIsValid(memctx_info[i].ident))
+ {
+ ident = (char *) dsa_get_address(area, memctx_info[i].ident);
+ values[1] = CStringGetTextDatum(ident);
+ }
+ else
+ nulls[1] = true;
+
+ if (memctx_info[i].type != NULL)
+ values[2] = CStringGetTextDatum(memctx_info[i].type);
+ else
+ nulls[2] = true;
+
+ path_length = memctx_info[i].path_length;
+
+ if (DsaPointerIsValid(memctx_info[i].path))
+ {
+ path_datum_array = (Datum *) dsa_get_address(area, memctx_info[i].path);
+ path_array = construct_array_builtin(path_datum_array,
+ path_length, INT4OID);
+
+ values[3] = PointerGetDatum(path_array);
+ }
+ else
+ nulls[3] = true;
+ values[4] = Int64GetDatum(memctx_info[i].totalspace);
+ values[5] = Int64GetDatum(memctx_info[i].nblocks);
+ values[6] = Int64GetDatum(memctx_info[i].freespace);
+ values[7] = Int64GetDatum(memctx_info[i].freechunks);
+ values[8] = Int64GetDatum(memctx_info[i].totalspace -
+ memctx_info[i].freespace);
+ values[9] = Int32GetDatum(memctx_info[i].num_agg_stats);
+ values[10] = TimestampTzGetDatum(memCtxState[procNumber].stats_timestamp);
+
+ tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc,
+ values, nulls);
+ }
+ LWLockRelease(&memCtxState[procNumber].lw_lock);
+
+ ConditionVariableCancelSleep();
+ dsa_detach(area);
+
+ PG_RETURN_NULL();
+}
+
+/*
+ * Shared memory sizing for reporting memory context information.
+ */
+static Size
+MemCtxShmemSize(void)
+{
+ Size TotalProcs =
+ add_size(MaxBackends, add_size(NUM_AUXILIARY_PROCS, max_prepared_xacts));
+
+ return mul_size(TotalProcs, sizeof(MemoryContextBackendState));
+}
+
+/*
+ * Init shared memory for reporting memory context information.
+ */
+void
+MemCtxBackendShmemInit(void)
+{
+ bool found;
+ Size TotalProcs =
+ add_size(MaxBackends, add_size(NUM_AUXILIARY_PROCS, max_prepared_xacts));
+
+ memCtxState = (MemoryContextBackendState *) ShmemInitStruct("MemoryContextBackendState",
+ MemCtxShmemSize(),
+ &found);
+ if (!IsUnderPostmaster)
+ {
+ Assert(!found);
+
+ for (int i = 0; i < TotalProcs; i++)
+ {
+ ConditionVariableInit(&memCtxState[i].memctx_cv);
+
+ LWLockInitialize(&memCtxState[i].lw_lock,
+ LWLockNewTrancheId());
+ LWLockRegisterTranche(memCtxState[i].lw_lock.tranche,
+ "mem_context_backend_stats_reporting");
+
+ memCtxState[i].memstats_dsa_pointer = InvalidDsaPointer;
+ }
+ }
+ else
+ {
+ Assert(found);
+ }
+}
+
+/*
+ * Initialize shared memory for displaying memory
+ * context statistics
+ */
+void
+MemCtxShmemInit(void)
+{
+ bool found;
+
+ memCtxArea = (MemoryContextState *) ShmemInitStruct("MemoryContextState", sizeof(MemoryContextState),
+ &found);
+ if (!IsUnderPostmaster)
+ {
+ Assert(!found);
+
+ LWLockInitialize(&memCtxArea->lw_lock,
+ LWLockNewTrancheId());
+ LWLockRegisterTranche(memCtxArea->lw_lock.tranche,
+ "mem_context_stats_reporting");
+ memCtxArea->memstats_dsa_handle = DSA_HANDLE_INVALID;
+ }
+ else
+ {
+ Assert(found);
+ }
+}
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index b844f9fdae..13938ccb0f 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -38,6 +38,7 @@ volatile sig_atomic_t TransactionTimeoutPending = false;
volatile sig_atomic_t IdleSessionTimeoutPending = false;
volatile sig_atomic_t ProcSignalBarrierPending = false;
volatile sig_atomic_t LogMemoryContextPending = false;
+volatile sig_atomic_t PublishMemoryContextPending = false;
volatile sig_atomic_t IdleStatsUpdateTimeoutPending = false;
volatile uint32 InterruptHoldoffCount = 0;
volatile uint32 QueryCancelHoldoffCount = 0;
diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index 34cdcdf2fd..9572c49e48 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -19,16 +19,22 @@
*-------------------------------------------------------------------------
*/
#include "postgres.h"
+#include <math.h>
#include "mb/pg_wchar.h"
#include "miscadmin.h"
+#include "nodes/pg_list.h"
+#include "storage/fd.h"
+#include "storage/lwlock.h"
+#include "storage/dsm.h"
+#include "utils/dsa.h"
+#include "utils/hsearch.h"
#include "utils/memdebug.h"
#include "utils/memutils.h"
#include "utils/memutils_internal.h"
#include "utils/memutils_memorychunk.h"
-
static void BogusFree(void *pointer);
static void *BogusRealloc(void *pointer, Size size, int flags);
static MemoryContext BogusGetChunkContext(void *pointer);
@@ -177,6 +183,17 @@ static void MemoryContextStatsInternal(MemoryContext context, int level,
static void MemoryContextStatsPrint(MemoryContext context, void *passthru,
const char *stats_string,
bool print_to_stderr);
+static void PublishMemoryContext(MemoryContextEntry *memctx_infos,
+ int curr_id, MemoryContext context,
+ List *path,
+ MemoryContextCounters stat,
+ int num_contexts, dsa_area *area);
+static void compute_contexts_count_and_ids(List *contexts, HTAB *context_id_lookup,
+ int *stats_count,
+ bool get_summary);
+static List *compute_context_path(MemoryContext c, HTAB *context_id_lookup);
+static void dsa_free_previous_stats(dsa_area *area, int total_stats, dsa_pointer prev_dsa_pointer);
+
/*
* You should not do memory allocations within a critical section, because
@@ -889,7 +906,8 @@ MemoryContextStatsDetail(MemoryContext context,
* One recursion level for MemoryContextStats
*
* Print stats for this context if possible, but in any case accumulate counts
- * into *totals (if not NULL).
+ * into *totals (if not NULL). The callers should make sure that print_location
+ * is set to PRINT_STATS_TO_STDERR, PRINT_STATS_TO_LOGS, or PRINT_STATS_NONE.
*/
static void
MemoryContextStatsInternal(MemoryContext context, int level,
@@ -899,36 +917,41 @@ MemoryContextStatsInternal(MemoryContext context, int level,
{
MemoryContext child;
int ichild;
- bool print_to_stderr = true;
check_stack_depth();
Assert(MemoryContextIsValid(context));
if (print_location == PRINT_STATS_TO_STDERR)
- print_to_stderr = true;
+ {
+ /* Examine the context itself */
+ context->methods->stats(context,
+ MemoryContextStatsPrint,
+ &level,
+ totals, true);
+ }
else if (print_location == PRINT_STATS_TO_LOGS)
- print_to_stderr = false;
-
- if (print_location != PRINT_STATS_NONE)
{
+
/* Examine the context itself */
context->methods->stats(context,
MemoryContextStatsPrint,
&level,
- totals, print_to_stderr);
+ totals, false);
}
/*
* Do not print the statistics if print_to_stderr is PRINT_STATS_NONE,
- * only compute totals.
+ * only compute totals. This is used when reporting memory context
+ * statistics via an SQL function. The last parameter is not relevant.
*/
else
{
+ Assert(print_location == PRINT_STATS_NONE);
/* Examine the context itself */
context->methods->stats(context,
NULL,
NULL,
- totals, print_to_stderr);
+ totals, false);
}
/* Increment the context count */
*num_contexts = *num_contexts + 1;
@@ -971,7 +994,7 @@ MemoryContextStatsInternal(MemoryContext context, int level,
}
*num_contexts = *num_contexts + ichild;
- if (print_to_stderr)
+ if (print_location == PRINT_STATS_TO_STDERR)
{
for (int i = 0; i <= level; i++)
fprintf(stderr, " ");
@@ -984,7 +1007,7 @@ MemoryContextStatsInternal(MemoryContext context, int level,
local_totals.freechunks,
local_totals.totalspace - local_totals.freespace);
}
- else if (print_location != PRINT_STATS_NONE)
+ else if (print_location == PRINT_STATS_TO_LOGS)
ereport(LOG_SERVER_ONLY,
(errhidestmt(true),
errhidecontext(true),
@@ -1321,6 +1344,21 @@ HandleLogMemoryContextInterrupt(void)
/* latch will be set by procsignal_sigusr1_handler */
}
+/*
+ * HandleGetMemoryContextInterrupt
+ * Handle receipt of an interrupt indicating publishing of memory
+ * contexts.
+ *
+ * All the actual work is deferred to ProcessGetMemoryContextInterrupt()
+ */
+void
+HandleGetMemoryContextInterrupt(void)
+{
+ InterruptPending = true;
+ PublishMemoryContextPending = true;
+ /* latch will be set by procsignal_sigusr1_handler */
+}
+
/*
* ProcessLogMemoryContextInterrupt
* Perform logging of memory contexts of this backend process.
@@ -1358,6 +1396,461 @@ ProcessLogMemoryContextInterrupt(void)
MemoryContextStatsDetail(TopMemoryContext, 100, 100, false);
}
+/*
+ * ProcessGetMemoryContextInterrupt
+ * Generate information about memory contexts used by the process.
+ *
+ * Performs a breadth-first search on the memory context tree, so that a
+ * parent's statistics are reported before its children's in the function output.
+ *
+ * Per-context statistics for all processes are shared via the same dynamic
+ * shared area. Statistics for contexts that do not fit within the predetermined
+ * size limit are captured as a cumulative total after the individual statistics.
+ *
+ * If get_summary is true, we traverse the memory context tree recursively in
+ * depth-first order to cover all the children of a parent context, so that we
+ * can display a cumulative total of memory consumption by a parent.
+ */
+void
+ProcessGetMemoryContextInterrupt(void)
+{
+ List *contexts;
+
+ HASHCTL ctl;
+ HTAB *context_id_lookup;
+ int context_id = 0;
+ MemoryContext stat_cxt;
+ MemoryContextEntry *meminfo;
+ bool get_summary = false;
+
+ dsa_area *area = NULL;
+ int max_stats;
+ int idx = MyProcNumber;
+ int stats_count = 0;
+ MemoryContextCounters stat;
+ int num_individual_stats = 0;
+
+ PublishMemoryContextPending = false;
+
+ /*
+ * Make a new context that will contain the hash table, to ease the
+ * cleanup.
+ */
+ stat_cxt = AllocSetContextCreate(CurrentMemoryContext,
+ "Memory context statistics",
+ ALLOCSET_DEFAULT_SIZES);
+
+ /*
+ * The hash table used for constructing "path" column of the view, similar
+ * to its local backend counterpart.
+ */
+ ctl.keysize = sizeof(MemoryContext);
+ ctl.entrysize = sizeof(MemoryContextId);
+ ctl.hcxt = stat_cxt;
+
+ context_id_lookup = hash_create("pg_get_remote_backend_memory_contexts",
+ 256,
+ &ctl,
+ HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+
+ /* List of contexts to process in the next round - start at the top. */
+ contexts = list_make1(TopMemoryContext);
+
+ /* Compute the number of stats that can fit in the defined limit */
+ max_stats = (MAX_NUM_DEFAULT_SEGMENTS * DSA_DEFAULT_INIT_SEGMENT_SIZE)
+ / (sizeof(MemoryContextEntry) + (MEM_CONTEXT_MAX_LEVEL
+ * sizeof(Datum)) + (2 * MEMORY_CONTEXT_IDENT_SHMEM_SIZE));
+
+ LWLockAcquire(&memCtxState[idx].lw_lock, LW_EXCLUSIVE);
+ get_summary = memCtxState[idx].get_summary;
+ LWLockRelease(&memCtxState[idx].lw_lock);
+
+ /*
+ * Traverse the memory context tree to find total number of contexts. If
+ * summary is requested report the total number of contexts at level 1 and
+ * 2 from the top. Also, populate the hash table of context ids.
+ */
+ compute_contexts_count_and_ids(contexts, context_id_lookup, &stats_count,
+ get_summary);
+
+ /*
+ * Allocate memory in this process's DSA for storing statistics of up to
+ * max_stats memory contexts. For contexts that don't fit within that
+ * limit, a cumulative total is written as the last record in the DSA
+ * segment.
+ */
+ stats_count = (stats_count > max_stats) ? max_stats : stats_count;
+
+ LWLockAcquire(&memCtxArea->lw_lock, LW_EXCLUSIVE);
+
+ /*
+ * Create a DSA and send its handle to the client process after storing
+ * the context statistics. If the context statistics exceed a predefined
+ * size limit (8MB), a cumulative total is stored for the remaining contexts.
+ */
+ if (memCtxArea->memstats_dsa_handle == DSA_HANDLE_INVALID)
+ {
+ MemoryContext oldcontext = CurrentMemoryContext;
+ dsa_handle handle;
+
+ MemoryContextSwitchTo(TopMemoryContext);
+
+ area = dsa_create(memCtxArea->lw_lock.tranche);
+
+ handle = dsa_get_handle(area);
+ MemoryContextSwitchTo(oldcontext);
+
+ dsa_pin_mapping(area);
+
+ /*
+ * Pin the DSA area to make sure it remains attachable even if the
+ * current backend exits. This is done so that a waiting client gets
+ * the stats even after the publishing process exits.
+ */
+ dsa_pin(area);
+
+ /* Set the handle in shared memory */
+ memCtxArea->memstats_dsa_handle = handle;
+ }
+
+ /*
+ * If DSA exists, created by another process publishing statistics, attach
+ * to it.
+ */
+ else if (area == NULL)
+ {
+ MemoryContext oldcontext = CurrentMemoryContext;
+
+ MemoryContextSwitchTo(TopMemoryContext);
+ area = dsa_attach(memCtxArea->memstats_dsa_handle);
+ MemoryContextSwitchTo(oldcontext);
+ dsa_pin_mapping(area);
+ }
+ LWLockRelease(&memCtxArea->lw_lock);
+
+ /*
+ * Hold the process specific lock to protect writes to process specific
+ * memory. This way two processes publishing statistics do not block each
+ * other.
+ */
+ LWLockAcquire(&memCtxState[idx].lw_lock, LW_EXCLUSIVE);
+ memCtxState[idx].proc_id = MyProcPid;
+
+ if (DsaPointerIsValid(memCtxState[idx].memstats_dsa_pointer))
+ {
+ /*
+ * Free any previous allocations: free the name, ident and path
+ * pointers before freeing the pointer that contains them.
+ */
+ dsa_free_previous_stats(area, memCtxState[idx].total_stats,
+ memCtxState[idx].memstats_dsa_pointer);
+ dsa_free(area, memCtxState[idx].memstats_dsa_pointer);
+ memCtxState[idx].memstats_dsa_pointer = InvalidDsaPointer;
+ }
+ memCtxState[idx].memstats_dsa_pointer = dsa_allocate0(area,
+ stats_count * sizeof(MemoryContextEntry));
+
+ meminfo = (MemoryContextEntry *) dsa_get_address(area,
+ memCtxState[idx].memstats_dsa_pointer);
+
+ if (get_summary)
+ {
+ int ctx_id = 0;
+ List *path = NIL;
+
+ /* Copy TopMemoryContext statistics to DSA */
+ memset(&stat, 0, sizeof(stat));
+ (*TopMemoryContext->methods->stats) (TopMemoryContext, NULL, NULL,
+ &stat, true);
+ path = lcons_int(1, path);
+ PublishMemoryContext(meminfo, ctx_id, TopMemoryContext, path, stat,
+ 1, area);
+ ctx_id = ctx_id + 1;
+
+ /*
+ * Copy statistics for each of TopMemoryContext's children (XXX: cap
+ * this at 100). This includes statistics of all of their children
+ * up to level 100.
+ */
+
+ for (MemoryContext c = TopMemoryContext->firstchild; c != NULL;
+ c = c->nextchild)
+ {
+ MemoryContextCounters grand_totals;
+ int num_contexts = 0;
+
+ path = NIL;
+ memset(&grand_totals, 0, sizeof(grand_totals));
+
+ MemoryContextStatsInternal(c, 0, 100, 100, &grand_totals,
+ PRINT_STATS_NONE, &num_contexts);
+
+ path = compute_context_path(c, context_id_lookup);
+
+ PublishMemoryContext(meminfo, ctx_id, c, path,
+ grand_totals, num_contexts, area);
+ ctx_id = ctx_id + 1;
+ }
+ memCtxState[idx].total_stats = ctx_id;
+ goto cleanup;
+ }
+ foreach_ptr(MemoryContextData, cur, contexts)
+ {
+ List *path = NIL;
+ char *name;
+
+ /*
+ * Figure out the transient context_id of this context and each of its
+ * ancestors.
+ */
+ path = compute_context_path(cur, context_id_lookup);
+
+ if (context_id <= (max_stats - 2))
+ {
+ /* Examine the context stats */
+ memset(&stat, 0, sizeof(stat));
+ (*cur->methods->stats) (cur, NULL, NULL, &stat, true);
+ /* Copy statistics to DSA memory */
+ PublishMemoryContext(meminfo, context_id, cur, path, stat, 1, area);
+ }
+ else
+ {
+ /* Examine the context stats */
+ memset(&stat, 0, sizeof(stat));
+ (*cur->methods->stats) (cur, NULL, NULL, &stat, true);
+
+ meminfo[max_stats - 1].totalspace += stat.totalspace;
+ meminfo[max_stats - 1].nblocks += stat.nblocks;
+ meminfo[max_stats - 1].freespace += stat.freespace;
+ meminfo[max_stats - 1].freechunks += stat.freechunks;
+ }
+
+ /*
+ * DSA max limit is reached, write aggregate of the remaining
+ * statistics.
+ */
+ if (context_id == (max_stats - 2) && context_id < (stats_count - 1))
+ {
+ num_individual_stats = context_id + 1;
+ meminfo[max_stats - 1].name = dsa_allocate0(area, 17);
+ name = dsa_get_address(area, meminfo[max_stats - 1].name);
+ strncpy(name, "Remaining Totals", 16);
+ meminfo[max_stats - 1].ident = InvalidDsaPointer;
+ meminfo[max_stats - 1].path = InvalidDsaPointer;
+ meminfo[max_stats - 1].type = NULL;
+ }
+ context_id++;
+ }
+ /* No aggregated contexts, individual statistics reported */
+ if (context_id < max_stats)
+ {
+ memCtxState[idx].total_stats = context_id;
+ }
+ /* Report number of aggregated memory contexts */
+ else
+ {
+ meminfo[max_stats - 1].num_agg_stats = context_id -
+ num_individual_stats;
+
+ /*
+ * Total stats equals num_individual_stats + 1 record for cumulative
+ * statistics.
+ */
+ memCtxState[idx].total_stats = num_individual_stats + 1;
+ }
+cleanup:
+
+ /*
+ * Signal all the waiting client backends after setting the exit condition
+ * flag
+ */
+ memCtxState[idx].stats_timestamp = GetCurrentTimestamp();
+ LWLockRelease(&memCtxState[idx].lw_lock);
+ ConditionVariableBroadcast(&memCtxState[idx].memctx_cv);
+ /* Delete the hash table memory context */
+ MemoryContextDelete(stat_cxt);
+
+ dsa_detach(area);
+}
+
+/*
+ * Append the transient context_id of this context and each of
+ * its ancestors to a list, in order to compute a path.
+ */
+static List *
+compute_context_path(MemoryContext c, HTAB *context_id_lookup)
+{
+ bool found;
+ List *path = NIL;
+
+ for (MemoryContext cur_context = c; cur_context != NULL; cur_context = cur_context->parent)
+ {
+ MemoryContextId *cur_entry;
+
+ cur_entry = hash_search(context_id_lookup, &cur_context, HASH_FIND, &found);
+
+ if (!found)
+ {
+ ereport(LOG,
+ errmsg("hash table corrupted, can't construct path value"));
+ break;
+ }
+ path = lcons_int(cur_entry->context_id, path);
+ }
+ return path;
+}
+
+/*
+ * Count the contexts currently allocated by the backend into *stats_count
+ * and assign a context id to each of them.
+ */
+static void
+compute_contexts_count_and_ids(List *contexts, HTAB *context_id_lookup,
+ int *stats_count, bool get_summary)
+{
+ foreach_ptr(MemoryContextData, cur, contexts)
+ {
+ MemoryContextId *entry;
+ bool found;
+
+ entry = (MemoryContextId *) hash_search(context_id_lookup, &cur,
+ HASH_ENTER, &found);
+ Assert(!found);
+
+ /* context id starts with 1 */
+ entry->context_id = (++(*stats_count));
+
+ /* Append the children of the current context to the main list. */
+ for (MemoryContext c = cur->firstchild; c != NULL; c = c->nextchild)
+ {
+ if (get_summary)
+ {
+ entry = (MemoryContextId *) hash_search(context_id_lookup, &c,
+ HASH_ENTER, &found);
+ Assert(!found);
+
+ entry->context_id = (++(*stats_count));
+ }
+
+ contexts = lappend(contexts, c);
+ }
+
+ /*
+ * In summary mode, only the contexts in the first two levels (from
+ * the top) are displayed.
+ */
+ if (get_summary)
+ break;
+ }
+
+}
+
+/* Copy the memory context statistics of a single context to a dsa memory */
+static void
+PublishMemoryContext(MemoryContextEntry *memctx_info, int curr_id,
+ MemoryContext context, List *path,
+ MemoryContextCounters stat, int num_contexts,
+ dsa_area *area)
+{
+ char clipped_ident[MEMORY_CONTEXT_IDENT_SHMEM_SIZE];
+ char *name;
+ char *ident;
+ Datum *path_array;
+
+ if (context->name != NULL)
+ {
+ Assert(strlen(context->name) < MEMORY_CONTEXT_IDENT_SHMEM_SIZE);
+ memctx_info[curr_id].name = dsa_allocate0(area, strlen(context->name) + 1);
+ name = (char *) dsa_get_address(area, memctx_info[curr_id].name);
+ strncpy(name, context->name, strlen(context->name));
+ }
+ else
+ memctx_info[curr_id].name = InvalidDsaPointer;
+
+ /* Trim and copy the identifier if it is not set to NULL */
+ if (context->ident != NULL)
+ {
+ int idlen = strlen(context->ident);
+
+ /*
+ * Some identifiers such as SQL query string can be very long,
+ * truncate oversize identifiers.
+ */
+ if (idlen >= MEMORY_CONTEXT_IDENT_SHMEM_SIZE)
+ idlen = pg_mbcliplen(context->ident, idlen,
+ MEMORY_CONTEXT_IDENT_SHMEM_SIZE - 1);
+
+ memcpy(clipped_ident, context->ident, idlen);
+ clipped_ident[idlen] = '\0';
+
+ /*
+ * To be consistent with logging output, we label dynahash contexts
+ * with just the hash table name as with MemoryContextStatsPrint().
+ */
+ if (!strncmp(context->name, "dynahash", 8))
+ {
+ dsa_free(area, memctx_info[curr_id].name);
+ memctx_info[curr_id].name = dsa_allocate0(area,
+ strlen(clipped_ident) + 1);
+ name = (char *) dsa_get_address(area,
+ memctx_info[curr_id].name);
+ strlcpy(name,
+ clipped_ident, idlen + 1);
+ memctx_info[curr_id].ident = InvalidDsaPointer;
+ }
+ else
+ {
+
+ memctx_info[curr_id].ident = dsa_allocate0(area,
+ strlen(clipped_ident) + 1);
+ ident = (char *) dsa_get_address(area,
+ memctx_info[curr_id].ident);
+ strlcpy(ident,
+ clipped_ident, idlen + 1);
+ }
+ }
+ else
+ memctx_info[curr_id].ident = InvalidDsaPointer;
+ /* Allocate dsa memory for storing path information */
+ if (path == NIL)
+ memctx_info[curr_id].path = InvalidDsaPointer;
+ else
+ {
+ memctx_info[curr_id].path_length = list_length(path);
+ memctx_info[curr_id].path = dsa_allocate0(area,
+ memctx_info[curr_id].path_length
+ * sizeof(Datum));
+ path_array = (Datum *) dsa_get_address(area, memctx_info[curr_id].path);
+ foreach_int(i, path)
+ path_array[foreach_current_index(i)] = Int32GetDatum(i);
+ }
+ memctx_info[curr_id].type = ContextTypeToString(context->type);
+ memctx_info[curr_id].totalspace = stat.totalspace;
+ memctx_info[curr_id].nblocks = stat.nblocks;
+ memctx_info[curr_id].freespace = stat.freespace;
+ memctx_info[curr_id].freechunks = stat.freechunks;
+ memctx_info[curr_id].num_agg_stats = num_contexts;
+}
+
+static void
+dsa_free_previous_stats(dsa_area *area, int total_stats,
+ dsa_pointer prev_dsa_pointer)
+{
+ MemoryContextEntry *meminfo;
+
+ meminfo = (MemoryContextEntry *) dsa_get_address(area, prev_dsa_pointer);
+ for (int i = 0; i < total_stats; i++)
+ {
+ if (DsaPointerIsValid(meminfo[i].name))
+ dsa_free(area, meminfo[i].name);
+
+ if (DsaPointerIsValid(meminfo[i].ident))
+ dsa_free(area, meminfo[i].ident);
+
+ if (DsaPointerIsValid(meminfo[i].path))
+ dsa_free(area, meminfo[i].path);
+ }
+}
void *
palloc(Size size)
{
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index cd9422d0ba..c5acbfeb80 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8480,6 +8480,16 @@
prorettype => 'bool', proargtypes => 'int4',
prosrc => 'pg_log_backend_memory_contexts' },
+# publishing memory contexts of the specified postgres process
+{ oid => '2173', descr => 'publish memory contexts of the specified backend',
+ proname => 'pg_get_process_memory_contexts', provolatile => 'v',
+ prorows => '100', proretset => 't', proparallel => 'r',
+ prorettype => 'record', proargtypes => 'int4 bool int4',
+ proallargtypes => '{int4,bool,int4,text,text,text,_int4,int8,int8,int8,int8,int8,int4,timestamptz}',
+ proargmodes => '{i,i,i,o,o,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{pid, summary, num_of_tries, name, ident, type, path, total_bytes, total_nblocks, free_bytes, free_chunks, used_bytes, num_agg_contexts, stats_timestamp}',
+ prosrc => 'pg_get_process_memory_contexts' },
+
# non-persistent series generator
{ oid => '1066', descr => 'non-persistent series generator',
proname => 'generate_series', prorows => '1000',
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index a2b63495ee..3dc3dcfb6c 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -96,6 +96,7 @@ extern PGDLLIMPORT volatile sig_atomic_t IdleSessionTimeoutPending;
extern PGDLLIMPORT volatile sig_atomic_t ProcSignalBarrierPending;
extern PGDLLIMPORT volatile sig_atomic_t LogMemoryContextPending;
extern PGDLLIMPORT volatile sig_atomic_t IdleStatsUpdateTimeoutPending;
+extern PGDLLIMPORT volatile sig_atomic_t PublishMemoryContextPending;
extern PGDLLIMPORT volatile sig_atomic_t CheckClientConnectionPending;
extern PGDLLIMPORT volatile sig_atomic_t ClientConnectionLost;
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index 022fd8ed93..5d4b2fbfc9 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -35,6 +35,7 @@ typedef enum
PROCSIG_WALSND_INIT_STOPPING, /* ask walsenders to prepare for shutdown */
PROCSIG_BARRIER, /* global barrier interrupt */
PROCSIG_LOG_MEMORY_CONTEXT, /* ask backend to log the memory contexts */
+ PROCSIG_GET_MEMORY_CONTEXT, /* ask backend to send the memory contexts */
PROCSIG_PARALLEL_APPLY_MESSAGE, /* Message from parallel apply workers */
/* Recovery conflict reasons */
diff --git a/src/include/utils/memutils.h b/src/include/utils/memutils.h
index 8abc26abce..a8d1956a82 100644
--- a/src/include/utils/memutils.h
+++ b/src/include/utils/memutils.h
@@ -18,6 +18,9 @@
#define MEMUTILS_H
#include "nodes/memnodes.h"
+#include "storage/condition_variable.h"
+#include "storage/lmgr.h"
+#include "utils/dsa.h"
/*
@@ -48,7 +51,10 @@
#define AllocHugeSizeIsValid(size) ((Size) (size) <= MaxAllocHugeSize)
+#define MEMORY_CONTEXT_IDENT_SHMEM_SIZE 128
+#define MEM_CONTEXT_MAX_LEVEL 64
+#define MAX_NUM_DEFAULT_SEGMENTS 8
/*
* Standard top-level memory contexts.
*
@@ -319,4 +325,67 @@ pg_memory_is_all_zeros(const void *ptr, size_t len)
return true;
}
+/* Dynamic shared memory state for statistics per context */
+typedef struct MemoryContextEntry
+{
+ dsa_pointer name;
+ dsa_pointer ident;
+ dsa_pointer path;
+ const char *type;
+ int path_length;
+ int64 totalspace;
+ int64 nblocks;
+ int64 freespace;
+ int64 freechunks;
+ int num_agg_stats;
+} MemoryContextEntry;
+
+/*
+ * Static shared memory state representing the DSA area
+ * created for memory context statistics reporting.
+ * Single DSA area is created and used by all the processes,
+ * each having its specific dsa allocations for sharing memory
+ * statistics, tracked by per backend static shared memory state.
+ */
+typedef struct MemoryContextState
+{
+ dsa_handle memstats_dsa_handle;
+ LWLock lw_lock;
+} MemoryContextState;
+
+/*
+ * Per backend static shared memory state for memory
+ * context statistics reporting.
+ */
+typedef struct MemoryContextBackendState
+{
+ ConditionVariable memctx_cv;
+ LWLock lw_lock;
+ int proc_id;
+ int total_stats;
+ bool get_summary;
+ dsa_pointer memstats_dsa_pointer;
+ TimestampTz stats_timestamp;
+} MemoryContextBackendState;
+
+
+/*
+ * MemoryContextId
+ * Used for storage of transient identifiers for
+ * pg_get_backend_memory_contexts.
+ */
+typedef struct MemoryContextId
+{
+ MemoryContext context;
+ int context_id;
+} MemoryContextId;
+
+extern PGDLLIMPORT MemoryContextBackendState *memCtxState;
+extern PGDLLIMPORT MemoryContextState *memCtxArea;
+extern void ProcessGetMemoryContextInterrupt(void);
+extern const char *ContextTypeToString(NodeTag type);
+extern void HandleGetMemoryContextInterrupt(void);
+extern void MemCtxShmemInit(void);
+extern void MemCtxBackendShmemInit(void);
+
#endif /* MEMUTILS_H */
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 83228cfca2..dca20ae1a2 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -232,3 +232,17 @@ select * from pg_timezone_abbrevs where abbrev = 'LMT';
LMT | @ 7 hours 52 mins 58 secs ago | f
(1 row)
+DO $$
+DECLARE
+ launcher_pid int;
+ r RECORD;
+BEGIN
+ SELECT pid from pg_stat_activity where backend_type='autovacuum launcher'
+ INTO launcher_pid;
+
+ select type, name, ident
+ from pg_get_process_memory_contexts(launcher_pid, false, 20)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+END $$;
+NOTICE: (AllocSet,TopMemoryContext,)
diff --git a/src/test/regress/sql/sysviews.sql b/src/test/regress/sql/sysviews.sql
index 66179f026b..4767351d4e 100644
--- a/src/test/regress/sql/sysviews.sql
+++ b/src/test/regress/sql/sysviews.sql
@@ -101,3 +101,17 @@ select count(distinct utc_offset) >= 24 as ok from pg_timezone_abbrevs;
-- One specific case we can check without much fear of breakage
-- is the historical local-mean-time value used for America/Los_Angeles.
select * from pg_timezone_abbrevs where abbrev = 'LMT';
+
+DO $$
+DECLARE
+ launcher_pid int;
+ r RECORD;
+BEGIN
+ SELECT pid from pg_stat_activity where backend_type='autovacuum launcher'
+ INTO launcher_pid;
+
+ select type, name, ident
+ from pg_get_process_memory_contexts(launcher_pid, false, 20)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+END $$;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 19ff271ba5..63b9dde1b9 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1639,12 +1639,16 @@ MemoizeState
MemoizeTuple
MemoryChunk
MemoryContext
+MemoryContextBackendState
MemoryContextCallback
MemoryContextCallbackFunction
MemoryContextCounters
MemoryContextData
+MemoryContextEntry
+MemoryContextId
MemoryContextMethodID
MemoryContextMethods
+MemoryContextState
MemoryStatsPrintFunc
MergeAction
MergeActionState
--
2.34.1
v16-0001-Preparatory-changes-for-reporting-memory-context-sta.patch (application/octet-stream)
From acdc4aadeaade92471dd9c6ff3b178e394a16a4a Mon Sep 17 00:00:00 2001
From: Rahila Syed <rahilasyed.90@gmail.com>
Date: Mon, 3 Feb 2025 15:33:19 +0530
Subject: [PATCH 1/2] Preparatory changes for reporting memory context
statistics
Ensure that MemoryContextStatsInternal can return number of
contexts. Also, provide an option in MemoryContextStatsInternal
to return without printing stats to either stderr or logs.
---
src/backend/utils/mmgr/mcxt.c | 65 +++++++++++++++++++++++++++++------
1 file changed, 55 insertions(+), 10 deletions(-)
diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index 91060de0ab..34cdcdf2fd 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -135,6 +135,17 @@ static const MemoryContextMethods mcxt_methods[] = {
};
#undef BOGUS_MCTX
+/*
+ * This is passed to MemoryContextStatsInternal to determine whether
+ * to print context statistics or not and where to print them logs or
+ * stderr.
+ */
+typedef enum PrintDestination
+{
+ PRINT_STATS_TO_STDERR = 0,
+ PRINT_STATS_TO_LOGS,
+ PRINT_STATS_NONE
+} PrintDestination;
/*
* CurrentMemoryContext
@@ -162,7 +173,7 @@ static void MemoryContextCallResetCallbacks(MemoryContext context);
static void MemoryContextStatsInternal(MemoryContext context, int level,
int max_level, int max_children,
MemoryContextCounters *totals,
- bool print_to_stderr);
+ PrintDestination print_location, int *num_contexts);
static void MemoryContextStatsPrint(MemoryContext context, void *passthru,
const char *stats_string,
bool print_to_stderr);
@@ -831,11 +842,19 @@ MemoryContextStatsDetail(MemoryContext context,
bool print_to_stderr)
{
MemoryContextCounters grand_totals;
+ int num_contexts;
+ PrintDestination print_location;
memset(&grand_totals, 0, sizeof(grand_totals));
+ if (print_to_stderr)
+ print_location = PRINT_STATS_TO_STDERR;
+ else
+ print_location = PRINT_STATS_TO_LOGS;
+
+ /* num_contexts report number of contexts aggregated in the output */
MemoryContextStatsInternal(context, 0, max_level, max_children,
- &grand_totals, print_to_stderr);
+ &grand_totals, print_location, &num_contexts);
if (print_to_stderr)
fprintf(stderr,
@@ -876,18 +895,43 @@ static void
MemoryContextStatsInternal(MemoryContext context, int level,
int max_level, int max_children,
MemoryContextCounters *totals,
- bool print_to_stderr)
+ PrintDestination print_location, int *num_contexts)
{
MemoryContext child;
int ichild;
+ bool print_to_stderr = true;
+ check_stack_depth();
Assert(MemoryContextIsValid(context));
- /* Examine the context itself */
- context->methods->stats(context,
- MemoryContextStatsPrint,
- &level,
- totals, print_to_stderr);
+ if (print_location == PRINT_STATS_TO_STDERR)
+ print_to_stderr = true;
+ else if (print_location == PRINT_STATS_TO_LOGS)
+ print_to_stderr = false;
+
+ if (print_location != PRINT_STATS_NONE)
+ {
+ /* Examine the context itself */
+ context->methods->stats(context,
+ MemoryContextStatsPrint,
+ &level,
+ totals, print_to_stderr);
+ }
+
+ /*
+ * Do not print the statistics if print_to_stderr is PRINT_STATS_NONE,
+ * only compute totals.
+ */
+ else
+ {
+ /* Examine the context itself */
+ context->methods->stats(context,
+ NULL,
+ NULL,
+ totals, print_to_stderr);
+ }
+ /* Increment the context count */
+ *num_contexts = *num_contexts + 1;
/*
* Examine children.
@@ -907,7 +951,7 @@ MemoryContextStatsInternal(MemoryContext context, int level,
MemoryContextStatsInternal(child, level + 1,
max_level, max_children,
totals,
- print_to_stderr);
+ print_location, num_contexts);
}
}
@@ -925,6 +969,7 @@ MemoryContextStatsInternal(MemoryContext context, int level,
ichild++;
child = MemoryContextTraverseNext(child, context);
}
+ *num_contexts = *num_contexts + ichild;
if (print_to_stderr)
{
@@ -939,7 +984,7 @@ MemoryContextStatsInternal(MemoryContext context, int level,
local_totals.freechunks,
local_totals.totalspace - local_totals.freespace);
}
- else
+ else if (print_location != PRINT_STATS_NONE)
ereport(LOG_SERVER_ONLY,
(errhidestmt(true),
errhidecontext(true),
--
2.34.1
Hi,
Please find attached the updated and rebased patches. They contain the
following changes:
1. To prevent memory leaks, ensure that the latest statistics published by
a process are freed before it exits. This is achieved by calling dsa_free
in the before_shmem_exit callback.
2. Add a level column to maintain consistency with the output of
pg_backend_memory_contexts.
Thank you,
Rahila Syed
On Tue, Mar 4, 2025 at 12:30 PM Rahila Syed <rahilasyed90@gmail.com> wrote:
Hi Daniel,
Thanks for the rebase, a few mostly superficial comments from a first
read-through.
Thank you for your comments.
The documentation refers to attributes in the return row but the format of that
row isn't displayed which makes following along hard. I think we should
include a table or a programlisting showing the return data before this
paragraph.
I included the SQL function call and its output in programlisting format after
the function description. Since the description was part of a table, I added
this additional information at the end of that table.
+const char *
+AssignContextType(NodeTag type)
This function doesn't actually assign anything so the name is a bit misleading,
it would be better with ContextTypeToString or something similar.
Done.
+ * By default, only superusers or users with PG_READ_ALL_STATS are allowed to
This sentence is really long and should probably be broken up.
Fixed.
+ * The shared memory buffer has a limited size - it the process has too many
s/it/if/
Fixed.
+ * If the publishing backend does not respond before the condition variable
+ * times out, which is set to MEMSTATS_WAIT_TIMEOUT, retry for max_tries
+ * number of times, which is defined by user, before giving up and
+ * returning previously published statistics, if any.
This comment should mention what happens if the process gives up and there are
no previously published stats.
Done.
+ int i;
...
+ for (i = 0; i < memCtxState[procNumber].total_stats; i++)
This can be rewritten as "for (int i = 0; .." since we allow C99.
Done.
+ * process running and consuming lots of memory, that it might end on its
+ * own first and its memory contexts are not logged is not a problem.
This comment is copy/pasted from pg_log_backend_memory_contexts and while it
mostly still applies it should at least be rewritten to not refer to logging as
this function doesn't do that.
Fixed.
+ ereport(WARNING,
+ (errmsg("PID %d is not a PostgreSQL server process",
No need to add the extra parenthesis around errmsg anymore, so I think new code
should omit those.
Done.
+ errhint("Use pg_backend_memory_contexts view instead")));
Super nitpick, but errhints should be complete sentences ending with a period.
Done.
+ * statitics have previously been published by the backend. In which case,
s/statitics/statistics/
Fixed.
+ * statitics have previously been published by the backend. In which case,
+ * check if statistics are not older than curr_timestamp, if they are wait
I think the sentence around the time check is needlessly confusing, could it be
rewritten into something like: "A valid DSA pointer isn't proof that statistics
are available, it can be valid due to previously published stats. Check if the
stats are updated by comparing the timestamp, if the stats are newer than our
previously recorded timestamp from before sending the procsignal they must by
definition be updated."
Replaced accordingly.
+ /* Assert for dsa_handle to be valid */
Was this intended to be turned into an Assert call? Else it seems better to
remove.
Added an assert and removed the comment.
+ if (print_location != PRINT_STATS_NONE)
This warrants a comment stating why it makes sense.
+ * Do not print the statistics if print_to_stderr is PRINT_STATS_NONE,
s/print_to_stderr/print_location/. Also, do we really need print_to_stderr in
this function at all? It seems a bit awkward to combine a boolean and a
parameter for a tri-state value when the parameter holds the tri-state already.
For readability I think just checking print_location will be better since the
value will be clear, where print_to_stderr=false is less clear in a tri-state
scenario.
I removed the boolean print_to_stderr; the checks now use the tri-state enum
print_location.
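As a rough illustration of checking the enum directly instead of a derived boolean, here is a small self-contained sketch. The PrintDestination enum is copied from the 0001 patch; the helpers destination_prints and destination_is_stderr are hypothetical and exist only for this example, they are not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Enum as defined in the 0001 patch. */
typedef enum PrintDestination
{
	PRINT_STATS_TO_STDERR = 0,
	PRINT_STATS_TO_LOGS,
	PRINT_STATS_NONE
} PrintDestination;

/* Hypothetical helper: does this destination produce any output at all? */
static bool
destination_prints(PrintDestination dest)
{
	switch (dest)
	{
		case PRINT_STATS_TO_STDERR:
		case PRINT_STATS_TO_LOGS:
			return true;
		case PRINT_STATS_NONE:
			return false;
	}
	assert(false);				/* not reached for valid enum values */
	return false;
}

/* Hypothetical helper: true only for the stderr destination. */
static bool
destination_is_stderr(PrintDestination dest)
{
	return dest == PRINT_STATS_TO_STDERR;
}
```

With all three states spelled out in one switch, a reader does not have to work out what a false boolean means when the destination is PRINT_STATS_NONE.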
+ * its ancestors to a list, inorder to compute a path.
s/inorder/in order/
Fixed.
+ elog(LOG, "hash table corrupted, can't construct path value");
+ break;
This will return either a NIL list or a partial path, but PublishMemoryContext
doesn't really take into consideration that it might be so. Is this really
benign to the point that we can blindly go on? Also, elog(LOG..) is mostly for
tracing or debugging as elog() isn't intended for user facing errors.
I agree that this should be addressed. I added a check for the path value
before storing it in shared memory. If the path is NIL, the path pointer in the
DSA will be set to InvalidDsaPointer. When a client encounters an
InvalidDsaPointer, it will print NULL in the path column. The partial path
scenario is unlikely IMO, and I am not sure if it warrants additional checks.
+static void
+compute_num_of_contexts(List *contexts, HTAB *context_id_lookup,
+ int *stats_count, bool get_summary)
This function does a lot more than compute the number of contexts so the name
seems a bit misleading. Perhaps a rename to compute_contexts() or something
similar?
Renamed to compute_contexts_count_and_ids.
+ memctx_info[curr_id].name = dsa_allocate0(area,
+ strlen(clipped_ident) + 1);
These calls can use idlen instead of more strlen() calls no? While there is no
performance benefit, it would increase readability IMHO if the code first
calculates a value, and then uses it.
Done.
+ strncpy(name,
+ clipped_ident, strlen(clipped_ident));
Since clipped_ident has been nul terminated earlier there is no need to use
strncpy, we can instead use strlcpy and take the target buffer size into
consideration rather than the input string length.
Replaced with the strlcpy calls.
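For readers less familiar with the difference, the sketch below shows the contract that a strlcpy-style copy provides: the bound is the size of the destination buffer, and the result is always NUL-terminated. By contrast, strncpy with the source length as the bound writes no terminator at all. copy_bounded is a hypothetical helper written for this example, since strlcpy is not available in every C library; it is not the function used in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper with strlcpy-like semantics: copy at most
 * dst_size - 1 bytes of src into dst, always NUL-terminate when
 * dst_size > 0, and return strlen(src) so that the caller can detect
 * truncation by comparing the result with dst_size.
 */
static size_t
copy_bounded(char *dst, const char *src, size_t dst_size)
{
	size_t		src_len;

	assert(dst != NULL && src != NULL);
	src_len = strlen(src);

	if (dst_size > 0)
	{
		size_t		n = (src_len < dst_size - 1) ? src_len : dst_size - 1;

		memcpy(dst, src, n);
		dst[n] = '\0';
	}
	return src_len;
}
```

A return value greater than or equal to dst_size tells the caller that the identifier was clipped.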
PROCSIG_LOG_MEMORY_CONTEXT, /* ask backend to log the memory contexts */
+ PROCSIG_GET_MEMORY_CONTEXT, /* ask backend to log the memory contexts */
This comment should be different from the LOG_MEMORY_xx one.
Fixed.
+#define MEM_CONTEXT_SHMEM_STATS_SIZE 30
+#define MAX_TYPE_STRING_LENGTH 64
These are unused, from an earlier version of the patch perhaps?
Removed.
+ * Singe DSA area is created and used by all the processes,
s/Singe/Since/
Fixed.
+typedef struct MemoryContextBackendState
This is only used in mcxtfuncs.c and can be moved there rather than being
exported in the header.
This is being used in mcxt.c too in the form of the variable memCtxState.
+} MemoryContextId;
This lacks an entry in the typedefs.list file.
Added.
Please find attached the updated patches with the above fixes.
Thank you,
Rahila Syed
Attachments:
v17-0001-Preparatory-changes-for-reporting-memory-context-sta.patch (application/octet-stream)
From bce37aa953a3aefbe83d98bbb9fa931598bd347d Mon Sep 17 00:00:00 2001
From: Rahila Syed <rahilasyed.90@gmail.com>
Date: Mon, 3 Feb 2025 15:33:19 +0530
Subject: [PATCH 1/2] Preparatory changes for reporting memory context
statistics
Ensure that MemoryContextStatsInternal can return number of
contexts. Also, provide an option in MemoryContextStatsInternal
to return without printing stats to either stderr or logs.
---
src/backend/utils/mmgr/mcxt.c | 65 +++++++++++++++++++++++++++++------
1 file changed, 55 insertions(+), 10 deletions(-)
diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index 91060de0ab..34cdcdf2fd 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -135,6 +135,17 @@ static const MemoryContextMethods mcxt_methods[] = {
};
#undef BOGUS_MCTX
+/*
+ * This is passed to MemoryContextStatsInternal to determine whether
+ * to print context statistics or not and where to print them logs or
+ * stderr.
+ */
+typedef enum PrintDestination
+{
+ PRINT_STATS_TO_STDERR = 0,
+ PRINT_STATS_TO_LOGS,
+ PRINT_STATS_NONE
+} PrintDestination;
/*
* CurrentMemoryContext
@@ -162,7 +173,7 @@ static void MemoryContextCallResetCallbacks(MemoryContext context);
static void MemoryContextStatsInternal(MemoryContext context, int level,
int max_level, int max_children,
MemoryContextCounters *totals,
- bool print_to_stderr);
+ PrintDestination print_location, int *num_contexts);
static void MemoryContextStatsPrint(MemoryContext context, void *passthru,
const char *stats_string,
bool print_to_stderr);
@@ -831,11 +842,19 @@ MemoryContextStatsDetail(MemoryContext context,
bool print_to_stderr)
{
MemoryContextCounters grand_totals;
+ int num_contexts;
+ PrintDestination print_location;
memset(&grand_totals, 0, sizeof(grand_totals));
+ if (print_to_stderr)
+ print_location = PRINT_STATS_TO_STDERR;
+ else
+ print_location = PRINT_STATS_TO_LOGS;
+
+ /* num_contexts report number of contexts aggregated in the output */
MemoryContextStatsInternal(context, 0, max_level, max_children,
- &grand_totals, print_to_stderr);
+ &grand_totals, print_location, &num_contexts);
if (print_to_stderr)
fprintf(stderr,
@@ -876,18 +895,43 @@ static void
MemoryContextStatsInternal(MemoryContext context, int level,
int max_level, int max_children,
MemoryContextCounters *totals,
- bool print_to_stderr)
+ PrintDestination print_location, int *num_contexts)
{
MemoryContext child;
int ichild;
+ bool print_to_stderr = true;
+ check_stack_depth();
Assert(MemoryContextIsValid(context));
- /* Examine the context itself */
- context->methods->stats(context,
- MemoryContextStatsPrint,
- &level,
- totals, print_to_stderr);
+ if (print_location == PRINT_STATS_TO_STDERR)
+ print_to_stderr = true;
+ else if (print_location == PRINT_STATS_TO_LOGS)
+ print_to_stderr = false;
+
+ if (print_location != PRINT_STATS_NONE)
+ {
+ /* Examine the context itself */
+ context->methods->stats(context,
+ MemoryContextStatsPrint,
+ &level,
+ totals, print_to_stderr);
+ }
+
+ /*
+ * Do not print the statistics if print_to_stderr is PRINT_STATS_NONE,
+ * only compute totals.
+ */
+ else
+ {
+ /* Examine the context itself */
+ context->methods->stats(context,
+ NULL,
+ NULL,
+ totals, print_to_stderr);
+ }
+ /* Increment the context count */
+ *num_contexts = *num_contexts + 1;
/*
* Examine children.
@@ -907,7 +951,7 @@ MemoryContextStatsInternal(MemoryContext context, int level,
MemoryContextStatsInternal(child, level + 1,
max_level, max_children,
totals,
- print_to_stderr);
+ print_location, num_contexts);
}
}
@@ -925,6 +969,7 @@ MemoryContextStatsInternal(MemoryContext context, int level,
ichild++;
child = MemoryContextTraverseNext(child, context);
}
+ *num_contexts = *num_contexts + ichild;
if (print_to_stderr)
{
@@ -939,7 +984,7 @@ MemoryContextStatsInternal(MemoryContext context, int level,
local_totals.freechunks,
local_totals.totalspace - local_totals.freespace);
}
- else
+ else if (print_location != PRINT_STATS_NONE)
ereport(LOG_SERVER_ONLY,
(errhidestmt(true),
errhidecontext(true),
--
2.34.1
v17-0002-Function-to-report-memory-context-statistics.patch (application/octet-stream)
From 3a570e83771aa1ce9399f2717f16c704cb7a00c0 Mon Sep 17 00:00:00 2001
From: Rahila Syed <rahilasyed.90@gmail.com>
Date: Mon, 3 Feb 2025 15:37:17 +0530
Subject: [PATCH 2/2] Function to report memory context statistics
This function sends a signal to a backend to publish
statistics of all its memory contexts. The signal handler
running in the backend process sets a flag, which causes
it to copy its MemoryContextStats to a DSA during the
next call to CHECK_FOR_INTERRUPTS().
If there are more statistics than fit in 16MB, the
remaining statistics are copied as a cumulative
total of the remaining contexts.
Once it is done, it signals the client backend using
a condition variable. The client backend wakes up, reads
the shared memory and returns these values in the form
of a set of records, one for each memory context, to the
user, followed by a cumulative total of the remaining
contexts, if any.
If get_summary is true, return statistics of all children
of TopMemoryContext with aggregated statistics of their
children.
The user can pass num_of_tries, which determines the total
number of wait cycles in a client backend for the latest
statistics.
The wait timeout for each cycle is 1 second. After the
last cycle, the client displays previously published
statistics or returns without results.
Each backend and auxiliary process has its own slot for
reporting the stats. There is an array of such memory slots
of size MaxBackends + the number of auxiliary processes in
fixed shared memory. Each of these slots points to a
smaller DSA allocation within a single DSA, which contains
the stats to be shared by the corresponding process.
Each slot has its own LWLock and condition variable for
synchronization and communication between the publishing
process and the client backend.
---
doc/src/sgml/func.sgml | 61 ++
src/backend/postmaster/autovacuum.c | 4 +
src/backend/postmaster/checkpointer.c | 4 +
src/backend/postmaster/interrupt.c | 4 +
src/backend/postmaster/pgarch.c | 4 +
src/backend/postmaster/startup.c | 4 +
src/backend/postmaster/walsummarizer.c | 4 +
src/backend/storage/ipc/ipci.c | 3 +
src/backend/storage/ipc/procsignal.c | 3 +
src/backend/tcop/postgres.c | 3 +
.../utils/activity/wait_event_names.txt | 1 +
src/backend/utils/adt/mcxtfuncs.c | 436 +++++++++++++-
src/backend/utils/init/globals.c | 1 +
src/backend/utils/mmgr/mcxt.c | 567 +++++++++++++++++-
src/include/catalog/pg_proc.dat | 10 +
src/include/miscadmin.h | 1 +
src/include/storage/procsignal.h | 1 +
src/include/utils/memutils.h | 69 +++
src/test/regress/expected/sysviews.out | 14 +
src/test/regress/sql/sysviews.sql | 14 +
src/tools/pgindent/typedefs.list | 4 +
21 files changed, 1169 insertions(+), 43 deletions(-)
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 51dd8ad657..b5eee4d4a7 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -28553,6 +28553,50 @@ acl | {postgres=arwdDxtm/postgres,foo=r/postgres}
</para></entry>
</row>
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>pg_get_process_memory_contexts</primary>
+ </indexterm>
+ <function>pg_get_process_memory_contexts</function> ( <parameter>pid</parameter> <type>integer</type>, <parameter>get_summary</parameter> <type>boolean</type>, <parameter>num_of_tries</parameter> <type>integer</type> )
+ <returnvalue>setof record</returnvalue>
+ </para>
+ <para>
+ This function handles requests to display the memory contexts of a
+ PostgreSQL process with the specified process ID (PID). It takes three
+ arguments: <parameter>PID</parameter>, <parameter>get_summary</parameter>
+ and <parameter>num_of_tries</parameter>. The function can send requests
+ to both backend and auxiliary processes.
+
+ After receiving memory context statistics from the target process, it
+ returns the results as one row per context. If all the contexts don't
+ fit within the pre-determined size limit, the remaining context statistics
+ are aggregated and a cumulative total is displayed. The num_agg_contexts
+ column indicates the number of contexts aggregated in the displayed
+ statistics. The num_agg_contexts value is typically 1, meaning that each
+ context's statistics are displayed separately.
+
+ When <parameter>get_summary</parameter> is set to true, statistics
+ for memory contexts at levels 1 and 2 are displayed, with level 1
+ representing the root node (i.e., TopMemoryContext).
+ Each level 2 context's statistics represent an aggregate of all its
+ child contexts' statistics, with num_agg_contexts indicating the number
+ of these aggregated child contexts.
+
+ When <parameter>get_summary</parameter> is set to false, the
+ num_agg_contexts value is 1, indicating that individual statistics are
+ being displayed.
+
+ <parameter>num_of_tries</parameter> indicates the number of times
+ the client will wait for the latest statistics. The wait per try is 1
+ second. This parameter can be increased if the user anticipates a delay
+ in the response from the reporting process. Conversely, if users are
+ frequently and periodically querying the process for statistics, or if
+ there are concurrent requests for statistics of the same process,
+ lowering the parameter might help achieve a faster response.
+ </para></entry>
+ </row>
+
<row>
<entry role="func_table_entry"><para role="func_signature">
<indexterm>
@@ -28671,6 +28715,23 @@ LOG: Grand total: 1651920 bytes in 201 blocks; 622360 free (88 chunks); 1029560
because it may generate a large number of log messages.
</para>
+ <para>
+ <function>pg_get_process_memory_contexts</function> can be used
+ to request the memory contexts statistics of any postgres process. For example:
+<programlisting>
+postgres=# SELECT * FROM pg_get_process_memory_contexts(
+ (SELECT pid FROM pg_stat_activity
+ WHERE backend_type = 'checkpointer')
+ , false, 5) LIMIT 1;
+ name | ident | type | path | total_bytes | total_nblocks | free_bytes | free_chunks | used_bytes | num_
+agg_contexts | stats_timestamp
+------------------+-------+----------+------+-------------+---------------+------------+-------------+------------+-----
+-------------+----------------------------------
+ TopMemoryContext | | AllocSet | {1} | 102664 | 4 | 3008 | 2 | 99656 |
+ 1 | 2025-03-04 10:01:57.590543+05:30
+</programlisting>
+ </para>
+
</sect2>
<sect2 id="functions-admin-backup">
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 800815dfbc..6630fbf05c 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -779,6 +779,10 @@ ProcessAutoVacLauncherInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
/* Process sinval catchup interrupts that happened while sleeping */
ProcessCatchupInterrupt();
}
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index 0e228d143a..dc1b228b71 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -661,6 +661,10 @@ ProcessCheckpointerInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/postmaster/interrupt.c b/src/backend/postmaster/interrupt.c
index 0ae9bf906e..f24f574e74 100644
--- a/src/backend/postmaster/interrupt.c
+++ b/src/backend/postmaster/interrupt.c
@@ -48,6 +48,10 @@ ProcessMainLoopInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index dbe4e1d426..9cdba52054 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -865,6 +865,10 @@ ProcessPgArchInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
if (ConfigReloadPending)
{
char *archiveLib = pstrdup(XLogArchiveLibrary);
diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c
index 27e86cf393..7149a67fcb 100644
--- a/src/backend/postmaster/startup.c
+++ b/src/backend/postmaster/startup.c
@@ -192,6 +192,10 @@ ProcessStartupProcInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
diff --git a/src/backend/postmaster/walsummarizer.c b/src/backend/postmaster/walsummarizer.c
index ccba0f84e6..4da70f6e0a 100644
--- a/src/backend/postmaster/walsummarizer.c
+++ b/src/backend/postmaster/walsummarizer.c
@@ -877,6 +877,10 @@ ProcessWalSummarizerInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 174eed7036..5eee04d52a 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -50,6 +50,7 @@
#include "storage/sinvaladt.h"
#include "utils/guc.h"
#include "utils/injection_point.h"
+#include "utils/memutils.h"
/* GUCs */
int shared_memory_type = DEFAULT_SHARED_MEMORY_TYPE;
@@ -340,6 +341,8 @@ CreateOrAttachShmemStructs(void)
StatsShmemInit();
WaitEventCustomShmemInit();
InjectionPointShmemInit();
+ MemCtxShmemInit();
+ MemCtxBackendShmemInit();
}
/*
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index 7d20196550..b59034fdc3 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -690,6 +690,9 @@ procsignal_sigusr1_handler(SIGNAL_ARGS)
if (CheckProcSignal(PROCSIG_LOG_MEMORY_CONTEXT))
HandleLogMemoryContextInterrupt();
+ if (CheckProcSignal(PROCSIG_GET_MEMORY_CONTEXT))
+ HandleGetMemoryContextInterrupt();
+
if (CheckProcSignal(PROCSIG_PARALLEL_APPLY_MESSAGE))
HandleParallelApplyMessageInterrupt();
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 55ab2da299..00bc19c3a7 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3500,6 +3500,9 @@ ProcessInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
if (ParallelApplyMessagePending)
ProcessParallelApplyMessages();
}
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 3c594415bf..fcf68b47b9 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -159,6 +159,7 @@ WAL_RECEIVER_EXIT "Waiting for the WAL receiver to exit."
WAL_RECEIVER_WAIT_START "Waiting for startup process to send initial data for streaming replication."
WAL_SUMMARY_READY "Waiting for a new WAL summary to be generated."
XACT_GROUP_UPDATE "Waiting for the group leader to update transaction status at transaction end."
+MEM_CTX_PUBLISH "Waiting for backend to publish memory information."
ABI_compatibility:
diff --git a/src/backend/utils/adt/mcxtfuncs.c b/src/backend/utils/adt/mcxtfuncs.c
index 396c2f223b..96a7c634e7 100644
--- a/src/backend/utils/adt/mcxtfuncs.c
+++ b/src/backend/utils/adt/mcxtfuncs.c
@@ -17,28 +17,26 @@
#include "funcapi.h"
#include "mb/pg_wchar.h"
+#include "miscadmin.h"
+#include "access/twophase.h"
+#include "catalog/pg_authid_d.h"
+#include "nodes/pg_list.h"
#include "storage/proc.h"
#include "storage/procarray.h"
+#include "utils/acl.h"
#include "utils/array.h"
#include "utils/builtins.h"
#include "utils/hsearch.h"
+#include "utils/memutils.h"
+#include "utils/wait_event_types.h"
/* ----------
* The max bytes for showing identifiers of MemoryContext.
* ----------
*/
#define MEMORY_CONTEXT_IDENT_DISPLAY_SIZE 1024
-
-/*
- * MemoryContextId
- * Used for storage of transient identifiers for
- * pg_get_backend_memory_contexts.
- */
-typedef struct MemoryContextId
-{
- MemoryContext context;
- int context_id;
-} MemoryContextId;
+struct MemoryContextBackendState *memCtxState = NULL;
+struct MemoryContextState *memCtxArea = NULL;
/*
* int_list_to_array
@@ -143,24 +141,7 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
else
nulls[1] = true;
- switch (context->type)
- {
- case T_AllocSetContext:
- type = "AllocSet";
- break;
- case T_GenerationContext:
- type = "Generation";
- break;
- case T_SlabContext:
- type = "Slab";
- break;
- case T_BumpContext:
- type = "Bump";
- break;
- default:
- type = "???";
- break;
- }
+ type = ContextTypeToString(context->type);
values[2] = CStringGetTextDatum(type);
values[3] = Int32GetDatum(list_length(path)); /* level */
@@ -175,6 +156,32 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
list_free(path);
}
+const char *
+ContextTypeToString(NodeTag type)
+{
+ const char *context_type;
+
+ switch (type)
+ {
+ case T_AllocSetContext:
+ context_type = "AllocSet";
+ break;
+ case T_GenerationContext:
+ context_type = "Generation";
+ break;
+ case T_SlabContext:
+ context_type = "Slab";
+ break;
+ case T_BumpContext:
+ context_type = "Bump";
+ break;
+ default:
+ context_type = "???";
+ break;
+ }
+ return (context_type);
+}
+
/*
* pg_get_backend_memory_contexts
* SQL SRF showing backend memory context.
@@ -281,7 +288,7 @@ pg_log_backend_memory_contexts(PG_FUNCTION_ARGS)
* to acquire a lock on an arbitrary process to prevent that. But since
* this mechanism is usually used to debug a backend or an auxiliary
* process running and consuming lots of memory, that it might end on its
- * own first and its memory contexts are not logged is not a problem.
+ * own first and its memory contexts are not reported is not a problem.
*/
if (proc == NULL)
{
@@ -290,7 +297,7 @@ pg_log_backend_memory_contexts(PG_FUNCTION_ARGS)
* if one backend terminated on its own during the run.
*/
ereport(WARNING,
- (errmsg("PID %d is not a PostgreSQL server process", pid)));
+ errmsg("PID %d is not a PostgreSQL server process", pid));
PG_RETURN_BOOL(false);
}
@@ -299,9 +306,374 @@ pg_log_backend_memory_contexts(PG_FUNCTION_ARGS)
{
/* Again, just a warning to allow loops */
ereport(WARNING,
- (errmsg("could not send signal to process %d: %m", pid)));
+ errmsg("could not send signal to process %d: %m", pid));
PG_RETURN_BOOL(false);
}
PG_RETURN_BOOL(true);
}
+
+/*
+ * pg_get_process_memory_contexts
+ * Signal a backend or an auxiliary process to send its memory contexts,
+ * wait for the results and display them.
+ *
+ * By default, only superusers or users with PG_READ_ALL_STATS are allowed to
+ * signal a process to return the memory contexts. This is because allowing
+ * any users to issue this request at an unbounded rate would cause lots of
+ * requests to be sent, which can lead to denial of service. Additional roles
+ * can be permitted with GRANT.
+ *
+ * On receipt of this signal, a backend or an auxiliary process sets the flag
+ * in the signal handler, which causes the next CHECK_FOR_INTERRUPTS()
+ * or process-specific interrupt handler to copy the memory context details
+ * to a dynamic shared memory space.
+ *
+ * The shared memory buffer has a limited size - if the process has too many
+ * memory contexts, the memory contexts that do not fit are summarized
+ * and represented as a cumulative total at the end of the buffer.
+ *
+ * After sending the signal, wait on a condition variable. The publishing
+ * backend, after copying the data to shared memory, broadcasts on that
+ * condition variable. There is one condition variable per publishing
+ * backend.
+ * Once the condition variable is signalled, check if the memory context
+ * information is available for reading and display it.
+ *
+ * If the publishing backend does not respond before the condition variable
+ * times out, which is set to MEMSTATS_WAIT_TIMEOUT, retry up to max_tries
+ * times, as specified by the user, before giving up and returning
+ * previously published statistics, if any. If no previous statistics
+ * exist, return NULL.
+ */
+Datum
+pg_get_process_memory_contexts(PG_FUNCTION_ARGS)
+{
+ int pid = PG_GETARG_INT32(0);
+ bool get_summary = PG_GETARG_BOOL(1);
+ PGPROC *proc;
+ ProcNumber procNumber = INVALID_PROC_NUMBER;
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ dsa_area *area;
+ MemoryContextEntry *memctx_info;
+ int num_retries = 0;
+ TimestampTz curr_timestamp;
+ int max_tries = PG_GETARG_INT32(2);
+
+ /*
+ * Only superusers or users with pg_read_all_stats privileges can view the
+ * memory context statistics of another process
+ */
+ if (!has_privs_of_role(GetUserId(), ROLE_PG_READ_ALL_STATS))
+ ereport(ERROR,
+ errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+ errmsg("memory context statistics privilege error"));
+
+ InitMaterializedSRF(fcinfo, 0);
+
+ /*
+ * See if the process with given pid is a backend or an auxiliary process.
+ */
+ proc = BackendPidGetProc(pid);
+ if (proc == NULL)
+ proc = AuxiliaryPidGetProc(pid);
+
+ /*
+ * BackendPidGetProc() and AuxiliaryPidGetProc() return NULL if the pid
+ * isn't valid; but by the time we reach kill(), a process for which we
+ * get a valid proc here might have terminated on its own. There's no way
+ * to acquire a lock on an arbitrary process to prevent that. But since
+ * this mechanism is usually used to debug a backend or an auxiliary
+ * process running and consuming lots of memory, that it might end on its
+ * own first and its memory contexts are not reported is not a problem.
+ */
+ if (proc == NULL)
+ {
+ /*
+ * This is just a warning so a loop-through-resultset will not abort
+ * if one backend terminated on its own during the run.
+ */
+ ereport(WARNING,
+ errmsg("PID %d is not a PostgreSQL server process",
+ pid));
+ PG_RETURN_NULL();
+ }
+
+ procNumber = GetNumberFromPGProc(proc);
+ if (procNumber == MyProcNumber)
+ {
+ ereport(WARNING,
+ errmsg("cannot return statistics for local backend"),
+ errhint("Use pg_backend_memory_contexts view instead."));
+ PG_RETURN_NULL();
+ }
+
+ LWLockAcquire(&memCtxState[procNumber].lw_lock, LW_EXCLUSIVE);
+ memCtxState[procNumber].get_summary = get_summary;
+ LWLockRelease(&memCtxState[procNumber].lw_lock);
+
+ curr_timestamp = GetCurrentTimestamp();
+
+ /*
+ * Send a signal to a PostgreSQL process, informing it we want it to
+ * produce information about memory contexts.
+ */
+ if (SendProcSignal(pid, PROCSIG_GET_MEMORY_CONTEXT, procNumber) < 0)
+ {
+ ereport(WARNING,
+ errmsg("could not send signal to process %d: %m", pid));
+ PG_RETURN_NULL();
+ }
+
+ /*
+ * A valid DSA pointer isn't proof that statistics are available; it can
+ * be valid due to previously published stats. Check if the stats are
+ * updated by comparing the timestamp: if the stats are newer than the
+ * timestamp recorded before sending the procsignal, they must by
+ * definition be updated. Wait for max_tries *
+ * MEMSTATS_WAIT_TIMEOUT, following which display old statistics if
+ * available or return NULL.
+ */
+ while (1)
+ {
+ long msecs;
+
+ /*
+ * We expect to come out of sleep when the requested process has
+ * finished publishing the statistics, verified using the valid dsa
+ * pointer.
+ *
+ * Make sure that the information belongs to the pid we requested
+ * information for; otherwise loop back and wait for the server
+ * process to finish publishing statistics.
+ */
+ LWLockAcquire(&memCtxState[procNumber].lw_lock, LW_EXCLUSIVE);
+ msecs =
+ TimestampDifferenceMilliseconds(curr_timestamp,
+ memCtxState[procNumber].stats_timestamp);
+
+ /*
+ * A note in procnumber.h says that a procNumber can be re-used for
+ * a different backend immediately after a backend exits. In case an
+ * old process' data was there and not updated by the current process
+ * in the slot identified by the procNumber, the pid of the requested
+ * process and the proc_id might not match.
+ */
+ if (memCtxState[procNumber].proc_id == pid)
+ {
+ /*
+ * Break if the latest stats have been read, indicated by
+ * statistics timestamp being newer than the current request
+ * timestamp.
+ */
+ if (DsaPointerIsValid(memCtxState[procNumber].memstats_dsa_pointer)
+ && msecs > 0)
+ {
+ LWLockRelease(&memCtxState[procNumber].lw_lock);
+ break;
+ }
+
+ }
+ LWLockRelease(&memCtxState[procNumber].lw_lock);
+
+ /*
+ * Recheck the state of the backend before sleeping on the condition
+ * variable
+ */
+ proc = BackendPidGetProc(pid);
+ if (proc == NULL)
+ proc = AuxiliaryPidGetProc(pid);
+
+#define MEMSTATS_WAIT_TIMEOUT 1000
+ if (proc == NULL)
+ {
+ ereport(WARNING,
+ errmsg("PID %d is not a PostgreSQL server process",
+ pid));
+ PG_RETURN_NULL();
+ }
+ if (ConditionVariableTimedSleep(&memCtxState[procNumber].memctx_cv,
+ MEMSTATS_WAIT_TIMEOUT,
+ WAIT_EVENT_MEM_CTX_PUBLISH))
+ {
+ /*
+ * Retry up to max_tries times, as specified by the user. Once the
+ * retries are exhausted, display previously published stats, if any.
+ */
+ if (num_retries > max_tries)
+ {
+ LWLockAcquire(&memCtxState[procNumber].lw_lock, LW_EXCLUSIVE);
+ /* Displaying previously published statistics */
+ if (DsaPointerIsValid(memCtxState[procNumber].memstats_dsa_pointer))
+ {
+ LWLockRelease(&memCtxState[procNumber].lw_lock);
+ break;
+ }
+ else
+ {
+ LWLockRelease(&memCtxState[procNumber].lw_lock);
+ PG_RETURN_NULL();
+ }
+ }
+ ereport(LOG,
+ errmsg("wait for process %d to publish stats timed out, trying again",
+ pid));
+ num_retries = num_retries + 1;
+ }
+
+ }
+ /* We should land here only with a valid DSA handle */
+ LWLockAcquire(&memCtxArea->lw_lock, LW_EXCLUSIVE);
+ Assert(memCtxArea->memstats_dsa_handle != DSA_HANDLE_INVALID);
+ area = dsa_attach(memCtxArea->memstats_dsa_handle);
+ LWLockRelease(&memCtxArea->lw_lock);
+
+ /*
+ * Backend has finished publishing the stats, read them
+ *
+ * Read statistics of top level 1 and 2 contexts, if get_summary is true.
+ */
+ LWLockAcquire(&memCtxState[procNumber].lw_lock, LW_EXCLUSIVE);
+ memctx_info = (MemoryContextEntry *) dsa_get_address(area,
+ memCtxState[procNumber].memstats_dsa_pointer);
+
+#define PG_GET_PROCESS_MEMORY_CONTEXTS_COLS 12
+ for (int i = 0; i < memCtxState[procNumber].total_stats; i++)
+ {
+ ArrayType *path_array;
+ int path_length;
+ Datum values[PG_GET_PROCESS_MEMORY_CONTEXTS_COLS];
+ bool nulls[PG_GET_PROCESS_MEMORY_CONTEXTS_COLS];
+ char *name;
+ char *ident;
+ Datum *path_datum_array;
+
+ memset(values, 0, sizeof(values));
+ memset(nulls, 0, sizeof(nulls));
+
+ if (DsaPointerIsValid(memctx_info[i].name))
+ {
+ name = (char *) dsa_get_address(area, memctx_info[i].name);
+ values[0] = CStringGetTextDatum(name);
+ }
+ else
+ nulls[0] = true;
+ if (DsaPointerIsValid(memctx_info[i].ident))
+ {
+ ident = (char *) dsa_get_address(area, memctx_info[i].ident);
+ values[1] = CStringGetTextDatum(ident);
+ }
+ else
+ nulls[1] = true;
+
+ if (memctx_info[i].type != NULL)
+ values[2] = CStringGetTextDatum(memctx_info[i].type);
+ else
+ nulls[2] = true;
+
+ path_length = memctx_info[i].path_length;
+
+ if (DsaPointerIsValid(memctx_info[i].path))
+ {
+ path_datum_array = (Datum *) dsa_get_address(area, memctx_info[i].path);
+ path_array = construct_array_builtin(path_datum_array,
+ path_length, INT4OID);
+
+ values[3] = PointerGetDatum(path_array);
+ }
+ else
+ nulls[3] = true;
+ values[4] = Int32GetDatum(path_length); /* level */
+ values[5] = Int64GetDatum(memctx_info[i].totalspace);
+ values[6] = Int64GetDatum(memctx_info[i].nblocks);
+ values[7] = Int64GetDatum(memctx_info[i].freespace);
+ values[8] = Int64GetDatum(memctx_info[i].freechunks);
+ values[9] = Int64GetDatum(memctx_info[i].totalspace -
+ memctx_info[i].freespace);
+ values[10] = Int32GetDatum(memctx_info[i].num_agg_stats);
+ values[11] = TimestampTzGetDatum(memCtxState[procNumber].stats_timestamp);
+
+ tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc,
+ values, nulls);
+ }
+ LWLockRelease(&memCtxState[procNumber].lw_lock);
+
+ ConditionVariableCancelSleep();
+ dsa_detach(area);
+
+ PG_RETURN_NULL();
+}
+
+/*
+ * Shared memory sizing for reporting memory context information.
+ */
+static Size
+MemCtxShmemSize(void)
+{
+ Size TotalProcs =
+ add_size(MaxBackends, add_size(NUM_AUXILIARY_PROCS, max_prepared_xacts));
+
+ return mul_size(TotalProcs, sizeof(MemoryContextBackendState));
+}
+
+/*
+ * Init shared memory for reporting memory context information.
+ */
+void
+MemCtxBackendShmemInit(void)
+{
+ bool found;
+ Size TotalProcs =
+ add_size(MaxBackends, add_size(NUM_AUXILIARY_PROCS, max_prepared_xacts));
+
+ memCtxState = (MemoryContextBackendState *) ShmemInitStruct("MemoryContextBackendState",
+ MemCtxShmemSize(),
+ &found);
+ if (!IsUnderPostmaster)
+ {
+ Assert(!found);
+
+ for (int i = 0; i < TotalProcs; i++)
+ {
+ ConditionVariableInit(&memCtxState[i].memctx_cv);
+
+ LWLockInitialize(&memCtxState[i].lw_lock,
+ LWLockNewTrancheId());
+ LWLockRegisterTranche(memCtxState[i].lw_lock.tranche,
+ "mem_context_backend_stats_reporting");
+
+ memCtxState[i].memstats_dsa_pointer = InvalidDsaPointer;
+ }
+ }
+ else
+ {
+ Assert(found);
+ }
+}
+
+/*
+ * Initialize shared memory for displaying memory
+ * context statistics
+ */
+void
+MemCtxShmemInit(void)
+{
+ bool found;
+
+ memCtxArea = (MemoryContextState *) ShmemInitStruct("MemoryContextState", sizeof(MemoryContextState),
+ &found);
+ if (!IsUnderPostmaster)
+ {
+ Assert(!found);
+
+ LWLockInitialize(&memCtxArea->lw_lock,
+ LWLockNewTrancheId());
+ LWLockRegisterTranche(memCtxArea->lw_lock.tranche,
+ "mem_context_stats_reporting");
+ memCtxArea->memstats_dsa_handle = DSA_HANDLE_INVALID;
+ }
+ else
+ {
+ Assert(found);
+ }
+}
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index b844f9fdae..13938ccb0f 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -38,6 +38,7 @@ volatile sig_atomic_t TransactionTimeoutPending = false;
volatile sig_atomic_t IdleSessionTimeoutPending = false;
volatile sig_atomic_t ProcSignalBarrierPending = false;
volatile sig_atomic_t LogMemoryContextPending = false;
+volatile sig_atomic_t PublishMemoryContextPending = false;
volatile sig_atomic_t IdleStatsUpdateTimeoutPending = false;
volatile uint32 InterruptHoldoffCount = 0;
volatile uint32 QueryCancelHoldoffCount = 0;
diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index 34cdcdf2fd..d1ad444b33 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -19,10 +19,18 @@
*-------------------------------------------------------------------------
*/
#include "postgres.h"
+#include <math.h>
#include "mb/pg_wchar.h"
#include "miscadmin.h"
+#include "nodes/pg_list.h"
+#include "storage/fd.h"
+#include "storage/lwlock.h"
+#include "storage/dsm.h"
+#include "storage/ipc.h"
+#include "utils/dsa.h"
+#include "utils/hsearch.h"
#include "utils/memdebug.h"
#include "utils/memutils.h"
#include "utils/memutils_internal.h"
@@ -164,6 +172,7 @@ MemoryContext CacheMemoryContext = NULL;
MemoryContext MessageContext = NULL;
MemoryContext TopTransactionContext = NULL;
MemoryContext CurTransactionContext = NULL;
+static bool DsaCleanupRegistered = false;
/* This is a transient link to the active portal's memory context: */
MemoryContext PortalContext = NULL;
@@ -177,6 +186,17 @@ static void MemoryContextStatsInternal(MemoryContext context, int level,
static void MemoryContextStatsPrint(MemoryContext context, void *passthru,
const char *stats_string,
bool print_to_stderr);
+static void PublishMemoryContext(MemoryContextEntry *memctx_infos,
+ int curr_id, MemoryContext context,
+ List *path,
+ MemoryContextCounters stat,
+ int num_contexts, dsa_area *area);
+static void compute_contexts_count_and_ids(List *contexts, HTAB *context_id_lookup,
+ int *stats_count,
+ bool get_summary);
+static List *compute_context_path(MemoryContext c, HTAB *context_id_lookup);
+static void dsa_free_previous_stats(dsa_area *area, int total_stats, dsa_pointer prev_dsa_pointer);
+static void AtProcExit_memctx_dsa_free(int code, Datum arg);
/*
* You should not do memory allocations within a critical section, because
@@ -889,7 +909,8 @@ MemoryContextStatsDetail(MemoryContext context,
* One recursion level for MemoryContextStats
*
* Print stats for this context if possible, but in any case accumulate counts
- * into *totals (if not NULL).
+ * into *totals (if not NULL). The callers should make sure that print_location
+ * is set to PRINT_STATS_TO_STDERR, PRINT_STATS_TO_LOGS or PRINT_STATS_NONE.
*/
static void
MemoryContextStatsInternal(MemoryContext context, int level,
@@ -899,36 +920,41 @@ MemoryContextStatsInternal(MemoryContext context, int level,
{
MemoryContext child;
int ichild;
- bool print_to_stderr = true;
check_stack_depth();
Assert(MemoryContextIsValid(context));
if (print_location == PRINT_STATS_TO_STDERR)
- print_to_stderr = true;
+ {
+ /* Examine the context itself */
+ context->methods->stats(context,
+ MemoryContextStatsPrint,
+ &level,
+ totals, true);
+ }
else if (print_location == PRINT_STATS_TO_LOGS)
- print_to_stderr = false;
-
- if (print_location != PRINT_STATS_NONE)
{
+
/* Examine the context itself */
context->methods->stats(context,
MemoryContextStatsPrint,
&level,
- totals, print_to_stderr);
+ totals, false);
}
/*
* Do not print the statistics if print_to_stderr is PRINT_STATS_NONE,
- * only compute totals.
+ * only compute totals. This is used in reporting of memory context
+ * statistics via an SQL function. Last parameter is not relevant.
*/
else
{
+ Assert(print_location == PRINT_STATS_NONE);
/* Examine the context itself */
context->methods->stats(context,
NULL,
NULL,
- totals, print_to_stderr);
+ totals, false);
}
/* Increment the context count */
*num_contexts = *num_contexts + 1;
@@ -971,7 +997,7 @@ MemoryContextStatsInternal(MemoryContext context, int level,
}
*num_contexts = *num_contexts + ichild;
- if (print_to_stderr)
+ if (print_location == PRINT_STATS_TO_STDERR)
{
for (int i = 0; i <= level; i++)
fprintf(stderr, " ");
@@ -984,7 +1010,7 @@ MemoryContextStatsInternal(MemoryContext context, int level,
local_totals.freechunks,
local_totals.totalspace - local_totals.freespace);
}
- else if (print_location != PRINT_STATS_NONE)
+ else if (print_location == PRINT_STATS_TO_LOGS)
ereport(LOG_SERVER_ONLY,
(errhidestmt(true),
errhidecontext(true),
@@ -1321,6 +1347,21 @@ HandleLogMemoryContextInterrupt(void)
/* latch will be set by procsignal_sigusr1_handler */
}
+/*
+ * HandleGetMemoryContextInterrupt
+ * Handle receipt of an interrupt indicating publishing of memory
+ * contexts.
+ *
+ * All the actual work is deferred to ProcessGetMemoryContextInterrupt()
+ */
+void
+HandleGetMemoryContextInterrupt(void)
+{
+ InterruptPending = true;
+ PublishMemoryContextPending = true;
+ /* latch will be set by procsignal_sigusr1_handler */
+}
+
/*
* ProcessLogMemoryContextInterrupt
* Perform logging of memory contexts of this backend process.
@@ -1358,6 +1399,510 @@ ProcessLogMemoryContextInterrupt(void)
MemoryContextStatsDetail(TopMemoryContext, 100, 100, false);
}
+/*
+ * ProcessGetMemoryContextInterrupt
+ * Generate information about memory contexts used by the process.
+ *
+ * Performs a breadth first search on the memory context tree, so a parent's
+ * statistics precede those of its children in the monitoring function output.
+ *
+ * Statistics per context for all the processes are shared via the same dynamic
+ * shared area. The statistics for contexts that exceed the pre-determined size
+ * limit are captured as a cumulative total at the end of individual statistics.
+ *
+ * If get_summary is true, we traverse the memory context tree recursively in
+ * a depth first manner to cover all the children of a parent context, to be
+ * able to display a cumulative total of memory consumption by a parent.
+ */
+void
+ProcessGetMemoryContextInterrupt(void)
+{
+ List *contexts;
+
+ HASHCTL ctl;
+ HTAB *context_id_lookup;
+ int context_id = 0;
+ MemoryContext stat_cxt;
+ MemoryContextEntry *meminfo;
+ bool get_summary = false;
+
+ dsa_area *area = NULL;
+ int max_stats;
+ int idx = MyProcNumber;
+ int stats_count = 0;
+ MemoryContextCounters stat;
+ int num_individual_stats = 0;
+
+ PublishMemoryContextPending = false;
+
+ /*
+ * Make a new context that will contain the hash table, to ease the
+ * cleanup.
+ */
+ stat_cxt = AllocSetContextCreate(CurrentMemoryContext,
+ "Memory context statistics",
+ ALLOCSET_DEFAULT_SIZES);
+
+ /*
+ * The hash table used for constructing "path" column of the view, similar
+ * to its local backend counterpart.
+ */
+ ctl.keysize = sizeof(MemoryContext);
+ ctl.entrysize = sizeof(MemoryContextId);
+ ctl.hcxt = stat_cxt;
+
+ context_id_lookup = hash_create("pg_get_remote_backend_memory_contexts",
+ 256,
+ &ctl,
+ HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+
+ /* List of contexts to process in the next round - start at the top. */
+ contexts = list_make1(TopMemoryContext);
+
+ /* Compute the number of stats that can fit in the defined limit */
+ max_stats = (MAX_NUM_DEFAULT_SEGMENTS * DSA_DEFAULT_INIT_SEGMENT_SIZE)
+ / (sizeof(MemoryContextEntry) + (MEM_CONTEXT_MAX_LEVEL
+ * sizeof(Datum)) + (2 * MEMORY_CONTEXT_IDENT_SHMEM_SIZE));
+
+ LWLockAcquire(&memCtxState[idx].lw_lock, LW_EXCLUSIVE);
+ get_summary = memCtxState[idx].get_summary;
+ LWLockRelease(&memCtxState[idx].lw_lock);
+
+ /*
+ * Traverse the memory context tree to find total number of contexts. If
+ * summary is requested report the total number of contexts at level 1 and
+ * 2 from the top. Also, populate the hash table of context ids.
+ */
+ compute_contexts_count_and_ids(contexts, context_id_lookup, &stats_count,
+ get_summary);
+
+ /*
+ * Allocate memory in this process's dsa for storing statistics of the
+ * memory contexts up to max_stats. For contexts that don't fit within the
+ * limit, a cumulative total is written as the last record in the DSA
+ * segment.
+ */
+ stats_count = (stats_count > max_stats) ? max_stats : stats_count;
+
+ LWLockAcquire(&memCtxArea->lw_lock, LW_EXCLUSIVE);
+
+ /*
+ * Create a DSA and send the handle to the client process after storing
+ * the context statistics. If the number of contexts exceeds a predefined
+ * limit (8MB), a cumulative total is stored for such contexts.
+ */
+ if (memCtxArea->memstats_dsa_handle == DSA_HANDLE_INVALID)
+ {
+ MemoryContext oldcontext = CurrentMemoryContext;
+ dsa_handle handle;
+
+ MemoryContextSwitchTo(TopMemoryContext);
+
+ area = dsa_create(memCtxArea->lw_lock.tranche);
+
+ handle = dsa_get_handle(area);
+ MemoryContextSwitchTo(oldcontext);
+
+ dsa_pin_mapping(area);
+
+ /*
+ * Pin the dsa area, this is to make sure the area remains attachable
+ * even if current backend exits. This is done so that a waiting
+ * client gets the stats even after a process exits.
+ */
+ dsa_pin(area);
+
+ /* Set the handle in shared memory */
+ memCtxArea->memstats_dsa_handle = handle;
+ }
+
+ /*
+ * If DSA exists, created by another process publishing statistics, attach
+ * to it.
+ */
+ else if (area == NULL)
+ {
+ MemoryContext oldcontext = CurrentMemoryContext;
+
+ MemoryContextSwitchTo(TopMemoryContext);
+ area = dsa_attach(memCtxArea->memstats_dsa_handle);
+ MemoryContextSwitchTo(oldcontext);
+ dsa_pin_mapping(area);
+ }
+ LWLockRelease(&memCtxArea->lw_lock);
+
+ /*
+ * Register a callback to free the allocated dsa pointers on process exit.
+ * This is done after we have a valid dsa area.
+ */
+ if (!DsaCleanupRegistered)
+ {
+ before_shmem_exit(AtProcExit_memctx_dsa_free, PointerGetDatum(area));
+ DsaCleanupRegistered = true;
+ }
+
+ /*
+ * Hold the process specific lock to protect writes to process specific
+ * memory. This way two processes publishing statistics do not block each
+ * other.
+ */
+ LWLockAcquire(&memCtxState[idx].lw_lock, LW_EXCLUSIVE);
+ memCtxState[idx].proc_id = MyProcPid;
+
+ if (DsaPointerIsValid(memCtxState[idx].memstats_dsa_pointer))
+ {
+ /*
+ * Free any previous allocations, free the name, ident and path
+ * pointers before freeing the pointer that contains them.
+ */
+ dsa_free_previous_stats(area, memCtxState[idx].total_stats,
+ memCtxState[idx].memstats_dsa_pointer);
+ dsa_free(area, memCtxState[idx].memstats_dsa_pointer);
+ memCtxState[idx].memstats_dsa_pointer = InvalidDsaPointer;
+ }
+ memCtxState[idx].memstats_dsa_pointer = dsa_allocate0(area,
+ stats_count * sizeof(MemoryContextEntry));
+
+ meminfo = (MemoryContextEntry *) dsa_get_address(area,
+ memCtxState[idx].memstats_dsa_pointer);
+
+ if (get_summary)
+ {
+ int ctx_id = 0;
+ List *path = NIL;
+
+ /* Copy TopMemoryContext statistics to DSA */
+ memset(&stat, 0, sizeof(stat));
+ (*TopMemoryContext->methods->stats) (TopMemoryContext, NULL, NULL,
+ &stat, true);
+ path = lcons_int(1, path);
+ PublishMemoryContext(meminfo, ctx_id, TopMemoryContext, path, stat,
+ 1, area);
+ ctx_id = ctx_id + 1;
+
+ /*
+ * Copy statistics for each of TopMemoryContext's children (XXX: make
+ * it capped at 100). This includes statistics of all of their children
+ * up to level 100.
+ */
+
+ for (MemoryContext c = TopMemoryContext->firstchild; c != NULL;
+ c = c->nextchild)
+ {
+ MemoryContextCounters grand_totals;
+ int num_contexts = 0;
+
+ path = NIL;
+ memset(&grand_totals, 0, sizeof(grand_totals));
+
+ MemoryContextStatsInternal(c, 0, 100, 100, &grand_totals,
+ PRINT_STATS_NONE, &num_contexts);
+
+ path = compute_context_path(c, context_id_lookup);
+
+ PublishMemoryContext(meminfo, ctx_id, c, path,
+ grand_totals, num_contexts, area);
+ ctx_id = ctx_id + 1;
+ }
+ memCtxState[idx].total_stats = ctx_id;
+ goto cleanup;
+ }
+ foreach_ptr(MemoryContextData, cur, contexts)
+ {
+ List *path = NIL;
+ char *name;
+
+ /*
+ * Figure out the transient context_id of this context and each of its
+ * ancestors.
+ */
+ path = compute_context_path(cur, context_id_lookup);
+
+ if (context_id <= (max_stats - 2))
+ {
+ /* Examine the context stats */
+ memset(&stat, 0, sizeof(stat));
+ (*cur->methods->stats) (cur, NULL, NULL, &stat, true);
+ /* Copy statistics to DSA memory */
+ PublishMemoryContext(meminfo, context_id, cur, path, stat, 1, area);
+ }
+ else
+ {
+ /* Examine the context stats */
+ memset(&stat, 0, sizeof(stat));
+ (*cur->methods->stats) (cur, NULL, NULL, &stat, true);
+
+ meminfo[max_stats - 1].totalspace += stat.totalspace;
+ meminfo[max_stats - 1].nblocks += stat.nblocks;
+ meminfo[max_stats - 1].freespace += stat.freespace;
+ meminfo[max_stats - 1].freechunks += stat.freechunks;
+ }
+
+ /*
+ * DSA max limit is reached, write aggregate of the remaining
+ * statistics.
+ */
+ if (context_id == (max_stats - 2) && context_id < (stats_count - 1))
+ {
+ num_individual_stats = context_id + 1;
+ meminfo[max_stats - 1].name = dsa_allocate0(area, 17);
+ name = dsa_get_address(area, meminfo[max_stats - 1].name);
+ strncpy(name, "Remaining Totals", 16);
+ meminfo[max_stats - 1].ident = InvalidDsaPointer;
+ meminfo[max_stats - 1].path = InvalidDsaPointer;
+ meminfo[max_stats - 1].type = NULL;
+ }
+ context_id++;
+ }
+ /* No aggregated contexts, individual statistics reported */
+ if (context_id < (max_stats - 2))
+ {
+ memCtxState[idx].total_stats = context_id;
+ }
+ /* Report number of aggregated memory contexts */
+ else
+ {
+ meminfo[max_stats - 1].num_agg_stats = context_id -
+ num_individual_stats;
+
+ /*
+ * Total stats equals num_individual_stats + 1 record for cumulative
+ * statistics.
+ */
+ memCtxState[idx].total_stats = num_individual_stats + 1;
+ }
+cleanup:
+
+ /*
+ * Signal all the waiting client backends after setting the exit condition
+ * flag
+ */
+ memCtxState[idx].stats_timestamp = GetCurrentTimestamp();
+ LWLockRelease(&memCtxState[idx].lw_lock);
+ ConditionVariableBroadcast(&memCtxState[idx].memctx_cv);
+ /* Delete the hash table memory context */
+ MemoryContextDelete(stat_cxt);
+
+ dsa_detach(area);
+}
+
+/*
+ * Append the transient context_id of this context and each of
+ * its ancestors to a list, in order to compute a path.
+ */
+static List *
+compute_context_path(MemoryContext c, HTAB *context_id_lookup)
+{
+ bool found;
+ List *path = NIL;
+
+ for (MemoryContext cur_context = c; cur_context != NULL; cur_context = cur_context->parent)
+ {
+ MemoryContextId *cur_entry;
+
+ cur_entry = hash_search(context_id_lookup, &cur_context, HASH_FIND, &found);
+
+ if (!found)
+ {
+ ereport(LOG,
+ errmsg("hash table corrupted, can't construct path value"));
+ break;
+ }
+ path = lcons_int(cur_entry->context_id, path);
+ }
+ return path;
+}
+
+/*
+ * Compute the number of contexts currently allocated by the backend and
+ * assign a context id to each of them.
+ */
+static void
+compute_contexts_count_and_ids(List *contexts, HTAB *context_id_lookup,
+ int *stats_count, bool get_summary)
+{
+ foreach_ptr(MemoryContextData, cur, contexts)
+ {
+ MemoryContextId *entry;
+ bool found;
+
+ entry = (MemoryContextId *) hash_search(context_id_lookup, &cur,
+ HASH_ENTER, &found);
+ Assert(!found);
+
+ /* context id starts with 1 */
+ entry->context_id = (++(*stats_count));
+
+ /* Append the children of the current context to the main list. */
+ for (MemoryContext c = cur->firstchild; c != NULL; c = c->nextchild)
+ {
+ if (get_summary)
+ {
+ entry = (MemoryContextId *) hash_search(context_id_lookup, &c,
+ HASH_ENTER, &found);
+ Assert(!found);
+
+ entry->context_id = (++(*stats_count));
+ }
+
+ contexts = lappend(contexts, c);
+ }
+
+ /*
+ * In summary mode, only contexts at the first two levels (from the
+ * top) are displayed.
+ */
+ if (get_summary)
+ break;
+ }
+
+}
+
+/* Copy the memory context statistics of a single context to DSA memory */
+static void
+PublishMemoryContext(MemoryContextEntry *memctx_info, int curr_id,
+ MemoryContext context, List *path,
+ MemoryContextCounters stat, int num_contexts,
+ dsa_area *area)
+{
+ char clipped_ident[MEMORY_CONTEXT_IDENT_SHMEM_SIZE];
+ char *name;
+ char *ident;
+ Datum *path_array;
+
+ if (context->name != NULL)
+ {
+ Assert(strlen(context->name) < MEMORY_CONTEXT_IDENT_SHMEM_SIZE);
+ memctx_info[curr_id].name = dsa_allocate0(area, strlen(context->name) + 1);
+ name = (char *) dsa_get_address(area, memctx_info[curr_id].name);
+ strncpy(name, context->name, strlen(context->name));
+ }
+ else
+ memctx_info[curr_id].name = InvalidDsaPointer;
+
+ /* Trim and copy the identifier if it is not set to NULL */
+ if (context->ident != NULL)
+ {
+ int idlen = strlen(context->ident);
+
+ /*
+ * Some identifiers such as SQL query string can be very long,
+ * truncate oversize identifiers.
+ */
+ if (idlen >= MEMORY_CONTEXT_IDENT_SHMEM_SIZE)
+ idlen = pg_mbcliplen(context->ident, idlen,
+ MEMORY_CONTEXT_IDENT_SHMEM_SIZE - 1);
+
+ memcpy(clipped_ident, context->ident, idlen);
+ clipped_ident[idlen] = '\0';
+
+ /*
+ * To be consistent with logging output, we label dynahash contexts
+ * with just the hash table name as with MemoryContextStatsPrint().
+ */
+ if (!strncmp(context->name, "dynahash", 8))
+ {
+ dsa_free(area, memctx_info[curr_id].name);
+ memctx_info[curr_id].name = dsa_allocate0(area,
+ strlen(clipped_ident) + 1);
+ name = (char *) dsa_get_address(area,
+ memctx_info[curr_id].name);
+ strlcpy(name,
+ clipped_ident, idlen + 1);
+ memctx_info[curr_id].ident = InvalidDsaPointer;
+ }
+ else
+ {
+
+ memctx_info[curr_id].ident = dsa_allocate0(area,
+ strlen(clipped_ident) + 1);
+ ident = (char *) dsa_get_address(area,
+ memctx_info[curr_id].ident);
+ strlcpy(ident,
+ clipped_ident, idlen + 1);
+ }
+ }
+ else
+ memctx_info[curr_id].ident = InvalidDsaPointer;
+ /* Allocate dsa memory for storing path information */
+ if (path == NIL)
+ memctx_info[curr_id].path = InvalidDsaPointer;
+ else
+ {
+ memctx_info[curr_id].path_length = list_length(path);
+ memctx_info[curr_id].path = dsa_allocate0(area,
+ memctx_info[curr_id].path_length
+ * sizeof(Datum));
+ path_array = (Datum *) dsa_get_address(area, memctx_info[curr_id].path);
+ foreach_int(i, path)
+ path_array[foreach_current_index(i)] = Int32GetDatum(i);
+ }
+ memctx_info[curr_id].type = ContextTypeToString(context->type);
+ memctx_info[curr_id].totalspace = stat.totalspace;
+ memctx_info[curr_id].nblocks = stat.nblocks;
+ memctx_info[curr_id].freespace = stat.freespace;
+ memctx_info[curr_id].freechunks = stat.freechunks;
+ memctx_info[curr_id].num_agg_stats = num_contexts;
+}
+
+static void
+dsa_free_previous_stats(dsa_area *area, int total_stats,
+ dsa_pointer prev_dsa_pointer)
+{
+ MemoryContextEntry *meminfo;
+
+ meminfo = (MemoryContextEntry *) dsa_get_address(area, prev_dsa_pointer);
+ for (int i = 0; i < total_stats; i++)
+ {
+ if (DsaPointerIsValid(meminfo[i].name))
+ dsa_free(area, meminfo[i].name);
+
+ if (DsaPointerIsValid(meminfo[i].ident))
+ dsa_free(area, meminfo[i].ident);
+
+ if (DsaPointerIsValid(meminfo[i].path))
+ dsa_free(area, meminfo[i].path);
+ }
+}
+
+/*
+ * Free the memory context statistics stored by this process
+ * in dsa area.
+ */
+static void
+AtProcExit_memctx_dsa_free(int code, Datum arg)
+{
+ int idx = MyProcNumber;
+ dsm_segment *dsm_seg = NULL;
+ dsa_area *area;
+
+ if (memCtxArea->memstats_dsa_handle != DSA_HANDLE_INVALID)
+ dsm_seg = dsm_find_mapping(memCtxArea->memstats_dsa_handle);
+ else
+ return;
+
+ if (dsm_seg == NULL)
+ area = dsa_attach(memCtxArea->memstats_dsa_handle);
+ else
+ area = (dsa_area *) DatumGetPointer(arg);
+
+ LWLockAcquire(&memCtxState[idx].lw_lock, LW_EXCLUSIVE);
+
+ if (DsaPointerIsValid(memCtxState[idx].memstats_dsa_pointer))
+ {
+ /*
+ * Free the memory context statistics, free the name, ident and path
+ * pointers before freeing the pointer that contains these pointers
+ * and integer statistics.
+ */
+ dsa_free_previous_stats(area, memCtxState[idx].total_stats,
+ memCtxState[idx].memstats_dsa_pointer);
+ dsa_free(area, memCtxState[idx].memstats_dsa_pointer);
+ memCtxState[idx].memstats_dsa_pointer = InvalidDsaPointer;
+ }
+ LWLockRelease(&memCtxState[idx].lw_lock);
+}
+
void *
palloc(Size size)
{
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 42e427f8fe..a999206b07 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8506,6 +8506,16 @@
prorettype => 'bool', proargtypes => 'int4',
prosrc => 'pg_log_backend_memory_contexts' },
+# publishing memory contexts of the specified postgres process
+{ oid => '2173', descr => 'publish memory contexts of the specified backend',
+ proname => 'pg_get_process_memory_contexts', provolatile => 'v',
+ prorows => '100', proretset => 't', proparallel => 'r',
+ prorettype => 'record', proargtypes => 'int4 bool int4',
+ proallargtypes => '{int4,bool,int4,text,text,text,_int4,int4,int8,int8,int8,int8,int8,int4,timestamptz}',
+ proargmodes => '{i,i,i,o,o,o,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{pid, summary, num_of_tries, name, ident, type, path, level, total_bytes, total_nblocks, free_bytes, free_chunks, used_bytes, num_agg_contexts, stats_timestamp}',
+ prosrc => 'pg_get_process_memory_contexts' },
+
# non-persistent series generator
{ oid => '1066', descr => 'non-persistent series generator',
proname => 'generate_series', prorows => '1000',
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 6f16794eb6..63f74e3ce1 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -96,6 +96,7 @@ extern PGDLLIMPORT volatile sig_atomic_t IdleSessionTimeoutPending;
extern PGDLLIMPORT volatile sig_atomic_t ProcSignalBarrierPending;
extern PGDLLIMPORT volatile sig_atomic_t LogMemoryContextPending;
extern PGDLLIMPORT volatile sig_atomic_t IdleStatsUpdateTimeoutPending;
+extern PGDLLIMPORT volatile sig_atomic_t PublishMemoryContextPending;
extern PGDLLIMPORT volatile sig_atomic_t CheckClientConnectionPending;
extern PGDLLIMPORT volatile sig_atomic_t ClientConnectionLost;
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index 022fd8ed93..5d4b2fbfc9 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -35,6 +35,7 @@ typedef enum
PROCSIG_WALSND_INIT_STOPPING, /* ask walsenders to prepare for shutdown */
PROCSIG_BARRIER, /* global barrier interrupt */
PROCSIG_LOG_MEMORY_CONTEXT, /* ask backend to log the memory contexts */
+ PROCSIG_GET_MEMORY_CONTEXT, /* ask backend to send the memory contexts */
PROCSIG_PARALLEL_APPLY_MESSAGE, /* Message from parallel apply workers */
/* Recovery conflict reasons */
diff --git a/src/include/utils/memutils.h b/src/include/utils/memutils.h
index 8abc26abce..a8d1956a82 100644
--- a/src/include/utils/memutils.h
+++ b/src/include/utils/memutils.h
@@ -18,6 +18,9 @@
#define MEMUTILS_H
#include "nodes/memnodes.h"
+#include "storage/condition_variable.h"
+#include "storage/lmgr.h"
+#include "utils/dsa.h"
/*
@@ -48,7 +51,10 @@
#define AllocHugeSizeIsValid(size) ((Size) (size) <= MaxAllocHugeSize)
+#define MEMORY_CONTEXT_IDENT_SHMEM_SIZE 128
+#define MEM_CONTEXT_MAX_LEVEL 64
+#define MAX_NUM_DEFAULT_SEGMENTS 8
/*
* Standard top-level memory contexts.
*
@@ -319,4 +325,67 @@ pg_memory_is_all_zeros(const void *ptr, size_t len)
return true;
}
+/* Dynamic shared memory state for statistics per context */
+typedef struct MemoryContextEntry
+{
+ dsa_pointer name;
+ dsa_pointer ident;
+ dsa_pointer path;
+ const char *type;
+ int path_length;
+ int64 totalspace;
+ int64 nblocks;
+ int64 freespace;
+ int64 freechunks;
+ int num_agg_stats;
+} MemoryContextEntry;
+
+/*
+ * Static shared memory state representing the DSA area
+ * created for memory context statistics reporting.
+ * Single DSA area is created and used by all the processes,
+ * each having its specific dsa allocations for sharing memory
+ * statistics, tracked by per backend static shared memory state.
+ */
+typedef struct MemoryContextState
+{
+ dsa_handle memstats_dsa_handle;
+ LWLock lw_lock;
+} MemoryContextState;
+
+/*
+ * Per backend static shared memory state for memory
+ * context statistics reporting.
+ */
+typedef struct MemoryContextBackendState
+{
+ ConditionVariable memctx_cv;
+ LWLock lw_lock;
+ int proc_id;
+ int total_stats;
+ bool get_summary;
+ dsa_pointer memstats_dsa_pointer;
+ TimestampTz stats_timestamp;
+} MemoryContextBackendState;
+
+
+/*
+ * MemoryContextId
+ * Used for storage of transient identifiers for
+ * pg_get_backend_memory_contexts.
+ */
+typedef struct MemoryContextId
+{
+ MemoryContext context;
+ int context_id;
+} MemoryContextId;
+
+extern PGDLLIMPORT MemoryContextBackendState *memCtxState;
+extern PGDLLIMPORT MemoryContextState *memCtxArea;
+extern void ProcessGetMemoryContextInterrupt(void);
+extern const char *ContextTypeToString(NodeTag type);
+extern void HandleGetMemoryContextInterrupt(void);
+extern void MemCtxShmemInit(void);
+extern void MemCtxBackendShmemInit(void);
+
#endif /* MEMUTILS_H */
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 83228cfca2..dca20ae1a2 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -232,3 +232,17 @@ select * from pg_timezone_abbrevs where abbrev = 'LMT';
LMT | @ 7 hours 52 mins 58 secs ago | f
(1 row)
+DO $$
+DECLARE
+ launcher_pid int;
+ r RECORD;
+BEGIN
+ SELECT pid from pg_stat_activity where backend_type='autovacuum launcher'
+ INTO launcher_pid;
+
+ select type, name, ident
+ from pg_get_process_memory_contexts(launcher_pid, false, 20)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+END $$;
+NOTICE: (AllocSet,TopMemoryContext,)
diff --git a/src/test/regress/sql/sysviews.sql b/src/test/regress/sql/sysviews.sql
index 66179f026b..4767351d4e 100644
--- a/src/test/regress/sql/sysviews.sql
+++ b/src/test/regress/sql/sysviews.sql
@@ -101,3 +101,17 @@ select count(distinct utc_offset) >= 24 as ok from pg_timezone_abbrevs;
-- One specific case we can check without much fear of breakage
-- is the historical local-mean-time value used for America/Los_Angeles.
select * from pg_timezone_abbrevs where abbrev = 'LMT';
+
+DO $$
+DECLARE
+ launcher_pid int;
+ r RECORD;
+BEGIN
+ SELECT pid from pg_stat_activity where backend_type='autovacuum launcher'
+ INTO launcher_pid;
+
+ select type, name, ident
+ from pg_get_process_memory_contexts(launcher_pid, false, 20)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+END $$;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 93339ef3c5..11e6dad561 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1643,12 +1643,16 @@ MemoizeState
MemoizeTuple
MemoryChunk
MemoryContext
+MemoryContextBackendState
MemoryContextCallback
MemoryContextCallbackFunction
MemoryContextCounters
MemoryContextData
+MemoryContextEntry
+MemoryContextId
MemoryContextMethodID
MemoryContextMethods
+MemoryContextState
MemoryStatsPrintFunc
MergeAction
MergeActionState
--
2.34.1
Hi, Rahila!
On Thu, Mar 13, 2025 at 3:57 PM Rahila Syed <rahilasyed90@gmail.com> wrote:
Please find attached updated and rebased patches. It has the following changes
1. To prevent memory leaks, ensure that the latest statistics published by a process
are freed before it exits. This can be achieved by calling dsa_free in the
before_shmem_exit callback.
2. Add a level column to maintain consistency with the output of
pg_backend_memory_contexts.
Thank you for your work on this subject.
v17-0001-Preparatory-changes-for-reporting-memory-context-sta.patch
It looks like we're increasing *num_contexts twice per child memory
context. First, it gets increased with a recursive
MemoryContextStatsInternal() call, then by adding an ichild. I might
be wrong, but I think these calculations at least deserve more
comments.
v17-0002-Function-to-report-memory-context-statistics.patch
+ if (procNumber == MyProcNumber)
+ {
+ ereport(WARNING,
+ errmsg("cannot return statistics for local backend"),
+ errhint("Use pg_backend_memory_contexts view instead."));
+ PG_RETURN_NULL();
+ }
Is it worth it to keep this restriction? Can we fetch info about
local memory context for the sake of generality?
I know there have been discussions in the thread before, but the
mechanism of publishing memory context stats via DSA looks quite
complicated. Also, the user probably intends to inspect memory
contexts when there is not a huge amount of free memory. So, failure
is probable on DSA allocation. Could we do simpler? For instance,
allocate some amount of static shared memory and use it as a message
queue between processes. As a heavy load is not supposed to be here,
I think one queue would be enough.
------
Regards,
Alexander Korotkov
Supabase
Hi Alexander,
Thank you for the review.
It looks like we're increasing *num_contexts twice per child memory
context. First, it gets increased with a recursive
MemoryContextStatsInternal() call, then by adding an ichild. I might
be wrong, but I think these calculations at least deserve more
comments.
I believe that's not the case. The recursive calls only work for children
encountered up to max_level and less than max_children per context.
The rest of the children are handled using MemoryContextTraverseNext,
without recursive calls. Thus, num_contexts is incremented for those
children separately from the recursive call counter.
I will add more comments around this.
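The counting scheme described above can be sketched with a toy tree. This is a minimal illustration, not the patch's code: ToyContext, toy_link, toy_count and toy_demo_count are hypothetical names, and toy_traverse_next only imitates what MemoryContextTraverseNext is described as doing (a pre-order walk without recursion). The point it tries to show is that a context is counted either by the recursive call or by the non-recursive walk, never by both.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a memory context node (hypothetical, not MemoryContextData). */
typedef struct ToyContext
{
    struct ToyContext *parent;
    struct ToyContext *firstchild;
    struct ToyContext *nextchild;
} ToyContext;

/* Attach child as the first child of parent. */
void
toy_link(ToyContext *parent, ToyContext *child)
{
    assert(child->parent == NULL);
    child->parent = parent;
    child->nextchild = parent->firstchild;
    parent->firstchild = child;
}

/* Pre-order successor of curr within the subtree below top, or NULL when done. */
ToyContext *
toy_traverse_next(ToyContext *curr, ToyContext *top)
{
    if (curr->firstchild != NULL)
        return curr->firstchild;
    while (curr->nextchild == NULL)
    {
        curr = curr->parent;
        if (curr == top)
            return NULL;
    }
    return curr->nextchild;
}

/*
 * Count context and everything below it.  Children within the limits are
 * counted by the recursive call; the remaining children and all of their
 * descendants are counted by the non-recursive walk.
 */
void
toy_count(ToyContext *context, int level, int max_level,
          int max_children, int *num_contexts)
{
    ToyContext *child = context->firstchild;
    int         ichild = 0;

    (*num_contexts)++;          /* this context itself */

    if (level < max_level)
    {
        for (; child != NULL && ichild < max_children;
             child = child->nextchild, ichild++)
            toy_count(child, level + 1, max_level, max_children, num_contexts);
    }

    /* Whatever the recursion did not reach is walked without recursion. */
    while (child != NULL)
    {
        (*num_contexts)++;
        child = toy_traverse_next(child, context);
    }
}

/* Build a fixed 7-node tree and count it with the given limits. */
int
toy_demo_count(int max_level, int max_children)
{
    ToyContext  n[7] = {{NULL, NULL, NULL}};
    int         count = 0;

    toy_link(&n[0], &n[1]);
    toy_link(&n[0], &n[2]);
    toy_link(&n[0], &n[3]);
    toy_link(&n[1], &n[4]);
    toy_link(&n[1], &n[5]);
    toy_link(&n[3], &n[6]);

    toy_count(&n[0], 0, max_level, max_children, &count);
    return count;
}
```

Whatever limits are passed, toy_demo_count should return the number of nodes in the tree, because each node is reached through exactly one of the two paths.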
v17-0002-Function-to-report-memory-context-statistics.patch
+ if (procNumber == MyProcNumber)
+ {
+ ereport(WARNING,
+ errmsg("cannot return statistics for local backend"),
+ errhint("Use pg_backend_memory_contexts view instead."));
+ PG_RETURN_NULL();
+ }
Is it worth it to keep this restriction? Can we fetch info about
local memory context for the sake of generality?
I think that could be done, but using pg_backend_memory_contexts would
be more efficient in this case.
I know there have been discussions in the thread before, but the
mechanism of publishing memory context stats via DSA looks quite
complicated. Also, the user probably intends to inspect memory
contexts when there is not a huge amount of free memory. So, failure
is probable on DSA allocation. Could we do simpler? For instance,
allocate some amount of static shared memory and use it as a message
queue between processes. As a heavy load is not supposed to be here,
I think one queue would be enough.
There could be other uses for such a function, such as a monitoring
dashboard that periodically queries all running backends for memory
statistics. If we use a single queue shared between all the backends,
they will need to wait for the queue to become available before sharing
their statistics, leading to processing delays at the publishing backend.
Even with separate queues for each backend or without expecting
concurrent use, publishing statistics could be delayed if a message
queue is full. This is because a backend needs to wait for a client
process to consume existing messages or statistics before publishing
more.
If a client process exits without consuming messages, the publishing
backend will experience timeouts when trying to publish stats. This will
impact backend performance as statistics are published during
CHECK_FOR_INTERRUPTS.
In the current implementation, the backend publishes all the statistics
in one go without waiting for clients to read any statistics.
In addition, allocating complete message queues in static shared memory
can be expensive, especially since these static structures need to be
created even if memory context statistics are never queried.
In contrast, a DSA is created for the feature only when statistics are
first queried. We are not preallocating shared memory for this feature,
except for small structures to store the dsa_handle and dsa_pointers for
each backend.
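The allocation pattern argued for above can be sketched as follows. This is a toy model under stated assumptions, not the patch's implementation: malloc stands in for DSA allocation, toy_area_created stands in for a valid dsa_handle, and ToySlot, toy_publish, toy_release and toy_demo_publish_twice are hypothetical names. It only illustrates that the small per-process slots exist up front, that the larger storage appears on first use, and that a publish replaces the previous snapshot in one go without waiting for a reader.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_NUM_SLOTS 4

/* Hypothetical per-context statistics entry. */
typedef struct ToyStats
{
    long        totalspace;
    long        freespace;
} ToyStats;

/* Small preallocated per-process slot; the stats themselves live elsewhere. */
typedef struct ToySlot
{
    ToyStats   *stats;          /* stands in for a dsa_pointer */
    int         total_stats;
} ToySlot;

static ToySlot toy_slots[TOY_NUM_SLOTS];    /* cheap, always present */
static int  toy_area_created = 0;           /* stands in for a valid dsa_handle */

/*
 * Publish n entries into slot idx in one go.  The previous snapshot, if any,
 * is freed and replaced; nothing here waits for a reader.  Returns 1 on
 * success and 0 if the allocation failed.
 */
int
toy_publish(int idx, const ToyStats *src, int n)
{
    ToyStats   *copy;

    assert(idx >= 0 && idx < TOY_NUM_SLOTS);
    assert(n > 0);

    copy = malloc((size_t) n * sizeof(ToyStats));
    if (copy == NULL)
        return 0;
    memcpy(copy, src, (size_t) n * sizeof(ToyStats));

    toy_area_created = 1;       /* the "area" only exists after first use */

    free(toy_slots[idx].stats); /* drop the previously published snapshot */
    toy_slots[idx].stats = copy;
    toy_slots[idx].total_stats = n;
    return 1;
}

/* Free whatever slot idx last published, as an exit callback might. */
void
toy_release(int idx)
{
    assert(idx >= 0 && idx < TOY_NUM_SLOTS);
    free(toy_slots[idx].stats);
    toy_slots[idx].stats = NULL;
    toy_slots[idx].total_stats = 0;
}

/* Publish twice to slot 1; return the entry count left by the second publish. */
int
toy_demo_publish_twice(void)
{
    ToyStats    first[3] = {{100, 10}, {200, 20}, {300, 30}};
    ToyStats    second[2] = {{400, 40}, {500, 50}};
    int         result;

    if (!toy_publish(1, first, 3) || !toy_publish(1, second, 2))
        return -1;
    result = toy_slots[1].total_stats;
    toy_release(1);
    return result;
}
```

A message queue would differ at the toy_publish step: a full queue would force the publisher to wait for a consumer, which is the delay the reply above is concerned about.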
Thank you,
Rahila Syed
On Mon, Mar 17, 2025 at 1:23 PM Rahila Syed <rahilasyed90@gmail.com> wrote:
v17-0002-Function-to-report-memory-context-statistics.patch
+ if (procNumber == MyProcNumber)
+ {
+ ereport(WARNING,
+ errmsg("cannot return statistics for local backend"),
+ errhint("Use pg_backend_memory_contexts view instead."));
+ PG_RETURN_NULL();
+ }
Is it worth it to keep this restriction? Can we fetch info about
local memory context for the sake of generality?
I think that could be done, but using pg_backend_memory_contexts would
be more efficient in this case.
I have raised a similar concern before. Having two separate functions
one for local backend and other for remote is going to be confusing.
We should have one function doing both and renamed appropriately.
--
Best Wishes,
Ashutosh Bapat
Hi,
+ if (procNumber == MyProcNumber)
+ {
+ ereport(WARNING,
+ errmsg("cannot return statistics for local backend"),
+ errhint("Use pg_backend_memory_contexts view instead."));
+ PG_RETURN_NULL();
+ }
Is it worth it to keep this restriction? Can we fetch info about
local memory context for the sake of generality?
I think that could be done, but using pg_backend_memory_contexts would
be more efficient in this case.
I have raised a similar concern before. Having two separate functions
one for local backend and other for remote is going to be confusing.
We should have one function doing both and renamed appropriately.
This is a separate concern from what has been raised by Alexander.
He has suggested removing the restriction and fetching local backend
statistics also with the proposed function.
I've removed this restriction in the latest version of the patch. Now,
the proposed function can be used to fetch local backend statistics too.
Regarding your suggestion on merging these functions, although they both
report memory context statistics, they differ in how they fetch these
statistics: locally versus from dynamic shared memory. Additionally, the
function signatures are different: the proposed function takes three
arguments (pid, get_summary, and num_of_tries), whereas
pg_get_backend_memory_contexts does not take any arguments. Therefore, I
believe these functions can remain separate as long as we document their
usage correctly.
Please find attached rebased and updated patches. I have added some more
comments and fixed an issue caused by registering the before_shmem_exit
callback from the interrupt processing routine. To address this issue, I
am now registering this callback in the InitProcess() function.
The issue occurred because interrupt processing could be triggered from
within a PG_ENSURE_ERROR_CLEANUP block. This block operates under the
assumption that the before_shmem_exit callback registered at the
beginning of the block will be the last one in the registered callback
list at the end of the block, which would not be the case if I
registered a before_shmem_exit callback in the interrupt handling routine.
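The ordering assumption described above can be sketched with a toy callback list. This is an illustration only: toy_register, toy_cancel and the two scenario functions are hypothetical stand-ins, and the real before_shmem_exit / PG_ENSURE_ERROR_CLEANUP machinery is not modeled beyond the "cancel only the most recent entry" rule that the explanation relies on.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_CALLBACKS 8

typedef void (*toy_callback) (void);

static toy_callback toy_list[TOY_MAX_CALLBACKS];
static int  toy_index = 0;

/* Push a callback onto the list (toy analogue of registering at exit). */
void
toy_register(toy_callback fn)
{
    assert(toy_index < TOY_MAX_CALLBACKS);
    toy_list[toy_index++] = fn;
}

/*
 * Remove fn only if it is the most recently registered callback.
 * Returns false when something else was registered after it.
 */
bool
toy_cancel(toy_callback fn)
{
    if (toy_index > 0 && toy_list[toy_index - 1] == fn)
    {
        toy_index--;
        return true;
    }
    return false;
}

void toy_block_cleanup(void) {}     /* cleanup owned by the guarded block */
void toy_stats_cleanup(void) {}     /* cleanup for published statistics */

/*
 * Simulate a guarded block.  If an interrupt handled inside the block
 * registers its own callback, the block's callback is no longer the last
 * entry when the block ends, so the cancel fails.
 */
bool
toy_run_block(bool interrupt_registers_callback)
{
    toy_index = 0;
    toy_register(toy_block_cleanup);        /* start of the guarded block */
    if (interrupt_registers_callback)
        toy_register(toy_stats_cleanup);    /* interrupt inside the block */
    return toy_cancel(toy_block_cleanup);   /* end of the guarded block */
}

/* Registering the statistics cleanup at process start avoids the problem. */
bool
toy_run_block_registered_early(void)
{
    toy_index = 0;
    toy_register(toy_stats_cleanup);        /* done once, at process start */
    toy_register(toy_block_cleanup);
    return toy_cancel(toy_block_cleanup);
}
```

In this toy, toy_run_block(true) fails to cancel while toy_run_block_registered_early() succeeds, which mirrors the reasoning for moving the registration to process initialization.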
Thank you,
Rahila Syed
Attachments:
v18-0002-Function-to-report-memory-context-statistics.patch
From edcddb2f2c5fa9fb9f10f71c90283cc4c19fadbf Mon Sep 17 00:00:00 2001
From: Rahila Syed <rahilasyed.90@gmail.com>
Date: Mon, 3 Feb 2025 15:37:17 +0530
Subject: [PATCH 2/2] Function to report memory context statistics
This function sends a signal to a backend to publish
statistics of all its memory contexts. The signal handler
running in the backend process sets a flag, which causes
it to copy its MemoryContextStats to a DSA during the
next call to CHECK_FOR_INTERRUPTS().
If there are more statistics than fit in 16MB, the
remaining statistics are copied as a cumulative
total of the remaining contexts.
Once it is done, it signals the client backend using
a condition variable. The client backend wakes up, reads
the shared memory and returns these values in the form
of a set of records, one for each memory context, to the
user, followed by a cumulative total of the remaining
contexts, if any.
If get_summary is true, return statistics of all children
of TopMemoryContext with aggregated statistics of their
children.
The user can pass num_of_tries, which determines the total
number of wait cycles in a client backend for the latest
statistics.
The wait timeout for each cycle is 1 second. After that,
the client displays previously published statistics or
returns without results.
Each backend and auxiliary process has its own slot for
reporting the stats. There is an array of such memory slots
of size MaxBackends+NumofAuxiliary
processes in fixed shared memory. Each of these slots points
to a smaller DSA allocation within a single DSA,
which contains the stats to be shared by the corresponding
process.
Each slot has its own LWLock and condition variable for
synchronization and communication between the publishing
process and the client backend.
---
doc/src/sgml/func.sgml | 61 ++
src/backend/postmaster/autovacuum.c | 4 +
src/backend/postmaster/checkpointer.c | 4 +
src/backend/postmaster/interrupt.c | 4 +
src/backend/postmaster/pgarch.c | 4 +
src/backend/postmaster/startup.c | 4 +
src/backend/postmaster/walsummarizer.c | 4 +
src/backend/storage/ipc/ipci.c | 3 +
src/backend/storage/ipc/procsignal.c | 3 +
src/backend/storage/lmgr/proc.c | 15 +
src/backend/tcop/postgres.c | 3 +
.../utils/activity/wait_event_names.txt | 1 +
src/backend/utils/adt/mcxtfuncs.c | 415 ++++++++++++-
src/backend/utils/init/globals.c | 1 +
src/backend/utils/mmgr/mcxt.c | 568 +++++++++++++++++-
src/include/catalog/pg_proc.dat | 10 +
src/include/miscadmin.h | 1 +
src/include/storage/procsignal.h | 1 +
src/include/utils/memutils.h | 70 +++
src/test/regress/expected/sysviews.out | 14 +
src/test/regress/sql/sysviews.sql | 14 +
src/tools/pgindent/typedefs.list | 4 +
22 files changed, 1167 insertions(+), 41 deletions(-)
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 2ab5661602..5cb850de2e 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -28570,6 +28570,50 @@ acl | {postgres=arwdDxtm/postgres,foo=r/postgres}
</para></entry>
</row>
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>pg_get_process_memory_contexts</primary>
+ </indexterm>
+ <function>pg_get_process_memory_contexts</function> ( <parameter>pid</parameter> <type>integer</type>, <parameter>get_summary</parameter> <type>boolean</type>, <parameter>num_of_tries</parameter> <type>integer</type> )
+ <returnvalue>setof record</returnvalue>
+ </para>
+ <para>
+ This function handles requests to display the memory contexts of a
+ PostgreSQL process with the specified process ID (PID). It takes three
+ arguments: <parameter>PID</parameter>, <parameter>get_summary</parameter>
+ and <parameter>num_of_tries</parameter>. The function can send requests
+ to both backend and auxiliary processes.
+
+ After receiving memory context statistics from the target process, it
+ returns the results as one row per context. If all the contexts don't
+ fit within the pre-determined size limit, the remaining context statistics
+ are aggregated and a cumulative total is displayed. The num_agg_contexts
+ column indicates the number of contexts aggregated in the displayed
+ statistics. The num_agg_contexts value is typically 1, meaning that each
+ context's statistics are displayed separately.
+
+ When <parameter>get_summary</parameter> is set to true, statistics
+ for memory contexts at levels 1 and 2 are displayed, with level 1
+ representing the root node (i.e., TopMemoryContext).
+ Each level 2 context's statistics represent an aggregate of all its
+ child contexts' statistics, with num_agg_contexts indicating the number
+ of these aggregated child contexts.
+
+ When <parameter>get_summary</parameter> is set to false, the
+ num_agg_contexts value is 1, indicating that individual statistics are
+ being displayed.
+
+ <parameter>num_of_tries</parameter> indicates the number of times
+ the client will wait for the latest statistics. The wait per try is 1
+ second. This parameter can be increased if the user anticipates a delay
+ in the response from the reporting process. Conversely, if users are
+ frequently and periodically querying the process for statistics, or if
+ there are concurrent requests for statistics of the same process,
+ lowering the parameter might help achieve a faster response.
+ </para></entry>
+ </row>
+
<row>
<entry role="func_table_entry"><para role="func_signature">
<indexterm>
@@ -28688,6 +28732,23 @@ LOG: Grand total: 1651920 bytes in 201 blocks; 622360 free (88 chunks); 1029560
because it may generate a large number of log messages.
</para>
+ <para>
+ <function>pg_get_process_memory_contexts</function> can be used
+ to request the memory contexts statistics of any postgres process. For example:
+<programlisting>
+postgres=# SELECT * FROM pg_get_process_memory_contexts(
+ (SELECT pid FROM pg_stat_activity
+ WHERE backend_type = 'checkpointer')
+ , false, 5) LIMIT 1;
+ name | ident | type | path | total_bytes | total_nblocks | free_bytes | free_chunks | used_bytes | num_
+agg_contexts | stats_timestamp
+------------------+-------+----------+------+-------------+---------------+------------+-------------+------------+-----
+-------------+----------------------------------
+ TopMemoryContext | | AllocSet | {1} | 102664 | 4 | 3008 | 2 | 99656 |
+ 1 | 2025-03-04 10:01:57.590543+05:30
+</programlisting>
+ </para>
+
</sect2>
<sect2 id="functions-admin-backup">
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 2513a8ef8a..16756152b7 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -781,6 +781,10 @@ ProcessAutoVacLauncherInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
/* Process sinval catchup interrupts that happened while sleeping */
ProcessCatchupInterrupt();
}
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index fda91ffd1c..d3cb3f1891 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -663,6 +663,10 @@ ProcessCheckpointerInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/postmaster/interrupt.c b/src/backend/postmaster/interrupt.c
index 0ae9bf906e..f24f574e74 100644
--- a/src/backend/postmaster/interrupt.c
+++ b/src/backend/postmaster/interrupt.c
@@ -48,6 +48,10 @@ ProcessMainLoopInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 7e622ae4bd..cb7408acf4 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -867,6 +867,10 @@ ProcessPgArchInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
if (ConfigReloadPending)
{
char *archiveLib = pstrdup(XLogArchiveLibrary);
diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c
index 27e86cf393..7149a67fcb 100644
--- a/src/backend/postmaster/startup.c
+++ b/src/backend/postmaster/startup.c
@@ -192,6 +192,10 @@ ProcessStartupProcInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
diff --git a/src/backend/postmaster/walsummarizer.c b/src/backend/postmaster/walsummarizer.c
index 0fec4f1f87..c7a76711cc 100644
--- a/src/backend/postmaster/walsummarizer.c
+++ b/src/backend/postmaster/walsummarizer.c
@@ -879,6 +879,10 @@ ProcessWalSummarizerInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 2fa045e6b0..362a6dc952 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -51,6 +51,7 @@
#include "storage/sinvaladt.h"
#include "utils/guc.h"
#include "utils/injection_point.h"
+#include "utils/memutils.h"
/* GUCs */
int shared_memory_type = DEFAULT_SHARED_MEMORY_TYPE;
@@ -343,6 +344,8 @@ CreateOrAttachShmemStructs(void)
WaitEventCustomShmemInit();
InjectionPointShmemInit();
AioShmemInit();
+ MemCtxShmemInit();
+ MemCtxBackendShmemInit();
}
/*
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index 7d20196550..b59034fdc3 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -690,6 +690,9 @@ procsignal_sigusr1_handler(SIGNAL_ARGS)
if (CheckProcSignal(PROCSIG_LOG_MEMORY_CONTEXT))
HandleLogMemoryContextInterrupt();
+ if (CheckProcSignal(PROCSIG_GET_MEMORY_CONTEXT))
+ HandleGetMemoryContextInterrupt();
+
if (CheckProcSignal(PROCSIG_PARALLEL_APPLY_MESSAGE))
HandleParallelApplyMessageInterrupt();
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index e4ca861a8e..c50dcbc491 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -50,6 +50,7 @@
#include "storage/procsignal.h"
#include "storage/spin.h"
#include "storage/standby.h"
+#include "utils/memutils.h"
#include "utils/timeout.h"
#include "utils/timestamp.h"
@@ -497,6 +498,13 @@ InitProcess(void)
*/
PGSemaphoreReset(MyProc->sem);
+ /*
+ * The before shmem exit callback frees the dsa memory occupied by the
+ * latest memory context statistics that could be published by this
+ * backend if requested.
+ */
+ before_shmem_exit(AtProcExit_memstats_dsa_free, 0);
+
/*
* Arrange to clean up at backend exit.
*/
@@ -671,6 +679,13 @@ InitAuxiliaryProcess(void)
*/
PGSemaphoreReset(MyProc->sem);
+ /*
+ * The before shmem exit callback frees the dsa memory occupied by the
+ * latest memory context statistics that could be published by this
+ * process if requested.
+ */
+ before_shmem_exit(AtProcExit_memstats_dsa_free, 0);
+
/*
* Arrange to clean up at process exit.
*/
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 0554a4ae3c..b9ff50a929 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3507,6 +3507,9 @@ ProcessInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
if (ParallelApplyMessagePending)
ProcessParallelApplyMessages();
}
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 9fa12a555e..e014e895bf 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -160,6 +160,7 @@ WAL_RECEIVER_EXIT "Waiting for the WAL receiver to exit."
WAL_RECEIVER_WAIT_START "Waiting for startup process to send initial data for streaming replication."
WAL_SUMMARY_READY "Waiting for a new WAL summary to be generated."
XACT_GROUP_UPDATE "Waiting for the group leader to update transaction status at transaction end."
+MEM_CTX_PUBLISH "Waiting for backend to publish memory information."
ABI_compatibility:
diff --git a/src/backend/utils/adt/mcxtfuncs.c b/src/backend/utils/adt/mcxtfuncs.c
index 396c2f223b..462c4e48cf 100644
--- a/src/backend/utils/adt/mcxtfuncs.c
+++ b/src/backend/utils/adt/mcxtfuncs.c
@@ -17,28 +17,26 @@
#include "funcapi.h"
#include "mb/pg_wchar.h"
+#include "miscadmin.h"
+#include "access/twophase.h"
+#include "catalog/pg_authid_d.h"
+#include "nodes/pg_list.h"
#include "storage/proc.h"
#include "storage/procarray.h"
+#include "utils/acl.h"
#include "utils/array.h"
#include "utils/builtins.h"
#include "utils/hsearch.h"
+#include "utils/memutils.h"
+#include "utils/wait_event_types.h"
/* ----------
* The max bytes for showing identifiers of MemoryContext.
* ----------
*/
#define MEMORY_CONTEXT_IDENT_DISPLAY_SIZE 1024
-
-/*
- * MemoryContextId
- * Used for storage of transient identifiers for
- * pg_get_backend_memory_contexts.
- */
-typedef struct MemoryContextId
-{
- MemoryContext context;
- int context_id;
-} MemoryContextId;
+struct MemoryContextBackendState *memCtxState = NULL;
+struct MemoryContextState *memCtxArea = NULL;
/*
* int_list_to_array
@@ -143,24 +141,7 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
else
nulls[1] = true;
- switch (context->type)
- {
- case T_AllocSetContext:
- type = "AllocSet";
- break;
- case T_GenerationContext:
- type = "Generation";
- break;
- case T_SlabContext:
- type = "Slab";
- break;
- case T_BumpContext:
- type = "Bump";
- break;
- default:
- type = "???";
- break;
- }
+ type = ContextTypeToString(context->type);
values[2] = CStringGetTextDatum(type);
values[3] = Int32GetDatum(list_length(path)); /* level */
@@ -175,6 +156,32 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
list_free(path);
}
+const char *
+ContextTypeToString(NodeTag type)
+{
+ const char *context_type;
+
+ switch (type)
+ {
+ case T_AllocSetContext:
+ context_type = "AllocSet";
+ break;
+ case T_GenerationContext:
+ context_type = "Generation";
+ break;
+ case T_SlabContext:
+ context_type = "Slab";
+ break;
+ case T_BumpContext:
+ context_type = "Bump";
+ break;
+ default:
+ context_type = "???";
+ break;
+ }
+ return (context_type);
+}
+
/*
* pg_get_backend_memory_contexts
* SQL SRF showing backend memory context.
@@ -305,3 +312,353 @@ pg_log_backend_memory_contexts(PG_FUNCTION_ARGS)
PG_RETURN_BOOL(true);
}
+
+/*
+ * pg_get_process_memory_contexts
+ * Signal a backend or an auxiliary process to send its memory contexts,
+ * wait for the results and display them.
+ *
+ * By default, only superusers or users with PG_READ_ALL_STATS are allowed to
+ * signal a process to return the memory contexts. This is because allowing
+ * any users to issue this request at an unbounded rate would cause lots of
+ * requests to be sent, which can lead to denial of service. Additional roles
+ * can be permitted with GRANT.
+ *
+ * On receipt of this signal, a backend or an auxiliary process sets the flag
+ * in the signal handler, which causes the next CHECK_FOR_INTERRUPTS()
+ * or process-specific interrupt handler to copy the memory context details
+ * to a dynamic shared memory space.
+ *
+ * We have defined a limit on dsa memory that could be allocated per process -
+ * if the process has more memory contexts than that can fit in the allocated
+ * size, the excess contexts are summarized and represented as cumulative total
+ * at the end of the buffer.
+ *
+ * After sending the signal, wait on a condition variable. The publishing
+ * backend, after copying the data to shared memory, sends signal on that
+ * condition variable. There is one condition variable per publishing
+ * backend.
+ * Once condition variable is signalled, check if the latest memory context
+ * information is available and display.
+ *
+ * If the publishing backend does not respond before the condition variable
+ * times out, which is set to MEMSTATS_WAIT_TIMEOUT, retry for max_tries
+ * number of times, which is defined by user, before giving up and
+ * returning previously published statistics, if any. If previous statistics
+ * do not exist, return NULL.
+ */
+Datum
+pg_get_process_memory_contexts(PG_FUNCTION_ARGS)
+{
+ int pid = PG_GETARG_INT32(0);
+ bool get_summary = PG_GETARG_BOOL(1);
+ PGPROC *proc;
+ ProcNumber procNumber = INVALID_PROC_NUMBER;
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ dsa_area *area;
+ MemoryContextEntry *memctx_info;
+ int num_retries = 0;
+ TimestampTz curr_timestamp;
+ int max_tries = PG_GETARG_INT32(2);
+
+ /*
+ * Only superusers or users with pg_read_all_stats privileges can view the
+ * memory context statistics of another process
+ */
+ if (!has_privs_of_role(GetUserId(), ROLE_PG_READ_ALL_STATS))
+ ereport(ERROR,
+ errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("memory context statistics privilege error"));
+
+ InitMaterializedSRF(fcinfo, 0);
+
+ /*
+ * See if the process with given pid is a backend or an auxiliary process.
+ */
+ proc = BackendPidGetProc(pid);
+ if (proc == NULL)
+ proc = AuxiliaryPidGetProc(pid);
+
+ /*
+ * BackendPidGetProc() and AuxiliaryPidGetProc() return NULL if the pid
+ * isn't valid; but by the time we reach kill(), a process for which we
+ * get a valid proc here might have terminated on its own. There's no way
+ * to acquire a lock on an arbitrary process to prevent that. But since
+ * this mechanism is usually used to debug a backend or an auxiliary
+ * process running and consuming lots of memory, that it might end on its
+ * own first and its memory contexts are not logged is not a problem.
+ */
+ if (proc == NULL)
+ {
+ /*
+ * This is just a warning so a loop-through-resultset will not abort
+ * if one backend terminated on its own during the run.
+ */
+ ereport(WARNING,
+ (errmsg("PID %d is not a PostgreSQL server process",
+ pid)));
+ PG_RETURN_NULL();
+ }
+
+ procNumber = GetNumberFromPGProc(proc);
+
+ LWLockAcquire(&memCtxState[procNumber].lw_lock, LW_EXCLUSIVE);
+ memCtxState[procNumber].get_summary = get_summary;
+ LWLockRelease(&memCtxState[procNumber].lw_lock);
+
+ curr_timestamp = GetCurrentTimestamp();
+
+ /*
+ * Send a signal to a PostgreSQL process, informing it that we want it to
+ * produce information about memory contexts.
+ */
+ if (SendProcSignal(pid, PROCSIG_GET_MEMORY_CONTEXT, procNumber) < 0)
+ {
+ ereport(WARNING,
+ errmsg("could not send signal to process %d: %m", pid));
+ PG_RETURN_NULL();
+ }
+
+ /*
+ * A valid DSA pointer isn't proof that statistics are available; it can
+ * be valid due to previously published stats. Check whether the stats
+ * are updated by comparing timestamps: if the stats are newer than the
+ * timestamp we recorded before sending the procsignal, they must by
+ * definition be updated. Wait for up to max_tries *
+ * MEMSTATS_WAIT_TIMEOUT, after which display old statistics if
+ * available or return NULL.
+ */
+ while (1)
+ {
+ long msecs;
+
+ /*
+ * We expect to come out of sleep when the requested process has
+ * finished publishing the statistics, verified using the valid dsa
+ * pointer.
+ *
+ * Make sure that the information belongs to the pid we requested
+ * information for; otherwise loop back and wait for the server
+ * process to finish publishing statistics.
+ */
+ LWLockAcquire(&memCtxState[procNumber].lw_lock, LW_EXCLUSIVE);
+ msecs =
+ TimestampDifferenceMilliseconds(curr_timestamp,
+ memCtxState[procNumber].stats_timestamp);
+
+ /*
+ * A note in procnumber.h says that a procNumber can be re-used for
+ * a different backend immediately after a backend exits. In case an
+ * old process' data was there and not updated by the current process
+ * in the slot identified by the procNumber, the pid of the requested
+ * process and the proc_id might not match.
+ */
+ if (memCtxState[procNumber].proc_id == pid)
+ {
+ /*
+ * Break if the latest stats have been read, indicated by
+ * statistics timestamp being newer than the current request
+ * timestamp.
+ */
+ if (DsaPointerIsValid(memCtxState[procNumber].memstats_dsa_pointer)
+ && msecs > 0)
+ break;
+
+ }
+ LWLockRelease(&memCtxState[procNumber].lw_lock);
+
+#define MEMSTATS_WAIT_TIMEOUT 1000
+
+ /*
+ * Recheck the state of the backend before sleeping on the condition
+ * variable.
+ */
+ proc = BackendPidGetProc(pid);
+ if (proc == NULL)
+ proc = AuxiliaryPidGetProc(pid);
+ if (proc == NULL)
+ {
+ ereport(WARNING,
+ errmsg("PID %d is not a PostgreSQL server process",
+ pid));
+ PG_RETURN_NULL();
+ }
+ if (ConditionVariableTimedSleep(&memCtxState[procNumber].memctx_cv,
+ MEMSTATS_WAIT_TIMEOUT,
+ WAIT_EVENT_MEM_CTX_PUBLISH))
+ {
+ /*
+ * Retry up to the user-defined max_tries; once the tries are
+ * exhausted, display previously published statistics, if any.
+ */
+ if (num_retries > max_tries)
+ {
+ LWLockAcquire(&memCtxState[procNumber].lw_lock, LW_EXCLUSIVE);
+ /* Displaying previously published statistics */
+ if (DsaPointerIsValid(memCtxState[procNumber].memstats_dsa_pointer))
+ break;
+ else
+ {
+ LWLockRelease(&memCtxState[procNumber].lw_lock);
+ PG_RETURN_NULL();
+ }
+ }
+ ereport(LOG,
+ errmsg("wait for process %d to publish statistics timed out, trying again",
+ pid));
+ num_retries = num_retries + 1;
+ }
+
+ }
+ /* We should land here only with a valid DSA handle */
+ Assert(memCtxArea->memstats_dsa_handle != DSA_HANDLE_INVALID);
+ area = dsa_attach(memCtxArea->memstats_dsa_handle);
+
+ /*
+ * Backend has finished publishing the stats, read them
+ *
+ * Read statistics of top level 1 and 2 contexts, if get_summary is true.
+ */
+ memctx_info = (MemoryContextEntry *) dsa_get_address(area,
+ memCtxState[procNumber].memstats_dsa_pointer);
+
+#define PG_GET_PROCESS_MEMORY_CONTEXTS_COLS 12
+ for (int i = 0; i < memCtxState[procNumber].total_stats; i++)
+ {
+ ArrayType *path_array;
+ int path_length;
+ Datum values[PG_GET_PROCESS_MEMORY_CONTEXTS_COLS];
+ bool nulls[PG_GET_PROCESS_MEMORY_CONTEXTS_COLS];
+ char *name;
+ char *ident;
+ Datum *path_datum_array;
+
+ memset(values, 0, sizeof(values));
+ memset(nulls, 0, sizeof(nulls));
+
+ if (DsaPointerIsValid(memctx_info[i].name))
+ {
+ name = (char *) dsa_get_address(area, memctx_info[i].name);
+ values[0] = CStringGetTextDatum(name);
+ }
+ else
+ nulls[0] = true;
+ if (DsaPointerIsValid(memctx_info[i].ident))
+ {
+ ident = (char *) dsa_get_address(area, memctx_info[i].ident);
+ values[1] = CStringGetTextDatum(ident);
+ }
+ else
+ nulls[1] = true;
+
+ if (memctx_info[i].type != NULL)
+ values[2] = CStringGetTextDatum(memctx_info[i].type);
+ else
+ nulls[2] = true;
+
+ path_length = memctx_info[i].path_length;
+
+ if (DsaPointerIsValid(memctx_info[i].path))
+ {
+ path_datum_array = (Datum *) dsa_get_address(area, memctx_info[i].path);
+ path_array = construct_array_builtin(path_datum_array,
+ path_length, INT4OID);
+
+ values[3] = PointerGetDatum(path_array);
+ }
+ else
+ nulls[3] = true;
+ values[4] = Int32GetDatum(path_length); /* level */
+ values[5] = Int64GetDatum(memctx_info[i].totalspace);
+ values[6] = Int64GetDatum(memctx_info[i].nblocks);
+ values[7] = Int64GetDatum(memctx_info[i].freespace);
+ values[8] = Int64GetDatum(memctx_info[i].freechunks);
+ values[9] = Int64GetDatum(memctx_info[i].totalspace -
+ memctx_info[i].freespace);
+ values[10] = Int32GetDatum(memctx_info[i].num_agg_stats);
+ values[11] = TimestampTzGetDatum(memCtxState[procNumber].stats_timestamp);
+
+ tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc,
+ values, nulls);
+ }
+ LWLockRelease(&memCtxState[procNumber].lw_lock);
+
+ ConditionVariableCancelSleep();
+ dsa_detach(area);
+
+ PG_RETURN_NULL();
+}
+
+/*
+ * Shared memory sizing for reporting memory context information.
+ */
+static Size
+MemCtxShmemSize(void)
+{
+ Size TotalProcs =
+ add_size(MaxBackends, add_size(NUM_AUXILIARY_PROCS, max_prepared_xacts));
+
+ return mul_size(TotalProcs, sizeof(MemoryContextBackendState));
+}
+
+/*
+ * Init shared memory for reporting memory context information.
+ */
+void
+MemCtxBackendShmemInit(void)
+{
+ bool found;
+ Size TotalProcs =
+ add_size(MaxBackends, add_size(NUM_AUXILIARY_PROCS, max_prepared_xacts));
+
+ memCtxState = (MemoryContextBackendState *) ShmemInitStruct("MemoryContextBackendState",
+ MemCtxShmemSize(),
+ &found);
+ if (!IsUnderPostmaster)
+ {
+ Assert(!found);
+
+ for (int i = 0; i < TotalProcs; i++)
+ {
+ ConditionVariableInit(&memCtxState[i].memctx_cv);
+
+ LWLockInitialize(&memCtxState[i].lw_lock,
+ LWLockNewTrancheId());
+ LWLockRegisterTranche(memCtxState[i].lw_lock.tranche,
+ "mem_context_backend_stats_reporting");
+
+ memCtxState[i].memstats_dsa_pointer = InvalidDsaPointer;
+ }
+ }
+ else
+ {
+ Assert(found);
+ }
+}
+
+/*
+ * Initialize shared memory for displaying memory
+ * context statistics
+ */
+void
+MemCtxShmemInit(void)
+{
+ bool found;
+
+ memCtxArea = (MemoryContextState *) ShmemInitStruct("MemoryContextState", sizeof(MemoryContextState),
+ &found);
+ if (!IsUnderPostmaster)
+ {
+ Assert(!found);
+
+ LWLockInitialize(&memCtxArea->lw_lock,
+ LWLockNewTrancheId());
+ LWLockRegisterTranche(memCtxArea->lw_lock.tranche,
+ "mem_context_stats_reporting");
+ memCtxArea->memstats_dsa_handle = DSA_HANDLE_INVALID;
+ }
+ else
+ {
+ Assert(found);
+ }
+}
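
The wait loop in pg_get_process_memory_contexts() counts timeouts in num_retries and gives up only once `num_retries > max_tries` holds. The following is a minimal sketch of that counting alone, assuming every ConditionVariableTimedSleep() call times out. The function names are hypothetical and the code does not call any PostgreSQL API; it only replays the counter logic from the hunk above.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of the timeout branch in the wait loop. Each call
 * stands for one sleep that timed out. Returns true when the caller
 * would stop waiting, mirroring the "num_retries > max_tries" test.
 */
bool
timed_out_give_up(int *num_retries, int max_tries)
{
	if (*num_retries > max_tries)
		return true;

	*num_retries = *num_retries + 1;
	return false;
}

/* Count the timed-out sleeps that happen before the loop gives up. */
int
sleeps_before_giving_up(int max_tries)
{
	int			num_retries = 0;
	int			sleeps = 0;

	for (;;)
	{
		sleeps++;				/* one timed-out sleep */
		if (timed_out_give_up(&num_retries, max_tries))
			return sleeps;
	}
}
```

Under these assumptions the loop sleeps max_tries + 2 times before falling back to old statistics or NULL. With max_tries = 20, the value the regression test passes, that is 22 sleeps, or roughly 22 seconds with MEMSTATS_WAIT_TIMEOUT at 1000 ms.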
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index b844f9fdae..13938ccb0f 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -38,6 +38,7 @@ volatile sig_atomic_t TransactionTimeoutPending = false;
volatile sig_atomic_t IdleSessionTimeoutPending = false;
volatile sig_atomic_t ProcSignalBarrierPending = false;
volatile sig_atomic_t LogMemoryContextPending = false;
+volatile sig_atomic_t PublishMemoryContextPending = false;
volatile sig_atomic_t IdleStatsUpdateTimeoutPending = false;
volatile uint32 InterruptHoldoffCount = 0;
volatile uint32 QueryCancelHoldoffCount = 0;
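
The new PublishMemoryContextPending flag follows the same pattern as LogMemoryContextPending: the signal handler only sets flags, and the actual work happens later at a safe point. A minimal sketch of that split, assuming a simplified interrupt check; the lower-case function names and the publish_runs counter are hypothetical stand-ins, not code from the patch.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>

/* Stand-ins for the globals in globals.c; the names mirror the patch. */
volatile sig_atomic_t InterruptPending = false;
volatile sig_atomic_t PublishMemoryContextPending = false;

/* Hypothetical counter so the sketch can show when the work runs. */
int			publish_runs = 0;

/* Signal-handler side: only set flags, do no real work here. */
void
handle_get_memory_context_interrupt(void)
{
	InterruptPending = true;
	PublishMemoryContextPending = true;
}

/* Safe-point side: a simplified model of an interrupt check. */
void
process_interrupts_sketch(void)
{
	if (!InterruptPending)
		return;
	InterruptPending = false;

	if (PublishMemoryContextPending)
	{
		/* Clear the flag first, then do the deferred work once. */
		PublishMemoryContextPending = false;
		publish_runs++;
	}
}
```

Because the handler only sets a flag, two requests that arrive before the next safe point collapse into a single publishing run in this model.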
diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index 34cdcdf2fd..214330aa7a 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -19,10 +19,18 @@
*-------------------------------------------------------------------------
*/
#include "postgres.h"
+#include <math.h>
#include "mb/pg_wchar.h"
#include "miscadmin.h"
+#include "nodes/pg_list.h"
+#include "storage/fd.h"
+#include "storage/lwlock.h"
+#include "storage/dsm.h"
+#include "storage/ipc.h"
+#include "utils/dsa.h"
+#include "utils/hsearch.h"
#include "utils/memdebug.h"
#include "utils/memutils.h"
#include "utils/memutils_internal.h"
@@ -177,6 +185,16 @@ static void MemoryContextStatsInternal(MemoryContext context, int level,
static void MemoryContextStatsPrint(MemoryContext context, void *passthru,
const char *stats_string,
bool print_to_stderr);
+static void PublishMemoryContext(MemoryContextEntry *memctx_infos,
+ int curr_id, MemoryContext context,
+ List *path,
+ MemoryContextCounters stat,
+ int num_contexts, dsa_area *area);
+static void compute_contexts_count_and_ids(List *contexts, HTAB *context_id_lookup,
+ int *stats_count,
+ bool get_summary);
+static List *compute_context_path(MemoryContext c, HTAB *context_id_lookup);
+static void dsa_free_previous_stats(dsa_area *area, int total_stats, dsa_pointer prev_dsa_pointer);
/*
* You should not do memory allocations within a critical section, because
@@ -889,7 +907,8 @@ MemoryContextStatsDetail(MemoryContext context,
* One recursion level for MemoryContextStats
*
* Print stats for this context if possible, but in any case accumulate counts
- * into *totals (if not NULL).
+ * into *totals (if not NULL). The callers should make sure that print_location
+ * is set to PRINT_STATS_TO_STDERR, PRINT_STATS_TO_LOGS, or PRINT_STATS_NONE.
*/
static void
MemoryContextStatsInternal(MemoryContext context, int level,
@@ -899,38 +918,43 @@ MemoryContextStatsInternal(MemoryContext context, int level,
{
MemoryContext child;
int ichild;
- bool print_to_stderr = true;
check_stack_depth();
Assert(MemoryContextIsValid(context));
if (print_location == PRINT_STATS_TO_STDERR)
- print_to_stderr = true;
+ {
+ /* Examine the context itself */
+ context->methods->stats(context,
+ MemoryContextStatsPrint,
+ &level,
+ totals, true);
+ }
else if (print_location == PRINT_STATS_TO_LOGS)
- print_to_stderr = false;
-
- if (print_location != PRINT_STATS_NONE)
{
+
/* Examine the context itself */
context->methods->stats(context,
MemoryContextStatsPrint,
&level,
- totals, print_to_stderr);
+ totals, false);
}
/*
* Do not print the statistics if print_to_stderr is PRINT_STATS_NONE,
- * only compute totals.
+ * only compute totals. This is used when reporting memory context
+ * statistics via an SQL function. The last parameter is not relevant.
*/
else
{
+ Assert(print_location == PRINT_STATS_NONE);
/* Examine the context itself */
context->methods->stats(context,
NULL,
NULL,
- totals, print_to_stderr);
+ totals, false);
}
- /* Increment the context count */
+ /* Increment the context count for each recursive call */
*num_contexts = *num_contexts + 1;
/*
@@ -969,9 +993,14 @@ MemoryContextStatsInternal(MemoryContext context, int level,
ichild++;
child = MemoryContextTraverseNext(child, context);
}
+
+ /*
+ * Add the count of child contexts that are traversed
+ * non-recursively.
+ */
*num_contexts = *num_contexts + ichild;
- if (print_to_stderr)
+ if (print_location == PRINT_STATS_TO_STDERR)
{
for (int i = 0; i <= level; i++)
fprintf(stderr, " ");
@@ -984,7 +1013,7 @@ MemoryContextStatsInternal(MemoryContext context, int level,
local_totals.freechunks,
local_totals.totalspace - local_totals.freespace);
}
- else if (print_location != PRINT_STATS_NONE)
+ else if (print_location == PRINT_STATS_TO_LOGS)
ereport(LOG_SERVER_ONLY,
(errhidestmt(true),
errhidecontext(true),
@@ -1321,6 +1350,21 @@ HandleLogMemoryContextInterrupt(void)
/* latch will be set by procsignal_sigusr1_handler */
}
+/*
+ * HandleGetMemoryContextInterrupt
+ * Handle receipt of an interrupt requesting publishing of memory
+ * contexts.
+ *
+ * All the actual work is deferred to ProcessGetMemoryContextInterrupt()
+ */
+void
+HandleGetMemoryContextInterrupt(void)
+{
+ InterruptPending = true;
+ PublishMemoryContextPending = true;
+ /* latch will be set by procsignal_sigusr1_handler */
+}
+
/*
* ProcessLogMemoryContextInterrupt
* Perform logging of memory contexts of this backend process.
@@ -1358,6 +1402,506 @@ ProcessLogMemoryContextInterrupt(void)
MemoryContextStatsDetail(TopMemoryContext, 100, 100, false);
}
+/*
+ * ProcessGetMemoryContextInterrupt
+ * Generate information about memory contexts used by the process.
+ *
+ * Performs a breadth-first search on the memory context tree, so a parent's
+ * statistics are reported before its children's in the function output.
+ *
+ * Statistics for all the processes are shared via the same dynamic shared area.
+ *
+ * We calculate the maximum number of contexts whose statistics can be displayed
+ * using a predetermined limit on the memory available per process for this
+ * utility and an estimate of the size of the statistics for each context.
+ * The remaining contexts' statistics, if any, are captured as a cumulative
+ * total at the end of the individual contexts' statistics.
+ *
+ * If get_summary is true, we display the level 1 and level 2 contexts. For that
+ * we traverse the memory context tree recursively, depth first, to cover all
+ * the children of a parent context, so that we can display a cumulative
+ * total of memory consumption by a parent at level 2.
+ */
+void
+ProcessGetMemoryContextInterrupt(void)
+{
+ List *contexts;
+
+ HASHCTL ctl;
+ HTAB *context_id_lookup;
+ int context_id = 0;
+ MemoryContext stat_cxt;
+ MemoryContextEntry *meminfo;
+ bool get_summary = false;
+
+ dsa_area *area = NULL;
+ int max_stats;
+ int idx = MyProcNumber;
+ int stats_count = 0;
+ MemoryContextCounters stat;
+ int num_individual_stats = 0;
+
+ PublishMemoryContextPending = false;
+
+ /*
+ * Make a new context that will contain the hash table, to ease the
+ * cleanup.
+ */
+ stat_cxt = AllocSetContextCreate(CurrentMemoryContext,
+ "Memory context statistics",
+ ALLOCSET_DEFAULT_SIZES);
+
+ /*
+ * The hash table is used for constructing "path" column of the view,
+ * similar to its local backend counterpart.
+ */
+ ctl.keysize = sizeof(MemoryContext);
+ ctl.entrysize = sizeof(MemoryContextId);
+ ctl.hcxt = stat_cxt;
+
+ context_id_lookup = hash_create("pg_get_remote_backend_memory_contexts",
+ 256,
+ &ctl,
+ HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+
+ /* List of contexts to process in the next round - start at the top. */
+ contexts = list_make1(TopMemoryContext);
+
+ /* Compute the number of stats that can fit in the defined limit */
+ max_stats = (MAX_SEGMENTS_PER_BACKEND * DSA_DEFAULT_INIT_SEGMENT_SIZE)
+ / (sizeof(MemoryContextEntry) + (MEM_CONTEXT_MAX_LEVEL
+ * sizeof(Datum)) + (2 * MEMORY_CONTEXT_IDENT_SHMEM_SIZE));
+ LWLockAcquire(&memCtxState[idx].lw_lock, LW_EXCLUSIVE);
+ get_summary = memCtxState[idx].get_summary;
+ LWLockRelease(&memCtxState[idx].lw_lock);
+
+ /*
+ * Traverse the memory context tree to find total number of contexts. If
+ * summary is requested report the total number of contexts at level 1 and
+ * 2 from the top. Also, populate the hash table of context ids.
+ */
+ compute_contexts_count_and_ids(contexts, context_id_lookup, &stats_count,
+ get_summary);
+
+ /*
+ * Allocate memory in this process's DSA for storing statistics of the
+ * memory contexts up to max_stats. For contexts that don't fit within the
+ * limit, a cumulative total is written as the last record in the DSA
+ * segment.
+ */
+ stats_count = (stats_count > max_stats) ? max_stats : stats_count;
+
+ LWLockAcquire(&memCtxArea->lw_lock, LW_EXCLUSIVE);
+
+ /*
+ * Create a DSA and send the handle to the client process after storing
+ * the context statistics. If the number of contexts exceeds a predefined
+ * limit (8MB), a cumulative total is stored for such contexts.
+ */
+ if (memCtxArea->memstats_dsa_handle == DSA_HANDLE_INVALID)
+ {
+ MemoryContext oldcontext = CurrentMemoryContext;
+ dsa_handle handle;
+
+ MemoryContextSwitchTo(TopMemoryContext);
+
+ area = dsa_create(memCtxArea->lw_lock.tranche);
+
+ handle = dsa_get_handle(area);
+ MemoryContextSwitchTo(oldcontext);
+
+ dsa_pin_mapping(area);
+
+ /*
+ * Pin the DSA area to make sure that it remains attachable
+ * even if the current backend exits. This is done so that a waiting
+ * client gets the stats even after a process exits.
+ */
+ dsa_pin(area);
+
+ /* Set the handle in shared memory */
+ memCtxArea->memstats_dsa_handle = handle;
+ }
+
+ /*
+ * If DSA exists, created by another process publishing statistics, or by
+ * the previous execution of this function by this process, attach to it.
+ */
+ else if (area == NULL)
+ {
+ MemoryContext oldcontext = CurrentMemoryContext;
+
+ MemoryContextSwitchTo(TopMemoryContext);
+ area = dsa_attach(memCtxArea->memstats_dsa_handle);
+ MemoryContextSwitchTo(oldcontext);
+ dsa_pin_mapping(area);
+ }
+ LWLockRelease(&memCtxArea->lw_lock);
+
+ /*
+ * Hold the process lock to protect writes to process specific memory. Two
+ * processes publishing statistics do not block each other.
+ */
+ LWLockAcquire(&memCtxState[idx].lw_lock, LW_EXCLUSIVE);
+ memCtxState[idx].proc_id = MyProcPid;
+
+ if (DsaPointerIsValid(memCtxState[idx].memstats_dsa_pointer))
+ {
+ /*
+ * Free any previous allocations, free the name, ident and path
+ * pointers before freeing the pointer that contains them.
+ */
+ dsa_free_previous_stats(area, memCtxState[idx].total_stats,
+ memCtxState[idx].memstats_dsa_pointer);
+ dsa_free(area, memCtxState[idx].memstats_dsa_pointer);
+ memCtxState[idx].memstats_dsa_pointer = InvalidDsaPointer;
+ }
+ memCtxState[idx].memstats_dsa_pointer = dsa_allocate0(area,
+ stats_count * sizeof(MemoryContextEntry));
+
+ meminfo = (MemoryContextEntry *) dsa_get_address(area,
+ memCtxState[idx].memstats_dsa_pointer);
+
+ if (get_summary)
+ {
+ int ctx_id = 0;
+ List *path = NIL;
+
+ /* Copy TopMemoryContext statistics to DSA */
+ memset(&stat, 0, sizeof(stat));
+ (*TopMemoryContext->methods->stats) (TopMemoryContext, NULL, NULL,
+ &stat, true);
+ path = lcons_int(1, path);
+ PublishMemoryContext(meminfo, ctx_id, TopMemoryContext, path, stat,
+ 1, area);
+ ctx_id = ctx_id + 1;
+
+ /*
+ * Copy statistics for each of TopMemoryContext's children. This
+ * includes statistics of all of their children up to level 100.
+ */
+ for (MemoryContext c = TopMemoryContext->firstchild; c != NULL;
+ c = c->nextchild)
+ {
+ MemoryContextCounters grand_totals;
+ int num_contexts = 0;
+
+ path = NIL;
+ memset(&grand_totals, 0, sizeof(grand_totals));
+
+ MemoryContextStatsInternal(c, 0, 100, 100, &grand_totals,
+ PRINT_STATS_NONE, &num_contexts);
+
+ path = compute_context_path(c, context_id_lookup);
+
+ PublishMemoryContext(meminfo, ctx_id, c, path,
+ grand_totals, num_contexts, area);
+ ctx_id = ctx_id + 1;
+ }
+ memCtxState[idx].total_stats = ctx_id;
+ goto cleanup;
+ }
+ foreach_ptr(MemoryContextData, cur, contexts)
+ {
+ List *path = NIL;
+ char *name;
+
+ /*
+ * Figure out the transient context_id of this context and each of its
+ * ancestors, to compute a path for this context.
+ */
+ path = compute_context_path(cur, context_id_lookup);
+
+ if (context_id <= (max_stats - 2))
+ {
+ /* Examine the context stats */
+ memset(&stat, 0, sizeof(stat));
+ (*cur->methods->stats) (cur, NULL, NULL, &stat, true);
+ /* Copy statistics to DSA memory */
+ PublishMemoryContext(meminfo, context_id, cur, path, stat, 1, area);
+ }
+ else
+ {
+ /* Examine the context stats */
+ memset(&stat, 0, sizeof(stat));
+ (*cur->methods->stats) (cur, NULL, NULL, &stat, true);
+
+ meminfo[max_stats - 1].totalspace += stat.totalspace;
+ meminfo[max_stats - 1].nblocks += stat.nblocks;
+ meminfo[max_stats - 1].freespace += stat.freespace;
+ meminfo[max_stats - 1].freechunks += stat.freechunks;
+ }
+
+ /*
+ * DSA max limit per process is reached, write aggregate of the
+ * remaining statistics.
+ */
+ if (context_id == (max_stats - 2) && context_id < (stats_count - 1))
+ {
+ num_individual_stats = context_id + 1;
+ meminfo[max_stats - 1].name = dsa_allocate0(area, 17);
+ name = dsa_get_address(area, meminfo[max_stats - 1].name);
+ strncpy(name, "Remaining Totals", 16);
+ meminfo[max_stats - 1].ident = InvalidDsaPointer;
+ meminfo[max_stats - 1].path = InvalidDsaPointer;
+ meminfo[max_stats - 1].type = NULL;
+ }
+ context_id++;
+ }
+ /* No aggregated contexts, individual statistics reported */
+ if (context_id < (max_stats - 2))
+ {
+ memCtxState[idx].total_stats = context_id;
+ }
+ /* Report number of aggregated memory contexts */
+ else
+ {
+ meminfo[max_stats - 1].num_agg_stats = context_id -
+ num_individual_stats;
+
+ /*
+ * Total stats equals num_individual_stats + 1 record for cumulative
+ * statistics.
+ */
+ memCtxState[idx].total_stats = num_individual_stats + 1;
+ }
+cleanup:
+
+ /*
+ * Signal all the waiting client backends after copying all the statistics
+ */
+ memCtxState[idx].stats_timestamp = GetCurrentTimestamp();
+ LWLockRelease(&memCtxState[idx].lw_lock);
+ ConditionVariableBroadcast(&memCtxState[idx].memctx_cv);
+ /* Delete the hash table memory context */
+ MemoryContextDelete(stat_cxt);
+
+ dsa_detach(area);
+}
+
+/*
+ * Append the transient context_id of this context and each of
+ * its ancestors to a list, in order to compute a path.
+ */
+static List *
+compute_context_path(MemoryContext c, HTAB *context_id_lookup)
+{
+ bool found;
+ List *path = NIL;
+
+ for (MemoryContext cur_context = c; cur_context != NULL; cur_context = cur_context->parent)
+ {
+ MemoryContextId *cur_entry;
+
+ cur_entry = hash_search(context_id_lookup, &cur_context, HASH_FIND, &found);
+
+ if (!found)
+ {
+ ereport(LOG,
+ errmsg("hash table corrupted, can't construct path value"));
+ break;
+ }
+ path = lcons_int(cur_entry->context_id, path);
+ }
+ return path;
+}
+
+/*
+ * Compute the number of contexts currently allocated by the backend and
+ * assign a context id to each of them.
+ */
+static void
+compute_contexts_count_and_ids(List *contexts, HTAB *context_id_lookup,
+ int *stats_count, bool get_summary)
+{
+ foreach_ptr(MemoryContextData, cur, contexts)
+ {
+ MemoryContextId *entry;
+ bool found;
+
+ entry = (MemoryContextId *) hash_search(context_id_lookup, &cur,
+ HASH_ENTER, &found);
+ Assert(!found);
+
+ /* context id starts with 1 */
+ entry->context_id = (++(*stats_count));
+
+ /* Append the children of the current context to the main list. */
+ for (MemoryContext c = cur->firstchild; c != NULL; c = c->nextchild)
+ {
+ if (get_summary)
+ {
+ entry = (MemoryContextId *) hash_search(context_id_lookup, &c,
+ HASH_ENTER, &found);
+ Assert(!found);
+
+ entry->context_id = (++(*stats_count));
+ }
+
+ contexts = lappend(contexts, c);
+ }
+
+ /*
+ * In summary mode, only the contexts at the first two levels (from
+ * the top) are displayed.
+ */
+ if (get_summary)
+ break;
+ }
+
+}
+
+/* Copy the memory context statistics of a single context to DSA memory */
+static void
+PublishMemoryContext(MemoryContextEntry *memctx_info, int curr_id,
+ MemoryContext context, List *path,
+ MemoryContextCounters stat, int num_contexts,
+ dsa_area *area)
+{
+ char clipped_ident[MEMORY_CONTEXT_IDENT_SHMEM_SIZE];
+ char *name;
+ char *ident;
+ Datum *path_array;
+
+ if (context->name != NULL)
+ {
+ Assert(strlen(context->name) < MEMORY_CONTEXT_IDENT_SHMEM_SIZE);
+ memctx_info[curr_id].name = dsa_allocate0(area, strlen(context->name) + 1);
+ name = (char *) dsa_get_address(area, memctx_info[curr_id].name);
+ strlcpy(name, context->name, strlen(context->name) + 1);
+ }
+ else
+ memctx_info[curr_id].name = InvalidDsaPointer;
+
+ /* Trim and copy the identifier if it is not set to NULL */
+ if (context->ident != NULL)
+ {
+ int idlen = strlen(context->ident);
+
+ /*
+ * Some identifiers such as SQL query string can be very long,
+ * truncate oversize identifiers.
+ */
+ if (idlen >= MEMORY_CONTEXT_IDENT_SHMEM_SIZE)
+ idlen = pg_mbcliplen(context->ident, idlen,
+ MEMORY_CONTEXT_IDENT_SHMEM_SIZE - 1);
+
+ memcpy(clipped_ident, context->ident, idlen);
+ clipped_ident[idlen] = '\0';
+
+ /*
+ * To be consistent with logging output, we label dynahash contexts
+ * with just the hash table name as with MemoryContextStatsPrint().
+ */
+ if (!strncmp(context->name, "dynahash", 8))
+ {
+ dsa_free(area, memctx_info[curr_id].name);
+ memctx_info[curr_id].name = dsa_allocate0(area,
+ strlen(clipped_ident) + 1);
+ name = (char *) dsa_get_address(area,
+ memctx_info[curr_id].name);
+ strlcpy(name,
+ clipped_ident, idlen + 1);
+ memctx_info[curr_id].ident = InvalidDsaPointer;
+ }
+ else
+ {
+
+ memctx_info[curr_id].ident = dsa_allocate0(area,
+ strlen(clipped_ident) + 1);
+ ident = (char *) dsa_get_address(area,
+ memctx_info[curr_id].ident);
+ strlcpy(ident,
+ clipped_ident, idlen + 1);
+ }
+ }
+ else
+ memctx_info[curr_id].ident = InvalidDsaPointer;
+ /* Allocate dsa memory for storing path information */
+ if (path == NIL)
+ memctx_info[curr_id].path = InvalidDsaPointer;
+ else
+ {
+ memctx_info[curr_id].path_length = list_length(path);
+ memctx_info[curr_id].path = dsa_allocate0(area,
+ memctx_info[curr_id].path_length
+ * sizeof(Datum));
+ path_array = (Datum *) dsa_get_address(area, memctx_info[curr_id].path);
+ foreach_int(i, path)
+ path_array[foreach_current_index(i)] = Int32GetDatum(i);
+ }
+ memctx_info[curr_id].type = ContextTypeToString(context->type);
+ memctx_info[curr_id].totalspace = stat.totalspace;
+ memctx_info[curr_id].nblocks = stat.nblocks;
+ memctx_info[curr_id].freespace = stat.freespace;
+ memctx_info[curr_id].freechunks = stat.freechunks;
+ memctx_info[curr_id].num_agg_stats = num_contexts;
+}
+
+static void
+dsa_free_previous_stats(dsa_area *area, int total_stats,
+ dsa_pointer prev_dsa_pointer)
+{
+ MemoryContextEntry *meminfo;
+
+ meminfo = (MemoryContextEntry *) dsa_get_address(area, prev_dsa_pointer);
+ for (int i = 0; i < total_stats; i++)
+ {
+ if (DsaPointerIsValid(meminfo[i].name))
+ dsa_free(area, meminfo[i].name);
+
+ if (DsaPointerIsValid(meminfo[i].ident))
+ dsa_free(area, meminfo[i].ident);
+
+ if (DsaPointerIsValid(meminfo[i].path))
+ dsa_free(area, meminfo[i].path);
+ }
+}
+
+/*
+ * Free the memory context statistics stored by this process
+ * in dsa area.
+ */
+void
+AtProcExit_memstats_dsa_free(int code, Datum arg)
+{
+ int idx = MyProcNumber;
+ dsm_segment *dsm_seg = NULL;
+ dsa_area *area = NULL;
+
+ if (memCtxArea->memstats_dsa_handle != DSA_HANDLE_INVALID)
+ dsm_seg = dsm_find_mapping(memCtxArea->memstats_dsa_handle);
+ else
+ return;
+
+ LWLockAcquire(&memCtxState[idx].lw_lock, LW_EXCLUSIVE);
+
+ if (!DsaPointerIsValid(memCtxState[idx].memstats_dsa_pointer))
+ {
+ LWLockRelease(&memCtxState[idx].lw_lock);
+ return;
+ }
+
+ /* If the dsm mapping could not be found, attach to the area */
+ if (dsm_seg == NULL)
+ area = dsa_attach(memCtxArea->memstats_dsa_handle);
+ else
+ return;
+
+ /*
+ * Free the memory context statistics, free the name, ident and path
+ * pointers before freeing the pointer that contains these pointers and
+ * integer statistics.
+ */
+ dsa_free_previous_stats(area, memCtxState[idx].total_stats,
+ memCtxState[idx].memstats_dsa_pointer);
+ dsa_free(area, memCtxState[idx].memstats_dsa_pointer);
+ memCtxState[idx].memstats_dsa_pointer = InvalidDsaPointer;
+
+ dsa_detach(area);
+ LWLockRelease(&memCtxState[idx].lw_lock);
+}
+
void *
palloc(Size size)
{
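
The "path" column is built from transient ids that are handed out during a breadth-first walk of the context tree (compute_contexts_count_and_ids) and then read back by following parent links (compute_context_path). A minimal sketch of that idea, assuming a small fixed-size tree; the Ctx struct, the array-based queue, and all function names are hypothetical simplifications, and the id is stored in the node instead of in a hash table.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES 16

/* Hypothetical stand-in for MemoryContextData; only the links matter. */
typedef struct Ctx
{
	struct Ctx *parent;
	struct Ctx *firstchild;
	struct Ctx *nextchild;
	int			id;				/* transient id, 0 = not assigned yet */
} Ctx;

/* Link child under parent at the head of the sibling list. */
void
ctx_add_child(Ctx *parent, Ctx *child)
{
	child->parent = parent;
	child->nextchild = parent->firstchild;
	parent->firstchild = child;
}

/* Breadth-first walk: ids start at 1, parents get ids before children. */
int
assign_ids_bfs(Ctx *top)
{
	Ctx		   *queue[MAX_NODES];	/* assumes at most MAX_NODES contexts */
	int			head = 0;
	int			tail = 0;
	int			next_id = 0;

	queue[tail++] = top;
	while (head < tail)
	{
		Ctx		   *cur = queue[head++];

		cur->id = ++next_id;
		for (Ctx *c = cur->firstchild; c != NULL; c = c->nextchild)
			queue[tail++] = c;
	}
	return next_id;
}

/* Fill path[] with ids from the top context down to c; return its length. */
int
compute_path(const Ctx *c, int *path)
{
	int			len = 0;
	int			i;

	for (const Ctx *cur = c; cur != NULL; cur = cur->parent)
		len++;

	i = len;
	for (const Ctx *cur = c; cur != NULL; cur = cur->parent)
		path[--i] = cur->id;

	return len;
}
```

In this sketch the path length doubles as the level of the context, which matches how the function output derives its "level" column from path_length.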
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 890822eaf7..5dd9d920e8 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8509,6 +8509,16 @@
prorettype => 'bool', proargtypes => 'int4',
prosrc => 'pg_log_backend_memory_contexts' },
+# publishing memory contexts of the specified postgres process
+{ oid => '2173', descr => 'publish memory contexts of the specified backend',
+ proname => 'pg_get_process_memory_contexts', provolatile => 'v',
+ prorows => '100', proretset => 't', proparallel => 'r',
+ prorettype => 'record', proargtypes => 'int4 bool int4',
+ proallargtypes => '{int4,bool,int4,text,text,text,_int4,int4,int8,int8,int8,int8,int8,int4,timestamptz}',
+ proargmodes => '{i,i,i,o,o,o,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{pid, summary, num_of_tries, name, ident, type, path, level, total_bytes, total_nblocks, free_bytes, free_chunks, used_bytes, num_agg_contexts, stats_timestamp}',
+ prosrc => 'pg_get_process_memory_contexts' },
+
# non-persistent series generator
{ oid => '1066', descr => 'non-persistent series generator',
proname => 'generate_series', prorows => '1000',
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 603d042435..d3c44df6e1 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -96,6 +96,7 @@ extern PGDLLIMPORT volatile sig_atomic_t IdleSessionTimeoutPending;
extern PGDLLIMPORT volatile sig_atomic_t ProcSignalBarrierPending;
extern PGDLLIMPORT volatile sig_atomic_t LogMemoryContextPending;
extern PGDLLIMPORT volatile sig_atomic_t IdleStatsUpdateTimeoutPending;
+extern PGDLLIMPORT volatile sig_atomic_t PublishMemoryContextPending;
extern PGDLLIMPORT volatile sig_atomic_t CheckClientConnectionPending;
extern PGDLLIMPORT volatile sig_atomic_t ClientConnectionLost;
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index 022fd8ed93..5d4b2fbfc9 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -35,6 +35,7 @@ typedef enum
PROCSIG_WALSND_INIT_STOPPING, /* ask walsenders to prepare for shutdown */
PROCSIG_BARRIER, /* global barrier interrupt */
PROCSIG_LOG_MEMORY_CONTEXT, /* ask backend to log the memory contexts */
+ PROCSIG_GET_MEMORY_CONTEXT, /* ask backend to send the memory contexts */
PROCSIG_PARALLEL_APPLY_MESSAGE, /* Message from parallel apply workers */
/* Recovery conflict reasons */
diff --git a/src/include/utils/memutils.h b/src/include/utils/memutils.h
index 8abc26abce..10fab7e580 100644
--- a/src/include/utils/memutils.h
+++ b/src/include/utils/memutils.h
@@ -18,6 +18,9 @@
#define MEMUTILS_H
#include "nodes/memnodes.h"
+#include "storage/condition_variable.h"
+#include "storage/lmgr.h"
+#include "utils/dsa.h"
/*
@@ -48,7 +51,10 @@
#define AllocHugeSizeIsValid(size) ((Size) (size) <= MaxAllocHugeSize)
+#define MEMORY_CONTEXT_IDENT_SHMEM_SIZE 128
+#define MEM_CONTEXT_MAX_LEVEL 64
+#define MAX_SEGMENTS_PER_BACKEND 8
/*
* Standard top-level memory contexts.
*
@@ -319,4 +325,68 @@ pg_memory_is_all_zeros(const void *ptr, size_t len)
return true;
}
+/* Dynamic shared memory state for statistics per context */
+typedef struct MemoryContextEntry
+{
+ dsa_pointer name;
+ dsa_pointer ident;
+ dsa_pointer path;
+ const char *type;
+ int path_length;
+ int64 totalspace;
+ int64 nblocks;
+ int64 freespace;
+ int64 freechunks;
+ int num_agg_stats;
+} MemoryContextEntry;
+
+/*
+ * Static shared memory state representing the DSA area
+ * created for memory context statistics reporting.
+ * A single DSA area is created and used by all the processes,
+ * each having its own DSA allocations for sharing memory
+ * statistics, tracked by per-backend static shared memory state.
+ */
+typedef struct MemoryContextState
+{
+ dsa_handle memstats_dsa_handle;
+ LWLock lw_lock;
+} MemoryContextState;
+
+/*
+ * Per-backend static shared memory state for memory
+ * context statistics reporting.
+ */
+typedef struct MemoryContextBackendState
+{
+ ConditionVariable memctx_cv;
+ LWLock lw_lock;
+ int proc_id;
+ int total_stats;
+ bool get_summary;
+ dsa_pointer memstats_dsa_pointer;
+ TimestampTz stats_timestamp;
+} MemoryContextBackendState;
+
+
+/*
+ * MemoryContextId
+ * Used for storage of transient identifiers for
+ * pg_get_backend_memory_contexts.
+ */
+typedef struct MemoryContextId
+{
+ MemoryContext context;
+ int context_id;
+} MemoryContextId;
+
+extern PGDLLIMPORT MemoryContextBackendState *memCtxState;
+extern PGDLLIMPORT MemoryContextState *memCtxArea;
+extern void ProcessGetMemoryContextInterrupt(void);
+extern const char *ContextTypeToString(NodeTag type);
+extern void HandleGetMemoryContextInterrupt(void);
+extern void MemCtxShmemInit(void);
+extern void MemCtxBackendShmemInit(void);
+extern void AtProcExit_memstats_dsa_free(int code, Datum arg);
+
#endif /* MEMUTILS_H */
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 83228cfca2..dca20ae1a2 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -232,3 +232,17 @@ select * from pg_timezone_abbrevs where abbrev = 'LMT';
LMT | @ 7 hours 52 mins 58 secs ago | f
(1 row)
+DO $$
+DECLARE
+ launcher_pid int;
+ r RECORD;
+BEGIN
+ SELECT pid from pg_stat_activity where backend_type='autovacuum launcher'
+ INTO launcher_pid;
+
+ select type, name, ident
+ from pg_get_process_memory_contexts(launcher_pid, false, 20)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+END $$;
+NOTICE: (AllocSet,TopMemoryContext,)
diff --git a/src/test/regress/sql/sysviews.sql b/src/test/regress/sql/sysviews.sql
index 66179f026b..4767351d4e 100644
--- a/src/test/regress/sql/sysviews.sql
+++ b/src/test/regress/sql/sysviews.sql
@@ -101,3 +101,17 @@ select count(distinct utc_offset) >= 24 as ok from pg_timezone_abbrevs;
-- One specific case we can check without much fear of breakage
-- is the historical local-mean-time value used for America/Los_Angeles.
select * from pg_timezone_abbrevs where abbrev = 'LMT';
+
+DO $$
+DECLARE
+ launcher_pid int;
+ r RECORD;
+BEGIN
+ SELECT pid from pg_stat_activity where backend_type='autovacuum launcher'
+ INTO launcher_pid;
+
+ select type, name, ident
+ from pg_get_process_memory_contexts(launcher_pid, false, 20)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+END $$;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index bfa276d2d3..78f5f1ec09 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1648,12 +1648,16 @@ MemoizeState
MemoizeTuple
MemoryChunk
MemoryContext
+MemoryContextBackendState
MemoryContextCallback
MemoryContextCallbackFunction
MemoryContextCounters
MemoryContextData
+MemoryContextEntry
+MemoryContextId
MemoryContextMethodID
MemoryContextMethods
+MemoryContextState
MemoryStatsPrintFunc
MergeAction
MergeActionState
--
2.34.1
v18-0001-Preparatory-changes-for-reporting-memory-context-sta.patch
From db2c1f522305b345038a8bcb9f73a9a49b044af9 Mon Sep 17 00:00:00 2001
From: Rahila Syed <rahilasyed.90@gmail.com>
Date: Mon, 3 Feb 2025 15:33:19 +0530
Subject: [PATCH 1/2] Preparatory changes for reporting memory context
statistics
Ensure that MemoryContextStatsInternal can return number of
contexts. Also, provide an option in MemoryContextStatsInternal
to return without printing stats to either stderr or logs.
---
src/backend/utils/mmgr/mcxt.c | 65 +++++++++++++++++++++++++++++------
1 file changed, 55 insertions(+), 10 deletions(-)
diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index 91060de0ab..34cdcdf2fd 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -135,6 +135,17 @@ static const MemoryContextMethods mcxt_methods[] = {
};
#undef BOGUS_MCTX
+/*
+ * This is passed to MemoryContextStatsInternal to determine whether
+ * to print context statistics or not and where to print them logs or
+ * stderr.
+ */
+typedef enum PrintDestination
+{
+ PRINT_STATS_TO_STDERR = 0,
+ PRINT_STATS_TO_LOGS,
+ PRINT_STATS_NONE
+} PrintDestination;
/*
* CurrentMemoryContext
@@ -162,7 +173,7 @@ static void MemoryContextCallResetCallbacks(MemoryContext context);
static void MemoryContextStatsInternal(MemoryContext context, int level,
int max_level, int max_children,
MemoryContextCounters *totals,
- bool print_to_stderr);
+ PrintDestination print_location, int *num_contexts);
static void MemoryContextStatsPrint(MemoryContext context, void *passthru,
const char *stats_string,
bool print_to_stderr);
@@ -831,11 +842,19 @@ MemoryContextStatsDetail(MemoryContext context,
bool print_to_stderr)
{
MemoryContextCounters grand_totals;
+ int num_contexts;
+ PrintDestination print_location;
memset(&grand_totals, 0, sizeof(grand_totals));
+ if (print_to_stderr)
+ print_location = PRINT_STATS_TO_STDERR;
+ else
+ print_location = PRINT_STATS_TO_LOGS;
+
+ /* num_contexts reports the number of contexts aggregated in the output */
MemoryContextStatsInternal(context, 0, max_level, max_children,
- &grand_totals, print_to_stderr);
+ &grand_totals, print_location, &num_contexts);
if (print_to_stderr)
fprintf(stderr,
@@ -876,18 +895,43 @@ static void
MemoryContextStatsInternal(MemoryContext context, int level,
int max_level, int max_children,
MemoryContextCounters *totals,
- bool print_to_stderr)
+ PrintDestination print_location, int *num_contexts)
{
MemoryContext child;
int ichild;
+ bool print_to_stderr = true;
+ check_stack_depth();
Assert(MemoryContextIsValid(context));
- /* Examine the context itself */
- context->methods->stats(context,
- MemoryContextStatsPrint,
- &level,
- totals, print_to_stderr);
+ if (print_location == PRINT_STATS_TO_STDERR)
+ print_to_stderr = true;
+ else if (print_location == PRINT_STATS_TO_LOGS)
+ print_to_stderr = false;
+
+ if (print_location != PRINT_STATS_NONE)
+ {
+ /* Examine the context itself */
+ context->methods->stats(context,
+ MemoryContextStatsPrint,
+ &level,
+ totals, print_to_stderr);
+ }
+
+ /*
+ * Do not print the statistics if print_location is PRINT_STATS_NONE,
+ * only compute totals.
+ */
+ else
+ {
+ /* Examine the context itself */
+ context->methods->stats(context,
+ NULL,
+ NULL,
+ totals, print_to_stderr);
+ }
+ /* Increment the context count */
+ *num_contexts = *num_contexts + 1;
/*
* Examine children.
@@ -907,7 +951,7 @@ MemoryContextStatsInternal(MemoryContext context, int level,
MemoryContextStatsInternal(child, level + 1,
max_level, max_children,
totals,
- print_to_stderr);
+ print_location, num_contexts);
}
}
@@ -925,6 +969,7 @@ MemoryContextStatsInternal(MemoryContext context, int level,
ichild++;
child = MemoryContextTraverseNext(child, context);
}
+ *num_contexts = *num_contexts + ichild;
if (print_to_stderr)
{
@@ -939,7 +984,7 @@ MemoryContextStatsInternal(MemoryContext context, int level,
local_totals.freechunks,
local_totals.totalspace - local_totals.freespace);
}
- else
+ else if (print_location != PRINT_STATS_NONE)
ereport(LOG_SERVER_ONLY,
(errhidestmt(true),
errhidecontext(true),
--
2.34.1
On 20 Mar 2025, at 08:39, Rahila Syed <rahilasyed90@gmail.com> wrote:
Thanks for the new version, I believe this will be a welcome tool in the
debugging toolbox.
I took a cleanup pass over the docs with among others the below changes:
* You had broken the text in paragraphs, but without <para/> tags they are
rendered as a single blob of text so added that.
* Removed the "(PID)" explanation as process id is used elsewhere on the same
page already without explanation.
* Added <productname/> markup on PostgreSQL
* Added <literal/> markup on parameter values
* Switched the example query output to use \x
* Added a note on when pg_backend_memory_contexts is a better choice
The paragraphs need some re-indenting to avoid too long lines, but I opted out
of doing so here to make reviewing the changes easier.
A few comments on the code (all comments are addressed in 0003 attached here
which also has smaller cleanups wrt indentation, code style etc):
+#include <math.h>
I don't think we need this, maybe it was from an earlier version of the patch?
+MEM_CTX_PUBLISH "Waiting for backend to publish memory information."
I wonder if this should really be "process" and not backend?
+ default:
+ context_type = "???";
+ break;
In ContextTypeToString() I'm having second thoughts about this, there shouldn't
be any legitimate use-case of passing a nodetag to this function which would fail
MemoryContextIsValid(). I wonder if we aren't helping callers more by erroring
out rather than silently returning an unknown? I haven't changed this but
maybe we should, to set the API right from the start?
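One way to picture the stricter API, as a standalone sketch only: the lookup below reports an unknown tag to its caller instead of substituting a placeholder string. The SketchContextType enum is a stand-in for the relevant NodeTag values, and sketch_context_type_name() is a hypothetical name. Inside the server the unknown case would more likely raise an error directly, which is not modelled here.

```c
#include <stddef.h>

/* Stand-in for the memory context NodeTag values (assumption for this sketch) */
typedef enum SketchContextType
{
	SKETCH_ALLOCSET,
	SKETCH_GENERATION,
	SKETCH_SLAB,
	SKETCH_BUMP
} SketchContextType;

/*
 * Hypothetical lookup: returns the display name for a known context type,
 * or NULL for anything else, so the caller has to treat an unknown tag as
 * an error instead of receiving a "???" placeholder.
 */
static const char *
sketch_context_type_name(int type)
{
	switch (type)
	{
		case SKETCH_ALLOCSET:
			return "AllocSet";
		case SKETCH_GENERATION:
			return "Generation";
		case SKETCH_SLAB:
			return "Slab";
		case SKETCH_BUMP:
			return "Bump";
	}

	/* Unknown tag: signal failure to the caller */
	return NULL;
}
```

A caller that receives NULL can then fail loudly at the point where the invalid tag was passed in.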
+ * if the process has more memory contexts than that can fit in the allocated
s/than that can/than what can/?
+ errmsg("memory context statistics privilege error"));
Similar checks elsewhere in the tree mostly use "permission denied to .." so I
think we should adopt that here as well.
+ LWLockAcquire(&memCtxState[procNumber].lw_lock, LW_EXCLUSIVE);
+ msecs =
+ TimestampDifferenceMilliseconds(curr_timestamp,
+ memCtxState[procNumber].stats_timestamp);
Since we only want to consider the stats if they are from the current process,
we can delay checking the time difference until after we've checked the pid and
thus reduce the amount of time we hold the lock in the error case.
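The reordering can be illustrated with a simplified, standalone sketch. Everything here is an assumption made for the example: SketchSlot stands in for the per-process shared state, the lock is a plain flag standing in for the LWLock, and stats_age_for_pid() is a hypothetical name. The point is only the ordering: the pid is compared first, and the timestamp difference is computed only when the slot belongs to the expected process.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for the per-process shared state (assumption) */
typedef struct SketchSlot
{
	bool		locked;				/* stand-in for the slot's LWLock */
	int			proc_id;			/* pid that published the stats */
	int64_t		stats_timestamp_ms;	/* publication time in milliseconds */
} SketchSlot;

/* Stand-ins for lock acquire/release; they only flip a flag */
static void
slot_lock(SketchSlot *slot)
{
	slot->locked = true;
}

static void
slot_unlock(SketchSlot *slot)
{
	slot->locked = false;
}

/*
 * Returns true and stores the age of the stats in *age_ms when the slot
 * belongs to pid. On a pid mismatch the time difference is never computed,
 * so the lock is released as early as possible in the error case.
 */
static bool
stats_age_for_pid(SketchSlot *slot, int pid, int64_t now_ms, int64_t *age_ms)
{
	bool		found = false;

	slot_lock(slot);
	if (slot->proc_id == pid)
	{
		*age_ms = now_ms - slot->stats_timestamp_ms;
		found = true;
	}
	slot_unlock(slot);

	return found;
}
```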
+ /*
+ * Recheck the state of the backend before sleeping on the condition
+ * variable
+ */
+ proc = BackendPidGetProc(pid);
Here we are really rechecking that the process is still alive, but I wonder if
we should take the opportunity to ensure that the type is what we expect it to
be? If the pid has moved from being a backend to an aux proc or vice versa we
really don't want to go on.
+ ereport(WARNING,
+ errmsg("PID %d is not a PostgreSQL server process",
+ pid));
I wonder if we should differentiate between the warnings? When we hit this in
the loop the errmsg is describing a slightly different case. I did leave it
for now, but it's food for thought if we should perhaps reword this one.
+ ereport(LOG,
+ errmsg("Wait for %d process to publish stats timed out, trying again",
+ pid));
This should probably be DEBUG1; in a congested cluster it might cause a fair
bit of logging which isn't really helping the user. Also, nitpick, errmsg
starts with a lowercase letter.
+static Size
+MemCtxShmemSize(void)
We don't really need this function anymore, and by keeping it separate we risk it
going out of sync with the matching calculation in MemCtxBackendShmemInit, so I
think we should condense them into one.
else
{
+ Assert(print_location == PRINT_STATS_NONE);
Rather than an if-then-else and an assert we can use a switch statement without
a default, this way we'll automatically get a warning if a value is missed.
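The switch-based approach can be sketched as follows. This is a standalone illustration rather than the code in 0003: the PrintDestination enum is copied from the patch, while destination_label() is a hypothetical helper invented for the example. With no default branch, compilers that implement -Wswitch (gcc and clang enable it under -Wall) report any enum value that lacks a case.

```c
#include <stddef.h>

typedef enum PrintDestination
{
	PRINT_STATS_TO_STDERR = 0,
	PRINT_STATS_TO_LOGS,
	PRINT_STATS_NONE
} PrintDestination;

/*
 * Hypothetical helper for illustration: maps each destination to a label.
 * The switch deliberately has no default branch, so adding a new enum
 * value without a matching case triggers a -Wswitch warning.
 */
static const char *
destination_label(PrintDestination print_location)
{
	switch (print_location)
	{
		case PRINT_STATS_TO_STDERR:
			return "stderr";
		case PRINT_STATS_TO_LOGS:
			return "logs";
		case PRINT_STATS_NONE:
			return "none";
	}

	/* Only reached if an out-of-range value is cast to the enum */
	return NULL;
}
```

Adding a fourth value to the enum without a matching case then produces a warning at compile time rather than a failed assertion at run time.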
+ ereport(LOG,
+ errmsg("hash table corrupted, can't construct path value"));
I know you switched from elog(LOG.. to ereport(LOG.. but I still think a LOG
entry stating corruption isn't helpful, it's not actionable for the user.
Given that it's a case that shouldn't happen I wonder if we should downgrade it
to an Assert(false) and potentially a DEBUG1?
--
Daniel Gustafsson
Attachments:
v19-0001-Preparatory-changes-for-reporting-memory-context.patch
From ba6368a9db0e1f420897cca302ed2a80c94f2ca1 Mon Sep 17 00:00:00 2001
From: Rahila Syed <rahilasyed.90@gmail.com>
Date: Mon, 3 Feb 2025 15:33:19 +0530
Subject: [PATCH v19 1/3] Preparatory changes for reporting memory context
statistics
Ensure that MemoryContextStatsInternal can return number of
contexts. Also, provide an option in MemoryContextStatsInternal
to return without printing stats to either stderr or logs.
---
src/backend/utils/mmgr/mcxt.c | 65 +++++++++++++++++++++++++++++------
1 file changed, 55 insertions(+), 10 deletions(-)
diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index 91060de0ab7..34cdcdf2fd3 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -135,6 +135,17 @@ static const MemoryContextMethods mcxt_methods[] = {
};
#undef BOGUS_MCTX
+/*
+ * This is passed to MemoryContextStatsInternal to determine whether
+ * to print context statistics or not and where to print them logs or
+ * stderr.
+ */
+typedef enum PrintDestination
+{
+ PRINT_STATS_TO_STDERR = 0,
+ PRINT_STATS_TO_LOGS,
+ PRINT_STATS_NONE
+} PrintDestination;
/*
* CurrentMemoryContext
@@ -162,7 +173,7 @@ static void MemoryContextCallResetCallbacks(MemoryContext context);
static void MemoryContextStatsInternal(MemoryContext context, int level,
int max_level, int max_children,
MemoryContextCounters *totals,
- bool print_to_stderr);
+ PrintDestination print_location, int *num_contexts);
static void MemoryContextStatsPrint(MemoryContext context, void *passthru,
const char *stats_string,
bool print_to_stderr);
@@ -831,11 +842,19 @@ MemoryContextStatsDetail(MemoryContext context,
bool print_to_stderr)
{
MemoryContextCounters grand_totals;
+ int num_contexts;
+ PrintDestination print_location;
memset(&grand_totals, 0, sizeof(grand_totals));
+ if (print_to_stderr)
+ print_location = PRINT_STATS_TO_STDERR;
+ else
+ print_location = PRINT_STATS_TO_LOGS;
+
+ /* num_contexts reports the number of contexts aggregated in the output */
MemoryContextStatsInternal(context, 0, max_level, max_children,
- &grand_totals, print_to_stderr);
+ &grand_totals, print_location, &num_contexts);
if (print_to_stderr)
fprintf(stderr,
@@ -876,18 +895,43 @@ static void
MemoryContextStatsInternal(MemoryContext context, int level,
int max_level, int max_children,
MemoryContextCounters *totals,
- bool print_to_stderr)
+ PrintDestination print_location, int *num_contexts)
{
MemoryContext child;
int ichild;
+ bool print_to_stderr = true;
+ check_stack_depth();
Assert(MemoryContextIsValid(context));
- /* Examine the context itself */
- context->methods->stats(context,
- MemoryContextStatsPrint,
- &level,
- totals, print_to_stderr);
+ if (print_location == PRINT_STATS_TO_STDERR)
+ print_to_stderr = true;
+ else if (print_location == PRINT_STATS_TO_LOGS)
+ print_to_stderr = false;
+
+ if (print_location != PRINT_STATS_NONE)
+ {
+ /* Examine the context itself */
+ context->methods->stats(context,
+ MemoryContextStatsPrint,
+ &level,
+ totals, print_to_stderr);
+ }
+
+ /*
+ * Do not print the statistics if print_location is PRINT_STATS_NONE,
+ * only compute totals.
+ */
+ else
+ {
+ /* Examine the context itself */
+ context->methods->stats(context,
+ NULL,
+ NULL,
+ totals, print_to_stderr);
+ }
+ /* Increment the context count */
+ *num_contexts = *num_contexts + 1;
/*
* Examine children.
@@ -907,7 +951,7 @@ MemoryContextStatsInternal(MemoryContext context, int level,
MemoryContextStatsInternal(child, level + 1,
max_level, max_children,
totals,
- print_to_stderr);
+ print_location, num_contexts);
}
}
@@ -925,6 +969,7 @@ MemoryContextStatsInternal(MemoryContext context, int level,
ichild++;
child = MemoryContextTraverseNext(child, context);
}
+ *num_contexts = *num_contexts + ichild;
if (print_to_stderr)
{
@@ -939,7 +984,7 @@ MemoryContextStatsInternal(MemoryContext context, int level,
local_totals.freechunks,
local_totals.totalspace - local_totals.freespace);
}
- else
+ else if (print_location != PRINT_STATS_NONE)
ereport(LOG_SERVER_ONLY,
(errhidestmt(true),
errhidecontext(true),
--
2.39.3 (Apple Git-146)
v19-0002-Function-to-report-memory-context-statistics.patch
From 991c7404073c4dfa4e2dd55e6f169ec2c9868ee4 Mon Sep 17 00:00:00 2001
From: Rahila Syed <rahilasyed.90@gmail.com>
Date: Mon, 3 Feb 2025 15:37:17 +0530
Subject: [PATCH v19 2/3] Function to report memory context statistics
This function sends a signal to a backend to publish
statistics of all its memory contexts. The signal handler
running in the backend process sets a flag, which causes
the backend to copy its MemoryContextStats to a DSA during the
next call to CHECK_FOR_INTERRUPTS().
If there are more statistics than fit in 16MB, the
remaining statistics are copied as a cumulative
total of the remaining contexts.
Once it is done, the publishing process signals the client backend
using a condition variable. The client backend wakes up, reads
the shared memory and returns these values in the form
of a set of records, one for each memory context, to the
user, followed by a cumulative total of the remaining
contexts, if any.
If get_summary is true, return statistics of all children
of TopMemoryContext with aggregated statistics of their
children.
The user can pass num_of_tries, which determines the total
number of wait cycles in a client backend for the latest
statistics.
The wait timeout for each cycle is 1 second. After that,
the client displays previously published statistics or
returns without results.
Each backend and auxiliary process has its own slot for
reporting the stats. There is an array of such memory slots,
of size MaxBackends plus the number of auxiliary
processes, in fixed shared memory. Each of these slots points
to a smaller DSA allocation within a single DSA,
which contains the stats to be shared by the corresponding
process.
Each slot has its own LWLock and condition variable for
synchronization and communication between the publishing
process and the client backend.
---
doc/src/sgml/func.sgml | 61 ++
src/backend/postmaster/autovacuum.c | 4 +
src/backend/postmaster/checkpointer.c | 4 +
src/backend/postmaster/interrupt.c | 4 +
src/backend/postmaster/pgarch.c | 4 +
src/backend/postmaster/startup.c | 4 +
src/backend/postmaster/walsummarizer.c | 4 +
src/backend/storage/ipc/ipci.c | 3 +
src/backend/storage/ipc/procsignal.c | 3 +
src/backend/storage/lmgr/proc.c | 15 +
src/backend/tcop/postgres.c | 3 +
.../utils/activity/wait_event_names.txt | 1 +
src/backend/utils/adt/mcxtfuncs.c | 415 ++++++++++++-
src/backend/utils/init/globals.c | 1 +
src/backend/utils/mmgr/mcxt.c | 568 +++++++++++++++++-
src/include/catalog/pg_proc.dat | 10 +
src/include/miscadmin.h | 1 +
src/include/storage/procsignal.h | 1 +
src/include/utils/memutils.h | 70 +++
src/test/regress/expected/sysviews.out | 14 +
src/test/regress/sql/sysviews.sql | 14 +
src/tools/pgindent/typedefs.list | 4 +
22 files changed, 1167 insertions(+), 41 deletions(-)
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 6fa1d6586b8..1ab2ce12662 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -28570,6 +28570,50 @@ acl | {postgres=arwdDxtm/postgres,foo=r/postgres}
</para></entry>
</row>
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>pg_get_process_memory_contexts</primary>
+ </indexterm>
+ <function>pg_get_process_memory_contexts</function> ( <parameter>pid</parameter> <type>integer</type>, <parameter>get_summary</parameter> <type>boolean</type>, <parameter>num_of_tries</parameter> <type>integer</type> )
+ <returnvalue>setof record</returnvalue>
+ </para>
+ <para>
+ This function handles requests to display the memory contexts of a
+ PostgreSQL process with the specified process ID (PID). It takes three
+ arguments: <parameter>PID</parameter>, <parameter>get_summary</parameter>
+ and <parameter>num_of_tries</parameter>. The function can send requests
+ to both backend and auxiliary processes.
+
+ After receiving memory context statistics from the target process, it
+ returns the results as one row per context. If all the contexts don't
+ fit within the pre-determined size limit, the remaining context statistics
+ are aggregated and a cumulative total is displayed. The num_agg_contexts
+ column indicates the number of contexts aggregated in the displayed
+ statistics. The num_agg_contexts value is typically 1, meaning that each
+ context's statistics are displayed separately.
+
+ When <parameter>get_summary</parameter> is set to true, statistics
+ for memory contexts at levels 1 and 2 are displayed, with level 1
+ representing the root node (i.e., TopMemoryContext).
+ Each level 2 context's statistics represent an aggregate of all its
+ child contexts' statistics, with num_agg_contexts indicating the number
+ of these aggregated child contexts.
+
+ When <parameter>get_summary</parameter> is set to false, the
+ num_agg_contexts value is 1, indicating that individual statistics are
+ being displayed.
+
+ <parameter>num_of_tries</parameter> indicates the number of times
+ the client will wait for the latest statistics. The wait per try is 1
+ second. This parameter can be increased if the user anticipates a delay
+ in the response from the reporting process. Conversely, if users are
+ frequently and periodically querying the process for statistics, or if
+ there are concurrent requests for statistics of the same process,
+ lowering the parameter might help achieve a faster response.
+ </para></entry>
+ </row>
+
<row>
<entry role="func_table_entry"><para role="func_signature">
<indexterm>
@@ -28688,6 +28732,23 @@ LOG: Grand total: 1651920 bytes in 201 blocks; 622360 free (88 chunks); 1029560
because it may generate a large number of log messages.
</para>
+ <para>
+ <function>pg_get_process_memory_contexts</function> can be used
+ to request the memory contexts statistics of any postgres process. For example:
+<programlisting>
+postgres=# SELECT * FROM pg_get_process_memory_contexts(
+ (SELECT pid FROM pg_stat_activity
+ WHERE backend_type = 'checkpointer')
+ , false, 5) LIMIT 1;
+ name | ident | type | path | total_bytes | total_nblocks | free_bytes | free_chunks | used_bytes | num_
+agg_contexts | stats_timestamp
+------------------+-------+----------+------+-------------+---------------+------------+-------------+------------+-----
+-------------+----------------------------------
+ TopMemoryContext | | AllocSet | {1} | 102664 | 4 | 3008 | 2 | 99656 |
+ 1 | 2025-03-04 10:01:57.590543+05:30
+</programlisting>
+ </para>
+
</sect2>
<sect2 id="functions-admin-backup">
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 2513a8ef8a6..16756152b71 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -781,6 +781,10 @@ ProcessAutoVacLauncherInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
/* Process sinval catchup interrupts that happened while sleeping */
ProcessCatchupInterrupt();
}
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index fda91ffd1ce..d3cb3f1891c 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -663,6 +663,10 @@ ProcessCheckpointerInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/postmaster/interrupt.c b/src/backend/postmaster/interrupt.c
index 0ae9bf906ec..f24f574e748 100644
--- a/src/backend/postmaster/interrupt.c
+++ b/src/backend/postmaster/interrupt.c
@@ -48,6 +48,10 @@ ProcessMainLoopInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 7e622ae4bd2..cb7408acf4c 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -867,6 +867,10 @@ ProcessPgArchInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
if (ConfigReloadPending)
{
char *archiveLib = pstrdup(XLogArchiveLibrary);
diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c
index 27e86cf393f..7149a67fcbc 100644
--- a/src/backend/postmaster/startup.c
+++ b/src/backend/postmaster/startup.c
@@ -192,6 +192,10 @@ ProcessStartupProcInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
diff --git a/src/backend/postmaster/walsummarizer.c b/src/backend/postmaster/walsummarizer.c
index 0fec4f1f871..c7a76711cc5 100644
--- a/src/backend/postmaster/walsummarizer.c
+++ b/src/backend/postmaster/walsummarizer.c
@@ -879,6 +879,10 @@ ProcessWalSummarizerInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 2fa045e6b0f..362a6dc9528 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -51,6 +51,7 @@
#include "storage/sinvaladt.h"
#include "utils/guc.h"
#include "utils/injection_point.h"
+#include "utils/memutils.h"
/* GUCs */
int shared_memory_type = DEFAULT_SHARED_MEMORY_TYPE;
@@ -343,6 +344,8 @@ CreateOrAttachShmemStructs(void)
WaitEventCustomShmemInit();
InjectionPointShmemInit();
AioShmemInit();
+ MemCtxShmemInit();
+ MemCtxBackendShmemInit();
}
/*
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index 7d201965503..b59034fdc38 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -690,6 +690,9 @@ procsignal_sigusr1_handler(SIGNAL_ARGS)
if (CheckProcSignal(PROCSIG_LOG_MEMORY_CONTEXT))
HandleLogMemoryContextInterrupt();
+ if (CheckProcSignal(PROCSIG_GET_MEMORY_CONTEXT))
+ HandleGetMemoryContextInterrupt();
+
if (CheckProcSignal(PROCSIG_PARALLEL_APPLY_MESSAGE))
HandleParallelApplyMessageInterrupt();
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index e4ca861a8e6..c50dcbc4914 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -50,6 +50,7 @@
#include "storage/procsignal.h"
#include "storage/spin.h"
#include "storage/standby.h"
+#include "utils/memutils.h"
#include "utils/timeout.h"
#include "utils/timestamp.h"
@@ -497,6 +498,13 @@ InitProcess(void)
*/
PGSemaphoreReset(MyProc->sem);
+ /*
+ * The before shmem exit callback frees the dsa memory occupied by the
+ * latest memory context statistics that could be published by this
+ * backend if requested.
+ */
+ before_shmem_exit(AtProcExit_memstats_dsa_free, 0);
+
/*
* Arrange to clean up at backend exit.
*/
@@ -671,6 +679,13 @@ InitAuxiliaryProcess(void)
*/
PGSemaphoreReset(MyProc->sem);
+ /*
+ * The before shmem exit callback frees the dsa memory occupied by the
+ * latest memory context statistics that could be published by this
+ * process if requested.
+ */
+ before_shmem_exit(AtProcExit_memstats_dsa_free, 0);
+
/*
* Arrange to clean up at process exit.
*/
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 4d2edb10658..08d17a19316 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3531,6 +3531,9 @@ ProcessInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
if (ParallelApplyMessagePending)
ProcessParallelApplyMessages();
}
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 9fa12a555e8..e014e895bfb 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -160,6 +160,7 @@ WAL_RECEIVER_EXIT "Waiting for the WAL receiver to exit."
WAL_RECEIVER_WAIT_START "Waiting for startup process to send initial data for streaming replication."
WAL_SUMMARY_READY "Waiting for a new WAL summary to be generated."
XACT_GROUP_UPDATE "Waiting for the group leader to update transaction status at transaction end."
+MEM_CTX_PUBLISH "Waiting for backend to publish memory information."
ABI_compatibility:
diff --git a/src/backend/utils/adt/mcxtfuncs.c b/src/backend/utils/adt/mcxtfuncs.c
index 396c2f223b4..462c4e48cf0 100644
--- a/src/backend/utils/adt/mcxtfuncs.c
+++ b/src/backend/utils/adt/mcxtfuncs.c
@@ -17,28 +17,26 @@
#include "funcapi.h"
#include "mb/pg_wchar.h"
+#include "miscadmin.h"
+#include "access/twophase.h"
+#include "catalog/pg_authid_d.h"
+#include "nodes/pg_list.h"
#include "storage/proc.h"
#include "storage/procarray.h"
+#include "utils/acl.h"
#include "utils/array.h"
#include "utils/builtins.h"
#include "utils/hsearch.h"
+#include "utils/memutils.h"
+#include "utils/wait_event_types.h"
/* ----------
* The max bytes for showing identifiers of MemoryContext.
* ----------
*/
#define MEMORY_CONTEXT_IDENT_DISPLAY_SIZE 1024
-
-/*
- * MemoryContextId
- * Used for storage of transient identifiers for
- * pg_get_backend_memory_contexts.
- */
-typedef struct MemoryContextId
-{
- MemoryContext context;
- int context_id;
-} MemoryContextId;
+struct MemoryContextBackendState *memCtxState = NULL;
+struct MemoryContextState *memCtxArea = NULL;
/*
* int_list_to_array
@@ -143,24 +141,7 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
else
nulls[1] = true;
- switch (context->type)
- {
- case T_AllocSetContext:
- type = "AllocSet";
- break;
- case T_GenerationContext:
- type = "Generation";
- break;
- case T_SlabContext:
- type = "Slab";
- break;
- case T_BumpContext:
- type = "Bump";
- break;
- default:
- type = "???";
- break;
- }
+ type = ContextTypeToString(context->type);
values[2] = CStringGetTextDatum(type);
values[3] = Int32GetDatum(list_length(path)); /* level */
@@ -175,6 +156,32 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
list_free(path);
}
+const char *
+ContextTypeToString(NodeTag type)
+{
+ const char *context_type;
+
+ switch (type)
+ {
+ case T_AllocSetContext:
+ context_type = "AllocSet";
+ break;
+ case T_GenerationContext:
+ context_type = "Generation";
+ break;
+ case T_SlabContext:
+ context_type = "Slab";
+ break;
+ case T_BumpContext:
+ context_type = "Bump";
+ break;
+ default:
+ context_type = "???";
+ break;
+ }
+ return (context_type);
+}
+
/*
* pg_get_backend_memory_contexts
* SQL SRF showing backend memory context.
@@ -305,3 +312,353 @@ pg_log_backend_memory_contexts(PG_FUNCTION_ARGS)
PG_RETURN_BOOL(true);
}
+
+/*
+ * pg_get_process_memory_contexts
+ * Signal a backend or an auxiliary process to send its memory contexts,
+ * wait for the results and display them.
+ *
+ * By default, only superusers or users with PG_READ_ALL_STATS are allowed to
+ * signal a process to return the memory contexts. This is because allowing
+ * any users to issue this request at an unbounded rate would cause lots of
+ * requests to be sent, which can lead to denial of service. Additional roles
+ * can be permitted with GRANT.
+ *
+ * On receipt of this signal, a backend or an auxiliary process sets the flag
+ * in the signal handler, which causes the next CHECK_FOR_INTERRUPTS()
+ * or process-specific interrupt handler to copy the memory context details
+ * to a dynamic shared memory space.
+ *
+ * We have defined a limit on dsa memory that could be allocated per process -
+ * if the process has more memory contexts than that can fit in the allocated
+ * size, the excess contexts are summarized and represented as cumulative total
+ * at the end of the buffer.
+ *
+ * After sending the signal, wait on a condition variable. The publishing
+ * backend, after copying the data to shared memory, sends signal on that
+ * condition variable. There is one condition variable per publishing
+ * backend.
+ * Once condition variable is signalled, check if the latest memory context
+ * information is available and display.
+ *
+ * If the publishing backend does not respond before the condition variable
+ * times out, which is set to MEMSTATS_WAIT_TIMEOUT, retry for max_tries
+ * number of times, which is defined by user, before giving up and
+ * returning previously published statistics, if any. If previous statistics
+ * do not exist, return NULL.
+ */
+Datum
+pg_get_process_memory_contexts(PG_FUNCTION_ARGS)
+{
+ int pid = PG_GETARG_INT32(0);
+ bool get_summary = PG_GETARG_BOOL(1);
+ PGPROC *proc;
+ ProcNumber procNumber = INVALID_PROC_NUMBER;
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ dsa_area *area;
+ MemoryContextEntry *memctx_info;
+ int num_retries = 0;
+ TimestampTz curr_timestamp;
+ int max_tries = PG_GETARG_INT32(2);
+
+ /*
+ * Only superusers or users with pg_read_all_stats privileges can view the
+ * memory context statistics of another process
+ */
+ if (!has_privs_of_role(GetUserId(), ROLE_PG_READ_ALL_STATS))
+ ereport(ERROR,
+ errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+ errmsg("memory context statistics privilege error"));
+
+ InitMaterializedSRF(fcinfo, 0);
+
+ /*
+ * See if the process with given pid is a backend or an auxiliary process.
+ */
+ proc = BackendPidGetProc(pid);
+ if (proc == NULL)
+ proc = AuxiliaryPidGetProc(pid);
+
+ /*
+ * BackendPidGetProc() and AuxiliaryPidGetProc() return NULL if the pid
+ * isn't valid; but by the time we send the signal, a process for which we
+ * get a valid proc here might have terminated on its own. There's no way
+ * to acquire a lock on an arbitrary process to prevent that. But since
+ * this mechanism is usually used to debug a backend or an auxiliary
+ * process running and consuming lots of memory, that it might end on its
+ * own first and its memory contexts are not returned is not a problem.
+ */
+ if (proc == NULL)
+ {
+ /*
+ * This is just a warning so a loop-through-resultset will not abort
+ * if one backend terminated on its own during the run.
+ */
+ ereport(WARNING,
+ (errmsg("PID %d is not a PostgreSQL server process",
+ pid)));
+ PG_RETURN_NULL();
+ }
+
+ procNumber = GetNumberFromPGProc(proc);
+
+ LWLockAcquire(&memCtxState[procNumber].lw_lock, LW_EXCLUSIVE);
+ memCtxState[procNumber].get_summary = get_summary;
+ LWLockRelease(&memCtxState[procNumber].lw_lock);
+
+ curr_timestamp = GetCurrentTimestamp();
+
+ /*
+ * Send a signal to a PostgreSQL process, informing it that we want it to
+ * produce information about memory contexts.
+ */
+ if (SendProcSignal(pid, PROCSIG_GET_MEMORY_CONTEXT, procNumber) < 0)
+ {
+ ereport(WARNING,
+ errmsg("could not send signal to process %d: %m", pid));
+ PG_RETURN_NULL();
+ }
+
+ /*
+ * A valid DSA pointer isn't proof that statistics are available; it can
+ * be valid due to previously published stats. Check if the stats are
+ * updated by comparing the timestamp: if the stats are newer than the
+ * timestamp we recorded before sending the procsignal, they must by
+ * definition be updated. Wait for max_tries * MEMSTATS_WAIT_TIMEOUT,
+ * following which display old statistics if available or return NULL.
+ */
+ while (1)
+ {
+ long msecs;
+
+ /*
+ * We expect to come out of sleep when the requested process has
+ * finished publishing the statistics, verified using the valid dsa
+ * pointer.
+ *
+ * Make sure that the information belongs to the pid we requested
+ * information for; otherwise loop back and wait for the server
+ * process to finish publishing statistics.
+ */
+ LWLockAcquire(&memCtxState[procNumber].lw_lock, LW_EXCLUSIVE);
+ msecs =
+ TimestampDifferenceMilliseconds(curr_timestamp,
+ memCtxState[procNumber].stats_timestamp);
+
+ /*
+ * A note in procnumber.h says that a procNumber can be re-used for
+ * a different backend immediately after a backend exits. If the slot
+ * identified by procNumber still holds an old process' data that the
+ * current process has not yet overwritten, the pid of the requested
+ * process and the proc_id will not match.
+ */
+ if (memCtxState[procNumber].proc_id == pid)
+ {
+ /*
+ * Break if the latest stats have been read, indicated by
+ * statistics timestamp being newer than the current request
+ * timestamp.
+ */
+ if (DsaPointerIsValid(memCtxState[procNumber].memstats_dsa_pointer)
+ && msecs > 0)
+ break;
+
+ }
+ LWLockRelease(&memCtxState[procNumber].lw_lock);
+
+#define MEMSTATS_WAIT_TIMEOUT 1000
+
+ /*
+ * Recheck the state of the backend before sleeping on the condition
+ * variable
+ */
+ proc = BackendPidGetProc(pid);
+ if (proc == NULL)
+ proc = AuxiliaryPidGetProc(pid);
+ if (proc == NULL)
+ {
+ ereport(WARNING,
+ errmsg("PID %d is not a PostgreSQL server process",
+ pid));
+ PG_RETURN_NULL();
+ }
+ if (ConditionVariableTimedSleep(&memCtxState[procNumber].memctx_cv,
+ MEMSTATS_WAIT_TIMEOUT,
+ WAIT_EVENT_MEM_CTX_PUBLISH))
+ {
+ /*
+ * Wait for max_tries defined by user, display previously
+ * published statistics if any, when max_tries are over.
+ */
+ if (num_retries > max_tries)
+ {
+ LWLockAcquire(&memCtxState[procNumber].lw_lock, LW_EXCLUSIVE);
+ /* Displaying previously published statistics */
+ if (DsaPointerIsValid(memCtxState[procNumber].memstats_dsa_pointer))
+ break;
+ else
+ {
+ LWLockRelease(&memCtxState[procNumber].lw_lock);
+ PG_RETURN_NULL();
+ }
+ }
+ ereport(LOG,
+ errmsg("wait for process %d to publish stats timed out, trying again",
+ pid));
+ num_retries = num_retries + 1;
+ }
+
+ }
+ /* We should land here only with a valid DSA handle */
+ Assert(memCtxArea->memstats_dsa_handle != DSA_HANDLE_INVALID);
+ area = dsa_attach(memCtxArea->memstats_dsa_handle);
+
+ /*
+ * Backend has finished publishing the stats, read them
+ *
+ * Read statistics of top level 1 and 2 contexts, if get_summary is true.
+ */
+ memctx_info = (MemoryContextEntry *) dsa_get_address(area,
+ memCtxState[procNumber].memstats_dsa_pointer);
+
+#define PG_GET_PROCESS_MEMORY_CONTEXTS_COLS 12
+ for (int i = 0; i < memCtxState[procNumber].total_stats; i++)
+ {
+ ArrayType *path_array;
+ int path_length;
+ Datum values[PG_GET_PROCESS_MEMORY_CONTEXTS_COLS];
+ bool nulls[PG_GET_PROCESS_MEMORY_CONTEXTS_COLS];
+ char *name;
+ char *ident;
+ Datum *path_datum_array;
+
+ memset(values, 0, sizeof(values));
+ memset(nulls, 0, sizeof(nulls));
+
+ if (DsaPointerIsValid(memctx_info[i].name))
+ {
+ name = (char *) dsa_get_address(area, memctx_info[i].name);
+ values[0] = CStringGetTextDatum(name);
+ }
+ else
+ nulls[0] = true;
+ if (DsaPointerIsValid(memctx_info[i].ident))
+ {
+ ident = (char *) dsa_get_address(area, memctx_info[i].ident);
+ values[1] = CStringGetTextDatum(ident);
+ }
+ else
+ nulls[1] = true;
+
+ if (memctx_info[i].type != NULL)
+ values[2] = CStringGetTextDatum(memctx_info[i].type);
+ else
+ nulls[2] = true;
+
+ path_length = memctx_info[i].path_length;
+
+ if (DsaPointerIsValid(memctx_info[i].path))
+ {
+ path_datum_array = (Datum *) dsa_get_address(area, memctx_info[i].path);
+ path_array = construct_array_builtin(path_datum_array,
+ path_length, INT4OID);
+
+ values[3] = PointerGetDatum(path_array);
+ }
+ else
+ nulls[3] = true;
+ values[4] = Int32GetDatum(path_length); /* level */
+ values[5] = Int64GetDatum(memctx_info[i].totalspace);
+ values[6] = Int64GetDatum(memctx_info[i].nblocks);
+ values[7] = Int64GetDatum(memctx_info[i].freespace);
+ values[8] = Int64GetDatum(memctx_info[i].freechunks);
+ values[9] = Int64GetDatum(memctx_info[i].totalspace -
+ memctx_info[i].freespace);
+ values[10] = Int32GetDatum(memctx_info[i].num_agg_stats);
+ values[11] = TimestampTzGetDatum(memCtxState[procNumber].stats_timestamp);
+
+ tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc,
+ values, nulls);
+ }
+ LWLockRelease(&memCtxState[procNumber].lw_lock);
+
+ ConditionVariableCancelSleep();
+ dsa_detach(area);
+
+ PG_RETURN_NULL();
+}
+
+/*
+ * Shared memory sizing for reporting memory context information.
+ */
+static Size
+MemCtxShmemSize(void)
+{
+ Size TotalProcs =
+ add_size(MaxBackends, add_size(NUM_AUXILIARY_PROCS, max_prepared_xacts));
+
+ return mul_size(TotalProcs, sizeof(MemoryContextBackendState));
+}
+
+/*
+ * Init shared memory for reporting memory context information.
+ */
+void
+MemCtxBackendShmemInit(void)
+{
+ bool found;
+ Size TotalProcs =
+ add_size(MaxBackends, add_size(NUM_AUXILIARY_PROCS, max_prepared_xacts));
+
+ memCtxState = (MemoryContextBackendState *) ShmemInitStruct("MemoryContextBackendState",
+ MemCtxShmemSize(),
+ &found);
+ if (!IsUnderPostmaster)
+ {
+ Assert(!found);
+
+ for (int i = 0; i < TotalProcs; i++)
+ {
+ ConditionVariableInit(&memCtxState[i].memctx_cv);
+
+ LWLockInitialize(&memCtxState[i].lw_lock,
+ LWLockNewTrancheId());
+ LWLockRegisterTranche(memCtxState[i].lw_lock.tranche,
+ "mem_context_backend_stats_reporting");
+
+ memCtxState[i].memstats_dsa_pointer = InvalidDsaPointer;
+ }
+ }
+ else
+ {
+ Assert(found);
+ }
+}
+
+/*
+ * Initialize shared memory for displaying memory
+ * context statistics
+ */
+void
+MemCtxShmemInit(void)
+{
+ bool found;
+
+ memCtxArea = (MemoryContextState *) ShmemInitStruct("MemoryContextState", sizeof(MemoryContextState),
+ &found);
+ if (!IsUnderPostmaster)
+ {
+ Assert(!found);
+
+ LWLockInitialize(&memCtxArea->lw_lock,
+ LWLockNewTrancheId());
+ LWLockRegisterTranche(memCtxArea->lw_lock.tranche,
+ "mem_context_stats_reporting");
+ memCtxArea->memstats_dsa_handle = DSA_HANDLE_INVALID;
+ }
+ else
+ {
+ Assert(found);
+ }
+}
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index b844f9fdaef..13938ccb0f5 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -38,6 +38,7 @@ volatile sig_atomic_t TransactionTimeoutPending = false;
volatile sig_atomic_t IdleSessionTimeoutPending = false;
volatile sig_atomic_t ProcSignalBarrierPending = false;
volatile sig_atomic_t LogMemoryContextPending = false;
+volatile sig_atomic_t PublishMemoryContextPending = false;
volatile sig_atomic_t IdleStatsUpdateTimeoutPending = false;
volatile uint32 InterruptHoldoffCount = 0;
volatile uint32 QueryCancelHoldoffCount = 0;
diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index 34cdcdf2fd3..214330aa7a5 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -19,10 +19,18 @@
*-------------------------------------------------------------------------
*/
#include "postgres.h"
+#include <math.h>
#include "mb/pg_wchar.h"
#include "miscadmin.h"
+#include "nodes/pg_list.h"
+#include "storage/fd.h"
+#include "storage/lwlock.h"
+#include "storage/dsm.h"
+#include "storage/ipc.h"
+#include "utils/dsa.h"
+#include "utils/hsearch.h"
#include "utils/memdebug.h"
#include "utils/memutils.h"
#include "utils/memutils_internal.h"
@@ -177,6 +185,16 @@ static void MemoryContextStatsInternal(MemoryContext context, int level,
static void MemoryContextStatsPrint(MemoryContext context, void *passthru,
const char *stats_string,
bool print_to_stderr);
+static void PublishMemoryContext(MemoryContextEntry *memctx_infos,
+ int curr_id, MemoryContext context,
+ List *path,
+ MemoryContextCounters stat,
+ int num_contexts, dsa_area *area);
+static void compute_contexts_count_and_ids(List *contexts, HTAB *context_id_lookup,
+ int *stats_count,
+ bool get_summary);
+static List *compute_context_path(MemoryContext c, HTAB *context_id_lookup);
+static void dsa_free_previous_stats(dsa_area *area, int total_stats, dsa_pointer prev_dsa_pointer);
/*
* You should not do memory allocations within a critical section, because
@@ -889,7 +907,8 @@ MemoryContextStatsDetail(MemoryContext context,
* One recursion level for MemoryContextStats
*
* Print stats for this context if possible, but in any case accumulate counts
- * into *totals (if not NULL).
+ * into *totals (if not NULL). The callers should make sure that print_location
+ * is set to PRINT_STATS_TO_STDERR, PRINT_STATS_TO_LOGS or PRINT_STATS_NONE.
*/
static void
MemoryContextStatsInternal(MemoryContext context, int level,
@@ -899,38 +918,43 @@ MemoryContextStatsInternal(MemoryContext context, int level,
{
MemoryContext child;
int ichild;
- bool print_to_stderr = true;
check_stack_depth();
Assert(MemoryContextIsValid(context));
if (print_location == PRINT_STATS_TO_STDERR)
- print_to_stderr = true;
+ {
+ /* Examine the context itself */
+ context->methods->stats(context,
+ MemoryContextStatsPrint,
+ &level,
+ totals, true);
+ }
else if (print_location == PRINT_STATS_TO_LOGS)
- print_to_stderr = false;
-
- if (print_location != PRINT_STATS_NONE)
{
+
/* Examine the context itself */
context->methods->stats(context,
MemoryContextStatsPrint,
&level,
- totals, print_to_stderr);
+ totals, false);
}
/*
* Do not print the statistics if print_to_stderr is PRINT_STATS_NONE,
- * only compute totals.
+ * only compute totals. This is used in reporting of memory context
+ * statistics via a SQL function. The last parameter is not relevant.
*/
else
{
+ Assert(print_location == PRINT_STATS_NONE);
/* Examine the context itself */
context->methods->stats(context,
NULL,
NULL,
- totals, print_to_stderr);
+ totals, false);
}
- /* Increment the context count */
+ /* Increment the context count for each recursive call */
*num_contexts = *num_contexts + 1;
/*
@@ -969,9 +993,14 @@ MemoryContextStatsInternal(MemoryContext context, int level,
ichild++;
child = MemoryContextTraverseNext(child, context);
}
+
+ /*
+ * Add the count of child contexts that are traversed
+ * non-recursively.
+ */
*num_contexts = *num_contexts + ichild;
- if (print_to_stderr)
+ if (print_location == PRINT_STATS_TO_STDERR)
{
for (int i = 0; i <= level; i++)
fprintf(stderr, " ");
@@ -984,7 +1013,7 @@ MemoryContextStatsInternal(MemoryContext context, int level,
local_totals.freechunks,
local_totals.totalspace - local_totals.freespace);
}
- else if (print_location != PRINT_STATS_NONE)
+ else if (print_location == PRINT_STATS_TO_LOGS)
ereport(LOG_SERVER_ONLY,
(errhidestmt(true),
errhidecontext(true),
@@ -1321,6 +1350,21 @@ HandleLogMemoryContextInterrupt(void)
/* latch will be set by procsignal_sigusr1_handler */
}
+/*
+ * HandleGetMemoryContextInterrupt
+ * Handle receipt of an interrupt indicating publishing of memory
+ * contexts.
+ *
+ * All the actual work is deferred to ProcessGetMemoryContextInterrupt()
+ */
+void
+HandleGetMemoryContextInterrupt(void)
+{
+ InterruptPending = true;
+ PublishMemoryContextPending = true;
+ /* latch will be set by procsignal_sigusr1_handler */
+}
+
/*
* ProcessLogMemoryContextInterrupt
* Perform logging of memory contexts of this backend process.
@@ -1358,6 +1402,506 @@ ProcessLogMemoryContextInterrupt(void)
MemoryContextStatsDetail(TopMemoryContext, 100, 100, false);
}
+/*
+ * ProcessGetMemoryContextInterrupt
+ * Generate information about memory contexts used by the process.
+ *
+ * Performs a breadth-first search on the memory context tree, so that a
+ * parent's statistics are reported before its children's in the monitoring
+ * function output.
+ *
+ * Statistics for all the processes are shared via the same dynamic shared area.
+ *
+ * We calculate the maximum number of contexts whose statistics can be
+ * displayed from a pre-determined limit on the memory available per process
+ * for this utility and an estimate of the size of the statistics for each
+ * context. Statistics for the remaining contexts, if any, are captured as a
+ * cumulative total at the end of the individual contexts' statistics.
+ *
+ * If get_summary is true, we display the level 1 and level 2 contexts. For that
+ * we traverse the memory context tree recursively in a depth-first manner
+ * to cover all the children of a parent context, to be able to display a
+ * cumulative total of memory consumption by a parent at level 2.
+ */
+void
+ProcessGetMemoryContextInterrupt(void)
+{
+ List *contexts;
+
+ HASHCTL ctl;
+ HTAB *context_id_lookup;
+ int context_id = 0;
+ MemoryContext stat_cxt;
+ MemoryContextEntry *meminfo;
+ bool get_summary = false;
+
+ dsa_area *area = NULL;
+ int max_stats;
+ int idx = MyProcNumber;
+ int stats_count = 0;
+ MemoryContextCounters stat;
+ int num_individual_stats = 0;
+
+ PublishMemoryContextPending = false;
+
+ /*
+ * Make a new context that will contain the hash table, to ease the
+ * cleanup.
+ */
+ stat_cxt = AllocSetContextCreate(CurrentMemoryContext,
+ "Memory context statistics",
+ ALLOCSET_DEFAULT_SIZES);
+
+ /*
+ * The hash table is used for constructing "path" column of the view,
+ * similar to its local backend counterpart.
+ */
+ ctl.keysize = sizeof(MemoryContext);
+ ctl.entrysize = sizeof(MemoryContextId);
+ ctl.hcxt = stat_cxt;
+
+ context_id_lookup = hash_create("pg_get_remote_backend_memory_contexts",
+ 256,
+ &ctl,
+ HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+
+ /* List of contexts to process in the next round - start at the top. */
+ contexts = list_make1(TopMemoryContext);
+
+ /* Compute the number of stats that can fit in the defined limit */
+ max_stats = (MAX_SEGMENTS_PER_BACKEND * DSA_DEFAULT_INIT_SEGMENT_SIZE)
+ / (sizeof(MemoryContextEntry) + (MEM_CONTEXT_MAX_LEVEL
+ * sizeof(Datum)) + (2 * MEMORY_CONTEXT_IDENT_SHMEM_SIZE));
+ LWLockAcquire(&memCtxState[idx].lw_lock, LW_EXCLUSIVE);
+ get_summary = memCtxState[idx].get_summary;
+ LWLockRelease(&memCtxState[idx].lw_lock);
+
+ /*
+ * Traverse the memory context tree to find total number of contexts. If
+ * summary is requested report the total number of contexts at level 1 and
+ * 2 from the top. Also, populate the hash table of context ids.
+ */
+ compute_contexts_count_and_ids(contexts, context_id_lookup, &stats_count,
+ get_summary);
+
+ /*
+ * Allocate memory in this process's DSA for storing statistics of the
+ * memory contexts up to max_stats. For contexts that don't fit within the
+ * limit, a cumulative total is written as the last record in the DSA
+ * segment.
+ */
+ stats_count = (stats_count > max_stats) ? max_stats : stats_count;
+
+ LWLockAcquire(&memCtxArea->lw_lock, LW_EXCLUSIVE);
+
+ /*
+ * Create a DSA and send the handle to the client process after storing
+ * the context statistics. If the contexts' statistics exceed a predefined
+ * limit (8MB), a cumulative total is stored for the excess contexts.
+ */
+ if (memCtxArea->memstats_dsa_handle == DSA_HANDLE_INVALID)
+ {
+ MemoryContext oldcontext = CurrentMemoryContext;
+ dsa_handle handle;
+
+ MemoryContextSwitchTo(TopMemoryContext);
+
+ area = dsa_create(memCtxArea->lw_lock.tranche);
+
+ handle = dsa_get_handle(area);
+ MemoryContextSwitchTo(oldcontext);
+
+ dsa_pin_mapping(area);
+
+ /*
+ * Pin the dsa area, this is to make sure the area remains attachable
+ * even if current backend exits. This is done so that a waiting
+ * client gets the stats even after a process exits.
+ */
+ dsa_pin(area);
+
+ /* Set the handle in shared memory */
+ memCtxArea->memstats_dsa_handle = handle;
+ }
+
+ /*
+ * If DSA exists, created by another process publishing statistics, or by
+ * the previous execution of this function by this process, attach to it.
+ */
+ else if (area == NULL)
+ {
+ MemoryContext oldcontext = CurrentMemoryContext;
+
+ MemoryContextSwitchTo(TopMemoryContext);
+ area = dsa_attach(memCtxArea->memstats_dsa_handle);
+ MemoryContextSwitchTo(oldcontext);
+ dsa_pin_mapping(area);
+ }
+ LWLockRelease(&memCtxArea->lw_lock);
+
+ /*
+ * Hold the process lock to protect writes to process specific memory. Two
+ * processes publishing statistics do not block each other.
+ */
+ LWLockAcquire(&memCtxState[idx].lw_lock, LW_EXCLUSIVE);
+ memCtxState[idx].proc_id = MyProcPid;
+
+ if (DsaPointerIsValid(memCtxState[idx].memstats_dsa_pointer))
+ {
+ /*
+ * Free any previous allocations, free the name, ident and path
+ * pointers before freeing the pointer that contains them.
+ */
+ dsa_free_previous_stats(area, memCtxState[idx].total_stats,
+ memCtxState[idx].memstats_dsa_pointer);
+ dsa_free(area, memCtxState[idx].memstats_dsa_pointer);
+ memCtxState[idx].memstats_dsa_pointer = InvalidDsaPointer;
+ }
+ memCtxState[idx].memstats_dsa_pointer = dsa_allocate0(area,
+ stats_count * sizeof(MemoryContextEntry));
+
+ meminfo = (MemoryContextEntry *) dsa_get_address(area,
+ memCtxState[idx].memstats_dsa_pointer);
+
+ if (get_summary)
+ {
+ int ctx_id = 0;
+ List *path = NIL;
+
+ /* Copy TopMemoryContext statistics to DSA */
+ memset(&stat, 0, sizeof(stat));
+ (*TopMemoryContext->methods->stats) (TopMemoryContext, NULL, NULL,
+ &stat, true);
+ path = lcons_int(1, path);
+ PublishMemoryContext(meminfo, ctx_id, TopMemoryContext, path, stat,
+ 1, area);
+ ctx_id = ctx_id + 1;
+
+ /*
+ * Copy statistics for each of TopMemoryContext's children. This
+ * includes statistics of all of their children up to level 100.
+ */
+ for (MemoryContext c = TopMemoryContext->firstchild; c != NULL;
+ c = c->nextchild)
+ {
+ MemoryContextCounters grand_totals;
+ int num_contexts = 0;
+
+ path = NIL;
+ memset(&grand_totals, 0, sizeof(grand_totals));
+
+ MemoryContextStatsInternal(c, 0, 100, 100, &grand_totals,
+ PRINT_STATS_NONE, &num_contexts);
+
+ path = compute_context_path(c, context_id_lookup);
+
+ PublishMemoryContext(meminfo, ctx_id, c, path,
+ grand_totals, num_contexts, area);
+ ctx_id = ctx_id + 1;
+ }
+ memCtxState[idx].total_stats = ctx_id;
+ goto cleanup;
+ }
+ foreach_ptr(MemoryContextData, cur, contexts)
+ {
+ List *path = NIL;
+ char *name;
+
+ /*
+ * Figure out the transient context_id of this context and each of its
+ * ancestors, to compute a path for this context.
+ */
+ path = compute_context_path(cur, context_id_lookup);
+
+ if (context_id <= (max_stats - 2))
+ {
+ /* Examine the context stats */
+ memset(&stat, 0, sizeof(stat));
+ (*cur->methods->stats) (cur, NULL, NULL, &stat, true);
+ /* Copy statistics to DSA memory */
+ PublishMemoryContext(meminfo, context_id, cur, path, stat, 1, area);
+ }
+ else
+ {
+ /* Examine the context stats */
+ memset(&stat, 0, sizeof(stat));
+ (*cur->methods->stats) (cur, NULL, NULL, &stat, true);
+
+ meminfo[max_stats - 1].totalspace += stat.totalspace;
+ meminfo[max_stats - 1].nblocks += stat.nblocks;
+ meminfo[max_stats - 1].freespace += stat.freespace;
+ meminfo[max_stats - 1].freechunks += stat.freechunks;
+ }
+
+ /*
+ * DSA max limit per process is reached, write aggregate of the
+ * remaining statistics.
+ */
+ if (context_id == (max_stats - 2) && context_id < (stats_count - 1))
+ {
+ num_individual_stats = context_id + 1;
+ meminfo[max_stats - 1].name = dsa_allocate0(area, 17);
+ name = dsa_get_address(area, meminfo[max_stats - 1].name);
+ strncpy(name, "Remaining Totals", 16);
+ meminfo[max_stats - 1].ident = InvalidDsaPointer;
+ meminfo[max_stats - 1].path = InvalidDsaPointer;
+ meminfo[max_stats - 1].type = NULL;
+ }
+ context_id++;
+ }
+ /* No aggregated contexts, individual statistics reported */
+ if (context_id < (max_stats - 2))
+ {
+ memCtxState[idx].total_stats = context_id;
+ }
+ /* Report number of aggregated memory contexts */
+ else
+ {
+ meminfo[max_stats - 1].num_agg_stats = context_id -
+ num_individual_stats;
+
+ /*
+ * Total stats equals num_individual_stats + 1 record for cumulative
+ * statistics.
+ */
+ memCtxState[idx].total_stats = num_individual_stats + 1;
+ }
+cleanup:
+
+ /*
+ * Signal all the waiting client backends after copying all the statistics
+ */
+ memCtxState[idx].stats_timestamp = GetCurrentTimestamp();
+ LWLockRelease(&memCtxState[idx].lw_lock);
+ ConditionVariableBroadcast(&memCtxState[idx].memctx_cv);
+ /* Delete the hash table memory context */
+ MemoryContextDelete(stat_cxt);
+
+ dsa_detach(area);
+}
+
+/*
+ * Append the transient context_id of this context and each of
+ * its ancestors to a list, in order to compute a path.
+ */
+static List *
+compute_context_path(MemoryContext c, HTAB *context_id_lookup)
+{
+ bool found;
+ List *path = NIL;
+
+ for (MemoryContext cur_context = c; cur_context != NULL; cur_context = cur_context->parent)
+ {
+ MemoryContextId *cur_entry;
+
+ cur_entry = hash_search(context_id_lookup, &cur_context, HASH_FIND, &found);
+
+ if (!found)
+ {
+ ereport(LOG,
+ errmsg("hash table corrupted, can't construct path value"));
+ break;
+ }
+ path = lcons_int(cur_entry->context_id, path);
+ }
+ return path;
+}
+
+/*
+ * Compute the number of contexts currently allocated by the backend and
+ * assign a context id to each of them.
+ */
+static void
+compute_contexts_count_and_ids(List *contexts, HTAB *context_id_lookup,
+ int *stats_count, bool get_summary)
+{
+ foreach_ptr(MemoryContextData, cur, contexts)
+ {
+ MemoryContextId *entry;
+ bool found;
+
+ entry = (MemoryContextId *) hash_search(context_id_lookup, &cur,
+ HASH_ENTER, &found);
+ Assert(!found);
+
+ /* context id starts with 1 */
+ entry->context_id = (++(*stats_count));
+
+ /* Append the children of the current context to the main list. */
+ for (MemoryContext c = cur->firstchild; c != NULL; c = c->nextchild)
+ {
+ if (get_summary)
+ {
+ entry = (MemoryContextId *) hash_search(context_id_lookup, &c,
+ HASH_ENTER, &found);
+ Assert(!found);
+
+ entry->context_id = (++(*stats_count));
+ }
+
+ contexts = lappend(contexts, c);
+ }
+
+ /*
+ * In summary mode only the contexts at the first two levels (from
+ * the top) are displayed
+ */
+ if (get_summary)
+ break;
+ }
+
+}
+
+/* Copy the memory context statistics of a single context to DSA memory */
+static void
+PublishMemoryContext(MemoryContextEntry *memctx_info, int curr_id,
+ MemoryContext context, List *path,
+ MemoryContextCounters stat, int num_contexts,
+ dsa_area *area)
+{
+ char clipped_ident[MEMORY_CONTEXT_IDENT_SHMEM_SIZE];
+ char *name;
+ char *ident;
+ Datum *path_array;
+
+ if (context->name != NULL)
+ {
+ Assert(strlen(context->name) < MEMORY_CONTEXT_IDENT_SHMEM_SIZE);
+ memctx_info[curr_id].name = dsa_allocate0(area, strlen(context->name) + 1);
+ name = (char *) dsa_get_address(area, memctx_info[curr_id].name);
+ strncpy(name, context->name, strlen(context->name));
+ }
+ else
+ memctx_info[curr_id].name = InvalidDsaPointer;
+
+ /* Trim and copy the identifier if it is not set to NULL */
+ if (context->ident != NULL)
+ {
+ int idlen = strlen(context->ident);
+
+ /*
+ * Some identifiers such as SQL query string can be very long,
+ * truncate oversize identifiers.
+ */
+ if (idlen >= MEMORY_CONTEXT_IDENT_SHMEM_SIZE)
+ idlen = pg_mbcliplen(context->ident, idlen,
+ MEMORY_CONTEXT_IDENT_SHMEM_SIZE - 1);
+
+ memcpy(clipped_ident, context->ident, idlen);
+ clipped_ident[idlen] = '\0';
+
+ /*
+ * To be consistent with logging output, we label dynahash contexts
+ * with just the hash table name as with MemoryContextStatsPrint().
+ */
+ if (!strncmp(context->name, "dynahash", 8))
+ {
+ dsa_free(area, memctx_info[curr_id].name);
+ memctx_info[curr_id].name = dsa_allocate0(area,
+ strlen(clipped_ident) + 1);
+ name = (char *) dsa_get_address(area,
+ memctx_info[curr_id].name);
+ strlcpy(name,
+ clipped_ident, idlen + 1);
+ memctx_info[curr_id].ident = InvalidDsaPointer;
+ }
+ else
+ {
+
+ memctx_info[curr_id].ident = dsa_allocate0(area,
+ strlen(clipped_ident) + 1);
+ ident = (char *) dsa_get_address(area,
+ memctx_info[curr_id].ident);
+ strlcpy(ident,
+ clipped_ident, idlen + 1);
+ }
+ }
+ else
+ memctx_info[curr_id].ident = InvalidDsaPointer;
+ /* Allocate dsa memory for storing path information */
+ if (path == NIL)
+ memctx_info[curr_id].path = InvalidDsaPointer;
+ else
+ {
+ memctx_info[curr_id].path_length = list_length(path);
+ memctx_info[curr_id].path = dsa_allocate0(area,
+ memctx_info[curr_id].path_length
+ * sizeof(Datum));
+ path_array = (Datum *) dsa_get_address(area, memctx_info[curr_id].path);
+ foreach_int(i, path)
+ path_array[foreach_current_index(i)] = Int32GetDatum(i);
+ }
+ memctx_info[curr_id].type = ContextTypeToString(context->type);
+ memctx_info[curr_id].totalspace = stat.totalspace;
+ memctx_info[curr_id].nblocks = stat.nblocks;
+ memctx_info[curr_id].freespace = stat.freespace;
+ memctx_info[curr_id].freechunks = stat.freechunks;
+ memctx_info[curr_id].num_agg_stats = num_contexts;
+}
+
+static void
+dsa_free_previous_stats(dsa_area *area, int total_stats,
+ dsa_pointer prev_dsa_pointer)
+{
+ MemoryContextEntry *meminfo;
+
+ meminfo = (MemoryContextEntry *) dsa_get_address(area, prev_dsa_pointer);
+ for (int i = 0; i < total_stats; i++)
+ {
+ if (DsaPointerIsValid(meminfo[i].name))
+ dsa_free(area, meminfo[i].name);
+
+ if (DsaPointerIsValid(meminfo[i].ident))
+ dsa_free(area, meminfo[i].ident);
+
+ if (DsaPointerIsValid(meminfo[i].path))
+ dsa_free(area, meminfo[i].path);
+ }
+}
+
+/*
+ * Free the memory context statistics stored by this process
+ * in dsa area.
+ */
+void
+AtProcExit_memstats_dsa_free(int code, Datum arg)
+{
+ int idx = MyProcNumber;
+ dsm_segment *dsm_seg = NULL;
+ dsa_area *area = NULL;
+
+ if (memCtxArea->memstats_dsa_handle != DSA_HANDLE_INVALID)
+ dsm_seg = dsm_find_mapping(memCtxArea->memstats_dsa_handle);
+ else
+ return;
+
+ LWLockAcquire(&memCtxState[idx].lw_lock, LW_EXCLUSIVE);
+
+ if (!DsaPointerIsValid(memCtxState[idx].memstats_dsa_pointer))
+ {
+ LWLockRelease(&memCtxState[idx].lw_lock);
+ return;
+ }
+
+ /* If the dsm mapping could not be found, attach to the area */
+ if (dsm_seg == NULL)
+ area = dsa_attach(memCtxArea->memstats_dsa_handle);
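+ else
+ {
+ /* Do not return with the per-process lock still held */
+ LWLockRelease(&memCtxState[idx].lw_lock);
+ return;
+ }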
+
+ /*
+ * Free the memory context statistics, free the name, ident and path
+ * pointers before freeing the pointer that contains these pointers and
+ * integer statistics.
+ */
+ dsa_free_previous_stats(area, memCtxState[idx].total_stats,
+ memCtxState[idx].memstats_dsa_pointer);
+ dsa_free(area, memCtxState[idx].memstats_dsa_pointer);
+ memCtxState[idx].memstats_dsa_pointer = InvalidDsaPointer;
+
+ dsa_detach(area);
+ LWLockRelease(&memCtxState[idx].lw_lock);
+}
+
void *
palloc(Size size)
{
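
Note on the per-process limit in ProcessGetMemoryContextInterrupt(): the number of records is capped by dividing the memory budget by an estimated per-context size, and contexts beyond the cap are folded into one trailing record. The sketch below illustrates both steps in standalone C under stated assumptions: the 1MB initial segment size is assumed rather than taken from the patch, Entry is a simplified hypothetical record, and publish_with_overflow() shows the intended outcome, not the patch's exact boundary conditions.

```c
#include <assert.h>
#include <stddef.h>

/* Mirror the patch's defines; the 1MB segment size is an assumption. */
#define SEGMENTS_PER_BACKEND 8
#define INIT_SEGMENT_SIZE ((size_t) 1024 * 1024)
#define MAX_LEVEL 64
#define IDENT_SIZE 128

/* Hypothetical, simplified stand-in for one published record. */
typedef struct Entry
{
    long        totalspace;
    int         num_agg;        /* number of contexts this record covers */
} Entry;

/* Estimate how many per-context records fit in the per-process budget. */
static size_t
max_stats_for(size_t entry_size, size_t datum_size)
{
    size_t      per_context = entry_size + MAX_LEVEL * datum_size + 2 * IDENT_SIZE;

    return (SEGMENTS_PER_BACKEND * INIT_SEGMENT_SIZE) / per_context;
}

/*
 * Write up to "cap" records into out[]. If there are more contexts than
 * records, the last record holds the cumulative total of the remainder.
 * Returns the number of records written.
 */
static int
publish_with_overflow(const long *sizes, int n, Entry *out, int cap)
{
    int         i;

    assert(cap >= 2);

    if (n <= cap)
    {
        for (i = 0; i < n; i++)
        {
            out[i].totalspace = sizes[i];
            out[i].num_agg = 1;
        }
        return n;
    }

    /* The first cap - 1 contexts are reported individually. */
    for (i = 0; i < cap - 1; i++)
    {
        out[i].totalspace = sizes[i];
        out[i].num_agg = 1;
    }

    /* Everything else is summed into the final record. */
    out[cap - 1].totalspace = 0;
    out[cap - 1].num_agg = 0;
    for (i = cap - 1; i < n; i++)
    {
        out[cap - 1].totalspace += sizes[i];
        out[cap - 1].num_agg++;
    }
    return cap;
}
```

If the entry struct were 80 bytes and a Datum 8 bytes, the estimate works out to 848 bytes per context, so the cap would be a little under ten thousand records per process.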
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 0d29ef50ff2..e2957c3906d 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8509,6 +8509,16 @@
prorettype => 'bool', proargtypes => 'int4',
prosrc => 'pg_log_backend_memory_contexts' },
+# publishing memory contexts of the specified postgres process
+{ oid => '2173', descr => 'publish memory contexts of the specified backend',
+ proname => 'pg_get_process_memory_contexts', provolatile => 'v',
+ prorows => '100', proretset => 't', proparallel => 'r',
+ prorettype => 'record', proargtypes => 'int4 bool int4',
+ proallargtypes => '{int4,bool,int4,text,text,text,_int4,int4,int8,int8,int8,int8,int8,int4,timestamptz}',
+ proargmodes => '{i,i,i,o,o,o,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{pid, summary, num_of_tries, name, ident, type, path, level, total_bytes, total_nblocks, free_bytes, free_chunks, used_bytes, num_agg_contexts, stats_timestamp}',
+ prosrc => 'pg_get_process_memory_contexts' },
+
# non-persistent series generator
{ oid => '1066', descr => 'non-persistent series generator',
proname => 'generate_series', prorows => '1000',
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 603d0424354..d3c44df6e13 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -96,6 +96,7 @@ extern PGDLLIMPORT volatile sig_atomic_t IdleSessionTimeoutPending;
extern PGDLLIMPORT volatile sig_atomic_t ProcSignalBarrierPending;
extern PGDLLIMPORT volatile sig_atomic_t LogMemoryContextPending;
extern PGDLLIMPORT volatile sig_atomic_t IdleStatsUpdateTimeoutPending;
+extern PGDLLIMPORT volatile sig_atomic_t PublishMemoryContextPending;
extern PGDLLIMPORT volatile sig_atomic_t CheckClientConnectionPending;
extern PGDLLIMPORT volatile sig_atomic_t ClientConnectionLost;
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index 022fd8ed933..5d4b2fbfc9c 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -35,6 +35,7 @@ typedef enum
PROCSIG_WALSND_INIT_STOPPING, /* ask walsenders to prepare for shutdown */
PROCSIG_BARRIER, /* global barrier interrupt */
PROCSIG_LOG_MEMORY_CONTEXT, /* ask backend to log the memory contexts */
+ PROCSIG_GET_MEMORY_CONTEXT, /* ask backend to send the memory contexts */
PROCSIG_PARALLEL_APPLY_MESSAGE, /* Message from parallel apply workers */
/* Recovery conflict reasons */
diff --git a/src/include/utils/memutils.h b/src/include/utils/memutils.h
index 8abc26abce2..10fab7e5804 100644
--- a/src/include/utils/memutils.h
+++ b/src/include/utils/memutils.h
@@ -18,6 +18,9 @@
#define MEMUTILS_H
#include "nodes/memnodes.h"
+#include "storage/condition_variable.h"
+#include "storage/lmgr.h"
+#include "utils/dsa.h"
/*
@@ -48,7 +51,10 @@
#define AllocHugeSizeIsValid(size) ((Size) (size) <= MaxAllocHugeSize)
+#define MEMORY_CONTEXT_IDENT_SHMEM_SIZE 128
+#define MEM_CONTEXT_MAX_LEVEL 64
+#define MAX_SEGMENTS_PER_BACKEND 8
/*
* Standard top-level memory contexts.
*
@@ -319,4 +325,68 @@ pg_memory_is_all_zeros(const void *ptr, size_t len)
return true;
}
+/* Dynamic shared memory state for statistics per context */
+typedef struct MemoryContextEntry
+{
+ dsa_pointer name;
+ dsa_pointer ident;
+ dsa_pointer path;
+ const char *type;
+ int path_length;
+ int64 totalspace;
+ int64 nblocks;
+ int64 freespace;
+ int64 freechunks;
+ int num_agg_stats;
+} MemoryContextEntry;
+
+/*
+ * Static shared memory state representing the DSA area
+ * created for memory context statistics reporting.
+ * Single DSA area is created and used by all the processes,
+ * each having its specific dsa allocations for sharing memory
+ * statistics, tracked by per backend static shared memory state.
+ */
+typedef struct MemoryContextState
+{
+ dsa_handle memstats_dsa_handle;
+ LWLock lw_lock;
+} MemoryContextState;
+
+/*
+ * Per backend static shared memory state for memory
+ * context statistics reporting.
+ */
+typedef struct MemoryContextBackendState
+{
+ ConditionVariable memctx_cv;
+ LWLock lw_lock;
+ int proc_id;
+ int total_stats;
+ bool get_summary;
+ dsa_pointer memstats_dsa_pointer;
+ TimestampTz stats_timestamp;
+} MemoryContextBackendState;
+
+
+/*
+ * MemoryContextId
+ * Used for storage of transient identifiers for
+ * pg_get_backend_memory_contexts.
+ */
+typedef struct MemoryContextId
+{
+ MemoryContext context;
+ int context_id;
+} MemoryContextId;
+
+extern PGDLLIMPORT MemoryContextBackendState *memCtxState;
+extern PGDLLIMPORT MemoryContextState *memCtxArea;
+extern void ProcessGetMemoryContextInterrupt(void);
+extern const char *ContextTypeToString(NodeTag type);
+extern void HandleGetMemoryContextInterrupt(void);
+extern void MemCtxShmemInit(void);
+extern void MemCtxBackendShmemInit(void);
+extern void AtProcExit_memstats_dsa_free(int code, Datum arg);
+
#endif /* MEMUTILS_H */
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 83228cfca29..dca20ae1a26 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -232,3 +232,17 @@ select * from pg_timezone_abbrevs where abbrev = 'LMT';
LMT | @ 7 hours 52 mins 58 secs ago | f
(1 row)
+DO $$
+DECLARE
+ launcher_pid int;
+ r RECORD;
+BEGIN
+ SELECT pid from pg_stat_activity where backend_type='autovacuum launcher'
+ INTO launcher_pid;
+
+ select type, name, ident
+ from pg_get_process_memory_contexts(launcher_pid, false, 20)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+END $$;
+NOTICE: (AllocSet,TopMemoryContext,)
diff --git a/src/test/regress/sql/sysviews.sql b/src/test/regress/sql/sysviews.sql
index 66179f026b3..4767351d4e2 100644
--- a/src/test/regress/sql/sysviews.sql
+++ b/src/test/regress/sql/sysviews.sql
@@ -101,3 +101,17 @@ select count(distinct utc_offset) >= 24 as ok from pg_timezone_abbrevs;
-- One specific case we can check without much fear of breakage
-- is the historical local-mean-time value used for America/Los_Angeles.
select * from pg_timezone_abbrevs where abbrev = 'LMT';
+
+DO $$
+DECLARE
+ launcher_pid int;
+ r RECORD;
+BEGIN
+ SELECT pid from pg_stat_activity where backend_type='autovacuum launcher'
+ INTO launcher_pid;
+
+ select type, name, ident
+ from pg_get_process_memory_contexts(launcher_pid, false, 20)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+END $$;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 3fbf5a4c212..6cd7a30e0be 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1649,12 +1649,16 @@ MemoizeState
MemoizeTuple
MemoryChunk
MemoryContext
+MemoryContextBackendState
MemoryContextCallback
MemoryContextCallbackFunction
MemoryContextCounters
MemoryContextData
+MemoryContextEntry
+MemoryContextId
MemoryContextMethodID
MemoryContextMethods
+MemoryContextState
MemoryStatsPrintFunc
MergeAction
MergeActionState
--
2.39.3 (Apple Git-146)
v19-0003-Review-comments-and-fixups.patch (application/octet-stream; x-unix-mode=0644)
From d9e8a1488472db17bcb3b275488232e5db94d018 Mon Sep 17 00:00:00 2001
From: Daniel Gustafsson <dgustafsson@postgresql.org>
Date: Tue, 25 Mar 2025 13:47:52 +0100
Subject: [PATCH v19 3/3] Review comments and fixups
---
doc/src/sgml/func.sgml | 62 +++++++++++-------
src/backend/utils/adt/mcxtfuncs.c | 103 +++++++++++++++---------------
src/backend/utils/mmgr/mcxt.c | 78 +++++++++++-----------
3 files changed, 131 insertions(+), 112 deletions(-)
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 1ab2ce12662..f8a3b9d0cf6 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -28580,34 +28580,36 @@ acl | {postgres=arwdDxtm/postgres,foo=r/postgres}
</para>
<para>
This function handles requests to display the memory contexts of a
- PostgreSQL process with the specified process ID (PID). It takes three
- arguments: <parameter>PID</parameter>, <parameter>get_summary</parameter>
- and <parameter>num_of_tries</parameter>. The function can send requests
+ <productname>PostgreSQL</productname> process with the specified process ID. It takes three
+ arguments: <parameter>pid</parameter>, <parameter>get_summary</parameter>
+ and <parameter>num_of_tries</parameter>. The function can send requests
to both backend and auxiliary processes.
-
+ </para>
+ <para>
After receiving memory context statistics from the target process, it
- returns the results as one row per context. If all the contexts don't
+ returns the results as one row per context. If all the contexts don't
fit within the pre-determined size limit, the remaining context statistics
- are aggregated and a cumulative total is displayed. The num_agg_contexts
+ are aggregated and a cumulative total is displayed. The <literal>num_agg_contexts</literal>
column indicates the number of contexts aggregated in the displayed
- statistics. The num_agg_contexts value is typically 1, meaning that each
+ statistics. The <literal>num_agg_contexts</literal> value is typically 1, meaning that each
context's statistics are displayed separately.
-
- When <parameter>get_summary</parameter> is set to true, statistics
+ </para>
+ <para>
+ When <parameter>get_summary</parameter> is set to <literal>true</literal>, statistics
for memory contexts at levels 1 and 2 are displayed, with level 1
representing the root node (i.e., TopMemoryContext).
Each level 2 context's statistics represent an aggregate of all its
- child contexts' statistics, with num_agg_contexts indicating the number
+ child contexts' statistics, with <literal>num_agg_contexts</literal> indicating the number
of these aggregated child contexts.
-
- When <parameter>get_summary</parameter> is set to false, the
+ When <parameter>get_summary</parameter> is set to <literal>false</literal>, the
num_agg_contexts value is 1, indicating that individual statistics are
being displayed.
-
+ </para>
+ <para>
<parameter>num_of_tries</parameter> indicates the number of times
- the client will wait for the latest statistics. The wait per try is 1
- second. This parameter can be increased if the user anticipates a delay
- in the response from the reporting process. Conversely, if users are
+ the client will wait for the latest statistics. The wait per try is 1
+ second. This parameter can be increased if the user anticipates a delay
+ in the response from the reporting process. Conversely, if users are
frequently and periodically querying the process for statistics, or if
there are concurrent requests for statistics of the same process,
lowering the parameter might help achieve a faster response.
@@ -28740,13 +28742,29 @@ postgres=# SELECT * FROM pg_get_process_memory_contexts(
(SELECT pid FROM pg_stat_activity
WHERE backend_type = 'checkpointer')
, false, 5) LIMIT 1;
- name | ident | type | path | total_bytes | total_nblocks | free_bytes | free_chunks | used_bytes | num_
-agg_contexts | stats_timestamp
-------------------+-------+----------+------+-------------+---------------+------------+-------------+------------+-----
--------------+----------------------------------
- TopMemoryContext | | AllocSet | {1} | 102664 | 4 | 3008 | 2 | 99656 |
- 1 | 2025-03-04 10:01:57.590543+05:30
+-[ RECORD 1 ]----+------------------------------
+name | TopMemoryContext
+ident |
+type | AllocSet
+path | {1}
+level | 1
+total_bytes | 90304
+total_nblocks | 3
+free_bytes | 2880
+free_chunks | 1
+used_bytes | 87424
+num_agg_contexts | 1
+stats_timestamp | 2025-03-24 13:55:47.796698+01
</programlisting>
+ <note>
+ <para>
+ While <function>pg_get_process_memory_contexts</function> can be used to
+ query memory contexts of the local backend,
+ <structname>pg_backend_memory_contexts</structname>
+ (see <xref linkend="view-pg-backend-memory-contexts"/> for more details)
+ will be less resource intensive when only the local backend is of interest.
+ </para>
+ </note>
</para>
</sect2>
diff --git a/src/backend/utils/adt/mcxtfuncs.c b/src/backend/utils/adt/mcxtfuncs.c
index 462c4e48cf0..f9145b9faaa 100644
--- a/src/backend/utils/adt/mcxtfuncs.c
+++ b/src/backend/utils/adt/mcxtfuncs.c
@@ -330,7 +330,7 @@ pg_log_backend_memory_contexts(PG_FUNCTION_ARGS)
* to a dynamic shared memory space.
*
* We have defined a limit on dsa memory that could be allocated per process -
- * if the process has more memory contexts than that can fit in the allocated
+ * if the process has more memory contexts than what can fit in the allocated
* size, the excess contexts are summarized and represented as cumulative total
* at the end of the buffer.
*
@@ -354,6 +354,7 @@ pg_get_process_memory_contexts(PG_FUNCTION_ARGS)
bool get_summary = PG_GETARG_BOOL(1);
PGPROC *proc;
ProcNumber procNumber = INVALID_PROC_NUMBER;
+ bool proc_is_aux = false;
ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
dsa_area *area;
MemoryContextEntry *memctx_info;
@@ -368,25 +369,25 @@ pg_get_process_memory_contexts(PG_FUNCTION_ARGS)
if (!has_privs_of_role(GetUserId(), ROLE_PG_READ_ALL_STATS))
ereport(ERROR,
errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
- errmsg("memory context statistics privilege error"));
+ errmsg("permission denied to extract memory context statistics"));
InitMaterializedSRF(fcinfo, 0);
/*
- * See if the process with given pid is a backend or an auxiliary process.
+ * See if the process with given pid is a backend or an auxiliary process
+ * and remember the type for when we requery the process later.
*/
proc = BackendPidGetProc(pid);
if (proc == NULL)
+ {
proc = AuxiliaryPidGetProc(pid);
+ proc_is_aux = true;
+ }
/*
* BackendPidGetProc() and AuxiliaryPidGetProc() return NULL if the pid
- * isn't valid; but by the time we reach kill(), a process for which we
- * get a valid proc here might have terminated on its own. There's no way
- * to acquire a lock on an arbitrary process to prevent that. But since
- * this mechanism is usually used to debug a backend or an auxiliary
- * process running and consuming lots of memory, that it might end on its
- * own first and its memory contexts are not logged is not a problem.
+ * isn't valid; this is however not a problem and leave with a WARNING.
+ * See comment in pg_log_backend_memory_contexts for a discussion on this.
*/
if (proc == NULL)
{
@@ -395,8 +396,7 @@ pg_get_process_memory_contexts(PG_FUNCTION_ARGS)
* if one backend terminated on its own during the run.
*/
ereport(WARNING,
- (errmsg("PID %d is not a PostgreSQL server process",
- pid)));
+ (errmsg("PID %d is not a PostgreSQL server process", pid)));
PG_RETURN_NULL();
}
@@ -409,8 +409,8 @@ pg_get_process_memory_contexts(PG_FUNCTION_ARGS)
curr_timestamp = GetCurrentTimestamp();
/*
- * Send a signal to a postgresql process, informing it we want it to
- * produce information about memory contexts.
+ * Send a signal to a PostgreSQL process, informing it we want it to
+ * produce information about its memory contexts.
*/
if (SendProcSignal(pid, PROCSIG_GET_MEMORY_CONTEXT, procNumber) < 0)
{
@@ -442,9 +442,6 @@ pg_get_process_memory_contexts(PG_FUNCTION_ARGS)
* process to finish publishing statistics.
*/
LWLockAcquire(&memCtxState[procNumber].lw_lock, LW_EXCLUSIVE);
- msecs =
- TimestampDifferenceMilliseconds(curr_timestamp,
- memCtxState[procNumber].stats_timestamp);
/*
* Note in procnumber.h file says that a procNumber can be re-used for
@@ -460,29 +457,38 @@ pg_get_process_memory_contexts(PG_FUNCTION_ARGS)
* statistics timestamp being newer than the current request
* timestamp.
*/
+ msecs = TimestampDifferenceMilliseconds(curr_timestamp,
+ memCtxState[procNumber].stats_timestamp);
+
if (DsaPointerIsValid(memCtxState[procNumber].memstats_dsa_pointer)
&& msecs > 0)
break;
-
}
LWLockRelease(&memCtxState[procNumber].lw_lock);
/*
* Recheck the state of the backend before sleeping on the condition
- * variable
+ * variable to ensure the process is still alive. Only check the
+ * relevant process type based on the earlier PID check.
*/
- proc = BackendPidGetProc(pid);
-
-#define MEMSTATS_WAIT_TIMEOUT 1000
- if (proc == NULL)
+ if (proc_is_aux)
proc = AuxiliaryPidGetProc(pid);
+ else
+ proc = BackendPidGetProc(pid);
+
+ /*
+ * The process ending during memory context processing is not an
+ * error.
+ */
if (proc == NULL)
{
ereport(WARNING,
- errmsg("PID %d is not a PostgreSQL server process",
- pid));
+ errmsg("PID %d is not a PostgreSQL server process", pid));
PG_RETURN_NULL();
}
+
+#define MEMSTATS_WAIT_TIMEOUT 1000
+
if (ConditionVariableTimedSleep(&memCtxState[procNumber].memctx_cv,
MEMSTATS_WAIT_TIMEOUT,
WAIT_EVENT_MEM_CTX_PUBLISH))
@@ -503,12 +509,11 @@ pg_get_process_memory_contexts(PG_FUNCTION_ARGS)
PG_RETURN_NULL();
}
}
- ereport(LOG,
- errmsg("Wait for %d process to publish stats timed out, trying again",
+ ereport(DEBUG1,
+ errmsg("timed out waiting for process with PID %d to publish stats, retrying",
pid));
num_retries = num_retries + 1;
}
-
}
/* We should land here only with a valid DSA handle */
Assert(memCtxArea->memstats_dsa_handle != DSA_HANDLE_INVALID);
@@ -519,8 +524,8 @@ pg_get_process_memory_contexts(PG_FUNCTION_ARGS)
*
* Read statistics of top level 1 and 2 contexts, if get_summary is true.
*/
- memctx_info = (MemoryContextEntry *) dsa_get_address(area,
- memCtxState[procNumber].memstats_dsa_pointer);
+ memctx_info = (MemoryContextEntry *)
+ dsa_get_address(area, memCtxState[procNumber].memstats_dsa_pointer);
#define PG_GET_PROCESS_MEMORY_CONTEXTS_COLS 12
for (int i = 0; i < memCtxState[procNumber].total_stats; i++)
@@ -543,6 +548,7 @@ pg_get_process_memory_contexts(PG_FUNCTION_ARGS)
}
else
nulls[0] = true;
+
if (DsaPointerIsValid(memctx_info[i].ident))
{
ident = (char *) dsa_get_address(area, memctx_info[i].ident);
@@ -563,11 +569,11 @@ pg_get_process_memory_contexts(PG_FUNCTION_ARGS)
path_datum_array = (Datum *) dsa_get_address(area, memctx_info[i].path);
path_array = construct_array_builtin(path_datum_array,
path_length, INT4OID);
-
values[3] = PointerGetDatum(path_array);
}
else
nulls[3] = true;
+
values[4] = Int32GetDatum(path_length); /* level */
values[5] = Int64GetDatum(memctx_info[i].totalspace);
values[6] = Int64GetDatum(memctx_info[i].nblocks);
@@ -589,18 +595,6 @@ pg_get_process_memory_contexts(PG_FUNCTION_ARGS)
PG_RETURN_NULL();
}
-/*
- * Shared memory sizing for reporting memory context information.
- */
-static Size
-MemCtxShmemSize(void)
-{
- Size TotalProcs =
- add_size(MaxBackends, add_size(NUM_AUXILIARY_PROCS, max_prepared_xacts));
-
- return mul_size(TotalProcs, sizeof(MemoryContextBackendState));
-}
-
/*
* Init shared memory for reporting memory context information.
*/
@@ -608,12 +602,16 @@ void
MemCtxBackendShmemInit(void)
{
bool found;
- Size TotalProcs =
- add_size(MaxBackends, add_size(NUM_AUXILIARY_PROCS, max_prepared_xacts));
+ Size TotalProcs;
+
+ TotalProcs = add_size(MaxBackends, NUM_AUXILIARY_PROCS);
+ TotalProcs = add_size(TotalProcs, max_prepared_xacts);
+
+ memCtxState = (MemoryContextBackendState *)
+ ShmemInitStruct("MemoryContextBackendState",
+ mul_size(TotalProcs, sizeof(MemoryContextBackendState)),
+ &found);
- memCtxState = (MemoryContextBackendState *) ShmemInitStruct("MemoryContextBackendState",
- MemCtxShmemSize(),
- &found);
if (!IsUnderPostmaster)
{
Assert(!found);
@@ -622,8 +620,7 @@ MemCtxBackendShmemInit(void)
{
ConditionVariableInit(&memCtxState[i].memctx_cv);
- LWLockInitialize(&memCtxState[i].lw_lock,
- LWLockNewTrancheId());
+ LWLockInitialize(&memCtxState[i].lw_lock, LWLockNewTrancheId());
LWLockRegisterTranche(memCtxState[i].lw_lock.tranche,
"mem_context_backend_stats_reporting");
@@ -637,16 +634,16 @@ MemCtxBackendShmemInit(void)
}
/*
- * Initialize shared memory for displaying memory
- * context statistics
+ * Initialize shared memory for displaying memory context statistics
*/
void
MemCtxShmemInit(void)
{
bool found;
- memCtxArea = (MemoryContextState *) ShmemInitStruct("MemoryContextState", sizeof(MemoryContextState),
- &found);
+ memCtxArea = (MemoryContextState *)
+ ShmemInitStruct("MemoryContextState", sizeof(MemoryContextState), &found);
+
if (!IsUnderPostmaster)
{
Assert(!found);
diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index 214330aa7a5..4db48230241 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -19,7 +19,6 @@
*-------------------------------------------------------------------------
*/
-#include <math.h>
#include "postgres.h"
#include "mb/pg_wchar.h"
@@ -922,38 +921,39 @@ MemoryContextStatsInternal(MemoryContext context, int level,
check_stack_depth();
Assert(MemoryContextIsValid(context));
- if (print_location == PRINT_STATS_TO_STDERR)
- {
- /* Examine the context itself */
- context->methods->stats(context,
- MemoryContextStatsPrint,
- &level,
- totals, true);
- }
- else if (print_location == PRINT_STATS_TO_LOGS)
+ switch (print_location)
{
+ case PRINT_STATS_TO_STDERR:
+ /* Examine the context itself */
+ context->methods->stats(context,
+ MemoryContextStatsPrint,
+ &level,
+ totals, true);
+ break;
- /* Examine the context itself */
- context->methods->stats(context,
- MemoryContextStatsPrint,
- &level,
- totals, false);
- }
+ case PRINT_STATS_TO_LOGS:
+ /* Examine the context itself */
+ context->methods->stats(context,
+ MemoryContextStatsPrint,
+ &level,
+ totals, false);
+ break;
- /*
- * Do not print the statistics if print_to_stderr is PRINT_STATS_NONE,
- * only compute totals. This is used in reporting of memory context
- * statistics via a sql function. Last parameter is not relevant.
- */
- else
- {
- Assert(print_location == PRINT_STATS_NONE);
- /* Examine the context itself */
- context->methods->stats(context,
- NULL,
- NULL,
- totals, false);
+ case PRINT_STATS_NONE:
+
+ /*
+ * Do not print the statistics if print_location is
+ * PRINT_STATS_NONE, only compute totals. This is used in
+ * reporting of memory context statistics via a sql function. Last
+ * parameter is not relevant.
+ */
+ context->methods->stats(context,
+ NULL,
+ NULL,
+ totals, false);
+ break;
}
+
/* Increment the context count for each of the recursive call */
*num_contexts = *num_contexts + 1;
@@ -1601,6 +1601,7 @@ ProcessGetMemoryContextInterrupt(void)
memCtxState[idx].total_stats = ctx_id;
goto cleanup;
}
+
foreach_ptr(MemoryContextData, cur, contexts)
{
List *path = NIL;
@@ -1665,6 +1666,7 @@ ProcessGetMemoryContextInterrupt(void)
*/
memCtxState[idx].total_stats = num_individual_stats + 1;
}
+
cleanup:
/*
@@ -1680,8 +1682,8 @@ cleanup:
}
/*
- * Append the transient context_id of this context and each of
- * its ancestors to a list, in order to compute a path.
+ * Append the transient context_id of this context and each of its ancestors
+ * to a list, in order to compute a path.
*/
static List *
compute_context_path(MemoryContext c, HTAB *context_id_lookup)
@@ -1763,6 +1765,8 @@ PublishMemoryContext(MemoryContextEntry *memctx_info, int curr_id,
char *ident;
Datum *path_array;
+ Assert(MemoryContextIsValid(context));
+
if (context->name != NULL)
{
Assert(strlen(context->name) < MEMORY_CONTEXT_IDENT_SHMEM_SIZE);
@@ -1817,6 +1821,7 @@ PublishMemoryContext(MemoryContextEntry *memctx_info, int curr_id,
}
else
memctx_info[curr_id].ident = InvalidDsaPointer;
+
/* Allocate dsa memory for storing path information */
if (path == NIL)
memctx_info[curr_id].path = InvalidDsaPointer;
@@ -1869,11 +1874,11 @@ AtProcExit_memstats_dsa_free(int code, Datum arg)
dsm_segment *dsm_seg = NULL;
dsa_area *area = NULL;
- if (memCtxArea->memstats_dsa_handle != DSA_HANDLE_INVALID)
- dsm_seg = dsm_find_mapping(memCtxArea->memstats_dsa_handle);
- else
+ if (memCtxArea->memstats_dsa_handle == DSA_HANDLE_INVALID)
return;
+ dsm_seg = dsm_find_mapping(memCtxArea->memstats_dsa_handle);
+
LWLockAcquire(&memCtxState[idx].lw_lock, LW_EXCLUSIVE);
if (!DsaPointerIsValid(memCtxState[idx].memstats_dsa_pointer))
@@ -1883,10 +1888,9 @@ AtProcExit_memstats_dsa_free(int code, Datum arg)
}
/* If the dsm mapping could not be found, attach to the area */
- if (dsm_seg == NULL)
- area = dsa_attach(memCtxArea->memstats_dsa_handle);
- else
+ if (dsm_seg != NULL)
return;
+ area = dsa_attach(memCtxArea->memstats_dsa_handle);
/*
* Free the memory context statistics, free the name, ident and path
--
2.39.3 (Apple Git-146)
Hi Daniel,
Thank you for your review.
I have incorporated all your changes in the v20 patches and ensured that the
review comments corresponding to the 0001 patch are included in that patch
and not in 0002.
+MEM_CTX_PUBLISH "Waiting for backend to publish memory information."
I wonder if this should really be "process" and not backend?
Fixed.
+ default:
+ context_type = "???";
+ break;
In ContextTypeToString() I'm having second thoughts about this, there
shouldn't be any legitimate use-case of passing a nodetag to this function
which would fail MemoryContextIsValid(). I wonder if we aren't helping
callers more by erroring out rather than silently returning an unknown? I
haven't changed this but maybe we should, to set the API right from the start?
I cannot think of any legitimate scenario where the context type would be
unknown. However, if we were to throw an error, it would prevent us from
reporting any memory usage information when the context type is unidentified.
Perhaps it would be more informative and less restrictive to label it as
"Unrecognized" or "Unknown." I wonder if this was the reasoning behind doing
it when it was added with the pg_get_backend_memory_contexts() function.
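The trade-off discussed above can be illustrated with a small standalone sketch. It is not the patch's code: the enum, its values, and both function names are hypothetical stand-ins for the real node tags and for ContextTypeToString(). The lenient variant keeps reporting with a placeholder string, while the strict variant hands the caller a NULL that it would have to treat as an error.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the memory context node tags. */
typedef enum SketchContextTag
{
    SKETCH_ALLOCSET,
    SKETCH_GENERATION,
    SKETCH_SLAB,
    SKETCH_BUMP,
    SKETCH_INVALID = 99         /* a tag that is not a memory context */
} SketchContextTag;

/*
 * Lenient variant: an unrecognized tag maps to a placeholder, so the
 * remaining statistics of that context can still be reported.
 */
static const char *
sketch_type_to_string(SketchContextTag tag)
{
    switch (tag)
    {
        case SKETCH_ALLOCSET:
            return "AllocSet";
        case SKETCH_GENERATION:
            return "Generation";
        case SKETCH_SLAB:
            return "Slab";
        case SKETCH_BUMP:
            return "Bump";
        default:
            return "???";
    }
}

/*
 * Strict variant: an unrecognized tag yields NULL, which forces the
 * caller to stop and report an error instead of a row.
 */
static const char *
sketch_type_to_string_strict(SketchContextTag tag)
{
    const char *type = sketch_type_to_string(tag);

    return (strcmp(type, "???") == 0) ? NULL : type;
}
```

With the lenient variant a row for an odd context still reaches the user; with the strict one the whole report for that process would be lost, which is the concern raised in the reply.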
+ /*
+ * Recheck the state of the backend before sleeping on the condition
+ * variable
+ */
+ proc = BackendPidGetProc(pid);
Here we are really rechecking that the process is still alive, but I wonder
if we should take the opportunity to ensure that the type is what we expect
it to be? If the pid has moved from being a backend to an aux proc or vice
versa we really don't want to go on.
The reasoning makes sense to me. For periodic monitoring of all processes,
any PID that reincarnates into a different type could be queried in
subsequent function calls. Regarding targeted monitoring of a specific
process, such a reincarnated process would exhibit completely different
memory consumption, likely not aligning with the user's original intent
behind requesting the statistics.
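The type-aware recheck can be sketched outside the server as follows. This is only an illustration under stated assumptions: the process table and every sketch_ name are hypothetical, standing in for BackendPidGetProc(), AuxiliaryPidGetProc() and the proc_is_aux flag from the 0003 fixup patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical process table entry; not a real PGPROC. */
typedef struct SketchProc
{
    int         pid;
    bool        is_aux;
} SketchProc;

static SketchProc sketch_procs[] = {{100, false}, {200, true}};

#define SKETCH_NPROCS (sizeof(sketch_procs) / sizeof(sketch_procs[0]))

/* Look up a PID among processes of one type only. */
static SketchProc *
sketch_find(int pid, bool want_aux)
{
    for (size_t i = 0; i < SKETCH_NPROCS; i++)
    {
        if (sketch_procs[i].pid == pid && sketch_procs[i].is_aux == want_aux)
            return &sketch_procs[i];
    }
    return NULL;
}

/* First lookup: try backends, then auxiliary processes; remember the type. */
static SketchProc *
sketch_initial_lookup(int pid, bool *is_aux)
{
    SketchProc *proc = sketch_find(pid, false);

    *is_aux = false;
    if (proc == NULL)
    {
        proc = sketch_find(pid, true);
        *is_aux = (proc != NULL);
    }
    return proc;
}

/*
 * Recheck before sleeping: consult only the remembered type, so a PID
 * that now belongs to a process of the other type is treated as gone.
 */
static SketchProc *
sketch_recheck(int pid, bool was_aux)
{
    return sketch_find(pid, was_aux);
}
```

If sketch_recheck() returns NULL, the caller would leave with the "no longer the same PostgreSQL server process" warning rather than wait for statistics from an unrelated process.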
+ ereport(WARNING,
+ errmsg("PID %d is not a PostgreSQL server process",
+ pid));
I wonder if we should differentiate between the warnings? When we hit this
in the loop the errmsg is describing a slightly different case. I did leave
it for now, but it's food for thought if we should perhaps reword this one.
Changed it to "PID %d is no longer the same PostgreSQL server process".
+ ereport(LOG,
+ errmsg("hash table corrupted, can't construct path value"));
I know you switched from elog(LOG.. to ereport(LOG.. but I still think a LOG
entry stating corruption isn't helpful, it's not actionable for the user.
Given that it's a case that shouldn't happen I wonder if we should downgrade
it to an Assert(false) and potentially a DEBUG1?
How about changing it to ERROR, in accordance with current occurrences of
the same message? I did it in the attached version, however I am open to
changing it to an Assert(false) and DEBUG1.
Apart from the above, I made the following improvements.
1. Eliminated the unnecessary creation of an extra memory context before
calling hash_create. The hash_create function already generates a memory
context containing the hash table, enabling easy memory deallocation by
simply deleting the context via hash_destroy. Therefore, the patch relies on
hash_destroy for memory management instead of manual freeing.
2. Optimized memory usage by storing the path as an array of integers
rather than as an array of Datums. This approach conserves DSA memory
allocated for storing this information.
3. Made miscellaneous comment cleanups and introduced macros to simplify
calculations.
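The saving from item 2 can be estimated with a rough standalone sketch. It assumes a Datum is pointer-width, which is modeled here by the hypothetical SketchDatum typedef; the function names are also illustrative and do not appear in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: a Datum is as wide as a pointer on the build platform. */
typedef uintptr_t SketchDatum;

/* Bytes needed to store one context's path as an array of Datums. */
static size_t
sketch_path_bytes_as_datums(int path_length)
{
    return (size_t) path_length * sizeof(SketchDatum);
}

/* Bytes needed to store the same path as an array of plain integers. */
static size_t
sketch_path_bytes_as_ints(int path_length)
{
    return (size_t) path_length * sizeof(int);
}
```

On a build where pointers are 8 bytes and int is 4 bytes, the integer representation needs half the space per path entry; the saving applies to every context whose path is published, up to MEM_CONTEXT_MAX_LEVEL entries each.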
Thank you,
Rahila Syed
Attachments:
v20-0002-Function-to-report-memory-context-statistics.patch (application/octet-stream)
From 3416b7238a22a1cb70036184f3e589280d92695e Mon Sep 17 00:00:00 2001
From: Rahila Syed <rahilasyed.90@gmail.com>
Date: Wed, 26 Mar 2025 15:08:16 +0530
Subject: [PATCH 2/2] Function to report memory context statistics
This function sends a signal to a backend to publish
statistics of all its memory contexts. The signal handler
running in the backend process sets a flag, which causes
it to copy its MemoryContextStats to a DSA during the
next call to CHECK_FOR_INTERRUPTS().
If there are more statistics than fit in 16MB, the
remaining statistics are copied as a cumulative
total of the remaining contexts.
Once it is done, it signals the client backend using
a condition variable. The client backend wakes up, reads
the shared memory and returns these values in the form
of a set of records, one for each memory context, to the
user, followed by a cumulative total of the remaining
contexts, if any.
If get_summary is true, return statistics of all children
of TopMemoryContext with aggregated statistics of their
children.
The user can pass num_of_tries, which determines the total
number of wait cycles in a client backend for the latest
statistics.
The wait timeout for each cycle is 1 second. After that,
the client displays previously published statistics or
returns without results.
Each backend and auxiliary process has its own slot for
reporting the stats. There is an array of such memory slots
of size MaxBackends + the number of auxiliary processes
in fixed shared memory. Each of these slots points
to a smaller DSA allocation within a single DSA,
which contains the stats to be shared by the corresponding
process.
Each slot has its own LWLock and condition variable for
synchronization and communication between the publishing
process and the client backend.
---
doc/src/sgml/func.sgml | 79 +++
src/backend/postmaster/autovacuum.c | 4 +
src/backend/postmaster/checkpointer.c | 4 +
src/backend/postmaster/interrupt.c | 4 +
src/backend/postmaster/pgarch.c | 4 +
src/backend/postmaster/startup.c | 4 +
src/backend/postmaster/walsummarizer.c | 4 +
src/backend/storage/ipc/ipci.c | 3 +
src/backend/storage/ipc/procsignal.c | 3 +
src/backend/storage/lmgr/proc.c | 15 +
src/backend/tcop/postgres.c | 3 +
.../utils/activity/wait_event_names.txt | 1 +
src/backend/utils/adt/mcxtfuncs.c | 415 +++++++++++++-
src/backend/utils/init/globals.c | 1 +
src/backend/utils/mmgr/mcxt.c | 529 ++++++++++++++++++
src/include/catalog/pg_proc.dat | 10 +
src/include/miscadmin.h | 1 +
src/include/storage/procsignal.h | 1 +
src/include/utils/memutils.h | 72 +++
src/test/regress/expected/sysviews.out | 14 +
src/test/regress/sql/sysviews.sql | 14 +
src/tools/pgindent/typedefs.list | 4 +
22 files changed, 1160 insertions(+), 29 deletions(-)
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index f8c1deb04e..cfe4badb83 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -28570,6 +28570,52 @@ acl | {postgres=arwdDxtm/postgres,foo=r/postgres}
</para></entry>
</row>
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>pg_get_process_memory_contexts</primary>
+ </indexterm>
+ <function>pg_get_process_memory_contexts</function> ( <parameter>pid</parameter> <type>integer</type>, <parameter>get_summary</parameter> <type>boolean</type>, <parameter>num_of_tries</parameter> <type>integer</type> )
+ <returnvalue>setof record</returnvalue>
+ </para>
+ <para>
+ This function handles requests to display the memory contexts of a
+ <productname>PostgreSQL</productname> process with the specified process ID. It takes three
+ arguments: <parameter>pid</parameter>, <parameter>get_summary</parameter>
+ and <parameter>num_of_tries</parameter>. The function can send requests
+ to both backend and auxiliary processes.
+ </para>
+ <para>
+ After receiving memory context statistics from the target process, it
+ returns the results as one row per context. If all the contexts don't
+ fit within the pre-determined size limit, the remaining context statistics
+ are aggregated and a cumulative total is displayed. The <literal>num_agg_contexts</literal>
+ column indicates the number of contexts aggregated in the displayed
+ statistics. The <literal>num_agg_contexts</literal> value is typically 1, meaning that each
+ context's statistics are displayed separately.
+ </para>
+ <para>
+ When <parameter>get_summary</parameter> is set to <literal>true</literal>, statistics
+ for memory contexts at levels 1 and 2 are displayed, with level 1
+ representing the root node (i.e., TopMemoryContext).
+ Each level 2 context's statistics represent an aggregate of all its
+ child contexts' statistics, with <literal>num_agg_contexts</literal> indicating the number
+ of these aggregated child contexts.
+ When <parameter>get_summary</parameter> is set to <literal>false</literal>, the
+ num_agg_contexts value is 1, indicating that individual statistics are
+ being displayed.
+ </para>
+ <para>
+ <parameter>num_of_tries</parameter> indicates the number of times
+ the client will wait for the latest statistics. The wait per try is 1
+ second. This parameter can be increased if the user anticipates a delay
+ in the response from the reporting process. Conversely, if users are
+ frequently and periodically querying the process for statistics, or if
+ there are concurrent requests for statistics of the same process,
+ lowering the parameter might help achieve a faster response.
+ </para></entry>
+ </row>
+
<row>
<entry role="func_table_entry"><para role="func_signature">
<indexterm>
@@ -28688,6 +28734,39 @@ LOG: Grand total: 1651920 bytes in 201 blocks; 622360 free (88 chunks); 1029560
because it may generate a large number of log messages.
</para>
+ <para>
+ <function>pg_get_process_memory_contexts</function> can be used
+ to request the memory contexts statistics of any postgres process. For example:
+<programlisting>
+postgres=# SELECT * FROM pg_get_process_memory_contexts(
+ (SELECT pid FROM pg_stat_activity
+ WHERE backend_type = 'checkpointer')
+ , false, 5) LIMIT 1;
+-[ RECORD 1 ]----+------------------------------
+name | TopMemoryContext
+ident |
+type | AllocSet
+path | {1}
+level | 1
+total_bytes | 90304
+total_nblocks | 3
+free_bytes | 2880
+free_chunks | 1
+used_bytes | 87424
+num_agg_contexts | 1
+stats_timestamp | 2025-03-24 13:55:47.796698+01
+</programlisting>
+ <note>
+ <para>
+ While <function>pg_get_process_memory_contexts</function> can be used to
+ query memory contexts of the local backend,
+ <structname>pg_backend_memory_contexts</structname>
+ (see <xref linkend="view-pg-backend-memory-contexts"/> for more details)
+ will be less resource intensive when only the local backend is of interest.
+ </para>
+ </note>
+ </para>
+
</sect2>
<sect2 id="functions-admin-backup">
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 2513a8ef8a..16756152b7 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -781,6 +781,10 @@ ProcessAutoVacLauncherInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
/* Process sinval catchup interrupts that happened while sleeping */
ProcessCatchupInterrupt();
}
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index fda91ffd1c..d3cb3f1891 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -663,6 +663,10 @@ ProcessCheckpointerInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/postmaster/interrupt.c b/src/backend/postmaster/interrupt.c
index 0ae9bf906e..f24f574e74 100644
--- a/src/backend/postmaster/interrupt.c
+++ b/src/backend/postmaster/interrupt.c
@@ -48,6 +48,10 @@ ProcessMainLoopInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 7e622ae4bd..cb7408acf4 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -867,6 +867,10 @@ ProcessPgArchInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
if (ConfigReloadPending)
{
char *archiveLib = pstrdup(XLogArchiveLibrary);
diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c
index 27e86cf393..7149a67fcb 100644
--- a/src/backend/postmaster/startup.c
+++ b/src/backend/postmaster/startup.c
@@ -192,6 +192,10 @@ ProcessStartupProcInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
diff --git a/src/backend/postmaster/walsummarizer.c b/src/backend/postmaster/walsummarizer.c
index 0fec4f1f87..c7a76711cc 100644
--- a/src/backend/postmaster/walsummarizer.c
+++ b/src/backend/postmaster/walsummarizer.c
@@ -879,6 +879,10 @@ ProcessWalSummarizerInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 2fa045e6b0..362a6dc952 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -51,6 +51,7 @@
#include "storage/sinvaladt.h"
#include "utils/guc.h"
#include "utils/injection_point.h"
+#include "utils/memutils.h"
/* GUCs */
int shared_memory_type = DEFAULT_SHARED_MEMORY_TYPE;
@@ -343,6 +344,8 @@ CreateOrAttachShmemStructs(void)
WaitEventCustomShmemInit();
InjectionPointShmemInit();
AioShmemInit();
+ MemCtxShmemInit();
+ MemCtxBackendShmemInit();
}
/*
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index 7d20196550..b59034fdc3 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -690,6 +690,9 @@ procsignal_sigusr1_handler(SIGNAL_ARGS)
if (CheckProcSignal(PROCSIG_LOG_MEMORY_CONTEXT))
HandleLogMemoryContextInterrupt();
+ if (CheckProcSignal(PROCSIG_GET_MEMORY_CONTEXT))
+ HandleGetMemoryContextInterrupt();
+
if (CheckProcSignal(PROCSIG_PARALLEL_APPLY_MESSAGE))
HandleParallelApplyMessageInterrupt();
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index e4ca861a8e..c50dcbc491 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -50,6 +50,7 @@
#include "storage/procsignal.h"
#include "storage/spin.h"
#include "storage/standby.h"
+#include "utils/memutils.h"
#include "utils/timeout.h"
#include "utils/timestamp.h"
@@ -497,6 +498,13 @@ InitProcess(void)
*/
PGSemaphoreReset(MyProc->sem);
+ /*
+ * The before shmem exit callback frees the dsa memory occupied by the
+ * latest memory context statistics that could be published by this
+ * backend if requested.
+ */
+ before_shmem_exit(AtProcExit_memstats_dsa_free, 0);
+
/*
* Arrange to clean up at backend exit.
*/
@@ -671,6 +679,13 @@ InitAuxiliaryProcess(void)
*/
PGSemaphoreReset(MyProc->sem);
+ /*
+ * The before shmem exit callback frees the dsa memory occupied by the
+ * latest memory context statistics that could be published by this
+ * process if requested.
+ */
+ before_shmem_exit(AtProcExit_memstats_dsa_free, 0);
+
/*
* Arrange to clean up at process exit.
*/
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 4d2edb1065..08d17a1931 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3531,6 +3531,9 @@ ProcessInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
if (ParallelApplyMessagePending)
ProcessParallelApplyMessages();
}
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 9fa12a555e..746496d122 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -160,6 +160,7 @@ WAL_RECEIVER_EXIT "Waiting for the WAL receiver to exit."
WAL_RECEIVER_WAIT_START "Waiting for startup process to send initial data for streaming replication."
WAL_SUMMARY_READY "Waiting for a new WAL summary to be generated."
XACT_GROUP_UPDATE "Waiting for the group leader to update transaction status at transaction end."
+MEM_CTX_PUBLISH "Waiting for a process to publish memory information."
ABI_compatibility:
diff --git a/src/backend/utils/adt/mcxtfuncs.c b/src/backend/utils/adt/mcxtfuncs.c
index 396c2f223b..83f513ad61 100644
--- a/src/backend/utils/adt/mcxtfuncs.c
+++ b/src/backend/utils/adt/mcxtfuncs.c
@@ -17,28 +17,26 @@
#include "funcapi.h"
#include "mb/pg_wchar.h"
+#include "miscadmin.h"
+#include "access/twophase.h"
+#include "catalog/pg_authid_d.h"
+#include "nodes/pg_list.h"
#include "storage/proc.h"
#include "storage/procarray.h"
+#include "utils/acl.h"
#include "utils/array.h"
#include "utils/builtins.h"
#include "utils/hsearch.h"
+#include "utils/memutils.h"
+#include "utils/wait_event_types.h"
/* ----------
* The max bytes for showing identifiers of MemoryContext.
* ----------
*/
#define MEMORY_CONTEXT_IDENT_DISPLAY_SIZE 1024
-
-/*
- * MemoryContextId
- * Used for storage of transient identifiers for
- * pg_get_backend_memory_contexts.
- */
-typedef struct MemoryContextId
-{
- MemoryContext context;
- int context_id;
-} MemoryContextId;
+struct MemoryContextBackendState *memCtxState = NULL;
+struct MemoryContextState *memCtxArea = NULL;
/*
* int_list_to_array
@@ -143,24 +141,7 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
else
nulls[1] = true;
- switch (context->type)
- {
- case T_AllocSetContext:
- type = "AllocSet";
- break;
- case T_GenerationContext:
- type = "Generation";
- break;
- case T_SlabContext:
- type = "Slab";
- break;
- case T_BumpContext:
- type = "Bump";
- break;
- default:
- type = "???";
- break;
- }
+ type = ContextTypeToString(context->type);
values[2] = CStringGetTextDatum(type);
values[3] = Int32GetDatum(list_length(path)); /* level */
@@ -175,6 +156,32 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
list_free(path);
}
+const char *
+ContextTypeToString(NodeTag type)
+{
+ const char *context_type;
+
+ switch (type)
+ {
+ case T_AllocSetContext:
+ context_type = "AllocSet";
+ break;
+ case T_GenerationContext:
+ context_type = "Generation";
+ break;
+ case T_SlabContext:
+ context_type = "Slab";
+ break;
+ case T_BumpContext:
+ context_type = "Bump";
+ break;
+ default:
+ context_type = "???";
+ break;
+ }
+ return (context_type);
+}
+
/*
* pg_get_backend_memory_contexts
* SQL SRF showing backend memory context.
@@ -305,3 +312,353 @@ pg_log_backend_memory_contexts(PG_FUNCTION_ARGS)
PG_RETURN_BOOL(true);
}
+
+/*
+ * pg_get_process_memory_contexts
+ * Signal a backend or an auxiliary process to send its memory contexts,
+ * wait for the results and display them.
+ *
+ * By default, only superusers or users with PG_READ_ALL_STATS are allowed to
+ * signal a process to return the memory contexts. This is because allowing
+ * any users to issue this request at an unbounded rate would cause lots of
+ * requests to be sent, which can lead to denial of service. Additional roles
+ * can be permitted with GRANT.
+ *
+ * On receipt of this signal, a backend or an auxiliary process sets the flag
+ * in the signal handler, which causes the next CHECK_FOR_INTERRUPTS()
+ * or process-specific interrupt handler to copy the memory context details
+ * to a dynamic shared memory space.
+ *
+ * We have defined a limit on dsa memory that could be allocated per process -
+ * if the process has more memory contexts than what can fit in the allocated
+ * size, the excess contexts are summarized and represented as cumulative total
+ * at the end of the buffer.
+ *
+ * After sending the signal, wait on a condition variable. The publishing
+ * backend, after copying the data to shared memory, broadcasts on that
+ * condition variable. There is one condition variable per publishing
+ * backend.
+ * Once the condition variable is signalled, check if the latest memory
+ * context information is available and display it.
+ *
+ * If the publishing backend does not respond before the condition variable
+ * times out, which is set to MEMSTATS_WAIT_TIMEOUT, retry up to max_tries
+ * times, as specified by the user, before giving up and returning
+ * previously published statistics, if any. If previous statistics
+ * do not exist, return NULL.
+ */
+Datum
+pg_get_process_memory_contexts(PG_FUNCTION_ARGS)
+{
+ int pid = PG_GETARG_INT32(0);
+ bool get_summary = PG_GETARG_BOOL(1);
+ PGPROC *proc;
+ ProcNumber procNumber = INVALID_PROC_NUMBER;
+ bool proc_is_aux = false;
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ dsa_area *area;
+ MemoryContextEntry *memctx_info;
+ int num_retries = 0;
+ TimestampTz curr_timestamp;
+ int max_tries = PG_GETARG_INT32(2);
+
+ /*
+ * Only superusers or users with pg_read_all_stats privileges can view the
+ * memory context statistics of another process
+ */
+ if (!has_privs_of_role(GetUserId(), ROLE_PG_READ_ALL_STATS))
+ ereport(ERROR,
+ errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+ errmsg("permission denied to extract memory context statistics"));
+
+ InitMaterializedSRF(fcinfo, 0);
+
+ /*
+ * See if the process with given pid is a backend or an auxiliary process
+ * and remember the type for when we requery the process later.
+ */
+ proc = BackendPidGetProc(pid);
+ if (proc == NULL)
+ {
+ proc = AuxiliaryPidGetProc(pid);
+ proc_is_aux = true;
+ }
+
+ /*
+ * BackendPidGetProc() and AuxiliaryPidGetProc() return NULL if the pid
+ * isn't valid; this is however not a problem, so just leave with a WARNING.
+ * See comment in pg_log_backend_memory_contexts for a discussion on this.
+ */
+ if (proc == NULL)
+ {
+ /*
+ * This is just a warning so a loop-through-resultset will not abort
+ * if one backend terminated on its own during the run.
+ */
+ ereport(WARNING,
+ (errmsg("PID %d is not a PostgreSQL server process", pid)));
+ PG_RETURN_NULL();
+ }
+
+ procNumber = GetNumberFromPGProc(proc);
+
+ LWLockAcquire(&memCtxState[procNumber].lw_lock, LW_EXCLUSIVE);
+ memCtxState[procNumber].get_summary = get_summary;
+ LWLockRelease(&memCtxState[procNumber].lw_lock);
+
+ curr_timestamp = GetCurrentTimestamp();
+
+ /*
+ * Send a signal to a PostgreSQL process, informing it we want it to
+ * produce information about its memory contexts.
+ */
+ if (SendProcSignal(pid, PROCSIG_GET_MEMORY_CONTEXT, procNumber) < 0)
+ {
+ ereport(WARNING,
+ errmsg("could not send signal to process %d: %m", pid));
+ PG_RETURN_NULL();
+ }
+
+ /*
+ * A valid DSA pointer isn't proof that statistics are available; it can
+ * be valid due to previously published stats. Check if the stats are
+ * updated by comparing the timestamp: if the stats are newer than our
+ * previously recorded timestamp from before sending the procsignal, they
+ * must by definition be updated. Wait for max_tries *
+ * MEMSTATS_WAIT_TIMEOUT, following which display old statistics if
+ * available or return NULL.
+ */
+ while (1)
+ {
+ long msecs;
+
+ /*
+ * We expect to come out of sleep when the requested process has
+ * finished publishing the statistics, verified using the valid dsa
+ * pointer.
+ *
+ * Make sure that the information belongs to the pid we requested
+ * information for; otherwise loop back and wait for the server
+ * process to finish publishing statistics.
+ */
+ LWLockAcquire(&memCtxState[procNumber].lw_lock, LW_EXCLUSIVE);
+
+ /*
+ * A note in procnumber.h says that a procNumber can be re-used for
+ * a different backend immediately after a backend exits. In case an
+ * old process' data was there and not updated by the current process
+ * in the slot identified by the procNumber, the pid of the requested
+ * process and the proc_id might not match.
+ */
+ if (memCtxState[procNumber].proc_id == pid)
+ {
+ /*
+ * Break if the latest stats have been read, indicated by
+ * statistics timestamp being newer than the current request
+ * timestamp.
+ */
+ msecs = TimestampDifferenceMilliseconds(curr_timestamp,
+ memCtxState[procNumber].stats_timestamp);
+
+ if (DsaPointerIsValid(memCtxState[procNumber].memstats_dsa_pointer)
+ && msecs > 0)
+ break;
+ }
+ LWLockRelease(&memCtxState[procNumber].lw_lock);
+
+ /*
+ * Recheck the state of the backend before sleeping on the condition
+ * variable to ensure the process is still alive. Only check the
+ * relevant process type based on the earlier PID check.
+ */
+ if (proc_is_aux)
+ proc = AuxiliaryPidGetProc(pid);
+ else
+ proc = BackendPidGetProc(pid);
+
+ /*
+ * The process ending during memory context processing is not an
+ * error.
+ */
+ if (proc == NULL)
+ {
+ ereport(WARNING,
+ errmsg("PID %d is no longer the same PostgreSQL server process", pid));
+ PG_RETURN_NULL();
+ }
+
+#define MEMSTATS_WAIT_TIMEOUT 1000
+
+ if (ConditionVariableTimedSleep(&memCtxState[procNumber].memctx_cv,
+ MEMSTATS_WAIT_TIMEOUT,
+ WAIT_EVENT_MEM_CTX_PUBLISH))
+ {
+ /*
+ * Retry up to max_tries times, as defined by the user; once they are
+ * exhausted, display previously published statistics, if any.
+ */
+ if (num_retries > max_tries)
+ {
+ LWLockAcquire(&memCtxState[procNumber].lw_lock, LW_EXCLUSIVE);
+ /* Displaying previously published statistics */
+ if (DsaPointerIsValid(memCtxState[procNumber].memstats_dsa_pointer))
+ break;
+ else
+ {
+ LWLockRelease(&memCtxState[procNumber].lw_lock);
+ PG_RETURN_NULL();
+ }
+ }
+ ereport(DEBUG1,
+ errmsg("timed out waiting for process with PID %d to publish stats, retrying",
+ pid));
+ num_retries = num_retries + 1;
+ }
+ }
+ /* We should land here only with a valid DSA handle */
+ Assert(memCtxArea->memstats_dsa_handle != DSA_HANDLE_INVALID);
+ area = dsa_attach(memCtxArea->memstats_dsa_handle);
+
+ /*
+ * The backend has finished publishing the stats, so read them.
+ *
+ * If get_summary is true, read statistics of the level 1 and 2 contexts.
+ */
+ memctx_info = (MemoryContextEntry *)
+ dsa_get_address(area, memCtxState[procNumber].memstats_dsa_pointer);
+
+#define PG_GET_PROCESS_MEMORY_CONTEXTS_COLS 12
+ for (int i = 0; i < memCtxState[procNumber].total_stats; i++)
+ {
+ ArrayType *path_array;
+ int path_length;
+ Datum values[PG_GET_PROCESS_MEMORY_CONTEXTS_COLS];
+ bool nulls[PG_GET_PROCESS_MEMORY_CONTEXTS_COLS];
+ char *name;
+ char *ident;
+ Datum *path_datum = NULL;
+ int *path_int = NULL;
+
+ memset(values, 0, sizeof(values));
+ memset(nulls, 0, sizeof(nulls));
+
+ if (DsaPointerIsValid(memctx_info[i].name))
+ {
+ name = (char *) dsa_get_address(area, memctx_info[i].name);
+ values[0] = CStringGetTextDatum(name);
+ }
+ else
+ nulls[0] = true;
+
+ if (DsaPointerIsValid(memctx_info[i].ident))
+ {
+ ident = (char *) dsa_get_address(area, memctx_info[i].ident);
+ values[1] = CStringGetTextDatum(ident);
+ }
+ else
+ nulls[1] = true;
+
+ if (memctx_info[i].type != NULL)
+ values[2] = CStringGetTextDatum(memctx_info[i].type);
+ else
+ nulls[2] = true;
+
+ path_length = memctx_info[i].path_length;
+ path_datum = (Datum *) palloc(path_length * sizeof(Datum));
+ if (DsaPointerIsValid(memctx_info[i].path))
+ {
+ path_int = (int *) dsa_get_address(area, memctx_info[i].path);
+ for (int j = 0; j < path_length; j++)
+ path_datum[j] = Int32GetDatum(path_int[j]);
+ path_array = construct_array_builtin(path_datum,
+ path_length, INT4OID);
+ values[3] = PointerGetDatum(path_array);
+ }
+ else
+ nulls[3] = true;
+
+ values[4] = Int32GetDatum(path_length); /* level */
+ values[5] = Int64GetDatum(memctx_info[i].totalspace);
+ values[6] = Int64GetDatum(memctx_info[i].nblocks);
+ values[7] = Int64GetDatum(memctx_info[i].freespace);
+ values[8] = Int64GetDatum(memctx_info[i].freechunks);
+ values[9] = Int64GetDatum(memctx_info[i].totalspace -
+ memctx_info[i].freespace);
+ values[10] = Int32GetDatum(memctx_info[i].num_agg_stats);
+ values[11] = TimestampTzGetDatum(memCtxState[procNumber].stats_timestamp);
+
+ tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc,
+ values, nulls);
+ }
+ LWLockRelease(&memCtxState[procNumber].lw_lock);
+
+ ConditionVariableCancelSleep();
+ dsa_detach(area);
+
+ PG_RETURN_NULL();
+}
+
+/*
+ * Init shared memory for reporting memory context information.
+ */
+void
+MemCtxBackendShmemInit(void)
+{
+ bool found;
+ Size TotalProcs;
+
+ TotalProcs = add_size(MaxBackends, NUM_AUXILIARY_PROCS);
+ TotalProcs = add_size(TotalProcs, max_prepared_xacts);
+
+ memCtxState = (MemoryContextBackendState *)
+ ShmemInitStruct("MemoryContextBackendState",
+ mul_size(TotalProcs, sizeof(MemoryContextBackendState)),
+ &found);
+
+ if (!IsUnderPostmaster)
+ {
+ Assert(!found);
+
+ for (int i = 0; i < TotalProcs; i++)
+ {
+ ConditionVariableInit(&memCtxState[i].memctx_cv);
+
+ LWLockInitialize(&memCtxState[i].lw_lock, LWLockNewTrancheId());
+ LWLockRegisterTranche(memCtxState[i].lw_lock.tranche,
+ "mem_context_backend_stats_reporting");
+
+ memCtxState[i].memstats_dsa_pointer = InvalidDsaPointer;
+ }
+ }
+ else
+ {
+ Assert(found);
+ }
+}
+
+/*
+ * Initialize shared memory for displaying memory context statistics
+ */
+void
+MemCtxShmemInit(void)
+{
+ bool found;
+
+ memCtxArea = (MemoryContextState *)
+ ShmemInitStruct("MemoryContextState", sizeof(MemoryContextState), &found);
+
+ if (!IsUnderPostmaster)
+ {
+ Assert(!found);
+
+ LWLockInitialize(&memCtxArea->lw_lock,
+ LWLockNewTrancheId());
+ LWLockRegisterTranche(memCtxArea->lw_lock.tranche,
+ "mem_context_stats_reporting");
+ memCtxArea->memstats_dsa_handle = DSA_HANDLE_INVALID;
+ }
+ else
+ {
+ Assert(found);
+ }
+}
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index b844f9fdae..13938ccb0f 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -38,6 +38,7 @@ volatile sig_atomic_t TransactionTimeoutPending = false;
volatile sig_atomic_t IdleSessionTimeoutPending = false;
volatile sig_atomic_t ProcSignalBarrierPending = false;
volatile sig_atomic_t LogMemoryContextPending = false;
+volatile sig_atomic_t PublishMemoryContextPending = false;
volatile sig_atomic_t IdleStatsUpdateTimeoutPending = false;
volatile uint32 InterruptHoldoffCount = 0;
volatile uint32 QueryCancelHoldoffCount = 0;
diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index 05af956930..3979e121ea 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -23,6 +23,13 @@
#include "mb/pg_wchar.h"
#include "miscadmin.h"
+#include "nodes/pg_list.h"
+#include "storage/fd.h"
+#include "storage/lwlock.h"
+#include "storage/dsm.h"
+#include "storage/ipc.h"
+#include "utils/dsa.h"
+#include "utils/hsearch.h"
#include "utils/memdebug.h"
#include "utils/memutils.h"
#include "utils/memutils_internal.h"
@@ -177,6 +184,16 @@ static void MemoryContextStatsInternal(MemoryContext context, int level,
static void MemoryContextStatsPrint(MemoryContext context, void *passthru,
const char *stats_string,
bool print_to_stderr);
+static void PublishMemoryContext(MemoryContextEntry *memctx_infos,
+ int curr_id, MemoryContext context,
+ List *path,
+ MemoryContextCounters stat,
+ int num_contexts, dsa_area *area);
+static void compute_contexts_count_and_ids(List *contexts, HTAB *context_id_lookup,
+ int *stats_count,
+ bool get_summary);
+static List *compute_context_path(MemoryContext c, HTAB *context_id_lookup);
+static void dsa_free_previous_stats(dsa_area *area, int total_stats, dsa_pointer prev_dsa_pointer);
/*
* You should not do memory allocations within a critical section, because
@@ -1331,6 +1348,21 @@ HandleLogMemoryContextInterrupt(void)
/* latch will be set by procsignal_sigusr1_handler */
}
+/*
+ * HandleGetMemoryContextInterrupt
+ * Handle receipt of an interrupt indicating publishing of memory
+ * contexts.
+ *
+ * All the actual work is deferred to ProcessGetMemoryContextInterrupt()
+ */
+void
+HandleGetMemoryContextInterrupt(void)
+{
+ InterruptPending = true;
+ PublishMemoryContextPending = true;
+ /* latch will be set by procsignal_sigusr1_handler */
+}
+
/*
* ProcessLogMemoryContextInterrupt
* Perform logging of memory contexts of this backend process.
@@ -1368,6 +1400,503 @@ ProcessLogMemoryContextInterrupt(void)
MemoryContextStatsDetail(TopMemoryContext, 100, 100, false);
}
+/*
+ * ProcessGetMemoryContextInterrupt
+ * Generate information about memory contexts used by the process.
+ *
+ * Performs a breadth first search on the memory context tree, thus parent
+ * statistics are reported before their children in the monitoring function
+ * output.
+ *
+ * Statistics for all the processes are shared via the same dynamic shared area.
+ * Statistics written by each process are tracked independently in per-process
+ * dsa pointers. These pointers are stored in static shared memory.
+ *
+ * We calculate the maximum number of contexts' statistics that can be
+ * displayed using a pre-determined limit for memory available per process
+ * for this utility and the maximum size of statistics for each context.
+ * The remaining context statistics, if any, are captured as a cumulative total
+ * at the end of the individual contexts' statistics.
+ *
+ * If get_summary is true, we capture the level 1 and level 2 contexts statistics.
+ * For that we traverse the memory context tree recursively in depth first search
+ * manner to cover all the children of a parent context, to be able to display a
+ * cumulative total of memory consumption by a parent at level 2 and all its
+ * children.
+ */
+void
+ProcessGetMemoryContextInterrupt(void)
+{
+ List *contexts;
+
+ HASHCTL ctl;
+ HTAB *context_id_lookup;
+ int context_id = 0;
+ MemoryContextEntry *meminfo;
+ bool get_summary = false;
+
+ dsa_area *area = NULL;
+ int max_stats;
+ int idx = MyProcNumber;
+ int stats_count = 0;
+ MemoryContextCounters stat;
+ int num_individual_stats = 0;
+
+ PublishMemoryContextPending = false;
+
+ /*
+ * The hash table is used for constructing "path" column of the view,
+ * similar to its local backend counterpart.
+ */
+ ctl.keysize = sizeof(MemoryContext);
+ ctl.entrysize = sizeof(MemoryContextId);
+ ctl.hcxt = CurrentMemoryContext;
+
+ context_id_lookup = hash_create("pg_get_remote_backend_memory_contexts",
+ 256,
+ &ctl,
+ HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+
+ /* List of contexts to process in the next round - start at the top. */
+ contexts = list_make1(TopMemoryContext);
+
+ /* Compute the number of stats that can fit in the defined limit */
+ max_stats = (MAX_SEGMENTS_PER_BACKEND * DSA_DEFAULT_INIT_SEGMENT_SIZE)
+ / (MAX_MEMORY_CONTEXT_STATS_SIZE);
+ LWLockAcquire(&memCtxState[idx].lw_lock, LW_EXCLUSIVE);
+ get_summary = memCtxState[idx].get_summary;
+ LWLockRelease(&memCtxState[idx].lw_lock);
+
+ /*
+ * Traverse the memory context tree to find total number of contexts. If
+ * summary is requested report the total number of contexts at level 1 and
+ * 2 from the top. Also, populate the hash table of context ids.
+ */
+ compute_contexts_count_and_ids(contexts, context_id_lookup, &stats_count,
+ get_summary);
+
+ /*
+ * Allocate memory in this process's dsa for storing statistics of the
+ * memory contexts up to max_stats; for contexts that don't fit within the
+ * limit, a cumulative total is written as the last record in the DSA
+ * segment.
+ */
+ stats_count = (stats_count > max_stats) ? max_stats : stats_count;
+
+ LWLockAcquire(&memCtxArea->lw_lock, LW_EXCLUSIVE);
+
+ /*
+ * Create a DSA and send the handle to the client process after storing
+ * the context statistics. If the number of contexts exceeds what fits in a
+ * predefined limit (8MB), a cumulative total is stored for such contexts.
+ */
+ if (memCtxArea->memstats_dsa_handle == DSA_HANDLE_INVALID)
+ {
+ MemoryContext oldcontext = CurrentMemoryContext;
+ dsa_handle handle;
+
+ MemoryContextSwitchTo(TopMemoryContext);
+
+ area = dsa_create(memCtxArea->lw_lock.tranche);
+
+ handle = dsa_get_handle(area);
+ MemoryContextSwitchTo(oldcontext);
+
+ dsa_pin_mapping(area);
+
+ /*
+ * Pin the dsa area, this is to make sure the area remains attachable
+ * even if current backend exits. This is done so that a waiting
+ * client gets the stats even after a process exits.
+ */
+ dsa_pin(area);
+
+ /* Set the handle in shared memory */
+ memCtxArea->memstats_dsa_handle = handle;
+ }
+
+ /*
+ * If DSA exists, created by another process publishing statistics, or by
+ * the previous execution of this function by this process, attach to it.
+ */
+ else if (area == NULL)
+ {
+ MemoryContext oldcontext = CurrentMemoryContext;
+
+ MemoryContextSwitchTo(TopMemoryContext);
+ area = dsa_attach(memCtxArea->memstats_dsa_handle);
+ MemoryContextSwitchTo(oldcontext);
+ dsa_pin_mapping(area);
+ }
+ LWLockRelease(&memCtxArea->lw_lock);
+
+ /*
+ * Hold the process lock to protect writes to process specific memory. Two
+ * processes publishing statistics do not block each other.
+ */
+ LWLockAcquire(&memCtxState[idx].lw_lock, LW_EXCLUSIVE);
+ memCtxState[idx].proc_id = MyProcPid;
+
+ if (DsaPointerIsValid(memCtxState[idx].memstats_dsa_pointer))
+ {
+ /*
+ * Free any previous allocations, free the name, ident and path
+ * pointers before freeing the pointer that contains them.
+ */
+ dsa_free_previous_stats(area, memCtxState[idx].total_stats,
+ memCtxState[idx].memstats_dsa_pointer);
+
+ dsa_free(area, memCtxState[idx].memstats_dsa_pointer);
+ memCtxState[idx].memstats_dsa_pointer = InvalidDsaPointer;
+ }
+ memCtxState[idx].memstats_dsa_pointer = dsa_allocate0(area,
+ stats_count * sizeof(MemoryContextEntry));
+
+ meminfo = (MemoryContextEntry *) dsa_get_address(area,
+ memCtxState[idx].memstats_dsa_pointer);
+
+ if (get_summary)
+ {
+ int ctx_id = 0;
+ List *path = NIL;
+
+ /* Copy TopMemoryContext statistics to DSA */
+ memset(&stat, 0, sizeof(stat));
+ (*TopMemoryContext->methods->stats) (TopMemoryContext, NULL, NULL,
+ &stat, true);
+ path = lcons_int(1, path);
+ PublishMemoryContext(meminfo, ctx_id, TopMemoryContext, path, stat,
+ 1, area);
+ ctx_id = ctx_id + 1;
+
+ /*
+ * Copy statistics for each of TopMemoryContext's children. This
+ * includes statistics of all of their children up to level 100.
+ */
+ for (MemoryContext c = TopMemoryContext->firstchild; c != NULL;
+ c = c->nextchild)
+ {
+ MemoryContextCounters grand_totals;
+ int num_contexts = 0;
+
+ path = NIL;
+ memset(&grand_totals, 0, sizeof(grand_totals));
+
+ MemoryContextStatsInternal(c, 0, 100, 100, &grand_totals,
+ PRINT_STATS_NONE, &num_contexts);
+
+ path = compute_context_path(c, context_id_lookup);
+
+ PublishMemoryContext(meminfo, ctx_id, c, path,
+ grand_totals, num_contexts, area);
+ ctx_id = ctx_id + 1;
+ }
+ memCtxState[idx].total_stats = ctx_id;
+ goto cleanup;
+ }
+
+ foreach_ptr(MemoryContextData, cur, contexts)
+ {
+ List *path = NIL;
+ char *name;
+
+ /*
+ * Figure out the transient context_id of this context and each of its
+ * ancestors, to compute a path for this context.
+ */
+ path = compute_context_path(cur, context_id_lookup);
+
+ if (context_id <= (max_stats - 2))
+ {
+ /* Examine the context stats */
+ memset(&stat, 0, sizeof(stat));
+ (*cur->methods->stats) (cur, NULL, NULL, &stat, true);
+ /* Copy statistics to DSA memory */
+ PublishMemoryContext(meminfo, context_id, cur, path, stat, 1, area);
+ }
+ else
+ {
+ /* Examine the context stats */
+ memset(&stat, 0, sizeof(stat));
+ (*cur->methods->stats) (cur, NULL, NULL, &stat, true);
+
+ meminfo[max_stats - 1].totalspace += stat.totalspace;
+ meminfo[max_stats - 1].nblocks += stat.nblocks;
+ meminfo[max_stats - 1].freespace += stat.freespace;
+ meminfo[max_stats - 1].freechunks += stat.freechunks;
+ }
+
+ /*
+ * DSA max limit per process is reached, write aggregate of the
+ * remaining statistics.
+ */
+ if (context_id == (max_stats - 2) && context_id < (stats_count - 1))
+ {
+ num_individual_stats = context_id + 1;
+ meminfo[max_stats - 1].name = dsa_allocate0(area, 17);
+ name = dsa_get_address(area, meminfo[max_stats - 1].name);
+ strncpy(name, "Remaining Totals", 16);
+ meminfo[max_stats - 1].ident = InvalidDsaPointer;
+ meminfo[max_stats - 1].path = InvalidDsaPointer;
+ meminfo[max_stats - 1].type = NULL;
+ }
+ context_id++;
+ }
+ /* Statistics are not aggregated, i.e., individual statistics are reported */
+ if (context_id < (max_stats - 2))
+ {
+ memCtxState[idx].total_stats = context_id;
+ }
+ /* Report number of aggregated memory contexts */
+ else
+ {
+ meminfo[max_stats - 1].num_agg_stats = context_id -
+ num_individual_stats;
+
+ /*
+ * Total stats equals num_individual_stats + 1 record for cumulative
+ * statistics.
+ */
+ memCtxState[idx].total_stats = num_individual_stats + 1;
+ }
+
+cleanup:
+
+ /*
+ * Signal all the waiting client backends after copying all the statistics
+ */
+ memCtxState[idx].stats_timestamp = GetCurrentTimestamp();
+ LWLockRelease(&memCtxState[idx].lw_lock);
+ ConditionVariableBroadcast(&memCtxState[idx].memctx_cv);
+ hash_destroy(context_id_lookup);
+ dsa_detach(area);
+}
+
+/*
+ * Append the transient context_id of this context and each of its ancestors
+ * to a list, in order to compute a path.
+ */
+static List *
+compute_context_path(MemoryContext c, HTAB *context_id_lookup)
+{
+ bool found;
+ List *path = NIL;
+
+ for (MemoryContext cur_context = c; cur_context != NULL; cur_context = cur_context->parent)
+ {
+ MemoryContextId *cur_entry;
+
+ cur_entry = hash_search(context_id_lookup, &cur_context, HASH_FIND, &found);
+
+ if (!found)
+ {
+ ereport(ERROR,
+ errmsg("hash table corrupted, can't construct path value"));
+ break;
+ }
+ path = lcons_int(cur_entry->context_id, path);
+ }
+ return path;
+}
+
+/*
+ * Compute the number of contexts currently allocated by the backend and
+ * assign context ids to each of the contexts.
+ */
+static void
+compute_contexts_count_and_ids(List *contexts, HTAB *context_id_lookup,
+ int *stats_count, bool get_summary)
+{
+ foreach_ptr(MemoryContextData, cur, contexts)
+ {
+ MemoryContextId *entry;
+ bool found;
+
+ entry = (MemoryContextId *) hash_search(context_id_lookup, &cur,
+ HASH_ENTER, &found);
+ Assert(!found);
+
+ /* context id starts with 1 */
+ entry->context_id = (++(*stats_count));
+
+ /* Append the children of the current context to the main list. */
+ for (MemoryContext c = cur->firstchild; c != NULL; c = c->nextchild)
+ {
+ if (get_summary)
+ {
+ entry = (MemoryContextId *) hash_search(context_id_lookup, &c,
+ HASH_ENTER, &found);
+ Assert(!found);
+
+ entry->context_id = (++(*stats_count));
+ }
+
+ contexts = lappend(contexts, c);
+ }
+
+ /*
+ * In summary mode only the contexts in the first two levels (from the
+ * top) are displayed.
+ */
+ if (get_summary)
+ break;
+ }
+
+}
+
+/* Copy the memory context statistics of a single context to a dsa memory */
+static void
+PublishMemoryContext(MemoryContextEntry *memctx_info, int curr_id,
+ MemoryContext context, List *path,
+ MemoryContextCounters stat, int num_contexts,
+ dsa_area *area)
+{
+ char clipped_ident[MEMORY_CONTEXT_IDENT_SHMEM_SIZE];
+ char *name;
+ char *ident;
+ int *path_list;
+
+ Assert(MemoryContextIsValid(context));
+
+ if (context->name != NULL)
+ {
+ Assert(strlen(context->name) < MEMORY_CONTEXT_IDENT_SHMEM_SIZE);
+ memctx_info[curr_id].name = dsa_allocate0(area, strlen(context->name) + 1);
+ name = (char *) dsa_get_address(area, memctx_info[curr_id].name);
+ strncpy(name, context->name, strlen(context->name));
+ }
+ else
+ memctx_info[curr_id].name = InvalidDsaPointer;
+
+ /* Trim and copy the identifier if it is not set to NULL */
+ if (context->ident != NULL)
+ {
+ int idlen = strlen(context->ident);
+
+ /*
+ * Some identifiers such as SQL query string can be very long,
+ * truncate oversize identifiers.
+ */
+ if (idlen >= MEMORY_CONTEXT_IDENT_SHMEM_SIZE)
+ idlen = pg_mbcliplen(context->ident, idlen,
+ MEMORY_CONTEXT_IDENT_SHMEM_SIZE - 1);
+
+ memcpy(clipped_ident, context->ident, idlen);
+ clipped_ident[idlen] = '\0';
+
+ /*
+ * To be consistent with logging output, we label dynahash contexts
+ * with just the hash table name as with MemoryContextStatsPrint().
+ */
+ if (!strncmp(context->name, "dynahash", 8))
+ {
+ dsa_free(area, memctx_info[curr_id].name);
+ memctx_info[curr_id].name = dsa_allocate0(area,
+ strlen(clipped_ident) + 1);
+ name = (char *) dsa_get_address(area,
+ memctx_info[curr_id].name);
+ strlcpy(name,
+ clipped_ident, idlen + 1);
+ memctx_info[curr_id].ident = InvalidDsaPointer;
+ }
+ else
+ {
+
+ memctx_info[curr_id].ident = dsa_allocate0(area,
+ strlen(clipped_ident) + 1);
+ ident = (char *) dsa_get_address(area,
+ memctx_info[curr_id].ident);
+ strlcpy(ident,
+ clipped_ident, idlen + 1);
+ }
+ }
+ else
+ memctx_info[curr_id].ident = InvalidDsaPointer;
+
+ /* Allocate dsa memory for storing path information */
+ if (path == NIL)
+ memctx_info[curr_id].path = InvalidDsaPointer;
+ else
+ {
+ memctx_info[curr_id].path_length = list_length(path);
+ memctx_info[curr_id].path = dsa_allocate0(area,
+ memctx_info[curr_id].path_length
+ * sizeof(int));
+ path_list = (int *) dsa_get_address(area, memctx_info[curr_id].path);
+ foreach_int(i, path)
+ path_list[foreach_current_index(i)] = i;
+ }
+ memctx_info[curr_id].type = ContextTypeToString(context->type);
+ memctx_info[curr_id].totalspace = stat.totalspace;
+ memctx_info[curr_id].nblocks = stat.nblocks;
+ memctx_info[curr_id].freespace = stat.freespace;
+ memctx_info[curr_id].freechunks = stat.freechunks;
+ memctx_info[curr_id].num_agg_stats = num_contexts;
+}
+
+static void
+dsa_free_previous_stats(dsa_area *area, int total_stats,
+ dsa_pointer prev_dsa_pointer)
+{
+ MemoryContextEntry *meminfo;
+
+ meminfo = (MemoryContextEntry *) dsa_get_address(area, prev_dsa_pointer);
+ for (int i = 0; i < total_stats; i++)
+ {
+ if (DsaPointerIsValid(meminfo[i].name))
+ dsa_free(area, meminfo[i].name);
+
+ if (DsaPointerIsValid(meminfo[i].ident))
+ dsa_free(area, meminfo[i].ident);
+
+ if (DsaPointerIsValid(meminfo[i].path))
+ dsa_free(area, meminfo[i].path);
+ }
+}
+
+/*
+ * Free the memory context statistics stored by this process
+ * in dsa area.
+ */
+void
+AtProcExit_memstats_dsa_free(int code, Datum arg)
+{
+ int idx = MyProcNumber;
+ dsm_segment *dsm_seg = NULL;
+ dsa_area *area = NULL;
+
+ if (memCtxArea->memstats_dsa_handle == DSA_HANDLE_INVALID)
+ return;
+
+ dsm_seg = dsm_find_mapping(memCtxArea->memstats_dsa_handle);
+
+ LWLockAcquire(&memCtxState[idx].lw_lock, LW_EXCLUSIVE);
+
+ if (!DsaPointerIsValid(memCtxState[idx].memstats_dsa_pointer))
+ {
+ LWLockRelease(&memCtxState[idx].lw_lock);
+ return;
+ }
+
+ /* If the dsm mapping could not be found, attach to the area */
+ if (dsm_seg != NULL)
+ return;
+ area = dsa_attach(memCtxArea->memstats_dsa_handle);
+
+ /*
+ * Free the memory context statistics, free the name, ident and path
+ * pointers before freeing the pointer that contains these pointers and
+ * integer statistics.
+ */
+ dsa_free_previous_stats(area, memCtxState[idx].total_stats,
+ memCtxState[idx].memstats_dsa_pointer);
+ dsa_free(area, memCtxState[idx].memstats_dsa_pointer);
+ memCtxState[idx].memstats_dsa_pointer = InvalidDsaPointer;
+
+ dsa_detach(area);
+ LWLockRelease(&memCtxState[idx].lw_lock);
+}
+
void *
palloc(Size size)
{
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 3f7b82e02b..760d345820 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8509,6 +8509,16 @@
prorettype => 'bool', proargtypes => 'int4',
prosrc => 'pg_log_backend_memory_contexts' },
+# publishing memory contexts of the specified postgres process
+{ oid => '2173', descr => 'publish memory contexts of the specified backend',
+ proname => 'pg_get_process_memory_contexts', provolatile => 'v',
+ prorows => '100', proretset => 't', proparallel => 'r',
+ prorettype => 'record', proargtypes => 'int4 bool int4',
+ proallargtypes => '{int4,bool,int4,text,text,text,_int4,int4,int8,int8,int8,int8,int8,int4,timestamptz}',
+ proargmodes => '{i,i,i,o,o,o,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{oid, summary, num_of_tries, name, ident, type, path, level, total_bytes, total_nblocks, free_bytes, free_chunks, used_bytes, num_agg_contexts, stats_timestamp}',
+ prosrc => 'pg_get_process_memory_contexts' },
+
# non-persistent series generator
{ oid => '1066', descr => 'non-persistent series generator',
proname => 'generate_series', prorows => '1000',
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 603d042435..d3c44df6e1 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -96,6 +96,7 @@ extern PGDLLIMPORT volatile sig_atomic_t IdleSessionTimeoutPending;
extern PGDLLIMPORT volatile sig_atomic_t ProcSignalBarrierPending;
extern PGDLLIMPORT volatile sig_atomic_t LogMemoryContextPending;
extern PGDLLIMPORT volatile sig_atomic_t IdleStatsUpdateTimeoutPending;
+extern PGDLLIMPORT volatile sig_atomic_t PublishMemoryContextPending;
extern PGDLLIMPORT volatile sig_atomic_t CheckClientConnectionPending;
extern PGDLLIMPORT volatile sig_atomic_t ClientConnectionLost;
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index 022fd8ed93..5d4b2fbfc9 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -35,6 +35,7 @@ typedef enum
PROCSIG_WALSND_INIT_STOPPING, /* ask walsenders to prepare for shutdown */
PROCSIG_BARRIER, /* global barrier interrupt */
PROCSIG_LOG_MEMORY_CONTEXT, /* ask backend to log the memory contexts */
+ PROCSIG_GET_MEMORY_CONTEXT, /* ask backend to send the memory contexts */
PROCSIG_PARALLEL_APPLY_MESSAGE, /* Message from parallel apply workers */
/* Recovery conflict reasons */
diff --git a/src/include/utils/memutils.h b/src/include/utils/memutils.h
index 8abc26abce..ecbdd69597 100644
--- a/src/include/utils/memutils.h
+++ b/src/include/utils/memutils.h
@@ -18,6 +18,9 @@
#define MEMUTILS_H
#include "nodes/memnodes.h"
+#include "storage/condition_variable.h"
+#include "storage/lmgr.h"
+#include "utils/dsa.h"
/*
@@ -48,7 +51,12 @@
#define AllocHugeSizeIsValid(size) ((Size) (size) <= MaxAllocHugeSize)
+#define MEMORY_CONTEXT_IDENT_SHMEM_SIZE 128
+#define MEM_CONTEXT_MAX_LEVEL 64
+#define MAX_SEGMENTS_PER_BACKEND 8
+#define MEM_CONTEXT_PATH_SIZE MEM_CONTEXT_MAX_LEVEL * sizeof(int)
+#define MAX_MEMORY_CONTEXT_STATS_SIZE sizeof(MemoryContextEntry) + MEM_CONTEXT_PATH_SIZE + 2 * MEMORY_CONTEXT_IDENT_SHMEM_SIZE
/*
* Standard top-level memory contexts.
*
@@ -319,4 +327,68 @@ pg_memory_is_all_zeros(const void *ptr, size_t len)
return true;
}
+/* Dynamic shared memory state for statistics per context */
+typedef struct MemoryContextEntry
+{
+ dsa_pointer name;
+ dsa_pointer ident;
+ dsa_pointer path;
+ const char *type;
+ int path_length;
+ int64 totalspace;
+ int64 nblocks;
+ int64 freespace;
+ int64 freechunks;
+ int num_agg_stats;
+} MemoryContextEntry;
+
+/*
+ * Static shared memory state representing the DSA area
+ * created for memory context statistics reporting.
+ * Single DSA area is created and used by all the processes,
+ * each having its specific dsa allocations for sharing memory
+ * statistics, tracked by per backend static shared memory state.
+ */
+typedef struct MemoryContextState
+{
+ dsa_handle memstats_dsa_handle;
+ LWLock lw_lock;
+} MemoryContextState;
+
+/*
+ * Per backend static shared memory state for memory
+ * context statistics reporting.
+ */
+typedef struct MemoryContextBackendState
+{
+ ConditionVariable memctx_cv;
+ LWLock lw_lock;
+ int proc_id;
+ int total_stats;
+ bool get_summary;
+ dsa_pointer memstats_dsa_pointer;
+ TimestampTz stats_timestamp;
+} MemoryContextBackendState;
+
+
+/*
+ * MemoryContextId
+ * Used for storage of transient identifiers for
+ * pg_get_backend_memory_contexts.
+ */
+typedef struct MemoryContextId
+{
+ MemoryContext context;
+ int context_id;
+} MemoryContextId;
+
+extern PGDLLIMPORT MemoryContextBackendState *memCtxState;
+extern PGDLLIMPORT MemoryContextState *memCtxArea;
+extern void ProcessGetMemoryContextInterrupt(void);
+extern const char *ContextTypeToString(NodeTag type);
+extern void HandleGetMemoryContextInterrupt(void);
+extern void MemCtxShmemInit(void);
+extern void MemCtxBackendShmemInit(void);
+extern void AtProcExit_memstats_dsa_free(int code, Datum arg);
+
#endif /* MEMUTILS_H */
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 83228cfca2..dca20ae1a2 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -232,3 +232,17 @@ select * from pg_timezone_abbrevs where abbrev = 'LMT';
LMT | @ 7 hours 52 mins 58 secs ago | f
(1 row)
+DO $$
+DECLARE
+ launcher_pid int;
+ r RECORD;
+BEGIN
+ SELECT pid from pg_stat_activity where backend_type='autovacuum launcher'
+ INTO launcher_pid;
+
+ select type, name, ident
+ from pg_get_process_memory_contexts(launcher_pid, false, 20)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+END $$;
+NOTICE: (AllocSet,TopMemoryContext,)
diff --git a/src/test/regress/sql/sysviews.sql b/src/test/regress/sql/sysviews.sql
index 66179f026b..4767351d4e 100644
--- a/src/test/regress/sql/sysviews.sql
+++ b/src/test/regress/sql/sysviews.sql
@@ -101,3 +101,17 @@ select count(distinct utc_offset) >= 24 as ok from pg_timezone_abbrevs;
-- One specific case we can check without much fear of breakage
-- is the historical local-mean-time value used for America/Los_Angeles.
select * from pg_timezone_abbrevs where abbrev = 'LMT';
+
+DO $$
+DECLARE
+ launcher_pid int;
+ r RECORD;
+BEGIN
+ SELECT pid from pg_stat_activity where backend_type='autovacuum launcher'
+ INTO launcher_pid;
+
+ select type, name, ident
+ from pg_get_process_memory_contexts(launcher_pid, false, 20)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+END $$;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 3fbf5a4c21..6cd7a30e0b 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1649,12 +1649,16 @@ MemoizeState
MemoizeTuple
MemoryChunk
MemoryContext
+MemoryContextBackendState
MemoryContextCallback
MemoryContextCallbackFunction
MemoryContextCounters
MemoryContextData
+MemoryContextEntry
+MemoryContextId
MemoryContextMethodID
MemoryContextMethods
+MemoryContextState
MemoryStatsPrintFunc
MergeAction
MergeActionState
--
2.34.1
v20-0001-Preparatory-changes-for-reporting-memory-context-sta.patch (application/octet-stream)
From 4d534f76678f0eb0fce00d68c520773d18d3295f Mon Sep 17 00:00:00 2001
From: Rahila Syed <rahilasyed.90@gmail.com>
Date: Wed, 26 Mar 2025 14:38:39 +0530
Subject: [PATCH 1/2] Preparatory changes for reporting memory context
statistics
Ensure that MemoryContextStatsInternal can return number of
contexts. Also, provide an option in MemoryContextStatsInternal
to return without printing stats to either stderr or logs.
---
src/backend/utils/mmgr/mcxt.c | 77 ++++++++++++++++++++++++++++++-----
1 file changed, 66 insertions(+), 11 deletions(-)
diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index 91060de0ab..05af956930 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -135,6 +135,17 @@ static const MemoryContextMethods mcxt_methods[] = {
};
#undef BOGUS_MCTX
+/*
+ * This is passed to MemoryContextStatsInternal to determine whether
+ * to print context statistics and, if so, whether they go to the logs
+ * or to stderr.
+ */
+typedef enum PrintDestination
+{
+ PRINT_STATS_TO_STDERR = 0,
+ PRINT_STATS_TO_LOGS,
+ PRINT_STATS_NONE
+} PrintDestination;
/*
* CurrentMemoryContext
@@ -162,7 +173,7 @@ static void MemoryContextCallResetCallbacks(MemoryContext context);
static void MemoryContextStatsInternal(MemoryContext context, int level,
int max_level, int max_children,
MemoryContextCounters *totals,
- bool print_to_stderr);
+ PrintDestination print_location, int *num_contexts);
static void MemoryContextStatsPrint(MemoryContext context, void *passthru,
const char *stats_string,
bool print_to_stderr);
@@ -831,11 +842,19 @@ MemoryContextStatsDetail(MemoryContext context,
bool print_to_stderr)
{
MemoryContextCounters grand_totals;
+ int num_contexts = 0;
+ PrintDestination print_location;
memset(&grand_totals, 0, sizeof(grand_totals));
+ if (print_to_stderr)
+ print_location = PRINT_STATS_TO_STDERR;
+ else
+ print_location = PRINT_STATS_TO_LOGS;
+
+ /* num_contexts reports the number of contexts aggregated in the output */
MemoryContextStatsInternal(context, 0, max_level, max_children,
- &grand_totals, print_to_stderr);
+ &grand_totals, print_location, &num_contexts);
if (print_to_stderr)
fprintf(stderr,
@@ -870,13 +889,14 @@ MemoryContextStatsDetail(MemoryContext context,
* One recursion level for MemoryContextStats
*
* Print stats for this context if possible, but in any case accumulate counts
- * into *totals (if not NULL).
+ * into *totals (if not NULL). The callers should make sure that print_location
+ * is set to PRINT_STATS_TO_STDERR or PRINT_STATS_TO_LOGS or PRINT_STATS_NONE.
*/
static void
MemoryContextStatsInternal(MemoryContext context, int level,
int max_level, int max_children,
MemoryContextCounters *totals,
- bool print_to_stderr)
+ PrintDestination print_location, int *num_contexts)
{
MemoryContext child;
int ichild;
@@ -884,10 +904,39 @@ MemoryContextStatsInternal(MemoryContext context, int level,
Assert(MemoryContextIsValid(context));
/* Examine the context itself */
- context->methods->stats(context,
- MemoryContextStatsPrint,
- &level,
- totals, print_to_stderr);
+ switch (print_location)
+ {
+ case PRINT_STATS_TO_STDERR:
+ context->methods->stats(context,
+ MemoryContextStatsPrint,
+ &level,
+ totals, true);
+ break;
+
+ case PRINT_STATS_TO_LOGS:
+ context->methods->stats(context,
+ MemoryContextStatsPrint,
+ &level,
+ totals, false);
+ break;
+
+ case PRINT_STATS_NONE:
+
+ /*
+ * Do not print the statistics if print_location is
+ * PRINT_STATS_NONE, only compute totals. This is used in
+ * reporting of memory context statistics via a sql function. Last
+ * parameter is not relevant.
+ */
+ context->methods->stats(context,
+ NULL,
+ NULL,
+ totals, false);
+ break;
+ }
+
+ /* Increment the context count for each recursive call */
+ *num_contexts = *num_contexts + 1;
/*
* Examine children.
@@ -907,7 +956,7 @@ MemoryContextStatsInternal(MemoryContext context, int level,
MemoryContextStatsInternal(child, level + 1,
max_level, max_children,
totals,
- print_to_stderr);
+ print_location, num_contexts);
}
}
@@ -926,7 +975,13 @@ MemoryContextStatsInternal(MemoryContext context, int level,
child = MemoryContextTraverseNext(child, context);
}
- if (print_to_stderr)
+ /*
+ * Add the count of children contexts which are traversed in the
+ * non-recursive manner.
+ */
+ *num_contexts = *num_contexts + ichild;
+
+ if (print_location == PRINT_STATS_TO_STDERR)
{
for (int i = 0; i <= level; i++)
fprintf(stderr, " ");
@@ -939,7 +994,7 @@ MemoryContextStatsInternal(MemoryContext context, int level,
local_totals.freechunks,
local_totals.totalspace - local_totals.freespace);
}
- else
+ else if (print_location == PRINT_STATS_TO_LOGS)
ereport(LOG_SERVER_ONLY,
(errhidestmt(true),
errhidecontext(true),
--
2.34.1
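The idea in the preparatory patch above, threading a context counter through the recursive walk and allowing a destination that suppresses printing, can be sketched in isolation. This is only an illustration under stated assumptions: SketchContext, SketchDestination and sketch_stats are hypothetical stand-ins rather than PostgreSQL types or functions, and the max_level/max_children handling of the real function is left out. Note that the caller has to zero both accumulators before the first call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the patch's PrintDestination enum. */
typedef enum SketchDestination
{
	SKETCH_TO_STDERR,
	SKETCH_NONE
} SketchDestination;

/* Hypothetical, heavily simplified memory context node. */
typedef struct SketchContext
{
	const char *name;
	size_t		totalspace;
	struct SketchContext *firstchild;
	struct SketchContext *nextchild;
} SketchContext;

/*
 * Walk the tree rooted at context, adding each node's size to *total and
 * counting every visited node in *num_contexts. Nothing is printed when
 * dest is SKETCH_NONE; the totals are still accumulated.
 */
void
sketch_stats(const SketchContext *context, int level, SketchDestination dest,
			 size_t *total, int *num_contexts)
{
	assert(context != NULL && total != NULL && num_contexts != NULL);

	if (dest == SKETCH_TO_STDERR)
		fprintf(stderr, "%*s%s: %zu bytes\n",
				level * 2, "", context->name, context->totalspace);

	*total += context->totalspace;
	*num_contexts += 1;

	/* Recurse into each child in turn. */
	for (const SketchContext *child = context->firstchild;
		 child != NULL;
		 child = child->nextchild)
		sketch_stats(child, level + 1, dest, total, num_contexts);
}
```

Calling this with SKETCH_NONE corresponds to the case the patch adds for the SQL function, where only the totals and the count are wanted.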
On 26 Mar 2025, at 11:34, Rahila Syed <rahilasyed90@gmail.com> wrote:
+ ereport(LOG,
+ errmsg("hash table corrupted, can't construct path value"));
I know you switched from elog(LOG.. to ereport(LOG.. but I still think a LOG entry stating corruption isn't helpful, it's not actionable for the user. Given that it's a case that shouldn't happen I wonder if we should downgrade it to an Assert(false) and potentially a DEBUG1?
How about changing it to ERROR, in accordance with current occurrences of the
same message? I did it in the attached version, however I am open to changing
it to an Assert(false) and DEBUG1.
In the attached I moved it to an elog() as it's an internal error, and spending
translation effort on it seems fruitless.
1. Eliminated the unnecessary creation of an extra memory context before calling hash_create.
The hash_create function already generates a memory context containing the hash table,
enabling easy memory deallocation by simply deleting the context via hash_destroy.
Therefore, the patch relies on hash_destroy for memory management instead of manual freeing.
Nice
2. Optimized memory usage by storing the path as an array of integers rather than as an array of
Datums.
This approach conserves DSA memory allocated for storing this information.
Ah yes, much better.
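To put a rough number on the saving from point 2, here is a small sketch comparing the two layouts. It assumes Datum is pointer-sized, which is modelled here with uintptr_t; DatumLike and both helper functions are hypothetical names used only for this illustration and are not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: stand-in for a pointer-sized Datum. */
typedef uintptr_t DatumLike;

/* Bytes needed to store a path of path_length context ids as plain ints. */
size_t
path_bytes_as_int(size_t path_length)
{
	return path_length * sizeof(int);
}

/* Bytes needed to store the same path as an array of Datum-sized values. */
size_t
path_bytes_as_datum(size_t path_length)
{
	return path_length * sizeof(DatumLike);
}
```

With a 4-byte int and an 8-byte pointer, a path of 64 levels (the MEM_CONTEXT_MAX_LEVEL value in the patch) takes 256 bytes as ints against 512 bytes as Datum-sized values, so the per-context path storage in the DSA is halved.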
The attached v21 has a few improvements:
* The function documentation didn't specify the return type, only the fact that
it's setof record. I've added all output columns.
* Some general cleanups of the docs with better markup, improved xref linking
and various rewording.
* Comment cleanups and language alignment
* Added a missing_ok parameter to ContextTypeToString(). While all callers are
fine with unknown context types, if we introduce an API for this it seems
prudent to not place that burden on callers but to take it on in the function.
* Renamed get_summary to just summary, and num_of_tries to retries which feels
more in line with the naming convention in other functions
* Deferred calling InitMaterializedSRF() until after the PID has been checked
for validity.
* Pulled back the timeout to 500msec from 1 second. In running congested
pgbench simulations I saw better performance and improved results in getting stats.
* Replaced strncpy with strlcpy and consistently used idlen to keep all length
calculations equal.
* Fixed misspelled param name in pg_proc.dat
* Pulled back maximum memory usage from 8Mb to 1Mb. 8Mb for the duration of a
process (once allocated) is a lot for a niche feature and while I'm still not
sure 1Mb is the right value I think from experimentation that it's closer.
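The point about using one length (idlen) for every step of the identifier copy can be illustrated with a minimal sketch. clip_and_copy is a hypothetical helper, not a function from the patch, and unlike pg_mbcliplen() it clips on a plain byte boundary, so it could split a multibyte character; it only shows the "compute the clipped length once, then copy and terminate with that same length" pattern.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy src into dst (which holds dstsize bytes), truncating if needed and
 * always NUL-terminating. Returns the number of bytes copied, not counting
 * the terminator. Hypothetical helper: clipping is byte-based, not
 * multibyte-aware.
 */
size_t
clip_and_copy(char *dst, size_t dstsize, const char *src)
{
	size_t		idlen;

	assert(dst != NULL && src != NULL && dstsize > 0);

	/* Compute the clipped length once and reuse it below. */
	idlen = strlen(src);
	if (idlen >= dstsize)
		idlen = dstsize - 1;

	memcpy(dst, src, idlen);
	dst[idlen] = '\0';

	return idlen;
}
```

Because the same idlen drives both the copy and the terminator, there is no way for the allocation size, the copied length and the terminator position to disagree, which is the risk when strlen() is recomputed at each step.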
I think this version is close to a committable state, will spend a little more
time testing, polishing and rewriting the commit message. I will also play
around with placement within the memory context code files to keep it from
making backpatch issues.
--
Daniel Gustafsson
Attachments:
v21-0002-Function-to-report-memory-context-statistics.patch (application/octet-stream)
From ddaf0ff1fdc7f0b68743bb8676d04ca7c2c63432 Mon Sep 17 00:00:00 2001
From: Rahila Syed <rahilasyed.90@gmail.com>
Date: Wed, 26 Mar 2025 15:08:16 +0530
Subject: [PATCH v21 2/2] Function to report memory context statistics
This function sends a signal to a backend to publish
statistics of all its memory contexts. Signal handler
running in the backend process, sets a flag, which causes
it to copy its MemoryContextStats to a DSA, during the
next call to CHECK_FOR_INTERRUPTS().
If there are more statistics than fit in 16MB, the
remaining statistics are copied as a cumulative
total of the remaining contexts.
Once it's done, it signals the client backend using
a condition variable. The client backend wakes up, reads
the shared memory and returns these values in the form
of a set of records, one for each memory context, to the
user, followed by a cumulative total of the remaining
contexts, if any.
If get_summary is true return statistics of all children
of TopMemoryContext with aggregated statistics of their
children.
User can pass num_of_tries which determines the total
number of wait cycles in a client backend for latest
statistics.
Each cycle's wait timeout is set to 1 second. Post this
the client displays previously published statistics or
returns without results.
Each backend and auxiliary process has its own slot for
reporting the stats. There is an array of such memory slots
of size MaxBackends+NumofAuxiliary
processes in fixed shared memory. Each of these slots points
to a smaller DSA allocation within a single DSA,
which contains the stats to be shared by the corresponding
process.
Each slot has its own LW lock and condition variable for
synchronization and communication between the publishing
process and the client backend.
---
doc/src/sgml/func.sgml | 94 ++++
src/backend/postmaster/autovacuum.c | 4 +
src/backend/postmaster/checkpointer.c | 4 +
src/backend/postmaster/interrupt.c | 4 +
src/backend/postmaster/pgarch.c | 4 +
src/backend/postmaster/startup.c | 4 +
src/backend/postmaster/walsummarizer.c | 4 +
src/backend/storage/ipc/ipci.c | 3 +
src/backend/storage/ipc/procsignal.c | 3 +
src/backend/storage/lmgr/proc.c | 15 +
src/backend/tcop/postgres.c | 3 +
.../utils/activity/wait_event_names.txt | 1 +
src/backend/utils/adt/mcxtfuncs.c | 432 +++++++++++++-
src/backend/utils/init/globals.c | 1 +
src/backend/utils/mmgr/mcxt.c | 531 +++++++++++++++++-
src/include/catalog/pg_proc.dat | 10 +
src/include/miscadmin.h | 1 +
src/include/storage/procsignal.h | 1 +
src/include/utils/memutils.h | 69 +++
src/test/regress/expected/sysviews.out | 14 +
src/test/regress/sql/sysviews.sql | 14 +
src/tools/pgindent/typedefs.list | 4 +
22 files changed, 1190 insertions(+), 30 deletions(-)
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 2488e9ba998..b163da128c9 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -28645,6 +28645,67 @@ acl | {postgres=arwdDxtm/postgres,foo=r/postgres}
</para></entry>
</row>
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>pg_get_process_memory_contexts</primary>
+ </indexterm>
+ <function>pg_get_process_memory_contexts</function> ( <parameter>pid</parameter> <type>integer</type>, <parameter>summary</parameter> <type>boolean</type>, <parameter>retries</parameter> <type>integer</type> )
+ <returnvalue>setof record</returnvalue>
+ ( <parameter>name</parameter> <type>text</type>,
+ <parameter>ident</parameter> <type>text</type>,
+ <parameter>type</parameter> <type>text</type>,
+ <parameter>path</parameter> <type>integer[]</type>,
+ <parameter>level</parameter> <type>integer</type>,
+ <parameter>total_bytes</parameter> <type>bigint</type>,
+ <parameter>total_nblocks</parameter> <type>bigint</type>,
+ <parameter>free_bytes</parameter> <type>bigint</type>,
+ <parameter>free_chunks</parameter> <type>bigint</type>,
+ <parameter>used_bytes</parameter> <type>bigint</type>,
+ <parameter>num_agg_contexts</parameter> <type>integer</type>,
+ <parameter>stats_timestamp</parameter> <type>timestamptz</type> )
+ </para>
+ <para>
+ This function handles requests to display the memory contexts of a
+ <productname>PostgreSQL</productname> process with the specified
+ process ID. The function can be used to send requests to backends as
+ well as <glossterm linkend="glossary-auxiliary-proc">auxiliary processes</glossterm>.
+ </para>
+ <para>
+ Memory contexts are arranged in a tree-like hierarchy. When
+ <parameter>summary</parameter> is <literal>true</literal>, statistics
+ for memory contexts at levels 1 and 2 are displayed, with level 1
+ representing the root node (i.e., <literal>TopMemoryContext</literal>).
+ Statistics for contexts on level 2 and below are aggregates of all
+ child contexts' statistics, where <literal>num_agg_contexts</literal>
+ indicates the number of aggregated child contexts. When
+ <parameter>summary</parameter> is <literal>false</literal>,
+ the <literal>num_agg_contexts</literal> value is <literal>1</literal>,
+ indicating that individual statistics are being displayed.
+ </para>
+ <para>
+ Busy processes can delay reporting memory context statistics;
+ <parameter>retries</parameter> sets the number of times the client will
+ retry to get updated statistics. The sleep per try is
+ <literal>0.5</literal> seconds. This parameter can be increased if the
+ user anticipates a delay in the response from the reporting process.
+ Conversely, if users are frequently and periodically querying the
+ process for statistics, or if there are concurrent requests for
+ statistics of the same process, lowering the parameter might help
+ achieve a faster response.
+ </para>
+ <para>
+ After receiving memory context statistics from the target process, it
+ returns the results as one row per context. If all the contexts don't
+ fit within the pre-determined size limit, the remaining context
+ statistics are aggregated and a cumulative total is displayed. The
+ <literal>num_agg_contexts</literal> column indicates the number of
+ contexts aggregated in the displayed statistics. When
+ <literal>num_agg_contexts</literal> is <literal>1</literal> it means
+ that the context statistics are displayed separately.
+ </para></entry>
+ </row>
+
<row>
<entry role="func_table_entry"><para role="func_signature">
<indexterm>
@@ -28784,6 +28845,39 @@ LOG: Grand total: 1651920 bytes in 201 blocks; 622360 free (88 chunks); 1029560
because it may generate a large number of log messages.
</para>
+ <para>
+ <function>pg_get_process_memory_contexts</function> can be used
+ to request the memory contexts statistics of any postgres process. For example:
+<programlisting>
+postgres=# SELECT * FROM pg_get_process_memory_contexts(
+ (SELECT pid FROM pg_stat_activity
+ WHERE backend_type = 'checkpointer')
+ , false, 5) LIMIT 1;
+-[ RECORD 1 ]----+------------------------------
+name | TopMemoryContext
+ident |
+type | AllocSet
+path | {1}
+level | 1
+total_bytes | 90304
+total_nblocks | 3
+free_bytes | 2880
+free_chunks | 1
+used_bytes | 87424
+num_agg_contexts | 1
+stats_timestamp | 2025-03-24 13:55:47.796698+01
+</programlisting>
+ <note>
+ <para>
+ While <function>pg_get_process_memory_contexts</function> can be used to
+ query memory contexts of the local backend,
+ <structname>pg_backend_memory_contexts</structname>
+ (see <xref linkend="view-pg-backend-memory-contexts"/> for more details)
+ will be less resource intensive when only the local backend is of interest.
+ </para>
+ </note>
+ </para>
+
</sect2>
<sect2 id="functions-admin-backup">
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 2513a8ef8a6..16756152b71 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -781,6 +781,10 @@ ProcessAutoVacLauncherInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
/* Process sinval catchup interrupts that happened while sleeping */
ProcessCatchupInterrupt();
}
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index fda91ffd1ce..d3cb3f1891c 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -663,6 +663,10 @@ ProcessCheckpointerInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/postmaster/interrupt.c b/src/backend/postmaster/interrupt.c
index 0ae9bf906ec..f24f574e748 100644
--- a/src/backend/postmaster/interrupt.c
+++ b/src/backend/postmaster/interrupt.c
@@ -48,6 +48,10 @@ ProcessMainLoopInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 7e622ae4bd2..cb7408acf4c 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -867,6 +867,10 @@ ProcessPgArchInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
if (ConfigReloadPending)
{
char *archiveLib = pstrdup(XLogArchiveLibrary);
diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c
index 27e86cf393f..7149a67fcbc 100644
--- a/src/backend/postmaster/startup.c
+++ b/src/backend/postmaster/startup.c
@@ -192,6 +192,10 @@ ProcessStartupProcInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
diff --git a/src/backend/postmaster/walsummarizer.c b/src/backend/postmaster/walsummarizer.c
index 0fec4f1f871..c7a76711cc5 100644
--- a/src/backend/postmaster/walsummarizer.c
+++ b/src/backend/postmaster/walsummarizer.c
@@ -879,6 +879,10 @@ ProcessWalSummarizerInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 2fa045e6b0f..362a6dc9528 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -51,6 +51,7 @@
#include "storage/sinvaladt.h"
#include "utils/guc.h"
#include "utils/injection_point.h"
+#include "utils/memutils.h"
/* GUCs */
int shared_memory_type = DEFAULT_SHARED_MEMORY_TYPE;
@@ -343,6 +344,8 @@ CreateOrAttachShmemStructs(void)
WaitEventCustomShmemInit();
InjectionPointShmemInit();
AioShmemInit();
+ MemCtxShmemInit();
+ MemCtxBackendShmemInit();
}
/*
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index 7d201965503..b59034fdc38 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -690,6 +690,9 @@ procsignal_sigusr1_handler(SIGNAL_ARGS)
if (CheckProcSignal(PROCSIG_LOG_MEMORY_CONTEXT))
HandleLogMemoryContextInterrupt();
+ if (CheckProcSignal(PROCSIG_GET_MEMORY_CONTEXT))
+ HandleGetMemoryContextInterrupt();
+
if (CheckProcSignal(PROCSIG_PARALLEL_APPLY_MESSAGE))
HandleParallelApplyMessageInterrupt();
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 066319afe2b..026c4bc992f 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -50,6 +50,7 @@
#include "storage/procsignal.h"
#include "storage/spin.h"
#include "storage/standby.h"
+#include "utils/memutils.h"
#include "utils/timeout.h"
#include "utils/timestamp.h"
@@ -497,6 +498,13 @@ InitProcess(void)
*/
PGSemaphoreReset(MyProc->sem);
+ /*
+ * The before shmem exit callback frees the DSA memory occupied by the
+ * latest memory context statistics that could be published by this
+ * backend if requested.
+ */
+ before_shmem_exit(AtProcExit_memstats_dsa_free, 0);
+
/*
* Arrange to clean up at backend exit.
*/
@@ -671,6 +679,13 @@ InitAuxiliaryProcess(void)
*/
PGSemaphoreReset(MyProc->sem);
+ /*
+ * The before shmem exit callback frees the DSA memory occupied by the
+ * latest memory context statistics that could be published by this
+ * process if requested.
+ */
+ before_shmem_exit(AtProcExit_memstats_dsa_free, 0);
+
/*
* Arrange to clean up at process exit.
*/
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index aec65007bb6..ad020adee93 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3531,6 +3531,9 @@ ProcessInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
if (ParallelApplyMessagePending)
ProcessParallelApplyMessages();
}
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 8bce14c38fd..ee0e42535b6 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -161,6 +161,7 @@ WAL_RECEIVER_EXIT "Waiting for the WAL receiver to exit."
WAL_RECEIVER_WAIT_START "Waiting for startup process to send initial data for streaming replication."
WAL_SUMMARY_READY "Waiting for a new WAL summary to be generated."
XACT_GROUP_UPDATE "Waiting for the group leader to update transaction status at transaction end."
+MEM_CTX_PUBLISH "Waiting for a process to publish memory information."
ABI_compatibility:
diff --git a/src/backend/utils/adt/mcxtfuncs.c b/src/backend/utils/adt/mcxtfuncs.c
index 396c2f223b4..375509f2551 100644
--- a/src/backend/utils/adt/mcxtfuncs.c
+++ b/src/backend/utils/adt/mcxtfuncs.c
@@ -17,28 +17,26 @@
#include "funcapi.h"
#include "mb/pg_wchar.h"
+#include "miscadmin.h"
+#include "access/twophase.h"
+#include "catalog/pg_authid_d.h"
+#include "nodes/pg_list.h"
#include "storage/proc.h"
#include "storage/procarray.h"
+#include "utils/acl.h"
#include "utils/array.h"
#include "utils/builtins.h"
#include "utils/hsearch.h"
+#include "utils/memutils.h"
+#include "utils/wait_event_types.h"
/* ----------
* The max bytes for showing identifiers of MemoryContext.
* ----------
*/
#define MEMORY_CONTEXT_IDENT_DISPLAY_SIZE 1024
-
-/*
- * MemoryContextId
- * Used for storage of transient identifiers for
- * pg_get_backend_memory_contexts.
- */
-typedef struct MemoryContextId
-{
- MemoryContext context;
- int context_id;
-} MemoryContextId;
+struct MemoryContextBackendState *memCtxState = NULL;
+struct MemoryContextState *memCtxArea = NULL;
/*
* int_list_to_array
@@ -143,24 +141,7 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
else
nulls[1] = true;
- switch (context->type)
- {
- case T_AllocSetContext:
- type = "AllocSet";
- break;
- case T_GenerationContext:
- type = "Generation";
- break;
- case T_SlabContext:
- type = "Slab";
- break;
- case T_BumpContext:
- type = "Bump";
- break;
- default:
- type = "???";
- break;
- }
+ type = ContextTypeToString(context->type, true);
values[2] = CStringGetTextDatum(type);
values[3] = Int32GetDatum(list_length(path)); /* level */
@@ -175,6 +156,44 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
list_free(path);
}
+/*
+ * ContextTypeToString
+ * Returns a textual representation of a context type
+ *
+ * This should cover the same types as MemoryContextIsValid. If missing_ok is
+ * false then execution will error out on invalid context types.
+ */
+const char *
+ContextTypeToString(NodeTag type, bool missing_ok)
+{
+ const char *context_type;
+
+ switch (type)
+ {
+ case T_AllocSetContext:
+ context_type = "AllocSet";
+ break;
+ case T_GenerationContext:
+ context_type = "Generation";
+ break;
+ case T_SlabContext:
+ context_type = "Slab";
+ break;
+ case T_BumpContext:
+ context_type = "Bump";
+ break;
+ default:
+ if (missing_ok)
+ context_type = "???";
+ else
+ ereport(ERROR,
+ errcode(ERRCODE_UNDEFINED_OBJECT),
+ errmsg("invalid memory context type specified"));
+ break;
+ }
+ return (context_type);
+}
+
/*
* pg_get_backend_memory_contexts
* SQL SRF showing backend memory context.
@@ -305,3 +324,358 @@ pg_log_backend_memory_contexts(PG_FUNCTION_ARGS)
PG_RETURN_BOOL(true);
}
+
+/*
+ * pg_get_process_memory_contexts
+ * Signal a backend or an auxiliary process to send its memory contexts,
+ * wait for the results and display them.
+ *
+ * By default, only superusers or users with PG_READ_ALL_STATS are allowed to
+ * signal a process to return the memory contexts. This is because allowing
+ * any user to issue this request at an unbounded rate would cause lots of
+ * requests to be sent, which can lead to denial of service. Additional roles
+ * can be permitted with GRANT.
+ *
+ * On receipt of this signal, a backend or an auxiliary process sets the flag
+ * in the signal handler, which causes the next CHECK_FOR_INTERRUPTS()
+ * or process-specific interrupt handler to copy the memory context details
+ * to a dynamic shared memory space.
+ *
+ * We have defined a limit on DSA memory that could be allocated per process -
+ * if the process has more memory contexts than what can fit in the allocated
+ * size, the excess contexts are summarized and represented as cumulative total
+ * at the end of the buffer.
+ *
+ * After sending the signal, wait on a condition variable. The publishing
+ * backend, after copying the data to shared memory, broadcasts on that
+ * condition variable. There is one condition variable per publishing backend.
+ * Once the condition variable is signalled, check if the latest memory context
+ * information is available and display it.
+ *
+ * If the publishing backend does not respond before the condition variable
+ * times out, which is set to MEMSTATS_WAIT_TIMEOUT, retry up to max_retries
+ * times, as specified by the user, before giving up and returning previously
+ * published statistics, if any. If no previous statistics exist, return
+ * NULL.
+ */
+Datum
+pg_get_process_memory_contexts(PG_FUNCTION_ARGS)
+{
+ int pid = PG_GETARG_INT32(0);
+ bool get_summary = PG_GETARG_BOOL(1);
+ PGPROC *proc;
+ ProcNumber procNumber = INVALID_PROC_NUMBER;
+ bool proc_is_aux = false;
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ dsa_area *area;
+ MemoryContextEntry *memctx_info;
+ int num_retries = 0;
+ TimestampTz curr_timestamp;
+ int max_retries = PG_GETARG_INT32(2);
+
+ /*
+ * Only superusers or users with pg_read_all_stats privileges can view the
+ * memory context statistics of another process.
+ */
+ if (!has_privs_of_role(GetUserId(), ROLE_PG_READ_ALL_STATS))
+ ereport(ERROR,
+ errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+ errmsg("permission denied to extract memory context statistics"));
+
+ /*
+ * See if the process with given pid is a backend or an auxiliary process
+ * and remember the type for when we requery the process later.
+ */
+ proc = BackendPidGetProc(pid);
+ if (proc == NULL)
+ {
+ proc = AuxiliaryPidGetProc(pid);
+ proc_is_aux = true;
+ }
+
+ /*
+ * BackendPidGetProc() and AuxiliaryPidGetProc() return NULL if the pid
+ * isn't valid; this is not treated as an error, so we emit a WARNING
+ * and return. See the comment in pg_log_backend_memory_contexts.
+ */
+ if (proc == NULL)
+ {
+ /*
+ * This is just a warning so a loop-through-resultset will not abort
+ * if one backend terminated on its own during the run.
+ */
+ ereport(WARNING,
+ (errmsg("PID %d is not a PostgreSQL server process", pid)));
+ PG_RETURN_NULL();
+ }
+
+ InitMaterializedSRF(fcinfo, 0);
+
+ procNumber = GetNumberFromPGProc(proc);
+
+ LWLockAcquire(&memCtxState[procNumber].lw_lock, LW_EXCLUSIVE);
+ memCtxState[procNumber].get_summary = get_summary;
+ LWLockRelease(&memCtxState[procNumber].lw_lock);
+
+ curr_timestamp = GetCurrentTimestamp();
+
+ /*
+ * Send a signal to a PostgreSQL process, informing it we want it to
+ * produce information about its memory contexts.
+ */
+ if (SendProcSignal(pid, PROCSIG_GET_MEMORY_CONTEXT, procNumber) < 0)
+ {
+ ereport(WARNING,
+ errmsg("could not send signal to process %d: %m", pid));
+ PG_RETURN_NULL();
+ }
+
+ /*
+ * A valid DSA pointer isn't proof that statistics are available, it can
+ * be valid due to previously published stats. Check if the stats are
+ * updated by comparing the timestamp, if the stats are newer than our
+ * previously recorded timestamp from before sending the procsignal, they
+ * must by definition be updated. Wait for max_retries *
+ * MEMSTATS_WAIT_TIMEOUT, following which display old statistics if
+ * available or return NULL.
+ */
+ while (1)
+ {
+ long msecs;
+
+ /*
+ * We expect to come out of sleep when the requested process has
+ * finished publishing the statistics, verified using the valid DSA
+ * pointer.
+ *
+ * Make sure that the information belongs to the pid we requested
+ * information for; otherwise loop back and wait for the server
+ * process to finish publishing statistics.
+ */
+ LWLockAcquire(&memCtxState[procNumber].lw_lock, LW_EXCLUSIVE);
+
+ /*
+ * A note in procnumber.h says that a procNumber can be re-used for
+ * a different backend immediately after a backend exits. If the slot
+ * identified by the procNumber still holds an old process's data that
+ * the current process has not updated, the pid of the requested
+ * process and the proc_id might not match.
+ */
+ if (memCtxState[procNumber].proc_id == pid)
+ {
+ /*
+ * Break if the latest stats have been read, indicated by
+ * statistics timestamp being newer than the current request
+ * timestamp.
+ */
+ msecs = TimestampDifferenceMilliseconds(curr_timestamp,
+ memCtxState[procNumber].stats_timestamp);
+
+ if (DsaPointerIsValid(memCtxState[procNumber].memstats_dsa_pointer)
+ && msecs > 0)
+ break;
+ }
+ LWLockRelease(&memCtxState[procNumber].lw_lock);
+
+ /*
+ * Recheck the state of the backend before sleeping on the condition
+ * variable to ensure the process is still alive. Only check the
+ * relevant process type based on the earlier PID check.
+ */
+ if (proc_is_aux)
+ proc = AuxiliaryPidGetProc(pid);
+ else
+ proc = BackendPidGetProc(pid);
+
+ /*
+ * The process ending during memory context processing is not an
+ * error.
+ */
+ if (proc == NULL)
+ {
+ ereport(WARNING,
+ errmsg("PID %d is no longer a PostgreSQL server process",
+ pid));
+ PG_RETURN_NULL();
+ }
+
+#define MEMSTATS_WAIT_TIMEOUT 500
+
+ if (ConditionVariableTimedSleep(&memCtxState[procNumber].memctx_cv,
+ MEMSTATS_WAIT_TIMEOUT,
+ WAIT_EVENT_MEM_CTX_PUBLISH))
+ {
+ /*
+ * Wait for max_retries, as defined by the user. If no updated
+ * statistics are available within the wait time defined by
+ * max_retries then display previously published statistics if
+ * there are any. If no previous statistics are available then
+ * return NULL.
+ */
+ if (num_retries > max_retries)
+ {
+ LWLockAcquire(&memCtxState[procNumber].lw_lock, LW_EXCLUSIVE);
+ /* Displaying previously published statistics if available */
+ if (DsaPointerIsValid(memCtxState[procNumber].memstats_dsa_pointer))
+ break;
+ else
+ {
+ LWLockRelease(&memCtxState[procNumber].lw_lock);
+ PG_RETURN_NULL();
+ }
+ }
+ ereport(DEBUG1,
+ errmsg("timed out waiting for process with PID %d to publish stats, retrying",
+ pid));
+ num_retries = num_retries + 1;
+ }
+ }
+
+ /*
+ * We should only reach here with a valid DSA handle, either containing
+ * updated statistics or previously published statistics (identified by
+ * the timestamp).
+ */
+ Assert(memCtxArea->memstats_dsa_handle != DSA_HANDLE_INVALID);
+ area = dsa_attach(memCtxArea->memstats_dsa_handle);
+
+ /*
+ * Backend has finished publishing the stats, project them.
+ */
+ memctx_info = (MemoryContextEntry *)
+ dsa_get_address(area, memCtxState[procNumber].memstats_dsa_pointer);
+
+#define PG_GET_PROCESS_MEMORY_CONTEXTS_COLS 12
+ for (int i = 0; i < memCtxState[procNumber].total_stats; i++)
+ {
+ ArrayType *path_array;
+ int path_length;
+ Datum values[PG_GET_PROCESS_MEMORY_CONTEXTS_COLS];
+ bool nulls[PG_GET_PROCESS_MEMORY_CONTEXTS_COLS];
+ char *name;
+ char *ident;
+ Datum *path_datum = NULL;
+ int *path_int = NULL;
+
+ memset(values, 0, sizeof(values));
+ memset(nulls, 0, sizeof(nulls));
+
+ if (DsaPointerIsValid(memctx_info[i].name))
+ {
+ name = (char *) dsa_get_address(area, memctx_info[i].name);
+ values[0] = CStringGetTextDatum(name);
+ }
+ else
+ nulls[0] = true;
+
+ if (DsaPointerIsValid(memctx_info[i].ident))
+ {
+ ident = (char *) dsa_get_address(area, memctx_info[i].ident);
+ values[1] = CStringGetTextDatum(ident);
+ }
+ else
+ nulls[1] = true;
+
+ if (memctx_info[i].type != NULL)
+ values[2] = CStringGetTextDatum(memctx_info[i].type);
+ else
+ nulls[2] = true;
+
+ path_length = memctx_info[i].path_length;
+ path_datum = (Datum *) palloc(path_length * sizeof(Datum));
+ if (DsaPointerIsValid(memctx_info[i].path))
+ {
+ path_int = (int *) dsa_get_address(area, memctx_info[i].path);
+ for (int j = 0; j < path_length; j++)
+ path_datum[j] = Int32GetDatum(path_int[j]);
+ path_array = construct_array_builtin(path_datum, path_length, INT4OID);
+ values[3] = PointerGetDatum(path_array);
+ }
+ else
+ nulls[3] = true;
+
+ values[4] = Int32GetDatum(path_length); /* level */
+ values[5] = Int64GetDatum(memctx_info[i].totalspace);
+ values[6] = Int64GetDatum(memctx_info[i].nblocks);
+ values[7] = Int64GetDatum(memctx_info[i].freespace);
+ values[8] = Int64GetDatum(memctx_info[i].freechunks);
+ values[9] = Int64GetDatum(memctx_info[i].totalspace -
+ memctx_info[i].freespace);
+ values[10] = Int32GetDatum(memctx_info[i].num_agg_stats);
+ values[11] = TimestampTzGetDatum(memCtxState[procNumber].stats_timestamp);
+
+ tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc,
+ values, nulls);
+ }
+ LWLockRelease(&memCtxState[procNumber].lw_lock);
+
+ ConditionVariableCancelSleep();
+ dsa_detach(area);
+
+ PG_RETURN_NULL();
+}
+
+/*
+ * Init shared memory for reporting memory context information.
+ */
+void
+MemCtxBackendShmemInit(void)
+{
+ bool found;
+ Size TotalProcs;
+
+ TotalProcs = add_size(MaxBackends, NUM_AUXILIARY_PROCS);
+ TotalProcs = add_size(TotalProcs, max_prepared_xacts);
+
+ memCtxState = (MemoryContextBackendState *)
+ ShmemInitStruct("MemoryContextBackendState",
+ mul_size(TotalProcs, sizeof(MemoryContextBackendState)),
+ &found);
+
+ if (!IsUnderPostmaster)
+ {
+ Assert(!found);
+
+ for (int i = 0; i < TotalProcs; i++)
+ {
+ ConditionVariableInit(&memCtxState[i].memctx_cv);
+
+ LWLockInitialize(&memCtxState[i].lw_lock, LWLockNewTrancheId());
+ LWLockRegisterTranche(memCtxState[i].lw_lock.tranche,
+ "mem_context_backend_stats_reporting");
+
+ memCtxState[i].memstats_dsa_pointer = InvalidDsaPointer;
+ }
+ }
+ else
+ {
+ Assert(found);
+ }
+}
+
+/*
+ * Initialize shared memory for displaying memory context statistics
+ */
+void
+MemCtxShmemInit(void)
+{
+ bool found;
+
+ memCtxArea = (MemoryContextState *)
+ ShmemInitStruct("MemoryContextState", sizeof(MemoryContextState), &found);
+
+ if (!IsUnderPostmaster)
+ {
+ Assert(!found);
+
+ LWLockInitialize(&memCtxArea->lw_lock,
+ LWLockNewTrancheId());
+ LWLockRegisterTranche(memCtxArea->lw_lock.tranche,
+ "mem_context_stats_reporting");
+ memCtxArea->memstats_dsa_handle = DSA_HANDLE_INVALID;
+ }
+ else
+ {
+ Assert(found);
+ }
+}
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index b844f9fdaef..13938ccb0f5 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -38,6 +38,7 @@ volatile sig_atomic_t TransactionTimeoutPending = false;
volatile sig_atomic_t IdleSessionTimeoutPending = false;
volatile sig_atomic_t ProcSignalBarrierPending = false;
volatile sig_atomic_t LogMemoryContextPending = false;
+volatile sig_atomic_t PublishMemoryContextPending = false;
volatile sig_atomic_t IdleStatsUpdateTimeoutPending = false;
volatile uint32 InterruptHoldoffCount = 0;
volatile uint32 QueryCancelHoldoffCount = 0;
diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index 2cbde8f39c3..bfcd65a69a5 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -23,6 +23,13 @@
#include "mb/pg_wchar.h"
#include "miscadmin.h"
+#include "nodes/pg_list.h"
+#include "storage/fd.h"
+#include "storage/lwlock.h"
+#include "storage/dsm.h"
+#include "storage/ipc.h"
+#include "utils/dsa.h"
+#include "utils/hsearch.h"
#include "utils/memdebug.h"
#include "utils/memutils.h"
#include "utils/memutils_internal.h"
@@ -177,6 +184,16 @@ static void MemoryContextStatsInternal(MemoryContext context, int level,
static void MemoryContextStatsPrint(MemoryContext context, void *passthru,
const char *stats_string,
bool print_to_stderr);
+static void PublishMemoryContext(MemoryContextEntry *memctx_infos,
+ int curr_id, MemoryContext context,
+ List *path,
+ MemoryContextCounters stat,
+ int num_contexts, dsa_area *area);
+static void compute_contexts_count_and_ids(List *contexts, HTAB *context_id_lookup,
+ int *stats_count,
+ bool get_summary);
+static List *compute_context_path(MemoryContext c, HTAB *context_id_lookup);
+static void dsa_free_previous_stats(dsa_area *area, int total_stats, dsa_pointer prev_dsa_pointer);
/*
* You should not do memory allocations within a critical section, because
@@ -927,7 +944,7 @@ MemoryContextStatsInternal(MemoryContext context, int level,
* PRINT_STATS_NONE, only compute totals. This is used in
* reporting of memory context statistics via a sql function. Last
* parameter is not relevant.
-+ */
+ */
context->methods->stats(context,
NULL,
NULL,
@@ -1331,6 +1348,22 @@ HandleLogMemoryContextInterrupt(void)
/* latch will be set by procsignal_sigusr1_handler */
}
+/*
+ * HandleGetMemoryContextInterrupt
+ * Handle receipt of an interrupt indicating a request to publish memory
+ * contexts statistics.
+ *
+ * All the actual work is deferred to ProcessGetMemoryContextInterrupt() as
+ * this cannot be performed in a signal handler.
+ */
+void
+HandleGetMemoryContextInterrupt(void)
+{
+ InterruptPending = true;
+ PublishMemoryContextPending = true;
+ /* latch will be set by procsignal_sigusr1_handler */
+}
+
/*
* ProcessLogMemoryContextInterrupt
* Perform logging of memory contexts of this backend process.
@@ -1368,6 +1401,502 @@ ProcessLogMemoryContextInterrupt(void)
MemoryContextStatsDetail(TopMemoryContext, 100, 100, false);
}
+/*
+ * ProcessGetMemoryContextInterrupt
+ * Generate information about memory contexts used by the process.
+ *
+ * Performs a breadth-first search on the memory context tree, so that a
+ * parent's statistics are reported before its children's in the monitoring
+ * function output.
+ *
+ * Statistics for all the processes are shared via the same dynamic shared
+ * area. Statistics written by each process are tracked independently in
+ * per-process DSA pointers. These pointers are stored in static shared memory.
+ *
+ * We calculate the maximum number of contexts whose statistics can be
+ * displayed from a pre-determined limit on the memory available per process
+ * for this utility and the maximum size of the statistics for each context.
+ * The remaining context statistics, if any, are captured as a cumulative
+ * total at the end of the individual contexts' statistics.
+ *
+ * If get_summary is true, we capture the statistics of the level 1 and
+ * level 2 contexts. For that we traverse the memory context tree recursively
+ * in a depth-first manner to cover all the children of a parent context, so
+ * that we can display a cumulative total of memory consumption by a parent
+ * at level 2 and all its children.
+ */
+void
+ProcessGetMemoryContextInterrupt(void)
+{
+ List *contexts;
+ HASHCTL ctl;
+ HTAB *context_id_lookup;
+ int context_id = 0;
+ MemoryContextEntry *meminfo;
+ bool get_summary = false;
+ dsa_area *area = NULL;
+ int max_stats;
+ int idx = MyProcNumber;
+ int stats_count = 0;
+ MemoryContextCounters stat;
+ int num_individual_stats = 0;
+
+ PublishMemoryContextPending = false;
+
+ /*
+ * The hash table is used for constructing "path" column of the view,
+ * similar to its local backend counterpart.
+ */
+ ctl.keysize = sizeof(MemoryContext);
+ ctl.entrysize = sizeof(MemoryContextId);
+ ctl.hcxt = CurrentMemoryContext;
+
+ context_id_lookup = hash_create("pg_get_remote_backend_memory_contexts",
+ 256,
+ &ctl,
+ HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+
+ /* List of contexts to process in the next round - start at the top. */
+ contexts = list_make1(TopMemoryContext);
+
+ /* Compute the number of stats that can fit in the defined limit */
+ max_stats = (MAX_SEGMENTS_PER_BACKEND * DSA_DEFAULT_INIT_SEGMENT_SIZE)
+ / (MAX_MEMORY_CONTEXT_STATS_SIZE);
+ LWLockAcquire(&memCtxState[idx].lw_lock, LW_EXCLUSIVE);
+ get_summary = memCtxState[idx].get_summary;
+ LWLockRelease(&memCtxState[idx].lw_lock);
+
+ /*
+ * Traverse the memory context tree to find total number of contexts. If
+ * summary is requested report the total number of contexts at level 1 and
+ * 2 from the top. Also, populate the hash table of context ids.
+ */
+ compute_contexts_count_and_ids(contexts, context_id_lookup, &stats_count,
+ get_summary);
+
+ /*
+ * Allocate memory in this process's DSA for storing statistics of the
+ * memory contexts up to max_stats; for contexts that don't fit within the
+ * limit, a cumulative total is written as the last record in the DSA
+ * segment.
+ */
+ stats_count = (stats_count > max_stats) ? max_stats : stats_count;
+
+ LWLockAcquire(&memCtxArea->lw_lock, LW_EXCLUSIVE);
+
+ /*
+ * Create a DSA and send the handle to the client process after storing
+ * the context statistics. If the statistics exceed a predefined
+ * limit (8MB), a cumulative total is stored for the excess contexts.
+ */
+ if (memCtxArea->memstats_dsa_handle == DSA_HANDLE_INVALID)
+ {
+ MemoryContext oldcontext = CurrentMemoryContext;
+ dsa_handle handle;
+
+ MemoryContextSwitchTo(TopMemoryContext);
+
+ area = dsa_create(memCtxArea->lw_lock.tranche);
+
+ handle = dsa_get_handle(area);
+ MemoryContextSwitchTo(oldcontext);
+
+ dsa_pin_mapping(area);
+
+ /*
+ * Pin the DSA area to make sure it remains attachable even if the
+ * current backend exits. This is done so that the statistics are
+ * published even if the process exits while a client is waiting.
+ */
+ dsa_pin(area);
+
+ /* Set the handle in shared memory */
+ memCtxArea->memstats_dsa_handle = handle;
+ }
+
+ /*
+ * If DSA exists, created by another process publishing statistics, or by
+ * the previous execution of this function by this process, attach to it.
+ */
+ else if (area == NULL)
+ {
+ MemoryContext oldcontext = CurrentMemoryContext;
+
+ MemoryContextSwitchTo(TopMemoryContext);
+ area = dsa_attach(memCtxArea->memstats_dsa_handle);
+ MemoryContextSwitchTo(oldcontext);
+ dsa_pin_mapping(area);
+ }
+ LWLockRelease(&memCtxArea->lw_lock);
+
+ /*
+ * Hold the process lock to protect writes to process specific memory. Two
+ * processes publishing statistics do not block each other.
+ */
+ LWLockAcquire(&memCtxState[idx].lw_lock, LW_EXCLUSIVE);
+ memCtxState[idx].proc_id = MyProcPid;
+
+ if (DsaPointerIsValid(memCtxState[idx].memstats_dsa_pointer))
+ {
+ /*
+ * Free any previous allocations, free the name, ident and path
+ * pointers before freeing the pointer that contains them.
+ */
+ dsa_free_previous_stats(area, memCtxState[idx].total_stats,
+ memCtxState[idx].memstats_dsa_pointer);
+
+ dsa_free(area, memCtxState[idx].memstats_dsa_pointer);
+ memCtxState[idx].memstats_dsa_pointer = InvalidDsaPointer;
+ }
+ memCtxState[idx].memstats_dsa_pointer =
+ dsa_allocate0(area, stats_count * sizeof(MemoryContextEntry));
+
+ meminfo = (MemoryContextEntry *)
+ dsa_get_address(area, memCtxState[idx].memstats_dsa_pointer);
+
+ if (get_summary)
+ {
+ int ctx_id = 0;
+ List *path = NIL;
+
+ /* Copy TopMemoryContext statistics to DSA */
+ memset(&stat, 0, sizeof(stat));
+ (*TopMemoryContext->methods->stats) (TopMemoryContext, NULL, NULL,
+ &stat, true);
+ path = lcons_int(1, path);
+ PublishMemoryContext(meminfo, ctx_id, TopMemoryContext, path, stat,
+ 1, area);
+ ctx_id = ctx_id + 1;
+
+ /*
+ * Copy statistics for each of TopMemoryContext's children. This
+ * includes statistics of all of their children up to level 100.
+ */
+ for (MemoryContext c = TopMemoryContext->firstchild; c != NULL;
+ c = c->nextchild)
+ {
+ MemoryContextCounters grand_totals;
+ int num_contexts = 0;
+
+ path = NIL;
+ memset(&grand_totals, 0, sizeof(grand_totals));
+
+ MemoryContextStatsInternal(c, 0, 100, 100, &grand_totals,
+ PRINT_STATS_NONE, &num_contexts);
+
+ path = compute_context_path(c, context_id_lookup);
+
+ PublishMemoryContext(meminfo, ctx_id, c, path,
+ grand_totals, num_contexts, area);
+ ctx_id = ctx_id + 1;
+ }
+ memCtxState[idx].total_stats = ctx_id;
+ goto cleanup;
+ }
+
+ foreach_ptr(MemoryContextData, cur, contexts)
+ {
+ List *path = NIL;
+ char *name;
+
+ /*
+ * Figure out the transient context_id of this context and each of its
+ * ancestors, to compute a path for this context.
+ */
+ path = compute_context_path(cur, context_id_lookup);
+
+ if (context_id <= (max_stats - 2))
+ {
+ /* Examine the context stats */
+ memset(&stat, 0, sizeof(stat));
+ (*cur->methods->stats) (cur, NULL, NULL, &stat, true);
+ /* Copy statistics to DSA memory */
+ PublishMemoryContext(meminfo, context_id, cur, path, stat, 1, area);
+ }
+ else
+ {
+ /* Examine the context stats */
+ memset(&stat, 0, sizeof(stat));
+ (*cur->methods->stats) (cur, NULL, NULL, &stat, true);
+
+ meminfo[max_stats - 1].totalspace += stat.totalspace;
+ meminfo[max_stats - 1].nblocks += stat.nblocks;
+ meminfo[max_stats - 1].freespace += stat.freespace;
+ meminfo[max_stats - 1].freechunks += stat.freechunks;
+ }
+
+ /*
+ * DSA max limit per process is reached, write aggregate of the
+ * remaining statistics.
+ */
+ if (context_id == (max_stats - 2) && context_id < (stats_count - 1))
+ {
+ num_individual_stats = context_id + 1;
+ meminfo[max_stats - 1].name = dsa_allocate0(area, 17);
+ name = dsa_get_address(area, meminfo[max_stats - 1].name);
+ strncpy(name, "Remaining Totals", 16);
+ meminfo[max_stats - 1].ident = InvalidDsaPointer;
+ meminfo[max_stats - 1].path = InvalidDsaPointer;
+ meminfo[max_stats - 1].type = NULL;
+ }
+ context_id++;
+ }
+ /* Statistics are not aggregated, i.e. individual statistics are reported */
+ if (context_id < (max_stats - 2))
+ {
+ memCtxState[idx].total_stats = context_id;
+ }
+ /* Report number of aggregated memory contexts */
+ else
+ {
+ meminfo[max_stats - 1].num_agg_stats = context_id -
+ num_individual_stats;
+
+ /*
+ * Total stats equals num_individual_stats + 1 record for cumulative
+ * statistics.
+ */
+ memCtxState[idx].total_stats = num_individual_stats + 1;
+ }
+
+cleanup:
+
+ /*
+ * Signal all the waiting client backends after copying all the statistics
+ */
+ memCtxState[idx].stats_timestamp = GetCurrentTimestamp();
+ LWLockRelease(&memCtxState[idx].lw_lock);
+ ConditionVariableBroadcast(&memCtxState[idx].memctx_cv);
+ hash_destroy(context_id_lookup);
+ dsa_detach(area);
+}
+
+/*
+ * compute_context_path
+ *
+ * Append the transient context_id of this context and each of its ancestors
+ * to a list, in order to compute a path.
+ */
+static List *
+compute_context_path(MemoryContext c, HTAB *context_id_lookup)
+{
+ bool found;
+ List *path = NIL;
+
+ for (MemoryContext cur_context = c; cur_context != NULL; cur_context = cur_context->parent)
+ {
+ MemoryContextId *cur_entry;
+
+ cur_entry = hash_search(context_id_lookup, &cur_context, HASH_FIND, &found);
+
+ if (!found)
+ elog(ERROR, "hash table corrupted, can't construct path value");
+
+ path = lcons_int(cur_entry->context_id, path);
+ }
+ return path;
+}
+
+/*
+ * Count the contexts currently allocated by the backend, returning the
+ * count in *stats_count, and assign a context id to each of them.
+ */
+static void
+compute_contexts_count_and_ids(List *contexts, HTAB *context_id_lookup,
+ int *stats_count, bool get_summary)
+{
+ foreach_ptr(MemoryContextData, cur, contexts)
+ {
+ MemoryContextId *entry;
+ bool found;
+
+ entry = (MemoryContextId *) hash_search(context_id_lookup, &cur,
+ HASH_ENTER, &found);
+ Assert(!found);
+
+ /* context id starts with 1 */
+ entry->context_id = (++(*stats_count));
+
+ /* Append the children of the current context to the main list. */
+ for (MemoryContext c = cur->firstchild; c != NULL; c = c->nextchild)
+ {
+ if (get_summary)
+ {
+ entry = (MemoryContextId *) hash_search(context_id_lookup, &c,
+ HASH_ENTER, &found);
+ Assert(!found);
+
+ entry->context_id = (++(*stats_count));
+ }
+
+ contexts = lappend(contexts, c);
+ }
+
+ /*
+ * In summary mode, only the contexts in the first two levels (from
+ * the top) are displayed
+ */
+ if (get_summary)
+ break;
+ }
+}
+
+/*
+ * PublishMemoryContext
+ *
+ * Copy the memory context statistics of a single context to DSA memory
+ */
+static void
+PublishMemoryContext(MemoryContextEntry *memctx_info, int curr_id,
+ MemoryContext context, List *path,
+ MemoryContextCounters stat, int num_contexts,
+ dsa_area *area)
+{
+ char clipped_ident[MEMORY_CONTEXT_IDENT_SHMEM_SIZE];
+ char *name;
+ char *ident;
+ int *path_list;
+
+ Assert(MemoryContextIsValid(context));
+
+ if (context->name != NULL)
+ {
+ Assert(strlen(context->name) < MEMORY_CONTEXT_IDENT_SHMEM_SIZE);
+ memctx_info[curr_id].name = dsa_allocate0(area, strlen(context->name) + 1);
+ name = (char *) dsa_get_address(area, memctx_info[curr_id].name);
+ strlcpy(name, context->name, strlen(context->name) + 1);
+ }
+ else
+ memctx_info[curr_id].name = InvalidDsaPointer;
+
+ /* Trim and copy the identifier if it is not set to NULL */
+ if (context->ident != NULL)
+ {
+ int idlen = strlen(context->ident);
+
+ /*
+ * Some identifiers, such as SQL query strings, can be very long;
+ * truncate oversized identifiers.
+ */
+ if (idlen >= MEMORY_CONTEXT_IDENT_SHMEM_SIZE)
+ idlen = pg_mbcliplen(context->ident, idlen,
+ MEMORY_CONTEXT_IDENT_SHMEM_SIZE - 1);
+
+ memcpy(clipped_ident, context->ident, idlen);
+ clipped_ident[idlen] = '\0';
+
+ /*
+ * To be consistent with logging output, we label dynahash contexts
+ * with just the hash table name as with MemoryContextStatsPrint().
+ */
+ if (strncmp(context->name, "dynahash", 8) == 0)
+ {
+ dsa_free(area, memctx_info[curr_id].name);
+ memctx_info[curr_id].name = dsa_allocate0(area, idlen + 1);
+ name = (char *) dsa_get_address(area, memctx_info[curr_id].name);
+ strlcpy(name, clipped_ident, idlen + 1);
+ memctx_info[curr_id].ident = InvalidDsaPointer;
+ }
+ else
+ {
+ memctx_info[curr_id].ident = dsa_allocate0(area, idlen + 1);
+ ident = (char *) dsa_get_address(area, memctx_info[curr_id].ident);
+ strlcpy(ident, clipped_ident, idlen + 1);
+ }
+ }
+ else
+ memctx_info[curr_id].ident = InvalidDsaPointer;
+
+ /* Allocate DSA memory for storing path information */
+ if (path == NIL)
+ memctx_info[curr_id].path = InvalidDsaPointer;
+ else
+ {
+ memctx_info[curr_id].path_length = list_length(path);
+ memctx_info[curr_id].path = dsa_allocate0(area,
+ memctx_info[curr_id].path_length * sizeof(int));
+ path_list = (int *) dsa_get_address(area, memctx_info[curr_id].path);
+ foreach_int(i, path)
+ path_list[foreach_current_index(i)] = i;
+ }
+ memctx_info[curr_id].type = ContextTypeToString(context->type, true);
+ memctx_info[curr_id].totalspace = stat.totalspace;
+ memctx_info[curr_id].nblocks = stat.nblocks;
+ memctx_info[curr_id].freespace = stat.freespace;
+ memctx_info[curr_id].freechunks = stat.freechunks;
+ memctx_info[curr_id].num_agg_stats = num_contexts;
+}
+
+/*
+ * dsa_free_previous_stats
+ *
+ * Worker for freeing resources from a MemoryContextEntry. Callers are
+ * responsible for ensuring that the DSA pointer is valid.
+ */
+static void
+dsa_free_previous_stats(dsa_area *area, int total_stats,
+ dsa_pointer prev_dsa_pointer)
+{
+ MemoryContextEntry *meminfo;
+
+ meminfo = (MemoryContextEntry *) dsa_get_address(area, prev_dsa_pointer);
+ Assert(meminfo != NULL);
+ for (int i = 0; i < total_stats; i++)
+ {
+ if (DsaPointerIsValid(meminfo[i].name))
+ dsa_free(area, meminfo[i].name);
+
+ if (DsaPointerIsValid(meminfo[i].ident))
+ dsa_free(area, meminfo[i].ident);
+
+ if (DsaPointerIsValid(meminfo[i].path))
+ dsa_free(area, meminfo[i].path);
+ }
+}
+
+/*
+ * Free the memory context statistics stored by this process
+ * in DSA area.
+ */
+void
+AtProcExit_memstats_dsa_free(int code, Datum arg)
+{
+ int idx = MyProcNumber;
+ dsm_segment *dsm_seg = NULL;
+ dsa_area *area = NULL;
+
+ if (memCtxArea->memstats_dsa_handle == DSA_HANDLE_INVALID)
+ return;
+
+ dsm_seg = dsm_find_mapping(memCtxArea->memstats_dsa_handle);
+
+ LWLockAcquire(&memCtxState[idx].lw_lock, LW_EXCLUSIVE);
+
+ if (!DsaPointerIsValid(memCtxState[idx].memstats_dsa_pointer))
+ {
+ LWLockRelease(&memCtxState[idx].lw_lock);
+ return;
+ }
+
+ /* If the dsm mapping could not be found, attach to the area */
+ if (dsm_seg != NULL)
+ {
+ LWLockRelease(&memCtxState[idx].lw_lock);
+ return;
+ }
+ area = dsa_attach(memCtxArea->memstats_dsa_handle);
+
+ /*
+ * Free the memory context statistics: free the name, ident and path
+ * pointers before freeing the pointer that contains these pointers and
+ * integer statistics.
+ */
+ dsa_free_previous_stats(area, memCtxState[idx].total_stats,
+ memCtxState[idx].memstats_dsa_pointer);
+ dsa_free(area, memCtxState[idx].memstats_dsa_pointer);
+ memCtxState[idx].memstats_dsa_pointer = InvalidDsaPointer;
+
+ dsa_detach(area);
+ LWLockRelease(&memCtxState[idx].lw_lock);
+}
+
void *
palloc(Size size)
{
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 6b57b7e18d9..ff25f729d94 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8535,6 +8535,16 @@
prorettype => 'bool', proargtypes => 'int4',
prosrc => 'pg_log_backend_memory_contexts' },
+# publishing memory contexts of the specified postgres process
+{ oid => '2173', descr => 'publish memory contexts of the specified backend',
+ proname => 'pg_get_process_memory_contexts', provolatile => 'v',
+ prorows => '100', proretset => 't', proparallel => 'r',
+ prorettype => 'record', proargtypes => 'int4 bool int4',
+ proallargtypes => '{int4,bool,int4,text,text,text,_int4,int4,int8,int8,int8,int8,int8,int4,timestamptz}',
+ proargmodes => '{i,i,i,o,o,o,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{pid, summary, retries, name, ident, type, path, level, total_bytes, total_nblocks, free_bytes, free_chunks, used_bytes, num_agg_contexts, stats_timestamp}',
+ prosrc => 'pg_get_process_memory_contexts' },
+
# non-persistent series generator
{ oid => '1066', descr => 'non-persistent series generator',
proname => 'generate_series', prorows => '1000',
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 603d0424354..d3c44df6e13 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -96,6 +96,7 @@ extern PGDLLIMPORT volatile sig_atomic_t IdleSessionTimeoutPending;
extern PGDLLIMPORT volatile sig_atomic_t ProcSignalBarrierPending;
extern PGDLLIMPORT volatile sig_atomic_t LogMemoryContextPending;
extern PGDLLIMPORT volatile sig_atomic_t IdleStatsUpdateTimeoutPending;
+extern PGDLLIMPORT volatile sig_atomic_t PublishMemoryContextPending;
extern PGDLLIMPORT volatile sig_atomic_t CheckClientConnectionPending;
extern PGDLLIMPORT volatile sig_atomic_t ClientConnectionLost;
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index 022fd8ed933..5d4b2fbfc9c 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -35,6 +35,7 @@ typedef enum
PROCSIG_WALSND_INIT_STOPPING, /* ask walsenders to prepare for shutdown */
PROCSIG_BARRIER, /* global barrier interrupt */
PROCSIG_LOG_MEMORY_CONTEXT, /* ask backend to log the memory contexts */
+ PROCSIG_GET_MEMORY_CONTEXT, /* ask backend to send the memory contexts */
PROCSIG_PARALLEL_APPLY_MESSAGE, /* Message from parallel apply workers */
/* Recovery conflict reasons */
diff --git a/src/include/utils/memutils.h b/src/include/utils/memutils.h
index 8abc26abce2..da6b633cbda 100644
--- a/src/include/utils/memutils.h
+++ b/src/include/utils/memutils.h
@@ -18,6 +18,9 @@
#define MEMUTILS_H
#include "nodes/memnodes.h"
+#include "storage/condition_variable.h"
+#include "storage/lmgr.h"
+#include "utils/dsa.h"
/*
@@ -48,7 +51,12 @@
#define AllocHugeSizeIsValid(size) ((Size) (size) <= MaxAllocHugeSize)
+#define MEMORY_CONTEXT_IDENT_SHMEM_SIZE 128
+#define MEM_CONTEXT_MAX_LEVEL 64
+#define MAX_SEGMENTS_PER_BACKEND 1
+#define MEM_CONTEXT_PATH_SIZE (MEM_CONTEXT_MAX_LEVEL * sizeof(int))
+#define MAX_MEMORY_CONTEXT_STATS_SIZE (sizeof(MemoryContextEntry) + MEM_CONTEXT_PATH_SIZE + 2 * MEMORY_CONTEXT_IDENT_SHMEM_SIZE)
/*
* Standard top-level memory contexts.
*
@@ -319,4 +327,65 @@ pg_memory_is_all_zeros(const void *ptr, size_t len)
return true;
}
+/* Dynamic shared memory state for statistics per context */
+typedef struct MemoryContextEntry
+{
+ dsa_pointer name;
+ dsa_pointer ident;
+ dsa_pointer path;
+ const char *type;
+ int path_length;
+ int64 totalspace;
+ int64 nblocks;
+ int64 freespace;
+ int64 freechunks;
+ int num_agg_stats;
+} MemoryContextEntry;
+
+/*
+ * Static shared memory state representing the DSA area created for memory
+ * context statistics reporting. A single DSA area is created and used by all
+ * the processes, each having its specific DSA allocations for sharing memory
+ * statistics, tracked by per backend static shared memory state.
+ */
+typedef struct MemoryContextState
+{
+ dsa_handle memstats_dsa_handle;
+ LWLock lw_lock;
+} MemoryContextState;
+
+/*
+ * Per backend static shared memory state for memory context statistics
+ * reporting.
+ */
+typedef struct MemoryContextBackendState
+{
+ ConditionVariable memctx_cv;
+ LWLock lw_lock;
+ int proc_id;
+ int total_stats;
+ bool get_summary;
+ dsa_pointer memstats_dsa_pointer;
+ TimestampTz stats_timestamp;
+} MemoryContextBackendState;
+
+
+/*
+ * Used for storage of transient identifiers for pg_get_backend_memory_contexts
+ */
+typedef struct MemoryContextId
+{
+ MemoryContext context;
+ int context_id;
+} MemoryContextId;
+
+extern PGDLLIMPORT MemoryContextBackendState *memCtxState;
+extern PGDLLIMPORT MemoryContextState *memCtxArea;
+extern void ProcessGetMemoryContextInterrupt(void);
+extern const char *ContextTypeToString(NodeTag type, bool missing_ok);
+extern void HandleGetMemoryContextInterrupt(void);
+extern void MemCtxShmemInit(void);
+extern void MemCtxBackendShmemInit(void);
+extern void AtProcExit_memstats_dsa_free(int code, Datum arg);
+
#endif /* MEMUTILS_H */
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 83228cfca29..dca20ae1a26 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -232,3 +232,17 @@ select * from pg_timezone_abbrevs where abbrev = 'LMT';
LMT | @ 7 hours 52 mins 58 secs ago | f
(1 row)
+DO $$
+DECLARE
+ launcher_pid int;
+ r RECORD;
+BEGIN
+ SELECT pid from pg_stat_activity where backend_type='autovacuum launcher'
+ INTO launcher_pid;
+
+ select type, name, ident
+ from pg_get_process_memory_contexts(launcher_pid, false, 20)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+END $$;
+NOTICE: (AllocSet,TopMemoryContext,)
diff --git a/src/test/regress/sql/sysviews.sql b/src/test/regress/sql/sysviews.sql
index 66179f026b3..4767351d4e2 100644
--- a/src/test/regress/sql/sysviews.sql
+++ b/src/test/regress/sql/sysviews.sql
@@ -101,3 +101,17 @@ select count(distinct utc_offset) >= 24 as ok from pg_timezone_abbrevs;
-- One specific case we can check without much fear of breakage
-- is the historical local-mean-time value used for America/Los_Angeles.
select * from pg_timezone_abbrevs where abbrev = 'LMT';
+
+DO $$
+DECLARE
+ launcher_pid int;
+ r RECORD;
+BEGIN
+ SELECT pid from pg_stat_activity where backend_type='autovacuum launcher'
+ INTO launcher_pid;
+
+ select type, name, ident
+ from pg_get_process_memory_contexts(launcher_pid, false, 20)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+END $$;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 449bafc123c..f083690c5ce 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1654,12 +1654,16 @@ MemoizeState
MemoizeTuple
MemoryChunk
MemoryContext
+MemoryContextBackendState
MemoryContextCallback
MemoryContextCallbackFunction
MemoryContextCounters
MemoryContextData
+MemoryContextEntry
+MemoryContextId
MemoryContextMethodID
MemoryContextMethods
+MemoryContextState
MemoryStatsPrintFunc
MergeAction
MergeActionState
--
2.39.3 (Apple Git-146)
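The cleanup path in the patch above (dsa_free_previous_stats followed by dsa_free of the containing pointer) relies on a fixed order: the per-entry name, ident and path allocations are released first, and only then the array that holds them. A minimal sketch of that ordering follows, assuming plain malloc/free in place of the DSA calls; DemoEntry, demo_strdup, demo_free_entry_members and demo_free_stats are hypothetical names used only for illustration and are not part of the patch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for MemoryContextEntry: three optional allocations. */
typedef struct DemoEntry
{
	char	   *name;			/* NULL plays the role of InvalidDsaPointer */
	char	   *ident;
	int		   *path;
} DemoEntry;

/* Copy a string into freshly allocated memory (illustration helper). */
char *
demo_strdup(const char *src)
{
	size_t		len = strlen(src) + 1;
	char	   *dst = malloc(len);

	if (dst != NULL)
		memcpy(dst, src, len);
	return dst;
}

/*
 * Release the member allocations of every entry, skipping unset ones.
 * Returns the number of member allocations that were released.
 */
int
demo_free_entry_members(DemoEntry *entries, int total)
{
	int			freed = 0;

	assert(entries != NULL);
	for (int i = 0; i < total; i++)
	{
		if (entries[i].name != NULL)
		{
			free(entries[i].name);
			entries[i].name = NULL;
			freed++;
		}
		if (entries[i].ident != NULL)
		{
			free(entries[i].ident);
			entries[i].ident = NULL;
			freed++;
		}
		if (entries[i].path != NULL)
		{
			free(entries[i].path);
			entries[i].path = NULL;
			freed++;
		}
	}
	return freed;
}

/*
 * Free the members first, then the array itself, and reset the caller's
 * pointer so that a second call is a harmless no-op.
 */
int
demo_free_stats(DemoEntry **entries, int total)
{
	int			freed;

	if (*entries == NULL)
		return 0;

	freed = demo_free_entry_members(*entries, total);
	free(*entries);
	*entries = NULL;
	return freed;
}
```

Freeing the array first would lose the only references to the member allocations, which is why the container goes last; resetting the pointer afterwards corresponds to setting memstats_dsa_pointer back to InvalidDsaPointer.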
v21-0001-Preparatory-changes-for-reporting-memory-context.patch
From 3f3597952fac250eeb55aa1b09a6409c4417beab Mon Sep 17 00:00:00 2001
From: Rahila Syed <rahilasyed.90@gmail.com>
Date: Wed, 26 Mar 2025 14:38:39 +0530
Subject: [PATCH v21 1/2] Preparatory changes for reporting memory context
statistics
Ensure that MemoryContextStatsInternal can return the number of
contexts. Also, provide an option in MemoryContextStatsInternal
to return without printing stats to either stderr or logs.
---
src/backend/utils/mmgr/mcxt.c | 77 ++++++++++++++++++++++++++++++-----
1 file changed, 66 insertions(+), 11 deletions(-)
diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index d98ae9db6be..2cbde8f39c3 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -135,6 +135,17 @@ static const MemoryContextMethods mcxt_methods[] = {
};
#undef BOGUS_MCTX
+/*
+ * This is passed to MemoryContextStatsInternal to determine whether
+ * to print context statistics and, if so, whether to print them to the
+ * logs or to stderr.
+ */
+typedef enum PrintDestination
+{
+ PRINT_STATS_TO_STDERR = 0,
+ PRINT_STATS_TO_LOGS,
+ PRINT_STATS_NONE
+} PrintDestination;
/*
* CurrentMemoryContext
@@ -162,7 +173,7 @@ static void MemoryContextCallResetCallbacks(MemoryContext context);
static void MemoryContextStatsInternal(MemoryContext context, int level,
int max_level, int max_children,
MemoryContextCounters *totals,
- bool print_to_stderr);
+ PrintDestination print_location, int *num_contexts);
static void MemoryContextStatsPrint(MemoryContext context, void *passthru,
const char *stats_string,
bool print_to_stderr);
@@ -831,11 +842,19 @@ MemoryContextStatsDetail(MemoryContext context,
bool print_to_stderr)
{
MemoryContextCounters grand_totals;
+ int num_contexts = 0;
+ PrintDestination print_location;
memset(&grand_totals, 0, sizeof(grand_totals));
+ if (print_to_stderr)
+ print_location = PRINT_STATS_TO_STDERR;
+ else
+ print_location = PRINT_STATS_TO_LOGS;
+
+ /* num_contexts reports the number of contexts aggregated in the output */
MemoryContextStatsInternal(context, 0, max_level, max_children,
- &grand_totals, print_to_stderr);
+ &grand_totals, print_location, &num_contexts);
if (print_to_stderr)
fprintf(stderr,
@@ -870,13 +889,14 @@ MemoryContextStatsDetail(MemoryContext context,
* One recursion level for MemoryContextStats
*
* Print stats for this context if possible, but in any case accumulate counts
- * into *totals (if not NULL).
+ * into *totals (if not NULL). Callers should make sure that print_location
+ * is set to PRINT_STATS_TO_STDERR, PRINT_STATS_TO_LOGS or PRINT_STATS_NONE.
*/
static void
MemoryContextStatsInternal(MemoryContext context, int level,
int max_level, int max_children,
MemoryContextCounters *totals,
- bool print_to_stderr)
+ PrintDestination print_location, int *num_contexts)
{
MemoryContext child;
int ichild;
@@ -884,10 +904,39 @@ MemoryContextStatsInternal(MemoryContext context, int level,
Assert(MemoryContextIsValid(context));
/* Examine the context itself */
- context->methods->stats(context,
- MemoryContextStatsPrint,
- &level,
- totals, print_to_stderr);
+ switch (print_location)
+ {
+ case PRINT_STATS_TO_STDERR:
+ context->methods->stats(context,
+ MemoryContextStatsPrint,
+ &level,
+ totals, true);
+ break;
+
+ case PRINT_STATS_TO_LOGS:
+ context->methods->stats(context,
+ MemoryContextStatsPrint,
+ &level,
+ totals, false);
+ break;
+
+ case PRINT_STATS_NONE:
+
+ /*
+ * Do not print the statistics if print_location is
+ * PRINT_STATS_NONE, only compute totals. This is used when
+ * reporting memory context statistics via an SQL function. The
+ * last parameter is not relevant.
+ */
+ context->methods->stats(context,
+ NULL,
+ NULL,
+ totals, false);
+ break;
+ }
+
+ /* Increment the context count for each recursive call */
+ *num_contexts = *num_contexts + 1;
/*
* Examine children.
@@ -907,7 +956,7 @@ MemoryContextStatsInternal(MemoryContext context, int level,
MemoryContextStatsInternal(child, level + 1,
max_level, max_children,
totals,
- print_to_stderr);
+ print_location, num_contexts);
}
}
@@ -926,7 +975,13 @@ MemoryContextStatsInternal(MemoryContext context, int level,
child = MemoryContextTraverseNext(child, context);
}
- if (print_to_stderr)
+ /*
+ * Add the count of child contexts which are traversed in a
+ * non-recursive manner.
+ */
+ *num_contexts = *num_contexts + ichild;
+
+ if (print_location == PRINT_STATS_TO_STDERR)
{
for (int i = 0; i <= level; i++)
fprintf(stderr, " ");
@@ -939,7 +994,7 @@ MemoryContextStatsInternal(MemoryContext context, int level,
local_totals.freechunks,
local_totals.totalspace - local_totals.freespace);
}
- else
+ else if (print_location == PRINT_STATS_TO_LOGS)
ereport(LOG_SERVER_ONLY,
(errhidestmt(true),
errhidecontext(true),
--
2.39.3 (Apple Git-146)
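The preparatory patch above changes MemoryContextStatsInternal so that one tree walk can either print statistics (to stderr or to the logs) or stay silent while still accumulating totals and counting contexts. A minimal sketch of that idea follows, assuming a toy first-child/next-sibling tree; DemoContext, DemoPrintDestination, demo_stats_internal and demo_count_contexts are hypothetical names, stdout stands in for the server log, and the max_level/max_children handling of the real function is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the PrintDestination enum in the patch. */
typedef enum DemoPrintDestination
{
	DEMO_PRINT_TO_STDERR = 0,
	DEMO_PRINT_TO_LOGS,
	DEMO_PRINT_NONE
} DemoPrintDestination;

/* Hypothetical toy context: linked through first child and next sibling. */
typedef struct DemoContext
{
	const char *name;
	size_t		totalspace;
	struct DemoContext *firstchild;
	struct DemoContext *nextchild;
} DemoContext;

/*
 * One recursion level: optionally print this context, then always add its
 * size to *total and bump *num_contexts before descending into children.
 */
void
demo_stats_internal(const DemoContext *context, int level, size_t *total,
					DemoPrintDestination dest, int *num_contexts)
{
	FILE	   *out = NULL;
	const DemoContext *child;

	assert(context != NULL);

	switch (dest)
	{
		case DEMO_PRINT_TO_STDERR:
			out = stderr;
			break;
		case DEMO_PRINT_TO_LOGS:
			out = stdout;		/* assumption: stdout plays the log here */
			break;
		case DEMO_PRINT_NONE:
			break;				/* only compute totals and the count */
	}

	if (out != NULL)
		fprintf(out, "%*s%s: %zu bytes\n", level * 2, "",
				context->name, context->totalspace);

	*total += context->totalspace;
	*num_contexts += 1;

	for (child = context->firstchild; child != NULL; child = child->nextchild)
		demo_stats_internal(child, level + 1, total, dest, num_contexts);
}

/* Silent walk: returns the number of contexts and fills in *total. */
int
demo_count_contexts(const DemoContext *top, size_t *total)
{
	int			num_contexts = 0;	/* the counter must start at zero */

	*total = 0;
	demo_stats_internal(top, 0, total, DEMO_PRINT_NONE, &num_contexts);
	return num_contexts;
}
```

In this sketch the caller zeroes the counter before the walk, since the recursive function only ever increments what it is handed.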
On 2 Apr 2025, at 23:44, Daniel Gustafsson <daniel@yesql.se> wrote:
I think this version is close to a committable state, will spend a little more
time testing, polishing and rewriting the commit message. I will also play
around with placement within the memory context code files to keep it from
causing backpatching issues.
After a bit more polish I landed with the attached, which I most likely will go
ahead with after another round in CI.
--
Daniel Gustafsson
Attachments:
v23-0001-Add-function-to-get-memory-context-stats-for-pro.patch
From 0640d85687f5c9c917aab97b76c336d052ddfad5 Mon Sep 17 00:00:00 2001
From: Rahila Syed <rahilasyed.90@gmail.com>
Date: Wed, 26 Mar 2025 14:38:39 +0530
Subject: [PATCH v23] Add function to get memory context stats for processes
This adds a function for retrieving memory context statistics
and information from backends as well as auxiliary processes.
When calling the function it sends a signal to the specified
process to submit statistics regarding its memory contexts
into dynamic shared memory. Each memory context is returned
in detail, followed by a cumulative total in case the number
of contexts exceeds the max allocated amount of shared memory.
Each process is limited to use at most 1MB memory for this.
A summary can also be explicitly requested by the user, this
will return the TopMemoryContext and a cumulative total of
all lower contexts.
In order to not block on busy processes the caller specifies
the number of retries to perform, each with a 500msec sleep.
In case no statistics are published within the set
timeout, the last known statistics are returned, or NULL if
no previously published statistics exist. This allows dashboard
type usages to continually publish data even if the
target process is temporarily congested. Context records
contain a timestamp to indicate when they were submitted.
Author: Rahila Syed <rahilasyed90@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Reviewed-by: Atsushi Torikoshi <torikoshia@oss.nttdata.com>
Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Discussion: https://postgr.es/m/CAH2L28v8mc9HDt8QoSJ8TRmKau_8FM_HKS41NeO9-6ZAkuZKXw@mail.gmail.com
---
doc/src/sgml/func.sgml | 95 +++
src/backend/postmaster/autovacuum.c | 4 +
src/backend/postmaster/checkpointer.c | 4 +
src/backend/postmaster/interrupt.c | 4 +
src/backend/postmaster/pgarch.c | 4 +
src/backend/postmaster/startup.c | 4 +
src/backend/postmaster/walsummarizer.c | 4 +
src/backend/storage/ipc/ipci.c | 3 +
src/backend/storage/ipc/procsignal.c | 3 +
src/backend/storage/lmgr/proc.c | 15 +
src/backend/tcop/postgres.c | 3 +
.../utils/activity/wait_event_names.txt | 1 +
src/backend/utils/adt/mcxtfuncs.c | 432 +++++++++++-
src/backend/utils/init/globals.c | 1 +
src/backend/utils/mmgr/mcxt.c | 615 +++++++++++++++++-
src/include/catalog/pg_proc.dat | 10 +
src/include/miscadmin.h | 1 +
src/include/storage/procsignal.h | 1 +
src/include/utils/memutils.h | 80 +++
src/test/regress/expected/sysviews.out | 14 +
src/test/regress/sql/sysviews.sql | 14 +
src/tools/pgindent/typedefs.list | 4 +
22 files changed, 1276 insertions(+), 40 deletions(-)
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 0224f93733d..b5add4bf99a 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -28650,6 +28650,68 @@ acl | {postgres=arwdDxtm/postgres,foo=r/postgres}
</para></entry>
</row>
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>pg_get_process_memory_contexts</primary>
+ </indexterm>
+ <function>pg_get_process_memory_contexts</function> ( <parameter>pid</parameter> <type>integer</type>, <parameter>summary</parameter> <type>boolean</type>, <parameter>retries</parameter> <type>integer</type> )
+ <returnvalue>setof record</returnvalue>
+ ( <parameter>name</parameter> <type>text</type>,
+ <parameter>ident</parameter> <type>text</type>,
+ <parameter>type</parameter> <type>text</type>,
+ <parameter>path</parameter> <type>integer[]</type>,
+ <parameter>level</parameter> <type>integer</type>,
+ <parameter>total_bytes</parameter> <type>bigint</type>,
+ <parameter>total_nblocks</parameter> <type>bigint</type>,
+ <parameter>free_bytes</parameter> <type>bigint</type>,
+ <parameter>free_chunks</parameter> <type>bigint</type>,
+ <parameter>used_bytes</parameter> <type>bigint</type>,
+ <parameter>num_agg_contexts</parameter> <type>integer</type>,
+ <parameter>stats_timestamp</parameter> <type>timestamptz</type> )
+ </para>
+ <para>
+ This function handles requests to display the memory contexts of a
+ <productname>PostgreSQL</productname> process with the specified
+ process ID. The function can be used to send requests to backends as
+ well as <glossterm linkend="glossary-auxiliary-proc">auxiliary processes</glossterm>.
+ </para>
+ <para>
+ Memory contexts are arranged in a tree-like hierarchy. When
+ <parameter>summary</parameter> is <literal>true</literal>, statistics
+ for memory contexts at levels 1 and 2 are displayed, with level 1
+ representing the root node (i.e., <literal>TopMemoryContext</literal>).
+ Statistics for contexts on level 2 and below are aggregates of all
+ child contexts' statistics, where <literal>num_agg_contexts</literal>
+ indicates the number of aggregated child contexts. When
+ <parameter>summary</parameter> is <literal>false</literal>,
+ the <literal>num_agg_contexts</literal> value is <literal>1</literal>,
+ indicating that individual statistics are being displayed. The levels
+ are limited to the first 100 contexts.
+ </para>
+ <para>
+ Busy processes can delay reporting memory context statistics;
+ <parameter>retries</parameter> sets the number of times the client will
+ retry to get updated statistics. The sleep per try is
+ <literal>0.5</literal> seconds. This parameter can be increased if the
+ user anticipates a delay in the response from the reporting process.
+ Conversely, if users are frequently and periodically querying the
+ process for statistics, or if there are concurrent requests for
+ statistics of the same process, lowering the parameter might help
+ achieve a faster response.
+ </para>
+ <para>
+ After receiving memory context statistics from the target process, it
+ returns the results as one row per context. If all the contexts don't
+ fit within the pre-determined size limit, the remaining context
+ statistics are aggregated and a cumulative total is displayed. The
+ <literal>num_agg_contexts</literal> column indicates the number of
+ contexts aggregated in the displayed statistics. When
+ <literal>num_agg_contexts</literal> is <literal>1</literal> it means
+ that the context statistics are displayed separately.
+ </para></entry>
+ </row>
+
<row>
<entry role="func_table_entry"><para role="func_signature">
<indexterm>
@@ -28789,6 +28851,39 @@ LOG: Grand total: 1651920 bytes in 201 blocks; 622360 free (88 chunks); 1029560
because it may generate a large number of log messages.
</para>
+ <para>
+ <function>pg_get_process_memory_contexts</function> can be used
+ to request the memory context statistics of any PostgreSQL process. For example:
+<programlisting>
+postgres=# SELECT * FROM pg_get_process_memory_contexts(
+ (SELECT pid FROM pg_stat_activity
+ WHERE backend_type = 'checkpointer')
+ , false, 5) LIMIT 1;
+-[ RECORD 1 ]----+------------------------------
+name | TopMemoryContext
+ident |
+type | AllocSet
+path | {1}
+level | 1
+total_bytes | 90304
+total_nblocks | 3
+free_bytes | 2880
+free_chunks | 1
+used_bytes | 87424
+num_agg_contexts | 1
+stats_timestamp | 2025-03-24 13:55:47.796698+01
+</programlisting>
+ <note>
+ <para>
+ While <function>pg_get_process_memory_contexts</function> can be used to
+ query memory contexts of the local backend,
+ <structname>pg_backend_memory_contexts</structname>
+ (see <xref linkend="view-pg-backend-memory-contexts"/> for more details)
+ will be less resource intensive when only the local backend is of interest.
+ </para>
+ </note>
+ </para>
+
</sect2>
<sect2 id="functions-admin-backup">
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 2513a8ef8a6..16756152b71 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -781,6 +781,10 @@ ProcessAutoVacLauncherInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
/* Process sinval catchup interrupts that happened while sleeping */
ProcessCatchupInterrupt();
}
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index fda91ffd1ce..d3cb3f1891c 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -663,6 +663,10 @@ ProcessCheckpointerInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/postmaster/interrupt.c b/src/backend/postmaster/interrupt.c
index 0ae9bf906ec..f24f574e748 100644
--- a/src/backend/postmaster/interrupt.c
+++ b/src/backend/postmaster/interrupt.c
@@ -48,6 +48,10 @@ ProcessMainLoopInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 7e622ae4bd2..cb7408acf4c 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -867,6 +867,10 @@ ProcessPgArchInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
if (ConfigReloadPending)
{
char *archiveLib = pstrdup(XLogArchiveLibrary);
diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c
index 27e86cf393f..7149a67fcbc 100644
--- a/src/backend/postmaster/startup.c
+++ b/src/backend/postmaster/startup.c
@@ -192,6 +192,10 @@ ProcessStartupProcInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
diff --git a/src/backend/postmaster/walsummarizer.c b/src/backend/postmaster/walsummarizer.c
index 0fec4f1f871..c7a76711cc5 100644
--- a/src/backend/postmaster/walsummarizer.c
+++ b/src/backend/postmaster/walsummarizer.c
@@ -879,6 +879,10 @@ ProcessWalSummarizerInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 2fa045e6b0f..362a6dc9528 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -51,6 +51,7 @@
#include "storage/sinvaladt.h"
#include "utils/guc.h"
#include "utils/injection_point.h"
+#include "utils/memutils.h"
/* GUCs */
int shared_memory_type = DEFAULT_SHARED_MEMORY_TYPE;
@@ -343,6 +344,8 @@ CreateOrAttachShmemStructs(void)
WaitEventCustomShmemInit();
InjectionPointShmemInit();
AioShmemInit();
+ MemCtxShmemInit();
+ MemCtxBackendShmemInit();
}
/*
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index b7c39a4c5f0..a3c2cd12277 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -690,6 +690,9 @@ procsignal_sigusr1_handler(SIGNAL_ARGS)
if (CheckProcSignal(PROCSIG_LOG_MEMORY_CONTEXT))
HandleLogMemoryContextInterrupt();
+ if (CheckProcSignal(PROCSIG_GET_MEMORY_CONTEXT))
+ HandleGetMemoryContextInterrupt();
+
if (CheckProcSignal(PROCSIG_PARALLEL_APPLY_MESSAGE))
HandleParallelApplyMessageInterrupt();
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index e9ef0fbfe32..41a60a11d2b 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -50,6 +50,7 @@
#include "storage/procsignal.h"
#include "storage/spin.h"
#include "storage/standby.h"
+#include "utils/memutils.h"
#include "utils/timeout.h"
#include "utils/timestamp.h"
@@ -546,6 +547,13 @@ InitProcess(void)
*/
PGSemaphoreReset(MyProc->sem);
+ /*
+ * The before shmem exit callback frees the DSA memory occupied by the
+ * latest memory context statistics that could be published by this
+ * backend if requested.
+ */
+ before_shmem_exit(AtProcExit_memstats_dsa_free, 0);
+
/*
* Arrange to clean up at backend exit.
*/
@@ -720,6 +728,13 @@ InitAuxiliaryProcess(void)
*/
PGSemaphoreReset(MyProc->sem);
+ /*
+ * The before shmem exit callback frees the DSA memory occupied by the
+ * latest memory context statistics that could be published by this
+ * process if requested.
+ */
+ before_shmem_exit(AtProcExit_memstats_dsa_free, 0);
+
/*
* Arrange to clean up at process exit.
*/
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 6ae9f38f0c8..dc4c600922d 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3535,6 +3535,9 @@ ProcessInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
if (ParallelApplyMessagePending)
ProcessParallelApplyMessages();
}
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 8bce14c38fd..ee0e42535b6 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -161,6 +161,7 @@ WAL_RECEIVER_EXIT "Waiting for the WAL receiver to exit."
WAL_RECEIVER_WAIT_START "Waiting for startup process to send initial data for streaming replication."
WAL_SUMMARY_READY "Waiting for a new WAL summary to be generated."
XACT_GROUP_UPDATE "Waiting for the group leader to update transaction status at transaction end."
+MEM_CTX_PUBLISH "Waiting for a process to publish memory information."
ABI_compatibility:
diff --git a/src/backend/utils/adt/mcxtfuncs.c b/src/backend/utils/adt/mcxtfuncs.c
index 396c2f223b4..375509f2551 100644
--- a/src/backend/utils/adt/mcxtfuncs.c
+++ b/src/backend/utils/adt/mcxtfuncs.c
@@ -17,28 +17,26 @@
#include "funcapi.h"
#include "mb/pg_wchar.h"
+#include "miscadmin.h"
+#include "access/twophase.h"
+#include "catalog/pg_authid_d.h"
+#include "nodes/pg_list.h"
#include "storage/proc.h"
#include "storage/procarray.h"
+#include "utils/acl.h"
#include "utils/array.h"
#include "utils/builtins.h"
#include "utils/hsearch.h"
+#include "utils/memutils.h"
+#include "utils/wait_event_types.h"
/* ----------
* The max bytes for showing identifiers of MemoryContext.
* ----------
*/
#define MEMORY_CONTEXT_IDENT_DISPLAY_SIZE 1024
-
-/*
- * MemoryContextId
- * Used for storage of transient identifiers for
- * pg_get_backend_memory_contexts.
- */
-typedef struct MemoryContextId
-{
- MemoryContext context;
- int context_id;
-} MemoryContextId;
+struct MemoryContextBackendState *memCtxState = NULL;
+struct MemoryContextState *memCtxArea = NULL;
/*
* int_list_to_array
@@ -143,24 +141,7 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
else
nulls[1] = true;
- switch (context->type)
- {
- case T_AllocSetContext:
- type = "AllocSet";
- break;
- case T_GenerationContext:
- type = "Generation";
- break;
- case T_SlabContext:
- type = "Slab";
- break;
- case T_BumpContext:
- type = "Bump";
- break;
- default:
- type = "???";
- break;
- }
+ type = ContextTypeToString(context->type, true);
values[2] = CStringGetTextDatum(type);
values[3] = Int32GetDatum(list_length(path)); /* level */
@@ -175,6 +156,44 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
list_free(path);
}
+/*
+ * ContextTypeToString
+ * Returns a textual representation of a context type
+ *
+ * This should cover the same types as MemoryContextIsValid. If missing_ok is
+ * false then execution will error out on invalid context types.
+ */
+const char *
+ContextTypeToString(NodeTag type, bool missing_ok)
+{
+ const char *context_type;
+
+ switch (type)
+ {
+ case T_AllocSetContext:
+ context_type = "AllocSet";
+ break;
+ case T_GenerationContext:
+ context_type = "Generation";
+ break;
+ case T_SlabContext:
+ context_type = "Slab";
+ break;
+ case T_BumpContext:
+ context_type = "Bump";
+ break;
+ default:
+ if (missing_ok)
+ context_type = "???";
+ else
+ ereport(ERROR,
+ errcode(ERRCODE_UNDEFINED_OBJECT),
+ errmsg("invalid memory context type specified"));
+ break;
+ }
+ return (context_type);
+}
+
/*
* pg_get_backend_memory_contexts
* SQL SRF showing backend memory context.
@@ -305,3 +324,358 @@ pg_log_backend_memory_contexts(PG_FUNCTION_ARGS)
PG_RETURN_BOOL(true);
}
+
+/*
+ * pg_get_process_memory_contexts
+ * Signal a backend or an auxiliary process to send its memory contexts,
+ * wait for the results and display them.
+ *
+ * By default, only superusers or users with PG_READ_ALL_STATS are allowed to
+ * signal a process to return the memory contexts. This is because allowing
+ * any user to issue this request at an unbounded rate would cause lots of
+ * requests to be sent, which can lead to denial of service. Additional roles
+ * can be permitted with GRANT.
+ *
+ * On receipt of this signal, a backend or an auxiliary process sets the flag
+ * in the signal handler, which causes the next CHECK_FOR_INTERRUPTS()
+ * or process-specific interrupt handler to copy the memory context details
+ * to a dynamic shared memory space.
+ *
+ * There is a limit on the DSA memory that can be allocated per process. If
+ * the process has more memory contexts than can fit in the allocated size,
+ * the excess contexts are summarized and reported as a cumulative total at
+ * the end of the buffer.
+ *
+ * After sending the signal, wait on a condition variable. The publishing
+ * backend, after copying the data to shared memory, signals that condition
+ * variable. There is one condition variable per publishing backend. Once the
+ * condition variable is signalled, check if the latest memory context
+ * information is available and display it.
+ *
+ * If the publishing backend does not respond before the condition variable
+ * wait times out after MEMSTATS_WAIT_TIMEOUT milliseconds, retry up to
+ * max_retries times, as specified by the user, before giving up and
+ * returning previously published statistics, if any. If no previous statistics
+ * exist, return NULL.
+ */
+Datum
+pg_get_process_memory_contexts(PG_FUNCTION_ARGS)
+{
+ int pid = PG_GETARG_INT32(0);
+ bool get_summary = PG_GETARG_BOOL(1);
+ PGPROC *proc;
+ ProcNumber procNumber = INVALID_PROC_NUMBER;
+ bool proc_is_aux = false;
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ dsa_area *area;
+ MemoryContextEntry *memctx_info;
+ int num_retries = 0;
+ TimestampTz curr_timestamp;
+ int max_retries = PG_GETARG_INT32(2);
+
+ /*
+ * Only superusers or users with pg_read_all_stats privileges can view the
+ * memory context statistics of another process.
+ */
+ if (!has_privs_of_role(GetUserId(), ROLE_PG_READ_ALL_STATS))
+ ereport(ERROR,
+ errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+ errmsg("permission denied to extract memory context statistics"));
+
+ /*
+ * See if the process with given pid is a backend or an auxiliary process
+ * and remember the type for when we requery the process later.
+ */
+ proc = BackendPidGetProc(pid);
+ if (proc == NULL)
+ {
+ proc = AuxiliaryPidGetProc(pid);
+ proc_is_aux = true;
+ }
+
+ /*
+ * BackendPidGetProc() and AuxiliaryPidGetProc() return NULL if the pid
+ * isn't valid; this is however not a problem, so just leave with a WARNING.
+ * See comment in pg_log_backend_memory_contexts for a discussion on this.
+ */
+ if (proc == NULL)
+ {
+ /*
+ * This is just a warning so a loop-through-resultset will not abort
+ * if one backend terminated on its own during the run.
+ */
+ ereport(WARNING,
+ (errmsg("PID %d is not a PostgreSQL server process", pid)));
+ PG_RETURN_NULL();
+ }
+
+ InitMaterializedSRF(fcinfo, 0);
+
+ procNumber = GetNumberFromPGProc(proc);
+
+ LWLockAcquire(&memCtxState[procNumber].lw_lock, LW_EXCLUSIVE);
+ memCtxState[procNumber].get_summary = get_summary;
+ LWLockRelease(&memCtxState[procNumber].lw_lock);
+
+ curr_timestamp = GetCurrentTimestamp();
+
+ /*
+ * Send a signal to a PostgreSQL process, informing it we want it to
+ * produce information about its memory contexts.
+ */
+ if (SendProcSignal(pid, PROCSIG_GET_MEMORY_CONTEXT, procNumber) < 0)
+ {
+ ereport(WARNING,
+ errmsg("could not send signal to process %d: %m", pid));
+ PG_RETURN_NULL();
+ }
+
+ /*
+ * A valid DSA pointer isn't proof that new statistics are available; it
+ * can be valid due to previously published stats. Check whether the stats
+ * are updated by comparing timestamps: if the stats are newer than the
+ * timestamp we recorded before sending the procsignal, they must by
+ * definition be updated. Wait for up to max_retries *
+ * MEMSTATS_WAIT_TIMEOUT, after which display old statistics if
+ * available or return NULL.
+ */
+ while (1)
+ {
+ long msecs;
+
+ /*
+ * We expect to come out of sleep when the requested process has
+ * finished publishing the statistics, verified using the valid DSA
+ * pointer.
+ *
+ * Make sure that the information belongs to the pid we requested
+ * information for; otherwise loop back and wait for the server
+ * process to finish publishing statistics.
+ */
+ LWLockAcquire(&memCtxState[procNumber].lw_lock, LW_EXCLUSIVE);
+
+ /*
+ * A note in procnumber.h says that a procNumber can be re-used for
+ * a different backend immediately after a backend exits. In case an
+ * old process' data was there and not updated by the current process
+ * in the slot identified by the procNumber, the pid of the requested
+ * process and the proc_id might not match.
+ */
+ if (memCtxState[procNumber].proc_id == pid)
+ {
+ /*
+ * Break if the latest stats have been read, indicated by
+ * statistics timestamp being newer than the current request
+ * timestamp.
+ */
+ msecs = TimestampDifferenceMilliseconds(curr_timestamp,
+ memCtxState[procNumber].stats_timestamp);
+
+ if (DsaPointerIsValid(memCtxState[procNumber].memstats_dsa_pointer)
+ && msecs > 0)
+ break;
+ }
+ LWLockRelease(&memCtxState[procNumber].lw_lock);
+
+ /*
+ * Recheck the state of the backend before sleeping on the condition
+ * variable to ensure the process is still alive. Only check the
+ * relevant process type based on the earlier PID check.
+ */
+ if (proc_is_aux)
+ proc = AuxiliaryPidGetProc(pid);
+ else
+ proc = BackendPidGetProc(pid);
+
+ /*
+ * The process ending during memory context processing is not an
+ * error.
+ */
+ if (proc == NULL)
+ {
+ ereport(WARNING,
+ errmsg("PID %d is no longer a PostgreSQL server process",
+ pid));
+ PG_RETURN_NULL();
+ }
+
+#define MEMSTATS_WAIT_TIMEOUT 500
+
+ if (ConditionVariableTimedSleep(&memCtxState[procNumber].memctx_cv,
+ MEMSTATS_WAIT_TIMEOUT,
+ WAIT_EVENT_MEM_CTX_PUBLISH))
+ {
+ /*
+ * Wait for max_retries, as defined by the user. If no updated
+ * statistics are available within the wait time defined by
+ * max_retries then display previously published statistics if
+ * there are any. If no previous statistics are available then
+ * return NULL.
+ */
+ if (num_retries >= max_retries)
+ {
+ LWLockAcquire(&memCtxState[procNumber].lw_lock, LW_EXCLUSIVE);
+ /* Displaying previously published statistics if available */
+ if (DsaPointerIsValid(memCtxState[procNumber].memstats_dsa_pointer))
+ break;
+ else
+ {
+ LWLockRelease(&memCtxState[procNumber].lw_lock);
+ PG_RETURN_NULL();
+ }
+ }
+ ereport(DEBUG1,
+ errmsg("timed out waiting for process with PID %d to publish stats, retrying",
+ pid));
+ num_retries = num_retries + 1;
+ }
+ }
+
+ /*
+ * We should only reach here with a valid DSA handle, either containing
+ * updated statistics or previously published statistics (identified by
+ * the timestamp).
+ */
+ Assert(memCtxArea->memstats_dsa_handle != DSA_HANDLE_INVALID);
+ area = dsa_attach(memCtxArea->memstats_dsa_handle);
+
+ /*
+ * Backend has finished publishing the stats, project them.
+ */
+ memctx_info = (MemoryContextEntry *)
+ dsa_get_address(area, memCtxState[procNumber].memstats_dsa_pointer);
+
+#define PG_GET_PROCESS_MEMORY_CONTEXTS_COLS 12
+ for (int i = 0; i < memCtxState[procNumber].total_stats; i++)
+ {
+ ArrayType *path_array;
+ int path_length;
+ Datum values[PG_GET_PROCESS_MEMORY_CONTEXTS_COLS];
+ bool nulls[PG_GET_PROCESS_MEMORY_CONTEXTS_COLS];
+ char *name;
+ char *ident;
+ Datum *path_datum = NULL;
+ int *path_int = NULL;
+
+ memset(values, 0, sizeof(values));
+ memset(nulls, 0, sizeof(nulls));
+
+ if (DsaPointerIsValid(memctx_info[i].name))
+ {
+ name = (char *) dsa_get_address(area, memctx_info[i].name);
+ values[0] = CStringGetTextDatum(name);
+ }
+ else
+ nulls[0] = true;
+
+ if (DsaPointerIsValid(memctx_info[i].ident))
+ {
+ ident = (char *) dsa_get_address(area, memctx_info[i].ident);
+ values[1] = CStringGetTextDatum(ident);
+ }
+ else
+ nulls[1] = true;
+
+ if (memctx_info[i].type != NULL)
+ values[2] = CStringGetTextDatum(memctx_info[i].type);
+ else
+ nulls[2] = true;
+
+ path_length = memctx_info[i].path_length;
+ path_datum = (Datum *) palloc(path_length * sizeof(Datum));
+ if (DsaPointerIsValid(memctx_info[i].path))
+ {
+ path_int = (int *) dsa_get_address(area, memctx_info[i].path);
+ for (int j = 0; j < path_length; j++)
+ path_datum[j] = Int32GetDatum(path_int[j]);
+ path_array = construct_array_builtin(path_datum, path_length, INT4OID);
+ values[3] = PointerGetDatum(path_array);
+ }
+ else
+ nulls[3] = true;
+
+ values[4] = Int32GetDatum(path_length); /* level */
+ values[5] = Int64GetDatum(memctx_info[i].totalspace);
+ values[6] = Int64GetDatum(memctx_info[i].nblocks);
+ values[7] = Int64GetDatum(memctx_info[i].freespace);
+ values[8] = Int64GetDatum(memctx_info[i].freechunks);
+ values[9] = Int64GetDatum(memctx_info[i].totalspace -
+ memctx_info[i].freespace);
+ values[10] = Int32GetDatum(memctx_info[i].num_agg_stats);
+ values[11] = TimestampTzGetDatum(memCtxState[procNumber].stats_timestamp);
+
+ tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc,
+ values, nulls);
+ }
+ LWLockRelease(&memCtxState[procNumber].lw_lock);
+
+ ConditionVariableCancelSleep();
+ dsa_detach(area);
+
+ PG_RETURN_NULL();
+}
+
+/*
+ * Init shared memory for reporting memory context information.
+ */
+void
+MemCtxBackendShmemInit(void)
+{
+ bool found;
+ Size TotalProcs;
+
+ TotalProcs = add_size(MaxBackends, NUM_AUXILIARY_PROCS);
+ TotalProcs = add_size(TotalProcs, max_prepared_xacts);
+
+ memCtxState = (MemoryContextBackendState *)
+ ShmemInitStruct("MemoryContextBackendState",
+ mul_size(TotalProcs, sizeof(MemoryContextBackendState)),
+ &found);
+
+ if (!IsUnderPostmaster)
+ {
+ Assert(!found);
+
+ for (int i = 0; i < TotalProcs; i++)
+ {
+ ConditionVariableInit(&memCtxState[i].memctx_cv);
+
+ LWLockInitialize(&memCtxState[i].lw_lock, LWLockNewTrancheId());
+ LWLockRegisterTranche(memCtxState[i].lw_lock.tranche,
+ "mem_context_backend_stats_reporting");
+
+ memCtxState[i].memstats_dsa_pointer = InvalidDsaPointer;
+ }
+ }
+ else
+ {
+ Assert(found);
+ }
+}
+
+/*
+ * Initialize shared memory for displaying memory context statistics
+ */
+void
+MemCtxShmemInit(void)
+{
+ bool found;
+
+ memCtxArea = (MemoryContextState *)
+ ShmemInitStruct("MemoryContextState", sizeof(MemoryContextState), &found);
+
+ if (!IsUnderPostmaster)
+ {
+ Assert(!found);
+
+ LWLockInitialize(&memCtxArea->lw_lock,
+ LWLockNewTrancheId());
+ LWLockRegisterTranche(memCtxArea->lw_lock.tranche,
+ "mem_context_stats_reporting");
+ memCtxArea->memstats_dsa_handle = DSA_HANDLE_INVALID;
+ }
+ else
+ {
+ Assert(found);
+ }
+}
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index 2152aad97d9..92304a1f124 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -39,6 +39,7 @@ volatile sig_atomic_t TransactionTimeoutPending = false;
volatile sig_atomic_t IdleSessionTimeoutPending = false;
volatile sig_atomic_t ProcSignalBarrierPending = false;
volatile sig_atomic_t LogMemoryContextPending = false;
+volatile sig_atomic_t PublishMemoryContextPending = false;
volatile sig_atomic_t IdleStatsUpdateTimeoutPending = false;
volatile uint32 InterruptHoldoffCount = 0;
volatile uint32 QueryCancelHoldoffCount = 0;
diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index d98ae9db6be..f29a282cb5f 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -23,6 +23,13 @@
#include "mb/pg_wchar.h"
#include "miscadmin.h"
+#include "nodes/pg_list.h"
+#include "storage/fd.h"
+#include "storage/lwlock.h"
+#include "storage/dsm.h"
+#include "storage/ipc.h"
+#include "utils/dsa.h"
+#include "utils/hsearch.h"
#include "utils/memdebug.h"
#include "utils/memutils.h"
#include "utils/memutils_internal.h"
@@ -135,6 +142,17 @@ static const MemoryContextMethods mcxt_methods[] = {
};
#undef BOGUS_MCTX
+/*
+ * This is passed to MemoryContextStatsInternal to determine whether
+ * to print context statistics or not, and where to print them: logs or
+ * stderr.
+ */
+typedef enum PrintDestination
+{
+ PRINT_STATS_TO_STDERR = 0,
+ PRINT_STATS_TO_LOGS,
+ PRINT_STATS_NONE
+} PrintDestination;
/*
* CurrentMemoryContext
@@ -162,10 +180,23 @@ static void MemoryContextCallResetCallbacks(MemoryContext context);
static void MemoryContextStatsInternal(MemoryContext context, int level,
int max_level, int max_children,
MemoryContextCounters *totals,
- bool print_to_stderr);
+ PrintDestination print_location,
+ int *num_contexts);
static void MemoryContextStatsPrint(MemoryContext context, void *passthru,
const char *stats_string,
bool print_to_stderr);
+static void PublishMemoryContext(MemoryContextEntry *memctx_infos,
+ int curr_id, MemoryContext context,
+ List *path,
+ MemoryContextCounters stat,
+ int num_contexts, dsa_area *area);
+static void compute_contexts_count_and_ids(List *contexts, HTAB *context_id_lookup,
+ int *stats_count,
+ bool get_summary);
+static List *compute_context_path(MemoryContext c, HTAB *context_id_lookup,
+ int max_level);
+static void dsa_free_previous_stats(dsa_area *area, int total_stats,
+ dsa_pointer prev_dsa_pointer);
/*
* You should not do memory allocations within a critical section, because
@@ -831,11 +862,19 @@ MemoryContextStatsDetail(MemoryContext context,
bool print_to_stderr)
{
MemoryContextCounters grand_totals;
+ int num_contexts = 0;
+ PrintDestination print_location;
memset(&grand_totals, 0, sizeof(grand_totals));
+ if (print_to_stderr)
+ print_location = PRINT_STATS_TO_STDERR;
+ else
+ print_location = PRINT_STATS_TO_LOGS;
+
+ /* num_contexts reports the number of contexts aggregated in the output */
MemoryContextStatsInternal(context, 0, max_level, max_children,
- &grand_totals, print_to_stderr);
+ &grand_totals, print_location, &num_contexts);
if (print_to_stderr)
fprintf(stderr,
@@ -870,13 +909,14 @@ MemoryContextStatsDetail(MemoryContext context,
* One recursion level for MemoryContextStats
*
* Print stats for this context if possible, but in any case accumulate counts
- * into *totals (if not NULL).
+ * into *totals (if not NULL). The callers should make sure that print_location
+ * is set to PRINT_STATS_TO_STDERR, PRINT_STATS_TO_LOGS or PRINT_STATS_NONE.
*/
static void
MemoryContextStatsInternal(MemoryContext context, int level,
int max_level, int max_children,
MemoryContextCounters *totals,
- bool print_to_stderr)
+ PrintDestination print_location, int *num_contexts)
{
MemoryContext child;
int ichild;
@@ -884,10 +924,39 @@ MemoryContextStatsInternal(MemoryContext context, int level,
Assert(MemoryContextIsValid(context));
/* Examine the context itself */
- context->methods->stats(context,
- MemoryContextStatsPrint,
- &level,
- totals, print_to_stderr);
+ switch (print_location)
+ {
+ case PRINT_STATS_TO_STDERR:
+ context->methods->stats(context,
+ MemoryContextStatsPrint,
+ &level,
+ totals, true);
+ break;
+
+ case PRINT_STATS_TO_LOGS:
+ context->methods->stats(context,
+ MemoryContextStatsPrint,
+ &level,
+ totals, false);
+ break;
+
+ case PRINT_STATS_NONE:
+
+ /*
+ * Do not print the statistics if print_location is
+ * PRINT_STATS_NONE, only compute totals. This is used in
+ * reporting of memory context statistics via an SQL function. The last
+ * parameter is not relevant.
+ */
+ context->methods->stats(context,
+ NULL,
+ NULL,
+ totals, false);
+ break;
+ }
+
+ /* Increment the context count for each recursive call */
+ *num_contexts = *num_contexts + 1;
/*
* Examine children.
@@ -907,7 +976,7 @@ MemoryContextStatsInternal(MemoryContext context, int level,
MemoryContextStatsInternal(child, level + 1,
max_level, max_children,
totals,
- print_to_stderr);
+ print_location, num_contexts);
}
}
@@ -926,7 +995,13 @@ MemoryContextStatsInternal(MemoryContext context, int level,
child = MemoryContextTraverseNext(child, context);
}
- if (print_to_stderr)
+ /*
+ * Add the count of child contexts which are traversed in a
+ * non-recursive manner.
+ */
+ *num_contexts = *num_contexts + ichild;
+
+ if (print_location == PRINT_STATS_TO_STDERR)
{
for (int i = 0; i <= level; i++)
fprintf(stderr, " ");
@@ -939,7 +1014,7 @@ MemoryContextStatsInternal(MemoryContext context, int level,
local_totals.freechunks,
local_totals.totalspace - local_totals.freespace);
}
- else
+ else if (print_location == PRINT_STATS_TO_LOGS)
ereport(LOG_SERVER_ONLY,
(errhidestmt(true),
errhidecontext(true),
@@ -1276,6 +1351,22 @@ HandleLogMemoryContextInterrupt(void)
/* latch will be set by procsignal_sigusr1_handler */
}
+/*
+ * HandleGetMemoryContextInterrupt
+ * Handle receipt of an interrupt indicating a request to publish memory
+ * context statistics.
+ *
+ * All the actual work is deferred to ProcessGetMemoryContextInterrupt() as
+ * this cannot be performed in a signal handler.
+ */
+void
+HandleGetMemoryContextInterrupt(void)
+{
+ InterruptPending = true;
+ PublishMemoryContextPending = true;
+ /* latch will be set by procsignal_sigusr1_handler */
+}
+
/*
* ProcessLogMemoryContextInterrupt
* Perform logging of memory contexts of this backend process.
@@ -1313,6 +1404,508 @@ ProcessLogMemoryContextInterrupt(void)
MemoryContextStatsDetail(TopMemoryContext, 100, 100, false);
}
+/*
+ * ProcessGetMemoryContextInterrupt
+ * Generate information about memory contexts used by the process.
+ *
+ * Performs a breadth-first search on the memory context tree, so that parents'
+ * statistics are reported before their children in the monitoring function
+ * output.
+ *
+ * Statistics for all the processes are shared via the same dynamic shared
+ * area. Statistics written by each process are tracked independently in
+ * per-process DSA pointers. These pointers are stored in static shared memory.
+ *
+ * We calculate the maximum number of contexts whose statistics can be
+ * displayed from a pre-determined limit on the memory available per process
+ * for this utility and the maximum size of the statistics for each context.
+ * The remaining contexts' statistics, if any, are captured as a cumulative
+ * total after the individual contexts' statistics.
+ *
+ * If get_summary is true, we capture the level 1 and level 2 contexts
+ * statistics. For that we traverse the memory context tree recursively in
+ * depth first search manner to cover all the children of a parent context, to
+ * be able to display a cumulative total of memory consumption by a parent at
+ * level 2 and all its children.
+ */
+void
+ProcessGetMemoryContextInterrupt(void)
+{
+ List *contexts;
+ HASHCTL ctl;
+ HTAB *context_id_lookup;
+ int context_id = 0;
+ MemoryContextEntry *meminfo;
+ bool get_summary = false;
+ dsa_area *area = NULL;
+ int max_stats;
+ int idx = MyProcNumber;
+ int stats_count = 0;
+ MemoryContextCounters stat;
+ int num_individual_stats = 0;
+
+ PublishMemoryContextPending = false;
+
+ /*
+ * The hash table is used for constructing "path" column of the view,
+ * similar to its local backend counterpart.
+ */
+ ctl.keysize = sizeof(MemoryContext);
+ ctl.entrysize = sizeof(MemoryContextId);
+ ctl.hcxt = CurrentMemoryContext;
+
+ context_id_lookup = hash_create("pg_get_remote_backend_memory_contexts",
+ 256,
+ &ctl,
+ HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+
+ /* List of contexts to process in the next round - start at the top. */
+ contexts = list_make1(TopMemoryContext);
+
+ /* Compute the number of stats that can fit in the defined limit */
+ max_stats = (MAX_SEGMENTS_PER_BACKEND * DSA_DEFAULT_INIT_SEGMENT_SIZE)
+ / (MAX_MEMORY_CONTEXT_STATS_SIZE);
+ LWLockAcquire(&memCtxState[idx].lw_lock, LW_EXCLUSIVE);
+ get_summary = memCtxState[idx].get_summary;
+ LWLockRelease(&memCtxState[idx].lw_lock);
+
+ /*
+ * Traverse the memory context tree to find total number of contexts. If
+ * summary is requested report the total number of contexts at level 1 and
+ * 2 from the top. Also, populate the hash table of context ids.
+ */
+ compute_contexts_count_and_ids(contexts, context_id_lookup, &stats_count,
+ get_summary);
+
+ /*
+ * Allocate memory in this process's DSA for storing statistics of the
+ * memory contexts up to max_stats. For contexts that don't fit within the
+ * limit, a cumulative total is written as the last record in the DSA
+ * segment.
+ */
+ stats_count = Min(stats_count, max_stats);
+
+ LWLockAcquire(&memCtxArea->lw_lock, LW_EXCLUSIVE);
+
+ /*
+ * Create a DSA and send the handle to the client process after storing
+ * the context statistics. If the number of contexts exceeds a predefined
+ * limit (8MB), a cumulative total is stored for such contexts.
+ */
+ if (memCtxArea->memstats_dsa_handle == DSA_HANDLE_INVALID)
+ {
+ MemoryContext oldcontext = CurrentMemoryContext;
+ dsa_handle handle;
+
+ MemoryContextSwitchTo(TopMemoryContext);
+
+ area = dsa_create(memCtxArea->lw_lock.tranche);
+
+ handle = dsa_get_handle(area);
+ MemoryContextSwitchTo(oldcontext);
+
+ dsa_pin_mapping(area);
+
+ /*
+ * Pin the DSA area, this is to make sure the area remains attachable
+ * even if current backend exits. This is done so that the statistics
+ * are published even if the process exits while a client is waiting.
+ */
+ dsa_pin(area);
+
+ /* Set the handle in shared memory */
+ memCtxArea->memstats_dsa_handle = handle;
+ }
+
+ /*
+ * If DSA exists, created by another process publishing statistics, or by
+ * the previous execution of this function by this process, attach to it.
+ */
+ else if (area == NULL)
+ {
+ MemoryContext oldcontext = CurrentMemoryContext;
+
+ MemoryContextSwitchTo(TopMemoryContext);
+ area = dsa_attach(memCtxArea->memstats_dsa_handle);
+ MemoryContextSwitchTo(oldcontext);
+ dsa_pin_mapping(area);
+ }
+ LWLockRelease(&memCtxArea->lw_lock);
+
+ /*
+ * Hold the process lock to protect writes to process specific memory. Two
+ * processes publishing statistics do not block each other.
+ */
+ LWLockAcquire(&memCtxState[idx].lw_lock, LW_EXCLUSIVE);
+ memCtxState[idx].proc_id = MyProcPid;
+
+ if (DsaPointerIsValid(memCtxState[idx].memstats_dsa_pointer))
+ {
+ /*
+ * Free any previous allocations, free the name, ident and path
+ * pointers before freeing the pointer that contains them.
+ */
+ dsa_free_previous_stats(area, memCtxState[idx].total_stats,
+ memCtxState[idx].memstats_dsa_pointer);
+
+ dsa_free(area, memCtxState[idx].memstats_dsa_pointer);
+ memCtxState[idx].memstats_dsa_pointer = InvalidDsaPointer;
+ }
+ memCtxState[idx].memstats_dsa_pointer =
+ dsa_allocate0(area, stats_count * sizeof(MemoryContextEntry));
+
+ meminfo = (MemoryContextEntry *)
+ dsa_get_address(area, memCtxState[idx].memstats_dsa_pointer);
+
+ if (get_summary)
+ {
+ int ctx_id = 0;
+ List *path = NIL;
+
+ /* Copy TopMemoryContext statistics to DSA */
+ memset(&stat, 0, sizeof(stat));
+ (*TopMemoryContext->methods->stats) (TopMemoryContext, NULL, NULL,
+ &stat, true);
+ path = lcons_int(1, path);
+ PublishMemoryContext(meminfo, ctx_id, TopMemoryContext, path, stat,
+ 1, area);
+ ctx_id = ctx_id + 1;
+
+ /*
+ * Copy statistics for each of TopMemoryContext's children. This
+ * includes statistics of all of their children up to level 100.
+ */
+ for (MemoryContext c = TopMemoryContext->firstchild; c != NULL;
+ c = c->nextchild)
+ {
+ MemoryContextCounters grand_totals;
+ int num_contexts = 0;
+ int level = 0;
+
+ path = NIL;
+ memset(&grand_totals, 0, sizeof(grand_totals));
+
+ MemoryContextStatsInternal(c, level, 100, 100, &grand_totals,
+ PRINT_STATS_NONE, &num_contexts);
+
+ path = compute_context_path(c, context_id_lookup, 100);
+
+ PublishMemoryContext(meminfo, ctx_id, c, path,
+ grand_totals, num_contexts, area);
+ ctx_id = ctx_id + 1;
+ }
+ memCtxState[idx].total_stats = ctx_id;
+ goto cleanup;
+ }
+
+ foreach_ptr(MemoryContextData, cur, contexts)
+ {
+ List *path = NIL;
+ char *name;
+
+ /*
+ * Figure out the transient context_id of this context and each of its
+ * ancestors, to compute a path for this context.
+ */
+ path = compute_context_path(cur, context_id_lookup, 100);
+
+ if (context_id <= (max_stats - 2))
+ {
+ /* Examine the context stats */
+ memset(&stat, 0, sizeof(stat));
+ (*cur->methods->stats) (cur, NULL, NULL, &stat, true);
+ /* Copy statistics to DSA memory */
+ PublishMemoryContext(meminfo, context_id, cur, path, stat, 1, area);
+ }
+ else
+ {
+ /* Examine the context stats */
+ memset(&stat, 0, sizeof(stat));
+ (*cur->methods->stats) (cur, NULL, NULL, &stat, true);
+
+ meminfo[max_stats - 1].totalspace += stat.totalspace;
+ meminfo[max_stats - 1].nblocks += stat.nblocks;
+ meminfo[max_stats - 1].freespace += stat.freespace;
+ meminfo[max_stats - 1].freechunks += stat.freechunks;
+ }
+
+ /*
+ * DSA max limit per process is reached, write aggregate of the
+ * remaining statistics.
+ */
+ if (context_id == (max_stats - 2) && context_id < (stats_count - 1))
+ {
+ num_individual_stats = context_id + 1;
+ meminfo[max_stats - 1].name = dsa_allocate0(area, 17);
+ name = dsa_get_address(area, meminfo[max_stats - 1].name);
+ strncpy(name, "Remaining Totals", 16);
+ meminfo[max_stats - 1].ident = InvalidDsaPointer;
+ meminfo[max_stats - 1].path = InvalidDsaPointer;
+ meminfo[max_stats - 1].type = NULL;
+ }
+ context_id++;
+ }
+ /* Statistics are not aggregated, i.e. individual statistics are reported */
+ if (context_id < (max_stats - 2))
+ {
+ memCtxState[idx].total_stats = context_id;
+ }
+ /* Report number of aggregated memory contexts */
+ else
+ {
+ meminfo[max_stats - 1].num_agg_stats = context_id -
+ num_individual_stats;
+
+ /*
+ * Total stats equals num_individual_stats + 1 record for cumulative
+ * statistics.
+ */
+ memCtxState[idx].total_stats = num_individual_stats + 1;
+ }
+
+cleanup:
+
+ /*
+ * Signal all the waiting client backends after copying all the statistics
+ */
+ memCtxState[idx].stats_timestamp = GetCurrentTimestamp();
+ LWLockRelease(&memCtxState[idx].lw_lock);
+ ConditionVariableBroadcast(&memCtxState[idx].memctx_cv);
+ hash_destroy(context_id_lookup);
+ dsa_detach(area);
+}
+
+/*
+ * compute_context_path
+ *
+ * Append the transient context_id of this context and each of its ancestors
+ * to a list, in order to compute a path.
+ */
+static List *
+compute_context_path(MemoryContext c, HTAB *context_id_lookup, int max_level)
+{
+ bool found;
+ List *path = NIL;
+ List *tmp = NIL;
+
+ for (MemoryContext cur_context = c; cur_context != NULL; cur_context = cur_context->parent)
+ {
+ MemoryContextId *cur_entry;
+
+ cur_entry = hash_search(context_id_lookup, &cur_context, HASH_FIND, &found);
+
+ if (!found)
+ elog(ERROR, "hash table corrupted, can't construct path value");
+
+ path = lcons_int(cur_entry->context_id, path);
+ }
+
+ if (list_length(path) > max_level)
+ {
+ tmp = list_copy_head(path, max_level);
+ list_free(path);
+ path = tmp;
+ }
+
+ return path;
+}
+
+/*
+ * Compute the number of contexts currently allocated by the backend and
+ * assign context ids to each of the contexts.
+ */
+static void
+compute_contexts_count_and_ids(List *contexts, HTAB *context_id_lookup,
+ int *stats_count, bool get_summary)
+{
+ foreach_ptr(MemoryContextData, cur, contexts)
+ {
+ MemoryContextId *entry;
+ bool found;
+
+ entry = (MemoryContextId *) hash_search(context_id_lookup, &cur,
+ HASH_ENTER, &found);
+ Assert(!found);
+
+ /* context id starts with 1 */
+ entry->context_id = (++(*stats_count));
+
+ /* Append the children of the current context to the main list. */
+ for (MemoryContext c = cur->firstchild; c != NULL; c = c->nextchild)
+ {
+ if (get_summary)
+ {
+ entry = (MemoryContextId *) hash_search(context_id_lookup, &c,
+ HASH_ENTER, &found);
+ Assert(!found);
+
+ entry->context_id = (++(*stats_count));
+ }
+
+ contexts = lappend(contexts, c);
+ }
+
+ /*
+ * In summary mode, only the contexts at the first two levels (from the
+ * top) are displayed
+ */
+ if (get_summary)
+ break;
+ }
+}
+
+/*
+ * PublishMemoryContext
+ *
+ * Copy the memory context statistics of a single context to DSA memory
+ */
+static void
+PublishMemoryContext(MemoryContextEntry *memctx_info, int curr_id,
+ MemoryContext context, List *path,
+ MemoryContextCounters stat, int num_contexts,
+ dsa_area *area)
+{
+ const char *ident = context->ident;
+ const char *name = context->name;
+ int *path_list;
+
+ /*
+ * To be consistent with logging output, we label dynahash contexts
+ * with just the hash table name as with MemoryContextStatsPrint().
+ */
+ if (context->ident && strncmp(context->name, "dynahash", 8) == 0)
+ {
+ name = context->ident;
+ ident = NULL;
+ }
+
+ if (name != NULL)
+ {
+ int namelen = strlen(name);
+ char *nameptr;
+
+ if (strlen(name) >= MEMORY_CONTEXT_IDENT_SHMEM_SIZE)
+ namelen = pg_mbcliplen(name, namelen,
+ MEMORY_CONTEXT_IDENT_SHMEM_SIZE - 1);
+
+ memctx_info[curr_id].name = dsa_allocate0(area, namelen + 1);
+ nameptr = (char *) dsa_get_address(area, memctx_info[curr_id].name);
+ strlcpy(nameptr, name, namelen + 1);
+ }
+ else
+ memctx_info[curr_id].name = InvalidDsaPointer;
+
+ /* Trim and copy the identifier if it is not set to NULL */
+ if (ident != NULL)
+ {
+ int idlen = strlen(ident);
+ char *identptr;
+
+ /*
+ * Some identifiers such as SQL query string can be very long,
+ * truncate oversize identifiers.
+ */
+ if (idlen >= MEMORY_CONTEXT_IDENT_SHMEM_SIZE)
+ idlen = pg_mbcliplen(ident, idlen,
+ MEMORY_CONTEXT_IDENT_SHMEM_SIZE - 1);
+
+ memctx_info[curr_id].ident = dsa_allocate0(area, idlen + 1);
+ identptr = (char *) dsa_get_address(area, memctx_info[curr_id].ident);
+ strlcpy(identptr, ident, idlen + 1);
+ }
+ else
+ memctx_info[curr_id].ident = InvalidDsaPointer;
+
+ /* Allocate DSA memory for storing path information */
+ if (path == NIL)
+ memctx_info[curr_id].path = InvalidDsaPointer;
+ else
+ {
+ memctx_info[curr_id].path_length = list_length(path);
+ memctx_info[curr_id].path = dsa_allocate0(area,
+ memctx_info[curr_id].path_length * sizeof(int));
+ path_list = (int *) dsa_get_address(area, memctx_info[curr_id].path);
+ foreach_int(i, path)
+ path_list[foreach_current_index(i)] = i;
+ }
+ memctx_info[curr_id].type = ContextTypeToString(context->type, true);
+ memctx_info[curr_id].totalspace = stat.totalspace;
+ memctx_info[curr_id].nblocks = stat.nblocks;
+ memctx_info[curr_id].freespace = stat.freespace;
+ memctx_info[curr_id].freechunks = stat.freechunks;
+ memctx_info[curr_id].num_agg_stats = num_contexts;
+}
+
+/*
+ * dsa_free_previous_stats
+ *
+ * Worker for freeing resources from a MemoryContextEntry. Callers are
+ * responsible for ensuring that the DSA pointer is valid.
+ */
+static void
+dsa_free_previous_stats(dsa_area *area, int total_stats,
+ dsa_pointer prev_dsa_pointer)
+{
+ MemoryContextEntry *meminfo;
+
+ meminfo = (MemoryContextEntry *) dsa_get_address(area, prev_dsa_pointer);
+ Assert(meminfo != NULL);
+ for (int i = 0; i < total_stats; i++)
+ {
+ if (DsaPointerIsValid(meminfo[i].name))
+ dsa_free(area, meminfo[i].name);
+
+ if (DsaPointerIsValid(meminfo[i].ident))
+ dsa_free(area, meminfo[i].ident);
+
+ if (DsaPointerIsValid(meminfo[i].path))
+ dsa_free(area, meminfo[i].path);
+ }
+}
+
+/*
+ * Free the memory context statistics stored by this process
+ * in DSA area.
+ */
+void
+AtProcExit_memstats_dsa_free(int code, Datum arg)
+{
+ int idx = MyProcNumber;
+ dsm_segment *dsm_seg = NULL;
+ dsa_area *area = NULL;
+
+ if (memCtxArea->memstats_dsa_handle == DSA_HANDLE_INVALID)
+ return;
+
+ dsm_seg = dsm_find_mapping(memCtxArea->memstats_dsa_handle);
+
+ /* If the dsm mapping already exists, return before taking the lock */
+ if (dsm_seg != NULL)
+ return;
+
+ LWLockAcquire(&memCtxState[idx].lw_lock, LW_EXCLUSIVE);
+ if (!DsaPointerIsValid(memCtxState[idx].memstats_dsa_pointer))
+ {
+ LWLockRelease(&memCtxState[idx].lw_lock);
+ return;
+ }
+
+ area = dsa_attach(memCtxArea->memstats_dsa_handle);
+
+ /*
+ * Free the memory context statistics, free the name, ident and path
+ * pointers before freeing the pointer that contains these pointers and
+ * integer statistics.
+ */
+ dsa_free_previous_stats(area, memCtxState[idx].total_stats,
+ memCtxState[idx].memstats_dsa_pointer);
+ dsa_free(area, memCtxState[idx].memstats_dsa_pointer);
+ memCtxState[idx].memstats_dsa_pointer = InvalidDsaPointer;
+
+ dsa_detach(area);
+ LWLockRelease(&memCtxState[idx].lw_lock);
+}
+
void *
palloc(Size size)
{
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 5d5be8ba4e1..1f7a2c7ce41 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8559,6 +8559,16 @@
prorettype => 'bool', proargtypes => 'int4',
prosrc => 'pg_log_backend_memory_contexts' },
+# publishing memory contexts of the specified postgres process
+{ oid => '2173', descr => 'publish memory contexts of the specified backend',
+ proname => 'pg_get_process_memory_contexts', provolatile => 'v',
+ prorows => '100', proretset => 't', proparallel => 'r',
+ prorettype => 'record', proargtypes => 'int4 bool int4',
+ proallargtypes => '{int4,bool,int4,text,text,text,_int4,int4,int8,int8,int8,int8,int8,int4,timestamptz}',
+ proargmodes => '{i,i,i,o,o,o,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{pid, summary, retries, name, ident, type, path, level, total_bytes, total_nblocks, free_bytes, free_chunks, used_bytes, num_agg_contexts, stats_timestamp}',
+ prosrc => 'pg_get_process_memory_contexts' },
+
# non-persistent series generator
{ oid => '1066', descr => 'non-persistent series generator',
proname => 'generate_series', prorows => '1000',
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 0d8528b2875..58b2496a9cb 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -96,6 +96,7 @@ extern PGDLLIMPORT volatile sig_atomic_t IdleSessionTimeoutPending;
extern PGDLLIMPORT volatile sig_atomic_t ProcSignalBarrierPending;
extern PGDLLIMPORT volatile sig_atomic_t LogMemoryContextPending;
extern PGDLLIMPORT volatile sig_atomic_t IdleStatsUpdateTimeoutPending;
+extern PGDLLIMPORT volatile sig_atomic_t PublishMemoryContextPending;
extern PGDLLIMPORT volatile sig_atomic_t CheckClientConnectionPending;
extern PGDLLIMPORT volatile sig_atomic_t ClientConnectionLost;
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index 016dfd9b3f6..cfe14631445 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -35,6 +35,7 @@ typedef enum
PROCSIG_WALSND_INIT_STOPPING, /* ask walsenders to prepare for shutdown */
PROCSIG_BARRIER, /* global barrier interrupt */
PROCSIG_LOG_MEMORY_CONTEXT, /* ask backend to log the memory contexts */
+ PROCSIG_GET_MEMORY_CONTEXT, /* ask backend to send the memory contexts */
PROCSIG_PARALLEL_APPLY_MESSAGE, /* Message from parallel apply workers */
/* Recovery conflict reasons */
diff --git a/src/include/utils/memutils.h b/src/include/utils/memutils.h
index 8abc26abce2..33ee288736d 100644
--- a/src/include/utils/memutils.h
+++ b/src/include/utils/memutils.h
@@ -18,6 +18,9 @@
#define MEMUTILS_H
#include "nodes/memnodes.h"
+#include "storage/condition_variable.h"
+#include "storage/lmgr.h"
+#include "utils/dsa.h"
/*
@@ -48,6 +51,22 @@
#define AllocHugeSizeIsValid(size) ((Size) (size) <= MaxAllocHugeSize)
+/*
+ * Memory Context reporting size limits.
+ */
+
+/* Max length of context name and ident */
+#define MEMORY_CONTEXT_IDENT_SHMEM_SIZE 64
+/* Maximum size (in MB) of DSA area per process */
+#define MAX_SEGMENTS_PER_BACKEND 1
+/*
+ * Maximum size per context. Actual size may be lower as this assumes the worst
+ * case of deepest path and longest identifiers (name and ident, thus the
+ * multiplication by 2). The path depth is limited to 100 like for memory
+ * context logging.
+ */
+#define MAX_MEMORY_CONTEXT_STATS_SIZE (sizeof(MemoryContextEntry) + \
+ (100 * sizeof(int)) + (2 * MEMORY_CONTEXT_IDENT_SHMEM_SIZE))
/*
* Standard top-level memory contexts.
@@ -319,4 +338,65 @@ pg_memory_is_all_zeros(const void *ptr, size_t len)
return true;
}
+/* Dynamic shared memory state for statistics per context */
+typedef struct MemoryContextEntry
+{
+ dsa_pointer name;
+ dsa_pointer ident;
+ dsa_pointer path;
+ const char *type;
+ int path_length;
+ int64 totalspace;
+ int64 nblocks;
+ int64 freespace;
+ int64 freechunks;
+ int num_agg_stats;
+} MemoryContextEntry;
+
+/*
+ * Static shared memory state representing the DSA area created for memory
+ * context statistics reporting. A single DSA area is created and used by all
+ * the processes, each having its specific DSA allocations for sharing memory
+ * statistics, tracked by per backend static shared memory state.
+ */
+typedef struct MemoryContextState
+{
+ dsa_handle memstats_dsa_handle;
+ LWLock lw_lock;
+} MemoryContextState;
+
+/*
+ * Per backend static shared memory state for memory context statistics
+ * reporting.
+ */
+typedef struct MemoryContextBackendState
+{
+ ConditionVariable memctx_cv;
+ LWLock lw_lock;
+ int proc_id;
+ int total_stats;
+ bool get_summary;
+ dsa_pointer memstats_dsa_pointer;
+ TimestampTz stats_timestamp;
+} MemoryContextBackendState;
+
+
+/*
+ * Used for storage of transient identifiers for pg_get_backend_memory_contexts
+ */
+typedef struct MemoryContextId
+{
+ MemoryContext context;
+ int context_id;
+} MemoryContextId;
+
+extern PGDLLIMPORT MemoryContextBackendState *memCtxState;
+extern PGDLLIMPORT MemoryContextState *memCtxArea;
+extern void ProcessGetMemoryContextInterrupt(void);
+extern const char *ContextTypeToString(NodeTag type, bool missing_ok);
+extern void HandleGetMemoryContextInterrupt(void);
+extern void MemCtxShmemInit(void);
+extern void MemCtxBackendShmemInit(void);
+extern void AtProcExit_memstats_dsa_free(int code, Datum arg);
+
#endif /* MEMUTILS_H */
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 83228cfca29..dca20ae1a26 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -232,3 +232,17 @@ select * from pg_timezone_abbrevs where abbrev = 'LMT';
LMT | @ 7 hours 52 mins 58 secs ago | f
(1 row)
+DO $$
+DECLARE
+ launcher_pid int;
+ r RECORD;
+BEGIN
+ SELECT pid from pg_stat_activity where backend_type='autovacuum launcher'
+ INTO launcher_pid;
+
+ select type, name, ident
+ from pg_get_process_memory_contexts(launcher_pid, false, 20)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+END $$;
+NOTICE: (AllocSet,TopMemoryContext,)
diff --git a/src/test/regress/sql/sysviews.sql b/src/test/regress/sql/sysviews.sql
index 66179f026b3..4767351d4e2 100644
--- a/src/test/regress/sql/sysviews.sql
+++ b/src/test/regress/sql/sysviews.sql
@@ -101,3 +101,17 @@ select count(distinct utc_offset) >= 24 as ok from pg_timezone_abbrevs;
-- One specific case we can check without much fear of breakage
-- is the historical local-mean-time value used for America/Los_Angeles.
select * from pg_timezone_abbrevs where abbrev = 'LMT';
+
+DO $$
+DECLARE
+ launcher_pid int;
+ r RECORD;
+BEGIN
+ SELECT pid from pg_stat_activity where backend_type='autovacuum launcher'
+ INTO launcher_pid;
+
+ select type, name, ident
+ from pg_get_process_memory_contexts(launcher_pid, false, 20)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+END $$;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 0c81d03950d..864a0d2b56b 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1662,12 +1662,16 @@ MemoizeState
MemoizeTuple
MemoryChunk
MemoryContext
+MemoryContextBackendState
MemoryContextCallback
MemoryContextCallbackFunction
MemoryContextCounters
MemoryContextData
+MemoryContextEntry
+MemoryContextId
MemoryContextMethodID
MemoryContextMethods
+MemoryContextState
MemoryStatsPrintFunc
MergeAction
MergeActionState
--
2.39.3 (Apple Git-146)
Hi Daniel,
> After a bit more polish I landed with the attached, which I most likely
> will go ahead with after another round in CI.
Thank you for refining the code. The changes look good to me.
Regression tests ran smoothly in parallel with the memory monitoring
function, and pgbench results with the following custom script also show
good performance.
```
SELECT * FROM pg_get_process_memory_contexts(
(SELECT pid FROM pg_stat_activity
ORDER BY random() LIMIT 1)
, false, 5);
```
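For context on the third argument: assuming it is the timeout in seconds, as the v26 commit message describes it, the caller waits in fixed slices and falls back to older statistics when the target never answers. The sketch below only simulates that control flow in plain C and is not the patch's code. `sketch_wait`, `SketchResult` and the simulated `publish_after_ms` parameter are hypothetical names; only the 100 ms slice mirrors MEMSTATS_WAIT_TIMEOUT in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Length of one wait slice in milliseconds (mirrors MEMSTATS_WAIT_TIMEOUT). */
#define SKETCH_WAIT_SLICE_MS 100

/* Hypothetical outcome of one request, not a type from the patch. */
typedef enum
{
	SKETCH_FRESH,	/* target published new statistics in time */
	SKETCH_STALE,	/* timed out, previously published statistics returned */
	SKETCH_NONE		/* timed out and nothing was ever published */
} SketchResult;

/*
 * Simulate the caller's wait loop.  publish_after_ms is the simulated time
 * at which the target publishes (negative means it never does);
 * have_previous says whether older statistics exist.  *waited_ms receives
 * the simulated time spent waiting.
 */
static SketchResult
sketch_wait(double timeout_s, long publish_after_ms, bool have_previous,
			long *waited_ms)
{
	long		elapsed = 0;
	long		budget = (long) (timeout_s * 1000.0);

	assert(timeout_s >= 0 && waited_ms != NULL);

	for (;;)
	{
		/* Fresh statistics win, even if they arrive right at the deadline. */
		if (publish_after_ms >= 0 && elapsed >= publish_after_ms)
		{
			*waited_ms = elapsed;
			return SKETCH_FRESH;
		}

		/* User timeout exhausted: fall back to old statistics, if any. */
		if (elapsed >= budget)
		{
			*waited_ms = elapsed;
			return have_previous ? SKETCH_STALE : SKETCH_NONE;
		}

		/* Stand-in for one timed sleep on the condition variable. */
		elapsed += SKETCH_WAIT_SLICE_MS;
	}
}
```

Because the wait is sliced, a caller in this model notices fresh statistics only at the next slice boundary, so a timeout that is not a multiple of the slice is effectively rounded up.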
Thank you,
Rahila Syed
Following up on some off-list comments, attached is a v26 with a few small last
changes:
* Improved documentation (docs and comments)
* Fixed up Shmem sizing and init
* Delayed registering to the shmem cleanup to get it earlier in cleanup
* Renamed a few data structures to improve readability
* Various bits of polish
I think this function can be a valuable debugging aid going forward.
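To illustrate the 1MB per-process limit that the attached patch's commit message describes: contexts that fit are reported one per row, and the remainder is folded into a single cumulative row. The following is a rough sketch in C, assuming that one slot is reserved for the cumulative row. `sketch_plan`, `SketchPlan` and the `fixed_entry_size` parameter are made-up names, not part of the patch, and the per-entry bound only imitates the worst-case size computed in memutils.h.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the limits used by the patch. */
#define SKETCH_BUDGET_BYTES ((size_t) 1024 * 1024)	/* 1MB per process */
#define SKETCH_IDENT_SIZE 64		/* max length of name and of ident */
#define SKETCH_MAX_PATH_DEPTH 100	/* max path elements per context */

/* How a set of contexts is split between detailed and aggregated rows. */
typedef struct SketchPlan
{
	size_t		individual;		/* contexts reported as one row each */
	size_t		aggregated;		/* contexts folded into the cumulative row */
} SketchPlan;

/* Worst-case bytes per context: fixed part, deepest path, name and ident. */
static size_t
sketch_max_entry_size(size_t fixed_entry_size)
{
	return fixed_entry_size
		+ SKETCH_MAX_PATH_DEPTH * sizeof(int)
		+ 2 * SKETCH_IDENT_SIZE;
}

/* Decide how many contexts get their own row within the budget. */
static SketchPlan
sketch_plan(size_t ncontexts, size_t fixed_entry_size)
{
	size_t		max_entries = SKETCH_BUDGET_BYTES /
		sketch_max_entry_size(fixed_entry_size);
	SketchPlan	plan;

	/* Need room for at least one detailed row plus the cumulative row. */
	assert(max_entries >= 2);

	if (ncontexts <= max_entries)
	{
		plan.individual = ncontexts;
		plan.aggregated = 0;
	}
	else
	{
		/* Assumption: the last slot is kept for the cumulative total. */
		plan.individual = max_entries - 1;
		plan.aggregated = ncontexts - plan.individual;
	}
	return plan;
}
```

With a 4-byte int and a hypothetical 72-byte fixed part, the bound is 600 bytes per entry, so 1747 entries fit in the budget. In this model the `aggregated` count is what would surface as num_agg_contexts on the final row.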
--
Daniel Gustafsson
Attachments:
v26-0001-Add-function-to-get-memory-context-stats-for-pro.patch (application/octet-stream)
From 3fb017a35bf94cda6a95c6a6dfd074f289a2714f Mon Sep 17 00:00:00 2001
From: Daniel Gustafsson <dgustafsson@postgresql.org>
Date: Mon, 7 Apr 2025 03:06:47 +0200
Subject: [PATCH v26] Add function to get memory context stats for processes
This adds a function for retrieving memory context statistics
and information from backends as well as auxiliary processes.
When calling the function it sends a signal to the specified
process to submit statistics regarding its memory contexts
into dynamic shared memory. Each memory context is returned
in detail, followed by a cumulative total in case the number
of contexts exceeds the max allocated amount of shared memory.
Each process is limited to using at most 1MB of memory for this.
A summary can also be explicitly requested by the user; this
will return the TopMemoryContext and a cumulative total of
all lower contexts.
In order to not block on busy processes, the caller specifies
the number of seconds during which to retry before timing out.
In the case where no statistics are published within the set
timeout, the last known statistics are returned, or NULL if
no previously published statistics exist. This allows
dashboard-type usages to continually publish data even if the
target process is temporarily congested. Context records
contain a timestamp to indicate when they were submitted.
Author: Rahila Syed <rahilasyed90@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Reviewed-by: Atsushi Torikoshi <torikoshia@oss.nttdata.com>
Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Discussion: https://postgr.es/m/CAH2L28v8mc9HDt8QoSJ8TRmKau_8FM_HKS41NeO9-6ZAkuZKXw@mail.gmail.com
---
doc/src/sgml/func.sgml | 171 +++++
src/backend/catalog/system_views.sql | 5 +
src/backend/postmaster/autovacuum.c | 4 +
src/backend/postmaster/auxprocess.c | 7 +
src/backend/postmaster/checkpointer.c | 4 +
src/backend/postmaster/interrupt.c | 4 +
src/backend/postmaster/pgarch.c | 4 +
src/backend/postmaster/startup.c | 4 +
src/backend/postmaster/walsummarizer.c | 4 +
src/backend/storage/ipc/ipci.c | 3 +
src/backend/storage/ipc/procsignal.c | 3 +
src/backend/storage/lmgr/proc.c | 1 +
src/backend/tcop/backend_startup.c | 7 +
src/backend/tcop/postgres.c | 3 +
.../utils/activity/wait_event_names.txt | 1 +
src/backend/utils/adt/mcxtfuncs.c | 406 ++++++++++-
src/backend/utils/init/globals.c | 1 +
src/backend/utils/mmgr/mcxt.c | 636 +++++++++++++++++-
src/include/catalog/pg_proc.dat | 10 +
src/include/miscadmin.h | 1 +
src/include/storage/procsignal.h | 1 +
src/include/utils/memutils.h | 81 +++
src/test/regress/expected/sysviews.out | 19 +
src/test/regress/sql/sysviews.sql | 18 +
src/tools/pgindent/typedefs.list | 4 +
25 files changed, 1362 insertions(+), 40 deletions(-)
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 0224f93733d..347f45a417d 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -28650,6 +28650,144 @@ acl | {postgres=arwdDxtm/postgres,foo=r/postgres}
</para></entry>
</row>
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>pg_get_process_memory_contexts</primary>
+ </indexterm>
+ <function>pg_get_process_memory_contexts</function> ( <parameter>pid</parameter> <type>integer</type>, <parameter>summary</parameter> <type>boolean</type>, <parameter>timeout</parameter> <type>float</type> )
+ <returnvalue>setof record</returnvalue>
+ ( <parameter>name</parameter> <type>text</type>,
+ <parameter>ident</parameter> <type>text</type>,
+ <parameter>type</parameter> <type>text</type>,
+ <parameter>path</parameter> <type>integer[]</type>,
+ <parameter>level</parameter> <type>integer</type>,
+ <parameter>total_bytes</parameter> <type>bigint</type>,
+ <parameter>total_nblocks</parameter> <type>bigint</type>,
+ <parameter>free_bytes</parameter> <type>bigint</type>,
+ <parameter>free_chunks</parameter> <type>bigint</type>,
+ <parameter>used_bytes</parameter> <type>bigint</type>,
+ <parameter>num_agg_contexts</parameter> <type>integer</type>,
+ <parameter>stats_timestamp</parameter> <type>timestamptz</type> )
+ </para>
+ <para>
+ This function handles requests to display the memory contexts of a
+ <productname>PostgreSQL</productname> process with the specified
+ process ID. The function can be used to send requests to backends as
+ well as <glossterm linkend="glossary-auxiliary-proc">auxiliary processes</glossterm>.
+ </para>
+ <para>
+ The returned record contains extended statistics for each memory
+ context:
+ <itemizedlist spacing="compact">
+ <listitem>
+ <para>
+ <parameter>name</parameter> - The name of the memory context.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>ident</parameter> - Memory context ID (if any).
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>type</parameter> - The type of memory context, possible
+ values are: AllocSet, Generation, Slab and Bump.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>path</parameter> - Memory contexts are organized in a
+ tree model with TopMemoryContext as the root, and all other memory
+ contexts as nodes in the tree. The <parameter>path</parameter>
+ displays the path from the root to the current memory context. The
+ path is limited to 100 children per node, with each node limited
+ to a max depth of 100, to preserve memory during reporting. The
+ printed path will also be limited to 100 nodes counting from the
+ TopMemoryContext.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>level</parameter> - The level in the tree of the current
+ memory context.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>total_bytes</parameter> - The total number of bytes
+ allocated to this memory context.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>total_nblocks</parameter> - The total number of blocks
+ used for the allocated memory.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>free_bytes</parameter> - The amount of free memory in
+ this memory context.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>free_chunks</parameter> - The number of chunks that
+ <parameter>free_bytes</parameter> corresponds to.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>used_bytes</parameter> - The total number of bytes
+ currently occupied.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>num_agg_contexts</parameter> - The number of memory
+ contexts aggregated in the displayed statistics.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>stats_timestamp</parameter> - When the statistics were
+ extracted from the process.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+ <para>
+ When <parameter>summary</parameter> is <literal>true</literal>, statistics
+ for memory contexts at levels 1 and 2 are displayed, with level 1
+ representing the root node (i.e., <literal>TopMemoryContext</literal>).
+ Statistics for contexts on level 2 and below are aggregates of all
+ child contexts' statistics, where <literal>num_agg_contexts</literal>
+ indicates the number of aggregated child contexts. When
+ <parameter>summary</parameter> is <literal>false</literal>,
+ the <literal>num_agg_contexts</literal> value is <literal>1</literal>,
+ indicating that individual statistics are being displayed. The levels
+ are limited to the first 100 contexts.
+ </para>
+ <para>
+ Busy processes can delay reporting memory context statistics;
+ <parameter>timeout</parameter> specifies the number of seconds
+ to wait for updated statistics. <parameter>timeout</parameter> can be
+ specified in fractions of a second.
+ </para>
+ <para>
+ After receiving memory context statistics from the target process, the
+ function returns the results as one row per context. If all the contexts
+ don't fit within the pre-determined size limit, the remaining context
+ statistics are aggregated and a cumulative total is displayed. The
+ <literal>num_agg_contexts</literal> column indicates the number of
+ contexts aggregated in the displayed statistics. When
+ <literal>num_agg_contexts</literal> is <literal>1</literal>, it means
+ that the context statistics are displayed separately.
+ </para></entry>
+ </row>
+
<row>
<entry role="func_table_entry"><para role="func_signature">
<indexterm>
@@ -28789,6 +28927,39 @@ LOG: Grand total: 1651920 bytes in 201 blocks; 622360 free (88 chunks); 1029560
because it may generate a large number of log messages.
</para>
+ <para>
+ <function>pg_get_process_memory_contexts</function> can be used
+ to request the memory context statistics of any postgres process. For example:
+<programlisting>
+postgres=# SELECT * FROM pg_get_process_memory_contexts(
+ (SELECT pid FROM pg_stat_activity
+ WHERE backend_type = 'checkpointer'),
+ false, 0.5) LIMIT 1;
+-[ RECORD 1 ]----+------------------------------
+name | TopMemoryContext
+ident |
+type | AllocSet
+path | {1}
+level | 1
+total_bytes | 90304
+total_nblocks | 3
+free_bytes | 2880
+free_chunks | 1
+used_bytes | 87424
+num_agg_contexts | 1
+stats_timestamp | 2025-03-24 13:55:47.796698+01
+</programlisting>
+ <note>
+ <para>
+ While <function>pg_get_process_memory_contexts</function> can be used to
+ query memory contexts of the local backend,
+ <structname>pg_backend_memory_contexts</structname>
+ (see <xref linkend="view-pg-backend-memory-contexts"/> for more details)
+ will be less resource intensive when only the local backend is of interest.
+ </para>
+ </note>
+ </para>
+
</sect2>
<sect2 id="functions-admin-backup">
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 273008db37f..1166e99a000 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -666,6 +666,11 @@ GRANT SELECT ON pg_backend_memory_contexts TO pg_read_all_stats;
REVOKE EXECUTE ON FUNCTION pg_get_backend_memory_contexts() FROM PUBLIC;
GRANT EXECUTE ON FUNCTION pg_get_backend_memory_contexts() TO pg_read_all_stats;
+REVOKE EXECUTE ON FUNCTION
+ pg_get_process_memory_contexts(integer, boolean, float) FROM PUBLIC;
+GRANT EXECUTE ON FUNCTION
+ pg_get_process_memory_contexts(integer, boolean, float) TO pg_read_all_stats;
+
-- Statistics views
CREATE VIEW pg_stat_all_tables AS
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 2513a8ef8a6..16756152b71 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -781,6 +781,10 @@ ProcessAutoVacLauncherInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
/* Process sinval catchup interrupts that happened while sleeping */
ProcessCatchupInterrupt();
}
diff --git a/src/backend/postmaster/auxprocess.c b/src/backend/postmaster/auxprocess.c
index 4f6795f7265..d3b4df27935 100644
--- a/src/backend/postmaster/auxprocess.c
+++ b/src/backend/postmaster/auxprocess.c
@@ -84,6 +84,13 @@ AuxiliaryProcessMainCommon(void)
/* register a before-shutdown callback for LWLock cleanup */
before_shmem_exit(ShutdownAuxiliaryProcess, 0);
+ /*
+ * The before shmem exit callback frees the DSA memory occupied by the
+ * latest memory context statistics that could be published by this aux
+ * proc if requested.
+ */
+ before_shmem_exit(AtProcExit_memstats_dsa_free, 0);
+
SetProcessingMode(NormalProcessing);
}
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index fda91ffd1ce..d3cb3f1891c 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -663,6 +663,10 @@ ProcessCheckpointerInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/postmaster/interrupt.c b/src/backend/postmaster/interrupt.c
index 0ae9bf906ec..f24f574e748 100644
--- a/src/backend/postmaster/interrupt.c
+++ b/src/backend/postmaster/interrupt.c
@@ -48,6 +48,10 @@ ProcessMainLoopInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 7e622ae4bd2..cb7408acf4c 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -867,6 +867,10 @@ ProcessPgArchInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
if (ConfigReloadPending)
{
char *archiveLib = pstrdup(XLogArchiveLibrary);
diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c
index 27e86cf393f..7149a67fcbc 100644
--- a/src/backend/postmaster/startup.c
+++ b/src/backend/postmaster/startup.c
@@ -192,6 +192,10 @@ ProcessStartupProcInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
diff --git a/src/backend/postmaster/walsummarizer.c b/src/backend/postmaster/walsummarizer.c
index 0fec4f1f871..c7a76711cc5 100644
--- a/src/backend/postmaster/walsummarizer.c
+++ b/src/backend/postmaster/walsummarizer.c
@@ -879,6 +879,10 @@ ProcessWalSummarizerInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 2fa045e6b0f..00c76d05356 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -51,6 +51,7 @@
#include "storage/sinvaladt.h"
#include "utils/guc.h"
#include "utils/injection_point.h"
+#include "utils/memutils.h"
/* GUCs */
int shared_memory_type = DEFAULT_SHARED_MEMORY_TYPE;
@@ -150,6 +151,7 @@ CalculateShmemSize(int *num_semaphores)
size = add_size(size, InjectionPointShmemSize());
size = add_size(size, SlotSyncShmemSize());
size = add_size(size, AioShmemSize());
+ size = add_size(size, MemoryContextReportingShmemSize());
/* include additional requested shmem from preload libraries */
size = add_size(size, total_addin_request);
@@ -343,6 +345,7 @@ CreateOrAttachShmemStructs(void)
WaitEventCustomShmemInit();
InjectionPointShmemInit();
AioShmemInit();
+ MemoryContextReportingShmemInit();
}
/*
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index b7c39a4c5f0..a3c2cd12277 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -690,6 +690,9 @@ procsignal_sigusr1_handler(SIGNAL_ARGS)
if (CheckProcSignal(PROCSIG_LOG_MEMORY_CONTEXT))
HandleLogMemoryContextInterrupt();
+ if (CheckProcSignal(PROCSIG_GET_MEMORY_CONTEXT))
+ HandleGetMemoryContextInterrupt();
+
if (CheckProcSignal(PROCSIG_PARALLEL_APPLY_MESSAGE))
HandleParallelApplyMessageInterrupt();
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index e9ef0fbfe32..f194e6b3dcc 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -50,6 +50,7 @@
#include "storage/procsignal.h"
#include "storage/spin.h"
#include "storage/standby.h"
+#include "utils/memutils.h"
#include "utils/timeout.h"
#include "utils/timestamp.h"
diff --git a/src/backend/tcop/backend_startup.c b/src/backend/tcop/backend_startup.c
index dde8d5b3517..5701e52f3a5 100644
--- a/src/backend/tcop/backend_startup.c
+++ b/src/backend/tcop/backend_startup.c
@@ -115,6 +115,13 @@ BackendMain(const void *startup_data, size_t startup_data_len)
*/
InitProcess();
+ /*
+ * The before shmem exit callback frees the DSA memory occupied by the
+ * latest memory context statistics that could be published by this
+ * backend if requested.
+ */
+ before_shmem_exit(AtProcExit_memstats_dsa_free, 0);
+
/*
* Make sure we aren't in PostmasterContext anymore. (We can't delete it
* just yet, though, because InitPostgres will need the HBA data.)
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 6ae9f38f0c8..dc4c600922d 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3535,6 +3535,9 @@ ProcessInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
if (ParallelApplyMessagePending)
ProcessParallelApplyMessages();
}
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 8bce14c38fd..ee0e42535b6 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -161,6 +161,7 @@ WAL_RECEIVER_EXIT "Waiting for the WAL receiver to exit."
WAL_RECEIVER_WAIT_START "Waiting for startup process to send initial data for streaming replication."
WAL_SUMMARY_READY "Waiting for a new WAL summary to be generated."
XACT_GROUP_UPDATE "Waiting for the group leader to update transaction status at transaction end."
+MEM_CTX_PUBLISH "Waiting for a process to publish memory information."
ABI_compatibility:
diff --git a/src/backend/utils/adt/mcxtfuncs.c b/src/backend/utils/adt/mcxtfuncs.c
index 396c2f223b4..7d3d4b3b1c6 100644
--- a/src/backend/utils/adt/mcxtfuncs.c
+++ b/src/backend/utils/adt/mcxtfuncs.c
@@ -17,28 +17,24 @@
#include "funcapi.h"
#include "mb/pg_wchar.h"
+#include "miscadmin.h"
+#include "access/twophase.h"
+#include "catalog/pg_authid_d.h"
#include "storage/proc.h"
#include "storage/procarray.h"
+#include "utils/acl.h"
#include "utils/array.h"
#include "utils/builtins.h"
#include "utils/hsearch.h"
+#include "utils/wait_event_types.h"
/* ----------
* The max bytes for showing identifiers of MemoryContext.
* ----------
*/
#define MEMORY_CONTEXT_IDENT_DISPLAY_SIZE 1024
-
-/*
- * MemoryContextId
- * Used for storage of transient identifiers for
- * pg_get_backend_memory_contexts.
- */
-typedef struct MemoryContextId
-{
- MemoryContext context;
- int context_id;
-} MemoryContextId;
+struct MemoryContextBackendState *memCtxState = NULL;
+struct MemoryContextState *memCtxArea = NULL;
/*
* int_list_to_array
@@ -143,24 +139,7 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
else
nulls[1] = true;
- switch (context->type)
- {
- case T_AllocSetContext:
- type = "AllocSet";
- break;
- case T_GenerationContext:
- type = "Generation";
- break;
- case T_SlabContext:
- type = "Slab";
- break;
- case T_BumpContext:
- type = "Bump";
- break;
- default:
- type = "???";
- break;
- }
+ type = ContextTypeToString(context->type);
values[2] = CStringGetTextDatum(type);
values[3] = Int32GetDatum(list_length(path)); /* level */
@@ -175,6 +154,38 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
list_free(path);
}
+/*
+ * ContextTypeToString
+ * Returns a textual representation of a context type
+ *
+ * This should cover the same types as MemoryContextIsValid.
+ */
+const char *
+ContextTypeToString(NodeTag type)
+{
+ const char *context_type;
+
+ switch (type)
+ {
+ case T_AllocSetContext:
+ context_type = "AllocSet";
+ break;
+ case T_GenerationContext:
+ context_type = "Generation";
+ break;
+ case T_SlabContext:
+ context_type = "Slab";
+ break;
+ case T_BumpContext:
+ context_type = "Bump";
+ break;
+ default:
+ context_type = "???";
+ break;
+ }
+ return (context_type);
+}
+
/*
* pg_get_backend_memory_contexts
* SQL SRF showing backend memory context.
@@ -305,3 +316,340 @@ pg_log_backend_memory_contexts(PG_FUNCTION_ARGS)
PG_RETURN_BOOL(true);
}
+
+/*
+ * pg_get_process_memory_contexts
+ * Signal a backend or an auxiliary process to send its memory contexts,
+ * wait for the results and display them.
+ *
+ * By default, only superusers or users with PG_READ_ALL_STATS are allowed to
+ * signal a process to return the memory contexts. This is because allowing
+ * any users to issue this request at an unbounded rate would cause lots of
+ * requests to be sent, which can lead to denial of service. Additional roles
+ * can be permitted with GRANT.
+ *
+ * On receipt of this signal, a backend or an auxiliary process sets the flag
+ * in the signal handler, which causes the next CHECK_FOR_INTERRUPTS()
+ * or process-specific interrupt handler to copy the memory context details
+ * to a dynamic shared memory space.
+ *
+ * We have defined a limit on DSA memory that could be allocated per process -
+ * if the process has more memory contexts than what can fit in the allocated
+ * size, the excess contexts are summarized and represented as a cumulative total
+ * at the end of the buffer.
+ *
+ * After sending the signal, wait on a condition variable. The publishing
+ * backend, after copying the data to shared memory, signals that
+ * condition variable. There is one condition variable per publishing backend.
+ * Once the condition variable is signalled, check if the latest memory context
+ * information is available and display.
+ *
+ * If the publishing backend does not respond before the condition variable
+ * times out, which is set to MEMSTATS_WAIT_TIMEOUT, retry as long as there is
+ * time left within the timeout specified by the user, before giving up and
+ * returning previously published statistics, if any. If no previous statistics
+ * exist, return NULL.
+ */
+#define MEMSTATS_WAIT_TIMEOUT 100
+Datum
+pg_get_process_memory_contexts(PG_FUNCTION_ARGS)
+{
+ int pid = PG_GETARG_INT32(0);
+ bool summary = PG_GETARG_BOOL(1);
+ double timeout = PG_GETARG_FLOAT8(2);
+ double timer = 0;
+ PGPROC *proc;
+ ProcNumber procNumber = INVALID_PROC_NUMBER;
+ bool proc_is_aux = false;
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ dsa_area *area;
+ MemoryContextStatsEntry *memctx_info;
+ TimestampTz curr_timestamp;
+
+ /*
+ * See if the process with given pid is a backend or an auxiliary process
+ * and remember the type for when we requery the process later.
+ */
+ proc = BackendPidGetProc(pid);
+ if (proc == NULL)
+ {
+ proc = AuxiliaryPidGetProc(pid);
+ proc_is_aux = true;
+ }
+
+ /*
+ * BackendPidGetProc() and AuxiliaryPidGetProc() return NULL if the pid
+ * isn't valid; this is however not a problem, so we just leave with a WARNING.
+ * See comment in pg_log_backend_memory_contexts for a discussion on this.
+ */
+ if (proc == NULL)
+ {
+ /*
+ * This is just a warning so a loop-through-resultset will not abort
+ * if one backend terminated on its own during the run.
+ */
+ ereport(WARNING,
+ (errmsg("PID %d is not a PostgreSQL server process", pid)));
+ PG_RETURN_NULL();
+ }
+
+ InitMaterializedSRF(fcinfo, 0);
+
+ procNumber = GetNumberFromPGProc(proc);
+
+ LWLockAcquire(&memCtxState[procNumber].lw_lock, LW_EXCLUSIVE);
+ memCtxState[procNumber].summary = summary;
+ LWLockRelease(&memCtxState[procNumber].lw_lock);
+
+ curr_timestamp = GetCurrentTimestamp();
+
+ /*
+ * Send a signal to a PostgreSQL process, informing it we want it to
+ * produce information about its memory contexts.
+ */
+ if (SendProcSignal(pid, PROCSIG_GET_MEMORY_CONTEXT, procNumber) < 0)
+ {
+ ereport(WARNING,
+ errmsg("could not send signal to process %d: %m", pid));
+ PG_RETURN_NULL();
+ }
+
+ /*
+ * A valid DSA pointer isn't proof that statistics are available; it can
+ * be valid due to previously published stats. Check if the stats are
+ * updated by comparing the timestamp: if the stats are newer than our
+ * previously recorded timestamp from before sending the procsignal, they
+ * must by definition be updated. Wait for the timeout specified by the
+ * user, following which display old statistics if available or return
+ * NULL.
+ */
+ while (1)
+ {
+ long msecs;
+
+ /*
+ * We expect to come out of sleep when the requested process has
+ * finished publishing the statistics, verified using the valid DSA
+ * pointer.
+ *
+ * Make sure that the information belongs to the pid we requested
+ * information for; otherwise loop back and wait for the server
+ * process to finish publishing statistics.
+ */
+ LWLockAcquire(&memCtxState[procNumber].lw_lock, LW_EXCLUSIVE);
+
+ /*
+ * A note in procnumber.h says that a procNumber can be re-used for
+ * a different backend immediately after a backend exits. If the slot
+ * identified by procNumber still holds an old process' data that the
+ * current process has not updated, the pid of the requested process
+ * and the proc_id might not match.
+ */
+ if (memCtxState[procNumber].proc_id == pid)
+ {
+ /*
+ * Break if the latest stats have been read, indicated by
+ * statistics timestamp being newer than the current request
+ * timestamp.
+ */
+ msecs = TimestampDifferenceMilliseconds(curr_timestamp,
+ memCtxState[procNumber].stats_timestamp);
+
+ if (DsaPointerIsValid(memCtxState[procNumber].memstats_dsa_pointer)
+ && msecs > 0)
+ break;
+ }
+ LWLockRelease(&memCtxState[procNumber].lw_lock);
+
+ /*
+ * Recheck the state of the backend before sleeping on the condition
+ * variable to ensure the process is still alive. Only check the
+ * relevant process type based on the earlier PID check.
+ */
+ if (proc_is_aux)
+ proc = AuxiliaryPidGetProc(pid);
+ else
+ proc = BackendPidGetProc(pid);
+
+ /*
+ * The process ending during memory context processing is not an
+ * error.
+ */
+ if (proc == NULL)
+ {
+ ereport(WARNING,
+ errmsg("PID %d is no longer a PostgreSQL server process",
+ pid));
+ PG_RETURN_NULL();
+ }
+
+ if (ConditionVariableTimedSleep(&memCtxState[procNumber].memctx_cv,
+ MEMSTATS_WAIT_TIMEOUT,
+ WAIT_EVENT_MEM_CTX_PUBLISH))
+ {
+ timer += MEMSTATS_WAIT_TIMEOUT;
+
+ /*
+ * Wait for the timeout as defined by the user. If no updated
+ * statistics are available within the allowed time then display
+ * previously published statistics if there are any. If no
+ * previous statistics are available then return NULL. The timer
+ * is defined in milliseconds since that's what the condition
+ * variable sleep uses.
+ */
+ if ((timer * 1000) >= timeout)
+ {
+ LWLockAcquire(&memCtxState[procNumber].lw_lock, LW_EXCLUSIVE);
+ /* Displaying previously published statistics if available */
+ if (DsaPointerIsValid(memCtxState[procNumber].memstats_dsa_pointer))
+ break;
+ else
+ {
+ LWLockRelease(&memCtxState[procNumber].lw_lock);
+ PG_RETURN_NULL();
+ }
+ }
+ }
+ }
+
+ /*
+ * We should only reach here with a valid DSA handle, either containing
+ * updated statistics or previously published statistics (identified by
+ * the timestamp).
+ */
+ Assert(memCtxArea->memstats_dsa_handle != DSA_HANDLE_INVALID);
+ area = dsa_attach(memCtxArea->memstats_dsa_handle);
+
+ /*
+ * Backend has finished publishing the stats, project them.
+ */
+ memctx_info = (MemoryContextStatsEntry *)
+ dsa_get_address(area, memCtxState[procNumber].memstats_dsa_pointer);
+
+#define PG_GET_PROCESS_MEMORY_CONTEXTS_COLS 12
+ for (int i = 0; i < memCtxState[procNumber].total_stats; i++)
+ {
+ ArrayType *path_array;
+ int path_length;
+ Datum values[PG_GET_PROCESS_MEMORY_CONTEXTS_COLS];
+ bool nulls[PG_GET_PROCESS_MEMORY_CONTEXTS_COLS];
+ char *name;
+ char *ident;
+ Datum *path_datum = NULL;
+ int *path_int = NULL;
+
+ memset(values, 0, sizeof(values));
+ memset(nulls, 0, sizeof(nulls));
+
+ if (DsaPointerIsValid(memctx_info[i].name))
+ {
+ name = (char *) dsa_get_address(area, memctx_info[i].name);
+ values[0] = CStringGetTextDatum(name);
+ }
+ else
+ nulls[0] = true;
+
+ if (DsaPointerIsValid(memctx_info[i].ident))
+ {
+ ident = (char *) dsa_get_address(area, memctx_info[i].ident);
+ values[1] = CStringGetTextDatum(ident);
+ }
+ else
+ nulls[1] = true;
+
+ if (memctx_info[i].type != NULL)
+ values[2] = CStringGetTextDatum(memctx_info[i].type);
+ else
+ nulls[2] = true;
+
+ path_length = memctx_info[i].path_length;
+ path_datum = (Datum *) palloc(path_length * sizeof(Datum));
+ if (DsaPointerIsValid(memctx_info[i].path))
+ {
+ path_int = (int *) dsa_get_address(area, memctx_info[i].path);
+ for (int j = 0; j < path_length; j++)
+ path_datum[j] = Int32GetDatum(path_int[j]);
+ path_array = construct_array_builtin(path_datum, path_length, INT4OID);
+ values[3] = PointerGetDatum(path_array);
+ }
+ else
+ nulls[3] = true;
+
+ values[4] = Int32GetDatum(memctx_info[i].levels);
+ values[5] = Int64GetDatum(memctx_info[i].totalspace);
+ values[6] = Int64GetDatum(memctx_info[i].nblocks);
+ values[7] = Int64GetDatum(memctx_info[i].freespace);
+ values[8] = Int64GetDatum(memctx_info[i].freechunks);
+ values[9] = Int64GetDatum(memctx_info[i].totalspace -
+ memctx_info[i].freespace);
+ values[10] = Int32GetDatum(memctx_info[i].num_agg_stats);
+ values[11] = TimestampTzGetDatum(memCtxState[procNumber].stats_timestamp);
+
+ tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc,
+ values, nulls);
+ }
+ LWLockRelease(&memCtxState[procNumber].lw_lock);
+
+ ConditionVariableCancelSleep();
+ dsa_detach(area);
+
+ PG_RETURN_NULL();
+}
+
+Size
+MemoryContextReportingShmemSize(void)
+{
+ Size sz = 0;
+ Size TotalProcs = 0;
+
+ TotalProcs = add_size(TotalProcs, NUM_AUXILIARY_PROCS);
+ TotalProcs = add_size(TotalProcs, MaxBackends);
+
+ sz = add_size(sz, sizeof(MemoryContextState));
+ sz = add_size(sz, mul_size(TotalProcs, sizeof(MemoryContextBackendState)));
+
+ return sz;
+}
+
+/*
+ * Initialize shared memory for displaying memory context statistics
+ */
+void
+MemoryContextReportingShmemInit(void)
+{
+ bool found;
+
+ memCtxArea = (MemoryContextState *)
+ ShmemInitStruct("MemoryContextState", sizeof(MemoryContextState), &found);
+
+ if (!IsUnderPostmaster)
+ {
+ Assert(!found);
+ LWLockInitialize(&memCtxArea->lw_lock, LWLockNewTrancheId());
+ LWLockRegisterTranche(memCtxArea->lw_lock.tranche,
+ "mem_context_stats_reporting");
+ memCtxArea->memstats_dsa_handle = DSA_HANDLE_INVALID;
+ }
+ else
+ Assert(found);
+
+ memCtxState = (MemoryContextBackendState *)
+ ShmemInitStruct("MemoryContextBackendState",
+ ((MaxBackends + NUM_AUXILIARY_PROCS) * sizeof(MemoryContextBackendState)),
+ &found);
+
+ if (!IsUnderPostmaster)
+ {
+ Assert(!found);
+ for (int i = 0; i < (MaxBackends + NUM_AUXILIARY_PROCS); i++)
+ {
+ ConditionVariableInit(&memCtxState[i].memctx_cv);
+ LWLockInitialize(&memCtxState[i].lw_lock, LWLockNewTrancheId());
+ LWLockRegisterTranche(memCtxState[i].lw_lock.tranche,
+ "mem_context_backend_stats_reporting");
+ memCtxState[i].memstats_dsa_pointer = InvalidDsaPointer;
+ }
+ }
+ else
+ Assert(found);
+}
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index 2152aad97d9..92304a1f124 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -39,6 +39,7 @@ volatile sig_atomic_t TransactionTimeoutPending = false;
volatile sig_atomic_t IdleSessionTimeoutPending = false;
volatile sig_atomic_t ProcSignalBarrierPending = false;
volatile sig_atomic_t LogMemoryContextPending = false;
+volatile sig_atomic_t PublishMemoryContextPending = false;
volatile sig_atomic_t IdleStatsUpdateTimeoutPending = false;
volatile uint32 InterruptHoldoffCount = 0;
volatile uint32 QueryCancelHoldoffCount = 0;
diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index d98ae9db6be..b3ed304d161 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -23,6 +23,11 @@
#include "mb/pg_wchar.h"
#include "miscadmin.h"
+#include "nodes/pg_list.h"
+#include "storage/lwlock.h"
+#include "storage/ipc.h"
+#include "utils/dsa.h"
+#include "utils/hsearch.h"
#include "utils/memdebug.h"
#include "utils/memutils.h"
#include "utils/memutils_internal.h"
@@ -135,6 +140,17 @@ static const MemoryContextMethods mcxt_methods[] = {
};
#undef BOGUS_MCTX
+/*
+ * This is passed to MemoryContextStatsInternal to determine whether
+ * to print context statistics or not and, if so, where to print them
+ * (logs or stderr).
+ */
+typedef enum PrintDestination
+{
+ PRINT_STATS_TO_STDERR = 0,
+ PRINT_STATS_TO_LOGS,
+ PRINT_STATS_NONE
+} PrintDestination;
/*
* CurrentMemoryContext
@@ -162,10 +178,24 @@ static void MemoryContextCallResetCallbacks(MemoryContext context);
static void MemoryContextStatsInternal(MemoryContext context, int level,
int max_level, int max_children,
MemoryContextCounters *totals,
- bool print_to_stderr);
+ PrintDestination print_location,
+ int *num_contexts);
static void MemoryContextStatsPrint(MemoryContext context, void *passthru,
const char *stats_string,
bool print_to_stderr);
+static void PublishMemoryContext(MemoryContextStatsEntry *memctx_infos,
+ int curr_id, MemoryContext context,
+ List *path,
+ MemoryContextCounters stat,
+ int num_contexts, dsa_area *area,
+ int max_levels);
+static void compute_contexts_count_and_ids(List *contexts, HTAB *context_id_lookup,
+ int *stats_count,
+ bool summary);
+static List *compute_context_path(MemoryContext c, HTAB *context_id_lookup);
+static void free_memorycontextstate_dsa(dsa_area *area, int total_stats,
+ dsa_pointer prev_dsa_pointer);
+static void signal_memorycontext_reporting(void);
/*
* You should not do memory allocations within a critical section, because
@@ -831,11 +861,19 @@ MemoryContextStatsDetail(MemoryContext context,
bool print_to_stderr)
{
MemoryContextCounters grand_totals;
+ int num_contexts = 0;
+ PrintDestination print_location;
memset(&grand_totals, 0, sizeof(grand_totals));
+ if (print_to_stderr)
+ print_location = PRINT_STATS_TO_STDERR;
+ else
+ print_location = PRINT_STATS_TO_LOGS;
+
+ /* num_contexts reports the number of contexts aggregated in the output */
MemoryContextStatsInternal(context, 0, max_level, max_children,
- &grand_totals, print_to_stderr);
+ &grand_totals, print_location, &num_contexts);
if (print_to_stderr)
fprintf(stderr,
@@ -870,13 +908,14 @@ MemoryContextStatsDetail(MemoryContext context,
* One recursion level for MemoryContextStats
*
* Print stats for this context if possible, but in any case accumulate counts
- * into *totals (if not NULL).
+ * into *totals (if not NULL). The callers should make sure that print_location
+ * is set to PRINT_STATS_TO_STDERR or PRINT_STATS_TO_LOGS or PRINT_STATS_NONE.
*/
static void
MemoryContextStatsInternal(MemoryContext context, int level,
int max_level, int max_children,
MemoryContextCounters *totals,
- bool print_to_stderr)
+ PrintDestination print_location, int *num_contexts)
{
MemoryContext child;
int ichild;
@@ -884,10 +923,39 @@ MemoryContextStatsInternal(MemoryContext context, int level,
Assert(MemoryContextIsValid(context));
/* Examine the context itself */
- context->methods->stats(context,
- MemoryContextStatsPrint,
- &level,
- totals, print_to_stderr);
+ switch (print_location)
+ {
+ case PRINT_STATS_TO_STDERR:
+ context->methods->stats(context,
+ MemoryContextStatsPrint,
+ &level,
+ totals, true);
+ break;
+
+ case PRINT_STATS_TO_LOGS:
+ context->methods->stats(context,
+ MemoryContextStatsPrint,
+ &level,
+ totals, false);
+ break;
+
+ case PRINT_STATS_NONE:
+
+ /*
+ * Do not print the statistics if print_location is
+ * PRINT_STATS_NONE, only compute totals. This is used in
+ * reporting of memory context statistics via a sql function. Last
+ * parameter is not relevant.
+ */
+ context->methods->stats(context,
+ NULL,
+ NULL,
+ totals, false);
+ break;
+ }
+
+ /* Increment the context count for each recursive call */
+ *num_contexts = *num_contexts + 1;
/*
* Examine children.
@@ -907,7 +975,7 @@ MemoryContextStatsInternal(MemoryContext context, int level,
MemoryContextStatsInternal(child, level + 1,
max_level, max_children,
totals,
- print_to_stderr);
+ print_location, num_contexts);
}
}
@@ -926,7 +994,13 @@ MemoryContextStatsInternal(MemoryContext context, int level,
child = MemoryContextTraverseNext(child, context);
}
- if (print_to_stderr)
+ /*
+ * Add the count of child contexts which are traversed in a
+ * non-recursive manner.
+ */
+ *num_contexts = *num_contexts + ichild;
+
+ if (print_location == PRINT_STATS_TO_STDERR)
{
for (int i = 0; i <= level; i++)
fprintf(stderr, " ");
@@ -939,7 +1013,7 @@ MemoryContextStatsInternal(MemoryContext context, int level,
local_totals.freechunks,
local_totals.totalspace - local_totals.freespace);
}
- else
+ else if (print_location == PRINT_STATS_TO_LOGS)
ereport(LOG_SERVER_ONLY,
(errhidestmt(true),
errhidecontext(true),
@@ -1276,6 +1350,22 @@ HandleLogMemoryContextInterrupt(void)
/* latch will be set by procsignal_sigusr1_handler */
}
+/*
+ * HandleGetMemoryContextInterrupt
+ * Handle receipt of an interrupt indicating a request to publish memory
+ * contexts statistics.
+ *
+ * All the actual work is deferred to ProcessGetMemoryContextInterrupt() as
+ * this cannot be performed in a signal handler.
+ */
+void
+HandleGetMemoryContextInterrupt(void)
+{
+ InterruptPending = true;
+ PublishMemoryContextPending = true;
+ /* latch will be set by procsignal_sigusr1_handler */
+}
+
/*
* ProcessLogMemoryContextInterrupt
* Perform logging of memory contexts of this backend process.
@@ -1313,6 +1403,530 @@ ProcessLogMemoryContextInterrupt(void)
MemoryContextStatsDetail(TopMemoryContext, 100, 100, false);
}
+/*
+ * ProcessGetMemoryContextInterrupt
+ * Generate information about memory contexts used by the process.
+ *
+ * Performs a breadth first search on the memory context tree, thus parents'
+ * statistics are reported before their children in the monitoring function
+ * output.
+ *
+ * Statistics for all the processes are shared via the same dynamic shared
+ * area. Statistics written by each process are tracked independently in
+ * per-process DSA pointers. These pointers are stored in static shared memory.
+ *
+ * We calculate the maximum number of contexts' statistics that can be
+ * displayed using a pre-determined limit for memory available per process
+ * for this utility and the maximum size of statistics for each context. The
+ * remaining context statistics, if any, are captured as a cumulative total
+ * at the end of the individual contexts' statistics.
+ *
+ * If summary is true, we capture the level 1 and level 2 contexts
+ * statistics. For that we traverse the memory context tree recursively in
+ * depth first search manner to cover all the children of a parent context, to
+ * be able to display a cumulative total of memory consumption by a parent at
+ * level 2 and all its children.
+ */
+void
+ProcessGetMemoryContextInterrupt(void)
+{
+ List *contexts;
+ HASHCTL ctl;
+ HTAB *context_id_lookup;
+ int context_id = 0;
+ MemoryContextStatsEntry *meminfo;
+ bool summary = false;
+ dsa_area *area = NULL;
+ int max_stats;
+ int idx = MyProcNumber;
+ int stats_count = 0;
+ int stats_num = 0;
+ MemoryContextCounters stat;
+ int num_individual_stats = 0;
+
+ PublishMemoryContextPending = false;
+
+ /*
+ * The hash table is used for constructing "path" column of the view,
+ * similar to its local backend counterpart.
+ */
+ ctl.keysize = sizeof(MemoryContext);
+ ctl.entrysize = sizeof(MemoryContextId);
+ ctl.hcxt = CurrentMemoryContext;
+
+ context_id_lookup = hash_create("pg_get_remote_backend_memory_contexts",
+ 256,
+ &ctl,
+ HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+
+ /* List of contexts to process in the next round - start at the top. */
+ contexts = list_make1(TopMemoryContext);
+
+ /* Compute the number of stats that can fit in the defined limit */
+ max_stats = (MAX_SEGMENTS_PER_BACKEND * DSA_DEFAULT_INIT_SEGMENT_SIZE)
+ / (MAX_MEMORY_CONTEXT_STATS_SIZE);
+ LWLockAcquire(&memCtxState[idx].lw_lock, LW_EXCLUSIVE);
+ summary = memCtxState[idx].summary;
+ LWLockRelease(&memCtxState[idx].lw_lock);
+
+ /*
+ * Traverse the memory context tree to find total number of contexts. If
+ * summary is requested report the total number of contexts at level 1 and
+ * 2 from the top. Also, populate the hash table of context ids.
+ */
+ compute_contexts_count_and_ids(contexts, context_id_lookup, &stats_count,
+ summary);
+
+ /*
+ * Allocate memory in this process's DSA for storing statistics of the
+ * memory contexts up to max_stats. For contexts that don't fit within the
+ * limit, a cumulative total is written as the last record in the DSA
+ * segment.
+ */
+ stats_num = Min(stats_count, max_stats);
+
+ LWLockAcquire(&memCtxArea->lw_lock, LW_EXCLUSIVE);
+
+ /*
+ * Create a DSA and send the handle to the client process after storing
+ * the context statistics. If the number of contexts exceeds the predefined
+ * limit (max_stats), a cumulative total is stored for such contexts.
+ */
+ if (memCtxArea->memstats_dsa_handle == DSA_HANDLE_INVALID)
+ {
+ MemoryContext oldcontext = CurrentMemoryContext;
+ dsa_handle handle;
+
+ MemoryContextSwitchTo(TopMemoryContext);
+
+ area = dsa_create(memCtxArea->lw_lock.tranche);
+
+ handle = dsa_get_handle(area);
+ MemoryContextSwitchTo(oldcontext);
+
+ dsa_pin_mapping(area);
+
+ /*
+ * Pin the DSA area to make sure it remains attachable even if the
+ * current backend exits. This is done so that the statistics are
+ * published even if the process exits while a client is waiting.
+ */
+ dsa_pin(area);
+
+ /* Set the handle in shared memory */
+ memCtxArea->memstats_dsa_handle = handle;
+ }
+
+ /*
+ * If DSA exists, created by another process publishing statistics, or by
+ * the previous execution of this function by this process, attach to it.
+ */
+ else if (area == NULL)
+ {
+ MemoryContext oldcontext = CurrentMemoryContext;
+
+ MemoryContextSwitchTo(TopMemoryContext);
+ area = dsa_attach(memCtxArea->memstats_dsa_handle);
+ MemoryContextSwitchTo(oldcontext);
+ dsa_pin_mapping(area);
+ }
+ LWLockRelease(&memCtxArea->lw_lock);
+
+ /*
+ * Hold the process lock to protect writes to process specific memory. Two
+ * processes publishing statistics do not block each other.
+ */
+ LWLockAcquire(&memCtxState[idx].lw_lock, LW_EXCLUSIVE);
+ memCtxState[idx].proc_id = MyProcPid;
+
+ if (DsaPointerIsValid(memCtxState[idx].memstats_dsa_pointer))
+ {
+ /*
+ * Free any previous allocations, free the name, ident and path
+ * pointers before freeing the pointer that contains them.
+ */
+ free_memorycontextstate_dsa(area, memCtxState[idx].total_stats,
+ memCtxState[idx].memstats_dsa_pointer);
+
+ dsa_free(area, memCtxState[idx].memstats_dsa_pointer);
+ memCtxState[idx].memstats_dsa_pointer = InvalidDsaPointer;
+ }
+ memCtxState[idx].memstats_dsa_pointer =
+ dsa_allocate0(area, stats_num * sizeof(MemoryContextStatsEntry));
+
+ meminfo = (MemoryContextStatsEntry *)
+ dsa_get_address(area, memCtxState[idx].memstats_dsa_pointer);
+
+ if (summary)
+ {
+ int ctx_id = 0;
+ List *path = NIL;
+
+ /* Copy TopMemoryContext statistics to DSA */
+ memset(&stat, 0, sizeof(stat));
+ (*TopMemoryContext->methods->stats) (TopMemoryContext, NULL, NULL,
+ &stat, true);
+ path = lcons_int(1, path);
+ PublishMemoryContext(meminfo, ctx_id, TopMemoryContext, path, stat,
+ 1, area, 100);
+ ctx_id = ctx_id + 1;
+
+ /*
+ * Copy statistics for each of TopMemoryContext's children. This
+ * includes statistics of at most 100 children per node, with each
+ * child node limited to a depth of 100 in its subtree.
+ */
+ for (MemoryContext c = TopMemoryContext->firstchild; c != NULL;
+ c = c->nextchild)
+ {
+ MemoryContextCounters grand_totals;
+ int num_contexts = 0;
+ int level = 0;
+
+ path = NIL;
+ memset(&grand_totals, 0, sizeof(grand_totals));
+
+ MemoryContextStatsInternal(c, level, 100, 100, &grand_totals,
+ PRINT_STATS_NONE, &num_contexts);
+
+ path = compute_context_path(c, context_id_lookup);
+
+ PublishMemoryContext(meminfo, ctx_id, c, path,
+ grand_totals, num_contexts, area, 100);
+ ctx_id = ctx_id + 1;
+ }
+ memCtxState[idx].total_stats = ctx_id;
+ /* Notify waiting backends and return */
+ hash_destroy(context_id_lookup);
+ dsa_detach(area);
+ signal_memorycontext_reporting();
+ return;
+ }
+
+ foreach_ptr(MemoryContextData, cur, contexts)
+ {
+ List *path = NIL;
+
+ /*
+ * Figure out the transient context_id of this context and each of its
+ * ancestors, to compute a path for this context.
+ */
+ path = compute_context_path(cur, context_id_lookup);
+
+ /* Account for saving one statistics slot for cumulative reporting */
+ if (context_id < (max_stats - 1) || stats_count <= max_stats)
+ {
+ /* Examine the context stats */
+ memset(&stat, 0, sizeof(stat));
+ (*cur->methods->stats) (cur, NULL, NULL, &stat, true);
+ /* Copy statistics to DSA memory */
+ PublishMemoryContext(meminfo, context_id, cur, path, stat, 1, area, 100);
+ }
+ else
+ {
+ /* Examine the context stats */
+ memset(&stat, 0, sizeof(stat));
+ (*cur->methods->stats) (cur, NULL, NULL, &stat, true);
+
+ meminfo[max_stats - 1].totalspace += stat.totalspace;
+ meminfo[max_stats - 1].nblocks += stat.nblocks;
+ meminfo[max_stats - 1].freespace += stat.freespace;
+ meminfo[max_stats - 1].freechunks += stat.freechunks;
+ }
+
+ /*
+ * DSA max limit per process is reached, write aggregate of the
+ * remaining statistics.
+ *
+ * We can store contexts from 0 to max_stats - 1. When stats_count is
+ * greater than max_stats, we stop reporting individual statistics
+ * when context_id equals max_stats - 2, because the array slot
+ * max_stats - 1 is used for reporting cumulative statistics or
+ * "Remaining Totals".
+ */
+ if (stats_count > max_stats && context_id == (max_stats - 2))
+ {
+ char *nameptr;
+ int namelen = strlen("Remaining Totals");
+
+ num_individual_stats = context_id + 1;
+ meminfo[max_stats - 1].name = dsa_allocate0(area, namelen + 1);
+ nameptr = dsa_get_address(area, meminfo[max_stats - 1].name);
+ strncpy(nameptr, "Remaining Totals", namelen);
+ meminfo[max_stats - 1].ident = InvalidDsaPointer;
+ meminfo[max_stats - 1].path = InvalidDsaPointer;
+ meminfo[max_stats - 1].type = NULL;
+ }
+ context_id++;
+ }
+
+ /*
+ * Statistics are not aggregated, i.e. individual statistics are reported,
+ * when stats_count <= max_stats.
+ */
+ if (stats_count <= max_stats)
+ {
+ memCtxState[idx].total_stats = context_id;
+ }
+ /* Report number of aggregated memory contexts */
+ else
+ {
+ meminfo[max_stats - 1].num_agg_stats = context_id -
+ num_individual_stats;
+
+ /*
+ * Total stats equals num_individual_stats + 1 record for cumulative
+ * statistics.
+ */
+ memCtxState[idx].total_stats = num_individual_stats + 1;
+ }
+
+ /* Notify waiting backends and return */
+ hash_destroy(context_id_lookup);
+ dsa_detach(area);
+ signal_memorycontext_reporting();
+}
+
+/*
+ * Signal all the waiting client backends after copying all the statistics.
+ */
+static void
+signal_memorycontext_reporting(void)
+{
+ memCtxState[MyProcNumber].stats_timestamp = GetCurrentTimestamp();
+ LWLockRelease(&memCtxState[MyProcNumber].lw_lock);
+ ConditionVariableBroadcast(&memCtxState[MyProcNumber].memctx_cv);
+}
+
+/*
+ * compute_context_path
+ *
+ * Append the transient context_id of this context and each of its ancestors
+ * to a list, in order to compute a path.
+ */
+static List *
+compute_context_path(MemoryContext c, HTAB *context_id_lookup)
+{
+ bool found;
+ List *path = NIL;
+ MemoryContext cur_context;
+
+ for (cur_context = c; cur_context != NULL; cur_context = cur_context->parent)
+ {
+ MemoryContextId *cur_entry;
+
+ cur_entry = hash_search(context_id_lookup, &cur_context, HASH_FIND, &found);
+
+ if (!found)
+ elog(ERROR, "hash table corrupted, can't construct path value");
+
+ path = lcons_int(cur_entry->context_id, path);
+ }
+
+ return path;
+}
+
+/*
+ * Compute the number of contexts currently allocated by the backend and
+ * assign a context id to each of them.
+ */
+static void
+compute_contexts_count_and_ids(List *contexts, HTAB *context_id_lookup,
+ int *stats_count, bool summary)
+{
+ foreach_ptr(MemoryContextData, cur, contexts)
+ {
+ MemoryContextId *entry;
+ bool found;
+
+ entry = (MemoryContextId *) hash_search(context_id_lookup, &cur,
+ HASH_ENTER, &found);
+ Assert(!found);
+
+ /* context id starts with 1 */
+ entry->context_id = ++(*stats_count);
+
+ /* Append the children of the current context to the main list. */
+ for (MemoryContext c = cur->firstchild; c != NULL; c = c->nextchild)
+ {
+ if (summary)
+ {
+ entry = (MemoryContextId *) hash_search(context_id_lookup, &c,
+ HASH_ENTER, &found);
+ Assert(!found);
+
+ entry->context_id = ++(*stats_count);
+ }
+
+ contexts = lappend(contexts, c);
+ }
+
+ /*
+ * In summary mode only the contexts at the first two levels (from
+ * the top) are displayed.
+ */
+ if (summary)
+ break;
+ }
+}
+
+/*
+ * PublishMemoryContext
+ *
+ * Copy the memory context statistics of a single context to DSA memory
+ */
+static void
+PublishMemoryContext(MemoryContextStatsEntry *memctx_info, int curr_id,
+ MemoryContext context, List *path,
+ MemoryContextCounters stat, int num_contexts,
+ dsa_area *area, int max_levels)
+{
+ const char *ident = context->ident;
+ const char *name = context->name;
+ int *path_list;
+
+ /*
+ * To be consistent with logging output, we label dynahash contexts with
+ * just the hash table name as with MemoryContextStatsPrint().
+ */
+ if (context->ident && strncmp(context->name, "dynahash", 8) == 0)
+ {
+ name = context->ident;
+ ident = NULL;
+ }
+
+ if (name != NULL)
+ {
+ int namelen = strlen(name);
+ char *nameptr;
+
+ if (strlen(name) >= MEMORY_CONTEXT_IDENT_SHMEM_SIZE)
+ namelen = pg_mbcliplen(name, namelen,
+ MEMORY_CONTEXT_IDENT_SHMEM_SIZE - 1);
+
+ memctx_info[curr_id].name = dsa_allocate0(area, namelen + 1);
+ nameptr = (char *) dsa_get_address(area, memctx_info[curr_id].name);
+ strlcpy(nameptr, name, namelen + 1);
+ }
+ else
+ memctx_info[curr_id].name = InvalidDsaPointer;
+
+ /* Trim and copy the identifier if it is not set to NULL */
+ if (ident != NULL)
+ {
+ int idlen = strlen(ident);
+ char *identptr;
+
+ /*
+ * Some identifiers such as SQL query string can be very long,
+ * truncate oversize identifiers.
+ */
+ if (idlen >= MEMORY_CONTEXT_IDENT_SHMEM_SIZE)
+ idlen = pg_mbcliplen(ident, idlen,
+ MEMORY_CONTEXT_IDENT_SHMEM_SIZE - 1);
+
+ memctx_info[curr_id].ident = dsa_allocate0(area, idlen + 1);
+ identptr = (char *) dsa_get_address(area, memctx_info[curr_id].ident);
+ strlcpy(identptr, ident, idlen + 1);
+ }
+ else
+ memctx_info[curr_id].ident = InvalidDsaPointer;
+
+ /* Allocate DSA memory for storing path information */
+ if (path == NIL)
+ memctx_info[curr_id].path = InvalidDsaPointer;
+ else
+ {
+ int levels = Min(list_length(path), max_levels);
+
+ memctx_info[curr_id].path_length = levels;
+ memctx_info[curr_id].path = dsa_allocate0(area, levels * sizeof(int));
+ memctx_info[curr_id].levels = list_length(path);
+ path_list = (int *) dsa_get_address(area, memctx_info[curr_id].path);
+
+ foreach_int(i, path)
+ {
+ path_list[foreach_current_index(i)] = i;
+ if (--levels == 0)
+ break;
+ }
+ }
+ memctx_info[curr_id].type = ContextTypeToString(context->type);
+ memctx_info[curr_id].totalspace = stat.totalspace;
+ memctx_info[curr_id].nblocks = stat.nblocks;
+ memctx_info[curr_id].freespace = stat.freespace;
+ memctx_info[curr_id].freechunks = stat.freechunks;
+ memctx_info[curr_id].num_agg_stats = num_contexts;
+}
+
+/*
+ * free_memorycontextstate_dsa
+ *
+ * Worker for freeing resources from a MemoryContextStatsEntry. Callers are
+ * responsible for ensuring that the DSA pointer is valid.
+ */
+static void
+free_memorycontextstate_dsa(dsa_area *area, int total_stats,
+ dsa_pointer prev_dsa_pointer)
+{
+ MemoryContextStatsEntry *meminfo;
+
+ meminfo = (MemoryContextStatsEntry *) dsa_get_address(area, prev_dsa_pointer);
+ Assert(meminfo != NULL);
+ for (int i = 0; i < total_stats; i++)
+ {
+ if (DsaPointerIsValid(meminfo[i].name))
+ dsa_free(area, meminfo[i].name);
+
+ if (DsaPointerIsValid(meminfo[i].ident))
+ dsa_free(area, meminfo[i].ident);
+
+ if (DsaPointerIsValid(meminfo[i].path))
+ dsa_free(area, meminfo[i].path);
+ }
+}
+
+/*
+ * Free the memory context statistics stored by this process
+ * in DSA area.
+ */
+void
+AtProcExit_memstats_dsa_free(int code, Datum arg)
+{
+ int idx = MyProcNumber;
+ dsm_segment *dsm_seg = NULL;
+ dsa_area *area = NULL;
+
+ if (memCtxArea->memstats_dsa_handle == DSA_HANDLE_INVALID)
+ return;
+
+ dsm_seg = dsm_find_mapping(memCtxArea->memstats_dsa_handle);
+
+ LWLockAcquire(&memCtxState[idx].lw_lock, LW_EXCLUSIVE);
+
+ if (!DsaPointerIsValid(memCtxState[idx].memstats_dsa_pointer))
+ {
+ LWLockRelease(&memCtxState[idx].lw_lock);
+ return;
+ }
+
+ /* If the dsm mapping could not be found, attach to the area */
+ if (dsm_seg != NULL)
+ {
+ LWLockRelease(&memCtxState[idx].lw_lock);
+ return;
+ }
+ area = dsa_attach(memCtxArea->memstats_dsa_handle);
+
+ /*
+ * Free the memory context statistics, free the name, ident and path
+ * pointers before freeing the pointer that contains these pointers and
+ * integer statistics.
+ */
+ free_memorycontextstate_dsa(area, memCtxState[idx].total_stats,
+ memCtxState[idx].memstats_dsa_pointer);
+ dsa_free(area, memCtxState[idx].memstats_dsa_pointer);
+ memCtxState[idx].memstats_dsa_pointer = InvalidDsaPointer;
+
+ dsa_detach(area);
+ LWLockRelease(&memCtxState[idx].lw_lock);
+}
+
void *
palloc(Size size)
{
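
ProcessGetMemoryContextInterrupt() reports contexts individually until the
per-process limit is reached and folds everything beyond it into one
"Remaining Totals" record in the last slot. The capping logic can be sketched
in isolation as below. This is a simplified illustration under stated
assumptions, not the patch's implementation: StatEntry and publish_with_cap
are hypothetical names, only totalspace is accumulated, and plain arrays
replace the DSA allocations.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified per-context record. */
typedef struct StatEntry
{
    const char *name;
    int64_t     totalspace;
    int         num_agg;      /* contexts folded into this record */
} StatEntry;

/*
 * Copy up to max_stats records from in[] to out[].  If count exceeds
 * max_stats, the last slot becomes a cumulative "Remaining Totals" record
 * covering every context that did not get a slot of its own.
 * Returns the number of records written to out[].
 */
static int
publish_with_cap(const StatEntry *in, int count, StatEntry *out, int max_stats)
{
    int         individual;

    assert(max_stats >= 2 && count >= 0);

    /* Everything fits: report each context individually. */
    if (count <= max_stats)
    {
        memcpy(out, in, (size_t) count * sizeof(StatEntry));
        return count;
    }

    /* Reserve the last slot for the aggregate record. */
    individual = max_stats - 1;
    memcpy(out, in, (size_t) individual * sizeof(StatEntry));

    out[individual].name = "Remaining Totals";
    out[individual].totalspace = 0;
    out[individual].num_agg = 0;
    for (int i = individual; i < count; i++)
    {
        out[individual].totalspace += in[i].totalspace;
        out[individual].num_agg++;
    }

    return max_stats;
}
```

With five input contexts and max_stats = 3, the first two are copied as-is
and the third output record aggregates the remaining three.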
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 5d5be8ba4e1..90675be66f6 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8559,6 +8559,16 @@
prorettype => 'bool', proargtypes => 'int4',
prosrc => 'pg_log_backend_memory_contexts' },
+# publishing memory contexts of the specified postgres process
+{ oid => '2173', descr => 'publish memory contexts of the specified backend',
+ proname => 'pg_get_process_memory_contexts', provolatile => 'v',
+ prorows => '100', proretset => 't', proparallel => 'r',
+ prorettype => 'record', proargtypes => 'int4 bool float8',
+ proallargtypes => '{int4,bool,float8,text,text,text,_int4,int4,int8,int8,int8,int8,int8,int4,timestamptz}',
+ proargmodes => '{i,i,i,o,o,o,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{pid, summary, retries, name, ident, type, path, level, total_bytes, total_nblocks, free_bytes, free_chunks, used_bytes, num_agg_contexts, stats_timestamp}',
+ prosrc => 'pg_get_process_memory_contexts' },
+
# non-persistent series generator
{ oid => '1066', descr => 'non-persistent series generator',
proname => 'generate_series', prorows => '1000',
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 0d8528b2875..58b2496a9cb 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -96,6 +96,7 @@ extern PGDLLIMPORT volatile sig_atomic_t IdleSessionTimeoutPending;
extern PGDLLIMPORT volatile sig_atomic_t ProcSignalBarrierPending;
extern PGDLLIMPORT volatile sig_atomic_t LogMemoryContextPending;
extern PGDLLIMPORT volatile sig_atomic_t IdleStatsUpdateTimeoutPending;
+extern PGDLLIMPORT volatile sig_atomic_t PublishMemoryContextPending;
extern PGDLLIMPORT volatile sig_atomic_t CheckClientConnectionPending;
extern PGDLLIMPORT volatile sig_atomic_t ClientConnectionLost;
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index 016dfd9b3f6..cfe14631445 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -35,6 +35,7 @@ typedef enum
PROCSIG_WALSND_INIT_STOPPING, /* ask walsenders to prepare for shutdown */
PROCSIG_BARRIER, /* global barrier interrupt */
PROCSIG_LOG_MEMORY_CONTEXT, /* ask backend to log the memory contexts */
+ PROCSIG_GET_MEMORY_CONTEXT, /* ask backend to send the memory contexts */
PROCSIG_PARALLEL_APPLY_MESSAGE, /* Message from parallel apply workers */
/* Recovery conflict reasons */
diff --git a/src/include/utils/memutils.h b/src/include/utils/memutils.h
index 8abc26abce2..bfeb1575276 100644
--- a/src/include/utils/memutils.h
+++ b/src/include/utils/memutils.h
@@ -18,6 +18,9 @@
#define MEMUTILS_H
#include "nodes/memnodes.h"
+#include "storage/condition_variable.h"
+#include "storage/lmgr.h"
+#include "utils/dsa.h"
/*
@@ -48,6 +51,22 @@
#define AllocHugeSizeIsValid(size) ((Size) (size) <= MaxAllocHugeSize)
+/*
+ * Memory Context reporting size limits.
+ */
+
+/* Max length of context name and ident */
+#define MEMORY_CONTEXT_IDENT_SHMEM_SIZE 64
+/* Maximum size (in MB) of DSA area per process */
+#define MAX_SEGMENTS_PER_BACKEND 1
+/*
+ * Maximum size per context. Actual size may be lower as this assumes the worst
+ * case of deepest path and longest identifiers (name and ident, thus the
+ * multiplication by 2). The path depth is limited to 100 like for memory
+ * context logging.
+ */
+#define MAX_MEMORY_CONTEXT_STATS_SIZE sizeof(MemoryContextStatsEntry) + \
+ (100 * sizeof(int)) + (2 * MEMORY_CONTEXT_IDENT_SHMEM_SIZE)
/*
* Standard top-level memory contexts.
@@ -319,4 +338,66 @@ pg_memory_is_all_zeros(const void *ptr, size_t len)
return true;
}
+/* Dynamic shared memory state for statistics per context */
+typedef struct MemoryContextStatsEntry
+{
+ dsa_pointer name;
+ dsa_pointer ident;
+ dsa_pointer path;
+ const char *type;
+ int path_length;
+ int levels;
+ int64 totalspace;
+ int64 nblocks;
+ int64 freespace;
+ int64 freechunks;
+ int num_agg_stats;
+} MemoryContextStatsEntry;
+
+/*
+ * Static shared memory state representing the DSA area created for memory
+ * context statistics reporting. A single DSA area is created and used by all
+ * the processes, each having its specific DSA allocations for sharing memory
+ * statistics, tracked by per backend static shared memory state.
+ */
+typedef struct MemoryContextState
+{
+ dsa_handle memstats_dsa_handle;
+ LWLock lw_lock;
+} MemoryContextState;
+
+/*
+ * Per backend static shared memory state for memory context statistics
+ * reporting.
+ */
+typedef struct MemoryContextBackendState
+{
+ ConditionVariable memctx_cv;
+ LWLock lw_lock;
+ int proc_id;
+ int total_stats;
+ bool summary;
+ dsa_pointer memstats_dsa_pointer;
+ TimestampTz stats_timestamp;
+} MemoryContextBackendState;
+
+
+/*
+ * Used for storage of transient identifiers for pg_get_backend_memory_contexts
+ */
+typedef struct MemoryContextId
+{
+ MemoryContext context;
+ int context_id;
+} MemoryContextId;
+
+extern PGDLLIMPORT MemoryContextBackendState *memCtxState;
+extern PGDLLIMPORT MemoryContextState *memCtxArea;
+extern void ProcessGetMemoryContextInterrupt(void);
+extern const char *ContextTypeToString(NodeTag type);
+extern void HandleGetMemoryContextInterrupt(void);
+extern Size MemoryContextReportingShmemSize(void);
+extern void MemoryContextReportingShmemInit(void);
+extern void AtProcExit_memstats_dsa_free(int code, Datum arg);
+
#endif /* MEMUTILS_H */
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 83228cfca29..343fc8ca2a1 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -232,3 +232,22 @@ select * from pg_timezone_abbrevs where abbrev = 'LMT';
LMT | @ 7 hours 52 mins 58 secs ago | f
(1 row)
+DO $$
+DECLARE
+ launcher_pid int;
+ r RECORD;
+BEGIN
+ SELECT pid from pg_stat_activity where backend_type='autovacuum launcher'
+ INTO launcher_pid;
+
+ select type, name, ident
+ from pg_get_process_memory_contexts(launcher_pid, false, 20)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+ select type, name, ident
+ from pg_get_process_memory_contexts(pg_backend_pid(), false, 20)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+END $$;
+NOTICE: (AllocSet,TopMemoryContext,)
+NOTICE: (AllocSet,TopMemoryContext,)
diff --git a/src/test/regress/sql/sysviews.sql b/src/test/regress/sql/sysviews.sql
index 66179f026b3..dd92b520070 100644
--- a/src/test/regress/sql/sysviews.sql
+++ b/src/test/regress/sql/sysviews.sql
@@ -101,3 +101,21 @@ select count(distinct utc_offset) >= 24 as ok from pg_timezone_abbrevs;
-- One specific case we can check without much fear of breakage
-- is the historical local-mean-time value used for America/Los_Angeles.
select * from pg_timezone_abbrevs where abbrev = 'LMT';
+
+DO $$
+DECLARE
+ launcher_pid int;
+ r RECORD;
+BEGIN
+ SELECT pid from pg_stat_activity where backend_type='autovacuum launcher'
+ INTO launcher_pid;
+
+ select type, name, ident
+ from pg_get_process_memory_contexts(launcher_pid, false, 20)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+ select type, name, ident
+ from pg_get_process_memory_contexts(pg_backend_pid(), false, 20)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+END $$;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index d42b943ef94..c2fcc47a803 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1662,12 +1662,16 @@ MemoizeState
MemoizeTuple
MemoryChunk
MemoryContext
+MemoryContextBackendState
MemoryContextCallback
MemoryContextCallbackFunction
MemoryContextCounters
MemoryContextData
+MemoryContextId
MemoryContextMethodID
MemoryContextMethods
+MemoryContextState
+MemoryContextStatsEntry
MemoryStatsPrintFunc
MergeAction
MergeActionState
--
2.39.3 (Apple Git-146)
Hi,
On 2025-04-07 15:41:37 +0200, Daniel Gustafsson wrote:
I think this function can be a valuable debugging aid going forward.
What I am most excited about for this is to be able to measure server-wide and
fleet-wide memory usage over time. Today I have actually very little idea
about what memory is being used for across all connections, not to speak of a
larger number of servers.
diff --git a/src/backend/postmaster/auxprocess.c b/src/backend/postmaster/auxprocess.c
index 4f6795f7265..d3b4df27935 100644
--- a/src/backend/postmaster/auxprocess.c
+++ b/src/backend/postmaster/auxprocess.c
@@ -84,6 +84,13 @@ AuxiliaryProcessMainCommon(void)
/* register a before-shutdown callback for LWLock cleanup */
before_shmem_exit(ShutdownAuxiliaryProcess, 0);
+ /*
+ * The before shmem exit callback frees the DSA memory occupied by the
+ * latest memory context statistics that could be published by this aux
+ * proc if requested.
+ */
+ before_shmem_exit(AtProcExit_memstats_dsa_free, 0);
+
SetProcessingMode(NormalProcessing);
}
How about putting it into BaseInit()? Or maybe just register it when it's
first used?
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index fda91ffd1ce..d3cb3f1891c 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -663,6 +663,10 @@ ProcessCheckpointerInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
Not this patch's responsibility, but we really desperately need to unify our
interrupt handling. Manually keeping a ~dozen of functions similar, but not
exactly the same, is an insane approach.
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -161,6 +161,7 @@ WAL_RECEIVER_EXIT "Waiting for the WAL receiver to exit."
WAL_RECEIVER_WAIT_START "Waiting for startup process to send initial data for streaming replication."
WAL_SUMMARY_READY "Waiting for a new WAL summary to be generated."
XACT_GROUP_UPDATE "Waiting for the group leader to update transaction status at transaction end."
+MEM_CTX_PUBLISH "Waiting for a process to publish memory information."
The memory context stuff abbreviates as cxt not ctx. There's a few more cases
of that in the patch.
+const char *
+ContextTypeToString(NodeTag type)
+{
+ const char *context_type;
+
+ switch (type)
+ {
+ case T_AllocSetContext:
+ context_type = "AllocSet";
+ break;
+ case T_GenerationContext:
+ context_type = "Generation";
+ break;
+ case T_SlabContext:
+ context_type = "Slab";
+ break;
+ case T_BumpContext:
+ context_type = "Bump";
+ break;
+ default:
+ context_type = "???";
+ break;
+ }
+ return (context_type);
Why these parens?
+ * If the publishing backend does not respond before the condition variable
+ * times out, which is set to MEMSTATS_WAIT_TIMEOUT, retry given that there is
+ * time left within the timeout specified by the user, before giving up and
+ * returning previously published statistics, if any. If no previous statistics
+ * exist, return NULL.
Why do we need to repeatedly wake up rather than just sleeping with the
"remaining" amount of time based on the time the function was called and the
time that has passed since?
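A minimal sketch of the alternative asked about here, assuming plain millisecond timestamps rather than TimestampTz. The helper name remaining_wait_ms is hypothetical and not part of the patch; it only illustrates computing one sleep duration from the call time instead of counting fixed-size wakeups.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the patch): given the time the function was
 * called, the current time and the user-supplied timeout, all in
 * milliseconds, return how long the caller may still sleep.
 */
static int64_t
remaining_wait_ms(int64_t start_ms, int64_t now_ms, int64_t timeout_ms)
{
    int64_t elapsed;

    assert(now_ms >= start_ms);
    elapsed = now_ms - start_ms;

    if (elapsed >= timeout_ms)
        return 0;               /* deadline reached, stop waiting */

    return timeout_ms - elapsed;    /* sleep for exactly the remainder */
}
```

With a helper like this the wait loop would pass the returned value to the timed sleep and give up once it reaches zero, so a spurious wakeup costs one recomputation rather than a full fixed interval.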
+ /*
+ * A valid DSA pointer isn't proof that statistics are available, it can
+ * be valid due to previously published stats.
Somehow "valid DSA pointer" is a bit too much about the precise mechanics and
not enough about what's actually happening. I'd rather say something like
"Even if the proc has published statistics, they may not be due to the current
request, but previously published stats."
+ if (ConditionVariableTimedSleep(&memCtxState[procNumber].memctx_cv,
+ MEMSTATS_WAIT_TIMEOUT,
+ WAIT_EVENT_MEM_CTX_PUBLISH))
+ {
+ timer += MEMSTATS_WAIT_TIMEOUT;
+
+ /*
+ * Wait for the timeout as defined by the user. If no updated
+ * statistics are available within the allowed time then display
+ * previously published statistics if there are any. If no
+ * previous statistics are available then return NULL. The timer
+ * is defined in milliseconds since thats what the condition
+ * variable sleep uses.
+ */
+ if ((timer * 1000) >= timeout)
+ {
I'd suggest just comparing how much time has elapsed since the timestamp
you've requested earlier.
+ LWLockAcquire(&memCtxState[procNumber].lw_lock, LW_EXCLUSIVE);
+ /* Displaying previously published statistics if available */
+ if (DsaPointerIsValid(memCtxState[procNumber].memstats_dsa_pointer))
+ break;
+ else
+ {
+ LWLockRelease(&memCtxState[procNumber].lw_lock);
+ PG_RETURN_NULL();
+ }
+ }
+ }
+ }

+/*
+ * Initialize shared memory for displaying memory context statistics
+ */
+void
+MemoryContextReportingShmemInit(void)
+{
+ bool found;
+
+ memCtxArea = (MemoryContextState *)
+ ShmemInitStruct("MemoryContextState", sizeof(MemoryContextState), &found);
+
+ if (!IsUnderPostmaster)
+ {
+ Assert(!found);
I don't really understand why this uses IsUnderPostmaster? Seems like this
should just use found like most (or all) the other *ShmemInit() functions do?
+ LWLockInitialize(&memCtxArea->lw_lock, LWLockNewTrancheId());
I think for builtin code we just hardcode the tranches in BuiltinTrancheIds.
+ memCtxState = (MemoryContextBackendState *)
+ ShmemInitStruct("MemoryContextBackendState",
+ ((MaxBackends + NUM_AUXILIARY_PROCS) * sizeof(MemoryContextBackendState)),
+ &found);
FWIW, I think it'd be mildly better if these two ShmemInitStruct()'s were
combined.
static void
MemoryContextStatsInternal(MemoryContext context, int level,
int max_level, int max_children,
MemoryContextCounters *totals,
- bool print_to_stderr)
+ PrintDestination print_location, int *num_contexts)
{
MemoryContext child;
int ichild;
@@ -884,10 +923,39 @@ MemoryContextStatsInternal(MemoryContext context, int level,
Assert(MemoryContextIsValid(context));
/* Examine the context itself */
- context->methods->stats(context,
- MemoryContextStatsPrint,
- &level,
- totals, print_to_stderr);
+ switch (print_location)
+ {
+ case PRINT_STATS_TO_STDERR:
+ context->methods->stats(context,
+ MemoryContextStatsPrint,
+ &level,
+ totals, true);
+ break;
+
+ case PRINT_STATS_TO_LOGS:
+ context->methods->stats(context,
+ MemoryContextStatsPrint,
+ &level,
+ totals, false);
+ break;
+
+ case PRINT_STATS_NONE:
+
+ /*
+ * Do not print the statistics if print_location is
+ * PRINT_STATS_NONE, only compute totals. This is used in
+ * reporting of memory context statistics via a sql function. Last
+ * parameter is not relevant.
+ */
+ context->methods->stats(context,
+ NULL,
+ NULL,
+ totals, false);
+ break;
+ }
+
+ /* Increment the context count for each of the recursive call */
+ *num_contexts = *num_contexts + 1;
It feels a bit silly to duplicate the call to context->methods->stats three
times. We've changed these parameters a bunch in the past, having more callers
to fix makes that more work. Can't the switch just set up the args that are
then passed to one call to context->methods->stats?
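One way to read this suggestion, as a rough sketch only: the switch fills in the arguments and a single call site uses them. The PrintDestination values come from the quoted patch; StatsPrintFunc, StatsCallArgs, demo_print and stats_args_for are stand-ins invented for illustration and do not match the real callback signatures.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum PrintDestination
{
    PRINT_STATS_TO_STDERR,
    PRINT_STATS_TO_LOGS,
    PRINT_STATS_NONE
} PrintDestination;

/* Simplified stand-in for the real print callback type. */
typedef void (*StatsPrintFunc) (const char *stats_string);

/* Hypothetical bundle of the arguments that vary by destination. */
typedef struct StatsCallArgs
{
    StatsPrintFunc printfunc;
    bool        print_to_stderr;
} StatsCallArgs;

/* Placeholder for MemoryContextStatsPrint in this sketch. */
static void
demo_print(const char *stats_string)
{
    (void) stats_string;
}

/* Decide the varying arguments once; the caller then makes a single call. */
static StatsCallArgs
stats_args_for(PrintDestination dest)
{
    StatsCallArgs args = {NULL, false};

    switch (dest)
    {
        case PRINT_STATS_TO_STDERR:
            args.printfunc = demo_print;
            args.print_to_stderr = true;
            break;
        case PRINT_STATS_TO_LOGS:
            args.printfunc = demo_print;
            break;
        case PRINT_STATS_NONE:
            /* only totals are computed, nothing is printed */
            break;
    }
    return args;
}
```

The caller would then invoke the stats method once, passing args.printfunc and args.print_to_stderr, so a future signature change touches one call instead of three.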
+
+ /* Compute the number of stats that can fit in the defined limit */
+ max_stats = (MAX_SEGMENTS_PER_BACKEND * DSA_DEFAULT_INIT_SEGMENT_SIZE)
+ / (MAX_MEMORY_CONTEXT_STATS_SIZE);
MAX_SEGMENTS_PER_BACKEND sounds way too generic to me for something defined in
memutils.h. I don't really understand why DSA_DEFAULT_INIT_SEGMENT_SIZE is
something that makes sense to use here?
The header says:
+/* Maximum size (in Mb) of DSA area per process */
+#define MAX_SEGMENTS_PER_BACKEND 1
But the name doesn't at all indicate it's in megabytes. Nor does the way it's
used clearly indicate that. That seems to be completely incidental based on
the current default value DSA_DEFAULT_INIT_SEGMENT_SIZE.
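To illustrate the concern, here is a small sketch of a limit that carries its unit in its name and value. MEMSTATS_MAX_BYTES_PER_BACKEND and max_stats_entries are hypothetical names, and the 1 MB figure is only an example; neither is taken from the patch.

```c
#include <stddef.h>

/* Hypothetical limit expressed directly in bytes (1 MB here). */
#define MEMSTATS_MAX_BYTES_PER_BACKEND ((size_t) 1024 * 1024)

/*
 * Number of fixed-size statistics entries that fit within limit_bytes.
 * Returns 0 for a zero entry size rather than dividing by zero.
 */
static size_t
max_stats_entries(size_t limit_bytes, size_t entry_size)
{
    if (entry_size == 0)
        return 0;
    return limit_bytes / entry_size;
}
```

Written this way the result no longer depends on another subsystem's default segment size, and the macro is fully parenthesized so it can be used safely inside larger expressions.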
+ /*
+ * Hold the process lock to protect writes to process specific memory. Two
+ * processes publishing statistics do not block each other.
+ */
s/specific/process specific/
+ LWLockAcquire(&memCtxState[idx].lw_lock, LW_EXCLUSIVE);
+ memCtxState[idx].proc_id = MyProcPid;
+
+ if (DsaPointerIsValid(memCtxState[idx].memstats_dsa_pointer))
+ {
+ /*
+ * Free any previous allocations, free the name, ident and path
+ * pointers before freeing the pointer that contains them.
+ */
+ free_memorycontextstate_dsa(area, memCtxState[idx].total_stats,
+ memCtxState[idx].memstats_dsa_pointer);
+
+ dsa_free(area, memCtxState[idx].memstats_dsa_pointer);
+ memCtxState[idx].memstats_dsa_pointer = InvalidDsaPointer;
Both callers to free_memorycontextstate_dsa() do these lines immediately after
calling free_memorycontextstate_dsa(), why not do that inside?
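The shape being suggested can be sketched with ordinary heap memory standing in for DSA allocations. DemoBackendState and demo_free_published_stats are hypothetical; resetting the entry count inside the helper is an assumption of this sketch, not something the patch is known to do.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the per-backend published statistics. */
typedef struct DemoBackendState
{
    int         total_stats;    /* number of entries in 'entries' */
    char      **entries;        /* container holding per-entry allocations */
} DemoBackendState;

/*
 * Free the per-entry allocations, then the container, and invalidate the
 * state in the same place so callers cannot forget the last two steps.
 */
static void
demo_free_published_stats(DemoBackendState *state)
{
    for (int i = 0; i < state->total_stats; i++)
        free(state->entries[i]);

    free(state->entries);       /* free(NULL) is a no-op */
    state->entries = NULL;
    state->total_stats = 0;
}
```

Because the helper leaves the state invalidated, calling it a second time is harmless, which is convenient for cleanup paths that may run after an error.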
+ for (MemoryContext c = TopMemoryContext->firstchild; c != NULL;
+ c = c->nextchild)
+ {
+ MemoryContextCounters grand_totals;
+ int num_contexts = 0;
+ int level = 0;
+
+ path = NIL;
+ memset(&grand_totals, 0, sizeof(grand_totals));
+
+ MemoryContextStatsInternal(c, level, 100, 100, &grand_totals,
+ PRINT_STATS_NONE, &num_contexts);
+
+ path = compute_context_path(c, context_id_lookup);
+
+ PublishMemoryContext(meminfo, ctx_id, c, path,
+ grand_totals, num_contexts, area, 100);
+ ctx_id = ctx_id + 1;
+ }
+ memCtxState[idx].total_stats = ctx_id;
+ /* Notify waiting backends and return */
+ hash_destroy(context_id_lookup);
+ dsa_detach(area);
+ signal_memorycontext_reporting();
+ }
+
+ foreach_ptr(MemoryContextData, cur, contexts)
+ {
+ List *path = NIL;
+
+ /*
+ * Figure out the transient context_id of this context and each of its
+ * ancestors, to compute a path for this context.
+ */
+ path = compute_context_path(cur, context_id_lookup);
+
+ /* Account for saving one statistics slot for cumulative reporting */
+ if (context_id < (max_stats - 1) || stats_count <= max_stats)
+ {
+ /* Examine the context stats */
+ memset(&stat, 0, sizeof(stat));
+ (*cur->methods->stats) (cur, NULL, NULL, &stat, true);
Hm. So here we call the callback ourselves, even though we extended
MemoryContextStatsInternal() to satisfy the summary output. I guess it's
tolerable, but it's not great.
+ /* Copy statistics to DSA memory */
+ PublishMemoryContext(meminfo, context_id, cur, path, stat, 1, area, 100);
+ }
+ else
+ {
+ /* Examine the context stats */
+ memset(&stat, 0, sizeof(stat));
+ (*cur->methods->stats) (cur, NULL, NULL, &stat, true);
But do we really do it twice in a row? The lines are exactly the same, so it
seems that should just be done before the if?
+
+ /* Notify waiting backends and return */
+ hash_destroy(context_id_lookup);
+ dsa_detach(area);
+ signal_memorycontext_reporting();
+}
+
+/*
+ * Signal all the waiting client backends after copying all the statistics.
+ */
+static void
+signal_memorycontext_reporting(void)
+{
+ memCtxState[MyProcNumber].stats_timestamp = GetCurrentTimestamp();
+ LWLockRelease(&memCtxState[MyProcNumber].lw_lock);
+ ConditionVariableBroadcast(&memCtxState[MyProcNumber].memctx_cv);
+}
IMO somewhat confusing to release the lock in a function named
signal_memorycontext_reporting(). Why do we do that after
hash_destroy()/dsa_detach()?
+static void
+compute_contexts_count_and_ids(List *contexts, HTAB *context_id_lookup,
+ int *stats_count, bool summary)
+{
+ foreach_ptr(MemoryContextData, cur, contexts)
+ {
+ MemoryContextId *entry;
+ bool found;
+
+ entry = (MemoryContextId *) hash_search(context_id_lookup, &cur,
+ HASH_ENTER, &found);
+ Assert(!found);
+
+ /* context id starts with 1 */
+ entry->context_id = ++(*stats_count);
Given that we don't actually do anything here relating to starting with 1, I
find that comment confusing.
+static void
+PublishMemoryContext(MemoryContextStatsEntry *memctx_info, int curr_id,
+ MemoryContext context, List *path,
+ MemoryContextCounters stat, int num_contexts,
+ dsa_area *area, int max_levels)
+{
+ const char *ident = context->ident;
+ const char *name = context->name;
+ int *path_list;
+
+ /*
+ * To be consistent with logging output, we label dynahash contexts with
+ * just the hash table name as with MemoryContextStatsPrint().
+ */
+ if (context->ident && strncmp(context->name, "dynahash", 8) == 0)
+ {
+ name = context->ident;
+ ident = NULL;
+ }
+
+ if (name != NULL)
+ {
+ int namelen = strlen(name);
+ char *nameptr;
+
+ if (strlen(name) >= MEMORY_CONTEXT_IDENT_SHMEM_SIZE)
+ namelen = pg_mbcliplen(name, namelen,
+ MEMORY_CONTEXT_IDENT_SHMEM_SIZE - 1);
+
+ memctx_info[curr_id].name = dsa_allocate0(area, namelen + 1);
Given the number of references to memctx_info[curr_id] I'd put it in a local variable.
Why is this a dsa_allocate0 given that we're immediately overwriting it?
+ nameptr = (char *) dsa_get_address(area, memctx_info[curr_id].name);
+ strlcpy(nameptr, name, namelen + 1);
+ }
+ else
+ memctx_info[curr_id].name = InvalidDsaPointer;
+
+ /* Trim and copy the identifier if it is not set to NULL */
+ if (ident != NULL)
+ {
+ int idlen = strlen(context->ident);
+ char *identptr;
+
+ /*
+ * Some identifiers such as SQL query string can be very long,
+ * truncate oversize identifiers.
+ */
+ if (idlen >= MEMORY_CONTEXT_IDENT_SHMEM_SIZE)
+ idlen = pg_mbcliplen(ident, idlen,
+ MEMORY_CONTEXT_IDENT_SHMEM_SIZE - 1);
+
+ memctx_info[curr_id].ident = dsa_allocate0(area, idlen + 1);
+ identptr = (char *) dsa_get_address(area, memctx_info[curr_id].ident);
+ strlcpy(identptr, ident, idlen + 1);
Hm. First I thought we'd leak memory if this second (and subsequent)
dsa_allocate failed. Then I thought we'd be ok, because the memory would be
freed because it'd be reachable from memCtxState[idx].memstats_dsa_pointer.
But I think it wouldn't *quite* work, because memCtxState[idx].total_stats is
only set *after* we would have failed.
+ /* Allocate DSA memory for storing path information */
+ if (path == NIL)
+ memctx_info[curr_id].path = InvalidDsaPointer;
+ else
+ {
+ int levels = Min(list_length(path), max_levels);
+
+ memctx_info[curr_id].path_length = levels;
+ memctx_info[curr_id].path = dsa_allocate0(area, levels * sizeof(int));
+ memctx_info[curr_id].levels = list_length(path);
+ path_list = (int *) dsa_get_address(area, memctx_info[curr_id].path);
+
+ foreach_int(i, path)
+ {
+ path_list[foreach_current_index(i)] = i;
+ if (--levels == 0)
+ break;
+ }
+ }
+ memctx_info[curr_id].type = ContextTypeToString(context->type);
I don't think this works across platforms. On windows / EXEC_BACKEND builds
the location of string constants can differ across backends. And: Why do we
need the string here? You can just call ContextTypeToString when reading?
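A sketch of the alternative hinted at here: publish a small tag in shared memory and translate it to a string only in the reading process, so no pointer into one process's address space is ever shared. The Demo* names are hypothetical; the real code would presumably keep using NodeTag and ContextTypeToString().

```c
#include <string.h>

/* Hypothetical tag standing in for the NodeTag of the context. */
typedef enum DemoContextType
{
    DEMO_ALLOCSET,
    DEMO_GENERATION,
    DEMO_SLAB,
    DEMO_BUMP
} DemoContextType;

/* Entry as it would sit in shared memory: a value, not a pointer. */
typedef struct DemoStatsEntry
{
    DemoContextType type;
} DemoStatsEntry;

/* Called by the reader; the string lives in the reader's own address space. */
static const char *
demo_context_type_to_string(DemoContextType type)
{
    switch (type)
    {
        case DEMO_ALLOCSET:
            return "AllocSet";
        case DEMO_GENERATION:
            return "Generation";
        case DEMO_SLAB:
            return "Slab";
        case DEMO_BUMP:
            return "Bump";
    }
    return "???";               /* unknown tag */
}
```

The publisher stores only entry.type, and the process running the SQL function calls the conversion when building its result rows.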
+/*
+ * Free the memory context statistics stored by this process
+ * in DSA area.
+ */
+void
+AtProcExit_memstats_dsa_free(int code, Datum arg)
+{
FWIW, to me the fact that it does a dsa_free() is an implementation
detail. It's also not the only thing this does.
And, I don't think AtProcExit* really is accurate, given that it runs *before*
shmem is cleaned up?
I wonder if the best approach here wouldn't be to forgo the use of a
before_shmem_exit() callback, but instead use on_dsm_detach(). That would
require we'd not constantly detach from the dsm segment, but I don't
understand why we do that in the first place?
+ int idx = MyProcNumber;
+ dsm_segment *dsm_seg = NULL;
+ dsa_area *area = NULL;
+
+ if (memCtxArea->memstats_dsa_handle == DSA_HANDLE_INVALID)
+ return;
+
+ dsm_seg = dsm_find_mapping(memCtxArea->memstats_dsa_handle);
+
+ LWLockAcquire(&memCtxState[idx].lw_lock, LW_EXCLUSIVE);
+
+ if (!DsaPointerIsValid(memCtxState[idx].memstats_dsa_pointer))
+ {
+ LWLockRelease(&memCtxState[idx].lw_lock);
+ return;
+ }
+
+ /* If the dsm mapping could not be found, attach to the area */
+ if (dsm_seg != NULL)
+ return;
I don't understand what we do here with the dsm? Why do we not need cleanup
if we are already attached to the dsm segment?
+/*
+ * Static shared memory state representing the DSA area created for memory
+ * context statistics reporting. A single DSA area is created and used by all
+ * the processes, each having its specific DSA allocations for sharing memory
+ * statistics, tracked by per backend static shared memory state.
+ */
+typedef struct MemoryContextState
+{
+ dsa_handle memstats_dsa_handle;
+ LWLock lw_lock;
+} MemoryContextState;
IMO that's too generic a name for something in a header.
+/*
+ * Used for storage of transient identifiers for pg_get_backend_memory_contexts
+ */
+typedef struct MemoryContextId
+{
+ MemoryContext context;
+ int context_id;
+} MemoryContextId;
This too. Particularly because MemoryContextData->ident exist but is
something different.
+DO $$
+DECLARE
+ launcher_pid int;
+ r RECORD;
+BEGIN
+ SELECT pid from pg_stat_activity where backend_type='autovacuum launcher'
+ INTO launcher_pid;
+
+ select type, name, ident
+ from pg_get_process_memory_contexts(launcher_pid, false, 20)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+ select type, name, ident
+ from pg_get_process_memory_contexts(pg_backend_pid(), false, 20)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+END $$;
I'd also test an aux process. I think the AV launcher isn't one, because it
actually does "table" access of shared relations.
Greetings,
Andres Freund
Hi,
Please see some responses below.
On Mon, Apr 7, 2025 at 9:13 PM Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2025-04-07 15:41:37 +0200, Daniel Gustafsson wrote:
I think this function can be a valuable debugging aid going forward.
What I am most excited about for this is to be able to measure server-wide and
fleet-wide memory usage over time. Today I have actually very little idea
about what memory is being used for across all connections, not to speak of a
larger number of servers.
diff --git a/src/backend/postmaster/auxprocess.c b/src/backend/postmaster/auxprocess.c
index 4f6795f7265..d3b4df27935 100644
--- a/src/backend/postmaster/auxprocess.c
+++ b/src/backend/postmaster/auxprocess.c
@@ -84,6 +84,13 @@ AuxiliaryProcessMainCommon(void)
/* register a before-shutdown callback for LWLock cleanup */
before_shmem_exit(ShutdownAuxiliaryProcess, 0);
+ /*
+ * The before shmem exit callback frees the DSA memory occupied by the
+ * latest memory context statistics that could be published by this aux
+ * proc if requested.
+ */
+ before_shmem_exit(AtProcExit_memstats_dsa_free, 0);
+
SetProcessingMode(NormalProcessing);
}
How about putting it into BaseInit()? Or maybe just register it when it's
first used?
The problem with registering it when the DSA is first used is that the DSA is
used in an interrupt handler. The handler could be called from within a
PG_ENSURE_ERROR_CLEANUP block. This block operates under the assumption that
the before_shmem_exit callback registered at its beginning will be the last
one in the registered callback list at the end of the block. However, this
won't be the case if a callback is registered from an interrupt handler called
in the PG_ENSURE_ERROR_CLEANUP block.
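The ordering problem described above can be shown with a toy model of a last-in, first-out callback list. Everything here (push_callback, cancel_callback, the two dummy callbacks) is a hypothetical simplification for illustration; PostgreSQL's actual before_shmem_exit() machinery differs in detail.

```c
#include <stdbool.h>

#define MAX_CALLBACKS 8

typedef void (*exit_callback) (void);

static exit_callback callback_stack[MAX_CALLBACKS];
static int  callback_count = 0;

/* Register a callback on top of the stack; returns false if it is full. */
static bool
push_callback(exit_callback fn)
{
    if (callback_count >= MAX_CALLBACKS)
        return false;
    callback_stack[callback_count++] = fn;
    return true;
}

/*
 * Cancel a callback, but only if it is the most recently registered one.
 * This mirrors the assumption a cleanup block makes at its end.
 */
static bool
cancel_callback(exit_callback fn)
{
    if (callback_count > 0 && callback_stack[callback_count - 1] == fn)
    {
        callback_count--;
        return true;
    }
    return false;
}

/* Dummy callbacks: one owned by the cleanup block, one by the handler. */
static void block_cleanup(void) {}
static void memstats_cleanup(void) {}
```

If block_cleanup is pushed at the start of a block and memstats_cleanup is pushed by an interrupt handler that runs inside the block, the cancel at the end of the block no longer finds block_cleanup on top and fails.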
I don't really understand why DSA_DEFAULT_INIT_SEGMENT_SIZE is
something that makes sense to use here?
To determine the memory limit per backend in multiples of
DSA_DEFAULT_INIT_SEGMENT_SIZE.
Currently it is set to 1 * DSA_DEFAULT_INIT_SEGMENT_SIZE.
Since a call to dsa_create would create a DSA segment of this size, I
thought it makes sense to define a limit related to the segment size.
+/*
+ /* If the dsm mapping could not be found, attach to the area */
+ if (dsm_seg != NULL)
+ return;
I don't understand what we do here with the dsm? Why do we not need cleanup
if we are already attached to the dsm segment?
I am not expecting to hit this case, since we are always detaching from the
dsa. This could be an assert, but since it is cleanup code, I thought
returning would be a harmless step.
Thank you,
Rahila Syed
Hi,
On 2025-04-07 21:57:57 +0530, Rahila Syed wrote:
diff --git a/src/backend/postmaster/auxprocess.c b/src/backend/postmaster/auxprocess.c
index 4f6795f7265..d3b4df27935 100644
--- a/src/backend/postmaster/auxprocess.c
+++ b/src/backend/postmaster/auxprocess.c
@@ -84,6 +84,13 @@ AuxiliaryProcessMainCommon(void)
/* register a before-shutdown callback for LWLock cleanup */
before_shmem_exit(ShutdownAuxiliaryProcess, 0);
+ /*
+ * The before shmem exit callback frees the DSA memory occupied by the
+ * latest memory context statistics that could be published by this aux
+ * proc if requested.
+ */
+ before_shmem_exit(AtProcExit_memstats_dsa_free, 0);
+
SetProcessingMode(NormalProcessing);
}
How about putting it into BaseInit()? Or maybe just register it when it's
first used?
Problem with registering it when dsa is first used is that dsa is used in an
interrupt handler. The handler could be called from the
PG_ENSURE_ERROR_CLEANUP block. This block operates under the assumption that
the before_shmem_exit callback registered at the beginning, will be the last
one in the registered callback list at the end of the block. However, this
won't be the case if a callback is registered from an interrupt handler
called in the PG_ENSURE_ERROR_CLEANUP block.
Ugh, I really dislike PG_ENSURE_ERROR_CLEANUP().
That's not an argument against moving it to BaseInit() though, as that's
called before procsignal is even initialized and before signals are unmasked.
I don't really understand why DSA_DEFAULT_INIT_SEGMENT_SIZE is
something that makes sense to use here?
To determine the memory limit per backend in multiples of
DSA_DEFAULT_INIT_SEGMENT_SIZE.
Currently it is set to 1 * DSA_DEFAULT_INIT_SEGMENT_SIZE.
Since a call to dsa_create would create a DSA segment of this size, I
thought it makes sense
to define a limit related to the segment size.
I strongly disagree. The limit should be in an understandable unit, not on
another subsystem's defaults that might change at some point.
+ /* If the dsm mapping could not be found, attach to the area */
+ if (dsm_seg != NULL)
+ return;
I don't understand what we do here with the dsm? Why do we not need cleanup
if we are already attached to the dsm segment?
I am not expecting to hit this case, since we are always detaching from the
dsa.
Pretty sure it's reachable, consider a failure of dsa_allocate(). That'll
throw an error, while attached to the segment.
This could be an assert but since it is a cleanup code, I thought returning
would be a harmless step.
The problem is that the code seems wrong - if we are already attached we'll
leak the memory!
As I also mentioned, I don't understand why we're constantly
attaching/detaching from the dsa/dsm either. It just seems to make things more
complicated and more expensive.
Greetings,
Andres Freund
That's not an argument against moving it to BaseInit() though, as that's
called before procsignal is even initialized and before signals are
unmasked.
Yes, OK.
I don't really understand why DSA_DEFAULT_INIT_SEGMENT_SIZE is
something that makes sense to use here?
To determine the memory limit per backend in multiples of
DSA_DEFAULT_INIT_SEGMENT_SIZE.
Currently it is set to 1 * DSA_DEFAULT_INIT_SEGMENT_SIZE.
Since a call to dsa_create would create a DSA segment of this size, I
thought it makes sense to define a limit related to the segment size.
I strongly disagree. The limit should be in an understandable unit, not on
another subsystem's defaults that might change at some point.
OK, makes sense.
+ /* If the dsm mapping could not be found, attach to the area */
+ if (dsm_seg != NULL)
+ return;
I don't understand what we do here with the dsm? Why do we not need cleanup
if we are already attached to the dsm segment?
I am not expecting to hit this case, since we are always detaching from the
dsa.
Pretty sure it's reachable, consider a failure of dsa_allocate(). That'll
throw an error, while attached to the segment.
You are right, I did not think of this scenario.
This could be an assert but since it is a cleanup code, I thought returning
would be a harmless step.
The problem is that the code seems wrong - if we are already attached we'll
leak the memory!
I understand your concern. One issue I recall is that we do not have a
dsa_find_mapping function similar to dsm_find_mapping(). If I understand
correctly, the only way to access an already attached DSA is to store the DSA
area mapping in a global variable. I'm considering using a global variable and
accessing it from the cleanup function in case the area is already mapped.
Does that sound fine?
As I also mentioned, I don't understand why we're constantly
attaching/detaching from the dsa/dsm either. It just seems to make things more
complicated and more expensive.
OK, I see that this could be expensive if a process is periodically being
queried for statistics. However, in scenarios where a process is queried only
once for memory statistics, keeping the area mapped would consume memory
resources, correct?
Thank you,
Rahila Syed
On 7 Apr 2025, at 17:43, Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2025-04-07 15:41:37 +0200, Daniel Gustafsson wrote:
I think this function can be a valuable debugging aid going forward.
What I am most excited about for this is to be able to measure server-wide and
fleet-wide memory usage over time. Today I have actually very little idea
about what memory is being used for across all connections, not to speak of a
larger number of servers.
Thanks for looking, Rahila and I took a collective stab at the review comments.
+ before_shmem_exit(AtProcExit_memstats_dsa_free, 0);
+
SetProcessingMode(NormalProcessing);
}
How about putting it into BaseInit()? Or maybe just register it when it's
first used?
Moved to BaseInit().
+MEM_CTX_PUBLISH "Waiting for a process to publish memory information."
The memory context stuff abbreviates as cxt not ctx. There's a few more cases
of that in the patch.
I never get that right. Fixed.
+ return (context_type);
Why these parens?
Must be a leftover from something, fixed. Sorry about that.
+ * If the publishing backend does not respond before the condition variable
+ * times out, which is set to MEMSTATS_WAIT_TIMEOUT, retry given that there is
+ * time left within the timeout specified by the user, before giving up and
+ * returning previously published statistics, if any. If no previous statistics
+ * exist, return NULL.
Why do we need to repeatedly wake up rather than just sleeping with the
"remaining" amount of time based on the time the function was called and the
time that has passed since?
Fair point, the current coding was a conversion from the previous retry-based
approach but your suggestion is clearly correct. There is still potential for
refactoring but at this point I don't want to change too much all at once.
+ * A valid DSA pointer isn't proof that statistics are available, it can
+ * be valid due to previously published stats.
Somehow "valid DSA pointer" is a bit too much about the precise mechanics and
not enough about what's actually happening. I'd rather say something like
"Even if the proc has published statistics, they may not be due to the current
request, but previously published stats."
Agreed, that's better. Changed.
+ if (!IsUnderPostmaster)
+ {
+ Assert(!found);
I don't really understand why this uses IsUnderPostmaster? Seems like this
should just use found like most (or all) the other *ShmemInit() functions do?
Agreed, Fixed.
+ LWLockInitialize(&memCtxArea->lw_lock, LWLockNewTrancheId());
I think for builtin code we just hardcode the tranches in BuiltinTrancheIds.
Fixed.
It feels a bit silly to duplicate the call to context->methods->stats three
times. We've changed these parameters a bunch in the past, having more callers
to fix makes that more work. Can't the switch just set up the args that are
then passed to one call to context->methods->stats?
I don't disagree, but I prefer to do that as a separate refactoring to not
change too many things all at once.
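For reference, the refactoring suggested above (let the switch only set up arguments, then make one call) might look roughly like the following standalone sketch. All names here (ReportMode, StatsArgs, StatsFunc, report_context) are hypothetical stand-ins and not the signatures used in mcxt.c.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these names come from the patch. */
typedef struct Counters
{
    size_t      nblocks;
    size_t      totalspace;
} Counters;

typedef enum ReportMode
{
    REPORT_TO_LOG,              /* one line per context, sent to the log */
    REPORT_TO_STDERR,           /* one line per context, sent to stderr */
    REPORT_TOTALS_ONLY          /* no per-context output, only totals */
} ReportMode;

typedef struct StatsArgs
{
    bool        print;
    bool        to_stderr;
} StatsArgs;

typedef void (*StatsFunc) (void *context, bool print, bool to_stderr,
                           Counters *totals);

/* The switch only decides the arguments ... */
StatsArgs
stats_args_for_mode(ReportMode mode)
{
    StatsArgs   args = {false, false};

    switch (mode)
    {
        case REPORT_TO_LOG:
            args.print = true;
            break;
        case REPORT_TO_STDERR:
            args.print = true;
            args.to_stderr = true;
            break;
        case REPORT_TOTALS_ONLY:
            break;
    }
    return args;
}

/* ... so that there is a single call site to update if parameters change. */
void
report_context(StatsFunc stats, void *context, ReportMode mode,
               Counters *totals)
{
    StatsArgs   args = stats_args_for_mode(mode);

    stats(context, args.print, args.to_stderr, totals);
}
```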
+
+ /* Compute the number of stats that can fit in the defined limit */
+ max_stats = (MAX_SEGMENTS_PER_BACKEND * DSA_DEFAULT_INIT_SEGMENT_SIZE)
+ / (MAX_MEMORY_CONTEXT_STATS_SIZE);
MAX_SEGMENTS_PER_BACKEND sounds way too generic to me for something defined in
memutils.h. I don't really understand why DSA_DEFAULT_INIT_SEGMENT_SIZE is
something that makes sense to use here?
Renamed, and dependency on DSA_DEFAULT_INIT_SEGMENT_SIZE removed.
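As a rough illustration of a limit that does not depend on DSA internals, the number of entries can be derived from a fixed per-process byte budget. The names and the per-entry size below are assumptions for the sketch; only the 1MB per-process limit is taken from the commit message.

```c
#include <stddef.h>

/* Assumed budget: the commit message states at most 1MB per process. */
#define REPORT_BUDGET_BYTES ((size_t) 1024 * 1024)

/*
 * Hypothetical helper: how many fixed-size entries fit in the budget.
 * Contexts beyond this count would be folded into a cumulative total.
 */
size_t
max_entries_for_budget(size_t budget_bytes, size_t entry_bytes)
{
    if (entry_bytes == 0)
        return 0;               /* avoid division by zero */
    return budget_bytes / entry_bytes;
}
```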
+ /*
+ * Hold the process lock to protect writes to process specific memory. Two
+ * processes publishing statistics do not block each other.
+ */
s/specific/process specific/
That's what it says though.. isn't it? I might be missing something obvious.
+ dsa_free(area, memCtxState[idx].memstats_dsa_pointer);
+ memCtxState[idx].memstats_dsa_pointer = InvalidDsaPointer;
Both callers to free_memorycontextstate_dsa() do these lines immediately after
calling free_memorycontextstate_dsa(), why not do that inside?
Fixed.
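The pattern behind this fix is that the helper which frees the published statistics also resets the slot, so callers cannot forget to do so. A standalone sketch follows, using malloc/free in place of the DSA API and a made-up StatsSlot struct:

```c
#include <stdlib.h>

/* Hypothetical per-process slot; the patch keeps a dsa_pointer instead. */
typedef struct StatsSlot
{
    void       *stats;
    int         total_stats;
} StatsSlot;

/*
 * Free the published statistics and reset the slot in one place. Calling it
 * on an already empty slot is harmless.
 */
void
free_stats_slot(StatsSlot *slot)
{
    if (slot->stats == NULL)
        return;

    free(slot->stats);
    slot->stats = NULL;         /* reset here rather than in every caller */
    slot->total_stats = 0;
}
```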
+ /* Copy statistics to DSA memory */
+ PublishMemoryContext(meminfo, context_id, cur, path, stat, 1, area, 100);
+ }
+ else
+ {
+ /* Examine the context stats */
+ memset(&stat, 0, sizeof(stat));
+ (*cur->methods->stats) (cur, NULL, NULL, &stat, true);
But do we really do it twice in a row? The lines are exactly the same, so it
seems that should just be done before the if?
Fixed.
+signal_memorycontext_reporting(void)
IMO somewhat confusing to release the lock in a function named
signal_memorycontext_reporting(). Why do we do that after
hash_destroy()/dsa_detach()?
The function has been renamed for clarity.
+ /* context id starts with 1 */
+ entry->context_id = ++(*stats_count);
Given that we don't actually do anything here relating to starting with 1, I
find that comment confusing.
Reworded, not sure if it's much better tbh.
+ memctx_info[curr_id].name = dsa_allocate0(area, namelen + 1);
Given the number of references to memctx_info[curr_id] I'd put it in a local variable.
I might be partial, but I sort of prefer this way since it makes the underlying
data structure clear to the reader.
Why is this a dsa_allocate0 given that we're immediately overwriting it?
It doesn't need to be zeroed as it's immediately overwritten. Fixed.
+ memctx_info[curr_id].ident = dsa_allocate0(area, idlen + 1);
+ identptr = (char *) dsa_get_address(area, memctx_info[curr_id].ident);
+ strlcpy(identptr, ident, idlen + 1);
Hm. First I thought we'd leak memory if this second (and subsequent)
dsa_allocate failed. Then I thought we'd be ok, because the memory would be
freed because it'd be reachable from memCtxState[idx].memstats_dsa_pointer.
But I think it wouldn't *quite* work, because memCtxState[idx].total_stats is
only set *after* we would have failed.
Keeping a running total in .total_stats should make the leak window smaller.
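A standalone sketch of the running-total idea, with malloc standing in for dsa_allocate and made-up struct and function names: the count is bumped after each entry is fully set up, so cleanup can always reach everything allocated before a failure.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical structures; the patch stores dsa_pointers in shared memory. */
typedef struct ReportEntry
{
    char       *name;
} ReportEntry;

typedef struct ReportSlot
{
    ReportEntry *entries;
    int         total_stats;    /* number of fully published entries */
} ReportSlot;

/* Returns false on allocation failure; total_stats stays accurate. */
bool
publish_names(ReportSlot *slot, const char **names, int n)
{
    slot->total_stats = 0;
    slot->entries = calloc(n > 0 ? (size_t) n : 1, sizeof(ReportEntry));
    if (slot->entries == NULL)
        return false;

    for (int i = 0; i < n; i++)
    {
        size_t      len = strlen(names[i]) + 1;
        char       *copy = malloc(len);

        if (copy == NULL)
            return false;       /* entries [0, total_stats) stay reachable */
        memcpy(copy, names[i], len);
        slot->entries[i].name = copy;
        slot->total_stats = i + 1;  /* running total, updated per entry */
    }
    return true;
}

/* Frees exactly the entries counted in total_stats, then the array. */
void
release_slot(ReportSlot *slot)
{
    for (int i = 0; i < slot->total_stats; i++)
        free(slot->entries[i].name);
    free(slot->entries);
    slot->entries = NULL;
    slot->total_stats = 0;
}
```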
+ memctx_info[curr_id].type = ContextTypeToString(context->type);
I don't think this works across platforms. On windows / EXEC_BACKEND builds
the location of string constants can differ across backends. And: Why do we
need the string here? You can just call ContextTypeToString when reading?
Correct, we can just store the type and call ContextTypeToString when
generating the tuple. Fixed.
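The point about string constants can be sketched as follows: shared memory holds only the enum value, and each reader maps it to a string in its own address space. The enum and struct names are hypothetical; the mapping mirrors what ContextTypeToString does in the patch.

```c
#include <string.h>

/* Hypothetical enum standing in for the NodeTag values of context types. */
typedef enum ContextKind
{
    KIND_ALLOCSET,
    KIND_GENERATION,
    KIND_SLAB,
    KIND_BUMP
} ContextKind;

/* What goes into shared memory: the value, never a pointer to a literal. */
typedef struct SharedEntry
{
    ContextKind kind;
} SharedEntry;

/* Called by the reading process, so the literal is valid in that process. */
const char *
context_kind_to_string(ContextKind kind)
{
    switch (kind)
    {
        case KIND_ALLOCSET:
            return "AllocSet";
        case KIND_GENERATION:
            return "Generation";
        case KIND_SLAB:
            return "Slab";
        case KIND_BUMP:
            return "Bump";
    }
    return "???";
}
```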
+/*
+ * Free the memory context statistics stored by this process
+ * in DSA area.
+ */
+void
+AtProcExit_memstats_dsa_free(int code, Datum arg)
+{
FWIW, to me the fact that it does a dsa_free() is an implementation
detail. It's also not the only thing this does.
Renamed.
And, I don't think AtProcExit* really is accurate, given that it runs *before*
shmem is cleaned up?
I wonder if the best approach here wouldn't be to forgo the use of a
before_shmem_exit() callback, but instead use on_dsm_detach(). That would
require we'd not constantly detach from the dsm segment, but I don't
understand why we do that in the first place?
The attach/detach has been removed.
+ /* If the dsm mapping could not be found, attach to the area */
+ if (dsm_seg != NULL)
+ return;
I don't understand what we do here with the dsm? Why do we not need cleanup
if we are already attached to the dsm segment?
Fixed.
+} MemoryContextState;
IMO that's too generic a name for something in a header.
+} MemoryContextId;
This too. Particularly because MemoryContextData->ident exists but is
something different.
Renamed both to use MemoryContextReporting* namespace, which leaves
MemoryContextReportingBackendState at an unwieldy long name. I'm running out
of ideas on how to improve it, and it does make the purpose quite explicit at least.
+ from pg_get_process_memory_contexts(launcher_pid, false, 20)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+ select type, name, ident
+ from pg_get_process_memory_contexts(pg_backend_pid(), false, 20)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+END $$;
I'd also test an aux process. I think the AV launcher isn't one, because it
actually does "table" access of shared relations.
Fixed, switched from the AV launcher.
--
Daniel Gustafsson
Attachments:
v27-0001-Add-function-to-get-memory-context-stats-for-pro.patch (application/octet-stream)
From e8ed511ed0c6f64d3de115252db386973e5b843a Mon Sep 17 00:00:00 2001
From: Daniel Gustafsson <dgustafsson@postgresql.org>
Date: Tue, 8 Apr 2025 01:13:00 +0200
Subject: [PATCH v27] Add function to get memory context stats for processes
This adds a function for retrieving memory context statistics
and information from backends as well as auxiliary processes.
When calling the function it sends a signal to the specified
process to submit statistics regarding its memory contexts
into dynamic shared memory. Each memory context is returned
in detail, followed by a cumulative total in case the number
of contexts exceeds the max allocated amount of shared memory.
Each process is limited to use at most 1MB memory for this.
A summary can also be explicitly requested by the user; this
will return the TopMemoryContext and a cumulative total of
all lower contexts.
In order to not block on busy processes the caller specifies
the number of seconds during which to retry before timing out.
In the case where no statistics are published within the set
timeout, the last known statistics are returned, or NULL if
no previously published statistics exist. This allows
dashboard type usages to continually publish data even if the
target process is temporarily congested. Context records
contain a timestamp to indicate when they were submitted.
Author: Rahila Syed <rahilasyed90@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Reviewed-by: Atsushi Torikoshi <torikoshia@oss.nttdata.com>
Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Discussion: https://postgr.es/m/CAH2L28v8mc9HDt8QoSJ8TRmKau_8FM_HKS41NeO9-6ZAkuZKXw@mail.gmail.com
---
doc/src/sgml/func.sgml | 171 +++++
src/backend/catalog/system_views.sql | 5 +
src/backend/postmaster/autovacuum.c | 4 +
src/backend/postmaster/checkpointer.c | 4 +
src/backend/postmaster/interrupt.c | 4 +
src/backend/postmaster/pgarch.c | 4 +
src/backend/postmaster/startup.c | 4 +
src/backend/postmaster/walsummarizer.c | 4 +
src/backend/storage/ipc/ipci.c | 3 +
src/backend/storage/ipc/procsignal.c | 3 +
src/backend/storage/lmgr/lwlock.c | 2 +
src/backend/storage/lmgr/proc.c | 1 +
src/backend/tcop/postgres.c | 3 +
.../utils/activity/wait_event_names.txt | 1 +
src/backend/utils/adt/mcxtfuncs.c | 426 +++++++++++-
src/backend/utils/init/globals.c | 1 +
src/backend/utils/init/postinit.c | 7 +
src/backend/utils/mmgr/mcxt.c | 635 +++++++++++++++++-
src/include/catalog/pg_proc.dat | 10 +
src/include/miscadmin.h | 1 +
src/include/storage/lwlock.h | 2 +
src/include/storage/procsignal.h | 1 +
src/include/utils/memutils.h | 82 +++
src/test/regress/expected/sysviews.out | 19 +
src/test/regress/sql/sysviews.sql | 18 +
src/tools/pgindent/typedefs.list | 4 +
26 files changed, 1374 insertions(+), 45 deletions(-)
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 0224f93733d..347f45a417d 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -28650,6 +28650,144 @@ acl | {postgres=arwdDxtm/postgres,foo=r/postgres}
</para></entry>
</row>
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>pg_get_process_memory_contexts</primary>
+ </indexterm>
+ <function>pg_get_process_memory_contexts</function> ( <parameter>pid</parameter> <type>integer</type>, <parameter>summary</parameter> <type>boolean</type>, <parameter>timeout</parameter> <type>float</type> )
+ <returnvalue>setof record</returnvalue>
+ ( <parameter>name</parameter> <type>text</type>,
+ <parameter>ident</parameter> <type>text</type>,
+ <parameter>type</parameter> <type>text</type>,
+ <parameter>path</parameter> <type>integer[]</type>,
+ <parameter>level</parameter> <type>integer</type>,
+ <parameter>total_bytes</parameter> <type>bigint</type>,
+ <parameter>total_nblocks</parameter> <type>bigint</type>,
+ <parameter>free_bytes</parameter> <type>bigint</type>,
+ <parameter>free_chunks</parameter> <type>bigint</type>,
+ <parameter>used_bytes</parameter> <type>bigint</type>,
+ <parameter>num_agg_contexts</parameter> <type>integer</type>,
+ <parameter>stats_timestamp</parameter> <type>timestamptz</type> )
+ </para>
+ <para>
+ This function handles requests to display the memory contexts of a
+ <productname>PostgreSQL</productname> process with the specified
+ process ID. The function can be used to send requests to backends as
+ well as <glossterm linkend="glossary-auxiliary-proc">auxiliary processes</glossterm>.
+ </para>
+ <para>
+ The returned record contains extended statistics for each memory
+ context:
+ <itemizedlist spacing="compact">
+ <listitem>
+ <para>
+ <parameter>name</parameter> - The name of the memory context.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>ident</parameter> - Memory context ID (if any).
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>type</parameter> - The type of memory context, possible
+ values are: AllocSet, Generation, Slab and Bump.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>path</parameter> - Memory contexts are organized in a
+ tree model with TopMemoryContext as the root, and all other memory
+ contexts as nodes in the tree. The <parameter>path</parameter>
+ displays the path from the root to the current memory context. The
+ path is limited to 100 children per node, with each node limited
+ to a max depth of 100, to preserve memory during reporting. The
+ printed path will also be limited to 100 nodes counting from the
+ TopMemoryContext.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>level</parameter> - The level in the tree of the current
+ memory context.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>total_bytes</parameter> - The total number of bytes
+ allocated to this memory context.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>total_nblocks</parameter> - The total number of blocks
+ used for the allocated memory.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>free_bytes</parameter> - The amount of free memory in
+ this memory context.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>free_chunks</parameter> - The number of chunks that
+ <parameter>free_bytes</parameter> corresponds to.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>used_bytes</parameter> - The total number of bytes
+ currently occupied.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>num_agg_contexts</parameter> - The number of memory
+ contexts aggregated in the displayed statistics.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>stats_timestamp</parameter> - When the statistics were
+ extracted from the process.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+ <para>
+ When <parameter>summary</parameter> is <literal>true</literal>, statistics
+ for memory contexts at levels 1 and 2 are displayed, with level 1
+ representing the root node (i.e., <literal>TopMemoryContext</literal>).
+ Statistics for contexts on level 2 and below are aggregates of all
+ child contexts' statistics, where <literal>num_agg_contexts</literal>
+ indicates the number of aggregated child contexts. When
+ <parameter>summary</parameter> is <literal>false</literal>,
+ the <literal>num_agg_contexts</literal> value is <literal>1</literal>,
+ indicating that individual statistics are being displayed. The levels
+ are limited to the first 100 contexts.
+ </para>
+ <para>
+ Busy processes can delay reporting memory context statistics;
+ <parameter>timeout</parameter> specifies the number of seconds
+ to wait for updated statistics. <parameter>timeout</parameter> can be
+ specified in fractions of a second.
+ </para>
+ <para>
+ After receiving memory context statistics from the target process, it
+ returns the results as one row per context. If all the contexts don't
+ fit within the pre-determined size limit, the remaining context
+ statistics are aggregated and a cumulative total is displayed. The
+ <literal>num_agg_contexts</literal> column indicates the number of
+ contexts aggregated in the displayed statistics. When
+ <literal>num_agg_contexts</literal> is <literal>1</literal> it means
+ that the context statistics are displayed separately.
+ </para></entry>
+ </row>
+
<row>
<entry role="func_table_entry"><para role="func_signature">
<indexterm>
@@ -28789,6 +28927,39 @@ LOG: Grand total: 1651920 bytes in 201 blocks; 622360 free (88 chunks); 1029560
because it may generate a large number of log messages.
</para>
+ <para>
+ <function>pg_get_process_memory_contexts</function> can be used
+ to request the memory context statistics of any PostgreSQL process. For example:
+<programlisting>
+postgres=# SELECT * FROM pg_get_process_memory_contexts(
+ (SELECT pid FROM pg_stat_activity
+ WHERE backend_type = 'checkpointer'),
+ false, 0.5) LIMIT 1;
+-[ RECORD 1 ]----+------------------------------
+name | TopMemoryContext
+ident |
+type | AllocSet
+path | {1}
+level | 1
+total_bytes | 90304
+total_nblocks | 3
+free_bytes | 2880
+free_chunks | 1
+used_bytes | 87424
+num_agg_contexts | 1
+stats_timestamp | 2025-03-24 13:55:47.796698+01
+</programlisting>
+ <note>
+ <para>
+ While <function>pg_get_process_memory_contexts</function> can be used to
+ query memory contexts of the local backend,
+ <structname>pg_backend_memory_contexts</structname>
+ (see <xref linkend="view-pg-backend-memory-contexts"/> for more details)
+ will be less resource intensive when only the local backend is of interest.
+ </para>
+ </note>
+ </para>
+
</sect2>
<sect2 id="functions-admin-backup">
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 273008db37f..1166e99a000 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -666,6 +666,11 @@ GRANT SELECT ON pg_backend_memory_contexts TO pg_read_all_stats;
REVOKE EXECUTE ON FUNCTION pg_get_backend_memory_contexts() FROM PUBLIC;
GRANT EXECUTE ON FUNCTION pg_get_backend_memory_contexts() TO pg_read_all_stats;
+REVOKE EXECUTE ON FUNCTION
+ pg_get_process_memory_contexts(integer, boolean, float) FROM PUBLIC;
+GRANT EXECUTE ON FUNCTION
+ pg_get_process_memory_contexts(integer, boolean, float) TO pg_read_all_stats;
+
-- Statistics views
CREATE VIEW pg_stat_all_tables AS
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 2513a8ef8a6..16756152b71 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -781,6 +781,10 @@ ProcessAutoVacLauncherInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
/* Process sinval catchup interrupts that happened while sleeping */
ProcessCatchupInterrupt();
}
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index fda91ffd1ce..d3cb3f1891c 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -663,6 +663,10 @@ ProcessCheckpointerInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/postmaster/interrupt.c b/src/backend/postmaster/interrupt.c
index 0ae9bf906ec..f24f574e748 100644
--- a/src/backend/postmaster/interrupt.c
+++ b/src/backend/postmaster/interrupt.c
@@ -48,6 +48,10 @@ ProcessMainLoopInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 7e622ae4bd2..cb7408acf4c 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -867,6 +867,10 @@ ProcessPgArchInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
if (ConfigReloadPending)
{
char *archiveLib = pstrdup(XLogArchiveLibrary);
diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c
index 27e86cf393f..7149a67fcbc 100644
--- a/src/backend/postmaster/startup.c
+++ b/src/backend/postmaster/startup.c
@@ -192,6 +192,10 @@ ProcessStartupProcInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
diff --git a/src/backend/postmaster/walsummarizer.c b/src/backend/postmaster/walsummarizer.c
index 0fec4f1f871..c7a76711cc5 100644
--- a/src/backend/postmaster/walsummarizer.c
+++ b/src/backend/postmaster/walsummarizer.c
@@ -879,6 +879,10 @@ ProcessWalSummarizerInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 2fa045e6b0f..00c76d05356 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -51,6 +51,7 @@
#include "storage/sinvaladt.h"
#include "utils/guc.h"
#include "utils/injection_point.h"
+#include "utils/memutils.h"
/* GUCs */
int shared_memory_type = DEFAULT_SHARED_MEMORY_TYPE;
@@ -150,6 +151,7 @@ CalculateShmemSize(int *num_semaphores)
size = add_size(size, InjectionPointShmemSize());
size = add_size(size, SlotSyncShmemSize());
size = add_size(size, AioShmemSize());
+ size = add_size(size, MemoryContextReportingShmemSize());
/* include additional requested shmem from preload libraries */
size = add_size(size, total_addin_request);
@@ -343,6 +345,7 @@ CreateOrAttachShmemStructs(void)
WaitEventCustomShmemInit();
InjectionPointShmemInit();
AioShmemInit();
+ MemoryContextReportingShmemInit();
}
/*
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index b7c39a4c5f0..a3c2cd12277 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -690,6 +690,9 @@ procsignal_sigusr1_handler(SIGNAL_ARGS)
if (CheckProcSignal(PROCSIG_LOG_MEMORY_CONTEXT))
HandleLogMemoryContextInterrupt();
+ if (CheckProcSignal(PROCSIG_GET_MEMORY_CONTEXT))
+ HandleGetMemoryContextInterrupt();
+
if (CheckProcSignal(PROCSIG_PARALLEL_APPLY_MESSAGE))
HandleParallelApplyMessageInterrupt();
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 3df29658f18..dc4d96c16af 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -178,6 +178,8 @@ static const char *const BuiltinTrancheNames[] = {
[LWTRANCHE_XACT_SLRU] = "XactSLRU",
[LWTRANCHE_PARALLEL_VACUUM_DSA] = "ParallelVacuumDSA",
[LWTRANCHE_AIO_URING_COMPLETION] = "AioUringCompletion",
+ [LWTRANCHE_MEMORY_CONTEXT_REPORTING_STATE] = "MemoryContextReportingState",
+ [LWTRANCHE_MEMORY_CONTEXT_REPORTING_PROC] = "MemoryContextReportingPerProcess",
};
StaticAssertDecl(lengthof(BuiltinTrancheNames) ==
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index e9ef0fbfe32..f194e6b3dcc 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -50,6 +50,7 @@
#include "storage/procsignal.h"
#include "storage/spin.h"
#include "storage/standby.h"
+#include "utils/memutils.h"
#include "utils/timeout.h"
#include "utils/timestamp.h"
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 6ae9f38f0c8..dc4c600922d 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3535,6 +3535,9 @@ ProcessInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
if (ParallelApplyMessagePending)
ProcessParallelApplyMessages();
}
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 8bce14c38fd..23eaf559c8d 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -161,6 +161,7 @@ WAL_RECEIVER_EXIT "Waiting for the WAL receiver to exit."
WAL_RECEIVER_WAIT_START "Waiting for startup process to send initial data for streaming replication."
WAL_SUMMARY_READY "Waiting for a new WAL summary to be generated."
XACT_GROUP_UPDATE "Waiting for the group leader to update transaction status at transaction end."
+MEM_CXT_PUBLISH "Waiting for a process to publish memory information."
ABI_compatibility:
diff --git a/src/backend/utils/adt/mcxtfuncs.c b/src/backend/utils/adt/mcxtfuncs.c
index 396c2f223b4..5036aa2d9f7 100644
--- a/src/backend/utils/adt/mcxtfuncs.c
+++ b/src/backend/utils/adt/mcxtfuncs.c
@@ -17,28 +17,25 @@
#include "funcapi.h"
#include "mb/pg_wchar.h"
+#include "miscadmin.h"
+#include "access/twophase.h"
+#include "catalog/pg_authid_d.h"
#include "storage/proc.h"
#include "storage/procarray.h"
+#include "utils/acl.h"
#include "utils/array.h"
#include "utils/builtins.h"
#include "utils/hsearch.h"
+#include "utils/memutils.h"
+#include "utils/wait_event_types.h"
/* ----------
* The max bytes for showing identifiers of MemoryContext.
* ----------
*/
#define MEMORY_CONTEXT_IDENT_DISPLAY_SIZE 1024
-
-/*
- * MemoryContextId
- * Used for storage of transient identifiers for
- * pg_get_backend_memory_contexts.
- */
-typedef struct MemoryContextId
-{
- MemoryContext context;
- int context_id;
-} MemoryContextId;
+struct MemoryContextReportingBackendState *memCxtState = NULL;
+struct MemoryContextReportingSharedState *memCxtArea = NULL;
/*
* int_list_to_array
@@ -89,7 +86,7 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
*/
for (MemoryContext cur = context; cur != NULL; cur = cur->parent)
{
- MemoryContextId *entry;
+ MemoryContextReportingId *entry;
bool found;
entry = hash_search(context_id_lookup, &cur, HASH_FIND, &found);
@@ -143,24 +140,7 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
else
nulls[1] = true;
- switch (context->type)
- {
- case T_AllocSetContext:
- type = "AllocSet";
- break;
- case T_GenerationContext:
- type = "Generation";
- break;
- case T_SlabContext:
- type = "Slab";
- break;
- case T_BumpContext:
- type = "Bump";
- break;
- default:
- type = "???";
- break;
- }
+ type = ContextTypeToString(context->type);
values[2] = CStringGetTextDatum(type);
values[3] = Int32GetDatum(list_length(path)); /* level */
@@ -175,6 +155,38 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
list_free(path);
}
+/*
+ * ContextTypeToString
+ * Returns a textual representation of a context type
+ *
+ * This should cover the same types as MemoryContextIsValid.
+ */
+const char *
+ContextTypeToString(NodeTag type)
+{
+ const char *context_type;
+
+ switch (type)
+ {
+ case T_AllocSetContext:
+ context_type = "AllocSet";
+ break;
+ case T_GenerationContext:
+ context_type = "Generation";
+ break;
+ case T_SlabContext:
+ context_type = "Slab";
+ break;
+ case T_BumpContext:
+ context_type = "Bump";
+ break;
+ default:
+ context_type = "???";
+ break;
+ }
+ return context_type;
+}
+
/*
* pg_get_backend_memory_contexts
* SQL SRF showing backend memory context.
@@ -189,7 +201,7 @@ pg_get_backend_memory_contexts(PG_FUNCTION_ARGS)
HTAB *context_id_lookup;
ctl.keysize = sizeof(MemoryContext);
- ctl.entrysize = sizeof(MemoryContextId);
+ ctl.entrysize = sizeof(MemoryContextReportingId);
ctl.hcxt = CurrentMemoryContext;
context_id_lookup = hash_create("pg_get_backend_memory_contexts",
@@ -216,7 +228,7 @@ pg_get_backend_memory_contexts(PG_FUNCTION_ARGS)
foreach_ptr(MemoryContextData, cur, contexts)
{
- MemoryContextId *entry;
+ MemoryContextReportingId *entry;
bool found;
/*
@@ -224,8 +236,8 @@ pg_get_backend_memory_contexts(PG_FUNCTION_ARGS)
* PutMemoryContextsStatsTupleStore needs this to populate the "path"
* column with the parent context_ids.
*/
- entry = (MemoryContextId *) hash_search(context_id_lookup, &cur,
- HASH_ENTER, &found);
+ entry = (MemoryContextReportingId *) hash_search(context_id_lookup, &cur,
+ HASH_ENTER, &found);
entry->context_id = context_id++;
Assert(!found);
@@ -305,3 +317,349 @@ pg_log_backend_memory_contexts(PG_FUNCTION_ARGS)
PG_RETURN_BOOL(true);
}
+
+/*
+ * pg_get_process_memory_contexts
+ * Signal a backend or an auxiliary process to send its memory contexts,
+ * wait for the results and display them.
+ *
+ * By default, only superusers or users with PG_READ_ALL_STATS are allowed to
+ * signal a process to return the memory contexts. This is because allowing
+ * any users to issue this request at an unbounded rate would cause lots of
+ * requests to be sent, which can lead to denial of service. Additional roles
+ * can be permitted with GRANT.
+ *
+ * On receipt of this signal, a backend or an auxiliary process sets the flag
+ * in the signal handler, which causes the next CHECK_FOR_INTERRUPTS()
+ * or process-specific interrupt handler to copy the memory context details
+ * to a dynamic shared memory space.
+ *
+ * We have defined a limit on DSA memory that could be allocated per process -
+ * if the process has more memory contexts than what can fit in the allocated
+ * size, the excess contexts are summarized and represented as cumulative total
+ * at the end of the buffer.
+ *
+ * After sending the signal, wait on a condition variable. The publishing
+ * backend, after copying the data to shared memory, sends signal on that
+ * condition variable. There is one condition variable per publishing backend.
+ * Once the condition variable is signalled, check if the latest memory context
+ * information is available and display.
+ *
+ * If the publishing backend does not respond before the condition variable
+ * times out, which is set to MEMSTATS_WAIT_TIMEOUT, retry given that there is
+ * time left within the timeout specified by the user, before giving up and
+ * returning previously published statistics, if any. If no previous statistics
+ * exist, return NULL.
+ */
+#define MEMSTATS_WAIT_TIMEOUT 100
+Datum
+pg_get_process_memory_contexts(PG_FUNCTION_ARGS)
+{
+ int pid = PG_GETARG_INT32(0);
+ bool summary = PG_GETARG_BOOL(1);
+ double timeout = PG_GETARG_FLOAT8(2);
+ PGPROC *proc;
+ ProcNumber procNumber = INVALID_PROC_NUMBER;
+ bool proc_is_aux = false;
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ MemoryContextReportingStatsEntry *memcxt_info;
+ TimestampTz start_timestamp;
+
+ /*
+ * See if the process with given pid is a backend or an auxiliary process
+ * and remember the type for when we requery the process later.
+ */
+ proc = BackendPidGetProc(pid);
+ if (proc == NULL)
+ {
+ proc = AuxiliaryPidGetProc(pid);
+ proc_is_aux = true;
+ }
+
+ /*
+ * BackendPidGetProc() and AuxiliaryPidGetProc() return NULL if the pid
+ * isn't valid; this is however not an error, so we leave with a WARNING.
+ * See comment in pg_log_backend_memory_contexts for a discussion on this.
+ */
+ if (proc == NULL)
+ {
+ /*
+ * This is just a warning so a loop-through-resultset will not abort
+ * if one backend terminated on its own during the run.
+ */
+ ereport(WARNING,
+ (errmsg("PID %d is not a PostgreSQL server process", pid)));
+ PG_RETURN_NULL();
+ }
+
+ InitMaterializedSRF(fcinfo, 0);
+
+ procNumber = GetNumberFromPGProc(proc);
+
+ LWLockAcquire(&memCxtState[procNumber].lw_lock, LW_EXCLUSIVE);
+ memCxtState[procNumber].summary = summary;
+ LWLockRelease(&memCxtState[procNumber].lw_lock);
+
+ start_timestamp = GetCurrentTimestamp();
+
+ /*
+ * Send a signal to a PostgreSQL process, informing it we want it to
+ * produce information about its memory contexts.
+ */
+ if (SendProcSignal(pid, PROCSIG_GET_MEMORY_CONTEXT, procNumber) < 0)
+ {
+ ereport(WARNING,
+ errmsg("could not send signal to process %d: %m", pid));
+ PG_RETURN_NULL();
+ }
+
+ /*
+ * Even if the proc has published statistics, they may not be due to the
+ * current request, but previously published stats. Check if the stats
+ * are updated by comparing the timestamp, if the stats are newer than our
+ * previously recorded timestamp from before sending the procsignal, they
+ * must by definition be updated. Wait for the timeout specified by the
+ * user, following which display old statistics if available or return
+ * NULL.
+ */
+ while (1)
+ {
+ long msecs;
+
+ /*
+ * We expect to come out of sleep when the requested process has
+ * finished publishing the statistics, verified using the valid DSA
+ * pointer.
+ *
+ * Make sure that the information belongs to pid we requested
+ * information for, otherwise loop back and wait for the server
+ * process to finish publishing statistics.
+ */
+ LWLockAcquire(&memCxtState[procNumber].lw_lock, LW_EXCLUSIVE);
+
+ /*
+ * A note in procnumber.h says that a procNumber can be re-used for
+ * a different backend immediately after a backend exits. In case an
+ * old process' data was there and not updated by the current process
+ * in the slot identified by the procNumber, the pid of the requested
+ * process and the proc_id might not match.
+ */
+ if (memCxtState[procNumber].proc_id == pid)
+ {
+ /*
+ * Break if the latest stats have been read, indicated by
+ * statistics timestamp being newer than the current request
+ * timestamp.
+ */
+ msecs = TimestampDifferenceMilliseconds(start_timestamp,
+ memCxtState[procNumber].stats_timestamp);
+
+ if (DsaPointerIsValid(memCxtState[procNumber].memstats_dsa_pointer)
+ && msecs > 0)
+ break;
+ }
+ LWLockRelease(&memCxtState[procNumber].lw_lock);
+
+ /*
+ * Recheck the state of the backend before sleeping on the condition
+ * variable to ensure the process is still alive. Only check the
+ * relevant process type based on the earlier PID check.
+ */
+ if (proc_is_aux)
+ proc = AuxiliaryPidGetProc(pid);
+ else
+ proc = BackendPidGetProc(pid);
+
+ /*
+ * The process ending during memory context processing is not an
+ * error.
+ */
+ if (proc == NULL)
+ {
+ ereport(WARNING,
+ errmsg("PID %d is no longer a PostgreSQL server process",
+ pid));
+ PG_RETURN_NULL();
+ }
+
+ msecs = TimestampDifferenceMilliseconds(start_timestamp, GetCurrentTimestamp());
+
+ /*
+ * If we haven't already exceeded the timeout value, sleep for the
+ * remainder of the timeout on the condition variable.
+ */
+ if (msecs > 0 && msecs < (timeout * 1000))
+ {
+ /*
+ * Wait for the timeout as defined by the user. If no updated
+ * statistics are available within the allowed time then display
+ * previously published statistics if there are any. If no
+ * previous statistics are available then return NULL. The timer
+ * is defined in milliseconds since that's what the condition
+ * variable sleep uses.
+ */
+ if (ConditionVariableTimedSleep(&memCxtState[procNumber].memcxt_cv,
+ ((timeout * 1000) - msecs), WAIT_EVENT_MEM_CXT_PUBLISH))
+ {
+ LWLockAcquire(&memCxtState[procNumber].lw_lock, LW_EXCLUSIVE);
+ /* Displaying previously published statistics if available */
+ if (DsaPointerIsValid(memCxtState[procNumber].memstats_dsa_pointer))
+ break;
+ else
+ {
+ LWLockRelease(&memCxtState[procNumber].lw_lock);
+ PG_RETURN_NULL();
+ }
+ }
+ }
+ else
+ {
+ LWLockAcquire(&memCxtState[procNumber].lw_lock, LW_EXCLUSIVE);
+ /* Displaying previously published statistics if available */
+ if (DsaPointerIsValid(memCxtState[procNumber].memstats_dsa_pointer))
+ break;
+ else
+ {
+ LWLockRelease(&memCxtState[procNumber].lw_lock);
+ PG_RETURN_NULL();
+ }
+ }
+ }
+
+ /*
+ * We should only reach here with a valid DSA handle, either containing
+ * updated statistics or previously published statistics (identified by
+ * the timestamp).
+ */
+ Assert(memCxtArea->memstats_dsa_handle != DSA_HANDLE_INVALID);
+ /* Attach to the dsa area if we have not already done so */
+ if (area == NULL)
+ {
+ MemoryContext oldcontext = CurrentMemoryContext;
+
+ MemoryContextSwitchTo(TopMemoryContext);
+ area = dsa_attach(memCxtArea->memstats_dsa_handle);
+ MemoryContextSwitchTo(oldcontext);
+ dsa_pin_mapping(area);
+ }
+
+ /*
+ * Backend has finished publishing the stats, project them.
+ */
+ memcxt_info = (MemoryContextReportingStatsEntry *)
+ dsa_get_address(area, memCxtState[procNumber].memstats_dsa_pointer);
+
+#define PG_GET_PROCESS_MEMORY_CONTEXTS_COLS 12
+ for (int i = 0; i < memCxtState[procNumber].total_stats; i++)
+ {
+ ArrayType *path_array;
+ int path_length;
+ Datum values[PG_GET_PROCESS_MEMORY_CONTEXTS_COLS];
+ bool nulls[PG_GET_PROCESS_MEMORY_CONTEXTS_COLS];
+ char *name;
+ char *ident;
+ Datum *path_datum = NULL;
+ int *path_int = NULL;
+
+ memset(values, 0, sizeof(values));
+ memset(nulls, 0, sizeof(nulls));
+
+ if (DsaPointerIsValid(memcxt_info[i].name))
+ {
+ name = (char *) dsa_get_address(area, memcxt_info[i].name);
+ values[0] = CStringGetTextDatum(name);
+ }
+ else
+ nulls[0] = true;
+
+ if (DsaPointerIsValid(memcxt_info[i].ident))
+ {
+ ident = (char *) dsa_get_address(area, memcxt_info[i].ident);
+ values[1] = CStringGetTextDatum(ident);
+ }
+ else
+ nulls[1] = true;
+
+ values[2] = CStringGetTextDatum(ContextTypeToString(memcxt_info[i].type));
+
+ path_length = memcxt_info[i].path_length;
+ path_datum = (Datum *) palloc(path_length * sizeof(Datum));
+ if (DsaPointerIsValid(memcxt_info[i].path))
+ {
+ path_int = (int *) dsa_get_address(area, memcxt_info[i].path);
+ for (int j = 0; j < path_length; j++)
+ path_datum[j] = Int32GetDatum(path_int[j]);
+ path_array = construct_array_builtin(path_datum, path_length, INT4OID);
+ values[3] = PointerGetDatum(path_array);
+ }
+ else
+ nulls[3] = true;
+
+ values[4] = Int32GetDatum(memcxt_info[i].levels);
+ values[5] = Int64GetDatum(memcxt_info[i].totalspace);
+ values[6] = Int64GetDatum(memcxt_info[i].nblocks);
+ values[7] = Int64GetDatum(memcxt_info[i].freespace);
+ values[8] = Int64GetDatum(memcxt_info[i].freechunks);
+ values[9] = Int64GetDatum(memcxt_info[i].totalspace -
+ memcxt_info[i].freespace);
+ values[10] = Int32GetDatum(memcxt_info[i].num_agg_stats);
+ values[11] = TimestampTzGetDatum(memCxtState[procNumber].stats_timestamp);
+
+ tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc,
+ values, nulls);
+ }
+ LWLockRelease(&memCxtState[procNumber].lw_lock);
+
+ ConditionVariableCancelSleep();
+
+ PG_RETURN_NULL();
+}
+
+Size
+MemoryContextReportingShmemSize(void)
+{
+ Size sz = 0;
+ Size TotalProcs = 0;
+
+ TotalProcs = add_size(TotalProcs, NUM_AUXILIARY_PROCS);
+ TotalProcs = add_size(TotalProcs, MaxBackends);
+ sz = add_size(sz, mul_size(TotalProcs, sizeof(MemoryContextReportingBackendState)));
+
+ sz = add_size(sz, sizeof(MemoryContextReportingSharedState));
+
+ return sz;
+}
+
+/*
+ * Initialize shared memory for displaying memory context statistics
+ */
+void
+MemoryContextReportingShmemInit(void)
+{
+ bool found;
+
+ memCxtArea = (MemoryContextReportingSharedState *)
+ ShmemInitStruct("MemoryContextReportingSharedState",
+ sizeof(MemoryContextReportingSharedState), &found);
+
+ if (found)
+ return;
+
+ LWLockInitialize(&memCxtArea->lw_lock, LWTRANCHE_MEMORY_CONTEXT_REPORTING_STATE);
+ memCxtArea->memstats_dsa_handle = DSA_HANDLE_INVALID;
+
+ memCxtState = (MemoryContextReportingBackendState *)
+ ShmemInitStruct("MemoryContextReportingBackendState",
+ ((MaxBackends + NUM_AUXILIARY_PROCS) * sizeof(MemoryContextReportingBackendState)),
+ &found);
+
+ if (found)
+ return;
+
+ for (int i = 0; i < (MaxBackends + NUM_AUXILIARY_PROCS); i++)
+ {
+ ConditionVariableInit(&memCxtState[i].memcxt_cv);
+ LWLockInitialize(&memCxtState[i].lw_lock, LWTRANCHE_MEMORY_CONTEXT_REPORTING_PROC);
+ memCxtState[i].memstats_dsa_pointer = InvalidDsaPointer;
+ }
+}
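The timeout handling in the wait loop of pg_get_process_memory_contexts() above can be sketched outside PostgreSQL as a small pure function. This is only an illustration: sketch_remaining_sleep_ms is a hypothetical name, and the timeout is assumed to be given in seconds, as the multiplication by 1000 in the patch suggests. A return value of 0 stands for the branch that stops waiting and falls back to previously published statistics (or returns NULL if there are none).

```c
#include <assert.h>

/*
 * Hypothetical helper, not part of the patch.  Given the milliseconds
 * elapsed since the request was sent and the user timeout in seconds
 * (an assumption), return how long to sleep on the condition variable.
 * Returns 0 when the caller should stop waiting, mirroring the patch's
 * "msecs > 0 && msecs < timeout * 1000" test.
 */
long
sketch_remaining_sleep_ms(long elapsed_ms, double timeout_s)
{
	long		timeout_ms;

	assert(timeout_s >= 0);
	timeout_ms = (long) (timeout_s * 1000);

	if (elapsed_ms > 0 && elapsed_ms < timeout_ms)
		return timeout_ms - elapsed_ms;

	return 0;
}
```

For instance, 200 ms into a 1 second timeout the caller would sleep for the remaining 800 ms; once the elapsed time reaches the timeout, no further sleep is attempted.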
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index 2152aad97d9..92304a1f124 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -39,6 +39,7 @@ volatile sig_atomic_t TransactionTimeoutPending = false;
volatile sig_atomic_t IdleSessionTimeoutPending = false;
volatile sig_atomic_t ProcSignalBarrierPending = false;
volatile sig_atomic_t LogMemoryContextPending = false;
+volatile sig_atomic_t PublishMemoryContextPending = false;
volatile sig_atomic_t IdleStatsUpdateTimeoutPending = false;
volatile uint32 InterruptHoldoffCount = 0;
volatile uint32 QueryCancelHoldoffCount = 0;
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index c09c4d404ba..01309ef3f86 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -667,6 +667,13 @@ BaseInit(void)
* drop ephemeral slots, which in turn triggers stats reporting.
*/
ReplicationSlotInitialize();
+
+ /*
+ * The before shmem exit callback frees the DSA memory occupied by the
+ * latest memory context statistics that could be published by this proc
+ * if requested.
+ */
+ before_shmem_exit(AtProcExit_memstats_cleanup, 0);
}
diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index d98ae9db6be..7555a166f52 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -23,6 +23,11 @@
#include "mb/pg_wchar.h"
#include "miscadmin.h"
+#include "nodes/pg_list.h"
+#include "storage/lwlock.h"
+#include "storage/ipc.h"
+#include "utils/dsa.h"
+#include "utils/hsearch.h"
#include "utils/memdebug.h"
#include "utils/memutils.h"
#include "utils/memutils_internal.h"
@@ -135,6 +140,17 @@ static const MemoryContextMethods mcxt_methods[] = {
};
#undef BOGUS_MCTX
+/*
+ * This is passed to MemoryContextStatsInternal to determine whether
+ * to print context statistics and, if so, where to print them: to the
+ * logs or to stderr.
+ */
+typedef enum PrintDestination
+{
+ PRINT_STATS_TO_STDERR = 0,
+ PRINT_STATS_TO_LOGS,
+ PRINT_STATS_NONE
+} PrintDestination;
/*
* CurrentMemoryContext
@@ -156,16 +172,31 @@ MemoryContext CurTransactionContext = NULL;
/* This is a transient link to the active portal's memory context: */
MemoryContext PortalContext = NULL;
+dsa_area *area = NULL;
static void MemoryContextDeleteOnly(MemoryContext context);
static void MemoryContextCallResetCallbacks(MemoryContext context);
static void MemoryContextStatsInternal(MemoryContext context, int level,
int max_level, int max_children,
MemoryContextCounters *totals,
- bool print_to_stderr);
+ PrintDestination print_location,
+ int *num_contexts);
static void MemoryContextStatsPrint(MemoryContext context, void *passthru,
const char *stats_string,
bool print_to_stderr);
+static void PublishMemoryContext(MemoryContextReportingStatsEntry *memcxt_infos,
+ int curr_id, MemoryContext context,
+ List *path,
+ MemoryContextCounters stat,
+ int num_contexts, dsa_area *area,
+ int max_levels);
+static void compute_contexts_count_and_ids(List *contexts, HTAB *context_id_lookup,
+ int *stats_count,
+ bool summary);
+static List *compute_context_path(MemoryContext c, HTAB *context_id_lookup);
+static void free_memorycontextstate_dsa(dsa_area *area, int total_stats,
+ dsa_pointer prev_dsa_pointer);
+static void end_memorycontext_reporting(void);
/*
* You should not do memory allocations within a critical section, because
@@ -831,11 +862,19 @@ MemoryContextStatsDetail(MemoryContext context,
bool print_to_stderr)
{
MemoryContextCounters grand_totals;
+ int num_contexts = 0;
+ PrintDestination print_location;
memset(&grand_totals, 0, sizeof(grand_totals));
+ if (print_to_stderr)
+ print_location = PRINT_STATS_TO_STDERR;
+ else
+ print_location = PRINT_STATS_TO_LOGS;
+
+ /* num_contexts reports the number of contexts aggregated in the output */
MemoryContextStatsInternal(context, 0, max_level, max_children,
- &grand_totals, print_to_stderr);
+ &grand_totals, print_location, &num_contexts);
if (print_to_stderr)
fprintf(stderr,
@@ -870,13 +909,14 @@ MemoryContextStatsDetail(MemoryContext context,
* One recursion level for MemoryContextStats
*
* Print stats for this context if possible, but in any case accumulate counts
- * into *totals (if not NULL).
+ * into *totals (if not NULL). The callers should make sure that print_location
+ * is set to PRINT_STATS_TO_STDERR, PRINT_STATS_TO_LOGS or PRINT_STATS_NONE.
*/
static void
MemoryContextStatsInternal(MemoryContext context, int level,
int max_level, int max_children,
MemoryContextCounters *totals,
- bool print_to_stderr)
+ PrintDestination print_location, int *num_contexts)
{
MemoryContext child;
int ichild;
@@ -884,10 +924,39 @@ MemoryContextStatsInternal(MemoryContext context, int level,
Assert(MemoryContextIsValid(context));
/* Examine the context itself */
- context->methods->stats(context,
- MemoryContextStatsPrint,
- &level,
- totals, print_to_stderr);
+ switch (print_location)
+ {
+ case PRINT_STATS_TO_STDERR:
+ context->methods->stats(context,
+ MemoryContextStatsPrint,
+ &level,
+ totals, true);
+ break;
+
+ case PRINT_STATS_TO_LOGS:
+ context->methods->stats(context,
+ MemoryContextStatsPrint,
+ &level,
+ totals, false);
+ break;
+
+ case PRINT_STATS_NONE:
+
+ /*
+ * Do not print the statistics if print_location is
+ * PRINT_STATS_NONE, only compute totals. This is used in
+ * reporting of memory context statistics via an SQL function. The
+ * last parameter is not relevant.
+ */
+ context->methods->stats(context,
+ NULL,
+ NULL,
+ totals, false);
+ break;
+ }
+
+ /* Increment the context count for each recursive call */
+ *num_contexts = *num_contexts + 1;
/*
* Examine children.
@@ -907,7 +976,7 @@ MemoryContextStatsInternal(MemoryContext context, int level,
MemoryContextStatsInternal(child, level + 1,
max_level, max_children,
totals,
- print_to_stderr);
+ print_location, num_contexts);
}
}
@@ -926,7 +995,13 @@ MemoryContextStatsInternal(MemoryContext context, int level,
child = MemoryContextTraverseNext(child, context);
}
- if (print_to_stderr)
+ /*
+ * Add the count of child contexts that are traversed in a
+ * non-recursive manner.
+ */
+ *num_contexts = *num_contexts + ichild;
+
+ if (print_location == PRINT_STATS_TO_STDERR)
{
for (int i = 0; i <= level; i++)
fprintf(stderr, " ");
@@ -939,7 +1014,7 @@ MemoryContextStatsInternal(MemoryContext context, int level,
local_totals.freechunks,
local_totals.totalspace - local_totals.freespace);
}
- else
+ else if (print_location == PRINT_STATS_TO_LOGS)
ereport(LOG_SERVER_ONLY,
(errhidestmt(true),
errhidecontext(true),
@@ -1276,6 +1351,22 @@ HandleLogMemoryContextInterrupt(void)
/* latch will be set by procsignal_sigusr1_handler */
}
+/*
+ * HandleGetMemoryContextInterrupt
+ * Handle receipt of an interrupt indicating a request to publish memory
+ * contexts statistics.
+ *
+ * All the actual work is deferred to ProcessGetMemoryContextInterrupt() as
+ * this cannot be performed in a signal handler.
+ */
+void
+HandleGetMemoryContextInterrupt(void)
+{
+ InterruptPending = true;
+ PublishMemoryContextPending = true;
+ /* latch will be set by procsignal_sigusr1_handler */
+}
+
/*
* ProcessLogMemoryContextInterrupt
* Perform logging of memory contexts of this backend process.
@@ -1313,6 +1404,528 @@ ProcessLogMemoryContextInterrupt(void)
MemoryContextStatsDetail(TopMemoryContext, 100, 100, false);
}
+/*
+ * ProcessGetMemoryContextInterrupt
+ * Generate information about memory contexts used by the process.
+ *
+ * Performs a breadth first search on the memory context tree, so that parents'
+ * statistics are reported before their children in the monitoring function
+ * output.
+ *
+ * Statistics for all the processes are shared via the same dynamic shared
+ * area. Statistics written by each process are tracked independently in
+ * per-process DSA pointers. These pointers are stored in static shared memory.
+ *
+ * We calculate the maximum number of contexts' statistics that can be
+ * displayed using a pre-determined limit for memory available per process for
+ * this utility and the maximum size of statistics for each context. The
+ * remaining context statistics, if any, are captured as a cumulative total at
+ * the end of the individual contexts' statistics.
+ *
+ * If summary is true, we capture the level 1 and level 2 contexts
+ * statistics. For that we traverse the memory context tree recursively in
+ * depth first search manner to cover all the children of a parent context, to
+ * be able to display a cumulative total of memory consumption by a parent at
+ * level 2 and all its children.
+ */
+void
+ProcessGetMemoryContextInterrupt(void)
+{
+ List *contexts;
+ HASHCTL ctl;
+ HTAB *context_id_lookup;
+ int context_id = 0;
+ MemoryContextReportingStatsEntry *meminfo;
+ bool summary = false;
+ int max_stats;
+ int idx = MyProcNumber;
+ int stats_count = 0;
+ int stats_num = 0;
+ MemoryContextCounters stat;
+ int num_individual_stats = 0;
+
+ PublishMemoryContextPending = false;
+
+ /*
+ * The hash table is used for constructing "path" column of the view,
+ * similar to its local backend counterpart.
+ */
+ ctl.keysize = sizeof(MemoryContext);
+ ctl.entrysize = sizeof(MemoryContextReportingId);
+ ctl.hcxt = CurrentMemoryContext;
+
+ context_id_lookup = hash_create("pg_get_remote_backend_memory_contexts",
+ 256,
+ &ctl,
+ HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+
+ /* List of contexts to process in the next round - start at the top. */
+ contexts = list_make1(TopMemoryContext);
+
+ /* Compute the number of stats that can fit in the defined limit */
+ max_stats =
+ MEMORY_CONTEXT_REPORT_MAX_PER_BACKEND / MAX_MEMORY_CONTEXT_STATS_SIZE;
+ LWLockAcquire(&memCxtState[idx].lw_lock, LW_EXCLUSIVE);
+ summary = memCxtState[idx].summary;
+ LWLockRelease(&memCxtState[idx].lw_lock);
+
+ /*
+ * Traverse the memory context tree to find total number of contexts. If
+ * summary is requested report the total number of contexts at level 1 and
+ * 2 from the top. Also, populate the hash table of context ids.
+ */
+ compute_contexts_count_and_ids(contexts, context_id_lookup, &stats_count,
+ summary);
+
+ /*
+ * Allocate memory in this process's DSA for storing statistics of the
+ * memory contexts up to max_stats. For contexts that don't fit within
+ * the limit, a cumulative total is written as the last record in the DSA
+ * segment.
+ */
+ stats_num = Min(stats_count, max_stats);
+
+ LWLockAcquire(&memCxtArea->lw_lock, LW_EXCLUSIVE);
+
+ /*
+ * Create a DSA and send the handle to the client process after storing
+ * the context statistics. If the number of contexts exceeds a predefined
+ * limit (MEMORY_CONTEXT_REPORT_MAX_PER_BACKEND), a cumulative total is
+ * stored for such contexts.
+ */
+ if (memCxtArea->memstats_dsa_handle == DSA_HANDLE_INVALID)
+ {
+ MemoryContext oldcontext = CurrentMemoryContext;
+ dsa_handle handle;
+
+ MemoryContextSwitchTo(TopMemoryContext);
+
+ area = dsa_create(memCxtArea->lw_lock.tranche);
+
+ handle = dsa_get_handle(area);
+ MemoryContextSwitchTo(oldcontext);
+
+ dsa_pin_mapping(area);
+
+ /*
+ * Pin the DSA area to make sure the area remains attachable even if
+ * the current backend exits. This is done so that the statistics are
+ * published even if the process exits while a client is waiting.
+ */
+ dsa_pin(area);
+
+ /* Set the handle in shared memory */
+ memCxtArea->memstats_dsa_handle = handle;
+ }
+
+ /*
+ * If DSA exists, created by another process publishing statistics, attach
+ * to it.
+ */
+ else if (area == NULL)
+ {
+ MemoryContext oldcontext = CurrentMemoryContext;
+
+ MemoryContextSwitchTo(TopMemoryContext);
+ area = dsa_attach(memCxtArea->memstats_dsa_handle);
+ MemoryContextSwitchTo(oldcontext);
+ dsa_pin_mapping(area);
+ }
+ LWLockRelease(&memCxtArea->lw_lock);
+
+ /*
+ * Hold the process lock to protect writes to process specific memory. Two
+ * processes publishing statistics do not block each other.
+ */
+ LWLockAcquire(&memCxtState[idx].lw_lock, LW_EXCLUSIVE);
+ memCxtState[idx].proc_id = MyProcPid;
+
+ if (DsaPointerIsValid(memCxtState[idx].memstats_dsa_pointer))
+ {
+ /*
+ * Free any previous allocations, free the name, ident and path
+ * pointers before freeing the pointer that contains them.
+ */
+ free_memorycontextstate_dsa(area, memCxtState[idx].total_stats,
+ memCxtState[idx].memstats_dsa_pointer);
+ }
+ memCxtState[idx].memstats_dsa_pointer =
+ dsa_allocate0(area, stats_num * sizeof(MemoryContextReportingStatsEntry));
+
+ meminfo = (MemoryContextReportingStatsEntry *)
+ dsa_get_address(area, memCxtState[idx].memstats_dsa_pointer);
+
+ if (summary)
+ {
+ int cxt_id = 0;
+ List *path = NIL;
+
+ /* Copy TopMemoryContext statistics to DSA */
+ memset(&stat, 0, sizeof(stat));
+ (*TopMemoryContext->methods->stats) (TopMemoryContext, NULL, NULL,
+ &stat, true);
+ path = lcons_int(1, path);
+ PublishMemoryContext(meminfo, cxt_id, TopMemoryContext, path, stat,
+ 1, area, 100);
+ cxt_id = cxt_id + 1;
+
+ /*
+ * Copy statistics for each of TopMemoryContext's children. This
+ * includes statistics of at most 100 children per node, with each
+ * child node limited to a depth of 100 in its subtree.
+ */
+ for (MemoryContext c = TopMemoryContext->firstchild; c != NULL;
+ c = c->nextchild)
+ {
+ MemoryContextCounters grand_totals;
+ int num_contexts = 0;
+ int level = 0;
+
+ path = NIL;
+ memset(&grand_totals, 0, sizeof(grand_totals));
+
+ MemoryContextStatsInternal(c, level, 100, 100, &grand_totals,
+ PRINT_STATS_NONE, &num_contexts);
+
+ path = compute_context_path(c, context_id_lookup);
+
+ /*
+ * Register the stats entry first, that way the cleanup handler
+ * can reach it in case of allocation failures of one or more
+ * members.
+ */
+ /* cxt_id is the slot being written; count it before publishing */
+ memCxtState[idx].total_stats = cxt_id + 1;
+ PublishMemoryContext(meminfo, cxt_id, c, path,
+ grand_totals, num_contexts, area, 100);
+ cxt_id++;
+ }
+ memCxtState[idx].total_stats = cxt_id;
+
+ /* Notify waiting backends and return */
+ end_memorycontext_reporting();
+
+ hash_destroy(context_id_lookup);
+
+ return;
+ }
+
+ foreach_ptr(MemoryContextData, cur, contexts)
+ {
+ List *path = NIL;
+
+ /*
+ * Figure out the transient context_id of this context and each of its
+ * ancestors, to compute a path for this context.
+ */
+ path = compute_context_path(cur, context_id_lookup);
+
+ /* Examine the context stats */
+ memset(&stat, 0, sizeof(stat));
+ (*cur->methods->stats) (cur, NULL, NULL, &stat, true);
+
+ /* Account for saving one statistics slot for cumulative reporting */
+ if (context_id < (max_stats - 1) || stats_count <= max_stats)
+ {
+ /* Copy statistics to DSA memory */
+ PublishMemoryContext(meminfo, context_id, cur, path, stat, 1, area, 100);
+ }
+ else
+ {
+ meminfo[max_stats - 1].totalspace += stat.totalspace;
+ meminfo[max_stats - 1].nblocks += stat.nblocks;
+ meminfo[max_stats - 1].freespace += stat.freespace;
+ meminfo[max_stats - 1].freechunks += stat.freechunks;
+ }
+
+ /*
+ * DSA max limit per process is reached, write aggregate of the
+ * remaining statistics.
+ *
+ * We can store contexts from 0 to max_stats - 1. When stats_count is
+ * greater than max_stats, we stop reporting individual statistics
+ * when context_id equals max_stats - 2, because the max_stats - 1
+ * array slot is used for reporting cumulative statistics, or
+ * "Remaining Totals".
+ */
+ if (stats_count > max_stats && context_id == (max_stats - 2))
+ {
+ char *nameptr;
+ int namelen = strlen("Remaining Totals");
+
+ num_individual_stats = context_id + 1;
+ meminfo[max_stats - 1].name = dsa_allocate(area, namelen + 1);
+ nameptr = dsa_get_address(area, meminfo[max_stats - 1].name);
+ strlcpy(nameptr, "Remaining Totals", namelen + 1);
+ meminfo[max_stats - 1].ident = InvalidDsaPointer;
+ meminfo[max_stats - 1].path = InvalidDsaPointer;
+ meminfo[max_stats - 1].type = 0;
+ }
+ context_id++;
+ }
+
+ /*
+ * Statistics are not aggregated, i.e. individual statistics are reported,
+ * when stats_count <= max_stats.
+ */
+ if (stats_count <= max_stats)
+ {
+ memCxtState[idx].total_stats = context_id;
+ }
+ /* Report number of aggregated memory contexts */
+ else
+ {
+ meminfo[max_stats - 1].num_agg_stats = context_id -
+ num_individual_stats;
+
+ /*
+ * Total stats equals num_individual_stats + 1 record for cumulative
+ * statistics.
+ */
+ memCxtState[idx].total_stats = num_individual_stats + 1;
+ }
+
+ /* Notify waiting backends and return */
+ end_memorycontext_reporting();
+
+ hash_destroy(context_id_lookup);
+}
+
+/*
+ * Update timestamp and signal all the waiting client backends after copying
+ * all the statistics.
+ */
+static void
+end_memorycontext_reporting(void)
+{
+ memCxtState[MyProcNumber].stats_timestamp = GetCurrentTimestamp();
+ LWLockRelease(&memCxtState[MyProcNumber].lw_lock);
+ ConditionVariableBroadcast(&memCxtState[MyProcNumber].memcxt_cv);
+}
+
+/*
+ * compute_context_path
+ *
+ * Append the transient context_id of this context and each of its ancestors
+ * to a list, in order to compute a path.
+ */
+static List *
+compute_context_path(MemoryContext c, HTAB *context_id_lookup)
+{
+ bool found;
+ List *path = NIL;
+ MemoryContext cur_context;
+
+ for (cur_context = c; cur_context != NULL; cur_context = cur_context->parent)
+ {
+ MemoryContextReportingId *cur_entry;
+
+ cur_entry = hash_search(context_id_lookup, &cur_context, HASH_FIND, &found);
+
+ if (!found)
+ elog(ERROR, "hash table corrupted, can't construct path value");
+
+ path = lcons_int(cur_entry->context_id, path);
+ }
+
+ return path;
+}
+
+/*
+ * Compute the number of contexts currently allocated by the backend and
+ * assign a context id to each of them.
+ */
+static void
+compute_contexts_count_and_ids(List *contexts, HTAB *context_id_lookup,
+ int *stats_count, bool summary)
+{
+ foreach_ptr(MemoryContextData, cur, contexts)
+ {
+ MemoryContextReportingId *entry;
+ bool found;
+
+ entry = (MemoryContextReportingId *) hash_search(context_id_lookup, &cur,
+ HASH_ENTER, &found);
+ Assert(!found);
+
+ /* context id starts with 1 */
+ entry->context_id = ++(*stats_count);
+
+ /* Append the children of the current context to the main list. */
+ for (MemoryContext c = cur->firstchild; c != NULL; c = c->nextchild)
+ {
+ if (summary)
+ {
+ entry = (MemoryContextReportingId *) hash_search(context_id_lookup, &c,
+ HASH_ENTER, &found);
+ Assert(!found);
+
+ entry->context_id = ++(*stats_count);
+ }
+
+ contexts = lappend(contexts, c);
+ }
+
+ /*
+ * In summary mode only the contexts at the first two levels (from the
+ * top) are displayed.
+ */
+ if (summary)
+ break;
+ }
+}
+
+/*
+ * PublishMemoryContext
+ *
+ * Copy the memory context statistics of a single context to DSA memory
+ */
+static void
+PublishMemoryContext(MemoryContextReportingStatsEntry *memcxt_info, int curr_id,
+ MemoryContext context, List *path,
+ MemoryContextCounters stat, int num_contexts,
+ dsa_area *area, int max_levels)
+{
+ const char *ident = context->ident;
+ const char *name = context->name;
+ int *path_list;
+
+ /*
+ * To be consistent with logging output, we label dynahash contexts with
+ * just the hash table name as with MemoryContextStatsPrint().
+ */
+ if (context->ident && strncmp(context->name, "dynahash", 8) == 0)
+ {
+ name = context->ident;
+ ident = NULL;
+ }
+
+ if (name != NULL)
+ {
+ int namelen = strlen(name);
+ char *nameptr;
+
+ if (namelen >= MEMORY_CONTEXT_IDENT_SHMEM_SIZE)
+ namelen = pg_mbcliplen(name, namelen,
+ MEMORY_CONTEXT_IDENT_SHMEM_SIZE - 1);
+
+ memcxt_info[curr_id].name = dsa_allocate(area, namelen + 1);
+ nameptr = (char *) dsa_get_address(area, memcxt_info[curr_id].name);
+ strlcpy(nameptr, name, namelen + 1);
+ }
+ else
+ memcxt_info[curr_id].name = InvalidDsaPointer;
+
+ /* Trim and copy the identifier if it is not set to NULL */
+ if (ident != NULL)
+ {
+ int idlen = strlen(ident);
+ char *identptr;
+
+ /*
+ * Some identifiers such as SQL query string can be very long,
+ * truncate oversize identifiers.
+ */
+ if (idlen >= MEMORY_CONTEXT_IDENT_SHMEM_SIZE)
+ idlen = pg_mbcliplen(ident, idlen,
+ MEMORY_CONTEXT_IDENT_SHMEM_SIZE - 1);
+
+ memcxt_info[curr_id].ident = dsa_allocate(area, idlen + 1);
+ identptr = (char *) dsa_get_address(area, memcxt_info[curr_id].ident);
+ strlcpy(identptr, ident, idlen + 1);
+ }
+ else
+ memcxt_info[curr_id].ident = InvalidDsaPointer;
+
+ /* Allocate DSA memory for storing path information */
+ if (path == NIL)
+ memcxt_info[curr_id].path = InvalidDsaPointer;
+ else
+ {
+ int levels = Min(list_length(path), max_levels);
+
+ memcxt_info[curr_id].path_length = levels;
+ memcxt_info[curr_id].path = dsa_allocate0(area, levels * sizeof(int));
+ memcxt_info[curr_id].levels = list_length(path);
+ path_list = (int *) dsa_get_address(area, memcxt_info[curr_id].path);
+
+ foreach_int(i, path)
+ {
+ path_list[foreach_current_index(i)] = i;
+ if (--levels == 0)
+ break;
+ }
+ }
+ memcxt_info[curr_id].type = context->type;
+ memcxt_info[curr_id].totalspace = stat.totalspace;
+ memcxt_info[curr_id].nblocks = stat.nblocks;
+ memcxt_info[curr_id].freespace = stat.freespace;
+ memcxt_info[curr_id].freechunks = stat.freechunks;
+ memcxt_info[curr_id].num_agg_stats = num_contexts;
+}
+
+/*
+ * free_memorycontextstate_dsa
+ *
+ * Worker for freeing resources from a MemoryContextReportingStatsEntry. Callers are
+ * responsible for ensuring that the DSA pointer is valid.
+ */
+static void
+free_memorycontextstate_dsa(dsa_area *area, int total_stats,
+ dsa_pointer prev_dsa_pointer)
+{
+ MemoryContextReportingStatsEntry *meminfo;
+
+ meminfo = (MemoryContextReportingStatsEntry *) dsa_get_address(area, prev_dsa_pointer);
+ Assert(meminfo != NULL);
+ for (int i = 0; i < total_stats; i++)
+ {
+ if (DsaPointerIsValid(meminfo[i].name))
+ dsa_free(area, meminfo[i].name);
+
+ if (DsaPointerIsValid(meminfo[i].ident))
+ dsa_free(area, meminfo[i].ident);
+
+ if (DsaPointerIsValid(meminfo[i].path))
+ dsa_free(area, meminfo[i].path);
+ }
+
+ dsa_free(area, memCxtState[MyProcNumber].memstats_dsa_pointer);
+ memCxtState[MyProcNumber].memstats_dsa_pointer = InvalidDsaPointer;
+}
+
+/*
+ * Free the memory context statistics stored by this process
+ * in DSA area.
+ */
+void
+AtProcExit_memstats_cleanup(int code, Datum arg)
+{
+ int idx = MyProcNumber;
+
+ if (memCxtArea->memstats_dsa_handle == DSA_HANDLE_INVALID)
+ return;
+
+ LWLockAcquire(&memCxtState[idx].lw_lock, LW_EXCLUSIVE);
+
+ if (!DsaPointerIsValid(memCxtState[idx].memstats_dsa_pointer))
+ {
+ LWLockRelease(&memCxtState[idx].lw_lock);
+ return;
+ }
+
+ /* If the dsa mapping could not be found, attach to the area */
+ if (area == NULL)
+ area = dsa_attach(memCxtArea->memstats_dsa_handle);
+
+ /*
+ * Free the memory context statistics, free the name, ident and path
+ * pointers before freeing the pointer that contains these pointers and
+ * integer statistics.
+ */
+ free_memorycontextstate_dsa(area, memCxtState[idx].total_stats,
+ memCxtState[idx].memstats_dsa_pointer);
+
+ dsa_detach(area);
+ LWLockRelease(&memCxtState[idx].lw_lock);
+}
+
void *
palloc(Size size)
{
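The overflow handling in ProcessGetMemoryContextInterrupt() above (individual records up to the limit, then a single "Remaining Totals" record) can be sketched as a standalone function. This is a simplified illustration under stated assumptions, not the patch's code: SketchEntry and sketch_publish are hypothetical names, only totalspace is tracked, and plain arrays stand in for the DSA allocation.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for one published record. */
typedef struct SketchEntry
{
	long		totalspace;
	int			num_agg_stats;	/* contexts folded into this record */
} SketchEntry;

/*
 * Publish stats_count context sizes into out[], which has room for
 * max_stats records.  When everything fits, each context gets its own
 * record.  Otherwise the first max_stats - 1 contexts are reported
 * individually and the last slot accumulates all remaining contexts.
 * Returns the number of records written.
 */
int
sketch_publish(const long *sizes, int stats_count,
			   SketchEntry *out, int max_stats)
{
	int			n_individual;

	assert(max_stats >= 2 && stats_count >= 0);
	memset(out, 0, sizeof(SketchEntry) * (size_t) max_stats);

	if (stats_count <= max_stats)
	{
		for (int i = 0; i < stats_count; i++)
		{
			out[i].totalspace = sizes[i];
			out[i].num_agg_stats = 1;
		}
		return stats_count;
	}

	/* Reserve the last slot for the cumulative record. */
	n_individual = max_stats - 1;
	for (int i = 0; i < n_individual; i++)
	{
		out[i].totalspace = sizes[i];
		out[i].num_agg_stats = 1;
	}
	for (int i = n_individual; i < stats_count; i++)
		out[max_stats - 1].totalspace += sizes[i];
	out[max_stats - 1].num_agg_stats = stats_count - n_individual;

	return max_stats;
}
```

With five contexts and room for three records, the first two contexts are reported individually and the third record carries the sum of the remaining three, with num_agg_stats set to 3.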
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 5d5be8ba4e1..90675be66f6 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8559,6 +8559,16 @@
prorettype => 'bool', proargtypes => 'int4',
prosrc => 'pg_log_backend_memory_contexts' },
+# publishing memory contexts of the specified postgres process
+{ oid => '2173', descr => 'publish memory contexts of the specified backend',
+ proname => 'pg_get_process_memory_contexts', provolatile => 'v',
+ prorows => '100', proretset => 't', proparallel => 'r',
+ prorettype => 'record', proargtypes => 'int4 bool float8',
+ proallargtypes => '{int4,bool,float8,text,text,text,_int4,int4,int8,int8,int8,int8,int8,int4,timestamptz}',
+ proargmodes => '{i,i,i,o,o,o,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{pid, summary, retries, name, ident, type, path, level, total_bytes, total_nblocks, free_bytes, free_chunks, used_bytes, num_agg_contexts, stats_timestamp}',
+ prosrc => 'pg_get_process_memory_contexts' },
+
# non-persistent series generator
{ oid => '1066', descr => 'non-persistent series generator',
proname => 'generate_series', prorows => '1000',
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 0d8528b2875..58b2496a9cb 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -96,6 +96,7 @@ extern PGDLLIMPORT volatile sig_atomic_t IdleSessionTimeoutPending;
extern PGDLLIMPORT volatile sig_atomic_t ProcSignalBarrierPending;
extern PGDLLIMPORT volatile sig_atomic_t LogMemoryContextPending;
extern PGDLLIMPORT volatile sig_atomic_t IdleStatsUpdateTimeoutPending;
+extern PGDLLIMPORT volatile sig_atomic_t PublishMemoryContextPending;
extern PGDLLIMPORT volatile sig_atomic_t CheckClientConnectionPending;
extern PGDLLIMPORT volatile sig_atomic_t ClientConnectionLost;
diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index 4df1d25c045..d333f338ebb 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -219,6 +219,8 @@ typedef enum BuiltinTrancheIds
LWTRANCHE_XACT_SLRU,
LWTRANCHE_PARALLEL_VACUUM_DSA,
LWTRANCHE_AIO_URING_COMPLETION,
+ LWTRANCHE_MEMORY_CONTEXT_REPORTING_STATE,
+ LWTRANCHE_MEMORY_CONTEXT_REPORTING_PROC,
LWTRANCHE_FIRST_USER_DEFINED,
} BuiltinTrancheIds;
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index 016dfd9b3f6..cfe14631445 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -35,6 +35,7 @@ typedef enum
PROCSIG_WALSND_INIT_STOPPING, /* ask walsenders to prepare for shutdown */
PROCSIG_BARRIER, /* global barrier interrupt */
PROCSIG_LOG_MEMORY_CONTEXT, /* ask backend to log the memory contexts */
+ PROCSIG_GET_MEMORY_CONTEXT, /* ask backend to send the memory contexts */
PROCSIG_PARALLEL_APPLY_MESSAGE, /* Message from parallel apply workers */
/* Recovery conflict reasons */
diff --git a/src/include/utils/memutils.h b/src/include/utils/memutils.h
index 8abc26abce2..c454ee0b897 100644
--- a/src/include/utils/memutils.h
+++ b/src/include/utils/memutils.h
@@ -18,6 +18,9 @@
#define MEMUTILS_H
#include "nodes/memnodes.h"
+#include "storage/condition_variable.h"
+#include "storage/lmgr.h"
+#include "utils/dsa.h"
/*
@@ -48,6 +51,23 @@
#define AllocHugeSizeIsValid(size) ((Size) (size) <= MaxAllocHugeSize)
+/*
+ * Memory Context reporting size limits.
+ */
+
+/* Max length of context name and ident */
+#define MEMORY_CONTEXT_IDENT_SHMEM_SIZE 64
+/* Maximum size (in bytes) of DSA area per process */
+#define MEMORY_CONTEXT_REPORT_MAX_PER_BACKEND ((size_t) (1 * 1024 * 1024))
+
+/*
+ * Maximum size per context. Actual size may be lower as this assumes the worst
+ * case of deepest path and longest identifiers (name and ident, thus the
+ * multiplication by 2). The path depth is limited to 100 like for memory
+ * context logging.
+ */
+#define MAX_MEMORY_CONTEXT_STATS_SIZE (sizeof(MemoryContextReportingStatsEntry) + \
+ (100 * sizeof(int)) + (2 * MEMORY_CONTEXT_IDENT_SHMEM_SIZE))
/*
* Standard top-level memory contexts.
@@ -319,4 +339,66 @@ pg_memory_is_all_zeros(const void *ptr, size_t len)
return true;
}
+/* Dynamic shared memory state for statistics per context */
+typedef struct MemoryContextReportingStatsEntry
+{
+ dsa_pointer name;
+ dsa_pointer ident;
+ dsa_pointer path;
+ NodeTag type;
+ int path_length;
+ int levels;
+ int64 totalspace;
+ int64 nblocks;
+ int64 freespace;
+ int64 freechunks;
+ int num_agg_stats;
+} MemoryContextReportingStatsEntry;
+
+/*
+ * Static shared memory state representing the DSA area created for memory
+ * context statistics reporting. A single DSA area is created and used by all
+ * the processes, each having its specific DSA allocations for sharing memory
+ * statistics, tracked by per backend static shared memory state.
+ */
+typedef struct MemoryContextReportingSharedState
+{
+ dsa_handle memstats_dsa_handle;
+ LWLock lw_lock;
+} MemoryContextReportingSharedState;
+
+/*
+ * Per backend static shared memory state for memory context statistics
+ * reporting.
+ */
+typedef struct MemoryContextReportingBackendState
+{
+ ConditionVariable memcxt_cv;
+ LWLock lw_lock;
+ int proc_id;
+ int total_stats;
+ bool summary;
+ dsa_pointer memstats_dsa_pointer;
+ TimestampTz stats_timestamp;
+} MemoryContextReportingBackendState;
+
+
+/*
+ * Used for storage of transient identifiers for pg_get_backend_memory_contexts
+ */
+typedef struct MemoryContextReportingId
+{
+ MemoryContext context;
+ int context_id;
+} MemoryContextReportingId;
+
+extern PGDLLIMPORT MemoryContextReportingBackendState *memCxtState;
+extern PGDLLIMPORT MemoryContextReportingSharedState *memCxtArea;
+extern void ProcessGetMemoryContextInterrupt(void);
+extern const char *ContextTypeToString(NodeTag type);
+extern void HandleGetMemoryContextInterrupt(void);
+extern Size MemoryContextReportingShmemSize(void);
+extern void MemoryContextReportingShmemInit(void);
+extern void AtProcExit_memstats_cleanup(int code, Datum arg);
+extern dsa_area *area;
#endif /* MEMUTILS_H */
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 83228cfca29..ae17d028ed3 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -232,3 +232,22 @@ select * from pg_timezone_abbrevs where abbrev = 'LMT';
LMT | @ 7 hours 52 mins 58 secs ago | f
(1 row)
+DO $$
+DECLARE
+ bg_writer_pid int;
+ r RECORD;
+BEGIN
+ SELECT pid from pg_stat_activity where backend_type='background writer'
+ INTO bg_writer_pid;
+
+ select type, name, ident
+ from pg_get_process_memory_contexts(bg_writer_pid, false, 20)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+ select type, name, ident
+ from pg_get_process_memory_contexts(pg_backend_pid(), false, 20)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+END $$;
+NOTICE: (AllocSet,TopMemoryContext,)
+NOTICE: (AllocSet,TopMemoryContext,)
diff --git a/src/test/regress/sql/sysviews.sql b/src/test/regress/sql/sysviews.sql
index 66179f026b3..d0917b6868e 100644
--- a/src/test/regress/sql/sysviews.sql
+++ b/src/test/regress/sql/sysviews.sql
@@ -101,3 +101,21 @@ select count(distinct utc_offset) >= 24 as ok from pg_timezone_abbrevs;
-- One specific case we can check without much fear of breakage
-- is the historical local-mean-time value used for America/Los_Angeles.
select * from pg_timezone_abbrevs where abbrev = 'LMT';
+
+DO $$
+DECLARE
+ bg_writer_pid int;
+ r RECORD;
+BEGIN
+ SELECT pid from pg_stat_activity where backend_type='background writer'
+ INTO bg_writer_pid;
+
+ select type, name, ident
+ from pg_get_process_memory_contexts(bg_writer_pid, false, 20)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+ select type, name, ident
+ from pg_get_process_memory_contexts(pg_backend_pid(), false, 20)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+END $$;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index d42b943ef94..84db025b855 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1668,6 +1668,10 @@ MemoryContextCounters
MemoryContextData
MemoryContextMethodID
MemoryContextMethods
+MemoryContextReportingBackendState
+MemoryContextReportingId
+MemoryContextReportingSharedState
+MemoryContextReportingStatsEntry
MemoryStatsPrintFunc
MergeAction
MergeActionState
--
2.39.3 (Apple Git-146)
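For reference, the reporting size limits in the memutils.h hunk above can be worked through numerically. This is only a sketch under stated assumptions: dsa_pointer is stood in for by uint64_t and NodeTag by int, the names StatsEntrySketch, max_context_stats_size and min_entries_per_backend are made up for illustration, and dividing the per-backend budget by the worst-case entry size is one plausible reading of how the two limits relate, not something the patch states.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values copied from the memutils.h hunk. */
#define IDENT_SHMEM_SIZE        64                          /* name / ident cap */
#define REPORT_MAX_PER_BACKEND  ((size_t) (1 * 1024 * 1024)) /* 1MB per process */
#define MAX_PATH_DEPTH          100                         /* path entries */

/*
 * Stand-in for MemoryContextReportingStatsEntry. ASSUMPTION: dsa_pointer is
 * modelled as uint64_t and NodeTag as int; real sizes depend on the build.
 */
typedef struct StatsEntrySketch
{
    uint64_t    name;
    uint64_t    ident;
    uint64_t    path;
    int         type;
    int         path_length;
    int         levels;
    int64_t     totalspace;
    int64_t     nblocks;
    int64_t     freespace;
    int64_t     freechunks;
    int         num_agg_stats;
} StatsEntrySketch;

/* Worst-case bytes for one context: entry + deepest path + name and ident. */
size_t
max_context_stats_size(void)
{
    return sizeof(StatsEntrySketch)
        + (MAX_PATH_DEPTH * sizeof(int))
        + (2 * IDENT_SHMEM_SIZE);
}

/* Number of worst-case entries that are guaranteed to fit in the budget. */
size_t
min_entries_per_backend(void)
{
    size_t      per_entry = max_context_stats_size();

    assert(per_entry > 0);
    return REPORT_MAX_PER_BACKEND / per_entry;
}
```

Assuming an 8-byte dsa_pointer and a 4-byte int, the stand-in struct is 80 bytes, so one worst-case entry is 80 + 400 + 128 = 608 bytes and at least 1724 contexts fit in 1MB. Since most contexts have short paths and identifiers, the real number would usually be higher.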
Hi,
On 2025-04-08 01:17:17 +0200, Daniel Gustafsson wrote:
On 7 Apr 2025, at 17:43, Andres Freund <andres@anarazel.de> wrote:
+ /*
+ * Hold the process lock to protect writes to process specific memory. Two
+ * processes publishing statistics do not block each other.
+ */
s/specific/process specific/
That's what it says though.. isn't it? I might be missing something obvious.
Understandable confusion, not sure what my brain was doing anymore
either...
+} MemoryContextState;
IMO that's too generic a name for something in a header.
+} MemoryContextId;
This too. Particularly because MemoryContextData->ident exists but is
something different.
Renamed both to use MemoryContextReporting* namespace, which leaves
MemoryContextReportingBackendState at an unwieldy long name. I'm running out
of ideas on how to improve and it does make purpose quite explicit at least.
How about
MemoryContextReportingBackendState -> MemoryStatsBackendState
MemoryContextReportingId -> MemoryStatsContextId
MemoryContextReportingSharedState -> MemoryStatsCtl
MemoryContextReportingStatsEntry -> MemoryStatsEntry
+ /* context id starts with 1 */
+ entry->context_id = ++(*stats_count);
Given that we don't actually do anything here relating to starting with 1, I
find that comment confusing.
Reworded, not sure if it's much better tbh.
I'd probably just remove the comment.
Hm. First I thought we'd leak memory if this second (and subsequent)
dsa_allocate failed. Then I thought we'd be ok, because the memory would be
freed because it'd be reachable from memCtxState[idx].memstats_dsa_pointer.
But I think it wouldn't *quite* work, because memCtxState[idx].total_stats is
only set *after* we would have failed.
Keeping a running total in .total_stats should make the leak window smaller.
Why not just initialize .total_stats *before* calling any fallible code?
Afaict it's zero-allocated, so the free function should have no problem
dealing with the entries that haven't yet been populated.
Greetings,
Andres Freund
Hi Daniel, Andres,
+} MemoryContextState;
IMO that's too generic a name for something in a header.
+} MemoryContextId;
This too. Particularly because MemoryContextData->ident exists but is
something different.
Renamed both to use MemoryContextReporting* namespace, which leaves
MemoryContextReportingBackendState at an unwieldy long name. I'm running out
of ideas on how to improve and it does make purpose quite explicit at least.
How about
MemoryContextReportingBackendState -> MemoryStatsBackendState
MemoryContextReportingId -> MemoryStatsContextId
MemoryContextReportingSharedState -> MemoryStatsCtl
MemoryContextReportingStatsEntry -> MemoryStatsEntry
Fixed accordingly.
+ /* context id starts with 1 */
+ entry->context_id = ++(*stats_count);
Given that we don't actually do anything here relating to starting with 1, I
find that comment confusing.
Reworded, not sure if it's much better tbh.
I'd probably just remove the comment.
Reworded to mention that we pre-increment stats_count to make sure
id starts with 1.
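To make the pre-increment point concrete, here is a minimal standalone sketch. IdEntrySketch and assign_context_id are hypothetical names used only for illustration; the single assignment is the line quoted from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the entry that receives a transient id. */
typedef struct IdEntrySketch
{
    int         context_id;
} IdEntrySketch;

/*
 * stats_count starts at 0. Incrementing it before the assignment means the
 * first context gets id 1, and afterwards the counter equals the number of
 * ids handed out so far.
 */
void
assign_context_id(IdEntrySketch *entry, int *stats_count)
{
    assert(entry != NULL && stats_count != NULL);
    entry->context_id = ++(*stats_count);
}
```

With a post-increment the ids would start at 0 instead, which is presumably what the comment is meant to rule out.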
Hm. First I thought we'd leak memory if this second (and subsequent)
dsa_allocate failed. Then I thought we'd be ok, because the memory would be
freed because it'd be reachable from memCtxState[idx].memstats_dsa_pointer.
But I think it wouldn't *quite* work, because memCtxState[idx].total_stats is
only set *after* we would have failed.
Keeping a running total in .total_stats should make the leak window smaller.
Why not just initialize .total_stats *before* calling any fallible code?
Afaict it's zero-allocated, so the free function should have no problem
dealing with the entries that haven't yet been populated.
Fixed accordingly.
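The idea suggested above can be sketched outside of PostgreSQL as follows. This is not the v28 code: calloc/malloc/free stand in for zero-initialized DSA allocations and dsa_free, failures are reported by return value rather than ereport, and all names (ReportSketch, report_publish, report_free, limited_alloc, allocs_left) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct EntrySketch
{
    char       *name;           /* NULL until populated */
} EntrySketch;

typedef struct ReportSketch
{
    EntrySketch *entries;
    int         total_stats;    /* number of slots cleanup must walk */
} ReportSketch;

/* Test allocator that fails once its budget is used up. */
int         allocs_left = 0;

void *
limited_alloc(size_t size)
{
    if (allocs_left <= 0)
        return NULL;
    allocs_left--;
    return malloc(size);
}

/* Safe on partially populated reports: free(NULL) is a no-op. */
void
report_free(ReportSketch *r)
{
    for (int i = 0; i < r->total_stats; i++)
        free(r->entries[i].name);
    free(r->entries);
    r->entries = NULL;
    r->total_stats = 0;
}

/* Returns 0 on success, -1 on allocation failure (caller runs report_free). */
int
report_publish(ReportSketch *r, int n, void *(*alloc_name) (size_t))
{
    assert(r != NULL && n > 0);
    r->entries = calloc((size_t) n, sizeof(EntrySketch));
    if (r->entries == NULL)
        return -1;

    /* Record the count BEFORE any fallible per-entry allocation. */
    r->total_stats = n;

    for (int i = 0; i < n; i++)
    {
        r->entries[i].name = alloc_name(16);
        if (r->entries[i].name == NULL)
            return -1;
    }
    return 0;
}
```

Because the array is zero-allocated and the count is already set, a failure partway through leaves every populated entry reachable by the cleanup routine. In the DSA case the equivalent of the NULL check would be a DsaPointerIsValid() test before freeing.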
PFA a v28 which passes all local and github CI tests.
Thank you,
Rahila Syed
Attachments:
v28-0001-Add-function-to-get-memory-context-stats-for-process.patch (application/octet-stream)
From e305ce493c434dbbac8b58d8730b54159ac14795 Mon Sep 17 00:00:00 2001
From: Rahila Syed <rahilasyed.90@gmail.com>
Date: Tue, 8 Apr 2025 09:38:12 +0530
Subject: [PATCH] Add function to get memory context stats for processes
This adds a function for retrieving memory context statistics
and information from backends as well as auxiliary processes.
When calling the function it sends a signal to the specified
process to submit statistics regarding its memory contexts
into dynamic shared memory. Each memory context is returned
in detail, followed by a cumulative total in case the number
of contexts exceeds the max allocated amount of shared memory.
Each process is limited to use at most 1MB of memory for this.
A summary can also be explicitly requested by the user; this
will return the TopMemoryContext and a cumulative total of
all lower contexts.
In order to not block on busy processes the caller specifies
the number of seconds during which to retry before timing out.
In the case where no statistics are published within the set
timeout, the last known statistics are returned, or NULL if
no previously published statistics exist. This allows
dashboard type usages to continually publish data even if the
target process is temporarily congested. Context records
contain a timestamp to indicate when they were submitted.
Author: Rahila Syed <rahilasyed90@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Reviewed-by: Atsushi Torikoshi <torikoshia@oss.nttdata.com>
Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Discussion: https://postgr.es/m/CAH2L28v8mc9HDt8QoSJ8TRmKau_8FM_HKS41NeO9-6ZAkuZKXw@mail.gmail.com
---
doc/src/sgml/func.sgml | 171 +++++
src/backend/catalog/system_views.sql | 5 +
src/backend/postmaster/autovacuum.c | 4 +
src/backend/postmaster/checkpointer.c | 4 +
src/backend/postmaster/interrupt.c | 4 +
src/backend/postmaster/pgarch.c | 4 +
src/backend/postmaster/startup.c | 4 +
src/backend/postmaster/walsummarizer.c | 4 +
src/backend/storage/ipc/ipci.c | 3 +
src/backend/storage/ipc/procsignal.c | 3 +
src/backend/storage/lmgr/lwlock.c | 2 +
src/backend/storage/lmgr/proc.c | 1 +
src/backend/tcop/postgres.c | 3 +
.../utils/activity/wait_event_names.txt | 1 +
src/backend/utils/adt/mcxtfuncs.c | 426 +++++++++++-
src/backend/utils/init/globals.c | 1 +
src/backend/utils/init/postinit.c | 7 +
src/backend/utils/mmgr/mcxt.c | 644 +++++++++++++++++-
src/include/catalog/pg_proc.dat | 10 +
src/include/miscadmin.h | 1 +
src/include/storage/lwlock.h | 2 +
src/include/storage/procsignal.h | 1 +
src/include/utils/memutils.h | 82 +++
src/test/regress/expected/sysviews.out | 19 +
src/test/regress/sql/sysviews.sql | 18 +
src/tools/pgindent/typedefs.list | 4 +
26 files changed, 1383 insertions(+), 45 deletions(-)
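The timeout behaviour described in the commit message above can be illustrated with a small simulation. This is a sketch under assumptions, not the patch's code: the real function sleeps on a condition variable in slices of MEMSTATS_WAIT_TIMEOUT (assumed here to be milliseconds), whereas this version just advances a simulated clock, and SimTarget, sim_last_stats_ts and wait_for_stats are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WAIT_SLICE_MS 100       /* mirrors MEMSTATS_WAIT_TIMEOUT */

typedef enum
{
    STATS_FRESH,                /* published after our request */
    STATS_STALE,                /* timed out, older stats returned */
    STATS_NONE                  /* timed out, nothing ever published */
} StatsResult;

/* Simulated target process. */
typedef struct SimTarget
{
    int64_t     old_ts_ms;      /* timestamp of earlier stats, -1 if none */
    int64_t     publish_at_ms;  /* time at which new stats appear */
} SimTarget;

/* Timestamp of the most recent stats visible at now_ms. */
int64_t
sim_last_stats_ts(const SimTarget *t, int64_t now_ms)
{
    return now_ms >= t->publish_at_ms ? t->publish_at_ms : t->old_ts_ms;
}

StatsResult
wait_for_stats(const SimTarget *t, int64_t start_ms, double timeout_s)
{
    int64_t     deadline_ms;
    int64_t     now_ms = start_ms;
    int64_t     last_ts;

    assert(t != NULL && timeout_s >= 0.0);
    deadline_ms = start_ms + (int64_t) (timeout_s * 1000.0);

    for (;;)
    {
        int64_t     remaining;

        /* Stats newer than the request timestamp must be a response to it. */
        last_ts = sim_last_stats_ts(t, now_ms);
        if (last_ts > start_ms)
            return STATS_FRESH;
        if (now_ms >= deadline_ms)
            break;

        /* "Sleep" for one slice, or for whatever time is left. */
        remaining = deadline_ms - now_ms;
        now_ms += remaining < WAIT_SLICE_MS ? remaining : WAIT_SLICE_MS;
    }

    /* Timed out: fall back to previously published stats, if any. */
    return last_ts >= 0 ? STATS_STALE : STATS_NONE;
}
```

The three outcomes map onto the commit message: fresh statistics, the last known statistics when the target is congested, or NULL when nothing was ever published.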
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 9ab070adffb..42ec4340da1 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -28663,6 +28663,144 @@ acl | {postgres=arwdDxtm/postgres,foo=r/postgres}
</para></entry>
</row>
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>pg_get_process_memory_contexts</primary>
+ </indexterm>
+ <function>pg_get_process_memory_contexts</function> ( <parameter>pid</parameter> <type>integer</type>, <parameter>summary</parameter> <type>boolean</type>, <parameter>timeout</parameter> <type>float</type> )
+ <returnvalue>setof record</returnvalue>
+ ( <parameter>name</parameter> <type>text</type>,
+ <parameter>ident</parameter> <type>text</type>,
+ <parameter>type</parameter> <type>text</type>,
+ <parameter>path</parameter> <type>integer[]</type>,
+ <parameter>level</parameter> <type>integer</type>,
+ <parameter>total_bytes</parameter> <type>bigint</type>,
+ <parameter>total_nblocks</parameter> <type>bigint</type>,
+ <parameter>free_bytes</parameter> <type>bigint</type>,
+ <parameter>free_chunks</parameter> <type>bigint</type>,
+ <parameter>used_bytes</parameter> <type>bigint</type>,
+ <parameter>num_agg_contexts</parameter> <type>integer</type>,
+ <parameter>stats_timestamp</parameter> <type>timestamptz</type> )
+ </para>
+ <para>
+ This function handles requests to display the memory contexts of a
+ <productname>PostgreSQL</productname> process with the specified
+ process ID. The function can be used to send requests to backends as
+ well as <glossterm linkend="glossary-auxiliary-proc">auxiliary processes</glossterm>.
+ </para>
+ <para>
+ The returned record contains extended statistics for each memory
+ context:
+ <itemizedlist spacing="compact">
+ <listitem>
+ <para>
+ <parameter>name</parameter> - The name of the memory context.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>ident</parameter> - Memory context ID (if any).
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>type</parameter> - The type of memory context, possible
+ values are: AllocSet, Generation, Slab and Bump.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>path</parameter> - Memory contexts are organized in a
+ tree model with TopMemoryContext as the root, and all other memory
+ contexts as nodes in the tree. The <parameter>path</parameter>
+ displays the path from the root to the current memory context. The
+ path is limited to 100 children per node, with each node limited
+ to a max depth of 100, to preserve memory during reporting. The
+ printed path will also be limited to 100 nodes counting from the
+ TopMemoryContext.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>level</parameter> - The level in the tree of the current
+ memory context.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>total_bytes</parameter> - The total number of bytes
+ allocated to this memory context.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>total_nblocks</parameter> - The total number of blocks
+ used for the allocated memory.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>free_bytes</parameter> - The amount of free memory in
+ this memory context.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>free_chunks</parameter> - The number of chunks that
+ <parameter>free_bytes</parameter> corresponds to.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>used_bytes</parameter> - The total number of bytes
+ currently occupied.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>num_agg_contexts</parameter> - The number of memory
+ contexts aggregated in the displayed statistics.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>stats_timestamp</parameter> - When the statistics were
+ extracted from the process.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+ <para>
+ When <parameter>summary</parameter> is <literal>true</literal>, statistics
+ for memory contexts at levels 1 and 2 are displayed, with level 1
+ representing the root node (i.e., <literal>TopMemoryContext</literal>).
+ Statistics for contexts on level 2 and below are aggregates of all
+ child contexts' statistics, where <literal>num_agg_contexts</literal>
+ indicates the number of aggregated child contexts. When
+ <parameter>summary</parameter> is <literal>false</literal>,
+ the <literal>num_agg_contexts</literal> value is <literal>1</literal>,
+ indicating that individual statistics are being displayed. The levels
+ are limited to the first 100 contexts.
+ </para>
+ <para>
+ Busy processes can delay reporting memory context statistics;
+ <parameter>timeout</parameter> specifies the number of seconds
+ to wait for updated statistics. <parameter>timeout</parameter> can be
+ specified in fractions of a second.
+ </para>
+ <para>
+ After receiving memory context statistics from the target process, it
+ returns the results as one row per context. If all the contexts don't
+ fit within the pre-determined size limit, the remaining context
+ statistics are aggregated and a cumulative total is displayed. The
+ <literal>num_agg_contexts</literal> column indicates the number of
+ contexts aggregated in the displayed statistics. When
+ <literal>num_agg_contexts</literal> is <literal>1</literal> it means
+ that the context statistics are displayed separately.
+ </para></entry>
+ </row>
+
<row>
<entry role="func_table_entry"><para role="func_signature">
<indexterm>
@@ -28802,6 +28940,39 @@ LOG: Grand total: 1651920 bytes in 201 blocks; 622360 free (88 chunks); 1029560
because it may generate a large number of log messages.
</para>
+ <para>
+ <function>pg_get_process_memory_contexts</function> can be used
+ to request the memory context statistics of any PostgreSQL process. For example:
+<programlisting>
+postgres=# SELECT * FROM pg_get_process_memory_contexts(
+ (SELECT pid FROM pg_stat_activity
+ WHERE backend_type = 'checkpointer'),
+ false, 0.5) LIMIT 1;
+-[ RECORD 1 ]----+------------------------------
+name | TopMemoryContext
+ident |
+type | AllocSet
+path | {1}
+level | 1
+total_bytes | 90304
+total_nblocks | 3
+free_bytes | 2880
+free_chunks | 1
+used_bytes | 87424
+num_agg_contexts | 1
+stats_timestamp | 2025-03-24 13:55:47.796698+01
+</programlisting>
+ <note>
+ <para>
+ While <function>pg_get_process_memory_contexts</function> can be used to
+ query memory contexts of the local backend,
+ <structname>pg_backend_memory_contexts</structname>
+ (see <xref linkend="view-pg-backend-memory-contexts"/> for more details)
+ will be less resource intensive when only the local backend is of interest.
+ </para>
+ </note>
+ </para>
+
</sect2>
<sect2 id="functions-admin-backup">
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 08f780a2e63..15efb02badb 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -674,6 +674,11 @@ GRANT SELECT ON pg_backend_memory_contexts TO pg_read_all_stats;
REVOKE EXECUTE ON FUNCTION pg_get_backend_memory_contexts() FROM PUBLIC;
GRANT EXECUTE ON FUNCTION pg_get_backend_memory_contexts() TO pg_read_all_stats;
+REVOKE EXECUTE ON FUNCTION
+ pg_get_process_memory_contexts(integer, boolean, float) FROM PUBLIC;
+GRANT EXECUTE ON FUNCTION
+ pg_get_process_memory_contexts(integer, boolean, float) TO pg_read_all_stats;
+
-- Statistics views
CREATE VIEW pg_stat_all_tables AS
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 2513a8ef8a6..16756152b71 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -781,6 +781,10 @@ ProcessAutoVacLauncherInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
/* Process sinval catchup interrupts that happened while sleeping */
ProcessCatchupInterrupt();
}
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index fda91ffd1ce..d3cb3f1891c 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -663,6 +663,10 @@ ProcessCheckpointerInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/postmaster/interrupt.c b/src/backend/postmaster/interrupt.c
index 0ae9bf906ec..f24f574e748 100644
--- a/src/backend/postmaster/interrupt.c
+++ b/src/backend/postmaster/interrupt.c
@@ -48,6 +48,10 @@ ProcessMainLoopInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 7e622ae4bd2..cb7408acf4c 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -867,6 +867,10 @@ ProcessPgArchInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
if (ConfigReloadPending)
{
char *archiveLib = pstrdup(XLogArchiveLibrary);
diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c
index 27e86cf393f..7149a67fcbc 100644
--- a/src/backend/postmaster/startup.c
+++ b/src/backend/postmaster/startup.c
@@ -192,6 +192,10 @@ ProcessStartupProcInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
diff --git a/src/backend/postmaster/walsummarizer.c b/src/backend/postmaster/walsummarizer.c
index 0fec4f1f871..c7a76711cc5 100644
--- a/src/backend/postmaster/walsummarizer.c
+++ b/src/backend/postmaster/walsummarizer.c
@@ -879,6 +879,10 @@ ProcessWalSummarizerInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 2fa045e6b0f..00c76d05356 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -51,6 +51,7 @@
#include "storage/sinvaladt.h"
#include "utils/guc.h"
#include "utils/injection_point.h"
+#include "utils/memutils.h"
/* GUCs */
int shared_memory_type = DEFAULT_SHARED_MEMORY_TYPE;
@@ -150,6 +151,7 @@ CalculateShmemSize(int *num_semaphores)
size = add_size(size, InjectionPointShmemSize());
size = add_size(size, SlotSyncShmemSize());
size = add_size(size, AioShmemSize());
+ size = add_size(size, MemoryContextReportingShmemSize());
/* include additional requested shmem from preload libraries */
size = add_size(size, total_addin_request);
@@ -343,6 +345,7 @@ CreateOrAttachShmemStructs(void)
WaitEventCustomShmemInit();
InjectionPointShmemInit();
AioShmemInit();
+ MemoryContextReportingShmemInit();
}
/*
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index b7c39a4c5f0..a3c2cd12277 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -690,6 +690,9 @@ procsignal_sigusr1_handler(SIGNAL_ARGS)
if (CheckProcSignal(PROCSIG_LOG_MEMORY_CONTEXT))
HandleLogMemoryContextInterrupt();
+ if (CheckProcSignal(PROCSIG_GET_MEMORY_CONTEXT))
+ HandleGetMemoryContextInterrupt();
+
if (CheckProcSignal(PROCSIG_PARALLEL_APPLY_MESSAGE))
HandleParallelApplyMessageInterrupt();
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 3df29658f18..dc4d96c16af 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -178,6 +178,8 @@ static const char *const BuiltinTrancheNames[] = {
[LWTRANCHE_XACT_SLRU] = "XactSLRU",
[LWTRANCHE_PARALLEL_VACUUM_DSA] = "ParallelVacuumDSA",
[LWTRANCHE_AIO_URING_COMPLETION] = "AioUringCompletion",
+ [LWTRANCHE_MEMORY_CONTEXT_REPORTING_STATE] = "MemoryContextReportingState",
+ [LWTRANCHE_MEMORY_CONTEXT_REPORTING_PROC] = "MemoryContextReportingPerProcess",
};
StaticAssertDecl(lengthof(BuiltinTrancheNames) ==
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index e9ef0fbfe32..f194e6b3dcc 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -50,6 +50,7 @@
#include "storage/procsignal.h"
#include "storage/spin.h"
#include "storage/standby.h"
+#include "utils/memutils.h"
#include "utils/timeout.h"
#include "utils/timestamp.h"
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 6ae9f38f0c8..dc4c600922d 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3535,6 +3535,9 @@ ProcessInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
if (ParallelApplyMessagePending)
ProcessParallelApplyMessages();
}
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 8bce14c38fd..23eaf559c8d 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -161,6 +161,7 @@ WAL_RECEIVER_EXIT "Waiting for the WAL receiver to exit."
WAL_RECEIVER_WAIT_START "Waiting for startup process to send initial data for streaming replication."
WAL_SUMMARY_READY "Waiting for a new WAL summary to be generated."
XACT_GROUP_UPDATE "Waiting for the group leader to update transaction status at transaction end."
+MEM_CXT_PUBLISH "Waiting for a process to publish memory information."
ABI_compatibility:
diff --git a/src/backend/utils/adt/mcxtfuncs.c b/src/backend/utils/adt/mcxtfuncs.c
index 396c2f223b4..2b357902346 100644
--- a/src/backend/utils/adt/mcxtfuncs.c
+++ b/src/backend/utils/adt/mcxtfuncs.c
@@ -17,28 +17,25 @@
#include "funcapi.h"
#include "mb/pg_wchar.h"
+#include "miscadmin.h"
+#include "access/twophase.h"
+#include "catalog/pg_authid_d.h"
#include "storage/proc.h"
#include "storage/procarray.h"
+#include "utils/acl.h"
#include "utils/array.h"
#include "utils/builtins.h"
#include "utils/hsearch.h"
+#include "utils/memutils.h"
+#include "utils/wait_event_types.h"
/* ----------
* The max bytes for showing identifiers of MemoryContext.
* ----------
*/
#define MEMORY_CONTEXT_IDENT_DISPLAY_SIZE 1024
-
-/*
- * MemoryContextId
- * Used for storage of transient identifiers for
- * pg_get_backend_memory_contexts.
- */
-typedef struct MemoryContextId
-{
- MemoryContext context;
- int context_id;
-} MemoryContextId;
+struct MemoryStatsBackendState *memCxtState = NULL;
+struct MemoryStatsCtl *memCxtArea = NULL;
/*
* int_list_to_array
@@ -89,7 +86,7 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
*/
for (MemoryContext cur = context; cur != NULL; cur = cur->parent)
{
- MemoryContextId *entry;
+ MemoryStatsContextId *entry;
bool found;
entry = hash_search(context_id_lookup, &cur, HASH_FIND, &found);
@@ -143,24 +140,7 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
else
nulls[1] = true;
- switch (context->type)
- {
- case T_AllocSetContext:
- type = "AllocSet";
- break;
- case T_GenerationContext:
- type = "Generation";
- break;
- case T_SlabContext:
- type = "Slab";
- break;
- case T_BumpContext:
- type = "Bump";
- break;
- default:
- type = "???";
- break;
- }
+ type = ContextTypeToString(context->type);
values[2] = CStringGetTextDatum(type);
values[3] = Int32GetDatum(list_length(path)); /* level */
@@ -175,6 +155,38 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
list_free(path);
}
+/*
+ * ContextTypeToString
+ * Returns a textual representation of a context type
+ *
+ * This should cover the same types as MemoryContextIsValid.
+ */
+const char *
+ContextTypeToString(NodeTag type)
+{
+ const char *context_type;
+
+ switch (type)
+ {
+ case T_AllocSetContext:
+ context_type = "AllocSet";
+ break;
+ case T_GenerationContext:
+ context_type = "Generation";
+ break;
+ case T_SlabContext:
+ context_type = "Slab";
+ break;
+ case T_BumpContext:
+ context_type = "Bump";
+ break;
+ default:
+ context_type = "???";
+ break;
+ }
+ return context_type;
+}
+
/*
* pg_get_backend_memory_contexts
* SQL SRF showing backend memory context.
@@ -189,7 +201,7 @@ pg_get_backend_memory_contexts(PG_FUNCTION_ARGS)
HTAB *context_id_lookup;
ctl.keysize = sizeof(MemoryContext);
- ctl.entrysize = sizeof(MemoryContextId);
+ ctl.entrysize = sizeof(MemoryStatsContextId);
ctl.hcxt = CurrentMemoryContext;
context_id_lookup = hash_create("pg_get_backend_memory_contexts",
@@ -216,7 +228,7 @@ pg_get_backend_memory_contexts(PG_FUNCTION_ARGS)
foreach_ptr(MemoryContextData, cur, contexts)
{
- MemoryContextId *entry;
+ MemoryStatsContextId *entry;
bool found;
/*
@@ -224,8 +236,8 @@ pg_get_backend_memory_contexts(PG_FUNCTION_ARGS)
* PutMemoryContextsStatsTupleStore needs this to populate the "path"
* column with the parent context_ids.
*/
- entry = (MemoryContextId *) hash_search(context_id_lookup, &cur,
- HASH_ENTER, &found);
+ entry = (MemoryStatsContextId *) hash_search(context_id_lookup, &cur,
+ HASH_ENTER, &found);
entry->context_id = context_id++;
Assert(!found);
@@ -305,3 +317,349 @@ pg_log_backend_memory_contexts(PG_FUNCTION_ARGS)
PG_RETURN_BOOL(true);
}
+
+/*
+ * pg_get_process_memory_contexts
+ * Signal a backend or an auxiliary process to send its memory contexts,
+ * wait for the results and display them.
+ *
+ * By default, only superusers or users with PG_READ_ALL_STATS are allowed to
+ * signal a process to return the memory contexts. This is because allowing
+ * any users to issue this request at an unbounded rate would cause lots of
+ * requests to be sent, which can lead to denial of service. Additional roles
+ * can be permitted with GRANT.
+ *
+ * On receipt of this signal, a backend or an auxiliary process sets the flag
+ * in the signal handler, which causes the next CHECK_FOR_INTERRUPTS()
+ * or process-specific interrupt handler to copy the memory context details
+ * to a dynamic shared memory space.
+ *
+ * We have defined a limit on DSA memory that could be allocated per process -
+ * if the process has more memory contexts than what can fit in the allocated
+ * size, the excess contexts are summarized and represented as cumulative total
+ * at the end of the buffer.
+ *
+ * After sending the signal, wait on a condition variable. The publishing
+ * backend, after copying the data to shared memory, sends signal on that
+ * condition variable. There is one condition variable per publishing backend.
+ * Once the condition variable is signalled, check if the latest memory context
+ * information is available and display.
+ *
+ * If the publishing backend does not respond before the condition variable
+ * times out, which is set to MEMSTATS_WAIT_TIMEOUT, retry given that there is
+ * time left within the timeout specified by the user, before giving up and
+ * returning previously published statistics, if any. If no previous statistics
+ * exist, return NULL.
+ */
+#define MEMSTATS_WAIT_TIMEOUT 100
+Datum
+pg_get_process_memory_contexts(PG_FUNCTION_ARGS)
+{
+ int pid = PG_GETARG_INT32(0);
+ bool summary = PG_GETARG_BOOL(1);
+ double timeout = PG_GETARG_FLOAT8(2);
+ PGPROC *proc;
+ ProcNumber procNumber = INVALID_PROC_NUMBER;
+ bool proc_is_aux = false;
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ MemoryStatsEntry *memcxt_info;
+ TimestampTz start_timestamp;
+
+ /*
+ * See if the process with given pid is a backend or an auxiliary process
+ * and remember the type for when we requery the process later.
+ */
+ proc = BackendPidGetProc(pid);
+ if (proc == NULL)
+ {
+ proc = AuxiliaryPidGetProc(pid);
+ proc_is_aux = true;
+ }
+
+ /*
+ * BackendPidGetProc() and AuxiliaryPidGetProc() return NULL if the pid
+ * isn't valid; this is however not a problem, so just leave with a WARNING.
+ * See comment in pg_log_backend_memory_contexts for a discussion on this.
+ */
+ if (proc == NULL)
+ {
+ /*
+ * This is just a warning so a loop-through-resultset will not abort
+ * if one backend terminated on its own during the run.
+ */
+ ereport(WARNING,
+ (errmsg("PID %d is not a PostgreSQL server process", pid)));
+ PG_RETURN_NULL();
+ }
+
+ InitMaterializedSRF(fcinfo, 0);
+
+ procNumber = GetNumberFromPGProc(proc);
+
+ LWLockAcquire(&memCxtState[procNumber].lw_lock, LW_EXCLUSIVE);
+ memCxtState[procNumber].summary = summary;
+ LWLockRelease(&memCxtState[procNumber].lw_lock);
+
+ start_timestamp = GetCurrentTimestamp();
+
+ /*
+ * Send a signal to a PostgreSQL process, informing it we want it to
+ * produce information about its memory contexts.
+ */
+ if (SendProcSignal(pid, PROCSIG_GET_MEMORY_CONTEXT, procNumber) < 0)
+ {
+ ereport(WARNING,
+ errmsg("could not send signal to process %d: %m", pid));
+ PG_RETURN_NULL();
+ }
+
+ /*
+ * Even if the proc has published statistics, they may not be due to the
+ * current request, but previously published stats. Check if the stats
+ * are updated by comparing the timestamp: if the stats are newer than our
+ * previously recorded timestamp from before sending the procsignal, they
+ * must by definition be updated. Wait for the timeout specified by the
+ * user, following which display old statistics if available or return
+ * NULL.
+ */
+ while (1)
+ {
+ long msecs;
+
+ /*
+ * We expect to come out of sleep when the requested process has
+ * finished publishing the statistics, verified using the valid DSA
+ * pointer.
+ *
+ * Make sure that the information belongs to the pid we requested
+ * information for; otherwise loop back and wait for the server
+ * process to finish publishing statistics.
+ */
+ LWLockAcquire(&memCxtState[procNumber].lw_lock, LW_EXCLUSIVE);
+
+ /*
+ * A note in procnumber.h says that a procNumber can be re-used for
+ * a different backend immediately after a backend exits. In case an
+ * old process' data was there and not updated by the current process
+ * in the slot identified by the procNumber, the pid of the requested
+ * process and the proc_id might not match.
+ */
+ if (memCxtState[procNumber].proc_id == pid)
+ {
+ /*
+ * Break if the latest stats have been read, indicated by
+ * statistics timestamp being newer than the current request
+ * timestamp.
+ */
+ msecs = TimestampDifferenceMilliseconds(start_timestamp,
+ memCxtState[procNumber].stats_timestamp);
+
+ if (DsaPointerIsValid(memCxtState[procNumber].memstats_dsa_pointer)
+ && msecs > 0)
+ break;
+ }
+ LWLockRelease(&memCxtState[procNumber].lw_lock);
+
+ /*
+ * Recheck the state of the backend before sleeping on the condition
+ * variable to ensure the process is still alive. Only check the
+ * relevant process type based on the earlier PID check.
+ */
+ if (proc_is_aux)
+ proc = AuxiliaryPidGetProc(pid);
+ else
+ proc = BackendPidGetProc(pid);
+
+ /*
+ * The process ending during memory context processing is not an
+ * error.
+ */
+ if (proc == NULL)
+ {
+ ereport(WARNING,
+ errmsg("PID %d is no longer a PostgreSQL server process",
+ pid));
+ PG_RETURN_NULL();
+ }
+
+ msecs = TimestampDifferenceMilliseconds(start_timestamp, GetCurrentTimestamp());
+
+ /*
+ * If we haven't already exceeded the timeout value, sleep for the
+ * remainder of the timeout on the condition variable.
+ */
+ if (msecs > 0 && msecs < (timeout * 1000))
+ {
+ /*
+ * Wait for the timeout as defined by the user. If no updated
+ * statistics are available within the allowed time then display
+ * previously published statistics if there are any. If no
+ * previous statistics are available then return NULL. The timer
+ * is defined in milliseconds since that's what the condition
+ * variable sleep uses.
+ */
+ if (ConditionVariableTimedSleep(&memCxtState[procNumber].memcxt_cv,
+ ((timeout * 1000) - msecs), WAIT_EVENT_MEM_CXT_PUBLISH))
+ {
+ LWLockAcquire(&memCxtState[procNumber].lw_lock, LW_EXCLUSIVE);
+ /* Displaying previously published statistics if available */
+ if (DsaPointerIsValid(memCxtState[procNumber].memstats_dsa_pointer))
+ break;
+ else
+ {
+ LWLockRelease(&memCxtState[procNumber].lw_lock);
+ PG_RETURN_NULL();
+ }
+ }
+ }
+ else
+ {
+ LWLockAcquire(&memCxtState[procNumber].lw_lock, LW_EXCLUSIVE);
+ /* Displaying previously published statistics if available */
+ if (DsaPointerIsValid(memCxtState[procNumber].memstats_dsa_pointer))
+ break;
+ else
+ {
+ LWLockRelease(&memCxtState[procNumber].lw_lock);
+ PG_RETURN_NULL();
+ }
+ }
+ }
+
+ /*
+ * We should only reach here with a valid DSA handle, either containing
+ * updated statistics or previously published statistics (identified by
+ * the timestamp).
+ */
+ Assert(memCxtArea->memstats_dsa_handle != DSA_HANDLE_INVALID);
+ /* Attach to the dsa area if we have not already done so */
+ if (area == NULL)
+ {
+ MemoryContext oldcontext = CurrentMemoryContext;
+
+ MemoryContextSwitchTo(TopMemoryContext);
+ area = dsa_attach(memCxtArea->memstats_dsa_handle);
+ MemoryContextSwitchTo(oldcontext);
+ dsa_pin_mapping(area);
+ }
+
+ /*
+ * Backend has finished publishing the stats, project them.
+ */
+ memcxt_info = (MemoryStatsEntry *)
+ dsa_get_address(area, memCxtState[procNumber].memstats_dsa_pointer);
+
+#define PG_GET_PROCESS_MEMORY_CONTEXTS_COLS 12
+ for (int i = 0; i < memCxtState[procNumber].total_stats; i++)
+ {
+ ArrayType *path_array;
+ int path_length;
+ Datum values[PG_GET_PROCESS_MEMORY_CONTEXTS_COLS];
+ bool nulls[PG_GET_PROCESS_MEMORY_CONTEXTS_COLS];
+ char *name;
+ char *ident;
+ Datum *path_datum = NULL;
+ int *path_int = NULL;
+
+ memset(values, 0, sizeof(values));
+ memset(nulls, 0, sizeof(nulls));
+
+ if (DsaPointerIsValid(memcxt_info[i].name))
+ {
+ name = (char *) dsa_get_address(area, memcxt_info[i].name);
+ values[0] = CStringGetTextDatum(name);
+ }
+ else
+ nulls[0] = true;
+
+ if (DsaPointerIsValid(memcxt_info[i].ident))
+ {
+ ident = (char *) dsa_get_address(area, memcxt_info[i].ident);
+ values[1] = CStringGetTextDatum(ident);
+ }
+ else
+ nulls[1] = true;
+
+ values[2] = CStringGetTextDatum(ContextTypeToString(memcxt_info[i].type));
+
+ path_length = memcxt_info[i].path_length;
+ path_datum = (Datum *) palloc(path_length * sizeof(Datum));
+ if (DsaPointerIsValid(memcxt_info[i].path))
+ {
+ path_int = (int *) dsa_get_address(area, memcxt_info[i].path);
+ for (int j = 0; j < path_length; j++)
+ path_datum[j] = Int32GetDatum(path_int[j]);
+ path_array = construct_array_builtin(path_datum, path_length, INT4OID);
+ values[3] = PointerGetDatum(path_array);
+ }
+ else
+ nulls[3] = true;
+
+ values[4] = Int32GetDatum(memcxt_info[i].levels);
+ values[5] = Int64GetDatum(memcxt_info[i].totalspace);
+ values[6] = Int64GetDatum(memcxt_info[i].nblocks);
+ values[7] = Int64GetDatum(memcxt_info[i].freespace);
+ values[8] = Int64GetDatum(memcxt_info[i].freechunks);
+ values[9] = Int64GetDatum(memcxt_info[i].totalspace -
+ memcxt_info[i].freespace);
+ values[10] = Int32GetDatum(memcxt_info[i].num_agg_stats);
+ values[11] = TimestampTzGetDatum(memCxtState[procNumber].stats_timestamp);
+
+ tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc,
+ values, nulls);
+ }
+ LWLockRelease(&memCxtState[procNumber].lw_lock);
+
+ ConditionVariableCancelSleep();
+
+ PG_RETURN_NULL();
+}
+
+Size
+MemoryContextReportingShmemSize(void)
+{
+ Size sz = 0;
+ Size TotalProcs = 0;
+
+ TotalProcs = add_size(TotalProcs, NUM_AUXILIARY_PROCS);
+ TotalProcs = add_size(TotalProcs, MaxBackends);
+ sz = add_size(sz, mul_size(TotalProcs, sizeof(MemoryStatsBackendState)));
+
+ sz = add_size(sz, sizeof(MemoryStatsCtl));
+
+ return sz;
+}
+
+/*
+ * Initialize shared memory for displaying memory context statistics
+ */
+void
+MemoryContextReportingShmemInit(void)
+{
+ bool found;
+
+ memCxtArea = (MemoryStatsCtl *)
+ ShmemInitStruct("MemoryStatsCtl",
+ sizeof(MemoryStatsCtl), &found);
+
+ if (found)
+ return;
+
+ LWLockInitialize(&memCxtArea->lw_lock, LWTRANCHE_MEMORY_CONTEXT_REPORTING_STATE);
+ memCxtArea->memstats_dsa_handle = DSA_HANDLE_INVALID;
+
+ memCxtState = (MemoryStatsBackendState *)
+ ShmemInitStruct("MemoryStatsBackendState",
+ ((MaxBackends + NUM_AUXILIARY_PROCS) * sizeof(MemoryStatsBackendState)),
+ &found);
+
+ if (found)
+ return;
+
+ for (int i = 0; i < (MaxBackends + NUM_AUXILIARY_PROCS); i++)
+ {
+ ConditionVariableInit(&memCxtState[i].memcxt_cv);
+ LWLockInitialize(&memCxtState[i].lw_lock, LWTRANCHE_MEMORY_CONTEXT_REPORTING_PROC);
+ memCxtState[i].memstats_dsa_pointer = InvalidDsaPointer;
+ }
+}
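
For readers following the wait loop in pg_get_process_memory_contexts()
above, the remaining-time check can be looked at in isolation. This is a
minimal standalone sketch, not code from the patch: remaining_wait_ms() is
a hypothetical helper name, and it only mirrors the condition
"msecs > 0 && msecs < (timeout * 1000)" with plain C types instead of
TimestampTz.

```c
#include <assert.h>

/*
 * Hypothetical helper, not part of the patch: given the user-supplied
 * timeout in seconds and the milliseconds elapsed since the request was
 * sent, return how many milliseconds the client may still sleep, or -1
 * when it should stop waiting and fall back to previously published
 * statistics (if any).
 */
static long
remaining_wait_ms(double timeout_s, long elapsed_ms)
{
	double		budget_ms = timeout_s * 1000.0;

	assert(timeout_s >= 0.0);

	/* Mirrors the patch's test: msecs > 0 && msecs < (timeout * 1000) */
	if (elapsed_ms > 0 && elapsed_ms < budget_ms)
		return (long) (budget_ms - elapsed_ms);

	return -1;
}
```

Note that, as in the patch, an elapsed time of exactly 0 ms takes the
"give up" branch rather than sleeping; whether that is intended is worth
confirming in review.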
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index 2152aad97d9..92304a1f124 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -39,6 +39,7 @@ volatile sig_atomic_t TransactionTimeoutPending = false;
volatile sig_atomic_t IdleSessionTimeoutPending = false;
volatile sig_atomic_t ProcSignalBarrierPending = false;
volatile sig_atomic_t LogMemoryContextPending = false;
+volatile sig_atomic_t PublishMemoryContextPending = false;
volatile sig_atomic_t IdleStatsUpdateTimeoutPending = false;
volatile uint32 InterruptHoldoffCount = 0;
volatile uint32 QueryCancelHoldoffCount = 0;
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index c09c4d404ba..01309ef3f86 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -667,6 +667,13 @@ BaseInit(void)
* drop ephemeral slots, which in turn triggers stats reporting.
*/
ReplicationSlotInitialize();
+
+ /*
+ * The before shmem exit callback frees the DSA memory occupied by the
+ * latest memory context statistics that could be published by this proc
+ * if requested.
+ */
+ before_shmem_exit(AtProcExit_memstats_cleanup, 0);
}
diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index d98ae9db6be..f3a588e1375 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -23,6 +23,11 @@
#include "mb/pg_wchar.h"
#include "miscadmin.h"
+#include "nodes/pg_list.h"
+#include "storage/lwlock.h"
+#include "storage/ipc.h"
+#include "utils/dsa.h"
+#include "utils/hsearch.h"
#include "utils/memdebug.h"
#include "utils/memutils.h"
#include "utils/memutils_internal.h"
@@ -135,6 +140,17 @@ static const MemoryContextMethods mcxt_methods[] = {
};
#undef BOGUS_MCTX
+/*
+ * This is passed to MemoryContextStatsInternal to determine whether
+ * to print context statistics or not, and where to print them: logs or
+ * stderr.
+ */
+typedef enum PrintDestination
+{
+ PRINT_STATS_TO_STDERR = 0,
+ PRINT_STATS_TO_LOGS,
+ PRINT_STATS_NONE
+} PrintDestination;
/*
* CurrentMemoryContext
@@ -156,16 +172,31 @@ MemoryContext CurTransactionContext = NULL;
/* This is a transient link to the active portal's memory context: */
MemoryContext PortalContext = NULL;
+dsa_area *area = NULL;
static void MemoryContextDeleteOnly(MemoryContext context);
static void MemoryContextCallResetCallbacks(MemoryContext context);
static void MemoryContextStatsInternal(MemoryContext context, int level,
int max_level, int max_children,
MemoryContextCounters *totals,
- bool print_to_stderr);
+ PrintDestination print_location,
+ int *num_contexts);
static void MemoryContextStatsPrint(MemoryContext context, void *passthru,
const char *stats_string,
bool print_to_stderr);
+static void PublishMemoryContext(MemoryStatsEntry *memcxt_infos,
+ int curr_id, MemoryContext context,
+ List *path,
+ MemoryContextCounters stat,
+ int num_contexts, dsa_area *area,
+ int max_levels);
+static void compute_contexts_count_and_ids(List *contexts, HTAB *context_id_lookup,
+ int *stats_count,
+ bool summary);
+static List *compute_context_path(MemoryContext c, HTAB *context_id_lookup);
+static void free_memorycontextstate_dsa(dsa_area *area, int total_stats,
+ dsa_pointer prev_dsa_pointer);
+static void end_memorycontext_reporting(void);
/*
* You should not do memory allocations within a critical section, because
@@ -831,11 +862,19 @@ MemoryContextStatsDetail(MemoryContext context,
bool print_to_stderr)
{
MemoryContextCounters grand_totals;
+ int num_contexts = 0;
+ PrintDestination print_location;
memset(&grand_totals, 0, sizeof(grand_totals));
+ if (print_to_stderr)
+ print_location = PRINT_STATS_TO_STDERR;
+ else
+ print_location = PRINT_STATS_TO_LOGS;
+
+ /* num_contexts reports the number of contexts aggregated in the output */
MemoryContextStatsInternal(context, 0, max_level, max_children,
- &grand_totals, print_to_stderr);
+ &grand_totals, print_location, &num_contexts);
if (print_to_stderr)
fprintf(stderr,
@@ -870,13 +909,14 @@ MemoryContextStatsDetail(MemoryContext context,
* One recursion level for MemoryContextStats
*
* Print stats for this context if possible, but in any case accumulate counts
- * into *totals (if not NULL).
+ * into *totals (if not NULL). The callers should make sure that print_location
+ * is set to PRINT_STATS_TO_STDERR or PRINT_STATS_TO_LOGS or PRINT_STATS_NONE.
*/
static void
MemoryContextStatsInternal(MemoryContext context, int level,
int max_level, int max_children,
MemoryContextCounters *totals,
- bool print_to_stderr)
+ PrintDestination print_location, int *num_contexts)
{
MemoryContext child;
int ichild;
@@ -884,10 +924,39 @@ MemoryContextStatsInternal(MemoryContext context, int level,
Assert(MemoryContextIsValid(context));
/* Examine the context itself */
- context->methods->stats(context,
- MemoryContextStatsPrint,
- &level,
- totals, print_to_stderr);
+ switch (print_location)
+ {
+ case PRINT_STATS_TO_STDERR:
+ context->methods->stats(context,
+ MemoryContextStatsPrint,
+ &level,
+ totals, true);
+ break;
+
+ case PRINT_STATS_TO_LOGS:
+ context->methods->stats(context,
+ MemoryContextStatsPrint,
+ &level,
+ totals, false);
+ break;
+
+ case PRINT_STATS_NONE:
+
+ /*
+ * Do not print the statistics if print_location is
+ * PRINT_STATS_NONE, only compute totals. This is used in
+ * reporting of memory context statistics via an SQL function. Last
+ * parameter is not relevant.
+ */
+ context->methods->stats(context,
+ NULL,
+ NULL,
+ totals, false);
+ break;
+ }
+
+ /* Increment the context count for each recursive call */
+ *num_contexts = *num_contexts + 1;
/*
* Examine children.
@@ -907,7 +976,7 @@ MemoryContextStatsInternal(MemoryContext context, int level,
MemoryContextStatsInternal(child, level + 1,
max_level, max_children,
totals,
- print_to_stderr);
+ print_location, num_contexts);
}
}
@@ -926,7 +995,13 @@ MemoryContextStatsInternal(MemoryContext context, int level,
child = MemoryContextTraverseNext(child, context);
}
- if (print_to_stderr)
+ /*
+ * Add the count of child contexts which are traversed in a
+ * non-recursive manner.
+ */
+ *num_contexts = *num_contexts + ichild;
+
+ if (print_location == PRINT_STATS_TO_STDERR)
{
for (int i = 0; i <= level; i++)
fprintf(stderr, " ");
@@ -939,7 +1014,7 @@ MemoryContextStatsInternal(MemoryContext context, int level,
local_totals.freechunks,
local_totals.totalspace - local_totals.freespace);
}
- else
+ else if (print_location == PRINT_STATS_TO_LOGS)
ereport(LOG_SERVER_ONLY,
(errhidestmt(true),
errhidecontext(true),
@@ -1276,6 +1351,22 @@ HandleLogMemoryContextInterrupt(void)
/* latch will be set by procsignal_sigusr1_handler */
}
+/*
+ * HandleGetMemoryContextInterrupt
+ * Handle receipt of an interrupt indicating a request to publish memory
+ * contexts statistics.
+ *
+ * All the actual work is deferred to ProcessGetMemoryContextInterrupt() as
+ * this cannot be performed in a signal handler.
+ */
+void
+HandleGetMemoryContextInterrupt(void)
+{
+ InterruptPending = true;
+ PublishMemoryContextPending = true;
+ /* latch will be set by procsignal_sigusr1_handler */
+}
+
/*
* ProcessLogMemoryContextInterrupt
* Perform logging of memory contexts of this backend process.
@@ -1313,6 +1404,537 @@ ProcessLogMemoryContextInterrupt(void)
MemoryContextStatsDetail(TopMemoryContext, 100, 100, false);
}
+/*
+ * ProcessGetMemoryContextInterrupt
+ * Generate information about memory contexts used by the process.
+ *
+ * Performs a breadth first search on the memory context tree, so a parent's
+ * statistics are reported before its children's in the monitoring function
+ * output.
+ *
+ * Statistics for all the processes are shared via the same dynamic shared
+ * area. Statistics written by each process are tracked independently in
+ * per-process DSA pointers. These pointers are stored in static shared memory.
+ *
+ * We calculate the maximum number of contexts' statistics that can be
+ * displayed using a pre-determined limit for memory available per process
+ * for this utility and the maximum size of statistics for each context. The
+ * remaining context statistics, if any, are captured as a cumulative total at
+ * the end of the individual contexts' statistics.
+ *
+ * If summary is true, we capture the level 1 and level 2 contexts
+ * statistics. For that we traverse the memory context tree recursively in
+ * depth first search manner to cover all the children of a parent context, to
+ * be able to display a cumulative total of memory consumption by a parent at
+ * level 2 and all its children.
+ */
+void
+ProcessGetMemoryContextInterrupt(void)
+{
+ List *contexts;
+ HASHCTL ctl;
+ HTAB *context_id_lookup;
+ int context_id = 0;
+ MemoryStatsEntry *meminfo;
+ bool summary = false;
+ int max_stats;
+ int idx = MyProcNumber;
+ int stats_count = 0;
+ int stats_num = 0;
+ MemoryContextCounters stat;
+ int num_individual_stats = 0;
+
+ PublishMemoryContextPending = false;
+
+ /*
+ * The hash table is used for constructing "path" column of the view,
+ * similar to its local backend counterpart.
+ */
+ ctl.keysize = sizeof(MemoryContext);
+ ctl.entrysize = sizeof(MemoryStatsContextId);
+ ctl.hcxt = CurrentMemoryContext;
+
+ context_id_lookup = hash_create("pg_get_remote_backend_memory_contexts",
+ 256,
+ &ctl,
+ HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+
+ /* List of contexts to process in the next round - start at the top. */
+ contexts = list_make1(TopMemoryContext);
+
+ /* Compute the number of stats that can fit in the defined limit */
+ max_stats =
+ MEMORY_CONTEXT_REPORT_MAX_PER_BACKEND / MAX_MEMORY_CONTEXT_STATS_SIZE;
+ LWLockAcquire(&memCxtState[idx].lw_lock, LW_EXCLUSIVE);
+ summary = memCxtState[idx].summary;
+ LWLockRelease(&memCxtState[idx].lw_lock);
+
+ /*
+ * Traverse the memory context tree to find total number of contexts. If
+ * summary is requested report the total number of contexts at level 1 and
+ * 2 from the top. Also, populate the hash table of context ids.
+ */
+ compute_contexts_count_and_ids(contexts, context_id_lookup, &stats_count,
+ summary);
+
+ /*
+ * Allocate memory in this process's DSA for storing statistics of the
+ * memory contexts up to max_stats. For contexts that don't fit within the
+ * limit, a cumulative total is written as the last record in the DSA
+ * segment.
+ */
+ stats_num = Min(stats_count, max_stats);
+
+ LWLockAcquire(&memCxtArea->lw_lock, LW_EXCLUSIVE);
+
+ /*
+ * Create a DSA and send the handle to the client process after storing
+ * the context statistics. If the number of contexts exceeds a predefined
+ * limit (8MB), a cumulative total is stored for such contexts.
+ */
+ if (memCxtArea->memstats_dsa_handle == DSA_HANDLE_INVALID)
+ {
+ MemoryContext oldcontext = CurrentMemoryContext;
+ dsa_handle handle;
+
+ MemoryContextSwitchTo(TopMemoryContext);
+
+ area = dsa_create(memCxtArea->lw_lock.tranche);
+
+ handle = dsa_get_handle(area);
+ MemoryContextSwitchTo(oldcontext);
+
+ dsa_pin_mapping(area);
+
+ /*
+ * Pin the DSA area, this is to make sure the area remains attachable
+ * even if current backend exits. This is done so that the statistics
+ * are published even if the process exits while a client is waiting.
+ */
+ dsa_pin(area);
+
+ /* Set the handle in shared memory */
+ memCxtArea->memstats_dsa_handle = handle;
+ }
+
+ /*
+ * If DSA exists, created by another process publishing statistics, attach
+ * to it.
+ */
+ else if (area == NULL)
+ {
+ MemoryContext oldcontext = CurrentMemoryContext;
+
+ MemoryContextSwitchTo(TopMemoryContext);
+ area = dsa_attach(memCxtArea->memstats_dsa_handle);
+ MemoryContextSwitchTo(oldcontext);
+ dsa_pin_mapping(area);
+ }
+ LWLockRelease(&memCxtArea->lw_lock);
+
+ /*
+ * Hold the process lock to protect writes to process specific memory. Two
+ * processes publishing statistics do not block each other.
+ */
+ LWLockAcquire(&memCxtState[idx].lw_lock, LW_EXCLUSIVE);
+ memCxtState[idx].proc_id = MyProcPid;
+
+ if (DsaPointerIsValid(memCxtState[idx].memstats_dsa_pointer))
+ {
+ /*
+ * Free any previous allocations, free the name, ident and path
+ * pointers before freeing the pointer that contains them.
+ */
+ free_memorycontextstate_dsa(area, memCxtState[idx].total_stats,
+ memCxtState[idx].memstats_dsa_pointer);
+ }
+ /*
+ * Assigning total stats before allocating memory so that memory cleanup
+ * can run if any subsequent dsa_allocate call to allocate name/ident/path
+ * fails.
+ */
+ memCxtState[idx].total_stats = stats_num;
+ memCxtState[idx].memstats_dsa_pointer =
+ dsa_allocate0(area, stats_num * sizeof(MemoryStatsEntry));
+
+ meminfo = (MemoryStatsEntry *)
+ dsa_get_address(area, memCxtState[idx].memstats_dsa_pointer);
+
+ if (summary)
+ {
+ int cxt_id = 0;
+ List *path = NIL;
+
+ /* Copy TopMemoryContext statistics to DSA */
+ memset(&stat, 0, sizeof(stat));
+ (*TopMemoryContext->methods->stats) (TopMemoryContext, NULL, NULL,
+ &stat, true);
+ path = lcons_int(1, path);
+ PublishMemoryContext(meminfo, cxt_id, TopMemoryContext, path, stat,
+ 1, area, 100);
+ cxt_id = cxt_id + 1;
+
+ /*
+ * Copy statistics for each of TopMemoryContext's children. This
+ * includes statistics of at most 100 children per node, with each
+ * child node limited to a depth of 100 in its subtree.
+ */
+ for (MemoryContext c = TopMemoryContext->firstchild; c != NULL;
+ c = c->nextchild)
+ {
+ MemoryContextCounters grand_totals;
+ int num_contexts = 0;
+ int level = 0;
+
+ path = NIL;
+ memset(&grand_totals, 0, sizeof(grand_totals));
+
+ MemoryContextStatsInternal(c, level, 100, 100, &grand_totals,
+ PRINT_STATS_NONE, &num_contexts);
+
+ path = compute_context_path(c, context_id_lookup);
+
+ /*
+ * Register the stats entry first, that way the cleanup handler
+ * can reach it in case of allocation failures of one or more
+ * members.
+ */
+ memCxtState[idx].total_stats = cxt_id + 1;
+ PublishMemoryContext(meminfo, cxt_id++, c, path,
+ grand_totals, num_contexts, area, 100);
+ }
+ memCxtState[idx].total_stats = cxt_id;
+
+ /* Notify waiting backends and return */
+ end_memorycontext_reporting();
+
+ hash_destroy(context_id_lookup);
+
+ return;
+ }
+
+ foreach_ptr(MemoryContextData, cur, contexts)
+ {
+ List *path = NIL;
+
+ /*
+ * Figure out the transient context_id of this context and each of its
+ * ancestors, to compute a path for this context.
+ */
+ path = compute_context_path(cur, context_id_lookup);
+
+ /* Examine the context stats */
+ memset(&stat, 0, sizeof(stat));
+ (*cur->methods->stats) (cur, NULL, NULL, &stat, true);
+
+ /* Account for saving one statistics slot for cumulative reporting */
+ if (context_id < (max_stats - 1) || stats_count <= max_stats)
+ {
+ /* Copy statistics to DSA memory */
+ PublishMemoryContext(meminfo, context_id, cur, path, stat, 1, area, 100);
+ }
+ else
+ {
+ meminfo[max_stats - 1].totalspace += stat.totalspace;
+ meminfo[max_stats - 1].nblocks += stat.nblocks;
+ meminfo[max_stats - 1].freespace += stat.freespace;
+ meminfo[max_stats - 1].freechunks += stat.freechunks;
+ }
+
+ /*
+ * DSA max limit per process is reached, write aggregate of the
+ * remaining statistics.
+ *
+ * We can store contexts from 0 to max_stats - 1. When stats_count is
+ * greater than max_stats, we stop reporting individual statistics
+ * when context_id equals max_stats - 2, because the array slot at
+ * max_stats - 1 is used for the cumulative "Remaining Totals" entry.
+ */
+ if (stats_count > max_stats && context_id == (max_stats - 2))
+ {
+ char *nameptr;
+ int namelen = strlen("Remaining Totals");
+
+ num_individual_stats = context_id + 1;
+ meminfo[max_stats - 1].name = dsa_allocate(area, namelen + 1);
+ nameptr = dsa_get_address(area, meminfo[max_stats - 1].name);
+ strlcpy(nameptr, "Remaining Totals", namelen + 1);
+ meminfo[max_stats - 1].ident = InvalidDsaPointer;
+ meminfo[max_stats - 1].path = InvalidDsaPointer;
+ meminfo[max_stats - 1].type = 0;
+ }
+ context_id++;
+ }
+
+ /*
+ * Statistics are not aggregated, i.e. individual statistics are reported,
+ * when stats_count <= max_stats.
+ */
+ if (stats_count <= max_stats)
+ {
+ memCxtState[idx].total_stats = context_id;
+ }
+ /* Report number of aggregated memory contexts */
+ else
+ {
+ meminfo[max_stats - 1].num_agg_stats = context_id -
+ num_individual_stats;
+
+ /*
+ * Total stats equals num_individual_stats + 1 record for cumulative
+ * statistics.
+ */
+ memCxtState[idx].total_stats = num_individual_stats + 1;
+ }
+
+ /* Notify waiting backends and return */
+ end_memorycontext_reporting();
+
+ hash_destroy(context_id_lookup);
+}
+
+/*
+ * Update timestamp and signal all the waiting client backends after copying
+ * all the statistics.
+ */
+static void
+end_memorycontext_reporting(void)
+{
+ memCxtState[MyProcNumber].stats_timestamp = GetCurrentTimestamp();
+ LWLockRelease(&memCxtState[MyProcNumber].lw_lock);
+ ConditionVariableBroadcast(&memCxtState[MyProcNumber].memcxt_cv);
+}
+
+/*
+ * compute_context_path
+ *
+ * Append the transient context_id of this context and each of its ancestors
+ * to a list, in order to compute a path.
+ */
+static List *
+compute_context_path(MemoryContext c, HTAB *context_id_lookup)
+{
+ bool found;
+ List *path = NIL;
+ MemoryContext cur_context;
+
+ for (cur_context = c; cur_context != NULL; cur_context = cur_context->parent)
+ {
+ MemoryStatsContextId *cur_entry;
+
+ cur_entry = hash_search(context_id_lookup, &cur_context, HASH_FIND, &found);
+
+ if (!found)
+ elog(ERROR, "hash table corrupted, can't construct path value");
+
+ path = lcons_int(cur_entry->context_id, path);
+ }
+
+ return path;
+}
+
+/*
+ * Compute the number of contexts currently allocated by the backend and
+ * assign a context id to each of them.
+ */
+static void
+compute_contexts_count_and_ids(List *contexts, HTAB *context_id_lookup,
+ int *stats_count, bool summary)
+{
+ foreach_ptr(MemoryContextData, cur, contexts)
+ {
+ MemoryStatsContextId *entry;
+ bool found;
+
+ entry = (MemoryStatsContextId *) hash_search(context_id_lookup, &cur,
+ HASH_ENTER, &found);
+ Assert(!found);
+
+ /*
+ * context id starts with 1 so increment the stats_count
+ * before assigning
+ */
+ entry->context_id = ++(*stats_count);
+
+ /* Append the children of the current context to the main list. */
+ for (MemoryContext c = cur->firstchild; c != NULL; c = c->nextchild)
+ {
+ if (summary)
+ {
+ entry = (MemoryStatsContextId *) hash_search(context_id_lookup, &c,
+ HASH_ENTER, &found);
+ Assert(!found);
+
+ entry->context_id = ++(*stats_count);
+ }
+
+ contexts = lappend(contexts, c);
+ }
+
+ /*
+ * In summary mode only contexts at the first two levels (from the top)
+ * are displayed.
+ */
+ if (summary)
+ break;
+ }
+}
+
+/*
+ * PublishMemoryContext
+ *
+ * Copy the memory context statistics of a single context to DSA memory
+ */
+static void
+PublishMemoryContext(MemoryStatsEntry *memcxt_info, int curr_id,
+ MemoryContext context, List *path,
+ MemoryContextCounters stat, int num_contexts,
+ dsa_area *area, int max_levels)
+{
+ const char *ident = context->ident;
+ const char *name = context->name;
+ int *path_list;
+
+ /*
+ * To be consistent with logging output, we label dynahash contexts with
+ * just the hash table name as with MemoryContextStatsPrint().
+ */
+ if (context->ident && strncmp(context->name, "dynahash", 8) == 0)
+ {
+ name = context->ident;
+ ident = NULL;
+ }
+
+ if (name != NULL)
+ {
+ int namelen = strlen(name);
+ char *nameptr;
+
+ if (strlen(name) >= MEMORY_CONTEXT_IDENT_SHMEM_SIZE)
+ namelen = pg_mbcliplen(name, namelen,
+ MEMORY_CONTEXT_IDENT_SHMEM_SIZE - 1);
+
+ memcxt_info[curr_id].name = dsa_allocate(area, namelen + 1);
+ nameptr = (char *) dsa_get_address(area, memcxt_info[curr_id].name);
+ strlcpy(nameptr, name, namelen + 1);
+ }
+ else
+ memcxt_info[curr_id].name = InvalidDsaPointer;
+
+ /* Trim and copy the identifier if it is not set to NULL */
+ if (ident != NULL)
+ {
+ int idlen = strlen(ident);
+ char *identptr;
+
+ /*
+ * Some identifiers such as SQL query string can be very long,
+ * truncate oversize identifiers.
+ */
+ if (idlen >= MEMORY_CONTEXT_IDENT_SHMEM_SIZE)
+ idlen = pg_mbcliplen(ident, idlen,
+ MEMORY_CONTEXT_IDENT_SHMEM_SIZE - 1);
+
+ memcxt_info[curr_id].ident = dsa_allocate(area, idlen + 1);
+ identptr = (char *) dsa_get_address(area, memcxt_info[curr_id].ident);
+ strlcpy(identptr, ident, idlen + 1);
+ }
+ else
+ memcxt_info[curr_id].ident = InvalidDsaPointer;
+
+ /* Allocate DSA memory for storing path information */
+ if (path == NIL)
+ memcxt_info[curr_id].path = InvalidDsaPointer;
+ else
+ {
+ int levels = Min(list_length(path), max_levels);
+
+ memcxt_info[curr_id].path_length = levels;
+ memcxt_info[curr_id].path = dsa_allocate0(area, levels * sizeof(int));
+ memcxt_info[curr_id].levels = list_length(path);
+ path_list = (int *) dsa_get_address(area, memcxt_info[curr_id].path);
+
+ foreach_int(i, path)
+ {
+ path_list[foreach_current_index(i)] = i;
+ if (--levels == 0)
+ break;
+ }
+ }
+ memcxt_info[curr_id].type = context->type;
+ memcxt_info[curr_id].totalspace = stat.totalspace;
+ memcxt_info[curr_id].nblocks = stat.nblocks;
+ memcxt_info[curr_id].freespace = stat.freespace;
+ memcxt_info[curr_id].freechunks = stat.freechunks;
+ memcxt_info[curr_id].num_agg_stats = num_contexts;
+}
+
+/*
+ * free_memorycontextstate_dsa
+ *
+ * Worker for freeing resources from a MemoryStatsEntry. Callers are
+ * responsible for ensuring that the DSA pointer is valid.
+ */
+static void
+free_memorycontextstate_dsa(dsa_area *area, int total_stats,
+ dsa_pointer prev_dsa_pointer)
+{
+ MemoryStatsEntry *meminfo;
+
+ meminfo = (MemoryStatsEntry *) dsa_get_address(area, prev_dsa_pointer);
+ Assert(meminfo != NULL);
+ for (int i = 0; i < total_stats; i++)
+ {
+ if (DsaPointerIsValid(meminfo[i].name))
+ dsa_free(area, meminfo[i].name);
+
+ if (DsaPointerIsValid(meminfo[i].ident))
+ dsa_free(area, meminfo[i].ident);
+
+ if (DsaPointerIsValid(meminfo[i].path))
+ dsa_free(area, meminfo[i].path);
+ }
+
+ dsa_free(area, memCxtState[MyProcNumber].memstats_dsa_pointer);
+ memCxtState[MyProcNumber].memstats_dsa_pointer = InvalidDsaPointer;
+}
+
+/*
+ * Free the memory context statistics stored by this process
+ * in DSA area.
+ */
+void
+AtProcExit_memstats_cleanup(int code, Datum arg)
+{
+ int idx = MyProcNumber;
+
+ if (memCxtArea->memstats_dsa_handle == DSA_HANDLE_INVALID)
+ return;
+
+ LWLockAcquire(&memCxtState[idx].lw_lock, LW_EXCLUSIVE);
+
+ if (!DsaPointerIsValid(memCxtState[idx].memstats_dsa_pointer))
+ {
+ LWLockRelease(&memCxtState[idx].lw_lock);
+ return;
+ }
+
+ /* If the dsa mapping could not be found, attach to the area */
+ if (area == NULL)
+ area = dsa_attach(memCxtArea->memstats_dsa_handle);
+
+ /*
+ * Free the memory context statistics, free the name, ident and path
+ * pointers before freeing the pointer that contains these pointers and
+ * integer statistics.
+ */
+ free_memorycontextstate_dsa(area, memCxtState[idx].total_stats,
+ memCxtState[idx].memstats_dsa_pointer);
+
+ dsa_detach(area);
+ LWLockRelease(&memCxtState[idx].lw_lock);
+}
+
void *
palloc(Size size)
{
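
The overflow handling in ProcessGetMemoryContextInterrupt() above (individual
entries up to max_stats - 1, then one cumulative "Remaining Totals" entry in
the last slot) can be sketched on its own. This is an illustration under
stated assumptions, not the patch's code: DemoStats is a hypothetical,
simplified stand-in for MemoryStatsEntry, and publish_with_overflow() is a
made-up name.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the patch's MemoryStatsEntry. */
typedef struct DemoStats
{
	long		totalspace;
	long		freespace;
	int			num_agg;		/* contexts folded into this entry */
} DemoStats;

/*
 * Copy statistics from in[0..n-1] to out[]. If n fits within max_stats,
 * every context gets its own entry. Otherwise the first max_stats - 1
 * contexts are copied one-to-one and all remaining contexts are summed
 * into out[max_stats - 1]. Returns the number of entries written.
 */
static int
publish_with_overflow(const DemoStats *in, int n, DemoStats *out,
					  int max_stats)
{
	int			nout = (n <= max_stats) ? n : max_stats;

	assert(max_stats >= 2);
	memset(out, 0, sizeof(DemoStats) * nout);

	for (int i = 0; i < n; i++)
	{
		if (i < max_stats - 1 || n <= max_stats)
		{
			/* Individual entry: one context per slot */
			out[i] = in[i];
			out[i].num_agg = 1;
		}
		else
		{
			/* Overflow: accumulate into the last slot */
			out[max_stats - 1].totalspace += in[i].totalspace;
			out[max_stats - 1].freespace += in[i].freespace;
			out[max_stats - 1].num_agg++;
		}
	}
	return nout;
}
```

With five contexts and max_stats = 3, slots 0 and 1 hold individual
statistics and slot 2 holds the sum of the remaining three.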
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 37a484147a8..4708f55be18 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8571,6 +8571,16 @@
prorettype => 'bool', proargtypes => 'int4',
prosrc => 'pg_log_backend_memory_contexts' },
+# publishing memory contexts of the specified postgres process
+{ oid => '2173', descr => 'publish memory contexts of the specified backend',
+ proname => 'pg_get_process_memory_contexts', provolatile => 'v',
+ prorows => '100', proretset => 't', proparallel => 'r',
+ prorettype => 'record', proargtypes => 'int4 bool float8',
+ proallargtypes => '{int4,bool,float8,text,text,text,_int4,int4,int8,int8,int8,int8,int8,int4,timestamptz}',
+ proargmodes => '{i,i,i,o,o,o,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{pid, summary, timeout, name, ident, type, path, level, total_bytes, total_nblocks, free_bytes, free_chunks, used_bytes, num_agg_contexts, stats_timestamp}',
+ prosrc => 'pg_get_process_memory_contexts' },
+
# non-persistent series generator
{ oid => '1066', descr => 'non-persistent series generator',
proname => 'generate_series', prorows => '1000',
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 0d8528b2875..58b2496a9cb 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -96,6 +96,7 @@ extern PGDLLIMPORT volatile sig_atomic_t IdleSessionTimeoutPending;
extern PGDLLIMPORT volatile sig_atomic_t ProcSignalBarrierPending;
extern PGDLLIMPORT volatile sig_atomic_t LogMemoryContextPending;
extern PGDLLIMPORT volatile sig_atomic_t IdleStatsUpdateTimeoutPending;
+extern PGDLLIMPORT volatile sig_atomic_t PublishMemoryContextPending;
extern PGDLLIMPORT volatile sig_atomic_t CheckClientConnectionPending;
extern PGDLLIMPORT volatile sig_atomic_t ClientConnectionLost;
diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index 4df1d25c045..d333f338ebb 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -219,6 +219,8 @@ typedef enum BuiltinTrancheIds
LWTRANCHE_XACT_SLRU,
LWTRANCHE_PARALLEL_VACUUM_DSA,
LWTRANCHE_AIO_URING_COMPLETION,
+ LWTRANCHE_MEMORY_CONTEXT_REPORTING_STATE,
+ LWTRANCHE_MEMORY_CONTEXT_REPORTING_PROC,
LWTRANCHE_FIRST_USER_DEFINED,
} BuiltinTrancheIds;
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index 016dfd9b3f6..cfe14631445 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -35,6 +35,7 @@ typedef enum
PROCSIG_WALSND_INIT_STOPPING, /* ask walsenders to prepare for shutdown */
PROCSIG_BARRIER, /* global barrier interrupt */
PROCSIG_LOG_MEMORY_CONTEXT, /* ask backend to log the memory contexts */
+ PROCSIG_GET_MEMORY_CONTEXT, /* ask backend to send the memory contexts */
PROCSIG_PARALLEL_APPLY_MESSAGE, /* Message from parallel apply workers */
/* Recovery conflict reasons */
diff --git a/src/include/utils/memutils.h b/src/include/utils/memutils.h
index 8abc26abce2..d328270fafc 100644
--- a/src/include/utils/memutils.h
+++ b/src/include/utils/memutils.h
@@ -18,6 +18,9 @@
#define MEMUTILS_H
#include "nodes/memnodes.h"
+#include "storage/condition_variable.h"
+#include "storage/lmgr.h"
+#include "utils/dsa.h"
/*
@@ -48,6 +51,23 @@
#define AllocHugeSizeIsValid(size) ((Size) (size) <= MaxAllocHugeSize)
+/*
+ * Memory Context reporting size limits.
+ */
+
+/* Max length of context name and ident */
+#define MEMORY_CONTEXT_IDENT_SHMEM_SIZE 64
+/* Maximum size (in bytes) of DSA area per process */
+#define MEMORY_CONTEXT_REPORT_MAX_PER_BACKEND ((size_t) (1 * 1024 * 1024))
+
+/*
+ * Maximum size per context. Actual size may be lower as this assumes the worst
+ * case of deepest path and longest identifiers (name and ident, thus the
+ * multiplication by 2). The path depth is limited to 100 like for memory
+ * context logging.
+ */
+#define MAX_MEMORY_CONTEXT_STATS_SIZE (sizeof(MemoryStatsEntry) + \
+ (100 * sizeof(int)) + (2 * MEMORY_CONTEXT_IDENT_SHMEM_SIZE))
/*
* Standard top-level memory contexts.
@@ -319,4 +339,66 @@ pg_memory_is_all_zeros(const void *ptr, size_t len)
return true;
}
+/* Dynamic shared memory state for statistics per context */
+typedef struct MemoryStatsEntry
+{
+ dsa_pointer name;
+ dsa_pointer ident;
+ dsa_pointer path;
+ NodeTag type;
+ int path_length;
+ int levels;
+ int64 totalspace;
+ int64 nblocks;
+ int64 freespace;
+ int64 freechunks;
+ int num_agg_stats;
+} MemoryStatsEntry;
+
+/*
+ * Static shared memory state representing the DSA area created for memory
+ * context statistics reporting. A single DSA area is created and used by all
+ * the processes, each having its specific DSA allocations for sharing memory
+ * statistics, tracked by per backend static shared memory state.
+ */
+typedef struct MemoryStatsCtl
+{
+ dsa_handle memstats_dsa_handle;
+ LWLock lw_lock;
+} MemoryStatsCtl;
+
+/*
+ * Per backend static shared memory state for memory context statistics
+ * reporting.
+ */
+typedef struct MemoryStatsBackendState
+{
+ ConditionVariable memcxt_cv;
+ LWLock lw_lock;
+ int proc_id;
+ int total_stats;
+ bool summary;
+ dsa_pointer memstats_dsa_pointer;
+ TimestampTz stats_timestamp;
+} MemoryStatsBackendState;
+
+
+/*
+ * Used for storage of transient identifiers for pg_get_backend_memory_contexts
+ */
+typedef struct MemoryStatsContextId
+{
+ MemoryContext context;
+ int context_id;
+} MemoryStatsContextId;
+
+extern PGDLLIMPORT MemoryStatsBackendState *memCxtState;
+extern PGDLLIMPORT MemoryStatsCtl *memCxtArea;
+extern void ProcessGetMemoryContextInterrupt(void);
+extern const char *ContextTypeToString(NodeTag type);
+extern void HandleGetMemoryContextInterrupt(void);
+extern Size MemoryContextReportingShmemSize(void);
+extern void MemoryContextReportingShmemInit(void);
+extern void AtProcExit_memstats_cleanup(int code, Datum arg);
+extern dsa_area *area;
#endif /* MEMUTILS_H */
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 83228cfca29..ae17d028ed3 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -232,3 +232,22 @@ select * from pg_timezone_abbrevs where abbrev = 'LMT';
LMT | @ 7 hours 52 mins 58 secs ago | f
(1 row)
+DO $$
+DECLARE
+ bg_writer_pid int;
+ r RECORD;
+BEGIN
+ SELECT pid from pg_stat_activity where backend_type='background writer'
+ INTO bg_writer_pid;
+
+ select type, name, ident
+ from pg_get_process_memory_contexts(bg_writer_pid, false, 20)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+ select type, name, ident
+ from pg_get_process_memory_contexts(pg_backend_pid(), false, 20)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+END $$;
+NOTICE: (AllocSet,TopMemoryContext,)
+NOTICE: (AllocSet,TopMemoryContext,)
diff --git a/src/test/regress/sql/sysviews.sql b/src/test/regress/sql/sysviews.sql
index 66179f026b3..d0917b6868e 100644
--- a/src/test/regress/sql/sysviews.sql
+++ b/src/test/regress/sql/sysviews.sql
@@ -101,3 +101,21 @@ select count(distinct utc_offset) >= 24 as ok from pg_timezone_abbrevs;
-- One specific case we can check without much fear of breakage
-- is the historical local-mean-time value used for America/Los_Angeles.
select * from pg_timezone_abbrevs where abbrev = 'LMT';
+
+DO $$
+DECLARE
+ bg_writer_pid int;
+ r RECORD;
+BEGIN
+ SELECT pid from pg_stat_activity where backend_type='background writer'
+ INTO bg_writer_pid;
+
+ select type, name, ident
+ from pg_get_process_memory_contexts(bg_writer_pid, false, 20)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+ select type, name, ident
+ from pg_get_process_memory_contexts(pg_backend_pid(), false, 20)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+END $$;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index f7ba0ec809e..b845fa90514 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1670,6 +1670,10 @@ MemoryContextCounters
MemoryContextData
MemoryContextMethodID
MemoryContextMethods
+MemoryStatsBackendState
+MemoryStatsContextId
+MemoryStatsCtl
+MemoryStatsEntry
MemoryStatsPrintFunc
MergeAction
MergeActionState
--
2.34.1
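
As a rough illustration of the size limits that the patch above adds to memutils.h, the following standalone sketch works out the worst-case size of one context entry and how many such entries fit in the 1 MB per-process budget. It is only a sketch under stated assumptions: SketchStatsEntry is a hypothetical stand-in for MemoryStatsEntry in which dsa_pointer and NodeTag are approximated by fixed-width integers, all Sketch*/sketch_* names are invented for the example, and the way the patch itself divides the budget is not shown in this part of the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical mirror of MemoryStatsEntry. dsa_pointer is assumed to be a
 * 64-bit integer and NodeTag an int; real sizes depend on the platform.
 */
typedef struct SketchStatsEntry
{
	uint64_t	name;
	uint64_t	ident;
	uint64_t	path;
	int			type;
	int			path_length;
	int			levels;
	int64_t		totalspace;
	int64_t		nblocks;
	int64_t		freespace;
	int64_t		freechunks;
	int			num_agg_stats;
} SketchStatsEntry;

/* Values copied from the patch's memutils.h hunk */
#define SKETCH_IDENT_SHMEM_SIZE			64
#define SKETCH_MAX_PATH_DEPTH			100
#define SKETCH_REPORT_MAX_PER_BACKEND	((size_t) (1 * 1024 * 1024))

/*
 * Worst case for one context: the fixed struct, a path of maximum depth,
 * and two identifiers (name and ident) of maximum length.
 */
size_t
sketch_max_entry_size(void)
{
	return sizeof(SketchStatsEntry) +
		(SKETCH_MAX_PATH_DEPTH * sizeof(int)) +
		(2 * SKETCH_IDENT_SHMEM_SIZE);
}

/* Number of worst-case entries that fit into the given byte budget */
size_t
sketch_max_entries(size_t budget)
{
	size_t		per_entry = sketch_max_entry_size();

	assert(per_entry > 0);
	return budget / per_entry;
}
```

Calling sketch_max_entries(SKETCH_REPORT_MAX_PER_BACKEND) gives a lower bound on the number of contexts that can be reported individually, since most real entries have shorter paths and identifiers than the worst case.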
On 8 Apr 2025, at 07:40, Rahila Syed <rahilasyed90@gmail.com> wrote:
Renamed both to use MemoryContextReporting* namespace, which leaves
MemoryContextReportingBackendState at an unwieldy long name. I'm running out
of ideas on how to improve it, and it does make the purpose quite explicit at least.
How about
MemoryContextReportingBackendState -> MemoryStatsBackendState
MemoryContextReportingId -> MemoryStatsContextId
MemoryContextReportingSharedState -> MemoryStatsCtl
MemoryContextReportingStatsEntry -> MemoryStatsEntry
Fixed accordingly.
That's much better, thanks.
There was a bug in the shmem init function which caused it to fail on Windows,
the attached fixes that.
--
Daniel Gustafsson
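
The wait behaviour described in the commit message of the attached patch (retry in short slices until the user-supplied timeout expires, then fall back to previously published statistics or NULL) could be sketched roughly as follows. This is a simplified, standalone model and not the patch's code: time is simulated rather than slept, the condition variable is replaced by a hypothetical callback, and every Sketch*/sketch_* name is an assumption made for the example. Only the 100 ms slice mirrors MEMSTATS_WAIT_TIMEOUT from the patch.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_WAIT_SLICE_MS 100	/* mirrors MEMSTATS_WAIT_TIMEOUT */

typedef enum
{
	SKETCH_FRESH,				/* stats newer than the request arrived */
	SKETCH_STALE,				/* timed out, older stats are available */
	SKETCH_NONE					/* timed out, nothing published (NULL) */
} SketchResult;

typedef struct SketchSlot
{
	bool		has_stats;		/* has anything been published yet? */
	long		stats_time_ms;	/* when it was published */
} SketchSlot;

/* Hypothetical stand-in for one timed sleep on the condition variable */
typedef void (*SketchWaitFn) (SketchSlot *slot, long now_ms);

SketchResult
sketch_wait_for_stats(SketchSlot *slot, long request_time_ms,
					  double timeout_secs, SketchWaitFn wait_once)
{
	long		deadline;
	long		now = request_time_ms;

	assert(timeout_secs >= 0);
	deadline = request_time_ms + (long) (timeout_secs * 1000.0);

	for (;;)
	{
		/* Stats published after the request was sent count as fresh */
		if (slot->has_stats && slot->stats_time_ms > request_time_ms)
			return SKETCH_FRESH;

		/* Give up once the user-specified timeout has been used up */
		if (now >= deadline)
			break;

		/* Simulate sleeping for one slice, during which the target may publish */
		now += SKETCH_WAIT_SLICE_MS;
		wait_once(slot, now);
	}

	return slot->has_stats ? SKETCH_STALE : SKETCH_NONE;
}

/* Example target that never gets around to publishing */
void
sketch_target_busy(SketchSlot *slot, long now_ms)
{
	(void) slot;
	(void) now_ms;
}

/* Example target that publishes once 300 ms have passed */
void
sketch_target_replies(SketchSlot *slot, long now_ms)
{
	if (now_ms >= 300)
	{
		slot->has_stats = true;
		slot->stats_time_ms = now_ms;
	}
}
```

With a 0.5 second timeout the replying target yields SKETCH_FRESH, while the busy target yields SKETCH_STALE or SKETCH_NONE depending on whether the slot already held older statistics. That fallback is the behaviour the commit message describes as useful for dashboard-style polling.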

Attachments:
v29-0001-Add-function-to-get-memory-context-stats-for-pro.patch (application/octet-stream)
From ec215974fff965040e3531ac57c3fc3023be07ff Mon Sep 17 00:00:00 2001
From: Rahila Syed <rahilasyed.90@gmail.com>
Date: Tue, 8 Apr 2025 09:38:12 +0530
Subject: [PATCH v29] Add function to get memory context stats for processes
This adds a function for retrieving memory context statistics
and information from backends as well as auxiliary processes.
When calling the function it sends a signal to the specified
process to submit statistics regarding its memory contexts
into dynamic shared memory. Each memory context is returned
in detail, followed by a cumulative total in case the number
of contexts exceeds the max allocated amount of shared memory.
Each process is limited to using at most 1MB of memory for this.
A summary can also be explicitly requested by the user; this
will return the TopMemoryContext and a cumulative total of
all lower contexts.
In order to not block on busy processes the caller specifies
the number of seconds during which to retry before timing out.
In the case where no statistics are published within the set
timeout, the last known statistics are returned, or NULL if
no previously published statistics exist. This allows
dashboard-type usages to continually publish data even if the
target process is temporarily congested. Context records
contain a timestamp to indicate when they were submitted.
Author: Rahila Syed <rahilasyed90@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Reviewed-by: Atsushi Torikoshi <torikoshia@oss.nttdata.com>
Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Discussion: https://postgr.es/m/CAH2L28v8mc9HDt8QoSJ8TRmKau_8FM_HKS41NeO9-6ZAkuZKXw@mail.gmail.com
---
doc/src/sgml/func.sgml | 171 +++++
src/backend/catalog/system_views.sql | 5 +
src/backend/postmaster/autovacuum.c | 4 +
src/backend/postmaster/checkpointer.c | 4 +
src/backend/postmaster/interrupt.c | 4 +
src/backend/postmaster/pgarch.c | 4 +
src/backend/postmaster/startup.c | 4 +
src/backend/postmaster/walsummarizer.c | 4 +
src/backend/storage/ipc/ipci.c | 3 +
src/backend/storage/ipc/procsignal.c | 3 +
src/backend/storage/lmgr/lwlock.c | 2 +
src/backend/storage/lmgr/proc.c | 1 +
src/backend/tcop/postgres.c | 3 +
.../utils/activity/wait_event_names.txt | 1 +
src/backend/utils/adt/mcxtfuncs.c | 426 +++++++++++-
src/backend/utils/init/globals.c | 1 +
src/backend/utils/init/postinit.c | 7 +
src/backend/utils/mmgr/mcxt.c | 644 +++++++++++++++++-
src/include/catalog/pg_proc.dat | 10 +
src/include/miscadmin.h | 1 +
src/include/storage/lwlock.h | 2 +
src/include/storage/procsignal.h | 1 +
src/include/utils/memutils.h | 82 +++
src/test/regress/expected/sysviews.out | 19 +
src/test/regress/sql/sysviews.sql | 18 +
src/tools/pgindent/typedefs.list | 4 +
26 files changed, 1383 insertions(+), 45 deletions(-)
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 9ab070adffb..42ec4340da1 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -28663,6 +28663,144 @@ acl | {postgres=arwdDxtm/postgres,foo=r/postgres}
</para></entry>
</row>
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>pg_get_process_memory_contexts</primary>
+ </indexterm>
+ <function>pg_get_process_memory_contexts</function> ( <parameter>pid</parameter> <type>integer</type>, <parameter>summary</parameter> <type>boolean</type>, <parameter>timeout</parameter> <type>float</type> )
+ <returnvalue>setof record</returnvalue>
+ ( <parameter>name</parameter> <type>text</type>,
+ <parameter>ident</parameter> <type>text</type>,
+ <parameter>type</parameter> <type>text</type>,
+ <parameter>path</parameter> <type>integer[]</type>,
+ <parameter>level</parameter> <type>integer</type>,
+ <parameter>total_bytes</parameter> <type>bigint</type>,
+ <parameter>total_nblocks</parameter> <type>bigint</type>,
+ <parameter>free_bytes</parameter> <type>bigint</type>,
+ <parameter>free_chunks</parameter> <type>bigint</type>,
+ <parameter>used_bytes</parameter> <type>bigint</type>,
+ <parameter>num_agg_contexts</parameter> <type>integer</type>,
+ <parameter>stats_timestamp</parameter> <type>timestamptz</type> )
+ </para>
+ <para>
+ This function handles requests to display the memory contexts of a
+ <productname>PostgreSQL</productname> process with the specified
+ process ID. The function can be used to send requests to backends as
+ well as <glossterm linkend="glossary-auxiliary-proc">auxiliary processes</glossterm>.
+ </para>
+ <para>
+ The returned record contains extended statistics for each memory
+ context:
+ <itemizedlist spacing="compact">
+ <listitem>
+ <para>
+ <parameter>name</parameter> - The name of the memory context.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>ident</parameter> - Memory context ID (if any).
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>type</parameter> - The type of memory context, possible
+ values are: AllocSet, Generation, Slab and Bump.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>path</parameter> - Memory contexts are organized in a
+ tree model with TopMemoryContext as the root, and all other memory
+ contexts as nodes in the tree. The <parameter>path</parameter>
+ displays the path from the root to the current memory context. The
+ path is limited to 100 children per node, with each node limited
+ to a max depth of 100, to preserve memory during reporting. The
+ printed path will also be limited to 100 nodes counting from the
+ TopMemoryContext.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>level</parameter> - The level in the tree of the current
+ memory context.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>total_bytes</parameter> - The total number of bytes
+ allocated to this memory context.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>total_nblocks</parameter> - The total number of blocks
+ used for the allocated memory.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>free_bytes</parameter> - The amount of free memory in
+ this memory context.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>free_chunks</parameter> - The number of chunks that
+ <parameter>free_bytes</parameter> corresponds to.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>used_bytes</parameter> - The total number of bytes
+ currently occupied.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>num_agg_contexts</parameter> - The number of memory
+ contexts aggregated in the displayed statistics.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>stats_timestamp</parameter> - When the statistics were
+ extracted from the process.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+ <para>
+ When <parameter>summary</parameter> is <literal>true</literal>, statistics
+ for memory contexts at levels 1 and 2 are displayed, with level 1
+ representing the root node (i.e., <literal>TopMemoryContext</literal>).
+ Statistics for contexts on level 2 and below are aggregates of all
+ child contexts' statistics, where <literal>num_agg_contexts</literal>
+ indicates the number of aggregated child contexts. When
+ <parameter>summary</parameter> is <literal>false</literal>,
+ the <literal>num_agg_contexts</literal> value is <literal>1</literal>,
+ indicating that individual statistics are being displayed. The levels
+ are limited to the first 100 contexts.
+ </para>
+ <para>
+ Busy processes can delay reporting memory context statistics;
+ <parameter>timeout</parameter> specifies the number of seconds
+ to wait for updated statistics. <parameter>timeout</parameter> can be
+ specified in fractions of a second.
+ </para>
+ <para>
+ After receiving memory context statistics from the target process, it
+ returns the results as one row per context. If all the contexts don't
+ fit within the pre-determined size limit, the remaining context
+ statistics are aggregated and a cumulative total is displayed. The
+ <literal>num_agg_contexts</literal> column indicates the number of
+ contexts aggregated in the displayed statistics. When
+ <literal>num_agg_contexts</literal> is <literal>1</literal> it means
+ that the context statistics are displayed separately.
+ </para></entry>
+ </row>
+
<row>
<entry role="func_table_entry"><para role="func_signature">
<indexterm>
@@ -28802,6 +28940,39 @@ LOG: Grand total: 1651920 bytes in 201 blocks; 622360 free (88 chunks); 1029560
because it may generate a large number of log messages.
</para>
+ <para>
+ <function>pg_get_process_memory_contexts</function> can be used
+ to request the memory context statistics of any PostgreSQL process. For example:
+<programlisting>
+postgres=# SELECT * FROM pg_get_process_memory_contexts(
+ (SELECT pid FROM pg_stat_activity
+ WHERE backend_type = 'checkpointer'),
+ false, 0.5) LIMIT 1;
+-[ RECORD 1 ]----+------------------------------
+name | TopMemoryContext
+ident |
+type | AllocSet
+path | {1}
+level | 1
+total_bytes | 90304
+total_nblocks | 3
+free_bytes | 2880
+free_chunks | 1
+used_bytes | 87424
+num_agg_contexts | 1
+stats_timestamp | 2025-03-24 13:55:47.796698+01
+</programlisting>
+ <note>
+ <para>
+ While <function>pg_get_process_memory_contexts</function> can be used to
+ query memory contexts of the local backend,
+ <structname>pg_backend_memory_contexts</structname>
+ (see <xref linkend="view-pg-backend-memory-contexts"/> for more details)
+ will be less resource intensive when only the local backend is of interest.
+ </para>
+ </note>
+ </para>
+
</sect2>
<sect2 id="functions-admin-backup">
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 08f780a2e63..15efb02badb 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -674,6 +674,11 @@ GRANT SELECT ON pg_backend_memory_contexts TO pg_read_all_stats;
REVOKE EXECUTE ON FUNCTION pg_get_backend_memory_contexts() FROM PUBLIC;
GRANT EXECUTE ON FUNCTION pg_get_backend_memory_contexts() TO pg_read_all_stats;
+REVOKE EXECUTE ON FUNCTION
+ pg_get_process_memory_contexts(integer, boolean, float) FROM PUBLIC;
+GRANT EXECUTE ON FUNCTION
+ pg_get_process_memory_contexts(integer, boolean, float) TO pg_read_all_stats;
+
-- Statistics views
CREATE VIEW pg_stat_all_tables AS
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 2513a8ef8a6..16756152b71 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -781,6 +781,10 @@ ProcessAutoVacLauncherInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
/* Process sinval catchup interrupts that happened while sleeping */
ProcessCatchupInterrupt();
}
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index fda91ffd1ce..d3cb3f1891c 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -663,6 +663,10 @@ ProcessCheckpointerInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/postmaster/interrupt.c b/src/backend/postmaster/interrupt.c
index 0ae9bf906ec..f24f574e748 100644
--- a/src/backend/postmaster/interrupt.c
+++ b/src/backend/postmaster/interrupt.c
@@ -48,6 +48,10 @@ ProcessMainLoopInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 7e622ae4bd2..cb7408acf4c 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -867,6 +867,10 @@ ProcessPgArchInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
if (ConfigReloadPending)
{
char *archiveLib = pstrdup(XLogArchiveLibrary);
diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c
index 27e86cf393f..7149a67fcbc 100644
--- a/src/backend/postmaster/startup.c
+++ b/src/backend/postmaster/startup.c
@@ -192,6 +192,10 @@ ProcessStartupProcInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
diff --git a/src/backend/postmaster/walsummarizer.c b/src/backend/postmaster/walsummarizer.c
index 0fec4f1f871..c7a76711cc5 100644
--- a/src/backend/postmaster/walsummarizer.c
+++ b/src/backend/postmaster/walsummarizer.c
@@ -879,6 +879,10 @@ ProcessWalSummarizerInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 2fa045e6b0f..00c76d05356 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -51,6 +51,7 @@
#include "storage/sinvaladt.h"
#include "utils/guc.h"
#include "utils/injection_point.h"
+#include "utils/memutils.h"
/* GUCs */
int shared_memory_type = DEFAULT_SHARED_MEMORY_TYPE;
@@ -150,6 +151,7 @@ CalculateShmemSize(int *num_semaphores)
size = add_size(size, InjectionPointShmemSize());
size = add_size(size, SlotSyncShmemSize());
size = add_size(size, AioShmemSize());
+ size = add_size(size, MemoryContextReportingShmemSize());
/* include additional requested shmem from preload libraries */
size = add_size(size, total_addin_request);
@@ -343,6 +345,7 @@ CreateOrAttachShmemStructs(void)
WaitEventCustomShmemInit();
InjectionPointShmemInit();
AioShmemInit();
+ MemoryContextReportingShmemInit();
}
/*
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index b7c39a4c5f0..a3c2cd12277 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -690,6 +690,9 @@ procsignal_sigusr1_handler(SIGNAL_ARGS)
if (CheckProcSignal(PROCSIG_LOG_MEMORY_CONTEXT))
HandleLogMemoryContextInterrupt();
+ if (CheckProcSignal(PROCSIG_GET_MEMORY_CONTEXT))
+ HandleGetMemoryContextInterrupt();
+
if (CheckProcSignal(PROCSIG_PARALLEL_APPLY_MESSAGE))
HandleParallelApplyMessageInterrupt();
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 3df29658f18..dc4d96c16af 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -178,6 +178,8 @@ static const char *const BuiltinTrancheNames[] = {
[LWTRANCHE_XACT_SLRU] = "XactSLRU",
[LWTRANCHE_PARALLEL_VACUUM_DSA] = "ParallelVacuumDSA",
[LWTRANCHE_AIO_URING_COMPLETION] = "AioUringCompletion",
+ [LWTRANCHE_MEMORY_CONTEXT_REPORTING_STATE] = "MemoryContextReportingState",
+ [LWTRANCHE_MEMORY_CONTEXT_REPORTING_PROC] = "MemoryContextReportingPerProcess",
};
StaticAssertDecl(lengthof(BuiltinTrancheNames) ==
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index e9ef0fbfe32..f194e6b3dcc 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -50,6 +50,7 @@
#include "storage/procsignal.h"
#include "storage/spin.h"
#include "storage/standby.h"
+#include "utils/memutils.h"
#include "utils/timeout.h"
#include "utils/timestamp.h"
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 6ae9f38f0c8..dc4c600922d 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3535,6 +3535,9 @@ ProcessInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
if (ParallelApplyMessagePending)
ProcessParallelApplyMessages();
}
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 8bce14c38fd..23eaf559c8d 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -161,6 +161,7 @@ WAL_RECEIVER_EXIT "Waiting for the WAL receiver to exit."
WAL_RECEIVER_WAIT_START "Waiting for startup process to send initial data for streaming replication."
WAL_SUMMARY_READY "Waiting for a new WAL summary to be generated."
XACT_GROUP_UPDATE "Waiting for the group leader to update transaction status at transaction end."
+MEM_CXT_PUBLISH "Waiting for a process to publish memory information."
ABI_compatibility:
diff --git a/src/backend/utils/adt/mcxtfuncs.c b/src/backend/utils/adt/mcxtfuncs.c
index 396c2f223b4..d459c89cfde 100644
--- a/src/backend/utils/adt/mcxtfuncs.c
+++ b/src/backend/utils/adt/mcxtfuncs.c
@@ -17,28 +17,25 @@
#include "funcapi.h"
#include "mb/pg_wchar.h"
+#include "miscadmin.h"
+#include "access/twophase.h"
+#include "catalog/pg_authid_d.h"
#include "storage/proc.h"
#include "storage/procarray.h"
+#include "utils/acl.h"
#include "utils/array.h"
#include "utils/builtins.h"
#include "utils/hsearch.h"
+#include "utils/memutils.h"
+#include "utils/wait_event_types.h"
/* ----------
* The max bytes for showing identifiers of MemoryContext.
* ----------
*/
#define MEMORY_CONTEXT_IDENT_DISPLAY_SIZE 1024
-
-/*
- * MemoryContextId
- * Used for storage of transient identifiers for
- * pg_get_backend_memory_contexts.
- */
-typedef struct MemoryContextId
-{
- MemoryContext context;
- int context_id;
-} MemoryContextId;
+struct MemoryStatsBackendState *memCxtState = NULL;
+struct MemoryStatsCtl *memCxtArea = NULL;
/*
* int_list_to_array
@@ -89,7 +86,7 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
*/
for (MemoryContext cur = context; cur != NULL; cur = cur->parent)
{
- MemoryContextId *entry;
+ MemoryStatsContextId *entry;
bool found;
entry = hash_search(context_id_lookup, &cur, HASH_FIND, &found);
@@ -143,24 +140,7 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
else
nulls[1] = true;
- switch (context->type)
- {
- case T_AllocSetContext:
- type = "AllocSet";
- break;
- case T_GenerationContext:
- type = "Generation";
- break;
- case T_SlabContext:
- type = "Slab";
- break;
- case T_BumpContext:
- type = "Bump";
- break;
- default:
- type = "???";
- break;
- }
+ type = ContextTypeToString(context->type);
values[2] = CStringGetTextDatum(type);
values[3] = Int32GetDatum(list_length(path)); /* level */
@@ -175,6 +155,38 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
list_free(path);
}
+/*
+ * ContextTypeToString
+ * Returns a textual representation of a context type
+ *
+ * This should cover the same types as MemoryContextIsValid.
+ */
+const char *
+ContextTypeToString(NodeTag type)
+{
+ const char *context_type;
+
+ switch (type)
+ {
+ case T_AllocSetContext:
+ context_type = "AllocSet";
+ break;
+ case T_GenerationContext:
+ context_type = "Generation";
+ break;
+ case T_SlabContext:
+ context_type = "Slab";
+ break;
+ case T_BumpContext:
+ context_type = "Bump";
+ break;
+ default:
+ context_type = "???";
+ break;
+ }
+ return context_type;
+}
+
/*
* pg_get_backend_memory_contexts
* SQL SRF showing backend memory context.
@@ -189,7 +201,7 @@ pg_get_backend_memory_contexts(PG_FUNCTION_ARGS)
HTAB *context_id_lookup;
ctl.keysize = sizeof(MemoryContext);
- ctl.entrysize = sizeof(MemoryContextId);
+ ctl.entrysize = sizeof(MemoryStatsContextId);
ctl.hcxt = CurrentMemoryContext;
context_id_lookup = hash_create("pg_get_backend_memory_contexts",
@@ -216,7 +228,7 @@ pg_get_backend_memory_contexts(PG_FUNCTION_ARGS)
foreach_ptr(MemoryContextData, cur, contexts)
{
- MemoryContextId *entry;
+ MemoryStatsContextId *entry;
bool found;
/*
@@ -224,8 +236,8 @@ pg_get_backend_memory_contexts(PG_FUNCTION_ARGS)
* PutMemoryContextsStatsTupleStore needs this to populate the "path"
* column with the parent context_ids.
*/
- entry = (MemoryContextId *) hash_search(context_id_lookup, &cur,
- HASH_ENTER, &found);
+ entry = (MemoryStatsContextId *) hash_search(context_id_lookup, &cur,
+ HASH_ENTER, &found);
entry->context_id = context_id++;
Assert(!found);
@@ -305,3 +317,349 @@ pg_log_backend_memory_contexts(PG_FUNCTION_ARGS)
PG_RETURN_BOOL(true);
}
+
+/*
+ * pg_get_process_memory_contexts
+ * Signal a backend or an auxiliary process to send its memory contexts,
+ * wait for the results and display them.
+ *
+ * By default, only superusers or users with PG_READ_ALL_STATS are allowed to
+ * signal a process to return the memory contexts. This is because allowing
+ * any users to issue this request at an unbounded rate would cause lots of
+ * requests to be sent, which can lead to denial of service. Additional roles
+ * can be permitted with GRANT.
+ *
+ * On receipt of this signal, a backend or an auxiliary process sets the flag
+ * in the signal handler, which causes the next CHECK_FOR_INTERRUPTS()
+ * or process-specific interrupt handler to copy the memory context details
+ * to a dynamic shared memory space.
+ *
+ * We have defined a limit on DSA memory that could be allocated per process -
+ * if the process has more memory contexts than what can fit in the allocated
+ * size, the excess contexts are summarized and represented as cumulative total
+ * at the end of the buffer.
+ *
+ * After sending the signal, wait on a condition variable. The publishing
+ * backend, after copying the data to shared memory, sends a signal on that
+ * condition variable. There is one condition variable per publishing backend.
+ * Once the condition variable is signalled, check if the latest memory context
+ * information is available and display.
+ *
+ * If the publishing backend does not respond before the condition variable
+ * times out, which is set to MEMSTATS_WAIT_TIMEOUT, retry given that there is
+ * time left within the timeout specified by the user, before giving up and
+ * returning previously published statistics, if any. If no previous statistics
+ * exist, return NULL.
+ */
+#define MEMSTATS_WAIT_TIMEOUT 100
+Datum
+pg_get_process_memory_contexts(PG_FUNCTION_ARGS)
+{
+ int pid = PG_GETARG_INT32(0);
+ bool summary = PG_GETARG_BOOL(1);
+ double timeout = PG_GETARG_FLOAT8(2);
+ PGPROC *proc;
+ ProcNumber procNumber = INVALID_PROC_NUMBER;
+ bool proc_is_aux = false;
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ MemoryStatsEntry *memcxt_info;
+ TimestampTz start_timestamp;
+
+ /*
+ * See if the process with given pid is a backend or an auxiliary process
+ * and remember the type for when we requery the process later.
+ */
+ proc = BackendPidGetProc(pid);
+ if (proc == NULL)
+ {
+ proc = AuxiliaryPidGetProc(pid);
+ proc_is_aux = true;
+ }
+
+ /*
+ * BackendPidGetProc() and AuxiliaryPidGetProc() return NULL if the pid
+ * isn't valid; this is however not a problem, so just leave with a WARNING.
+ * See comment in pg_log_backend_memory_contexts for a discussion on this.
+ */
+ if (proc == NULL)
+ {
+ /*
+ * This is just a warning so a loop-through-resultset will not abort
+ * if one backend terminated on its own during the run.
+ */
+ ereport(WARNING,
+ (errmsg("PID %d is not a PostgreSQL server process", pid)));
+ PG_RETURN_NULL();
+ }
+
+ InitMaterializedSRF(fcinfo, 0);
+
+ procNumber = GetNumberFromPGProc(proc);
+
+ LWLockAcquire(&memCxtState[procNumber].lw_lock, LW_EXCLUSIVE);
+ memCxtState[procNumber].summary = summary;
+ LWLockRelease(&memCxtState[procNumber].lw_lock);
+
+ start_timestamp = GetCurrentTimestamp();
+
+ /*
+ * Send a signal to a PostgreSQL process, informing it we want it to
+ * produce information about its memory contexts.
+ */
+ if (SendProcSignal(pid, PROCSIG_GET_MEMORY_CONTEXT, procNumber) < 0)
+ {
+ ereport(WARNING,
+ errmsg("could not send signal to process %d: %m", pid));
+ PG_RETURN_NULL();
+ }
+
+ /*
+ * Even if the proc has published statistics, they may not be due to the
+ * current request, but previously published stats. Check if the stats
+ * are updated by comparing the timestamp: if the stats are newer than our
+ * previously recorded timestamp from before sending the procsignal, they
+ * must by definition be updated. Wait for the timeout specified by the
+ * user, after which display old statistics if available or return
+ * NULL.
+ */
+ while (1)
+ {
+ long msecs;
+
+ /*
+ * We expect to come out of sleep when the requested process has
+ * finished publishing the statistics, verified using the valid DSA
+ * pointer.
+ *
+ * Make sure that the information belongs to the pid we requested
+ * information for; otherwise loop back and wait for the server
+ * process to finish publishing statistics.
+ */
+ LWLockAcquire(&memCxtState[procNumber].lw_lock, LW_EXCLUSIVE);
+
+ /*
+ * A note in procnumber.h says that a procNumber can be re-used for
+ * a different backend immediately after a backend exits. In case an
+ * old process' data was there and not updated by the current process
+ * in the slot identified by the procNumber, the pid of the requested
+ * process and the proc_id might not match.
+ */
+ if (memCxtState[procNumber].proc_id == pid)
+ {
+ /*
+ * Break if the latest stats have been read, indicated by
+ * statistics timestamp being newer than the current request
+ * timestamp.
+ */
+ msecs = TimestampDifferenceMilliseconds(start_timestamp,
+ memCxtState[procNumber].stats_timestamp);
+
+ if (DsaPointerIsValid(memCxtState[procNumber].memstats_dsa_pointer)
+ && msecs > 0)
+ break;
+ }
+ LWLockRelease(&memCxtState[procNumber].lw_lock);
+
+ /*
+ * Recheck the state of the backend before sleeping on the condition
+ * variable to ensure the process is still alive. Only check the
+ * relevant process type based on the earlier PID check.
+ */
+ if (proc_is_aux)
+ proc = AuxiliaryPidGetProc(pid);
+ else
+ proc = BackendPidGetProc(pid);
+
+ /*
+ * The process ending during memory context processing is not an
+ * error.
+ */
+ if (proc == NULL)
+ {
+ ereport(WARNING,
+ errmsg("PID %d is no longer a PostgreSQL server process",
+ pid));
+ PG_RETURN_NULL();
+ }
+
+ msecs = TimestampDifferenceMilliseconds(start_timestamp, GetCurrentTimestamp());
+
+ /*
+ * If we haven't already exceeded the timeout value, sleep for the
+ * remainder of the timeout on the condition variable.
+ */
+ if (msecs > 0 && msecs < (timeout * 1000))
+ {
+ /*
+ * Wait for the timeout as defined by the user. If no updated
+ * statistics are available within the allowed time then display
+ * previously published statistics if there are any. If no
+ * previous statistics are available then return NULL. The timer
+ * is defined in milliseconds since that's what the condition
+ * variable sleep uses.
+ */
+ if (ConditionVariableTimedSleep(&memCxtState[procNumber].memcxt_cv,
+ ((timeout * 1000) - msecs), WAIT_EVENT_MEM_CXT_PUBLISH))
+ {
+ LWLockAcquire(&memCxtState[procNumber].lw_lock, LW_EXCLUSIVE);
+ /* Displaying previously published statistics if available */
+ if (DsaPointerIsValid(memCxtState[procNumber].memstats_dsa_pointer))
+ break;
+ else
+ {
+ LWLockRelease(&memCxtState[procNumber].lw_lock);
+ PG_RETURN_NULL();
+ }
+ }
+ }
+ else
+ {
+ LWLockAcquire(&memCxtState[procNumber].lw_lock, LW_EXCLUSIVE);
+ /* Displaying previously published statistics if available */
+ if (DsaPointerIsValid(memCxtState[procNumber].memstats_dsa_pointer))
+ break;
+ else
+ {
+ LWLockRelease(&memCxtState[procNumber].lw_lock);
+ PG_RETURN_NULL();
+ }
+ }
+ }
+
+ /*
+ * We should only reach here with a valid DSA handle, either containing
+ * updated statistics or previously published statistics (identified by
+ * the timestamp).
+ */
+ Assert(memCxtArea->memstats_dsa_handle != DSA_HANDLE_INVALID);
+ /* Attach to the dsa area if we have not already done so */
+ if (area == NULL)
+ {
+ MemoryContext oldcontext = CurrentMemoryContext;
+
+ MemoryContextSwitchTo(TopMemoryContext);
+ area = dsa_attach(memCxtArea->memstats_dsa_handle);
+ MemoryContextSwitchTo(oldcontext);
+ dsa_pin_mapping(area);
+ }
+
+ /*
+ * Backend has finished publishing the stats, project them.
+ */
+ memcxt_info = (MemoryStatsEntry *)
+ dsa_get_address(area, memCxtState[procNumber].memstats_dsa_pointer);
+
+#define PG_GET_PROCESS_MEMORY_CONTEXTS_COLS 12
+ for (int i = 0; i < memCxtState[procNumber].total_stats; i++)
+ {
+ ArrayType *path_array;
+ int path_length;
+ Datum values[PG_GET_PROCESS_MEMORY_CONTEXTS_COLS];
+ bool nulls[PG_GET_PROCESS_MEMORY_CONTEXTS_COLS];
+ char *name;
+ char *ident;
+ Datum *path_datum = NULL;
+ int *path_int = NULL;
+
+ memset(values, 0, sizeof(values));
+ memset(nulls, 0, sizeof(nulls));
+
+ if (DsaPointerIsValid(memcxt_info[i].name))
+ {
+ name = (char *) dsa_get_address(area, memcxt_info[i].name);
+ values[0] = CStringGetTextDatum(name);
+ }
+ else
+ nulls[0] = true;
+
+ if (DsaPointerIsValid(memcxt_info[i].ident))
+ {
+ ident = (char *) dsa_get_address(area, memcxt_info[i].ident);
+ values[1] = CStringGetTextDatum(ident);
+ }
+ else
+ nulls[1] = true;
+
+ values[2] = CStringGetTextDatum(ContextTypeToString(memcxt_info[i].type));
+
+ path_length = memcxt_info[i].path_length;
+ path_datum = (Datum *) palloc(path_length * sizeof(Datum));
+ if (DsaPointerIsValid(memcxt_info[i].path))
+ {
+ path_int = (int *) dsa_get_address(area, memcxt_info[i].path);
+ for (int j = 0; j < path_length; j++)
+ path_datum[j] = Int32GetDatum(path_int[j]);
+ path_array = construct_array_builtin(path_datum, path_length, INT4OID);
+ values[3] = PointerGetDatum(path_array);
+ }
+ else
+ nulls[3] = true;
+
+ values[4] = Int32GetDatum(memcxt_info[i].levels);
+ values[5] = Int64GetDatum(memcxt_info[i].totalspace);
+ values[6] = Int64GetDatum(memcxt_info[i].nblocks);
+ values[7] = Int64GetDatum(memcxt_info[i].freespace);
+ values[8] = Int64GetDatum(memcxt_info[i].freechunks);
+ values[9] = Int64GetDatum(memcxt_info[i].totalspace -
+ memcxt_info[i].freespace);
+ values[10] = Int32GetDatum(memcxt_info[i].num_agg_stats);
+ values[11] = TimestampTzGetDatum(memCxtState[procNumber].stats_timestamp);
+
+ tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc,
+ values, nulls);
+ }
+ LWLockRelease(&memCxtState[procNumber].lw_lock);
+
+ ConditionVariableCancelSleep();
+
+ PG_RETURN_NULL();
+}
+
+Size
+MemoryContextReportingShmemSize(void)
+{
+ Size sz = 0;
+ Size TotalProcs = 0;
+
+ TotalProcs = add_size(TotalProcs, NUM_AUXILIARY_PROCS);
+ TotalProcs = add_size(TotalProcs, MaxBackends);
+ sz = add_size(sz, mul_size(TotalProcs, sizeof(MemoryStatsBackendState)));
+
+ sz = add_size(sz, sizeof(MemoryStatsCtl));
+
+ return sz;
+}
+
+/*
+ * Initialize shared memory for displaying memory context statistics
+ */
+void
+MemoryContextReportingShmemInit(void)
+{
+ bool found;
+
+ memCxtArea = (MemoryStatsCtl *)
+ ShmemInitStruct("MemoryStatsCtl",
+ sizeof(MemoryStatsCtl), &found);
+
+ if (!found)
+ {
+ LWLockInitialize(&memCxtArea->lw_lock, LWTRANCHE_MEMORY_CONTEXT_REPORTING_STATE);
+ memCxtArea->memstats_dsa_handle = DSA_HANDLE_INVALID;
+ }
+
+ memCxtState = (MemoryStatsBackendState *)
+ ShmemInitStruct("MemoryStatsBackendState",
+ ((MaxBackends + NUM_AUXILIARY_PROCS) * sizeof(MemoryStatsBackendState)),
+ &found);
+
+ if (found)
+ return;
+
+ for (int i = 0; i < (MaxBackends + NUM_AUXILIARY_PROCS); i++)
+ {
+ ConditionVariableInit(&memCxtState[i].memcxt_cv);
+ LWLockInitialize(&memCxtState[i].lw_lock, LWTRANCHE_MEMORY_CONTEXT_REPORTING_PROC);
+ memCxtState[i].memstats_dsa_pointer = InvalidDsaPointer;
+ }
+}
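For reviewers, the wait logic in pg_get_process_memory_contexts() above can be read as a three-way decision. The following is only a minimal standalone sketch of that decision, not code from the patch: the names WaitDecision and decide_wait are hypothetical, timestamps are plain millisecond counters, and the proc_id match and the process liveness recheck done by the real loop are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical outcome names; the patch expresses these as control flow. */
typedef enum
{
	STATS_FRESH,				/* published after our request: return them */
	STATS_WAIT,					/* sleep on the condition variable */
	STATS_FALLBACK				/* timeout used up: old stats if any, else NULL */
} WaitDecision;

/*
 * All times are in milliseconds. request_ms is recorded before signalling,
 * stats_ms is the publisher's timestamp and now_ms is the current time.
 * On STATS_WAIT, *wait_ms holds the remaining part of the user's timeout.
 */
static WaitDecision
decide_wait(int64_t request_ms, int64_t stats_ms, bool have_stats,
			int64_t now_ms, double timeout_secs, int64_t *wait_ms)
{
	int64_t		elapsed = now_ms - request_ms;
	int64_t		budget = (int64_t) (timeout_secs * 1000);

	*wait_ms = 0;

	/* Statistics newer than the request must be the answer to it. */
	if (have_stats && stats_ms > request_ms)
		return STATS_FRESH;

	/* Still inside the timeout: sleep for whatever time is left. */
	if (elapsed > 0 && elapsed < budget)
	{
		*wait_ms = budget - elapsed;
		return STATS_WAIT;
	}

	/* Out of time: the caller shows old statistics or returns NULL. */
	return STATS_FALLBACK;
}
```

The point of the sketch is that the sleep duration shrinks on every iteration, so repeated wakeups cannot extend the total wait beyond the timeout given by the user.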
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index 2152aad97d9..92304a1f124 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -39,6 +39,7 @@ volatile sig_atomic_t TransactionTimeoutPending = false;
volatile sig_atomic_t IdleSessionTimeoutPending = false;
volatile sig_atomic_t ProcSignalBarrierPending = false;
volatile sig_atomic_t LogMemoryContextPending = false;
+volatile sig_atomic_t PublishMemoryContextPending = false;
volatile sig_atomic_t IdleStatsUpdateTimeoutPending = false;
volatile uint32 InterruptHoldoffCount = 0;
volatile uint32 QueryCancelHoldoffCount = 0;
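The new PublishMemoryContextPending flag follows the same deferred-work pattern as LogMemoryContextPending: the signal handler only sets flags, and the real work runs later at a safe point. A minimal standalone sketch of that pattern, with hypothetical names (handle_publish_request, process_interrupts, publish_runs) standing in for the PostgreSQL machinery:

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>

static volatile sig_atomic_t interrupt_pending = false;
static volatile sig_atomic_t publish_pending = false;
static int	publish_runs = 0;

/* Handler side: only set flags, which is async-signal-safe. */
static void
handle_publish_request(int signo)
{
	(void) signo;
	interrupt_pending = true;
	publish_pending = true;
}

/* Safe-point side: clear the flag first, then do the real work. */
static void
process_interrupts(void)
{
	if (!interrupt_pending)
		return;
	interrupt_pending = false;

	if (publish_pending)
	{
		publish_pending = false;
		publish_runs++;			/* stand-in for publishing the statistics */
	}
}
```

Note that several requests arriving before the next safe point collapse into one run of the work, which is why the requesting side in this patch compares timestamps instead of counting signals.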
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index c09c4d404ba..01309ef3f86 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -667,6 +667,13 @@ BaseInit(void)
* drop ephemeral slots, which in turn triggers stats reporting.
*/
ReplicationSlotInitialize();
+
+ /*
+ * The before shmem exit callback frees the DSA memory occupied by the
+ * latest memory context statistics that could be published by this proc
+ * if requested.
+ */
+ before_shmem_exit(AtProcExit_memstats_cleanup, 0);
}
diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index d98ae9db6be..f3a588e1375 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -23,6 +23,11 @@
#include "mb/pg_wchar.h"
#include "miscadmin.h"
+#include "nodes/pg_list.h"
+#include "storage/lwlock.h"
+#include "storage/ipc.h"
+#include "utils/dsa.h"
+#include "utils/hsearch.h"
#include "utils/memdebug.h"
#include "utils/memutils.h"
#include "utils/memutils_internal.h"
@@ -135,6 +140,17 @@ static const MemoryContextMethods mcxt_methods[] = {
};
#undef BOGUS_MCTX
+/*
+ * This is passed to MemoryContextStatsInternal to determine whether
+ * to print context statistics or not, and where to print them: logs or
+ * stderr.
+ */
+typedef enum PrintDestination
+{
+ PRINT_STATS_TO_STDERR = 0,
+ PRINT_STATS_TO_LOGS,
+ PRINT_STATS_NONE
+} PrintDestination;
/*
* CurrentMemoryContext
@@ -156,16 +172,31 @@ MemoryContext CurTransactionContext = NULL;
/* This is a transient link to the active portal's memory context: */
MemoryContext PortalContext = NULL;
+dsa_area *area = NULL;
static void MemoryContextDeleteOnly(MemoryContext context);
static void MemoryContextCallResetCallbacks(MemoryContext context);
static void MemoryContextStatsInternal(MemoryContext context, int level,
int max_level, int max_children,
MemoryContextCounters *totals,
- bool print_to_stderr);
+ PrintDestination print_location,
+ int *num_contexts);
static void MemoryContextStatsPrint(MemoryContext context, void *passthru,
const char *stats_string,
bool print_to_stderr);
+static void PublishMemoryContext(MemoryStatsEntry *memcxt_infos,
+ int curr_id, MemoryContext context,
+ List *path,
+ MemoryContextCounters stat,
+ int num_contexts, dsa_area *area,
+ int max_levels);
+static void compute_contexts_count_and_ids(List *contexts, HTAB *context_id_lookup,
+ int *stats_count,
+ bool summary);
+static List *compute_context_path(MemoryContext c, HTAB *context_id_lookup);
+static void free_memorycontextstate_dsa(dsa_area *area, int total_stats,
+ dsa_pointer prev_dsa_pointer);
+static void end_memorycontext_reporting(void);
/*
* You should not do memory allocations within a critical section, because
@@ -831,11 +862,19 @@ MemoryContextStatsDetail(MemoryContext context,
bool print_to_stderr)
{
MemoryContextCounters grand_totals;
+ int num_contexts = 0;
+ PrintDestination print_location;
memset(&grand_totals, 0, sizeof(grand_totals));
+ if (print_to_stderr)
+ print_location = PRINT_STATS_TO_STDERR;
+ else
+ print_location = PRINT_STATS_TO_LOGS;
+
+ /* num_contexts reports the number of contexts aggregated in the output */
MemoryContextStatsInternal(context, 0, max_level, max_children,
- &grand_totals, print_to_stderr);
+ &grand_totals, print_location, &num_contexts);
if (print_to_stderr)
fprintf(stderr,
@@ -870,13 +909,14 @@ MemoryContextStatsDetail(MemoryContext context,
* One recursion level for MemoryContextStats
*
* Print stats for this context if possible, but in any case accumulate counts
- * into *totals (if not NULL).
+ * into *totals (if not NULL). The callers should make sure that print_location
+ * is set to PRINT_STATS_TO_STDERR, PRINT_STATS_TO_LOGS or PRINT_STATS_NONE.
*/
static void
MemoryContextStatsInternal(MemoryContext context, int level,
int max_level, int max_children,
MemoryContextCounters *totals,
- bool print_to_stderr)
+ PrintDestination print_location, int *num_contexts)
{
MemoryContext child;
int ichild;
@@ -884,10 +924,39 @@ MemoryContextStatsInternal(MemoryContext context, int level,
Assert(MemoryContextIsValid(context));
/* Examine the context itself */
- context->methods->stats(context,
- MemoryContextStatsPrint,
- &level,
- totals, print_to_stderr);
+ switch (print_location)
+ {
+ case PRINT_STATS_TO_STDERR:
+ context->methods->stats(context,
+ MemoryContextStatsPrint,
+ &level,
+ totals, true);
+ break;
+
+ case PRINT_STATS_TO_LOGS:
+ context->methods->stats(context,
+ MemoryContextStatsPrint,
+ &level,
+ totals, false);
+ break;
+
+ case PRINT_STATS_NONE:
+
+ /*
+ * Do not print the statistics if print_location is
+ * PRINT_STATS_NONE, only compute totals. This is used in
+ * reporting of memory context statistics via a sql function. Last
+ * parameter is not relevant.
+ */
+ context->methods->stats(context,
+ NULL,
+ NULL,
+ totals, false);
+ break;
+ }
+
+ /* Increment the context count for each recursive call */
+ *num_contexts = *num_contexts + 1;
/*
* Examine children.
@@ -907,7 +976,7 @@ MemoryContextStatsInternal(MemoryContext context, int level,
MemoryContextStatsInternal(child, level + 1,
max_level, max_children,
totals,
- print_to_stderr);
+ print_location, num_contexts);
}
}
@@ -926,7 +995,13 @@ MemoryContextStatsInternal(MemoryContext context, int level,
child = MemoryContextTraverseNext(child, context);
}
- if (print_to_stderr)
+ /*
+ * Add the count of children contexts which are traversed in the
+ * non-recursive manner.
+ */
+ *num_contexts = *num_contexts + ichild;
+
+ if (print_location == PRINT_STATS_TO_STDERR)
{
for (int i = 0; i <= level; i++)
fprintf(stderr, " ");
@@ -939,7 +1014,7 @@ MemoryContextStatsInternal(MemoryContext context, int level,
local_totals.freechunks,
local_totals.totalspace - local_totals.freespace);
}
- else
+ else if (print_location == PRINT_STATS_TO_LOGS)
ereport(LOG_SERVER_ONLY,
(errhidestmt(true),
errhidecontext(true),
@@ -1276,6 +1351,22 @@ HandleLogMemoryContextInterrupt(void)
/* latch will be set by procsignal_sigusr1_handler */
}
+/*
+ * HandleGetMemoryContextInterrupt
+ * Handle receipt of an interrupt indicating a request to publish memory
+ * contexts statistics.
+ *
+ * All the actual work is deferred to ProcessGetMemoryContextInterrupt() as
+ * this cannot be performed in a signal handler.
+ */
+void
+HandleGetMemoryContextInterrupt(void)
+{
+ InterruptPending = true;
+ PublishMemoryContextPending = true;
+ /* latch will be set by procsignal_sigusr1_handler */
+}
+
/*
* ProcessLogMemoryContextInterrupt
* Perform logging of memory contexts of this backend process.
@@ -1313,6 +1404,537 @@ ProcessLogMemoryContextInterrupt(void)
MemoryContextStatsDetail(TopMemoryContext, 100, 100, false);
}
+/*
+ * ProcessGetMemoryContextInterrupt
+ * Generate information about memory contexts used by the process.
+ *
+ * Performs a breadth first search on the memory context tree, thus parents'
+ * statistics are reported before their children in the monitoring function
+ * output.
+ *
+ * Statistics for all the processes are shared via the same dynamic shared
+ * area. Statistics written by each process are tracked independently in
+ * per-process DSA pointers. These pointers are stored in static shared memory.
+ *
+ * We calculate the maximum number of contexts whose statistics can be
+ * displayed using a pre-determined limit for memory available per process
+ * for this utility and the maximum size of statistics for each context. The
+ * remaining context statistics, if any, are captured as a cumulative total at
+ * the end of the individual contexts' statistics.
+ *
+ * If summary is true, we capture the level 1 and level 2 contexts
+ * statistics. For that we traverse the memory context tree recursively in
+ * depth first search manner to cover all the children of a parent context, to
+ * be able to display a cumulative total of memory consumption by a parent at
+ * level 2 and all its children.
+ */
+void
+ProcessGetMemoryContextInterrupt(void)
+{
+ List *contexts;
+ HASHCTL ctl;
+ HTAB *context_id_lookup;
+ int context_id = 0;
+ MemoryStatsEntry *meminfo;
+ bool summary = false;
+ int max_stats;
+ int idx = MyProcNumber;
+ int stats_count = 0;
+ int stats_num = 0;
+ MemoryContextCounters stat;
+ int num_individual_stats = 0;
+
+ PublishMemoryContextPending = false;
+
+ /*
+ * The hash table is used for constructing "path" column of the view,
+ * similar to its local backend counterpart.
+ */
+ ctl.keysize = sizeof(MemoryContext);
+ ctl.entrysize = sizeof(MemoryStatsContextId);
+ ctl.hcxt = CurrentMemoryContext;
+
+ context_id_lookup = hash_create("pg_get_remote_backend_memory_contexts",
+ 256,
+ &ctl,
+ HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+
+ /* List of contexts to process in the next round - start at the top. */
+ contexts = list_make1(TopMemoryContext);
+
+ /* Compute the number of stats that can fit in the defined limit */
+ max_stats =
+ MEMORY_CONTEXT_REPORT_MAX_PER_BACKEND / MAX_MEMORY_CONTEXT_STATS_SIZE;
+ LWLockAcquire(&memCxtState[idx].lw_lock, LW_EXCLUSIVE);
+ summary = memCxtState[idx].summary;
+ LWLockRelease(&memCxtState[idx].lw_lock);
+
+ /*
+ * Traverse the memory context tree to find total number of contexts. If
+ * summary is requested report the total number of contexts at level 1 and
+ * 2 from the top. Also, populate the hash table of context ids.
+ */
+ compute_contexts_count_and_ids(contexts, context_id_lookup, &stats_count,
+ summary);
+
+ /*
+ * Allocate memory in this process's DSA for storing statistics of the
+ * memory contexts up to max_stats; for contexts that don't fit within the
+ * limit, a cumulative total is written as the last record in the DSA
+ * segment.
+ */
+ stats_num = Min(stats_count, max_stats);
+
+ LWLockAcquire(&memCxtArea->lw_lock, LW_EXCLUSIVE);
+
+ /*
+ * Create a DSA and send the handle to the client process after storing
+ * the context statistics. If the number of contexts exceeds a predefined
+ * limit (8MB), a cumulative total is stored for such contexts.
+ */
+ if (memCxtArea->memstats_dsa_handle == DSA_HANDLE_INVALID)
+ {
+ MemoryContext oldcontext = CurrentMemoryContext;
+ dsa_handle handle;
+
+ MemoryContextSwitchTo(TopMemoryContext);
+
+ area = dsa_create(memCxtArea->lw_lock.tranche);
+
+ handle = dsa_get_handle(area);
+ MemoryContextSwitchTo(oldcontext);
+
+ dsa_pin_mapping(area);
+
+ /*
+ * Pin the DSA area, this is to make sure the area remains attachable
+ * even if current backend exits. This is done so that the statistics
+ * are published even if the process exits while a client is waiting.
+ */
+ dsa_pin(area);
+
+ /* Set the handle in shared memory */
+ memCxtArea->memstats_dsa_handle = handle;
+ }
+
+ /*
+ * If DSA exists, created by another process publishing statistics, attach
+ * to it.
+ */
+ else if (area == NULL)
+ {
+ MemoryContext oldcontext = CurrentMemoryContext;
+
+ MemoryContextSwitchTo(TopMemoryContext);
+ area = dsa_attach(memCxtArea->memstats_dsa_handle);
+ MemoryContextSwitchTo(oldcontext);
+ dsa_pin_mapping(area);
+ }
+ LWLockRelease(&memCxtArea->lw_lock);
+
+ /*
+ * Hold the process lock to protect writes to process specific memory. Two
+ * processes publishing statistics do not block each other.
+ */
+ LWLockAcquire(&memCxtState[idx].lw_lock, LW_EXCLUSIVE);
+ memCxtState[idx].proc_id = MyProcPid;
+
+ if (DsaPointerIsValid(memCxtState[idx].memstats_dsa_pointer))
+ {
+ /*
+ * Free any previous allocations, free the name, ident and path
+ * pointers before freeing the pointer that contains them.
+ */
+ free_memorycontextstate_dsa(area, memCxtState[idx].total_stats,
+ memCxtState[idx].memstats_dsa_pointer);
+ }
+ /*
+ * Assigning total stats before allocating memory so that memory cleanup
+ * can run if any subsequent dsa_allocate call to allocate name/ident/path
+ * fails.
+ */
+ memCxtState[idx].total_stats = stats_num;
+ memCxtState[idx].memstats_dsa_pointer =
+ dsa_allocate0(area, stats_num * sizeof(MemoryStatsEntry));
+
+ meminfo = (MemoryStatsEntry *)
+ dsa_get_address(area, memCxtState[idx].memstats_dsa_pointer);
+
+ if (summary)
+ {
+ int cxt_id = 0;
+ List *path = NIL;
+
+ /* Copy TopMemoryContext statistics to DSA */
+ memset(&stat, 0, sizeof(stat));
+ (*TopMemoryContext->methods->stats) (TopMemoryContext, NULL, NULL,
+ &stat, true);
+ path = lcons_int(1, path);
+ PublishMemoryContext(meminfo, cxt_id, TopMemoryContext, path, stat,
+ 1, area, 100);
+ cxt_id = cxt_id + 1;
+
+ /*
+ * Copy statistics for each of TopMemoryContext's children. This
+ * includes statistics of at most 100 children per node, with each
+ * child node limited to a depth of 100 in its subtree.
+ */
+ for (MemoryContext c = TopMemoryContext->firstchild; c != NULL;
+ c = c->nextchild)
+ {
+ MemoryContextCounters grand_totals;
+ int num_contexts = 0;
+ int level = 0;
+
+ path = NIL;
+ memset(&grand_totals, 0, sizeof(grand_totals));
+
+ MemoryContextStatsInternal(c, level, 100, 100, &grand_totals,
+ PRINT_STATS_NONE, &num_contexts);
+
+ path = compute_context_path(c, context_id_lookup);
+
+ /*
+ * Register the stats entry first, that way the cleanup handler
+ * can reach it in case of allocation failures of one or more
+ * members.
+ */
+ memCxtState[idx].total_stats = cxt_id + 1;
+ PublishMemoryContext(meminfo, cxt_id++, c, path,
+ grand_totals, num_contexts, area, 100);
+ }
+ memCxtState[idx].total_stats = cxt_id;
+
+ /* Notify waiting backends and return */
+ end_memorycontext_reporting();
+
+ hash_destroy(context_id_lookup);
+
+ return;
+ }
+
+ foreach_ptr(MemoryContextData, cur, contexts)
+ {
+ List *path = NIL;
+
+ /*
+ * Figure out the transient context_id of this context and each of its
+ * ancestors, to compute a path for this context.
+ */
+ path = compute_context_path(cur, context_id_lookup);
+
+ /* Examine the context stats */
+ memset(&stat, 0, sizeof(stat));
+ (*cur->methods->stats) (cur, NULL, NULL, &stat, true);
+
+ /* Account for saving one statistics slot for cumulative reporting */
+ if (context_id < (max_stats - 1) || stats_count <= max_stats)
+ {
+ /* Copy statistics to DSA memory */
+ PublishMemoryContext(meminfo, context_id, cur, path, stat, 1, area, 100);
+ }
+ else
+ {
+ meminfo[max_stats - 1].totalspace += stat.totalspace;
+ meminfo[max_stats - 1].nblocks += stat.nblocks;
+ meminfo[max_stats - 1].freespace += stat.freespace;
+ meminfo[max_stats - 1].freechunks += stat.freechunks;
+ }
+
+ /*
+ * DSA max limit per process is reached, write aggregate of the
+ * remaining statistics.
+ *
+ * We can store contexts from 0 to max_stats - 1. When stats_count is
+ * greater than max_stats, we stop reporting individual statistics
+ * when context_id equals max_stats - 2. As we use max_stats - 1 array
+ * slot for reporting cumulative statistics or "Remaining Totals".
+ */
+ if (stats_count > max_stats && context_id == (max_stats - 2))
+ {
+ char *nameptr;
+ int namelen = strlen("Remaining Totals");
+
+ num_individual_stats = context_id + 1;
+ meminfo[max_stats - 1].name = dsa_allocate(area, namelen + 1);
+ nameptr = dsa_get_address(area, meminfo[max_stats - 1].name);
+ strlcpy(nameptr, "Remaining Totals", namelen + 1);
+ meminfo[max_stats - 1].ident = InvalidDsaPointer;
+ meminfo[max_stats - 1].path = InvalidDsaPointer;
+ meminfo[max_stats - 1].type = 0;
+ }
+ context_id++;
+ }
+
+ /*
+ * Statistics are not aggregated, i.e. individual statistics are reported when
+ * stats_count <= max_stats.
+ */
+ if (stats_count <= max_stats)
+ {
+ memCxtState[idx].total_stats = context_id;
+ }
+ /* Report number of aggregated memory contexts */
+ else
+ {
+ meminfo[max_stats - 1].num_agg_stats = context_id -
+ num_individual_stats;
+
+ /*
+ * Total stats equals num_individual_stats + 1 record for cumulative
+ * statistics.
+ */
+ memCxtState[idx].total_stats = num_individual_stats + 1;
+ }
+
+ /* Notify waiting backends and return */
+ end_memorycontext_reporting();
+
+ hash_destroy(context_id_lookup);
+}
+
+/*
+ * Update timestamp and signal all the waiting client backends after copying
+ * all the statistics.
+ */
+static void
+end_memorycontext_reporting(void)
+{
+ memCxtState[MyProcNumber].stats_timestamp = GetCurrentTimestamp();
+ LWLockRelease(&memCxtState[MyProcNumber].lw_lock);
+ ConditionVariableBroadcast(&memCxtState[MyProcNumber].memcxt_cv);
+}
+
+/*
+ * compute_context_path
+ *
+ * Append the transient context_id of this context and each of its ancestors
+ * to a list, in order to compute a path.
+ */
+static List *
+compute_context_path(MemoryContext c, HTAB *context_id_lookup)
+{
+ bool found;
+ List *path = NIL;
+ MemoryContext cur_context;
+
+ for (cur_context = c; cur_context != NULL; cur_context = cur_context->parent)
+ {
+ MemoryStatsContextId *cur_entry;
+
+ cur_entry = hash_search(context_id_lookup, &cur_context, HASH_FIND, &found);
+
+ if (!found)
+ elog(ERROR, "hash table corrupted, can't construct path value");
+
+ path = lcons_int(cur_entry->context_id, path);
+ }
+
+ return path;
+}
+
+/*
+ * Count the contexts currently allocated by the backend and assign a
+ * context id to each of them.
+ */
+static void
+compute_contexts_count_and_ids(List *contexts, HTAB *context_id_lookup,
+ int *stats_count, bool summary)
+{
+ foreach_ptr(MemoryContextData, cur, contexts)
+ {
+ MemoryStatsContextId *entry;
+ bool found;
+
+ entry = (MemoryStatsContextId *) hash_search(context_id_lookup, &cur,
+ HASH_ENTER, &found);
+ Assert(!found);
+
+ /*
+ * context id starts with 1 so increment the stats_count
+ * before assigning
+ */
+ entry->context_id = ++(*stats_count);
+
+ /* Append the children of the current context to the main list. */
+ for (MemoryContext c = cur->firstchild; c != NULL; c = c->nextchild)
+ {
+ if (summary)
+ {
+ entry = (MemoryStatsContextId *) hash_search(context_id_lookup, &c,
+ HASH_ENTER, &found);
+ Assert(!found);
+
+ entry->context_id = ++(*stats_count);
+ }
+
+ contexts = lappend(contexts, c);
+ }
+
+ /*
+ * In summary mode only the first two levels (from top) of contexts are
+ * displayed.
+ */
+ if (summary)
+ break;
+ }
+}
+
+/*
+ * PublishMemoryContext
+ *
+ * Copy the memory context statistics of a single context to DSA memory
+ */
+static void
+PublishMemoryContext(MemoryStatsEntry *memcxt_info, int curr_id,
+ MemoryContext context, List *path,
+ MemoryContextCounters stat, int num_contexts,
+ dsa_area *area, int max_levels)
+{
+ const char *ident = context->ident;
+ const char *name = context->name;
+ int *path_list;
+
+ /*
+ * To be consistent with logging output, we label dynahash contexts with
+ * just the hash table name as with MemoryContextStatsPrint().
+ */
+ if (context->ident && strncmp(context->name, "dynahash", 8) == 0)
+ {
+ name = context->ident;
+ ident = NULL;
+ }
+
+ if (name != NULL)
+ {
+ int namelen = strlen(name);
+ char *nameptr;
+
+ if (strlen(name) >= MEMORY_CONTEXT_IDENT_SHMEM_SIZE)
+ namelen = pg_mbcliplen(name, namelen,
+ MEMORY_CONTEXT_IDENT_SHMEM_SIZE - 1);
+
+ memcxt_info[curr_id].name = dsa_allocate(area, namelen + 1);
+ nameptr = (char *) dsa_get_address(area, memcxt_info[curr_id].name);
+ strlcpy(nameptr, name, namelen + 1);
+ }
+ else
+ memcxt_info[curr_id].name = InvalidDsaPointer;
+
+ /* Trim and copy the identifier if it is not set to NULL */
+ if (ident != NULL)
+ {
+ int idlen = strlen(ident);
+ char *identptr;
+
+ /*
+ * Some identifiers such as SQL query string can be very long,
+ * truncate oversize identifiers.
+ */
+ if (idlen >= MEMORY_CONTEXT_IDENT_SHMEM_SIZE)
+ idlen = pg_mbcliplen(ident, idlen,
+ MEMORY_CONTEXT_IDENT_SHMEM_SIZE - 1);
+
+ memcxt_info[curr_id].ident = dsa_allocate(area, idlen + 1);
+ identptr = (char *) dsa_get_address(area, memcxt_info[curr_id].ident);
+ strlcpy(identptr, ident, idlen + 1);
+ }
+ else
+ memcxt_info[curr_id].ident = InvalidDsaPointer;
+
+ /* Allocate DSA memory for storing path information */
+ if (path == NIL)
+ memcxt_info[curr_id].path = InvalidDsaPointer;
+ else
+ {
+ int levels = Min(list_length(path), max_levels);
+
+ memcxt_info[curr_id].path_length = levels;
+ memcxt_info[curr_id].path = dsa_allocate0(area, levels * sizeof(int));
+ memcxt_info[curr_id].levels = list_length(path);
+ path_list = (int *) dsa_get_address(area, memcxt_info[curr_id].path);
+
+ foreach_int(i, path)
+ {
+ path_list[foreach_current_index(i)] = i;
+ if (--levels == 0)
+ break;
+ }
+ }
+ memcxt_info[curr_id].type = context->type;
+ memcxt_info[curr_id].totalspace = stat.totalspace;
+ memcxt_info[curr_id].nblocks = stat.nblocks;
+ memcxt_info[curr_id].freespace = stat.freespace;
+ memcxt_info[curr_id].freechunks = stat.freechunks;
+ memcxt_info[curr_id].num_agg_stats = num_contexts;
+}
+
+/*
+ * free_memorycontextstate_dsa
+ *
+ * Worker for freeing resources from a MemoryStatsEntry. Callers are
+ * responsible for ensuring that the DSA pointer is valid.
+ */
+static void
+free_memorycontextstate_dsa(dsa_area *area, int total_stats,
+ dsa_pointer prev_dsa_pointer)
+{
+ MemoryStatsEntry *meminfo;
+
+ meminfo = (MemoryStatsEntry *) dsa_get_address(area, prev_dsa_pointer);
+ Assert(meminfo != NULL);
+ for (int i = 0; i < total_stats; i++)
+ {
+ if (DsaPointerIsValid(meminfo[i].name))
+ dsa_free(area, meminfo[i].name);
+
+ if (DsaPointerIsValid(meminfo[i].ident))
+ dsa_free(area, meminfo[i].ident);
+
+ if (DsaPointerIsValid(meminfo[i].path))
+ dsa_free(area, meminfo[i].path);
+ }
+
+ dsa_free(area, prev_dsa_pointer);
+ memCxtState[MyProcNumber].memstats_dsa_pointer = InvalidDsaPointer;
+}
+
+/*
+ * Free the memory context statistics stored by this process
+ * in DSA area.
+ */
+void
+AtProcExit_memstats_cleanup(int code, Datum arg)
+{
+ int idx = MyProcNumber;
+
+ if (memCxtArea->memstats_dsa_handle == DSA_HANDLE_INVALID)
+ return;
+
+ LWLockAcquire(&memCxtState[idx].lw_lock, LW_EXCLUSIVE);
+
+ if (!DsaPointerIsValid(memCxtState[idx].memstats_dsa_pointer))
+ {
+ LWLockRelease(&memCxtState[idx].lw_lock);
+ return;
+ }
+
+ /* If the dsa mapping could not be found, attach to the area */
+ if (area == NULL)
+ area = dsa_attach(memCxtArea->memstats_dsa_handle);
+
+ /*
+ * Free the memory context statistics, free the name, ident and path
+ * pointers before freeing the pointer that contains these pointers and
+ * integer statistics.
+ */
+ free_memorycontextstate_dsa(area, memCxtState[idx].total_stats,
+ memCxtState[idx].memstats_dsa_pointer);
+
+ dsa_detach(area);
+ LWLockRelease(&memCxtState[idx].lw_lock);
+}
+
void *
palloc(Size size)
{
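The slot accounting in the non-summary path of ProcessGetMemoryContextInterrupt() above is easy to get wrong by one, so here is a minimal standalone sketch of it. This is an illustration under assumptions, not code from the patch: StatsBudget and plan_stats_rows are hypothetical names, and max_stats is assumed to be at least 2.

```c
#include <assert.h>

/* Hypothetical summary of how the available slots get used. */
typedef struct
{
	int			individual;		/* contexts reported one per row */
	int			aggregated;		/* contexts folded into "Remaining Totals" */
	int			total_rows;		/* rows the requesting backend will read */
} StatsBudget;

/*
 * stats_count is the number of contexts found in the tree and max_stats is
 * the number of entries that fit in the per-process memory limit.
 */
static StatsBudget
plan_stats_rows(int stats_count, int max_stats)
{
	StatsBudget b;

	if (stats_count <= max_stats)
	{
		/* Everything fits, so nothing is aggregated. */
		b.individual = stats_count;
		b.aggregated = 0;
		b.total_rows = stats_count;
	}
	else
	{
		/* The last slot is reserved for the cumulative record. */
		b.individual = max_stats - 1;
		b.aggregated = stats_count - b.individual;
		b.total_rows = max_stats;
	}
	return b;
}
```

For example, with 25 contexts and room for 10 entries, 9 contexts are reported individually and the remaining 16 are summed into the final row, which matches num_agg_stats = context_id - num_individual_stats in the function above.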
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 37a484147a8..4708f55be18 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8571,6 +8571,16 @@
prorettype => 'bool', proargtypes => 'int4',
prosrc => 'pg_log_backend_memory_contexts' },
+# publishing memory contexts of the specified postgres process
+{ oid => '2173', descr => 'publish memory contexts of the specified backend',
+ proname => 'pg_get_process_memory_contexts', provolatile => 'v',
+ prorows => '100', proretset => 't', proparallel => 'r',
+ prorettype => 'record', proargtypes => 'int4 bool float8',
+ proallargtypes => '{int4,bool,float8,text,text,text,_int4,int4,int8,int8,int8,int8,int8,int4,timestamptz}',
+ proargmodes => '{i,i,i,o,o,o,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{pid, summary, timeout, name, ident, type, path, level, total_bytes, total_nblocks, free_bytes, free_chunks, used_bytes, num_agg_contexts, stats_timestamp}',
+ prosrc => 'pg_get_process_memory_contexts' },
+
# non-persistent series generator
{ oid => '1066', descr => 'non-persistent series generator',
proname => 'generate_series', prorows => '1000',
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 0d8528b2875..58b2496a9cb 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -96,6 +96,7 @@ extern PGDLLIMPORT volatile sig_atomic_t IdleSessionTimeoutPending;
extern PGDLLIMPORT volatile sig_atomic_t ProcSignalBarrierPending;
extern PGDLLIMPORT volatile sig_atomic_t LogMemoryContextPending;
extern PGDLLIMPORT volatile sig_atomic_t IdleStatsUpdateTimeoutPending;
+extern PGDLLIMPORT volatile sig_atomic_t PublishMemoryContextPending;
extern PGDLLIMPORT volatile sig_atomic_t CheckClientConnectionPending;
extern PGDLLIMPORT volatile sig_atomic_t ClientConnectionLost;
diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index 4df1d25c045..d333f338ebb 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -219,6 +219,8 @@ typedef enum BuiltinTrancheIds
LWTRANCHE_XACT_SLRU,
LWTRANCHE_PARALLEL_VACUUM_DSA,
LWTRANCHE_AIO_URING_COMPLETION,
+ LWTRANCHE_MEMORY_CONTEXT_REPORTING_STATE,
+ LWTRANCHE_MEMORY_CONTEXT_REPORTING_PROC,
LWTRANCHE_FIRST_USER_DEFINED,
} BuiltinTrancheIds;
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index 016dfd9b3f6..cfe14631445 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -35,6 +35,7 @@ typedef enum
PROCSIG_WALSND_INIT_STOPPING, /* ask walsenders to prepare for shutdown */
PROCSIG_BARRIER, /* global barrier interrupt */
PROCSIG_LOG_MEMORY_CONTEXT, /* ask backend to log the memory contexts */
+ PROCSIG_GET_MEMORY_CONTEXT, /* ask backend to send the memory contexts */
PROCSIG_PARALLEL_APPLY_MESSAGE, /* Message from parallel apply workers */
/* Recovery conflict reasons */
diff --git a/src/include/utils/memutils.h b/src/include/utils/memutils.h
index 8abc26abce2..d328270fafc 100644
--- a/src/include/utils/memutils.h
+++ b/src/include/utils/memutils.h
@@ -18,6 +18,9 @@
#define MEMUTILS_H
#include "nodes/memnodes.h"
+#include "storage/condition_variable.h"
+#include "storage/lmgr.h"
+#include "utils/dsa.h"
/*
@@ -48,6 +51,23 @@
#define AllocHugeSizeIsValid(size) ((Size) (size) <= MaxAllocHugeSize)
+/*
+ * Memory Context reporting size limits.
+ */
+
+/* Max length of context name and ident */
+#define MEMORY_CONTEXT_IDENT_SHMEM_SIZE 64
+/* Maximum size (in bytes) of DSA area per process */
+#define MEMORY_CONTEXT_REPORT_MAX_PER_BACKEND ((size_t) (1 * 1024 * 1024))
+
+/*
+ * Maximum size per context. Actual size may be lower as this assumes the worst
+ * case of deepest path and longest identifiers (name and ident, thus the
+ * multiplication by 2). The path depth is limited to 100 like for memory
+ * context logging.
+ */
+#define MAX_MEMORY_CONTEXT_STATS_SIZE (sizeof(MemoryStatsEntry) + \
+ (100 * sizeof(int)) + (2 * MEMORY_CONTEXT_IDENT_SHMEM_SIZE))
/*
* Standard top-level memory contexts.
@@ -319,4 +339,66 @@ pg_memory_is_all_zeros(const void *ptr, size_t len)
return true;
}
+/* Dynamic shared memory state for statistics per context */
+typedef struct MemoryStatsEntry
+{
+ dsa_pointer name;
+ dsa_pointer ident;
+ dsa_pointer path;
+ NodeTag type;
+ int path_length;
+ int levels;
+ int64 totalspace;
+ int64 nblocks;
+ int64 freespace;
+ int64 freechunks;
+ int num_agg_stats;
+} MemoryStatsEntry;
+
+/*
+ * Static shared memory state representing the DSA area created for memory
+ * context statistics reporting. A single DSA area is created and used by all
+ * the processes, each having its specific DSA allocations for sharing memory
+ * statistics, tracked by per backend static shared memory state.
+ */
+typedef struct MemoryStatsCtl
+{
+ dsa_handle memstats_dsa_handle;
+ LWLock lw_lock;
+} MemoryStatsCtl;
+
+/*
+ * Per backend static shared memory state for memory context statistics
+ * reporting.
+ */
+typedef struct MemoryStatsBackendState
+{
+ ConditionVariable memcxt_cv;
+ LWLock lw_lock;
+ int proc_id;
+ int total_stats;
+ bool summary;
+ dsa_pointer memstats_dsa_pointer;
+ TimestampTz stats_timestamp;
+} MemoryStatsBackendState;
+
+
+/*
+ * Used for storage of transient identifiers for pg_get_backend_memory_contexts
+ */
+typedef struct MemoryStatsContextId
+{
+ MemoryContext context;
+ int context_id;
+} MemoryStatsContextId;
+
+extern PGDLLIMPORT MemoryStatsBackendState *memCxtState;
+extern PGDLLIMPORT MemoryStatsCtl *memCxtArea;
+extern void ProcessGetMemoryContextInterrupt(void);
+extern const char *ContextTypeToString(NodeTag type);
+extern void HandleGetMemoryContextInterrupt(void);
+extern Size MemoryContextReportingShmemSize(void);
+extern void MemoryContextReportingShmemInit(void);
+extern void AtProcExit_memstats_cleanup(int code, Datum arg);
+extern dsa_area *area;
#endif /* MEMUTILS_H */
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 83228cfca29..ae17d028ed3 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -232,3 +232,22 @@ select * from pg_timezone_abbrevs where abbrev = 'LMT';
LMT | @ 7 hours 52 mins 58 secs ago | f
(1 row)
+DO $$
+DECLARE
+ bg_writer_pid int;
+ r RECORD;
+BEGIN
+ SELECT pid from pg_stat_activity where backend_type='background writer'
+ INTO bg_writer_pid;
+
+ select type, name, ident
+ from pg_get_process_memory_contexts(bg_writer_pid, false, 20)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+ select type, name, ident
+ from pg_get_process_memory_contexts(pg_backend_pid(), false, 20)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+END $$;
+NOTICE: (AllocSet,TopMemoryContext,)
+NOTICE: (AllocSet,TopMemoryContext,)
diff --git a/src/test/regress/sql/sysviews.sql b/src/test/regress/sql/sysviews.sql
index 66179f026b3..d0917b6868e 100644
--- a/src/test/regress/sql/sysviews.sql
+++ b/src/test/regress/sql/sysviews.sql
@@ -101,3 +101,21 @@ select count(distinct utc_offset) >= 24 as ok from pg_timezone_abbrevs;
-- One specific case we can check without much fear of breakage
-- is the historical local-mean-time value used for America/Los_Angeles.
select * from pg_timezone_abbrevs where abbrev = 'LMT';
+
+DO $$
+DECLARE
+ bg_writer_pid int;
+ r RECORD;
+BEGIN
+ SELECT pid from pg_stat_activity where backend_type='background writer'
+ INTO bg_writer_pid;
+
+ select type, name, ident
+ from pg_get_process_memory_contexts(bg_writer_pid, false, 20)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+ select type, name, ident
+ from pg_get_process_memory_contexts(pg_backend_pid(), false, 20)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+END $$;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 87e6da8d25e..780e4c4fc07 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1671,6 +1671,10 @@ MemoryContextCounters
MemoryContextData
MemoryContextMethodID
MemoryContextMethods
+MemoryStatsBackendState
+MemoryStatsContextId
+MemoryStatsCtl
+MemoryStatsEntry
MemoryStatsPrintFunc
MergeAction
MergeActionState
--
2.39.3 (Apple Git-146)
On 8 Apr 2025, at 10:03, Daniel Gustafsson <daniel@yesql.se> wrote:
There was a bug in the shmem init function which caused it to fail on Windows,
the attached fixes that.
With this building green in CI over several re-builds, and another pass over
the docs and code with pgindent etc done, I pushed this earlier today. A few
BF animals have built green so far but I will continue to monitor it.
--
Daniel Gustafsson
On 2025/04/08 18:46, Daniel Gustafsson wrote:
On 8 Apr 2025, at 10:03, Daniel Gustafsson <daniel@yesql.se> wrote:
There was a bug in the shmem init function which caused it to fail on Windows,
the attached fixes that.
With this building green in CI over several re-builds, and another pass over
the docs and code with pgindent etc done, I pushed this earlier today. A few
BF animals have built green so far but I will continue to monitor it.
Thanks for committing this feature!
I noticed that the third argument of pg_get_process_memory_contexts() is named
"retries" in pg_proc.dat, while the documentation refers to it as "timeout".
Since "retries" is misleading, how about renaming it to "timeout" in pg_proc.dat?
Patch attached.
Also, as I mentioned earlier, I encountered an issue when calling
pg_get_process_memory_contexts() on the PID of a backend that had just
encountered an error but hadn't finished rolling back. It led to
the following situation:
Session 1 (PID=70011):
=# begin;
=# select 1/0;
ERROR: division by zero
Session 2:
=# select * from pg_get_process_memory_contexts(70011, false, 10);
Session 1 terminated with:
ERROR: ResourceOwnerEnlarge called after release started
FATAL: terminating connection because protocol synchronization was lost
Shouldn't this be addressed?
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
Attachments:
v1-0001-Rename-misleading-argument-in-pg_get_process_memo.patch (text/plain; charset=UTF-8)
From a79084fe3caaf791f2aa8d466603c085ccb8c5af Mon Sep 17 00:00:00 2001
From: Fujii Masao <fujii@postgresql.org>
Date: Wed, 9 Apr 2025 01:27:48 +0900
Subject: [PATCH v1] Rename misleading argument in
pg_get_process_memory_contexts().
Previously, the third argument of pg_get_process_memory_contexts()
was named retries in pg_proc.dat, even though it actually specifies
a timeout value in seconds. This name was misleading to users and
inconsistent with the documentation, which correctly referred to it
as timeout.
This commit renames the argument to timeout in pg_proc.dat to
improve clarity and maintain consistency with the documentation.
---
src/include/catalog/pg_proc.dat | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 4708f55be18..62beb71da28 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8578,7 +8578,7 @@
prorettype => 'record', proargtypes => 'int4 bool float8',
proallargtypes => '{int4,bool,float8,text,text,text,_int4,int4,int8,int8,int8,int8,int8,int4,timestamptz}',
proargmodes => '{i,i,i,o,o,o,o,o,o,o,o,o,o,o,o}',
- proargnames => '{pid, summary, retries, name, ident, type, path, level, total_bytes, total_nblocks, free_bytes, free_chunks, used_bytes, num_agg_contexts, stats_timestamp}',
+ proargnames => '{pid, summary, timeout, name, ident, type, path, level, total_bytes, total_nblocks, free_bytes, free_chunks, used_bytes, num_agg_contexts, stats_timestamp}',
prosrc => 'pg_get_process_memory_contexts' },
# non-persistent series generator
--
2.49.0
On 8 Apr 2025, at 18:41, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
On 2025/04/08 18:46, Daniel Gustafsson wrote:
On 8 Apr 2025, at 10:03, Daniel Gustafsson <daniel@yesql.se> wrote:
There was a bug in the shmem init function which caused it to fail on Windows,
the attached fixes that.
With this building green in CI over several re-builds, and another pass over
the docs and code with pgindent etc done, I pushed this earlier today. A few
BF animals have built green so far but I will continue to monitor it.
Thanks for committing this feature!
I noticed that the third argument of pg_get_process_memory_contexts() is named
"retries" in pg_proc.dat, while the documentation refers to it as "timeout".
Since "retries" is misleading, how about renaming it to "timeout" in pg_proc.dat?
Patch attached.
Ugh, that's my bad. It was changed from using retries to a timeout and I
missed that.
Also, as I mentioned earlier, I encountered an issue when calling
pg_get_process_memory_contexts() on the PID of a backend that had just
encountered an error but hadn't finished rolling back. It led to
the following situation:
Session 1 (PID=70011):
=# begin;
=# select 1/0;
ERROR: division by zero
Session 2:
=# select * from pg_get_process_memory_contexts(70011, false, 10);
Session 1 terminated with:
ERROR: ResourceOwnerEnlarge called after release started
FATAL: terminating connection because protocol synchronization was lost
Shouldn't this be addressed?
Sorry, this must've been missed in this fairly long thread, will have a look at
it tonight.
--
Daniel Gustafsson
On 8 Apr 2025, at 18:41, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
I noticed that the third argument of pg_get_process_memory_contexts() is named
"retries" in pg_proc.dat, while the documentation refers to it as "timeout".
I've committed this patch as it was obviously correct, thanks!
Also, as I mentioned earlier, I encountered an issue when calling
pg_get_process_memory_contexts() on the PID of a backend that had just
encountered an error but hadn't finished rolling back. It led to
the following situation:
I reconfirmed that the bugfix that Rahila shared in [0] fixes this issue (and
will fix others like it, as it's not related to this patch in particular but is
a bug in DSM attaching). My plan is to take that for a more thorough review
and test tomorrow and see how far it can be safely backpatched. Thanks for
bringing this up, sorry about it getting a bit lost among all the emails.
--
Daniel Gustafsson
[0]: CAH2L28shr0j3JE5V3CXDFmDH-agTSnh2V8pR23X0UhRMbDQD9Q@mail.gmail.com
On 2025/04/09 6:27, Daniel Gustafsson wrote:
On 8 Apr 2025, at 18:41, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
I noticed that the third argument of pg_get_process_memory_contexts() is named
"retries" in pg_proc.dat, while the documentation refers to it as "timeout".
I've committed this patch as it was obviously correct, thanks!
Thanks a lot!
Since pg_proc.dat was modified, do we need to bump the catalog version?
Also, as I mentioned earlier, I encountered an issue when calling
pg_get_process_memory_contexts() on the PID of a backend that had just
encountered an error but hadn't finished rolling back. It led to
the following situation:
I reconfirmed that the bugfix that Rahila shared in [0] fixes this issue (and
will fix others like it, as it's not related to this patch in particular but is
a bug in DSM attaching). My plan is to take that for a more thorough review
and test tomorrow and see how far it can be safely backpatched. Thanks for
bringing this up, sorry about it getting a bit lost among all the emails.
Appreciate your work on this!
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
Hi,
Please find attached a patch with some comments and documentation changes.
Additionally, I added a missing '\0' termination to the "Remaining Totals" string.
I think this became necessary after we replaced dsa_allocate0()
with dsa_allocate() in the latest version.
Thank you,
Rahila Syed
Attachments:
0001-Fix-typos-and-modify-few-comments.patch (application/octet-stream)
From 9f1c04c156a65f31c9036d242295bd3e11c00e98 Mon Sep 17 00:00:00 2001
From: Rahila Syed <rahilasyed.90@gmail.com>
Date: Tue, 29 Apr 2025 14:20:32 +0530
Subject: [PATCH] Fix typos and modify few comments. Add a missing null
termination.
---
doc/src/sgml/func.sgml | 5 ++---
src/backend/utils/mmgr/mcxt.c | 11 +++++++----
2 files changed, 9 insertions(+), 7 deletions(-)
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 574a544d9fa..af3d056b992 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -28780,8 +28780,7 @@ acl | {postgres=arwdDxtm/postgres,foo=r/postgres}
indicate the number aggregated child contexts. When
<parameter>summary</parameter> is <literal>false</literal>,
<literal>the num_agg_contexts</literal> value is <literal>1</literal>,
- indicating that individual statistics are being displayed. The levels
- are limited to the first 100 contexts.
+ indicating that individual statistics are being displayed.
</para>
<para>
Busy processes can delay reporting memory context statistics,
@@ -28796,7 +28795,7 @@ acl | {postgres=arwdDxtm/postgres,foo=r/postgres}
statistics are aggregated and a cumulative total is displayed. The
<literal>num_agg_contexts</literal> column indicates the number of
contexts aggregated in the displayed statistics. When
- <literal>num_agg_contexts</literal> is <literal>1</literal> is means
+ <literal>num_agg_contexts</literal> is <literal>1</literal> it means
that the context statistics are displayed separately.
</para></entry>
</row>
diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index 506f2902986..63c53c1552a 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -1494,7 +1494,7 @@ ProcessGetMemoryContextInterrupt(void)
/*
* Create a DSA and send handle to the client process after storing the
* context statistics. If number of contexts exceed a predefined
- * limit(8MB), a cumulative total is stored for such contexts.
+ * limit (1MB), a cumulative total is stored for such contexts.
*/
if (memCxtArea->memstats_dsa_handle == DSA_HANDLE_INVALID)
{
@@ -1512,8 +1512,10 @@ ProcessGetMemoryContextInterrupt(void)
/*
* Pin the DSA area, this is to make sure the area remains attachable
- * even if current backend exits. This is done so that the statistics
- * are published even if the process exits while a client is waiting.
+ * even if the backend that created it exits. This is done so that
+ * the statistics are published even if the process exits while a
+ * client is waiting. Also, other processes that publish statistics
+ * will use the same area.
*/
dsa_pin(MemoryStatsDsaArea);
@@ -1609,9 +1611,9 @@ ProcessGetMemoryContextInterrupt(void)
}
memCxtState[idx].total_stats = cxt_id;
+ /* Notify waiting backends and return */
end_memorycontext_reporting();
- /* Notify waiting backends and return */
hash_destroy(context_id_lookup);
return;
@@ -1663,6 +1665,7 @@ ProcessGetMemoryContextInterrupt(void)
meminfo[max_stats - 1].name = dsa_allocate(MemoryStatsDsaArea, namelen + 1);
nameptr = dsa_get_address(MemoryStatsDsaArea, meminfo[max_stats - 1].name);
strncpy(nameptr, "Remaining Totals", namelen);
+ nameptr[namelen] = '\0';
meminfo[max_stats - 1].ident = InvalidDsaPointer;
meminfo[max_stats - 1].path = InvalidDsaPointer;
meminfo[max_stats - 1].type = 0;
--
2.34.1
On 29.04.25 15:13, Rahila Syed wrote:
Please find attached a patch with some comments and documentation changes.
Additionaly, added a missing '\0' termination to "Remaining Totals" string.
I think this became necessary after we replaced dsa_allocate0()
with dsa_allocate() is the latest version.
strncpy(nameptr, "Remaining Totals", namelen);
+ nameptr[namelen] = '\0';
Looks like a case for strlcpy()?
On 30 Apr 2025, at 12:14, Peter Eisentraut <peter@eisentraut.org> wrote:
On 29.04.25 15:13, Rahila Syed wrote:
Please find attached a patch with some comments and documentation changes.
Additionaly, added a missing '\0' termination to "Remaining Totals" string.
I think this became necessary after we replaced dsa_allocate0()
with dsa_allocate() is the latest version.
strncpy(nameptr, "Remaining Totals", namelen);
+ nameptr[namelen] = '\0';
Looks like a case for strlcpy()?
True. I did go ahead with the strncpy and nul terminator assignment, mostly
out of muscle memory, but I agree that this would be a good place for a
strlcpy() instead.
--
Daniel Gustafsson
Hi,
Please find attached the latest memory context statistics monitoring patch.
It has been redesigned to address several issues highlighted in the threads
[1] and [2].
Here are some key highlights of the new design:
- All DSA processing has been moved out of the CFI (CHECK_FOR_INTERRUPTS)
  handler function. Now, all the dynamic shared memory needed to store the
  statistics is created and deleted in the client function. This change
  addresses concerns that DSA APIs are too high level to be safely called from
  interrupt handlers. There was also a concern that DSA API calls might not
  provide re-entrancy, which could cause issues if CFI is invoked from a DSA
  function in the future.
- The static shared memory array has been replaced with a DSHASH table which
  now holds metadata such as pointers to actual statistics for each process.
- dsm_registry.c APIs are used for creating and attaching to the DSA and DSHASH
  table, which helps prevent code duplication.
- To address the memory leak concern, we create an exclusive memory context
  under the NULL context, which does not fall under the TopMemoryContext tree,
  to handle all the memory allocations in ProcessGetMemoryContextInterrupt.
  This ensures the memory context created by the function does not affect its
  outcome. The memory context is reset at the end of the function, which helps
  prevent any memory leaks.
- Changes made to the mcxt.c file have been relocated to mcxtfuncs.c, which now
  contains all the existing memory statistics-related functions along with the
  code for the proposed function.
The overall flow of a request is as follows:
1. A client backend running the pg_get_process_memory_contexts function creates
   a DSA and allocates memory to store statistics, tracked by a DSA pointer.
   This pointer is stored in a DSHASH entry for each client querying the
   statistics of any process. The client shares its DSHASH table key with the
   server process using a static shared array of keys indexed by the server's
   procNumber. It notifies the server process to publish statistics by using
   SendProcSignal.
2. When a PostgreSQL server process handles the request for memory statistics,
   the CFI function accesses the client hash key stored in its procNumber slot
   of the shared keys array. The server process then retrieves the DSHASH entry
   to obtain the DSA pointer allocated by the client, for storing the
   statistics. After storing the statistics, it notifies the client through its
   condition variable.
3. Although the DSA is created just once, the memory inside the DSA is
   allocated by the client process and released as soon as it finishes reading
   the statistics. If it fails to do so, the memory is deleted by the
   before_shmem_exit callback when the client exits. The client's entry in the
   DSHASH table is also deleted when the client exits.
4. The DSA and DSHASH table are not created until the
   pg_get_process_memory_contexts function is called. Once created, any client
   backend querying statistics and any PostgreSQL process publishing statistics
   will attach to the same area and table.
Please let me know your thoughts.
Thank you,
Rahila Syed
[1]: PostgreSQL: Re: pgsql: Add function to get memory context stats for processes </messages/by-id/CA+Tgmoaey-kOP1k5FaUnQFd1fR0majVebWcL8ogfLbG_nt-Ytg@mail.gmail.com>
[2]: PostgreSQL: Re: Prevent an error on attaching/creating a DSM/DSA from an interrupt handler. </messages/by-id/8B873D49-E0E5-4F9F-B8D6-CA4836B825CD@yesql.se>
On Wed, Apr 30, 2025 at 4:13 PM Daniel Gustafsson <daniel@yesql.se> wrote:
On 30 Apr 2025, at 12:14, Peter Eisentraut <peter@eisentraut.org> wrote:
On 29.04.25 15:13, Rahila Syed wrote:
Please find attached a patch with some comments and documentation changes.
Additionaly, added a missing '\0' termination to "Remaining Totals" string.
I think this became necessary after we replaced dsa_allocate0()
with dsa_allocate() is the latest version.
strncpy(nameptr, "Remaining Totals", namelen);
+ nameptr[namelen] = '\0';
Looks like a case for strlcpy()?
True. I did go ahead with the strncpy and nul terminator assignment, mostly
out of muscle memory, but I agree that this would be a good place for a
strlcpy() instead.
--
Daniel Gustafsson
Attachments:
v30-0001-Add-pg_get_process_memory_context-function.patch (application/octet-stream)
From c4ea611583dd36c1a3facf7d3185c1e9e93b17b2 Mon Sep 17 00:00:00 2001
From: Rahila Syed <rahilasyed.90@gmail.com>
Date: Mon, 30 Jun 2025 12:11:00 +0530
Subject: [PATCH] Add pg_get_process_memory_context function
This adds a function for retrieving memory context statistics
and information from backends as well as auxiliary processes.
The intended usecase is cluster debugging when under memory
pressure or unanticipated memory usage characteristics.
When calling the function it sends a signal to the specified
process to submit statistics regarding its memory contexts
into dynamic shared memory. Each memory context is returned
in detail, followed by a cumulative total in case the number
of contexts exceed the max allocated amount of shared memory.
Each process is limited to use at most 1Mb memory for this.
A summary can also be explicitly requested by the user, this
will return the TopMemoryContext and a cumulative total of
all lower contexts.
In order to not block on busy processes the caller specifies
the number of seconds during which to retry before timing out.
In the case where no statistics are published within the set
timeout, NULL is returned.
---
doc/src/sgml/func.sgml | 164 ++++
src/backend/catalog/system_views.sql | 5 +
src/backend/postmaster/autovacuum.c | 4 +
src/backend/postmaster/checkpointer.c | 4 +
src/backend/postmaster/interrupt.c | 4 +
src/backend/postmaster/pgarch.c | 4 +
src/backend/postmaster/startup.c | 4 +
src/backend/postmaster/walsummarizer.c | 4 +
src/backend/storage/ipc/ipci.c | 3 +
src/backend/storage/ipc/procsignal.c | 3 +
src/backend/storage/lmgr/lwlock.c | 1 +
src/backend/storage/lmgr/proc.c | 1 +
src/backend/tcop/postgres.c | 3 +
.../utils/activity/wait_event_names.txt | 1 +
src/backend/utils/adt/mcxtfuncs.c | 837 +++++++++++++++++-
src/backend/utils/adt/pg_locale.c | 1 -
src/backend/utils/init/globals.c | 1 +
src/backend/utils/init/postinit.c | 7 +
src/backend/utils/mb/mbutils.c | 1 -
src/backend/utils/mmgr/mcxt.c | 71 +-
src/include/catalog/pg_proc.dat | 10 +
src/include/miscadmin.h | 1 +
src/include/storage/lwlock.h | 1 +
src/include/storage/procsignal.h | 1 +
src/include/utils/memutils.h | 92 +-
src/test/regress/expected/sysviews.out | 19 +
src/test/regress/sql/sysviews.sql | 18 +
src/tools/pgindent/typedefs.list | 3 +
28 files changed, 1227 insertions(+), 41 deletions(-)
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index c28aa71f570..ba082030ea2 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -28716,6 +28716,137 @@ acl | {postgres=arwdDxtm/postgres,foo=r/postgres}
</para></entry>
</row>
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>pg_get_process_memory_contexts</primary>
+ </indexterm>
+ <function>pg_get_process_memory_contexts</function> ( <parameter>pid</parameter> <type>integer</type>, <parameter>summary</parameter> <type>boolean</type>, <parameter>timeout</parameter> <type>float</type> )
+ <returnvalue>setof record</returnvalue>
+ ( <parameter>name</parameter> <type>text</type>,
+ <parameter>ident</parameter> <type>text</type>,
+ <parameter>type</parameter> <type>text</type>,
+ <parameter>path</parameter> <type>integer[]</type>,
+ <parameter>level</parameter> <type>integer</type>,
+ <parameter>total_bytes</parameter> <type>bigint</type>,
+ <parameter>total_nblocks</parameter> <type>bigint</type>,
+ <parameter>free_bytes</parameter> <type>bigint</type>,
+ <parameter>free_chunks</parameter> <type>bigint</type>,
+ <parameter>used_bytes</parameter> <type>bigint</type>,
+ <parameter>num_agg_contexts</parameter> <type>integer</type>,
+ <parameter>stats_timestamp</parameter> <type>timestamptz</type> )
+ </para>
+ <para>
+ This function handles requests to display the memory contexts of a
+ <productname>PostgreSQL</productname> process with the specified
+ process ID. The function can be used to send requests to backends as
+ well as <glossterm linkend="glossary-auxiliary-proc">auxiliary processes</glossterm>.
+ </para>
+ <para>
+ The returned record contains extended statistics per each memory
+ context:
+ <itemizedlist spacing="compact">
+ <listitem>
+ <para>
+ <parameter>name</parameter> - The name of the memory context.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>ident</parameter> - Memory context ID (if any).
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>type</parameter> - The type of memory context, possible
+ values are: AllocSet, Generation, Slab and Bump.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>path</parameter> - Memory contexts are organized in a
+ tree model with TopMemoryContext as the root, and all other memory
+ contexts as nodes in the tree. The <parameter>path</parameter>
+ displays the path from the root to the current memory context. The
+ path is limited to 100 children per node, which each node limited
+ to a max depth of 100, to preserve memory during reporting. The
+ printed path will also be limited to 100 nodes counting from the
+ TopMemoryContext.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>level</parameter> - The level in the tree of the current
+ memory context.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>total_bytes</parameter> - The total number of bytes
+ allocated to this memory context.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>total_nblocks</parameter> - The total number of blocks
+ used for the allocated memory.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>free_bytes</parameter> - The amount of free memory in
+ this memory context.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>free_chunks</parameter> - The number of chunks that
+ <parameter>free_bytes</parameter> corresponds to.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>used_bytes</parameter> - The total number of bytes
+ currently occupied.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>num_agg_contexts</parameter> - The number of memory
+ contexts aggregated in the displayed statistics.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+ <para>
+ When <parameter>summary</parameter> is <literal>true</literal>, statistics
+ for memory contexts at levels 1 and 2 are displayed, with level 1
+ representing the root node (i.e., <literal>TopMemoryContext</literal>).
+ Statistics for contexts on level 2 and below are aggregates of all
+ child contexts' statistics, where <literal>num_agg_contexts</literal>
+ indicate the number aggregated child contexts. When
+ <parameter>summary</parameter> is <literal>false</literal>,
+ <literal>the num_agg_contexts</literal> value is <literal>1</literal>,
+ indicating that individual statistics are being displayed.
+ </para>
+ <para>
+ Busy processes can delay reporting memory context statistics,
+ <parameter>timeout</parameter> specifies the number of seconds
+ to wait for updated statistics. <parameter>timeout</parameter> can be
+ specified in fractions of a second.
+ </para>
+ <para>
+ After receiving memory context statistics from the target process, it
+ returns the results as one row per context. If all the contexts don't
+ fit within the pre-determined size limit, the remaining context
+ statistics are aggregated and a cumulative total is displayed. The
+ <literal>num_agg_contexts</literal> column indicates the number of
+ contexts aggregated in the displayed statistics. When
+ <literal>num_agg_contexts</literal> is <literal>1</literal> it means
+ that the context statistics are displayed separately.
+ </para></entry>
+ </row>
+
<row>
<entry role="func_table_entry"><para role="func_signature">
<indexterm>
@@ -28855,6 +28986,39 @@ LOG: Grand total: 1651920 bytes in 201 blocks; 622360 free (88 chunks); 1029560
because it may generate a large number of log messages.
</para>
+ <para>
+ <function>pg_get_process_memory_contexts</function> can be used to request
+ memory contexts statistics of any <productname>PostgreSQL</productname>
+ process. For example:
+<programlisting>
+postgres=# SELECT * FROM pg_get_process_memory_contexts(
+ (SELECT pid FROM pg_stat_activity
+ WHERE backend_type = 'checkpointer'),
+ false, 0.5) LIMIT 1;
+-[ RECORD 1 ]----+------------------------------
+name | TopMemoryContext
+ident |
+type | AllocSet
+path | {1}
+level | 1
+total_bytes | 90304
+total_nblocks | 3
+free_bytes | 2880
+free_chunks | 1
+used_bytes | 87424
+num_agg_contexts | 1
+</programlisting>
+ <note>
+ <para>
+ While <function>pg_get_process_memory_contexts</function> can be used to
+ query memory contexts of the local backend,
+ <structname>pg_backend_memory_contexts</structname>
+ (see <xref linkend="view-pg-backend-memory-contexts"/> for more details)
+ will be less resource intensive when only the local backend is of interest.
+ </para>
+ </note>
+ </para>
+
</sect2>
<sect2 id="functions-admin-backup">
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index b2d5332effc..33b5fcb9119 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -682,6 +682,11 @@ GRANT SELECT ON pg_backend_memory_contexts TO pg_read_all_stats;
REVOKE EXECUTE ON FUNCTION pg_get_backend_memory_contexts() FROM PUBLIC;
GRANT EXECUTE ON FUNCTION pg_get_backend_memory_contexts() TO pg_read_all_stats;
+REVOKE EXECUTE ON FUNCTION
+ pg_get_process_memory_contexts(integer, boolean, float) FROM PUBLIC;
+GRANT EXECUTE ON FUNCTION
+ pg_get_process_memory_contexts(integer, boolean, float) TO pg_read_all_stats;
+
-- Statistics views
CREATE VIEW pg_stat_all_tables AS
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 9474095f271..5e7e8081c05 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -781,6 +781,10 @@ ProcessAutoVacLauncherInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
/* Process sinval catchup interrupts that happened while sleeping */
ProcessCatchupInterrupt();
}
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index fda91ffd1ce..d3cb3f1891c 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -663,6 +663,10 @@ ProcessCheckpointerInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/postmaster/interrupt.c b/src/backend/postmaster/interrupt.c
index 0ae9bf906ec..f24f574e748 100644
--- a/src/backend/postmaster/interrupt.c
+++ b/src/backend/postmaster/interrupt.c
@@ -48,6 +48,10 @@ ProcessMainLoopInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 78e39e5f866..ac97a39447c 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -867,6 +867,10 @@ ProcessPgArchInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
if (ConfigReloadPending)
{
char *archiveLib = pstrdup(XLogArchiveLibrary);
diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c
index 27e86cf393f..7149a67fcbc 100644
--- a/src/backend/postmaster/startup.c
+++ b/src/backend/postmaster/startup.c
@@ -192,6 +192,10 @@ ProcessStartupProcInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
diff --git a/src/backend/postmaster/walsummarizer.c b/src/backend/postmaster/walsummarizer.c
index 777c9a8d555..5d14684f6b2 100644
--- a/src/backend/postmaster/walsummarizer.c
+++ b/src/backend/postmaster/walsummarizer.c
@@ -879,6 +879,10 @@ ProcessWalSummarizerInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 2fa045e6b0f..fe3d32e40b0 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -51,6 +51,7 @@
#include "storage/sinvaladt.h"
#include "utils/guc.h"
#include "utils/injection_point.h"
+#include "utils/memutils.h"
/* GUCs */
int shared_memory_type = DEFAULT_SHARED_MEMORY_TYPE;
@@ -150,6 +151,7 @@ CalculateShmemSize(int *num_semaphores)
size = add_size(size, InjectionPointShmemSize());
size = add_size(size, SlotSyncShmemSize());
size = add_size(size, AioShmemSize());
+ size = add_size(size, MemoryContextKeysShmemSize() + sizeof(LWLockPadded));
/* include additional requested shmem from preload libraries */
size = add_size(size, total_addin_request);
@@ -343,6 +345,7 @@ CreateOrAttachShmemStructs(void)
WaitEventCustomShmemInit();
InjectionPointShmemInit();
AioShmemInit();
+ MemoryContextKeysShmemInit();
}
/*
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index a9bb540b55a..ce69e26d720 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -691,6 +691,9 @@ procsignal_sigusr1_handler(SIGNAL_ARGS)
if (CheckProcSignal(PROCSIG_LOG_MEMORY_CONTEXT))
HandleLogMemoryContextInterrupt();
+ if (CheckProcSignal(PROCSIG_GET_MEMORY_CONTEXT))
+ HandleGetMemoryContextInterrupt();
+
if (CheckProcSignal(PROCSIG_PARALLEL_APPLY_MESSAGE))
HandleParallelApplyMessageInterrupt();
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 46f44bc4511..a7b5ede2b12 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -178,6 +178,7 @@ static const char *const BuiltinTrancheNames[] = {
[LWTRANCHE_XACT_SLRU] = "XactSLRU",
[LWTRANCHE_PARALLEL_VACUUM_DSA] = "ParallelVacuumDSA",
[LWTRANCHE_AIO_URING_COMPLETION] = "AioUringCompletion",
+ [LWTRANCHE_MEMORY_CONTEXT_KEYS] = "MemoryContextReportingKeys",
};
StaticAssertDecl(lengthof(BuiltinTrancheNames) ==
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index e9ef0fbfe32..f194e6b3dcc 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -50,6 +50,7 @@
#include "storage/procsignal.h"
#include "storage/spin.h"
#include "storage/standby.h"
+#include "utils/memutils.h"
#include "utils/timeout.h"
#include "utils/timestamp.h"
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 2f8c3d5f918..83db8a20efb 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3533,6 +3533,9 @@ ProcessInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
if (ParallelApplyMessagePending)
ProcessParallelApplyMessages();
}
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 4da68312b5f..78b1fa5ca43 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -161,6 +161,7 @@ WAL_RECEIVER_EXIT "Waiting for the WAL receiver to exit."
WAL_RECEIVER_WAIT_START "Waiting for startup process to send initial data for streaming replication."
WAL_SUMMARY_READY "Waiting for a new WAL summary to be generated."
XACT_GROUP_UPDATE "Waiting for the group leader to update transaction status at transaction end."
+MEM_CXT_PUBLISH "Waiting for a process to publish memory information."
ABI_compatibility:
diff --git a/src/backend/utils/adt/mcxtfuncs.c b/src/backend/utils/adt/mcxtfuncs.c
index fe6dce9cba3..b44bcb18118 100644
--- a/src/backend/utils/adt/mcxtfuncs.c
+++ b/src/backend/utils/adt/mcxtfuncs.c
@@ -15,13 +15,38 @@
#include "postgres.h"
+#include "access/twophase.h"
+#include "catalog/pg_authid_d.h"
#include "funcapi.h"
#include "mb/pg_wchar.h"
+#include "miscadmin.h"
+#include "storage/dsm_registry.h"
#include "storage/proc.h"
#include "storage/procarray.h"
+#include "utils/acl.h"
#include "utils/array.h"
#include "utils/builtins.h"
#include "utils/hsearch.h"
+#include "utils/memutils.h"
+#include "utils/wait_event_types.h"
+
+#define CLIENT_KEY_SIZE 64
+
+static LWLock *client_keys_lock = NULL;
+static int *client_keys = NULL;
+static dshash_table *MemoryStatsDsHash = NULL;
+static dsa_area *MemoryStatsDsaArea = NULL;
+
+static void dsa_cleanup(MemoryStatsDSHashEntry *entry);
+static const char *ContextTypeToString(NodeTag type);
+static void PublishMemoryContext(MemoryStatsEntry *memcxt_info,
+ int curr_id, MemoryContext context,
+ List *path,
+ MemoryContextCounters stat,
+ int num_contexts, int max_levels);
+static List *compute_context_path(MemoryContext c, HTAB *context_id_lookup);
+static void end_memorycontext_reporting(MemoryStatsDSHashEntry *entry, MemoryContext oldcontext,
+ HTAB *context_id_lookup);
/* ----------
* The max bytes for showing identifiers of MemoryContext.
@@ -89,7 +114,7 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
*/
for (MemoryContext cur = context; cur != NULL; cur = cur->parent)
{
- MemoryContextId *entry;
+ MemoryStatsContextId *entry;
bool found;
entry = hash_search(context_id_lookup, &cur, HASH_FIND, &found);
@@ -143,24 +168,7 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
else
nulls[1] = true;
- switch (context->type)
- {
- case T_AllocSetContext:
- type = "AllocSet";
- break;
- case T_GenerationContext:
- type = "Generation";
- break;
- case T_SlabContext:
- type = "Slab";
- break;
- case T_BumpContext:
- type = "Bump";
- break;
- default:
- type = "???";
- break;
- }
+ type = ContextTypeToString(context->type);
values[2] = CStringGetTextDatum(type);
values[3] = Int32GetDatum(list_length(path)); /* level */
@@ -175,6 +183,38 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
list_free(path);
}
+/*
+ * ContextTypeToString
+ * Returns a textual representation of a context type
+ *
+ * This should cover the same types as MemoryContextIsValid.
+ */
+const char *
+ContextTypeToString(NodeTag type)
+{
+ const char *context_type;
+
+ switch (type)
+ {
+ case T_AllocSetContext:
+ context_type = "AllocSet";
+ break;
+ case T_GenerationContext:
+ context_type = "Generation";
+ break;
+ case T_SlabContext:
+ context_type = "Slab";
+ break;
+ case T_BumpContext:
+ context_type = "Bump";
+ break;
+ default:
+ context_type = "???";
+ break;
+ }
+ return context_type;
+}
+
/*
* pg_get_backend_memory_contexts
* SQL SRF showing backend memory context.
@@ -189,7 +229,7 @@ pg_get_backend_memory_contexts(PG_FUNCTION_ARGS)
HTAB *context_id_lookup;
ctl.keysize = sizeof(MemoryContext);
- ctl.entrysize = sizeof(MemoryContextId);
+ ctl.entrysize = sizeof(MemoryStatsContextId);
ctl.hcxt = CurrentMemoryContext;
context_id_lookup = hash_create("pg_get_backend_memory_contexts",
@@ -216,7 +256,7 @@ pg_get_backend_memory_contexts(PG_FUNCTION_ARGS)
foreach_ptr(MemoryContextData, cur, contexts)
{
- MemoryContextId *entry;
+ MemoryStatsContextId *entry;
bool found;
/*
@@ -224,8 +264,8 @@ pg_get_backend_memory_contexts(PG_FUNCTION_ARGS)
* PutMemoryContextsStatsTupleStore needs this to populate the "path"
* column with the parent context_ids.
*/
- entry = (MemoryContextId *) hash_search(context_id_lookup, &cur,
- HASH_ENTER, &found);
+ entry = (MemoryStatsContextId *) hash_search(context_id_lookup, &cur,
+ HASH_ENTER, &found);
entry->context_id = context_id++;
Assert(!found);
@@ -305,3 +345,754 @@ pg_log_backend_memory_contexts(PG_FUNCTION_ARGS)
PG_RETURN_BOOL(true);
}
+
+/*
+ * pg_get_process_memory_contexts
+ * Signal a backend or an auxiliary process to send its memory contexts,
+ * wait for the results and display them.
+ *
+ * By default, only superusers or users with ROLE_PG_READ_ALL_STATS are allowed
+ * to signal a process to return the memory contexts. This is because allowing
+ * any user to issue this request at an unbounded rate would cause lots of
+ * requests to be sent, which can lead to denial of service. Additional roles
+ * can be permitted with GRANT.
+ *
+ * On receipt of this signal, a backend or an auxiliary process sets the flag
+ * in the signal handler, which causes the next CHECK_FOR_INTERRUPTS()
+ * or process-specific interrupt handler to copy the memory context details
+ * to a dynamic shared memory space.
+ *
+ * We have defined a limit on DSA memory that could be allocated per process -
+ * if the process has more memory contexts than what can fit in the allocated
+ * size, the excess contexts are summarized and represented as cumulative total
+ * at the end of the buffer.
+ *
+ * After sending the signal, wait on a condition variable. The publishing
+ * backend, after copying the data to shared memory, sends a signal on that
+ * condition variable. There is one condition variable per client process.
+ * Once the condition variable is signalled, check if the latest memory context
+ * information is available and display.
+ *
+ * If the publishing backend does not respond before the condition variable
+ * times out, which is set to MEMSTATS_WAIT_TIMEOUT, retry given that there is
+ * time left within the timeout specified by the user, before giving up and
+ * returning previously published statistics, if any. If no previous statistics
+ * exist, return NULL.
+ */
+#define MEMSTATS_WAIT_TIMEOUT 100
+Datum
+pg_get_process_memory_contexts(PG_FUNCTION_ARGS)
+{
+ int pid = PG_GETARG_INT32(0);
+ bool summary = PG_GETARG_BOOL(1);
+ double timeout = PG_GETARG_FLOAT8(2);
+ PGPROC *proc;
+ ProcNumber procNumber = INVALID_PROC_NUMBER;
+ bool proc_is_aux = false;
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ MemoryStatsEntry *memcxt_info;
+ MemoryStatsDSHashEntry *entry;
+ bool found;
+ char key[CLIENT_KEY_SIZE];
+
+ /*
+ * See if the process with given pid is a backend or an auxiliary process
+ * and remember the type for when we requery the process later.
+ */
+ proc = BackendPidGetProc(pid);
+ if (proc == NULL)
+ {
+ proc = AuxiliaryPidGetProc(pid);
+ proc_is_aux = true;
+ }
+
+ /*
+ * BackendPidGetProc() and AuxiliaryPidGetProc() return NULL if the pid
+ * isn't valid; this is however not a problem, so we just leave with a WARNING.
+ * See comment in pg_log_backend_memory_contexts for a discussion on this.
+ */
+ if (proc == NULL)
+ {
+ /*
+ * This is just a warning so a loop-through-resultset will not abort
+ * if one backend terminated on its own during the run.
+ */
+ ereport(WARNING,
+ errmsg("PID %d is not a PostgreSQL server process", pid));
+ PG_RETURN_NULL();
+ }
+
+ InitMaterializedSRF(fcinfo, 0);
+
+ procNumber = GetNumberFromPGProc(proc);
+
+ /*
+ * Create a DSA to allocate memory for copying memory contexts statistics.
+ * Allocate the memory in the DSA and send dsa pointer to the server
+ * process for storing the context statistics. If the number of contexts
+ * exceeds a predefined limit (1MB), a cumulative total is stored for such
+ * contexts.
+ *
+ * The DSA is created once for the lifetime of the server, and only
+ * attached in subsequent calls.
+ */
+ if (MemoryStatsDsaArea == NULL)
+ MemoryStatsDsaArea = GetNamedDSA("memory_context_statistics_dsa", &found);
+
+ /*
+ * The dsa pointers containing statistics for each client are stored in a
+ * dshash table. In addition to dsa pointer, each entry in this table also
+ * contains information about the statistics, condition variable for
+ * signalling between client and the server and miscellaneous data
+ * specific to a request. There is one entry per client request in the
+ * hash table.
+ */
+ if (MemoryStatsDsHash == NULL)
+ MemoryStatsDsHash = GetNamedDSHash("memory_context_statistics_dshash", &memctx_dsh_params, &found);
+
+ snprintf(key, sizeof(key), "%d", MyProcNumber);
+
+ /*
+ * Check if the publishing process' slot is empty and store this client's
+ * key, i.e. its procNumber. This informs the publishing process that it is
+ * supposed to write statistics in the hash entry corresponding to this
+ * client.
+ */
+ LWLockAcquire(client_keys_lock, LW_EXCLUSIVE);
+ if (client_keys[procNumber] == -1)
+ client_keys[procNumber] = MyProcNumber;
+ else
+ {
+ ereport(WARNING,
+ errmsg("server process %d is processing previous request", pid));
+ LWLockRelease(client_keys_lock);
+ PG_RETURN_NULL();
+ }
+ LWLockRelease(client_keys_lock);
+
+ /*
+ * Insert an entry for this client in DSHASH table the first time this
+ * function is called. This entry is deleted when the process exits in
+ * before_shmem_exit call.
+ *
+ * dshash_find_or_insert locks the entry to prevent the publisher from
+ * reading before client has updated the entry.
+ */
+ entry = dshash_find_or_insert(MemoryStatsDsHash, key, &found);
+ if (!found)
+ ConditionVariableInit(&entry->memcxt_cv);
+
+ /*
+ * Allocate 1MB of memory for the backend to publish its statistics on
+ * every call to this function. The memory is freed at the end of the
+ * function.
+ */
+ Assert(!DsaPointerIsValid(entry->memstats_dsa_pointer));
+ entry->memstats_dsa_pointer =
+ dsa_allocate0(MemoryStatsDsaArea, MEMORY_CONTEXT_REPORT_MAX_PER_BACKEND);
+ entry->summary = summary;
+ dshash_release_lock(MemoryStatsDsHash, entry);
+
+ /*
+ * Send a signal to a PostgreSQL process, informing it we want it to
+ * produce information about its memory contexts.
+ */
+ if (SendProcSignal(pid, PROCSIG_GET_MEMORY_CONTEXT, procNumber) < 0)
+ {
+ ereport(WARNING,
+ errmsg("could not send signal to process %d: %m", pid));
+ PG_RETURN_NULL();
+ }
+
+ while (1)
+ {
+
+ entry = dshash_find_or_insert(MemoryStatsDsHash, key, &found);
+ Assert(found);
+
+ /*
+ * We expect to come out of sleep when the requested process has
+ * finished publishing the statistics, verified using the correct
+ * entry in the proc_id field.
+ *
+ * Make sure that the information belongs to pid we requested
+ * information for; otherwise loop back and wait for the server
+ * process to finish publishing statistics.
+ *
+ * A note in procnumber.h says that a procNumber can be re-used for
+ * a different backend immediately after a backend exits. In case an
+ * old process' data was there and not updated by the current process
+ * in the slot identified by the procNumber, the pid of the requested
+ * process and the proc_id might not match.
+ *
+ */
+ if (entry->proc_id == pid)
+ break;
+
+ dshash_release_lock(MemoryStatsDsHash, entry);
+
+ /*
+ * Recheck the state of the backend before sleeping on the condition
+ * variable to ensure the process is still alive. Only check the
+ * relevant process type based on the earlier PID check.
+ */
+ if (proc_is_aux)
+ proc = AuxiliaryPidGetProc(pid);
+ else
+ proc = BackendPidGetProc(pid);
+
+ /*
+ * The process ending during memory context processing is not an
+ * error.
+ */
+ if (proc == NULL)
+ {
+ ereport(WARNING,
+ errmsg("PID %d is no longer a PostgreSQL server process",
+ pid));
+ dsa_cleanup(entry);
+ PG_RETURN_NULL();
+ }
+
+
+ /*
+ * Wait for the timeout as defined by the user. If no statistics are
+ * available within the allowed time then return NULL. The timer is
+ * defined in milliseconds since that's what the condition variable
+ * sleep uses.
+ */
+ if (ConditionVariableTimedSleep(&entry->memcxt_cv,
+ (timeout * 1000), WAIT_EVENT_MEM_CXT_PUBLISH))
+ {
+ /* Timeout has expired, return NULL */
+ dsa_cleanup(entry);
+ PG_RETURN_NULL();
+ }
+ }
+
+ /*
+ * Backend has finished publishing the stats, project them.
+ */
+ memcxt_info = (MemoryStatsEntry *)
+ dsa_get_address(MemoryStatsDsaArea, entry->memstats_dsa_pointer);
+
+#define PG_GET_PROCESS_MEMORY_CONTEXTS_COLS 11
+ for (int i = 0; i < entry->total_stats; i++)
+ {
+ ArrayType *path_array;
+ int path_length;
+ Datum values[PG_GET_PROCESS_MEMORY_CONTEXTS_COLS];
+ bool nulls[PG_GET_PROCESS_MEMORY_CONTEXTS_COLS];
+ Datum *path_datum = NULL;
+
+ memset(values, 0, sizeof(values));
+ memset(nulls, 0, sizeof(nulls));
+
+ if (memcxt_info[i].name[0] != '\0')
+ {
+ values[0] = CStringGetTextDatum(memcxt_info[i].name);
+ }
+ else
+ nulls[0] = true;
+
+ if (memcxt_info[i].ident[0] != '\0')
+ {
+ values[1] = CStringGetTextDatum(memcxt_info[i].ident);
+ }
+ else
+ nulls[1] = true;
+
+ values[2] = CStringGetTextDatum(ContextTypeToString(memcxt_info[i].type));
+
+ path_length = memcxt_info[i].path_length;
+ path_datum = (Datum *) palloc(path_length * sizeof(Datum));
+ if (memcxt_info[i].path[0] != 0)
+ {
+ for (int j = 0; j < path_length; j++)
+ path_datum[j] = Int32GetDatum(memcxt_info[i].path[j]);
+ path_array = construct_array_builtin(path_datum, path_length, INT4OID);
+ values[3] = PointerGetDatum(path_array);
+ }
+ else
+ nulls[3] = true;
+
+ values[4] = Int32GetDatum(memcxt_info[i].levels);
+ values[5] = Int64GetDatum(memcxt_info[i].totalspace);
+ values[6] = Int64GetDatum(memcxt_info[i].nblocks);
+ values[7] = Int64GetDatum(memcxt_info[i].freespace);
+ values[8] = Int64GetDatum(memcxt_info[i].freechunks);
+ values[9] = Int64GetDatum(memcxt_info[i].totalspace -
+ memcxt_info[i].freespace);
+ values[10] = Int32GetDatum(memcxt_info[i].num_agg_stats);
+
+ tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc,
+ values, nulls);
+ }
+ dsa_cleanup(entry);
+ dshash_release_lock(MemoryStatsDsHash, entry);
+
+ ConditionVariableCancelSleep();
+
+ PG_RETURN_NULL();
+}
+
+static void
+dsa_cleanup(MemoryStatsDSHashEntry *entry)
+{
+ Assert(MemoryStatsDsaArea != NULL);
+ dsa_free(MemoryStatsDsaArea, entry->memstats_dsa_pointer);
+ entry->memstats_dsa_pointer = InvalidDsaPointer;
+ entry->proc_id = 0;
+}
+void
+MemoryContextKeysShmemInit(void)
+{
+ bool found;
+
+ client_keys = (int *)
+ ShmemInitStruct("MemoryContextKeys",
+ MemoryContextKeysShmemSize() + sizeof(LWLockPadded), &found);
+ client_keys_lock = (LWLock *) ((char *) client_keys + MemoryContextKeysShmemSize());
+
+ if (!found)
+ {
+ MemSet(client_keys, -1, MemoryContextKeysShmemSize());
+ LWLockInitialize(client_keys_lock, LWTRANCHE_MEMORY_CONTEXT_KEYS);
+ }
+}
+
+Size
+MemoryContextKeysShmemSize(void)
+{
+ Size sz = 0;
+ Size TotalProcs = 0;
+
+ TotalProcs = add_size(TotalProcs, NUM_AUXILIARY_PROCS);
+ TotalProcs = add_size(TotalProcs, MaxBackends);
+ sz = add_size(sz, mul_size(TotalProcs, sizeof(int)));
+
+ return sz;
+}
+
+/*
+ * HandleGetMemoryContextInterrupt
+ * Handle receipt of an interrupt indicating a request to publish memory
+ * contexts statistics.
+ *
+ * All the actual work is deferred to ProcessGetMemoryContextInterrupt() as
+ * this cannot be performed in a signal handler.
+ */
+void
+HandleGetMemoryContextInterrupt(void)
+{
+ InterruptPending = true;
+ PublishMemoryContextPending = true;
+ /* latch will be set by procsignal_sigusr1_handler */
+}
+
+/*
+ * ProcessGetMemoryContextInterrupt
+ * Generate information about memory contexts used by the process.
+ *
+ * Performs a breadth first search on the memory context tree, thus parents'
+ * statistics are reported before their children in the monitoring function
+ * output.
+ *
+ * Statistics for all the processes are shared via the same dynamic shared
+ * area. Statistics written by each process are tracked independently in
+ * per-process DSA pointers. These pointers are stored in static shared memory.
+ *
+ * We calculate the maximum number of contexts' statistics that can be displayed
+ * using a pre-determined limit for memory available per process for this
+ * utility and the maximum size of statistics for each context. The remaining
+ * context statistics, if any, are captured as a cumulative total at the end of
+ * the individual contexts' statistics.
+ *
+ * If summary is true, we capture the level 1 and level 2 contexts
+ * statistics. For that we traverse the memory context tree recursively in
+ * depth first search manner to cover all the children of a parent context, to
+ * be able to display a cumulative total of memory consumption by a parent at
+ * level 2 and all its children.
+ */
+void
+ProcessGetMemoryContextInterrupt(void)
+{
+ List *contexts;
+ HASHCTL ctl;
+ HTAB *context_id_lookup;
+ int context_id = 0;
+ MemoryStatsEntry *meminfo;
+ bool summary = false;
+ MemoryContextCounters stat;
+ int num_individual_stats = 0;
+ bool found;
+ MemoryStatsDSHashEntry *entry;
+ char key[CLIENT_KEY_SIZE];
+ int clientProcNumber;
+ MemoryContext memstats_ctx = NULL;
+ MemoryContext oldcontext = NULL;
+
+ PublishMemoryContextPending = false;
+
+ /*
+ * Create a new memory context which is not a part of TopMemoryContext
+ * tree. This context is used to allocate all memory in this function.
+ * This keeps the memory allocated by this function for reporting memory
+ * consumption statistics separate, so that it does not affect the
+ * output of this function.
+ */
+ memstats_ctx = AllocSetContextCreate((MemoryContext) NULL, "publish_memory_context_statistics",
+ ALLOCSET_SMALL_SIZES);
+ oldcontext = MemoryContextSwitchTo(memstats_ctx);
+
+ /*
+ * The hash table is used for constructing "path" column of the view,
+ * similar to its local backend counterpart.
+ */
+ ctl.keysize = sizeof(MemoryContext);
+ ctl.entrysize = sizeof(MemoryStatsContextId);
+ ctl.hcxt = CurrentMemoryContext;
+
+ context_id_lookup = hash_create("pg_get_remote_backend_memory_contexts",
+ 256,
+ &ctl,
+ HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+
+ /* List of contexts to process in the next round - start at the top. */
+ contexts = list_make1(TopMemoryContext);
+
+ /*
+ * If DSA exists, created by another process requesting statistics, attach
+ * to it. We expect the client process to create required DSA and Dshash
+ * table.
+ */
+ if (MemoryStatsDsaArea == NULL)
+ MemoryStatsDsaArea = GetNamedDSA("memory_context_statistics_dsa", &found);
+
+ if (MemoryStatsDsHash == NULL)
+ MemoryStatsDsHash = GetNamedDSHash("memory_context_statistics_dshash", &memctx_dsh_params, &found);
+
+ /* Retrieve the client key for publishing statistics */
+ LWLockAcquire(client_keys_lock, LW_EXCLUSIVE);
+ Assert(client_keys[MyProcNumber] != -1);
+ clientProcNumber = client_keys[MyProcNumber];
+ LWLockRelease(client_keys_lock);
+
+ snprintf(key, CLIENT_KEY_SIZE, "%d", clientProcNumber);
+
+ entry = dshash_find_or_insert(MemoryStatsDsHash, key, &found);
+ summary = entry->summary;
+
+ /*
+ * The entry lock is held by dshash_find_or_insert to protect writes to
+ * process specific memory. Two different processes publishing statistics
+ * do not block each other.
+ */
+ entry->proc_id = MyProcPid;
+
+ /* Should be allocated by a client backend that is requesting statistics */
+ Assert(entry->memstats_dsa_pointer != InvalidDsaPointer);
+ meminfo = (MemoryStatsEntry *)
+ dsa_get_address(MemoryStatsDsaArea, entry->memstats_dsa_pointer);
+
+ if (summary)
+ {
+ int cxt_id = 0;
+ List *path = NIL;
+ MemoryStatsContextId *contextid_entry;
+
+ /* Copy TopMemoryContext statistics to DSA */
+ memset(&stat, 0, sizeof(stat));
+ (*TopMemoryContext->methods->stats) (TopMemoryContext, NULL, NULL,
+ &stat, true);
+ path = lcons_int(1, path);
+ PublishMemoryContext(meminfo, cxt_id, TopMemoryContext, path, stat,
+ 1, 100);
+
+ contextid_entry = (MemoryStatsContextId *) hash_search(context_id_lookup, &TopMemoryContext,
+ HASH_ENTER, &found);
+ Assert(!found);
+
+ /*
+ * context id starts with 1
+ */
+ contextid_entry->context_id = cxt_id + 1;
+
+ /*
+ * Copy statistics for each of TopMemoryContext's children. This
+ * includes statistics of at most 100 children per node, with each
+ * child node limited to a depth of 100 in its subtree.
+ */
+ for (MemoryContext c = TopMemoryContext->firstchild; c != NULL;
+ c = c->nextchild)
+ {
+ MemoryContextCounters grand_totals;
+ int num_contexts = 0;
+
+ path = NIL;
+ memset(&grand_totals, 0, sizeof(grand_totals));
+
+ cxt_id++;
+ contextid_entry = (MemoryStatsContextId *) hash_search(context_id_lookup, &c,
+ HASH_ENTER, &found);
+ Assert(!found);
+ contextid_entry->context_id = cxt_id + 1;
+
+ MemoryContextStatsInternal(c, 1, 100, 100, &grand_totals,
+ PRINT_STATS_NONE, &num_contexts);
+
+ path = compute_context_path(c, context_id_lookup);
+
+ PublishMemoryContext(meminfo, cxt_id, c, path,
+ grand_totals, num_contexts, 100);
+ }
+ entry->total_stats = cxt_id + 1;
+
+ /* Notify waiting backends and return */
+ end_memorycontext_reporting(entry, oldcontext, context_id_lookup);
+ return;
+ }
+ foreach_ptr(MemoryContextData, cur, contexts)
+ {
+ List *path = NIL;
+ MemoryStatsContextId *contextid_entry;
+
+ contextid_entry = (MemoryStatsContextId *) hash_search(context_id_lookup, &cur,
+ HASH_ENTER, &found);
+ Assert(!found);
+
+ /*
+ * context id starts with 1
+ */
+ contextid_entry->context_id = context_id + 1;
+
+ /*
+ * Figure out the transient context_id of this context and each of its
+ * ancestors, to compute a path for this context.
+ */
+ path = compute_context_path(cur, context_id_lookup);
+
+ /* Examine the context stats */
+ memset(&stat, 0, sizeof(stat));
+ (*cur->methods->stats) (cur, NULL, NULL, &stat, true);
+
+ /* Account for saving one statistics slot for cumulative reporting */
+ if (context_id < (MAX_MEMORY_CONTEXT_STATS_NUM - 1))
+ {
+ /* Copy statistics to DSA memory */
+ PublishMemoryContext(meminfo, context_id, cur, path, stat, 1, 100);
+ }
+ else
+ {
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].totalspace += stat.totalspace;
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].nblocks += stat.nblocks;
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].freespace += stat.freespace;
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].freechunks += stat.freechunks;
+ }
+
+ /*
+ * DSA max limit per process is reached, write aggregate of the
+ * remaining statistics.
+ *
+ * We can store contexts from 0 to max_stats - 1. When the number of
+ * contexts exceeds max_stats, we stop reporting individual statistics
+ * once context_id equals max_stats - 2, as we use the max_stats - 1 array
+ * slot for reporting cumulative statistics or "Remaining Totals".
+ */
+ if (context_id == (MAX_MEMORY_CONTEXT_STATS_NUM - 2))
+ {
+ int namelen = strlen("Remaining Totals");
+
+ num_individual_stats = context_id + 1;
+ strlcpy(meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].name, "Remaining Totals", namelen + 1);
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].ident[0] = '\0';
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].path[0] = 0;
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].type = 0;
+ }
+ context_id++;
+
+ for (MemoryContext c = cur->firstchild; c != NULL; c = c->nextchild)
+ contexts = lappend(contexts, c);
+ }
+
+ /*
+ * Statistics are not aggregated, i.e. individual statistics are reported when
+ * context_id <= max_stats.
+ */
+ if (context_id <= MAX_MEMORY_CONTEXT_STATS_NUM)
+ {
+ entry->total_stats = context_id;
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].num_agg_stats = 1;
+ }
+ /* Report number of aggregated memory contexts */
+ else
+ {
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].num_agg_stats = context_id -
+ num_individual_stats;
+
+ /*
+ * Total stats equals num_individual_stats + 1 record for cumulative
+ * statistics.
+ */
+ entry->total_stats = num_individual_stats + 1;
+ }
+
+ /* Notify waiting backends and return */
+ end_memorycontext_reporting(entry, oldcontext, context_id_lookup);
+}
+
+/*
+ * Signal all the waiting client backends after copying all the statistics,
+ * then release this process's slot and clean up local state.
+ */
+static void
+end_memorycontext_reporting(MemoryStatsDSHashEntry *entry, MemoryContext oldcontext, HTAB *context_id_lookup)
+{
+ MemoryContext curr_ctx = CurrentMemoryContext;
+
+ dshash_release_lock(MemoryStatsDsHash, entry);
+ ConditionVariableBroadcast(&entry->memcxt_cv);
+
+ /*
+ * Empty this process's slot, so other clients can request memory
+ * statistics
+ */
+ LWLockAcquire(client_keys_lock, LW_EXCLUSIVE);
+ client_keys[MyProcNumber] = -1;
+ LWLockRelease(client_keys_lock);
+
+ hash_destroy(context_id_lookup);
+ MemoryContextSwitchTo(oldcontext);
+ MemoryContextReset(curr_ctx);
+}
+
+/*
+ * compute_context_path
+ *
+ * Append the transient context_id of this context and each of its ancestors
+ * to a list, in order to compute a path.
+ */
+static List *
+compute_context_path(MemoryContext c, HTAB *context_id_lookup)
+{
+ bool found;
+ List *path = NIL;
+ MemoryContext cur_context;
+
+ for (cur_context = c; cur_context != NULL; cur_context = cur_context->parent)
+ {
+ MemoryStatsContextId *cur_entry;
+
+ cur_entry = hash_search(context_id_lookup, &cur_context, HASH_FIND, &found);
+
+ if (!found)
+ elog(ERROR, "hash table corrupted, can't construct path value");
+
+ path = lcons_int(cur_entry->context_id, path);
+ }
+
+ return path;
+}
+
+/*
+ * PublishMemoryContext
+ *
+ * Copy the memory context statistics of a single context to a DSA memory
+ */
+static void
+PublishMemoryContext(MemoryStatsEntry *memcxt_info, int curr_id,
+ MemoryContext context, List *path,
+ MemoryContextCounters stat, int num_contexts, int max_levels)
+{
+ const char *ident = context->ident;
+ const char *name = context->name;
+
+ /*
+ * To be consistent with logging output, we label dynahash contexts with
+ * just the hash table name as with MemoryContextStatsPrint().
+ */
+ if (context->ident && strncmp(context->name, "dynahash", 8) == 0)
+ {
+ name = context->ident;
+ ident = NULL;
+ }
+
+ if (name != NULL)
+ {
+ int namelen = strlen(name);
+
+ if (strlen(name) >= MEMORY_CONTEXT_IDENT_SHMEM_SIZE)
+ namelen = pg_mbcliplen(name, namelen,
+ MEMORY_CONTEXT_IDENT_SHMEM_SIZE - 1);
+
+ strlcpy(memcxt_info[curr_id].name, name, namelen + 1);
+ }
+ else
+ /* Clearing the array */
+ memcxt_info[curr_id].name[0] = '\0';
+
+ /* Trim and copy the identifier if it is not set to NULL */
+ if (ident != NULL)
+ {
+ int idlen = strlen(context->ident);
+
+ /*
+ * Some identifiers such as SQL query string can be very long,
+ * truncate oversize identifiers.
+ */
+ if (idlen >= MEMORY_CONTEXT_IDENT_SHMEM_SIZE)
+ idlen = pg_mbcliplen(ident, idlen,
+ MEMORY_CONTEXT_IDENT_SHMEM_SIZE - 1);
+
+ strlcpy(memcxt_info[curr_id].ident, ident, idlen + 1);
+ }
+ else
+ memcxt_info[curr_id].ident[0] = '\0';
+
+ /* Copy the path information, truncated to max_levels entries */
+ if (path == NIL)
+ memcxt_info[curr_id].path[0] = 0;
+ else
+ {
+ int levels = Min(list_length(path), max_levels);
+
+ memcxt_info[curr_id].path_length = levels;
+ memcxt_info[curr_id].levels = list_length(path);
+
+ foreach_int(i, path)
+ {
+ memcxt_info[curr_id].path[foreach_current_index(i)] = i;
+ if (--levels == 0)
+ break;
+ }
+ }
+ memcxt_info[curr_id].type = context->type;
+ memcxt_info[curr_id].totalspace = stat.totalspace;
+ memcxt_info[curr_id].nblocks = stat.nblocks;
+ memcxt_info[curr_id].freespace = stat.freespace;
+ memcxt_info[curr_id].freechunks = stat.freechunks;
+ memcxt_info[curr_id].num_agg_stats = num_contexts;
+}
+
+void
+AtProcExit_memstats_cleanup(int code, Datum arg)
+{
+ int idx = MyProcNumber;
+ MemoryStatsDSHashEntry *entry;
+ char key[CLIENT_KEY_SIZE];
+ bool found;
+
+ if (MemoryStatsDsHash != NULL)
+ {
+ snprintf(key, CLIENT_KEY_SIZE, "%d", idx);
+ entry = dshash_find_or_insert(MemoryStatsDsHash, key, &found);
+
+ if (found)
+ {
+ if (MemoryStatsDsaArea != NULL &&
+ DsaPointerIsValid(entry->memstats_dsa_pointer))
+ dsa_free(MemoryStatsDsaArea, entry->memstats_dsa_pointer);
+ }
+ dshash_delete_entry(MemoryStatsDsHash, entry);
+ }
+}
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 97c2ac1faf9..ab768a7a91f 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -45,7 +45,6 @@
#include "utils/lsyscache.h"
#include "utils/memutils.h"
#include "utils/pg_locale.h"
-#include "utils/relcache.h"
#include "utils/syscache.h"
#ifdef WIN32
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index d31cb45a058..92b0446b80c 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -39,6 +39,7 @@ volatile sig_atomic_t TransactionTimeoutPending = false;
volatile sig_atomic_t IdleSessionTimeoutPending = false;
volatile sig_atomic_t ProcSignalBarrierPending = false;
volatile sig_atomic_t LogMemoryContextPending = false;
+volatile sig_atomic_t PublishMemoryContextPending = false;
volatile sig_atomic_t IdleStatsUpdateTimeoutPending = false;
volatile uint32 InterruptHoldoffCount = 0;
volatile uint32 QueryCancelHoldoffCount = 0;
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index c86ceefda94..89d72cdd5ff 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -663,6 +663,13 @@ BaseInit(void)
* drop ephemeral slots, which in turn triggers stats reporting.
*/
ReplicationSlotInitialize();
+
+ /*
+ * The before shmem exit callback frees the DSA memory occupied by the
+ * latest memory context statistics that could be published by this proc
+ * if requested.
+ */
+ before_shmem_exit(AtProcExit_memstats_cleanup, 0);
}
diff --git a/src/backend/utils/mb/mbutils.c b/src/backend/utils/mb/mbutils.c
index 886ecbad871..308016d7763 100644
--- a/src/backend/utils/mb/mbutils.c
+++ b/src/backend/utils/mb/mbutils.c
@@ -39,7 +39,6 @@
#include "mb/pg_wchar.h"
#include "utils/fmgrprotos.h"
#include "utils/memutils.h"
-#include "utils/relcache.h"
#include "varatt.h"
/*
diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index 15fa4d0a55e..d6b081d1bbf 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -23,6 +23,7 @@
#include "mb/pg_wchar.h"
#include "miscadmin.h"
+#include "utils/hsearch.h"
#include "utils/memdebug.h"
#include "utils/memutils.h"
#include "utils/memutils_internal.h"
@@ -159,10 +160,6 @@ MemoryContext PortalContext = NULL;
static void MemoryContextDeleteOnly(MemoryContext context);
static void MemoryContextCallResetCallbacks(MemoryContext context);
-static void MemoryContextStatsInternal(MemoryContext context, int level,
- int max_level, int max_children,
- MemoryContextCounters *totals,
- bool print_to_stderr);
static void MemoryContextStatsPrint(MemoryContext context, void *passthru,
const char *stats_string,
bool print_to_stderr);
@@ -831,11 +828,19 @@ MemoryContextStatsDetail(MemoryContext context,
bool print_to_stderr)
{
MemoryContextCounters grand_totals;
+ int num_contexts = 0;
+ PrintDestination print_location;
memset(&grand_totals, 0, sizeof(grand_totals));
+ if (print_to_stderr)
+ print_location = PRINT_STATS_TO_STDERR;
+ else
+ print_location = PRINT_STATS_TO_LOGS;
+
+ /* num_contexts reports the number of contexts aggregated in the output */
MemoryContextStatsInternal(context, 1, max_level, max_children,
- &grand_totals, print_to_stderr);
+ &grand_totals, print_location, &num_contexts);
if (print_to_stderr)
fprintf(stderr,
@@ -870,13 +875,14 @@ MemoryContextStatsDetail(MemoryContext context,
* One recursion level for MemoryContextStats
*
* Print stats for this context if possible, but in any case accumulate counts
- * into *totals (if not NULL).
+ * into *totals (if not NULL). The callers should make sure that print_location
+ * is set to PRINT_STATS_TO_STDERR or PRINT_STATS_TO_LOGS or PRINT_STATS_NONE.
*/
-static void
+void
MemoryContextStatsInternal(MemoryContext context, int level,
int max_level, int max_children,
MemoryContextCounters *totals,
- bool print_to_stderr)
+ PrintDestination print_location, int *num_contexts)
{
MemoryContext child;
int ichild;
@@ -884,10 +890,39 @@ MemoryContextStatsInternal(MemoryContext context, int level,
Assert(MemoryContextIsValid(context));
/* Examine the context itself */
- context->methods->stats(context,
- MemoryContextStatsPrint,
- &level,
- totals, print_to_stderr);
+ switch (print_location)
+ {
+ case PRINT_STATS_TO_STDERR:
+ context->methods->stats(context,
+ MemoryContextStatsPrint,
+ &level,
+ totals, true);
+ break;
+
+ case PRINT_STATS_TO_LOGS:
+ context->methods->stats(context,
+ MemoryContextStatsPrint,
+ &level,
+ totals, false);
+ break;
+
+ case PRINT_STATS_NONE:
+
+ /*
+ * Do not print the statistics if print_location is
+ * PRINT_STATS_NONE, only compute totals. This is used in
+ * reporting of memory context statistics via a sql function. Last
+ * parameter is not relevant.
+ */
+ context->methods->stats(context,
+ NULL,
+ NULL,
+ totals, false);
+ break;
+ }
+
+ /* Increment the context count for each recursive call */
+ *num_contexts = *num_contexts + 1;
/*
* Examine children.
@@ -907,7 +942,7 @@ MemoryContextStatsInternal(MemoryContext context, int level,
MemoryContextStatsInternal(child, level + 1,
max_level, max_children,
totals,
- print_to_stderr);
+ print_location, num_contexts);
}
}
@@ -926,7 +961,13 @@ MemoryContextStatsInternal(MemoryContext context, int level,
child = MemoryContextTraverseNext(child, context);
}
- if (print_to_stderr)
+ /*
+ * Add the count of children contexts which are traversed in the
+ * non-recursive manner.
+ */
+ *num_contexts = *num_contexts + ichild;
+
+ if (print_location == PRINT_STATS_TO_STDERR)
{
for (int i = 0; i < level; i++)
fprintf(stderr, " ");
@@ -939,7 +980,7 @@ MemoryContextStatsInternal(MemoryContext context, int level,
local_totals.freechunks,
local_totals.totalspace - local_totals.freespace);
}
- else
+ else if (print_location == PRINT_STATS_TO_LOGS)
ereport(LOG_SERVER_ONLY,
(errhidestmt(true),
errhidecontext(true),
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 1fc19146f46..4bc4474b580 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8597,6 +8597,16 @@
prorettype => 'bool', proargtypes => 'int4',
prosrc => 'pg_log_backend_memory_contexts' },
+# publishing memory contexts of the specified postgres process
+{ oid => '2173', descr => 'publish memory contexts of the specified backend',
+ proname => 'pg_get_process_memory_contexts', provolatile => 'v',
+ prorows => '100', proretset => 't', proparallel => 'r',
+ prorettype => 'record', proargtypes => 'int4 bool float8',
+ proallargtypes => '{int4,bool,float8,text,text,text,_int4,int4,int8,int8,int8,int8,int8,int4}',
+ proargmodes => '{i,i,i,o,o,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{pid, summary, timeout, name, ident, type, path, level, total_bytes, total_nblocks, free_bytes, free_chunks, used_bytes, num_agg_contexts}',
+ prosrc => 'pg_get_process_memory_contexts' },
+
# non-persistent series generator
{ oid => '1066', descr => 'non-persistent series generator',
proname => 'generate_series', prorows => '1000',
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 1bef98471c3..1e59a7f910f 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -96,6 +96,7 @@ extern PGDLLIMPORT volatile sig_atomic_t IdleSessionTimeoutPending;
extern PGDLLIMPORT volatile sig_atomic_t ProcSignalBarrierPending;
extern PGDLLIMPORT volatile sig_atomic_t LogMemoryContextPending;
extern PGDLLIMPORT volatile sig_atomic_t IdleStatsUpdateTimeoutPending;
+extern PGDLLIMPORT volatile sig_atomic_t PublishMemoryContextPending;
extern PGDLLIMPORT volatile sig_atomic_t CheckClientConnectionPending;
extern PGDLLIMPORT volatile sig_atomic_t ClientConnectionLost;
diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index 08a72569ae5..638407adf39 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -221,6 +221,7 @@ typedef enum BuiltinTrancheIds
LWTRANCHE_XACT_SLRU,
LWTRANCHE_PARALLEL_VACUUM_DSA,
LWTRANCHE_AIO_URING_COMPLETION,
+ LWTRANCHE_MEMORY_CONTEXT_KEYS,
LWTRANCHE_FIRST_USER_DEFINED,
} BuiltinTrancheIds;
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index afeeb1ca019..345d5a0ecb1 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -35,6 +35,7 @@ typedef enum
PROCSIG_WALSND_INIT_STOPPING, /* ask walsenders to prepare for shutdown */
PROCSIG_BARRIER, /* global barrier interrupt */
PROCSIG_LOG_MEMORY_CONTEXT, /* ask backend to log the memory contexts */
+ PROCSIG_GET_MEMORY_CONTEXT, /* ask backend to send the memory contexts */
PROCSIG_PARALLEL_APPLY_MESSAGE, /* Message from parallel apply workers */
/* Recovery conflict reasons */
diff --git a/src/include/utils/memutils.h b/src/include/utils/memutils.h
index 8abc26abce2..01835d56021 100644
--- a/src/include/utils/memutils.h
+++ b/src/include/utils/memutils.h
@@ -18,7 +18,10 @@
#define MEMUTILS_H
#include "nodes/memnodes.h"
-
+#include "storage/condition_variable.h"
+#include "storage/lmgr.h"
+#include "utils/dsa.h"
+#include "lib/dshash.h"
/*
* MaxAllocSize, MaxAllocHugeSize
@@ -48,6 +51,23 @@
#define AllocHugeSizeIsValid(size) ((Size) (size) <= MaxAllocHugeSize)
+/*
+ * Memory Context reporting size limits.
+ */
+
+/* Max length of context name and ident */
+#define MEMORY_CONTEXT_IDENT_SHMEM_SIZE 64
+/* Maximum size (in bytes) of DSA area per process */
+#define MEMORY_CONTEXT_REPORT_MAX_PER_BACKEND ((size_t) (1 * 1024 * 1024))
+
+/*
+ * Maximum size per context. Actual size may be lower as this assumes the worst
+ * case of deepest path and longest identifiers (name and ident, thus the
+ * multiplication by 2). The path depth is limited to 100 like for memory
+ * context logging.
+ */
+#define MAX_MEMORY_CONTEXT_STATS_SIZE (sizeof(MemoryStatsEntry))
+#define MAX_MEMORY_CONTEXT_STATS_NUM (MEMORY_CONTEXT_REPORT_MAX_PER_BACKEND / MAX_MEMORY_CONTEXT_STATS_SIZE)
/*
* Standard top-level memory contexts.
@@ -319,4 +339,74 @@ pg_memory_is_all_zeros(const void *ptr, size_t len)
return true;
}
+/* Dynamic shared memory state for statistics per context */
+typedef struct MemoryStatsEntry
+{
+ char name[MEMORY_CONTEXT_IDENT_SHMEM_SIZE];
+ char ident[MEMORY_CONTEXT_IDENT_SHMEM_SIZE];
+ int path[100];
+ NodeTag type;
+ int path_length;
+ int levels;
+ int64 totalspace;
+ int64 nblocks;
+ int64 freespace;
+ int64 freechunks;
+ int num_agg_stats;
+} MemoryStatsEntry;
+
+/*
+ * Per backend dynamic shared hash entry for memory context statistics
+ * reporting.
+ */
+typedef struct MemoryStatsDSHashEntry
+{
+ char key[64];
+ ConditionVariable memcxt_cv;
+ int proc_id;
+ int total_stats;
+ bool summary;
+ dsa_pointer memstats_dsa_pointer;
+ TimestampTz stats_timestamp;
+} MemoryStatsDSHashEntry;
+
+static const dshash_parameters memctx_dsh_params = {
+ offsetof(MemoryStatsDSHashEntry, memcxt_cv),
+ sizeof(MemoryStatsDSHashEntry),
+ dshash_strcmp,
+ dshash_strhash,
+ dshash_strcpy
+};
+
+/*
+ * Used for storage of transient identifiers for pg_get_backend_memory_contexts
+ */
+typedef struct MemoryStatsContextId
+{
+ MemoryContext context;
+ int context_id;
+} MemoryStatsContextId;
+
+/*
+ * This is passed to MemoryContextStatsInternal to determine whether
+ * to print context statistics and, if so, whether to print them to the
+ * logs or to stderr.
+ */
+typedef enum PrintDestination
+{
+ PRINT_STATS_TO_STDERR = 0,
+ PRINT_STATS_TO_LOGS,
+ PRINT_STATS_NONE
+} PrintDestination;
+
+extern void ProcessGetMemoryContextInterrupt(void);
+extern void HandleGetMemoryContextInterrupt(void);
+extern void MemoryContextKeysShmemInit(void);
+extern Size MemoryContextKeysShmemSize(void);
+extern void MemoryContextStatsInternal(MemoryContext context, int level,
+ int max_level, int max_children,
+ MemoryContextCounters *totals,
+ PrintDestination print_location,
+ int *num_contexts);
+extern void AtProcExit_memstats_cleanup(int code, Datum arg);
#endif /* MEMUTILS_H */
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 83228cfca29..ae17d028ed3 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -232,3 +232,22 @@ select * from pg_timezone_abbrevs where abbrev = 'LMT';
LMT | @ 7 hours 52 mins 58 secs ago | f
(1 row)
+DO $$
+DECLARE
+ bg_writer_pid int;
+ r RECORD;
+BEGIN
+ SELECT pid from pg_stat_activity where backend_type='background writer'
+ INTO bg_writer_pid;
+
+ select type, name, ident
+ from pg_get_process_memory_contexts(bg_writer_pid, false, 20)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+ select type, name, ident
+ from pg_get_process_memory_contexts(pg_backend_pid(), false, 20)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+END $$;
+NOTICE: (AllocSet,TopMemoryContext,)
+NOTICE: (AllocSet,TopMemoryContext,)
diff --git a/src/test/regress/sql/sysviews.sql b/src/test/regress/sql/sysviews.sql
index 66179f026b3..d0917b6868e 100644
--- a/src/test/regress/sql/sysviews.sql
+++ b/src/test/regress/sql/sysviews.sql
@@ -101,3 +101,21 @@ select count(distinct utc_offset) >= 24 as ok from pg_timezone_abbrevs;
-- One specific case we can check without much fear of breakage
-- is the historical local-mean-time value used for America/Los_Angeles.
select * from pg_timezone_abbrevs where abbrev = 'LMT';
+
+DO $$
+DECLARE
+ bg_writer_pid int;
+ r RECORD;
+BEGIN
+ SELECT pid from pg_stat_activity where backend_type='background writer'
+ INTO bg_writer_pid;
+
+ select type, name, ident
+ from pg_get_process_memory_contexts(bg_writer_pid, false, 20)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+ select type, name, ident
+ from pg_get_process_memory_contexts(pg_backend_pid(), false, 20)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+END $$;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 83192038571..fedae342032 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1683,6 +1683,9 @@ MemoryContextData
MemoryContextId
MemoryContextMethodID
MemoryContextMethods
+MemoryStatsContextId
+MemoryStatsEntry
+MemoryStatsDSHashEntry
MemoryStatsPrintFunc
MergeAction
MergeActionState
--
2.34.1
Hi,
Please find attached an updated patch. It contains the following changes.
1. Rebased the patch, as flagged by cfbot
<https://cfbot.cputube.org/patch_5938.log>. The method for adding an
LWLock changed in commit 2047ad068139f0b8c6da73d0b845ca9ba30fb33d, so
the patch has been adjusted accordingly.
2. Updated some comments to align with the latest patch design.
3. Removed an unnecessary assertion.
Thank you,
Rahila Syed
On Fri, Jul 11, 2025 at 9:01 PM Rahila Syed <rahilasyed90@gmail.com> wrote:
Hi,
Please find attached the latest memory context statistics monitoring
patch. It has been redesigned to address several issues highlighted in
threads [1] and [2]. Here are some key highlights of the new design:

- All DSA processing has been moved out of the CFI handler function. Now,
  all the dynamic shared memory needed to store the statistics is created
  and deleted in the client function. This change addresses concerns that
  DSA APIs are too high level to be safely called from interrupt handlers.
  There was also a concern that DSA API calls might not provide
  re-entrancy, which could cause issues if CFI is invoked from a DSA
  function in the future.
- The static shared memory array has been replaced with a DSHASH table,
  which now holds metadata such as pointers to the actual statistics for
  each process.
- dsm_registry.c APIs are used for creating and attaching to the DSA and
  DSHASH table, which helps prevent code duplication.
- To address the memory leak concern, we create an exclusive memory
  context under the NULL context, which does not fall under the
  TopMemoryContext tree, to handle all the memory allocations in
  ProcessGetMemoryContextInterrupt. This ensures the memory context
  created by the function does not affect its outcome. The memory context
  is reset at the end of the function, which helps prevent any memory
  leaks.
- Changes made to the mcxt.c file have been relocated to mcxtfuncs.c,
  which now contains all the existing memory statistics-related functions
  along with the code for the proposed function.

The overall flow of a request is as follows:
1. A client backend running the pg_get_process_memory_contexts function
   creates a DSA and allocates memory to store statistics, tracked by a
   DSA pointer. This pointer is stored in a DSHASH entry for each client
   querying the statistics of any process. The client shares its DSHASH
   table key with the server process using a static shared array of keys
   indexed by the server's procNumber. It notifies the server process to
   publish statistics by using SendProcSignal.
2. When a PostgreSQL server process handles the request for memory
   statistics, the CFI function accesses the client hash key stored in its
   procNumber slot of the shared keys array. The server process then
   retrieves the DSHASH entry to obtain the DSA pointer allocated by the
   client for storing the statistics. After storing the statistics, it
   notifies the client through its condition variable.
3. Although the DSA is created just once, the memory inside the DSA is
   allocated by the client process and released as soon as it finishes
   reading the statistics. If the client fails to release it, the memory
   is freed by the before_shmem_exit callback when the client exits. The
   client's entry in the DSHASH table is also deleted when the client
   exits.
4. The DSA and DSHASH table are not created until the
   pg_get_process_memory_contexts function is called. Once created, any
   client backend querying statistics and any PostgreSQL process
   publishing statistics will attach to the same area and table.

Please let me know your thoughts.
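
The request flow described above can be illustrated with a small single-process simulation. This is only a sketch, not the patch's code: plain arrays stand in for the shared keys array and the DSHASH table, a boolean flag stands in for SendProcSignal, and another flag stands in for the condition variable broadcast. All names here (StatsEntry, request_stats, process_publish_interrupt, consume_stats, MAX_PROCS) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MAX_PROCS 8		/* hypothetical number of process slots */
#define NO_REQUEST (-1)		/* empty slot marker, like client_keys[] = -1 */

/* Stand-in for the DSHASH entry owned by one client backend. */
typedef struct StatsEntry
{
	bool		allocated;	/* client has set aside room for the result */
	bool		ready;		/* stand-in for the condition variable broadcast */
	int			total_stats;	/* number of records the server published */
	char		first_name[64];	/* name of the first published context */
} StatsEntry;

static int	client_keys[MAX_PROCS];		/* indexed by the server's slot */
static bool publish_pending[MAX_PROCS];	/* stand-in for SendProcSignal */
static StatsEntry entries[MAX_PROCS];	/* indexed by the client's slot */

static void
init_slots(void)
{
	for (int i = 0; i < MAX_PROCS; i++)
	{
		client_keys[i] = NO_REQUEST;
		publish_pending[i] = false;
	}
	memset(entries, 0, sizeof(entries));
}

/* Step 1: the client allocates its entry and posts its key to the server. */
static bool
request_stats(int client, int server)
{
	assert(client >= 0 && client < MAX_PROCS);
	assert(server >= 0 && server < MAX_PROCS);

	if (client_keys[server] != NO_REQUEST)
		return false;		/* another client is already waiting */

	entries[client].allocated = true;
	entries[client].ready = false;
	client_keys[server] = client;
	publish_pending[server] = true;
	return true;
}

/* Step 2: the server reads the key, fills the client's entry, notifies. */
static void
process_publish_interrupt(int server)
{
	int			client;

	if (!publish_pending[server])
		return;
	publish_pending[server] = false;

	client = client_keys[server];
	client_keys[server] = NO_REQUEST;	/* free the slot for other clients */
	if (client == NO_REQUEST || !entries[client].allocated)
		return;

	snprintf(entries[client].first_name, sizeof(entries[client].first_name),
			 "%s", "TopMemoryContext");
	entries[client].total_stats = 1;
	entries[client].ready = true;
}

/* Step 3: the client reads the result and releases its memory. */
static int
consume_stats(int client)
{
	int			n;

	if (!entries[client].ready)
		return -1;			/* nothing published yet */

	n = entries[client].total_stats;
	entries[client].allocated = false;
	entries[client].ready = false;
	return n;
}
```

In this sketch a second client asking for the same server's statistics is turned away while the slot is occupied, and can request again once the server has emptied the slot. The real patch protects these structures with locks and waits on the condition variable with a timeout, which the simulation leaves out.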
Thank you,
Rahila Syed

[1]. PostgreSQL: Re: pgsql: Add function to get memory context stats for
processes
</messages/by-id/CA+Tgmoaey-kOP1k5FaUnQFd1fR0majVebWcL8ogfLbG_nt-Ytg@mail.gmail.com>
[2]. PostgreSQL: Re: Prevent an error on attaching/creating a DSM/DSA
from an interrupt handler.
</messages/by-id/8B873D49-E0E5-4F9F-B8D6-CA4836B825CD@yesql.se>

On Wed, Apr 30, 2025 at 4:13 PM Daniel Gustafsson <daniel@yesql.se> wrote:
On 30 Apr 2025, at 12:14, Peter Eisentraut <peter@eisentraut.org> wrote:
On 29.04.25 15:13, Rahila Syed wrote:
Please find attached a patch with some comments and documentation
changes.
Additionally, added a missing '\0' termination to the "Remaining Totals"
string.
I think this became necessary after we replaced dsa_allocate0()
with dsa_allocate() in the latest version.

strncpy(nameptr, "Remaining Totals", namelen);
+ nameptr[namelen] = '\0';

Looks like a case for strlcpy()?

True. I did go ahead with the strncpy and nul terminator assignment,
mostly out of muscle memory, but I agree that this would be a good place
for a strlcpy() instead.

--
Daniel Gustafsson
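
The difference discussed in the exchange above can be sketched as follows. strncpy() does not write a terminator when the source is at least as long as the count, so the caller has to add one, whereas a strlcpy()-style copy always terminates and reports the source length. Since strlcpy() is not available in every C library (PostgreSQL carries its own fallback in src/port), the sketch uses a local helper, copy_bounded, which is a hypothetical name and only assumed to mirror strlcpy() semantics.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper with strlcpy()-like semantics: copy at most
 * dstsize - 1 bytes, always NUL-terminate when dstsize > 0, and return
 * the full length of src so the caller can detect truncation.
 */
static size_t
copy_bounded(char *dst, const char *src, size_t dstsize)
{
	size_t		srclen;

	assert(dst != NULL && src != NULL);
	srclen = strlen(src);

	if (dstsize > 0)
	{
		size_t		n = (srclen < dstsize - 1) ? srclen : dstsize - 1;

		memcpy(dst, src, n);
		dst[n] = '\0';
	}
	return srclen;
}

/*
 * The strncpy() pattern: the explicit terminator is required because
 * strncpy() leaves dst unterminated when src fills the whole count.
 */
static void
copy_with_strncpy(char *dst, const char *src, size_t dstsize)
{
	assert(dst != NULL && src != NULL && dstsize > 0);
	strncpy(dst, src, dstsize - 1);
	dst[dstsize - 1] = '\0';
}
```

A return value from copy_bounded that is greater than or equal to the destination size signals that the string was truncated, which the strncpy() pattern cannot report on its own.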
Attachments:
v31-0001-Add-pg_get_process_memory_context-function.patch (application/octet-stream)
From 9487a3d5d673bb0ec5521ad8bdb1e6cb603a51a6 Mon Sep 17 00:00:00 2001
From: Rahila Syed <rahilasyed.90@gmail.com>
Date: Mon, 30 Jun 2025 12:11:00 +0530
Subject: [PATCH] Add pg_get_process_memory_context function
This adds a function for retrieving memory context statistics
and information from backends as well as auxiliary processes.
The intended use case is cluster debugging when under memory
pressure or unanticipated memory usage characteristics.
When called, the function sends a signal to the specified
process asking it to submit statistics regarding its memory
contexts into dynamic shared memory. Each memory context is
returned in detail, followed by a cumulative total in case the
number of contexts exceeds the maximum allocated amount of shared
memory. Each process is limited to at most 1MB of memory for this.
A summary can also be explicitly requested by the user; this
will return the TopMemoryContext and a cumulative total of
all lower contexts.
In order to not block on busy processes, the caller specifies
the number of seconds during which to retry before timing out.
If no statistics are published within the set timeout, NULL is
returned.
---
doc/src/sgml/func.sgml | 164 ++++
src/backend/catalog/system_views.sql | 5 +
src/backend/postmaster/autovacuum.c | 4 +
src/backend/postmaster/checkpointer.c | 4 +
src/backend/postmaster/interrupt.c | 4 +
src/backend/postmaster/pgarch.c | 4 +
src/backend/postmaster/startup.c | 4 +
src/backend/postmaster/walsummarizer.c | 4 +
src/backend/storage/ipc/ipci.c | 3 +
src/backend/storage/ipc/procsignal.c | 3 +
src/backend/storage/lmgr/proc.c | 1 +
src/backend/tcop/postgres.c | 3 +
.../utils/activity/wait_event_names.txt | 2 +
src/backend/utils/adt/mcxtfuncs.c | 831 +++++++++++++++++-
src/backend/utils/adt/pg_locale.c | 1 -
src/backend/utils/init/globals.c | 1 +
src/backend/utils/init/postinit.c | 7 +
src/backend/utils/mb/mbutils.c | 1 -
src/backend/utils/mmgr/mcxt.c | 71 +-
src/include/catalog/pg_proc.dat | 10 +
src/include/miscadmin.h | 1 +
src/include/storage/lwlocklist.h | 1 +
src/include/storage/procsignal.h | 1 +
src/include/utils/memutils.h | 92 +-
src/test/regress/expected/sysviews.out | 19 +
src/test/regress/sql/sysviews.sql | 18 +
src/tools/pgindent/typedefs.list | 3 +
27 files changed, 1221 insertions(+), 41 deletions(-)
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 74a16af04ad..2cde766072d 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -28719,6 +28719,137 @@ acl | {postgres=arwdDxtm/postgres,foo=r/postgres}
</para></entry>
</row>
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>pg_get_process_memory_contexts</primary>
+ </indexterm>
+ <function>pg_get_process_memory_contexts</function> ( <parameter>pid</parameter> <type>integer</type>, <parameter>summary</parameter> <type>boolean</type>, <parameter>timeout</parameter> <type>float</type> )
+ <returnvalue>setof record</returnvalue>
+ ( <parameter>name</parameter> <type>text</type>,
+ <parameter>ident</parameter> <type>text</type>,
+ <parameter>type</parameter> <type>text</type>,
+ <parameter>path</parameter> <type>integer[]</type>,
+ <parameter>level</parameter> <type>integer</type>,
+ <parameter>total_bytes</parameter> <type>bigint</type>,
+ <parameter>total_nblocks</parameter> <type>bigint</type>,
+ <parameter>free_bytes</parameter> <type>bigint</type>,
+ <parameter>free_chunks</parameter> <type>bigint</type>,
+ <parameter>used_bytes</parameter> <type>bigint</type>,
+ <parameter>num_agg_contexts</parameter> <type>integer</type>,
+ <parameter>stats_timestamp</parameter> <type>timestamptz</type> )
+ </para>
+ <para>
+ This function handles requests to display the memory contexts of a
+ <productname>PostgreSQL</productname> process with the specified
+ process ID. The function can be used to send requests to backends as
+ well as <glossterm linkend="glossary-auxiliary-proc">auxiliary processes</glossterm>.
+ </para>
+ <para>
+ The returned record contains extended statistics per each memory
+ context:
+ <itemizedlist spacing="compact">
+ <listitem>
+ <para>
+ <parameter>name</parameter> - The name of the memory context.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>ident</parameter> - Memory context ID (if any).
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>type</parameter> - The type of memory context, possible
+ values are: AllocSet, Generation, Slab and Bump.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>path</parameter> - Memory contexts are organized in a
+ tree model with TopMemoryContext as the root, and all other memory
+ contexts as nodes in the tree. The <parameter>path</parameter>
+ displays the path from the root to the current memory context. The
+ path is limited to 100 children per node, with each node limited
+ to a max depth of 100, to preserve memory during reporting. The
+ printed path will also be limited to 100 nodes counting from the
+ TopMemoryContext.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>level</parameter> - The level in the tree of the current
+ memory context.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>total_bytes</parameter> - The total number of bytes
+ allocated to this memory context.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>total_nblocks</parameter> - The total number of blocks
+ used for the allocated memory.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>free_bytes</parameter> - The amount of free memory in
+ this memory context.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>free_chunks</parameter> - The number of chunks that
+ <parameter>free_bytes</parameter> corresponds to.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>used_bytes</parameter> - The total number of bytes
+ currently occupied.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>num_agg_contexts</parameter> - The number of memory
+ contexts aggregated in the displayed statistics.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+ <para>
+ When <parameter>summary</parameter> is <literal>true</literal>, statistics
+ for memory contexts at levels 1 and 2 are displayed, with level 1
+ representing the root node (i.e., <literal>TopMemoryContext</literal>).
+ Statistics for contexts on level 2 and below are aggregates of all
+ child contexts' statistics, where <literal>num_agg_contexts</literal>
+ indicates the number of aggregated child contexts. When
+ <parameter>summary</parameter> is <literal>false</literal>, the
+ <literal>num_agg_contexts</literal> value is <literal>1</literal>,
+ indicating that individual statistics are being displayed.
+ </para>
+ <para>
+ Busy processes can delay reporting memory context statistics;
+ <parameter>timeout</parameter> specifies the number of seconds
+ to wait for updated statistics. <parameter>timeout</parameter> can be
+ specified in fractions of a second.
+ </para>
+ <para>
+ After receiving memory context statistics from the target process, it
+ returns the results as one row per context. If all the contexts don't
+ fit within the pre-determined size limit, the remaining context
+ statistics are aggregated and a cumulative total is displayed. The
+ <literal>num_agg_contexts</literal> column indicates the number of
+ contexts aggregated in the displayed statistics. When
+ <literal>num_agg_contexts</literal> is <literal>1</literal> it means
+ that the context statistics are displayed separately.
+ </para></entry>
+ </row>
+
<row>
<entry role="func_table_entry"><para role="func_signature">
<indexterm>
@@ -28858,6 +28989,39 @@ LOG: Grand total: 1651920 bytes in 201 blocks; 622360 free (88 chunks); 1029560
because it may generate a large number of log messages.
</para>
+ <para>
+ <function>pg_get_process_memory_contexts</function> can be used to request
+ memory contexts statistics of any <productname>PostgreSQL</productname>
+ process. For example:
+<programlisting>
+postgres=# SELECT * FROM pg_get_process_memory_contexts(
+ (SELECT pid FROM pg_stat_activity
+ WHERE backend_type = 'checkpointer'),
+ false, 0.5) LIMIT 1;
+-[ RECORD 1 ]----+------------------------------
+name | TopMemoryContext
+ident |
+type | AllocSet
+path | {1}
+level | 1
+total_bytes | 90304
+total_nblocks | 3
+free_bytes | 2880
+free_chunks | 1
+used_bytes | 87424
+num_agg_contexts | 1
+</programlisting>
+ <note>
+ <para>
+ While <function>pg_get_process_memory_contexts</function> can be used to
+ query memory contexts of the local backend,
+ <structname>pg_backend_memory_contexts</structname>
+ (see <xref linkend="view-pg-backend-memory-contexts"/> for more details)
+ will be less resource intensive when only the local backend is of interest.
+ </para>
+ </note>
+ </para>
+
</sect2>
<sect2 id="functions-admin-backup">
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index f6eca09ee15..49740fadb6d 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -682,6 +682,11 @@ GRANT SELECT ON pg_backend_memory_contexts TO pg_read_all_stats;
REVOKE EXECUTE ON FUNCTION pg_get_backend_memory_contexts() FROM PUBLIC;
GRANT EXECUTE ON FUNCTION pg_get_backend_memory_contexts() TO pg_read_all_stats;
+REVOKE EXECUTE ON FUNCTION
+ pg_get_process_memory_contexts(integer, boolean, float) FROM PUBLIC;
+GRANT EXECUTE ON FUNCTION
+ pg_get_process_memory_contexts(integer, boolean, float) TO pg_read_all_stats;
+
-- Statistics views
CREATE VIEW pg_stat_all_tables AS
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 9474095f271..5e7e8081c05 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -781,6 +781,10 @@ ProcessAutoVacLauncherInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
/* Process sinval catchup interrupts that happened while sleeping */
ProcessCatchupInterrupt();
}
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index 8490148a47d..767fc6a0014 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -679,6 +679,10 @@ ProcessCheckpointerInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/postmaster/interrupt.c b/src/backend/postmaster/interrupt.c
index 0ae9bf906ec..f24f574e748 100644
--- a/src/backend/postmaster/interrupt.c
+++ b/src/backend/postmaster/interrupt.c
@@ -48,6 +48,10 @@ ProcessMainLoopInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 78e39e5f866..ac97a39447c 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -867,6 +867,10 @@ ProcessPgArchInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
if (ConfigReloadPending)
{
char *archiveLib = pstrdup(XLogArchiveLibrary);
diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c
index 27e86cf393f..7149a67fcbc 100644
--- a/src/backend/postmaster/startup.c
+++ b/src/backend/postmaster/startup.c
@@ -192,6 +192,10 @@ ProcessStartupProcInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
diff --git a/src/backend/postmaster/walsummarizer.c b/src/backend/postmaster/walsummarizer.c
index 777c9a8d555..5d14684f6b2 100644
--- a/src/backend/postmaster/walsummarizer.c
+++ b/src/backend/postmaster/walsummarizer.c
@@ -879,6 +879,10 @@ ProcessWalSummarizerInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 2fa045e6b0f..fe3d32e40b0 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -51,6 +51,7 @@
#include "storage/sinvaladt.h"
#include "utils/guc.h"
#include "utils/injection_point.h"
+#include "utils/memutils.h"
/* GUCs */
int shared_memory_type = DEFAULT_SHARED_MEMORY_TYPE;
@@ -150,6 +151,7 @@ CalculateShmemSize(int *num_semaphores)
size = add_size(size, InjectionPointShmemSize());
size = add_size(size, SlotSyncShmemSize());
size = add_size(size, AioShmemSize());
+ size = add_size(size, MemoryContextKeysShmemSize() + sizeof(LWLockPadded));
/* include additional requested shmem from preload libraries */
size = add_size(size, total_addin_request);
@@ -343,6 +345,7 @@ CreateOrAttachShmemStructs(void)
WaitEventCustomShmemInit();
InjectionPointShmemInit();
AioShmemInit();
+ MemoryContextKeysShmemInit();
}
/*
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index a9bb540b55a..ce69e26d720 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -691,6 +691,9 @@ procsignal_sigusr1_handler(SIGNAL_ARGS)
if (CheckProcSignal(PROCSIG_LOG_MEMORY_CONTEXT))
HandleLogMemoryContextInterrupt();
+ if (CheckProcSignal(PROCSIG_GET_MEMORY_CONTEXT))
+ HandleGetMemoryContextInterrupt();
+
if (CheckProcSignal(PROCSIG_PARALLEL_APPLY_MESSAGE))
HandleParallelApplyMessageInterrupt();
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index e9ef0fbfe32..f194e6b3dcc 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -50,6 +50,7 @@
#include "storage/procsignal.h"
#include "storage/spin.h"
#include "storage/standby.h"
+#include "utils/memutils.h"
#include "utils/timeout.h"
#include "utils/timestamp.h"
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index a297606cdd7..c8cac8811ee 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3534,6 +3534,9 @@ ProcessInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
if (ParallelApplyMessagePending)
ProcessParallelApplyMessages();
}
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 0be307d2ca0..54f91a76a1b 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -161,6 +161,7 @@ WAL_RECEIVER_EXIT "Waiting for the WAL receiver to exit."
WAL_RECEIVER_WAIT_START "Waiting for startup process to send initial data for streaming replication."
WAL_SUMMARY_READY "Waiting for a new WAL summary to be generated."
XACT_GROUP_UPDATE "Waiting for the group leader to update transaction status at transaction end."
+MEM_CXT_PUBLISH "Waiting for a process to publish memory information."
ABI_compatibility:
@@ -406,6 +407,7 @@ SubtransSLRU "Waiting to access the sub-transaction SLRU cache."
XactSLRU "Waiting to access the transaction status SLRU cache."
ParallelVacuumDSA "Waiting for parallel vacuum dynamic shared memory allocation."
AioUringCompletion "Waiting for another process to complete IO via io_uring."
+MemoryContextReportingKeys "Waiting for another process to complete reading or writing the memory reporting keys."
# No "ABI_compatibility" region here as WaitEventLWLock has its own C code.
diff --git a/src/backend/utils/adt/mcxtfuncs.c b/src/backend/utils/adt/mcxtfuncs.c
index fe6dce9cba3..32a03f12b1a 100644
--- a/src/backend/utils/adt/mcxtfuncs.c
+++ b/src/backend/utils/adt/mcxtfuncs.c
@@ -15,13 +15,38 @@
#include "postgres.h"
+#include "access/twophase.h"
+#include "catalog/pg_authid_d.h"
#include "funcapi.h"
#include "mb/pg_wchar.h"
+#include "miscadmin.h"
+#include "storage/dsm_registry.h"
#include "storage/proc.h"
#include "storage/procarray.h"
+#include "utils/acl.h"
#include "utils/array.h"
#include "utils/builtins.h"
#include "utils/hsearch.h"
+#include "utils/memutils.h"
+#include "utils/wait_event_types.h"
+
+#define CLIENT_KEY_SIZE 64
+
+static LWLock *client_keys_lock = NULL;
+static int *client_keys = NULL;
+static dshash_table *MemoryStatsDsHash = NULL;
+static dsa_area *MemoryStatsDsaArea = NULL;
+
+static void memstats_dsa_cleanup(MemoryStatsDSHashEntry *entry);
+static const char *ContextTypeToString(NodeTag type);
+static void PublishMemoryContext(MemoryStatsEntry *memcxt_info,
+ int curr_id, MemoryContext context,
+ List *path,
+ MemoryContextCounters stat,
+ int num_contexts, int max_levels);
+static List *compute_context_path(MemoryContext c, HTAB *context_id_lookup);
+static void end_memorycontext_reporting(MemoryStatsDSHashEntry *entry, MemoryContext oldcontext,
+ HTAB *context_id_lookup);
/* ----------
* The max bytes for showing identifiers of MemoryContext.
@@ -89,7 +114,7 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
*/
for (MemoryContext cur = context; cur != NULL; cur = cur->parent)
{
- MemoryContextId *entry;
+ MemoryStatsContextId *entry;
bool found;
entry = hash_search(context_id_lookup, &cur, HASH_FIND, &found);
@@ -143,24 +168,7 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
else
nulls[1] = true;
- switch (context->type)
- {
- case T_AllocSetContext:
- type = "AllocSet";
- break;
- case T_GenerationContext:
- type = "Generation";
- break;
- case T_SlabContext:
- type = "Slab";
- break;
- case T_BumpContext:
- type = "Bump";
- break;
- default:
- type = "???";
- break;
- }
+ type = ContextTypeToString(context->type);
values[2] = CStringGetTextDatum(type);
values[3] = Int32GetDatum(list_length(path)); /* level */
@@ -175,6 +183,38 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
list_free(path);
}
+/*
+ * ContextTypeToString
+ * Returns a textual representation of a context type
+ *
+ * This should cover the same types as MemoryContextIsValid.
+ */
+const char *
+ContextTypeToString(NodeTag type)
+{
+ const char *context_type;
+
+ switch (type)
+ {
+ case T_AllocSetContext:
+ context_type = "AllocSet";
+ break;
+ case T_GenerationContext:
+ context_type = "Generation";
+ break;
+ case T_SlabContext:
+ context_type = "Slab";
+ break;
+ case T_BumpContext:
+ context_type = "Bump";
+ break;
+ default:
+ context_type = "???";
+ break;
+ }
+ return context_type;
+}
+
/*
* pg_get_backend_memory_contexts
* SQL SRF showing backend memory context.
@@ -189,7 +229,7 @@ pg_get_backend_memory_contexts(PG_FUNCTION_ARGS)
HTAB *context_id_lookup;
ctl.keysize = sizeof(MemoryContext);
- ctl.entrysize = sizeof(MemoryContextId);
+ ctl.entrysize = sizeof(MemoryStatsContextId);
ctl.hcxt = CurrentMemoryContext;
context_id_lookup = hash_create("pg_get_backend_memory_contexts",
@@ -216,7 +256,7 @@ pg_get_backend_memory_contexts(PG_FUNCTION_ARGS)
foreach_ptr(MemoryContextData, cur, contexts)
{
- MemoryContextId *entry;
+ MemoryStatsContextId *entry;
bool found;
/*
@@ -224,8 +264,8 @@ pg_get_backend_memory_contexts(PG_FUNCTION_ARGS)
* PutMemoryContextsStatsTupleStore needs this to populate the "path"
* column with the parent context_ids.
*/
- entry = (MemoryContextId *) hash_search(context_id_lookup, &cur,
- HASH_ENTER, &found);
+ entry = (MemoryStatsContextId *) hash_search(context_id_lookup, &cur,
+ HASH_ENTER, &found);
entry->context_id = context_id++;
Assert(!found);
@@ -305,3 +345,748 @@ pg_log_backend_memory_contexts(PG_FUNCTION_ARGS)
PG_RETURN_BOOL(true);
}
+
+/*
+ * pg_get_process_memory_contexts
+ * Signal a backend or an auxiliary process to send its memory contexts,
+ * wait for the results and display them.
+ *
+ * By default, only superusers or users with ROLE_PG_READ_ALL_STATS are allowed
+ * to signal a process to return the memory contexts. This is because allowing
+ * any user to issue this request at an unbounded rate would cause lots of
+ * requests to be sent, which can lead to denial of service. Additional roles
+ * can be permitted with GRANT.
+ *
+ * On receipt of this signal, a backend or an auxiliary process sets the flag
+ * in the signal handler, which causes the next CHECK_FOR_INTERRUPTS()
+ * or process-specific interrupt handler to copy the memory context details
+ * to a dynamic shared memory space.
+ *
+ * We have defined a limit on DSA memory that could be allocated per process -
+ * if the process has more memory contexts than what can fit in the allocated
+ * size, the excess contexts are summarized and represented as cumulative total
+ * at the end of the buffer.
+ *
+ * After sending the signal, wait on a condition variable. The publishing
+ * backend, after copying the data to shared memory, broadcasts on that
+ * condition variable. There is one condition variable per client process.
+ * Once the condition variable is signalled, check if the latest memory context
+ * information is available and display.
+ *
+ * If the publishing backend does not respond before the timeout specified by
+ * the user expires, give up and return NULL. Before each sleep on the
+ * condition variable the target process is looked up again; if it has exited
+ * in the meantime, a WARNING is raised and NULL is returned without waiting
+ * any further.
+ */
+#define MEMSTATS_WAIT_TIMEOUT 100
+Datum
+pg_get_process_memory_contexts(PG_FUNCTION_ARGS)
+{
+ int pid = PG_GETARG_INT32(0);
+ bool summary = PG_GETARG_BOOL(1);
+ double timeout = PG_GETARG_FLOAT8(2);
+ PGPROC *proc;
+ ProcNumber procNumber = INVALID_PROC_NUMBER;
+ bool proc_is_aux = false;
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ MemoryStatsEntry *memcxt_info;
+ MemoryStatsDSHashEntry *entry;
+ bool found;
+ char key[CLIENT_KEY_SIZE];
+
+ /*
+ * See if the process with given pid is a backend or an auxiliary process
+ * and remember the type for when we requery the process later.
+ */
+ proc = BackendPidGetProc(pid);
+ if (proc == NULL)
+ {
+ proc = AuxiliaryPidGetProc(pid);
+ proc_is_aux = true;
+ }
+
+ /*
+ * BackendPidGetProc() and AuxiliaryPidGetProc() return NULL if the pid
+ * isn't valid; this is however not a problem, so just leave with a WARNING.
+ * See comment in pg_log_backend_memory_contexts for a discussion on this.
+ */
+ if (proc == NULL)
+ {
+ /*
+ * This is just a warning so a loop-through-resultset will not abort
+ * if one backend terminated on its own during the run.
+ */
+ ereport(WARNING,
+ errmsg("PID %d is not a PostgreSQL server process", pid));
+ PG_RETURN_NULL();
+ }
+
+ InitMaterializedSRF(fcinfo, 0);
+
+ procNumber = GetNumberFromPGProc(proc);
+
+ /*
+ * Create a DSA to allocate memory for copying memory contexts statistics.
+ * Allocate the memory in the DSA and send dsa pointer to the server
+ * process for storing the context statistics. If the number of contexts
+ * exceeds a predefined limit (1MB), a cumulative total is stored for such
+ * contexts.
+ *
+ * The DSA is created once for the lifetime of the server, and only
+ * attached in subsequent calls.
+ */
+ if (MemoryStatsDsaArea == NULL)
+ MemoryStatsDsaArea = GetNamedDSA("memory_context_statistics_dsa", &found);
+
+ /*
+ * The dsa pointers containing statistics for each client are stored in a
+ * dshash table. In addition to dsa pointer, each entry in this table also
+ * contains information about the statistics, condition variable for
+ * signalling between client and the server and miscellaneous data
+ * specific to a request. There is one entry per client request in the
+ * hash table.
+ */
+ if (MemoryStatsDsHash == NULL)
+ MemoryStatsDsHash = GetNamedDSHash("memory_context_statistics_dshash", &memctx_dsh_params, &found);
+
+ snprintf(key, sizeof(key), "%d", MyProcNumber);
+
+ /*
+ * Check if the publishing process's slot is empty and store this client's
+ * key, i.e. its procNumber. This informs the publishing process that it is
+ * supposed to write statistics in the hash entry corresponding to this
+ * client.
+ */
+ LWLockAcquire(client_keys_lock, LW_EXCLUSIVE);
+ if (client_keys[procNumber] == -1)
+ client_keys[procNumber] = MyProcNumber;
+ else
+ {
+ ereport(WARNING,
+ errmsg("server process %d is processing previous request", pid));
+ LWLockRelease(client_keys_lock);
+ PG_RETURN_NULL();
+ }
+ LWLockRelease(client_keys_lock);
+
+ /*
+ * Insert an entry for this client in DSHASH table the first time this
+ * function is called. This entry is deleted when the process exits in
+ * before_shmem_exit call.
+ *
+ * dshash_find_or_insert locks the entry to prevent the publisher from
+ * reading before client has updated the entry.
+ */
+ entry = dshash_find_or_insert(MemoryStatsDsHash, key, &found);
+ if (!found)
+ ConditionVariableInit(&entry->memcxt_cv);
+
+ /*
+ * Allocate 1MB of memory for the backend to publish its statistics on
+ * every call to this function. The memory is freed at the end of the
+ * function.
+ */
+ entry->memstats_dsa_pointer =
+ dsa_allocate0(MemoryStatsDsaArea, MEMORY_CONTEXT_REPORT_MAX_PER_BACKEND);
+ entry->summary = summary;
+ dshash_release_lock(MemoryStatsDsHash, entry);
+
+ /*
+ * Send a signal to a PostgreSQL process, informing it we want it to
+ * produce information about its memory contexts.
+ */
+ if (SendProcSignal(pid, PROCSIG_GET_MEMORY_CONTEXT, procNumber) < 0)
+ {
+ ereport(WARNING,
+ errmsg("could not send signal to process %d: %m", pid));
+ PG_RETURN_NULL();
+ }
+
+ while (1)
+ {
+
+ entry = dshash_find_or_insert(MemoryStatsDsHash, key, &found);
+ Assert(found);
+
+ /*
+ * We expect to come out of sleep when the requested process has
+ * finished publishing the statistics, verified using the correct
+ * entry in the proc_id field.
+ *
+ * Make sure that the information belongs to the pid we requested
+ * information for; otherwise loop back and wait for the server
+ * process to finish publishing statistics.
+ *
+ */
+ if (entry->proc_id == pid)
+ break;
+
+ dshash_release_lock(MemoryStatsDsHash, entry);
+
+ /*
+ * Recheck the state of the backend before sleeping on the condition
+ * variable to ensure the process is still alive. Only check the
+ * relevant process type based on the earlier PID check.
+ */
+ if (proc_is_aux)
+ proc = AuxiliaryPidGetProc(pid);
+ else
+ proc = BackendPidGetProc(pid);
+
+ /*
+ * The process ending during memory context processing is not an
+ * error.
+ */
+ if (proc == NULL)
+ {
+ ereport(WARNING,
+ errmsg("PID %d is no longer a PostgreSQL server process",
+ pid));
+ memstats_dsa_cleanup(entry);
+ PG_RETURN_NULL();
+ }
+
+
+ /*
+ * Wait for the timeout as defined by the user. If no statistics are
+ * available within the allowed time then return NULL. The timer is
+ * defined in milliseconds since that's what the condition variable
+ * sleep uses.
+ */
+ if (ConditionVariableTimedSleep(&entry->memcxt_cv,
+ (timeout * 1000), WAIT_EVENT_MEM_CXT_PUBLISH))
+ {
+ /* Timeout has expired, return NULL */
+ memstats_dsa_cleanup(entry);
+ PG_RETURN_NULL();
+ }
+ }
+
+ /*
+ * Backend has finished publishing the stats, project them.
+ */
+ memcxt_info = (MemoryStatsEntry *)
+ dsa_get_address(MemoryStatsDsaArea, entry->memstats_dsa_pointer);
+
+#define PG_GET_PROCESS_MEMORY_CONTEXTS_COLS 11
+ for (int i = 0; i < entry->total_stats; i++)
+ {
+ ArrayType *path_array;
+ int path_length;
+ Datum values[PG_GET_PROCESS_MEMORY_CONTEXTS_COLS];
+ bool nulls[PG_GET_PROCESS_MEMORY_CONTEXTS_COLS];
+ Datum *path_datum = NULL;
+
+ memset(values, 0, sizeof(values));
+ memset(nulls, 0, sizeof(nulls));
+
+ if (memcxt_info[i].name[0] != '\0')
+ {
+ values[0] = CStringGetTextDatum(memcxt_info[i].name);
+ }
+ else
+ nulls[0] = true;
+
+ if (memcxt_info[i].ident[0] != '\0')
+ {
+ values[1] = CStringGetTextDatum(memcxt_info[i].ident);
+ }
+ else
+ nulls[1] = true;
+
+ values[2] = CStringGetTextDatum(ContextTypeToString(memcxt_info[i].type));
+
+ path_length = memcxt_info[i].path_length;
+ path_datum = (Datum *) palloc(path_length * sizeof(Datum));
+ if (memcxt_info[i].path[0] != 0)
+ {
+ for (int j = 0; j < path_length; j++)
+ path_datum[j] = Int32GetDatum(memcxt_info[i].path[j]);
+ path_array = construct_array_builtin(path_datum, path_length, INT4OID);
+ values[3] = PointerGetDatum(path_array);
+ }
+ else
+ nulls[3] = true;
+
+ values[4] = Int32GetDatum(memcxt_info[i].levels);
+ values[5] = Int64GetDatum(memcxt_info[i].totalspace);
+ values[6] = Int64GetDatum(memcxt_info[i].nblocks);
+ values[7] = Int64GetDatum(memcxt_info[i].freespace);
+ values[8] = Int64GetDatum(memcxt_info[i].freechunks);
+ values[9] = Int64GetDatum(memcxt_info[i].totalspace -
+ memcxt_info[i].freespace);
+ values[10] = Int32GetDatum(memcxt_info[i].num_agg_stats);
+
+ tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc,
+ values, nulls);
+ }
+ memstats_dsa_cleanup(entry);
+ dshash_release_lock(MemoryStatsDsHash, entry);
+
+ ConditionVariableCancelSleep();
+
+ PG_RETURN_NULL();
+}
+
+static void
+memstats_dsa_cleanup(MemoryStatsDSHashEntry *entry)
+{
+ Assert(MemoryStatsDsaArea != NULL);
+ dsa_free(MemoryStatsDsaArea, entry->memstats_dsa_pointer);
+ entry->memstats_dsa_pointer = InvalidDsaPointer;
+ entry->proc_id = 0;
+}
+void
+MemoryContextKeysShmemInit(void)
+{
+ bool found;
+
+ client_keys = (int *)
+ ShmemInitStruct("MemoryContextKeys",
+ MemoryContextKeysShmemSize() + sizeof(LWLockPadded), &found);
+ client_keys_lock = (LWLock *) ((char *) client_keys + MemoryContextKeysShmemSize());
+
+ if (!found)
+ {
+ MemSet(client_keys, -1, MemoryContextKeysShmemSize());
+ LWLockInitialize(client_keys_lock, LWTRANCHE_MEMORY_CONTEXT_KEYS);
+ }
+}
+
+Size
+MemoryContextKeysShmemSize(void)
+{
+ Size sz = 0;
+ Size TotalProcs = 0;
+
+ TotalProcs = add_size(TotalProcs, NUM_AUXILIARY_PROCS);
+ TotalProcs = add_size(TotalProcs, MaxBackends);
+ sz = add_size(sz, mul_size(TotalProcs, sizeof(int)));
+
+ return sz;
+}
+
+/*
+ * HandleGetMemoryContextInterrupt
+ * Handle receipt of an interrupt indicating a request to publish memory
+ * contexts statistics.
+ *
+ * All the actual work is deferred to ProcessGetMemoryContextInterrupt() as
+ * this cannot be performed in a signal handler.
+ */
+void
+HandleGetMemoryContextInterrupt(void)
+{
+ InterruptPending = true;
+ PublishMemoryContextPending = true;
+ /* latch will be set by procsignal_sigusr1_handler */
+}
+
+/*
+ * ProcessGetMemoryContextInterrupt
+ * Generate information about memory contexts used by the process.
+ *
+ * Performs a breadth-first search on the memory context tree, so a parent's
+ * statistics are reported before its children's in the monitoring function
+ * output.
+ *
+ * Statistics for all the processes are shared via the same dynamic shared
+ * area. Individual statistics are tracked independently in
+ * per-process DSA pointers. These pointers are stored in a dshash table keyed
+ * by the requesting client's ProcNumber.
+ *
+ * We calculate the maximum number of context statistics that can be displayed
+ * using a pre-determined limit for memory available per process for this
+ * utility and the maximum size of statistics for each context. The remaining
+ * context statistics, if any, are captured as a cumulative total at the end of
+ * the individual contexts' statistics.
+ *
+ * If summary is true, we capture the statistics of level 1 and level 2
+ * contexts. For that we traverse the memory context tree recursively in a
+ * depth-first manner to cover all the children of a parent context, to
+ * be able to display a cumulative total of memory consumption by a parent at
+ * level 2 and all its children.
+ */
+void
+ProcessGetMemoryContextInterrupt(void)
+{
+ List *contexts;
+ HASHCTL ctl;
+ HTAB *context_id_lookup;
+ int context_id = 0;
+ MemoryStatsEntry *meminfo;
+ bool summary = false;
+ MemoryContextCounters stat;
+ int num_individual_stats = 0;
+ bool found;
+ MemoryStatsDSHashEntry *entry;
+ char key[CLIENT_KEY_SIZE];
+ int clientProcNumber;
+ MemoryContext memstats_ctx = NULL;
+ MemoryContext oldcontext = NULL;
+
+ PublishMemoryContextPending = false;
+
+ /*
+ * Create a new memory context which is not a part of the TopMemoryContext
+ * tree. This context is used to allocate all memory in this function.
+ * Keeping the allocations made while reporting memory consumption
+ * statistics separate from the TopMemoryContext tree ensures that they do
+ * not affect the output of this function.
+ */
+ memstats_ctx = AllocSetContextCreate((MemoryContext) NULL, "publish_memory_context_statistics",
+ ALLOCSET_SMALL_SIZES);
+ oldcontext = MemoryContextSwitchTo(memstats_ctx);
+
+ /*
+ * The hash table is used for constructing "path" column of the view,
+ * similar to its local backend counterpart.
+ */
+ ctl.keysize = sizeof(MemoryContext);
+ ctl.entrysize = sizeof(MemoryStatsContextId);
+ ctl.hcxt = CurrentMemoryContext;
+
+ context_id_lookup = hash_create("pg_get_remote_backend_memory_contexts",
+ 256,
+ &ctl,
+ HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+
+ /* List of contexts to process in the next round - start at the top. */
+ contexts = list_make1(TopMemoryContext);
+
+ /*
+ * If DSA exists, created by another process requesting statistics, attach
+ * to it. We expect the client process to create required DSA and Dshash
+ * table.
+ */
+ if (MemoryStatsDsaArea == NULL)
+ MemoryStatsDsaArea = GetNamedDSA("memory_context_statistics_dsa", &found);
+
+ if (MemoryStatsDsHash == NULL)
+ MemoryStatsDsHash = GetNamedDSHash("memory_context_statistics_dshash", &memctx_dsh_params, &found);
+
+ /* Retrieve the client key for publishing statistics */
+ LWLockAcquire(client_keys_lock, LW_SHARED);
+ Assert(client_keys[MyProcNumber] != -1);
+ clientProcNumber = client_keys[MyProcNumber];
+ LWLockRelease(client_keys_lock);
+
+ snprintf(key, CLIENT_KEY_SIZE, "%d", clientProcNumber);
+
+ entry = dshash_find_or_insert(MemoryStatsDsHash, key, &found);
+ summary = entry->summary;
+
+ /*
+ * The entry lock is held by dshash_find_or_insert to protect writes to
+ * process specific memory. Two different processes publishing statistics
+ * do not block each other.
+ */
+ entry->proc_id = MyProcPid;
+
+ /* Should be allocated by a client backend that is requesting statistics */
+ Assert(entry->memstats_dsa_pointer != InvalidDsaPointer);
+ meminfo = (MemoryStatsEntry *)
+ dsa_get_address(MemoryStatsDsaArea, entry->memstats_dsa_pointer);
+
+ if (summary)
+ {
+ int cxt_id = 0;
+ List *path = NIL;
+ MemoryStatsContextId *contextid_entry;
+
+ /* Copy TopMemoryContext statistics to DSA */
+ memset(&stat, 0, sizeof(stat));
+ (*TopMemoryContext->methods->stats) (TopMemoryContext, NULL, NULL,
+ &stat, true);
+ path = lcons_int(1, path);
+ PublishMemoryContext(meminfo, cxt_id, TopMemoryContext, path, stat,
+ 1, 100);
+
+ contextid_entry = (MemoryStatsContextId *) hash_search(context_id_lookup, &TopMemoryContext,
+ HASH_ENTER, &found);
+ Assert(!found);
+
+ /*
+ * context id starts with 1
+ */
+ contextid_entry->context_id = cxt_id + 1;
+
+ /*
+ * Copy statistics for each of TopMemoryContext's children. This
+ * includes statistics of at most 100 children per node, with each
+ * child node limited to a depth of 100 in its subtree.
+ */
+ for (MemoryContext c = TopMemoryContext->firstchild; c != NULL;
+ c = c->nextchild)
+ {
+ MemoryContextCounters grand_totals;
+ int num_contexts = 0;
+
+ path = NIL;
+ memset(&grand_totals, 0, sizeof(grand_totals));
+
+ cxt_id++;
+ contextid_entry = (MemoryStatsContextId *) hash_search(context_id_lookup, &c,
+ HASH_ENTER, &found);
+ Assert(!found);
+ contextid_entry->context_id = cxt_id + 1;
+
+ MemoryContextStatsInternal(c, 1, 100, 100, &grand_totals,
+ PRINT_STATS_NONE, &num_contexts);
+
+ path = compute_context_path(c, context_id_lookup);
+
+ PublishMemoryContext(meminfo, cxt_id, c, path,
+ grand_totals, num_contexts, 100);
+ }
+ entry->total_stats = cxt_id + 1;
+
+ /* Notify waiting backends and return */
+ end_memorycontext_reporting(entry, oldcontext, context_id_lookup);
+ return;
+ }
+ foreach_ptr(MemoryContextData, cur, contexts)
+ {
+ List *path = NIL;
+ MemoryStatsContextId *contextid_entry;
+
+ contextid_entry = (MemoryStatsContextId *) hash_search(context_id_lookup, &cur,
+ HASH_ENTER, &found);
+ Assert(!found);
+
+ /*
+ * context id starts with 1
+ */
+ contextid_entry->context_id = context_id + 1;
+
+ /*
+ * Figure out the transient context_id of this context and each of its
+ * ancestors, to compute a path for this context.
+ */
+ path = compute_context_path(cur, context_id_lookup);
+
+ /* Examine the context stats */
+ memset(&stat, 0, sizeof(stat));
+ (*cur->methods->stats) (cur, NULL, NULL, &stat, true);
+
+ /* Account for saving one statistics slot for cumulative reporting */
+ if (context_id < (MAX_MEMORY_CONTEXT_STATS_NUM - 1))
+ {
+ /* Copy statistics to DSA memory */
+ PublishMemoryContext(meminfo, context_id, cur, path, stat, 1, 100);
+ }
+ else
+ {
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].totalspace += stat.totalspace;
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].nblocks += stat.nblocks;
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].freespace += stat.freespace;
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].freechunks += stat.freechunks;
+ }
+
+ /*
+ * DSA max limit per process is reached, write aggregate of the
+ * remaining statistics.
+ *
+ * We can store contexts from 0 to max_stats - 1. When the number of
+ * contexts exceeds max_stats, we stop reporting individual statistics
+ * when context_id equals max_stats - 2, as we use the max_stats - 1 array
+ * slot for reporting cumulative statistics or "Remaining Totals".
+ */
+ if (context_id == (MAX_MEMORY_CONTEXT_STATS_NUM - 2))
+ {
+ int namelen = strlen("Remaining Totals");
+
+ num_individual_stats = context_id + 1;
+ strlcpy(meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].name, "Remaining Totals", namelen + 1);
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].ident[0] = '\0';
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].path[0] = 0;
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].type = 0;
+ }
+ context_id++;
+
+ for (MemoryContext c = cur->firstchild; c != NULL; c = c->nextchild)
+ contexts = lappend(contexts, c);
+ }
+
+ /*
+ * Statistics are not aggregated, i.e. individual statistics are reported when
+ * context_id <= max_stats.
+ */
+ if (context_id <= MAX_MEMORY_CONTEXT_STATS_NUM)
+ {
+ entry->total_stats = context_id;
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].num_agg_stats = 1;
+ }
+ /* Report number of aggregated memory contexts */
+ else
+ {
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].num_agg_stats = context_id -
+ num_individual_stats;
+
+ /*
+ * Total stats equals num_individual_stats + 1 record for cumulative
+ * statistics.
+ */
+ entry->total_stats = num_individual_stats + 1;
+ }
+
+ /* Notify waiting backends and return */
+ end_memorycontext_reporting(entry, oldcontext, context_id_lookup);
+}
+
+/*
+ * Signal all the waiting client backends after copying all the statistics,
+ * then release this process's slot and free local reporting state.
+ */
+static void
+end_memorycontext_reporting(MemoryStatsDSHashEntry *entry, MemoryContext oldcontext, HTAB *context_id_lookup)
+{
+ MemoryContext curr_ctx = CurrentMemoryContext;
+
+ dshash_release_lock(MemoryStatsDsHash, entry);
+ ConditionVariableBroadcast(&entry->memcxt_cv);
+
+ /*
+ * Empty this process's slot, so other clients can request memory
+ * statistics
+ */
+ LWLockAcquire(client_keys_lock, LW_EXCLUSIVE);
+ client_keys[MyProcNumber] = -1;
+ LWLockRelease(client_keys_lock);
+
+ hash_destroy(context_id_lookup);
+ MemoryContextSwitchTo(oldcontext);
+ MemoryContextReset(curr_ctx);
+}
+
+/*
+ * compute_context_path
+ *
+ * Append the transient context_id of this context and each of its ancestors
+ * to a list, in order to compute a path.
+ */
+static List *
+compute_context_path(MemoryContext c, HTAB *context_id_lookup)
+{
+ bool found;
+ List *path = NIL;
+ MemoryContext cur_context;
+
+ for (cur_context = c; cur_context != NULL; cur_context = cur_context->parent)
+ {
+ MemoryStatsContextId *cur_entry;
+
+ cur_entry = hash_search(context_id_lookup, &cur_context, HASH_FIND, &found);
+
+ if (!found)
+ elog(ERROR, "hash table corrupted, can't construct path value");
+
+ path = lcons_int(cur_entry->context_id, path);
+ }
+
+ return path;
+}
+
+/*
+ * PublishMemoryContext
+ *
+ * Copy the memory context statistics of a single context to DSA memory
+ */
+static void
+PublishMemoryContext(MemoryStatsEntry *memcxt_info, int curr_id,
+ MemoryContext context, List *path,
+ MemoryContextCounters stat, int num_contexts, int max_levels)
+{
+ const char *ident = context->ident;
+ const char *name = context->name;
+
+ /*
+ * To be consistent with logging output, we label dynahash contexts with
+ * just the hash table name as with MemoryContextStatsPrint().
+ */
+ if (context->ident && strncmp(context->name, "dynahash", 8) == 0)
+ {
+ name = context->ident;
+ ident = NULL;
+ }
+
+ if (name != NULL)
+ {
+ int namelen = strlen(name);
+
+ if (namelen >= MEMORY_CONTEXT_IDENT_SHMEM_SIZE)
+ namelen = pg_mbcliplen(name, namelen,
+ MEMORY_CONTEXT_IDENT_SHMEM_SIZE - 1);
+
+ strlcpy(memcxt_info[curr_id].name, name, namelen + 1);
+ }
+ else
+ /* Clearing the array */
+ memcxt_info[curr_id].name[0] = '\0';
+
+ /* Trim and copy the identifier if it is not set to NULL */
+ if (ident != NULL)
+ {
+ int idlen = strlen(ident);
+
+ /*
+ * Some identifiers such as SQL query string can be very long,
+ * truncate oversize identifiers.
+ */
+ if (idlen >= MEMORY_CONTEXT_IDENT_SHMEM_SIZE)
+ idlen = pg_mbcliplen(ident, idlen,
+ MEMORY_CONTEXT_IDENT_SHMEM_SIZE - 1);
+
+ strlcpy(memcxt_info[curr_id].ident, ident, idlen + 1);
+ }
+ else
+ memcxt_info[curr_id].ident[0] = '\0';
+
+ /* Copy the path, truncated to at most max_levels entries */
+ if (path == NIL)
+ memcxt_info[curr_id].path[0] = 0;
+ else
+ {
+ int levels = Min(list_length(path), max_levels);
+
+ memcxt_info[curr_id].path_length = levels;
+ memcxt_info[curr_id].levels = list_length(path);
+
+ foreach_int(i, path)
+ {
+ memcxt_info[curr_id].path[foreach_current_index(i)] = i;
+ if (--levels == 0)
+ break;
+ }
+ }
+ memcxt_info[curr_id].type = context->type;
+ memcxt_info[curr_id].totalspace = stat.totalspace;
+ memcxt_info[curr_id].nblocks = stat.nblocks;
+ memcxt_info[curr_id].freespace = stat.freespace;
+ memcxt_info[curr_id].freechunks = stat.freechunks;
+ memcxt_info[curr_id].num_agg_stats = num_contexts;
+}
+
+void
+AtProcExit_memstats_cleanup(int code, Datum arg)
+{
+ int idx = MyProcNumber;
+ MemoryStatsDSHashEntry *entry;
+ char key[CLIENT_KEY_SIZE];
+ bool found;
+
+ if (MemoryStatsDsHash != NULL)
+ {
+ snprintf(key, CLIENT_KEY_SIZE, "%d", idx);
+ entry = dshash_find_or_insert(MemoryStatsDsHash, key, &found);
+
+ if (found)
+ {
+ if (MemoryStatsDsaArea != NULL &&
+ DsaPointerIsValid(entry->memstats_dsa_pointer))
+ dsa_free(MemoryStatsDsaArea, entry->memstats_dsa_pointer);
+ }
+ dshash_delete_entry(MemoryStatsDsHash, entry);
+ }
+}
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 97c2ac1faf9..ab768a7a91f 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -45,7 +45,6 @@
#include "utils/lsyscache.h"
#include "utils/memutils.h"
#include "utils/pg_locale.h"
-#include "utils/relcache.h"
#include "utils/syscache.h"
#ifdef WIN32
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index d31cb45a058..92b0446b80c 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -39,6 +39,7 @@ volatile sig_atomic_t TransactionTimeoutPending = false;
volatile sig_atomic_t IdleSessionTimeoutPending = false;
volatile sig_atomic_t ProcSignalBarrierPending = false;
volatile sig_atomic_t LogMemoryContextPending = false;
+volatile sig_atomic_t PublishMemoryContextPending = false;
volatile sig_atomic_t IdleStatsUpdateTimeoutPending = false;
volatile uint32 InterruptHoldoffCount = 0;
volatile uint32 QueryCancelHoldoffCount = 0;
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 641e535a73c..fb3f2d21fa0 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -662,6 +662,13 @@ BaseInit(void)
* drop ephemeral slots, which in turn triggers stats reporting.
*/
ReplicationSlotInitialize();
+
+ /*
+ * The before shmem exit callback frees the DSA memory occupied by the
+ * latest memory context statistics that could be published by this proc
+ * if requested.
+ */
+ before_shmem_exit(AtProcExit_memstats_cleanup, 0);
}
diff --git a/src/backend/utils/mb/mbutils.c b/src/backend/utils/mb/mbutils.c
index 886ecbad871..308016d7763 100644
--- a/src/backend/utils/mb/mbutils.c
+++ b/src/backend/utils/mb/mbutils.c
@@ -39,7 +39,6 @@
#include "mb/pg_wchar.h"
#include "utils/fmgrprotos.h"
#include "utils/memutils.h"
-#include "utils/relcache.h"
#include "varatt.h"
/*
diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index ce01dce9861..a6eda19edce 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -23,6 +23,7 @@
#include "mb/pg_wchar.h"
#include "miscadmin.h"
+#include "utils/hsearch.h"
#include "utils/memdebug.h"
#include "utils/memutils.h"
#include "utils/memutils_internal.h"
@@ -159,10 +160,6 @@ MemoryContext PortalContext = NULL;
static void MemoryContextDeleteOnly(MemoryContext context);
static void MemoryContextCallResetCallbacks(MemoryContext context);
-static void MemoryContextStatsInternal(MemoryContext context, int level,
- int max_level, int max_children,
- MemoryContextCounters *totals,
- bool print_to_stderr);
static void MemoryContextStatsPrint(MemoryContext context, void *passthru,
const char *stats_string,
bool print_to_stderr);
@@ -864,11 +861,19 @@ MemoryContextStatsDetail(MemoryContext context,
bool print_to_stderr)
{
MemoryContextCounters grand_totals;
+ int num_contexts = 0;
+ PrintDestination print_location;
memset(&grand_totals, 0, sizeof(grand_totals));
+ if (print_to_stderr)
+ print_location = PRINT_STATS_TO_STDERR;
+ else
+ print_location = PRINT_STATS_TO_LOGS;
+
+ /* num_contexts reports the number of contexts aggregated in the output */
MemoryContextStatsInternal(context, 1, max_level, max_children,
- &grand_totals, print_to_stderr);
+ &grand_totals, print_location, &num_contexts);
if (print_to_stderr)
fprintf(stderr,
@@ -903,13 +908,14 @@ MemoryContextStatsDetail(MemoryContext context,
* One recursion level for MemoryContextStats
*
* Print stats for this context if possible, but in any case accumulate counts
- * into *totals (if not NULL).
+ * into *totals (if not NULL). The callers should make sure that print_location
+ * is set to PRINT_STATS_TO_STDERR or PRINT_STATS_TO_LOGS or PRINT_STATS_NONE.
*/
-static void
+void
MemoryContextStatsInternal(MemoryContext context, int level,
int max_level, int max_children,
MemoryContextCounters *totals,
- bool print_to_stderr)
+ PrintDestination print_location, int *num_contexts)
{
MemoryContext child;
int ichild;
@@ -917,10 +923,39 @@ MemoryContextStatsInternal(MemoryContext context, int level,
Assert(MemoryContextIsValid(context));
/* Examine the context itself */
- context->methods->stats(context,
- MemoryContextStatsPrint,
- &level,
- totals, print_to_stderr);
+ switch (print_location)
+ {
+ case PRINT_STATS_TO_STDERR:
+ context->methods->stats(context,
+ MemoryContextStatsPrint,
+ &level,
+ totals, true);
+ break;
+
+ case PRINT_STATS_TO_LOGS:
+ context->methods->stats(context,
+ MemoryContextStatsPrint,
+ &level,
+ totals, false);
+ break;
+
+ case PRINT_STATS_NONE:
+
+ /*
+ * Do not print the statistics if print_location is
+ * PRINT_STATS_NONE, only compute totals. This is used in
+ * reporting of memory context statistics via a sql function. Last
+ * parameter is not relevant.
+ */
+ context->methods->stats(context,
+ NULL,
+ NULL,
+ totals, false);
+ break;
+ }
+
+ /* Increment the context count for each recursive call */
+ *num_contexts = *num_contexts + 1;
/*
* Examine children.
@@ -940,7 +975,7 @@ MemoryContextStatsInternal(MemoryContext context, int level,
MemoryContextStatsInternal(child, level + 1,
max_level, max_children,
totals,
- print_to_stderr);
+ print_location, num_contexts);
}
}
@@ -959,7 +994,13 @@ MemoryContextStatsInternal(MemoryContext context, int level,
child = MemoryContextTraverseNext(child, context);
}
- if (print_to_stderr)
+ /*
+ * Add the count of child contexts which are traversed in the
+ * non-recursive manner.
+ */
+ *num_contexts = *num_contexts + ichild;
+
+ if (print_location == PRINT_STATS_TO_STDERR)
{
for (int i = 0; i < level; i++)
fprintf(stderr, " ");
@@ -972,7 +1013,7 @@ MemoryContextStatsInternal(MemoryContext context, int level,
local_totals.freechunks,
local_totals.totalspace - local_totals.freespace);
}
- else
+ else if (print_location == PRINT_STATS_TO_LOGS)
ereport(LOG_SERVER_ONLY,
(errhidestmt(true),
errhidecontext(true),
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 3ee8fed7e53..707c965ecf5 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8597,6 +8597,16 @@
prorettype => 'bool', proargtypes => 'int4',
prosrc => 'pg_log_backend_memory_contexts' },
+# publishing memory contexts of the specified postgres process
+{ oid => '2173', descr => 'publish memory contexts of the specified backend',
+ proname => 'pg_get_process_memory_contexts', provolatile => 'v',
+ prorows => '100', proretset => 't', proparallel => 'r',
+ prorettype => 'record', proargtypes => 'int4 bool float8',
+ proallargtypes => '{int4,bool,float8,text,text,text,_int4,int4,int8,int8,int8,int8,int8,int4}',
+ proargmodes => '{i,i,i,o,o,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{pid, summary, timeout, name, ident, type, path, level, total_bytes, total_nblocks, free_bytes, free_chunks, used_bytes, num_agg_contexts}',
+ prosrc => 'pg_get_process_memory_contexts' },
+
# non-persistent series generator
{ oid => '1066', descr => 'non-persistent series generator',
proname => 'generate_series', prorows => '1000',
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 1bef98471c3..1e59a7f910f 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -96,6 +96,7 @@ extern PGDLLIMPORT volatile sig_atomic_t IdleSessionTimeoutPending;
extern PGDLLIMPORT volatile sig_atomic_t ProcSignalBarrierPending;
extern PGDLLIMPORT volatile sig_atomic_t LogMemoryContextPending;
extern PGDLLIMPORT volatile sig_atomic_t IdleStatsUpdateTimeoutPending;
+extern PGDLLIMPORT volatile sig_atomic_t PublishMemoryContextPending;
extern PGDLLIMPORT volatile sig_atomic_t CheckClientConnectionPending;
extern PGDLLIMPORT volatile sig_atomic_t ClientConnectionLost;
diff --git a/src/include/storage/lwlocklist.h b/src/include/storage/lwlocklist.h
index 208d2e3a8ed..72ace053e9d 100644
--- a/src/include/storage/lwlocklist.h
+++ b/src/include/storage/lwlocklist.h
@@ -135,3 +135,4 @@ PG_LWLOCKTRANCHE(SUBTRANS_SLRU, SubtransSLRU)
PG_LWLOCKTRANCHE(XACT_SLRU, XactSLRU)
PG_LWLOCKTRANCHE(PARALLEL_VACUUM_DSA, ParallelVacuumDSA)
PG_LWLOCKTRANCHE(AIO_URING_COMPLETION, AioUringCompletion)
+PG_LWLOCKTRANCHE(MEMORY_CONTEXT_KEYS, MemoryContextReportingKeys)
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index afeeb1ca019..345d5a0ecb1 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -35,6 +35,7 @@ typedef enum
PROCSIG_WALSND_INIT_STOPPING, /* ask walsenders to prepare for shutdown */
PROCSIG_BARRIER, /* global barrier interrupt */
PROCSIG_LOG_MEMORY_CONTEXT, /* ask backend to log the memory contexts */
+ PROCSIG_GET_MEMORY_CONTEXT, /* ask backend to send the memory contexts */
PROCSIG_PARALLEL_APPLY_MESSAGE, /* Message from parallel apply workers */
/* Recovery conflict reasons */
diff --git a/src/include/utils/memutils.h b/src/include/utils/memutils.h
index 8abc26abce2..01835d56021 100644
--- a/src/include/utils/memutils.h
+++ b/src/include/utils/memutils.h
@@ -18,7 +18,10 @@
#define MEMUTILS_H
#include "nodes/memnodes.h"
-
+#include "storage/condition_variable.h"
+#include "storage/lmgr.h"
+#include "utils/dsa.h"
+#include "lib/dshash.h"
/*
* MaxAllocSize, MaxAllocHugeSize
@@ -48,6 +51,23 @@
#define AllocHugeSizeIsValid(size) ((Size) (size) <= MaxAllocHugeSize)
+/*
+ * Memory Context reporting size limits.
+ */
+
+/* Max length of context name and ident */
+#define MEMORY_CONTEXT_IDENT_SHMEM_SIZE 64
+/* Maximum size (in bytes) of DSA area per process */
+#define MEMORY_CONTEXT_REPORT_MAX_PER_BACKEND ((size_t) (1 * 1024 * 1024))
+
+/*
+ * Maximum size per context. Each entry is a fixed-size MemoryStatsEntry,
+ * which reserves room for the longest name and ident and for the deepest
+ * path. The path depth is limited to 100 like for memory context
+ * logging.
+ */
+#define MAX_MEMORY_CONTEXT_STATS_SIZE (sizeof(MemoryStatsEntry))
+#define MAX_MEMORY_CONTEXT_STATS_NUM (MEMORY_CONTEXT_REPORT_MAX_PER_BACKEND / MAX_MEMORY_CONTEXT_STATS_SIZE)
/*
* Standard top-level memory contexts.
@@ -319,4 +339,74 @@ pg_memory_is_all_zeros(const void *ptr, size_t len)
return true;
}
+/* Dynamic shared memory state for statistics per context */
+typedef struct MemoryStatsEntry
+{
+ char name[MEMORY_CONTEXT_IDENT_SHMEM_SIZE];
+ char ident[MEMORY_CONTEXT_IDENT_SHMEM_SIZE];
+ int path[100];
+ NodeTag type;
+ int path_length;
+ int levels;
+ int64 totalspace;
+ int64 nblocks;
+ int64 freespace;
+ int64 freechunks;
+ int num_agg_stats;
+} MemoryStatsEntry;
+
+/*
+ * Per backend dynamic shared hash entry for memory context statistics
+ * reporting.
+ */
+typedef struct MemoryStatsDSHashEntry
+{
+ char key[64];
+ ConditionVariable memcxt_cv;
+ int proc_id;
+ int total_stats;
+ bool summary;
+ dsa_pointer memstats_dsa_pointer;
+ TimestampTz stats_timestamp;
+} MemoryStatsDSHashEntry;
+
+static const dshash_parameters memctx_dsh_params = {
+ offsetof(MemoryStatsDSHashEntry, memcxt_cv),
+ sizeof(MemoryStatsDSHashEntry),
+ dshash_strcmp,
+ dshash_strhash,
+ dshash_strcpy
+};
+
+/*
+ * Used for storage of transient identifiers for pg_get_backend_memory_contexts
+ */
+typedef struct MemoryStatsContextId
+{
+ MemoryContext context;
+ int context_id;
+} MemoryStatsContextId;
+
+/*
+ * This is passed to MemoryContextStatsInternal to determine whether
+ * to print context statistics or not, and where to print them: logs or
+ * stderr.
+ */
+typedef enum PrintDestination
+{
+ PRINT_STATS_TO_STDERR = 0,
+ PRINT_STATS_TO_LOGS,
+ PRINT_STATS_NONE
+} PrintDestination;
+
+extern void ProcessGetMemoryContextInterrupt(void);
+extern void HandleGetMemoryContextInterrupt(void);
+extern void MemoryContextKeysShmemInit(void);
+extern Size MemoryContextKeysShmemSize(void);
+extern void MemoryContextStatsInternal(MemoryContext context, int level,
+ int max_level, int max_children,
+ MemoryContextCounters *totals,
+ PrintDestination print_location,
+ int *num_contexts);
+extern void AtProcExit_memstats_cleanup(int code, Datum arg);
#endif /* MEMUTILS_H */
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 83228cfca29..ae17d028ed3 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -232,3 +232,22 @@ select * from pg_timezone_abbrevs where abbrev = 'LMT';
LMT | @ 7 hours 52 mins 58 secs ago | f
(1 row)
+DO $$
+DECLARE
+ bg_writer_pid int;
+ r RECORD;
+BEGIN
+ SELECT pid from pg_stat_activity where backend_type='background writer'
+ INTO bg_writer_pid;
+
+ select type, name, ident
+ from pg_get_process_memory_contexts(bg_writer_pid, false, 20)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+ select type, name, ident
+ from pg_get_process_memory_contexts(pg_backend_pid(), false, 20)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+END $$;
+NOTICE: (AllocSet,TopMemoryContext,)
+NOTICE: (AllocSet,TopMemoryContext,)
diff --git a/src/test/regress/sql/sysviews.sql b/src/test/regress/sql/sysviews.sql
index 66179f026b3..d0917b6868e 100644
--- a/src/test/regress/sql/sysviews.sql
+++ b/src/test/regress/sql/sysviews.sql
@@ -101,3 +101,21 @@ select count(distinct utc_offset) >= 24 as ok from pg_timezone_abbrevs;
-- One specific case we can check without much fear of breakage
-- is the historical local-mean-time value used for America/Los_Angeles.
select * from pg_timezone_abbrevs where abbrev = 'LMT';
+
+DO $$
+DECLARE
+ bg_writer_pid int;
+ r RECORD;
+BEGIN
+ SELECT pid from pg_stat_activity where backend_type='background writer'
+ INTO bg_writer_pid;
+
+ select type, name, ident
+ from pg_get_process_memory_contexts(bg_writer_pid, false, 20)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+ select type, name, ident
+ from pg_get_process_memory_contexts(pg_backend_pid(), false, 20)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+END $$;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 3daba26b237..e614ae25a8c 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1681,6 +1681,9 @@ MemoryContextData
MemoryContextId
MemoryContextMethodID
MemoryContextMethods
+MemoryStatsContextId
+MemoryStatsEntry
+MemoryStatsDSHashEntry
MemoryStatsPrintFunc
MergeAction
MergeActionState
--
2.34.1
Hi,
CFbot indicated that the patch requires a rebase, so I've attached an
updated version.
The documentation for this feature is now included in the new
func-admin.sgml file, due to recent changes in the documentation of SQL
functions.
The following are results from a performance test.
pgbench is initialized as follows:
pgbench -i -s 100 postgres
Test 1:
pgbench -c 16 -j 16 postgres -T 100
TPS: 745.02 (average of 3 runs)
Test 2:
pgbench -c 16 -j 16 postgres -T 100
while the memory usage of a randomly chosen postgres process is monitored
concurrently every 0.1 seconds, using the following method:
SELECT * FROM pg_get_process_memory_contexts(
(SELECT pid FROM pg_stat_activity
ORDER BY random() LIMIT 1)
, false, 5);
TPS: 750.66 (average of 3 runs)
I have not observed any performance decline resulting from the concurrent
execution
of the memory monitoring function.
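To put the two TPS figures side by side, the relative difference can be computed as in the small C sketch below. The helper name relative_change_pct is hypothetical and used only for illustration; the inputs are the two averages quoted above.

```c
#include <assert.h>

/*
 * relative_change_pct: hypothetical helper, not part of the patch.
 * Returns the change from baseline to measured as a percentage of baseline.
 */
static double
relative_change_pct(double baseline, double measured)
{
    /* A zero or negative baseline would make the ratio meaningless. */
    assert(baseline > 0.0);
    return (measured - baseline) / baseline * 100.0;
}
```

Calling relative_change_pct(745.02, 750.66) gives roughly 0.76, i.e. the monitored run differs from the baseline by less than 1%.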
Thank you,
Rahila Syed
On Tue, Jul 29, 2025 at 7:10 PM Rahila Syed <rahilasyed90@gmail.com> wrote:
Hi,
Please find attached an updated patch. It contains the following changes.
1. It needed a rebase as highlighted by cfbot
<https://cfbot.cputube.org/patch_5938.log>. The method for adding an
LWLock was updated in commit 2047ad068139f0b8c6da73d0b845ca9ba30fb33d, so
the patch has been adjusted to reflect this change.
2. Updated some comments to align with the latest patch design.
3. Eliminated an unnecessary assertion.
Thank you,
Rahila Syed
On Fri, Jul 11, 2025 at 9:01 PM Rahila Syed <rahilasyed90@gmail.com> wrote:
Hi,
Please find attached the latest memory context statistics monitoring
patch.
It has been redesigned to address several issues highlighted in the
threads [1] and [2].
Here are some key highlights of the new design:
- All DSA processing has been moved out of the CFI handler function. Now,
all the dynamic shared memory needed to store the statistics is created
and deleted in the client function. This change addresses concerns that
DSA APIs are too high level to be safely called from interrupt handlers.
There was also a concern that DSA API calls might not provide re-entrancy,
which could cause issues if CFI is invoked from a DSA function in the
future.
- The static shared memory array has been replaced with a DSHASH table
which now holds metadata such as pointers to actual statistics for each
process.
- dsm_registry.c APIs are used for creating and attaching to DSA and
DSHASH table, which helps prevent code duplication.
- To address the memory leak concern, we create an exclusive memory
context under the NULL context, which does not fall under the
TopMemoryContext tree, to handle all the memory allocations in
ProcessGetMemoryContextInterrupt. This ensures the memory context created
by the function does not affect its outcome. The memory context is reset
at the end of the function, which helps prevent any memory leaks.
- Changes made to the mcxt.c file have been relocated to mcxtfuncs.c,
which now contains all the existing memory statistics-related functions
along with the code for the proposed function.
The overall flow of a request is as follows:
1. A client backend running the pg_get_process_memory_contexts function
creates a DSA and allocates memory to store statistics, tracked by a DSA
pointer. This pointer is stored in a DSHASH entry for each client querying
the statistics of any process. The client shares its DSHASH table key with
the server process using a static shared array of keys indexed by the
server's procNumber. It notifies the server process to publish statistics
by using SendProcSignal.
2. When a PostgreSQL server process handles the request for memory
statistics, the CFI function accesses the client hash key stored in its
procNumber slot of the shared keys array. The server process then
retrieves the DSHASH entry to obtain the DSA pointer allocated by the
client, for storing the statistics. After storing the statistics, it
notifies the client through its condition variable.
3. Although the DSA is created just once, the memory inside the DSA is
allocated and released by the client process as soon as it finishes
reading the statistics. If it fails to do so, it is deleted by the
before_shmem_exit callback when the client exits. The client's entry in
the DSHASH table is also deleted when the client exits.
4. The DSA and DSHASH table are not created until the
pg_get_process_memory_contexts function is called. Once created, any
client backend querying statistics and any PostgreSQL process publishing
statistics will attach to the same area and table.
Please let me know your thoughts.
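The slot handling in steps 1 and 2 above can be modelled in plain C as follows. This is only an illustrative single-process sketch, not the patch's code: MAX_PROCS, EMPTY_SLOT, init_slots, request_stats and serve_request are hypothetical names, and returning 0 when a slot is already taken is an assumption about how a second client would be handled. The only detail taken from the patch is that a slot is reset to -1 once the request has been served. Locking, signalling and the DSHASH lookup are left out entirely.

```c
#include <assert.h>

#define MAX_PROCS 8        /* hypothetical number of proc slots */
#define EMPTY_SLOT (-1)    /* the patch stores -1 in a released slot */

/* One slot per target process, indexed by that process's proc number. */
static int client_keys[MAX_PROCS];

/* Mark every slot as free. */
static void
init_slots(void)
{
    for (int i = 0; i < MAX_PROCS; i++)
        client_keys[i] = EMPTY_SLOT;
}

/*
 * Client side: register a request in the target's slot.
 * Returns 1 on success, 0 if another client already holds the slot
 * (assumed behaviour for this sketch).
 */
static int
request_stats(int target_proc, int client_key)
{
    assert(target_proc >= 0 && target_proc < MAX_PROCS);
    if (client_keys[target_proc] != EMPTY_SLOT)
        return 0;
    client_keys[target_proc] = client_key;
    return 1;
}

/*
 * Target side: read the requesting client's key from our own slot and
 * empty the slot so that other clients can request statistics.
 * Returns EMPTY_SLOT if no request was pending.
 */
static int
serve_request(int my_proc)
{
    int key;

    assert(my_proc >= 0 && my_proc < MAX_PROCS);
    key = client_keys[my_proc];
    client_keys[my_proc] = EMPTY_SLOT;
    return key;
}
```

In the real patch the array lives in shared memory and every access is made while holding an LWLock; the model above only shows why a single slot per target process is enough to hand the client's key over.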
Thank you,
Rahila Syed
[1]. PostgreSQL: Re: pgsql: Add function to get memory context stats for
processes
</messages/by-id/CA+Tgmoaey-kOP1k5FaUnQFd1fR0majVebWcL8ogfLbG_nt-Ytg@mail.gmail.com>
[2]. PostgreSQL: Re: Prevent an error on attaching/creating a DSM/DSA
from an interrupt handler.
</messages/by-id/8B873D49-E0E5-4F9F-B8D6-CA4836B825CD@yesql.se>
On Wed, Apr 30, 2025 at 4:13 PM Daniel Gustafsson <daniel@yesql.se> wrote:
On 30 Apr 2025, at 12:14, Peter Eisentraut <peter@eisentraut.org> wrote:
On 29.04.25 15:13, Rahila Syed wrote:
Please find attached a patch with some comments and documentation
changes.
Additionally, added a missing '\0' termination to the "Remaining Totals"
string.
I think this became necessary after we replaced dsa_allocate0()
with dsa_allocate() in the latest version.
strncpy(nameptr, "Remaining Totals", namelen);
+ nameptr[namelen] = '\0';
Looks like a case for strlcpy()?
True. I did go ahead with the strncpy and nul terminator assignment,
mostly out of muscle memory, but I agree that this would be a good place
for a strlcpy() instead.
--
Daniel Gustafsson
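For readers following the strncpy() versus strlcpy() point, here is a minimal C sketch of the semantics being asked for. Since strlcpy() is not available in every C library, the example uses a hypothetical stand-in, copy_truncated, which is an assumption for illustration and not PostgreSQL's implementation. Unlike a real strlcpy(), it requires a destination size of at least 1.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * copy_truncated: hypothetical stand-in with strlcpy()-style semantics.
 * Copies at most dstsize - 1 bytes of src into dst and always
 * NUL-terminates dst. Returns strlen(src), so a return value
 * >= dstsize tells the caller that the copy was truncated.
 */
static size_t
copy_truncated(char *dst, const char *src, size_t dstsize)
{
    size_t srclen = strlen(src);
    size_t n;

    /* Simplification for this sketch: the buffer must hold the NUL. */
    assert(dstsize > 0);

    n = (srclen < dstsize - 1) ? srclen : dstsize - 1;
    memcpy(dst, src, n);
    dst[n] = '\0';
    return srclen;
}
```

With an 8-byte buffer, copy_truncated(buf, "Remaining Totals", sizeof(buf)) leaves "Remaini" in buf and returns 16. strncpy() with the same arguments would fill all 8 bytes and leave the buffer unterminated, which is why the quoted code needed the explicit '\0' assignment.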
Attachments:
v32-0001-Add-pg_get_process_memory_context-function.patch (application/octet-stream)
From 17873d9d45f4207a1c227b49801befbcdb49b93d Mon Sep 17 00:00:00 2001
From: Rahila Syed <rahilasyed.90@gmail.com>
Date: Mon, 30 Jun 2025 12:11:00 +0530
Subject: [PATCH] Add pg_get_process_memory_context function
This adds a function for retrieving memory context statistics
and information from backends as well as auxiliary processes.
The intended use case is cluster debugging when under memory
pressure or unanticipated memory usage characteristics.
When calling the function it sends a signal to the specified
process to submit statistics regarding its memory contexts
into dynamic shared memory. Each memory context is returned
in detail, followed by a cumulative total in case the number
of contexts exceeds the max allocated amount of shared memory.
Each process is limited to use at most 1MB memory for this.
A summary can also be explicitly requested by the user; this
will return the TopMemoryContext and a cumulative total of
all lower contexts.
In order to not block on busy processes the caller specifies
the number of seconds during which to retry before timing out.
In the case where no statistics are published within the set
timeout, NULL is returned.
---
doc/src/sgml/func/func-admin.sgml | 164 ++++
src/backend/catalog/system_views.sql | 5 +
src/backend/postmaster/autovacuum.c | 4 +
src/backend/postmaster/checkpointer.c | 4 +
src/backend/postmaster/interrupt.c | 4 +
src/backend/postmaster/pgarch.c | 4 +
src/backend/postmaster/startup.c | 4 +
src/backend/postmaster/walsummarizer.c | 4 +
src/backend/storage/ipc/ipci.c | 3 +
src/backend/storage/ipc/procsignal.c | 3 +
src/backend/storage/lmgr/proc.c | 1 +
src/backend/tcop/postgres.c | 3 +
.../utils/activity/wait_event_names.txt | 2 +
src/backend/utils/adt/mcxtfuncs.c | 831 +++++++++++++++++-
src/backend/utils/adt/pg_locale.c | 1 -
src/backend/utils/init/globals.c | 1 +
src/backend/utils/init/postinit.c | 7 +
src/backend/utils/mb/mbutils.c | 1 -
src/backend/utils/mmgr/mcxt.c | 71 +-
src/include/catalog/pg_proc.dat | 10 +
src/include/miscadmin.h | 1 +
src/include/storage/lwlocklist.h | 1 +
src/include/storage/procsignal.h | 1 +
src/include/utils/memutils.h | 92 +-
src/test/regress/expected/sysviews.out | 19 +
src/test/regress/sql/sysviews.sql | 18 +
src/tools/pgindent/typedefs.list | 3 +
27 files changed, 1221 insertions(+), 41 deletions(-)
diff --git a/doc/src/sgml/func/func-admin.sgml b/doc/src/sgml/func/func-admin.sgml
index 446fdfe56f4..689e9231a7e 100644
--- a/doc/src/sgml/func/func-admin.sgml
+++ b/doc/src/sgml/func/func-admin.sgml
@@ -251,6 +251,137 @@
<literal>false</literal> is returned.
</para></entry>
</row>
+
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>pg_get_process_memory_contexts</primary>
+ </indexterm>
+ <function>pg_get_process_memory_contexts</function> ( <parameter>pid</parameter> <type>integer</type>, <parameter>summary</parameter> <type>boolean</type>, <parameter>timeout</parameter> <type>float</type> )
+ <returnvalue>setof record</returnvalue>
+ ( <parameter>name</parameter> <type>text</type>,
+ <parameter>ident</parameter> <type>text</type>,
+ <parameter>type</parameter> <type>text</type>,
+ <parameter>path</parameter> <type>integer[]</type>,
+ <parameter>level</parameter> <type>integer</type>,
+ <parameter>total_bytes</parameter> <type>bigint</type>,
+ <parameter>total_nblocks</parameter> <type>bigint</type>,
+ <parameter>free_bytes</parameter> <type>bigint</type>,
+ <parameter>free_chunks</parameter> <type>bigint</type>,
+ <parameter>used_bytes</parameter> <type>bigint</type>,
+ <parameter>num_agg_contexts</parameter> <type>integer</type>,
+ <parameter>stats_timestamp</parameter> <type>timestamptz</type> )
+ </para>
+ <para>
+ This function handles requests to display the memory contexts of a
+ <productname>PostgreSQL</productname> process with the specified
+ process ID. The function can be used to send requests to backends as
+ well as <glossterm linkend="glossary-auxiliary-proc">auxiliary processes</glossterm>.
+ </para>
+ <para>
+ The returned record contains extended statistics for each memory
+ context:
+ <itemizedlist spacing="compact">
+ <listitem>
+ <para>
+ <parameter>name</parameter> - The name of the memory context.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>ident</parameter> - Identification information of the memory context (if any).
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>type</parameter> - The type of memory context, possible
+ values are: AllocSet, Generation, Slab and Bump.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>path</parameter> - Memory contexts are organized in a
+ tree model with TopMemoryContext as the root, and all other memory
+ contexts as nodes in the tree. The <parameter>path</parameter>
+ displays the path from the root to the current memory context. The
+ path is limited to 100 children per node, with each node limited
+ to a max depth of 100, to preserve memory during reporting. The
+ printed path will also be limited to 100 nodes counting from the
+ TopMemoryContext.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>level</parameter> - The level in the tree of the current
+ memory context.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>total_bytes</parameter> - The total number of bytes
+ allocated to this memory context.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>total_nblocks</parameter> - The total number of blocks
+ used for the allocated memory.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>free_bytes</parameter> - The amount of free memory in
+ this memory context.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>free_chunks</parameter> - The number of chunks that
+ <parameter>free_bytes</parameter> corresponds to.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>used_bytes</parameter> - The total number of bytes
+ currently occupied.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>num_agg_contexts</parameter> - The number of memory
+ contexts aggregated in the displayed statistics.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+ <para>
+ When <parameter>summary</parameter> is <literal>true</literal>, statistics
+ for memory contexts at levels 1 and 2 are displayed, with level 1
+ representing the root node (i.e., <literal>TopMemoryContext</literal>).
+ Statistics for contexts on level 2 and below are aggregates of all
+ child contexts' statistics, where <literal>num_agg_contexts</literal>
+ indicates the number of aggregated child contexts. When
+ <parameter>summary</parameter> is <literal>false</literal>,
+ the <literal>num_agg_contexts</literal> value is <literal>1</literal>,
+ indicating that individual statistics are being displayed.
+ </para>
+ <para>
+ Busy processes can delay reporting memory context statistics;
+ <parameter>timeout</parameter> specifies the number of seconds
+ to wait for updated statistics. <parameter>timeout</parameter> can be
+ specified in fractions of a second.
+ </para>
+ <para>
+ After receiving memory context statistics from the target process, the
+ function returns the results as one row per context. If all the contexts don't
+ fit within the pre-determined size limit, the remaining context
+ statistics are aggregated and a cumulative total is displayed. The
+ <literal>num_agg_contexts</literal> column indicates the number of
+ contexts aggregated in the displayed statistics. When
+ <literal>num_agg_contexts</literal> is <literal>1</literal> it means
+ that the context statistics are displayed separately.
+ </para></entry>
+ </row>
</tbody>
</tgroup>
</table>
@@ -302,6 +433,39 @@ LOG: Grand total: 1651920 bytes in 201 blocks; 622360 free (88 chunks); 1029560
because it may generate a large number of log messages.
</para>
+ <para>
+ <function>pg_get_process_memory_contexts</function> can be used to request
+ memory context statistics of any <productname>PostgreSQL</productname>
+ process. For example:
+<programlisting>
+postgres=# SELECT * FROM pg_get_process_memory_contexts(
+ (SELECT pid FROM pg_stat_activity
+ WHERE backend_type = 'checkpointer'),
+ false, 0.5) LIMIT 1;
+-[ RECORD 1 ]----+------------------------------
+name | TopMemoryContext
+ident |
+type | AllocSet
+path | {1}
+level | 1
+total_bytes | 90304
+total_nblocks | 3
+free_bytes | 2880
+free_chunks | 1
+used_bytes | 87424
+num_agg_contexts | 1
+</programlisting>
+ <note>
+ <para>
+ While <function>pg_get_process_memory_contexts</function> can be used to
+ query memory contexts of the local backend,
+ <structname>pg_backend_memory_contexts</structname>
+ (see <xref linkend="view-pg-backend-memory-contexts"/> for more details)
+ will be less resource intensive when only the local backend is of interest.
+ </para>
+ </note>
+ </para>
+
</sect2>
<sect2 id="functions-admin-backup">
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 1b3c5a55882..022586d6a55 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -682,6 +682,11 @@ GRANT SELECT ON pg_backend_memory_contexts TO pg_read_all_stats;
REVOKE EXECUTE ON FUNCTION pg_get_backend_memory_contexts() FROM PUBLIC;
GRANT EXECUTE ON FUNCTION pg_get_backend_memory_contexts() TO pg_read_all_stats;
+REVOKE EXECUTE ON FUNCTION
+ pg_get_process_memory_contexts(integer, boolean, float) FROM PUBLIC;
+GRANT EXECUTE ON FUNCTION
+ pg_get_process_memory_contexts(integer, boolean, float) TO pg_read_all_stats;
+
-- Statistics views
CREATE VIEW pg_stat_all_tables AS
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index ff96b36d710..3875f76564d 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -791,6 +791,10 @@ ProcessAutoVacLauncherInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
/* Process sinval catchup interrupts that happened while sleeping */
ProcessCatchupInterrupt();
}
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index e84e8663e96..5b3e08805bf 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -679,6 +679,10 @@ ProcessCheckpointerInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/postmaster/interrupt.c b/src/backend/postmaster/interrupt.c
index 0ae9bf906ec..f24f574e748 100644
--- a/src/backend/postmaster/interrupt.c
+++ b/src/backend/postmaster/interrupt.c
@@ -48,6 +48,10 @@ ProcessMainLoopInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 78e39e5f866..ac97a39447c 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -867,6 +867,10 @@ ProcessPgArchInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
if (ConfigReloadPending)
{
char *archiveLib = pstrdup(XLogArchiveLibrary);
diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c
index 27e86cf393f..7149a67fcbc 100644
--- a/src/backend/postmaster/startup.c
+++ b/src/backend/postmaster/startup.c
@@ -192,6 +192,10 @@ ProcessStartupProcInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
diff --git a/src/backend/postmaster/walsummarizer.c b/src/backend/postmaster/walsummarizer.c
index 777c9a8d555..5d14684f6b2 100644
--- a/src/backend/postmaster/walsummarizer.c
+++ b/src/backend/postmaster/walsummarizer.c
@@ -879,6 +879,10 @@ ProcessWalSummarizerInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 2fa045e6b0f..fe3d32e40b0 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -51,6 +51,7 @@
#include "storage/sinvaladt.h"
#include "utils/guc.h"
#include "utils/injection_point.h"
+#include "utils/memutils.h"
/* GUCs */
int shared_memory_type = DEFAULT_SHARED_MEMORY_TYPE;
@@ -150,6 +151,7 @@ CalculateShmemSize(int *num_semaphores)
size = add_size(size, InjectionPointShmemSize());
size = add_size(size, SlotSyncShmemSize());
size = add_size(size, AioShmemSize());
+ size = add_size(size, MemoryContextKeysShmemSize() + sizeof(LWLockPadded));
/* include additional requested shmem from preload libraries */
size = add_size(size, total_addin_request);
@@ -343,6 +345,7 @@ CreateOrAttachShmemStructs(void)
WaitEventCustomShmemInit();
InjectionPointShmemInit();
AioShmemInit();
+ MemoryContextKeysShmemInit();
}
/*
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index 087821311cc..8963285cc12 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -691,6 +691,9 @@ procsignal_sigusr1_handler(SIGNAL_ARGS)
if (CheckProcSignal(PROCSIG_LOG_MEMORY_CONTEXT))
HandleLogMemoryContextInterrupt();
+ if (CheckProcSignal(PROCSIG_GET_MEMORY_CONTEXT))
+ HandleGetMemoryContextInterrupt();
+
if (CheckProcSignal(PROCSIG_PARALLEL_APPLY_MESSAGE))
HandleParallelApplyMessageInterrupt();
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index e9ef0fbfe32..f194e6b3dcc 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -50,6 +50,7 @@
#include "storage/procsignal.h"
#include "storage/spin.h"
#include "storage/standby.h"
+#include "utils/memutils.h"
#include "utils/timeout.h"
#include "utils/timestamp.h"
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 0cecd464902..3933b6db607 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3534,6 +3534,9 @@ ProcessInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
if (ParallelApplyMessagePending)
ProcessParallelApplyMessages();
}
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 0be307d2ca0..54f91a76a1b 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -161,6 +161,7 @@ WAL_RECEIVER_EXIT "Waiting for the WAL receiver to exit."
WAL_RECEIVER_WAIT_START "Waiting for startup process to send initial data for streaming replication."
WAL_SUMMARY_READY "Waiting for a new WAL summary to be generated."
XACT_GROUP_UPDATE "Waiting for the group leader to update transaction status at transaction end."
+MEM_CXT_PUBLISH "Waiting for a process to publish memory information."
ABI_compatibility:
@@ -406,6 +407,7 @@ SubtransSLRU "Waiting to access the sub-transaction SLRU cache."
XactSLRU "Waiting to access the transaction status SLRU cache."
ParallelVacuumDSA "Waiting for parallel vacuum dynamic shared memory allocation."
AioUringCompletion "Waiting for another process to complete IO via io_uring."
+MemoryContextReportingKeys "Waiting for another process to complete reading or writing the memory reporting keys."
# No "ABI_compatibility" region here as WaitEventLWLock has its own C code.
diff --git a/src/backend/utils/adt/mcxtfuncs.c b/src/backend/utils/adt/mcxtfuncs.c
index fe6dce9cba3..32a03f12b1a 100644
--- a/src/backend/utils/adt/mcxtfuncs.c
+++ b/src/backend/utils/adt/mcxtfuncs.c
@@ -15,13 +15,38 @@
#include "postgres.h"
+#include "access/twophase.h"
+#include "catalog/pg_authid_d.h"
#include "funcapi.h"
#include "mb/pg_wchar.h"
+#include "miscadmin.h"
+#include "storage/dsm_registry.h"
#include "storage/proc.h"
#include "storage/procarray.h"
+#include "utils/acl.h"
#include "utils/array.h"
#include "utils/builtins.h"
#include "utils/hsearch.h"
+#include "utils/memutils.h"
+#include "utils/wait_event_types.h"
+
+#define CLIENT_KEY_SIZE 64
+
+static LWLock *client_keys_lock = NULL;
+static int *client_keys = NULL;
+static dshash_table *MemoryStatsDsHash = NULL;
+static dsa_area *MemoryStatsDsaArea = NULL;
+
+static void memstats_dsa_cleanup(MemoryStatsDSHashEntry *entry);
+static const char *ContextTypeToString(NodeTag type);
+static void PublishMemoryContext(MemoryStatsEntry *memcxt_info,
+ int curr_id, MemoryContext context,
+ List *path,
+ MemoryContextCounters stat,
+ int num_contexts, int max_levels);
+static List *compute_context_path(MemoryContext c, HTAB *context_id_lookup);
+static void end_memorycontext_reporting(MemoryStatsDSHashEntry *entry, MemoryContext oldcontext,
+ HTAB *context_id_lookup);
/* ----------
* The max bytes for showing identifiers of MemoryContext.
@@ -89,7 +114,7 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
*/
for (MemoryContext cur = context; cur != NULL; cur = cur->parent)
{
- MemoryContextId *entry;
+ MemoryStatsContextId *entry;
bool found;
entry = hash_search(context_id_lookup, &cur, HASH_FIND, &found);
@@ -143,24 +168,7 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
else
nulls[1] = true;
- switch (context->type)
- {
- case T_AllocSetContext:
- type = "AllocSet";
- break;
- case T_GenerationContext:
- type = "Generation";
- break;
- case T_SlabContext:
- type = "Slab";
- break;
- case T_BumpContext:
- type = "Bump";
- break;
- default:
- type = "???";
- break;
- }
+ type = ContextTypeToString(context->type);
values[2] = CStringGetTextDatum(type);
values[3] = Int32GetDatum(list_length(path)); /* level */
@@ -175,6 +183,38 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
list_free(path);
}
+/*
+ * ContextTypeToString
+ * Returns a textual representation of a context type
+ *
+ * This should cover the same types as MemoryContextIsValid.
+ */
+static const char *
+ContextTypeToString(NodeTag type)
+{
+ const char *context_type;
+
+ switch (type)
+ {
+ case T_AllocSetContext:
+ context_type = "AllocSet";
+ break;
+ case T_GenerationContext:
+ context_type = "Generation";
+ break;
+ case T_SlabContext:
+ context_type = "Slab";
+ break;
+ case T_BumpContext:
+ context_type = "Bump";
+ break;
+ default:
+ context_type = "???";
+ break;
+ }
+ return context_type;
+}
+
/*
* pg_get_backend_memory_contexts
* SQL SRF showing backend memory context.
@@ -189,7 +229,7 @@ pg_get_backend_memory_contexts(PG_FUNCTION_ARGS)
HTAB *context_id_lookup;
ctl.keysize = sizeof(MemoryContext);
- ctl.entrysize = sizeof(MemoryContextId);
+ ctl.entrysize = sizeof(MemoryStatsContextId);
ctl.hcxt = CurrentMemoryContext;
context_id_lookup = hash_create("pg_get_backend_memory_contexts",
@@ -216,7 +256,7 @@ pg_get_backend_memory_contexts(PG_FUNCTION_ARGS)
foreach_ptr(MemoryContextData, cur, contexts)
{
- MemoryContextId *entry;
+ MemoryStatsContextId *entry;
bool found;
/*
@@ -224,8 +264,8 @@ pg_get_backend_memory_contexts(PG_FUNCTION_ARGS)
* PutMemoryContextsStatsTupleStore needs this to populate the "path"
* column with the parent context_ids.
*/
- entry = (MemoryContextId *) hash_search(context_id_lookup, &cur,
- HASH_ENTER, &found);
+ entry = (MemoryStatsContextId *) hash_search(context_id_lookup, &cur,
+ HASH_ENTER, &found);
entry->context_id = context_id++;
Assert(!found);
@@ -305,3 +345,748 @@ pg_log_backend_memory_contexts(PG_FUNCTION_ARGS)
PG_RETURN_BOOL(true);
}
+
+/*
+ * pg_get_process_memory_contexts
+ * Signal a backend or an auxiliary process to send its memory contexts,
+ * wait for the results and display them.
+ *
+ * By default, only superusers or users with ROLE_PG_READ_ALL_STATS are allowed
+ * to signal a process to return the memory contexts. This is because allowing
+ * any user to issue this request at an unbounded rate could flood the target
+ * process with requests, which can lead to denial of service. Additional roles
+ * can be permitted with GRANT.
+ *
+ * On receipt of this signal, a backend or an auxiliary process sets the flag
+ * in the signal handler, which causes the next CHECK_FOR_INTERRUPTS()
+ * or process-specific interrupt handler to copy the memory context details
+ * to a dynamic shared memory space.
+ *
+ * We have defined a limit on DSA memory that could be allocated per process -
+ * if the process has more memory contexts than what can fit in the allocated
+ * size, the excess contexts are summarized and represented as cumulative total
+ * at the end of the buffer.
+ *
+ * After sending the signal, wait on a condition variable. The publishing
+ * backend, after copying the data to shared memory, sends signal on that
+ * condition variable. There is one condition variable per client process.
+ * Once the condition variable is signalled, check if the latest memory context
+ * information is available and display.
+ *
+ * If the publishing backend does not respond before the condition variable
+ * sleep times out, where the sleep uses the timeout specified by the user,
+ * give up and return NULL. The target process is rechecked before each
+ * sleep; if it has exited in the meantime, a WARNING is raised and NULL is
+ * returned.
+ */
+#define MEMSTATS_WAIT_TIMEOUT 100
+Datum
+pg_get_process_memory_contexts(PG_FUNCTION_ARGS)
+{
+ int pid = PG_GETARG_INT32(0);
+ bool summary = PG_GETARG_BOOL(1);
+ double timeout = PG_GETARG_FLOAT8(2);
+ PGPROC *proc;
+ ProcNumber procNumber = INVALID_PROC_NUMBER;
+ bool proc_is_aux = false;
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ MemoryStatsEntry *memcxt_info;
+ MemoryStatsDSHashEntry *entry;
+ bool found;
+ char key[CLIENT_KEY_SIZE];
+
+ /*
+ * See if the process with given pid is a backend or an auxiliary process
+ * and remember the type for when we requery the process later.
+ */
+ proc = BackendPidGetProc(pid);
+ if (proc == NULL)
+ {
+ proc = AuxiliaryPidGetProc(pid);
+ proc_is_aux = true;
+ }
+
+ /*
+ * BackendPidGetProc() and AuxiliaryPidGetProc() return NULL if the pid
+ * isn't valid; this is not an error, however, so we leave with a WARNING.
+ * See comment in pg_log_backend_memory_contexts for a discussion on this.
+ */
+ if (proc == NULL)
+ {
+ /*
+ * This is just a warning so a loop-through-resultset will not abort
+ * if one backend terminated on its own during the run.
+ */
+ ereport(WARNING,
+ errmsg("PID %d is not a PostgreSQL server process", pid));
+ PG_RETURN_NULL();
+ }
+
+ InitMaterializedSRF(fcinfo, 0);
+
+ procNumber = GetNumberFromPGProc(proc);
+
+ /*
+ * Create a DSA to allocate memory for copying memory context statistics.
+ * Allocate the memory in the DSA and send the dsa pointer to the server
+ * process for storing the context statistics. If the statistics exceed a
+ * predefined size limit (1MB), a cumulative total is stored for the
+ * remaining contexts.
+ *
+ * The DSA is created once for the lifetime of the server, and only
+ * attached in subsequent calls.
+ */
+ if (MemoryStatsDsaArea == NULL)
+ MemoryStatsDsaArea = GetNamedDSA("memory_context_statistics_dsa", &found);
+
+ /*
+ * The dsa pointers containing statistics for each client are stored in a
+ * dshash table. In addition to dsa pointer, each entry in this table also
+ * contains information about the statistics, condition variable for
+ * signalling between client and the server and miscellaneous data
+ * specific to a request. There is one entry per client request in the
+ * hash table.
+ */
+ if (MemoryStatsDsHash == NULL)
+ MemoryStatsDsHash = GetNamedDSHash("memory_context_statistics_dshash", &memctx_dsh_params, &found);
+
+ snprintf(key, sizeof(key), "%d", MyProcNumber);
+
+ /*
+ * Check if the publishing process slot is empty and store this client's
+ * key, i.e. its procNumber. This informs the publishing process that it is
+ * supposed to write statistics in the hash entry corresponding to this
+ * client.
+ */
+ LWLockAcquire(client_keys_lock, LW_EXCLUSIVE);
+ if (client_keys[procNumber] == -1)
+ client_keys[procNumber] = MyProcNumber;
+ else
+ {
+ ereport(WARNING,
+ errmsg("server process %d is still processing a previous request", pid));
+ LWLockRelease(client_keys_lock);
+ PG_RETURN_NULL();
+ }
+ LWLockRelease(client_keys_lock);
+
+ /*
+ * Insert an entry for this client in the dshash table the first time this
+ * function is called. This entry is deleted when the process exits, in a
+ * before_shmem_exit callback.
+ *
+ * dshash_find_or_insert locks the entry to prevent the publisher from
+ * reading before client has updated the entry.
+ */
+ entry = dshash_find_or_insert(MemoryStatsDsHash, key, &found);
+ if (!found)
+ ConditionVariableInit(&entry->memcxt_cv);
+
+ /*
+ * Allocate 1MB of memory for the backend to publish its statistics on
+ * every call to this function. The memory is freed at the end of the
+ * function.
+ */
+ entry->memstats_dsa_pointer =
+ dsa_allocate0(MemoryStatsDsaArea, MEMORY_CONTEXT_REPORT_MAX_PER_BACKEND);
+ entry->summary = summary;
+ dshash_release_lock(MemoryStatsDsHash, entry);
+
+ /*
+ * Send a signal to a PostgreSQL process, informing it we want it to
+ * produce information about its memory contexts.
+ */
+ if (SendProcSignal(pid, PROCSIG_GET_MEMORY_CONTEXT, procNumber) < 0)
+ {
+ ereport(WARNING,
+ errmsg("could not send signal to process %d: %m", pid));
+ PG_RETURN_NULL();
+ }
+
+ while (1)
+ {
+
+ entry = dshash_find_or_insert(MemoryStatsDsHash, key, &found);
+ Assert(found);
+
+ /*
+ * We expect to come out of sleep when the requested process has
+ * finished publishing the statistics, verified using the correct
+ * entry in the proc_id field.
+ *
+ * Make sure that the information belongs to the pid we requested
+ * information for; otherwise loop back and wait for the server
+ * process to finish publishing statistics.
+ *
+ */
+ if (entry->proc_id == pid)
+ break;
+
+ dshash_release_lock(MemoryStatsDsHash, entry);
+
+ /*
+ * Recheck the state of the backend before sleeping on the condition
+ * variable to ensure the process is still alive. Only check the
+ * relevant process type based on the earlier PID check.
+ */
+ if (proc_is_aux)
+ proc = AuxiliaryPidGetProc(pid);
+ else
+ proc = BackendPidGetProc(pid);
+
+ /*
+ * The process ending during memory context processing is not an
+ * error.
+ */
+ if (proc == NULL)
+ {
+ ereport(WARNING,
+ errmsg("PID %d is no longer a PostgreSQL server process",
+ pid));
+ memstats_dsa_cleanup(entry);
+ PG_RETURN_NULL();
+ }
+
+
+ /*
+ * Wait for the timeout as defined by the user. If no statistics are
+ * available within the allowed time then return NULL. The timer is
+ * defined in milliseconds since that's what the condition variable
+ * sleep uses.
+ */
+ if (ConditionVariableTimedSleep(&entry->memcxt_cv,
+ (timeout * 1000), WAIT_EVENT_MEM_CXT_PUBLISH))
+ {
+ /* Timeout has expired, return NULL */
+ memstats_dsa_cleanup(entry);
+ PG_RETURN_NULL();
+ }
+ }
+
+ /*
+ * Backend has finished publishing the stats, return them as rows.
+ */
+ memcxt_info = (MemoryStatsEntry *)
+ dsa_get_address(MemoryStatsDsaArea, entry->memstats_dsa_pointer);
+
+#define PG_GET_PROCESS_MEMORY_CONTEXTS_COLS 11
+ for (int i = 0; i < entry->total_stats; i++)
+ {
+ ArrayType *path_array;
+ int path_length;
+ Datum values[PG_GET_PROCESS_MEMORY_CONTEXTS_COLS];
+ bool nulls[PG_GET_PROCESS_MEMORY_CONTEXTS_COLS];
+ Datum *path_datum = NULL;
+
+ memset(values, 0, sizeof(values));
+ memset(nulls, 0, sizeof(nulls));
+
+ if (memcxt_info[i].name[0] != '\0')
+ {
+ values[0] = CStringGetTextDatum(memcxt_info[i].name);
+ }
+ else
+ nulls[0] = true;
+
+ if (memcxt_info[i].ident[0] != '\0')
+ {
+ values[1] = CStringGetTextDatum(memcxt_info[i].ident);
+ }
+ else
+ nulls[1] = true;
+
+ values[2] = CStringGetTextDatum(ContextTypeToString(memcxt_info[i].type));
+
+ path_length = memcxt_info[i].path_length;
+ path_datum = (Datum *) palloc(path_length * sizeof(Datum));
+ if (memcxt_info[i].path[0] != 0)
+ {
+ for (int j = 0; j < path_length; j++)
+ path_datum[j] = Int32GetDatum(memcxt_info[i].path[j]);
+ path_array = construct_array_builtin(path_datum, path_length, INT4OID);
+ values[3] = PointerGetDatum(path_array);
+ }
+ else
+ nulls[3] = true;
+
+ values[4] = Int32GetDatum(memcxt_info[i].levels);
+ values[5] = Int64GetDatum(memcxt_info[i].totalspace);
+ values[6] = Int64GetDatum(memcxt_info[i].nblocks);
+ values[7] = Int64GetDatum(memcxt_info[i].freespace);
+ values[8] = Int64GetDatum(memcxt_info[i].freechunks);
+ values[9] = Int64GetDatum(memcxt_info[i].totalspace -
+ memcxt_info[i].freespace);
+ values[10] = Int32GetDatum(memcxt_info[i].num_agg_stats);
+
+ tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc,
+ values, nulls);
+ }
+ memstats_dsa_cleanup(entry);
+ dshash_release_lock(MemoryStatsDsHash, entry);
+
+ ConditionVariableCancelSleep();
+
+ PG_RETURN_NULL();
+}
+
+static void
+memstats_dsa_cleanup(MemoryStatsDSHashEntry *entry)
+{
+ Assert(MemoryStatsDsaArea != NULL);
+ dsa_free(MemoryStatsDsaArea, entry->memstats_dsa_pointer);
+ entry->memstats_dsa_pointer = InvalidDsaPointer;
+ entry->proc_id = 0;
+}
+void
+MemoryContextKeysShmemInit(void)
+{
+ bool found;
+
+ client_keys = (int *)
+ ShmemInitStruct("MemoryContextKeys",
+ MemoryContextKeysShmemSize() + sizeof(LWLockPadded), &found);
+ client_keys_lock = (LWLock *) ((char *) client_keys + MemoryContextKeysShmemSize());
+
+ if (!found)
+ {
+ MemSet(client_keys, -1, MemoryContextKeysShmemSize());
+ LWLockInitialize(client_keys_lock, LWTRANCHE_MEMORY_CONTEXT_KEYS);
+ }
+}
+
+Size
+MemoryContextKeysShmemSize(void)
+{
+ Size sz = 0;
+ Size TotalProcs = 0;
+
+ TotalProcs = add_size(TotalProcs, NUM_AUXILIARY_PROCS);
+ TotalProcs = add_size(TotalProcs, MaxBackends);
+ sz = add_size(sz, mul_size(TotalProcs, sizeof(int)));
+
+ return sz;
+}
+
+/*
+ * HandleGetMemoryContextInterrupt
+ * Handle receipt of an interrupt indicating a request to publish memory
+ * contexts statistics.
+ *
+ * All the actual work is deferred to ProcessGetMemoryContextInterrupt() as
+ * this cannot be performed in a signal handler.
+ */
+void
+HandleGetMemoryContextInterrupt(void)
+{
+ InterruptPending = true;
+ PublishMemoryContextPending = true;
+ /* latch will be set by procsignal_sigusr1_handler */
+}
+
+/*
+ * ProcessGetMemoryContextInterrupt
+ * Generate information about memory contexts used by the process.
+ *
+ * Performs a breadth first search on the memory context tree, thus parents'
+ * statistics are reported before their children in the monitoring function
+ * output.
+ *
+ * Statistics for all the processes are shared via the same dynamic shared
+ * area. Individual statistics are tracked independently in
+ * per-process DSA pointers. These pointers are stored in a dshash table
+ * keyed by the requesting client's ProcNumber.
+ *
+ * We calculate the maximum number of contexts' statistics that can be
+ * displayed using a pre-determined limit for memory available per process for
+ * this utility and the maximum size of statistics for each context. The
+ * remaining context statistics, if any, are captured as a cumulative total at
+ * the end of the individual contexts' statistics.
+ *
+ * If summary is true, we capture the level 1 and level 2 contexts'
+ * statistics. For that we traverse the memory context tree recursively in
+ * a depth first manner to cover all the children of a parent context, to
+ * be able to display a cumulative total of memory consumption by a parent at
+ * level 2 and all its children.
+ */
+void
+ProcessGetMemoryContextInterrupt(void)
+{
+ List *contexts;
+ HASHCTL ctl;
+ HTAB *context_id_lookup;
+ int context_id = 0;
+ MemoryStatsEntry *meminfo;
+ bool summary = false;
+ MemoryContextCounters stat;
+ int num_individual_stats = 0;
+ bool found;
+ MemoryStatsDSHashEntry *entry;
+ char key[CLIENT_KEY_SIZE];
+ int clientProcNumber;
+ MemoryContext memstats_ctx = NULL;
+ MemoryContext oldcontext = NULL;
+
+ PublishMemoryContextPending = false;
+
+ /*
+ * Create a new memory context which is not a part of TopMemoryContext
+ * tree. This context is used to allocate all memory in this function.
+ * This keeps the allocations made while reporting memory consumption
+ * statistics separate from the contexts being reported, so that they do
+ * not affect the output of this function.
+ */
+ memstats_ctx = AllocSetContextCreate((MemoryContext) NULL, "publish_memory_context_statistics",
+ ALLOCSET_SMALL_SIZES);
+ oldcontext = MemoryContextSwitchTo(memstats_ctx);
+
+ /*
+ * The hash table is used for constructing "path" column of the view,
+ * similar to its local backend counterpart.
+ */
+ ctl.keysize = sizeof(MemoryContext);
+ ctl.entrysize = sizeof(MemoryStatsContextId);
+ ctl.hcxt = CurrentMemoryContext;
+
+ context_id_lookup = hash_create("pg_get_remote_backend_memory_contexts",
+ 256,
+ &ctl,
+ HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+
+ /* List of contexts to process in the next round - start at the top. */
+ contexts = list_make1(TopMemoryContext);
+
+ /*
+ * If the DSA exists, created by another process requesting statistics,
+ * attach to it. We expect the client process to create the required DSA
+ * and dshash table.
+ */
+ if (MemoryStatsDsaArea == NULL)
+ MemoryStatsDsaArea = GetNamedDSA("memory_context_statistics_dsa", &found);
+
+ if (MemoryStatsDsHash == NULL)
+ MemoryStatsDsHash = GetNamedDSHash("memory_context_statistics_dshash", &memctx_dsh_params, &found);
+
+ /* Retrieve the client key for publishing statistics */
+ LWLockAcquire(client_keys_lock, LW_SHARED);
+ Assert(client_keys[MyProcNumber] != -1);
+ clientProcNumber = client_keys[MyProcNumber];
+ LWLockRelease(client_keys_lock);
+
+ snprintf(key, CLIENT_KEY_SIZE, "%d", clientProcNumber);
+
+ entry = dshash_find_or_insert(MemoryStatsDsHash, key, &found);
+ summary = entry->summary;
+
+ /*
+ * The entry lock is held by dshash_find_or_insert to protect writes to
+ * process specific memory. Two different processes publishing statistics
+ * do not block each other.
+ */
+ entry->proc_id = MyProcPid;
+
+ /* Should be allocated by a client backend that is requesting statistics */
+ Assert(entry->memstats_dsa_pointer != InvalidDsaPointer);
+ meminfo = (MemoryStatsEntry *)
+ dsa_get_address(MemoryStatsDsaArea, entry->memstats_dsa_pointer);
+
+ if (summary)
+ {
+ int cxt_id = 0;
+ List *path = NIL;
+ MemoryStatsContextId *contextid_entry;
+
+ /* Copy TopMemoryContext statistics to DSA */
+ memset(&stat, 0, sizeof(stat));
+ (*TopMemoryContext->methods->stats) (TopMemoryContext, NULL, NULL,
+ &stat, true);
+ path = lcons_int(1, path);
+ PublishMemoryContext(meminfo, cxt_id, TopMemoryContext, path, stat,
+ 1, 100);
+
+ contextid_entry = (MemoryStatsContextId *) hash_search(context_id_lookup, &TopMemoryContext,
+ HASH_ENTER, &found);
+ Assert(!found);
+
+ /*
+ * context id starts with 1
+ */
+ contextid_entry->context_id = cxt_id + 1;
+
+ /*
+ * Copy statistics for each of TopMemoryContext's children. This
+ * includes statistics of at most 100 children per node, with each
+ * child node limited to a depth of 100 in its subtree.
+ */
+ for (MemoryContext c = TopMemoryContext->firstchild; c != NULL;
+ c = c->nextchild)
+ {
+ MemoryContextCounters grand_totals;
+ int num_contexts = 0;
+
+ path = NIL;
+ memset(&grand_totals, 0, sizeof(grand_totals));
+
+ cxt_id++;
+ contextid_entry = (MemoryStatsContextId *) hash_search(context_id_lookup, &c,
+ HASH_ENTER, &found);
+ Assert(!found);
+ contextid_entry->context_id = cxt_id + 1;
+
+ MemoryContextStatsInternal(c, 1, 100, 100, &grand_totals,
+ PRINT_STATS_NONE, &num_contexts);
+
+ path = compute_context_path(c, context_id_lookup);
+
+ PublishMemoryContext(meminfo, cxt_id, c, path,
+ grand_totals, num_contexts, 100);
+ }
+ entry->total_stats = cxt_id + 1;
+
+ /* Notify waiting backends and return */
+ end_memorycontext_reporting(entry, oldcontext, context_id_lookup);
+ return;
+ }
+ foreach_ptr(MemoryContextData, cur, contexts)
+ {
+ List *path = NIL;
+ MemoryStatsContextId *contextid_entry;
+
+ contextid_entry = (MemoryStatsContextId *) hash_search(context_id_lookup, &cur,
+ HASH_ENTER, &found);
+ Assert(!found);
+
+ /*
+ * context id starts with 1
+ */
+ contextid_entry->context_id = context_id + 1;
+
+ /*
+ * Figure out the transient context_id of this context and each of its
+ * ancestors, to compute a path for this context.
+ */
+ path = compute_context_path(cur, context_id_lookup);
+
+ /* Examine the context stats */
+ memset(&stat, 0, sizeof(stat));
+ (*cur->methods->stats) (cur, NULL, NULL, &stat, true);
+
+ /* Account for saving one statistics slot for cumulative reporting */
+ if (context_id < (MAX_MEMORY_CONTEXT_STATS_NUM - 1))
+ {
+ /* Copy statistics to DSA memory */
+ PublishMemoryContext(meminfo, context_id, cur, path, stat, 1, 100);
+ }
+ else
+ {
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].totalspace += stat.totalspace;
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].nblocks += stat.nblocks;
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].freespace += stat.freespace;
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].freechunks += stat.freechunks;
+ }
+
+ /*
+ * DSA max limit per process is reached, write aggregate of the
+ * remaining statistics.
+ *
+ * We can store contexts in slots 0 to max_stats - 1. When there are
+ * more contexts than slots, the last individual statistics are written
+ * when context_id equals max_stats - 2, as the max_stats - 1 array
+ * slot is used for reporting cumulative statistics or "Remaining Totals".
+ */
+ if (context_id == (MAX_MEMORY_CONTEXT_STATS_NUM - 2))
+ {
+ int namelen = strlen("Remaining Totals");
+
+ num_individual_stats = context_id + 1;
+ strlcpy(meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].name, "Remaining Totals", namelen + 1);
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].ident[0] = '\0';
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].path[0] = 0;
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].type = 0;
+ }
+ context_id++;
+
+ for (MemoryContext c = cur->firstchild; c != NULL; c = c->nextchild)
+ contexts = lappend(contexts, c);
+ }
+
+ /*
+ * Statistics are not aggregated, i.e. individual statistics are reported,
+ * when context_id <= max_stats.
+ */
+ if (context_id <= MAX_MEMORY_CONTEXT_STATS_NUM)
+ {
+ entry->total_stats = context_id;
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].num_agg_stats = 1;
+ }
+ /* Report number of aggregated memory contexts */
+ else
+ {
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].num_agg_stats = context_id -
+ num_individual_stats;
+
+ /*
+ * Total stats equals num_individual_stats + 1 record for cumulative
+ * statistics.
+ */
+ entry->total_stats = num_individual_stats + 1;
+ }
+
+ /* Notify waiting backends and return */
+ end_memorycontext_reporting(entry, oldcontext, context_id_lookup);
+}
+
+/*
+ * Signal all the waiting client backends after copying all the statistics,
+ * then free this process's slot and the local reporting resources.
+ */
+static void
+end_memorycontext_reporting(MemoryStatsDSHashEntry *entry, MemoryContext oldcontext, HTAB *context_id_lookup)
+{
+ MemoryContext curr_ctx = CurrentMemoryContext;
+
+ dshash_release_lock(MemoryStatsDsHash, entry);
+ ConditionVariableBroadcast(&entry->memcxt_cv);
+
+ /*
+ * Empty this process's slot, so other clients can request memory
+ * statistics
+ */
+ LWLockAcquire(client_keys_lock, LW_EXCLUSIVE);
+ client_keys[MyProcNumber] = -1;
+ LWLockRelease(client_keys_lock);
+
+ hash_destroy(context_id_lookup);
+ MemoryContextSwitchTo(oldcontext);
+ MemoryContextReset(curr_ctx);
+}
+
+/*
+ * compute_context_path
+ *
+ * Append the transient context_id of this context and each of its ancestors
+ * to a list, in order to compute a path.
+ */
+static List *
+compute_context_path(MemoryContext c, HTAB *context_id_lookup)
+{
+ bool found;
+ List *path = NIL;
+ MemoryContext cur_context;
+
+ for (cur_context = c; cur_context != NULL; cur_context = cur_context->parent)
+ {
+ MemoryStatsContextId *cur_entry;
+
+ cur_entry = hash_search(context_id_lookup, &cur_context, HASH_FIND, &found);
+
+ if (!found)
+ elog(ERROR, "hash table corrupted, can't construct path value");
+
+ path = lcons_int(cur_entry->context_id, path);
+ }
+
+ return path;
+}
+
+/*
+ * PublishMemoryContext
+ *
+ * Copy the memory context statistics of a single context to DSA memory
+ */
+static void
+PublishMemoryContext(MemoryStatsEntry *memcxt_info, int curr_id,
+ MemoryContext context, List *path,
+ MemoryContextCounters stat, int num_contexts, int max_levels)
+{
+ const char *ident = context->ident;
+ const char *name = context->name;
+
+ /*
+ * To be consistent with logging output, we label dynahash contexts with
+ * just the hash table name as with MemoryContextStatsPrint().
+ */
+ if (context->ident && strncmp(context->name, "dynahash", 8) == 0)
+ {
+ name = context->ident;
+ ident = NULL;
+ }
+
+ if (name != NULL)
+ {
+ int namelen = strlen(name);
+
+ if (namelen >= MEMORY_CONTEXT_IDENT_SHMEM_SIZE)
+ namelen = pg_mbcliplen(name, namelen,
+ MEMORY_CONTEXT_IDENT_SHMEM_SIZE - 1);
+
+ strlcpy(memcxt_info[curr_id].name, name, namelen + 1);
+ }
+ else
+ /* Clearing the array */
+ memcxt_info[curr_id].name[0] = '\0';
+
+ /* Trim and copy the identifier if it is not set to NULL */
+ if (ident != NULL)
+ {
+ int idlen = strlen(ident);
+
+ /*
+ * Some identifiers such as SQL query string can be very long,
+ * truncate oversize identifiers.
+ */
+ if (idlen >= MEMORY_CONTEXT_IDENT_SHMEM_SIZE)
+ idlen = pg_mbcliplen(ident, idlen,
+ MEMORY_CONTEXT_IDENT_SHMEM_SIZE - 1);
+
+ strlcpy(memcxt_info[curr_id].ident, ident, idlen + 1);
+ }
+ else
+ memcxt_info[curr_id].ident[0] = '\0';
+
+ /* Copy the path, truncated to at most max_levels entries */
+ if (path == NIL)
+ memcxt_info[curr_id].path[0] = 0;
+ else
+ {
+ int levels = Min(list_length(path), max_levels);
+
+ memcxt_info[curr_id].path_length = levels;
+ memcxt_info[curr_id].levels = list_length(path);
+
+ foreach_int(i, path)
+ {
+ memcxt_info[curr_id].path[foreach_current_index(i)] = i;
+ if (--levels == 0)
+ break;
+ }
+ }
+ memcxt_info[curr_id].type = context->type;
+ memcxt_info[curr_id].totalspace = stat.totalspace;
+ memcxt_info[curr_id].nblocks = stat.nblocks;
+ memcxt_info[curr_id].freespace = stat.freespace;
+ memcxt_info[curr_id].freechunks = stat.freechunks;
+ memcxt_info[curr_id].num_agg_stats = num_contexts;
+}
+
+void
+AtProcExit_memstats_cleanup(int code, Datum arg)
+{
+ int idx = MyProcNumber;
+ MemoryStatsDSHashEntry *entry;
+ char key[CLIENT_KEY_SIZE];
+ bool found;
+
+ if (MemoryStatsDsHash != NULL)
+ {
+ snprintf(key, CLIENT_KEY_SIZE, "%d", idx);
+ entry = dshash_find_or_insert(MemoryStatsDsHash, key, &found);
+
+ if (found)
+ {
+ if (MemoryStatsDsaArea != NULL &&
+ DsaPointerIsValid(entry->memstats_dsa_pointer))
+ dsa_free(MemoryStatsDsaArea, entry->memstats_dsa_pointer);
+ }
+ dshash_delete_entry(MemoryStatsDsHash, entry);
+ }
+}
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 97c2ac1faf9..ab768a7a91f 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -45,7 +45,6 @@
#include "utils/lsyscache.h"
#include "utils/memutils.h"
#include "utils/pg_locale.h"
-#include "utils/relcache.h"
#include "utils/syscache.h"
#ifdef WIN32
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index d31cb45a058..92b0446b80c 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -39,6 +39,7 @@ volatile sig_atomic_t TransactionTimeoutPending = false;
volatile sig_atomic_t IdleSessionTimeoutPending = false;
volatile sig_atomic_t ProcSignalBarrierPending = false;
volatile sig_atomic_t LogMemoryContextPending = false;
+volatile sig_atomic_t PublishMemoryContextPending = false;
volatile sig_atomic_t IdleStatsUpdateTimeoutPending = false;
volatile uint32 InterruptHoldoffCount = 0;
volatile uint32 QueryCancelHoldoffCount = 0;
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 641e535a73c..fb3f2d21fa0 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -662,6 +662,13 @@ BaseInit(void)
* drop ephemeral slots, which in turn triggers stats reporting.
*/
ReplicationSlotInitialize();
+
+ /*
+ * The before shmem exit callback frees the DSA memory occupied by the
+ * latest memory context statistics that could be published by this proc
+ * if requested.
+ */
+ before_shmem_exit(AtProcExit_memstats_cleanup, 0);
}
diff --git a/src/backend/utils/mb/mbutils.c b/src/backend/utils/mb/mbutils.c
index 886ecbad871..308016d7763 100644
--- a/src/backend/utils/mb/mbutils.c
+++ b/src/backend/utils/mb/mbutils.c
@@ -39,7 +39,6 @@
#include "mb/pg_wchar.h"
#include "utils/fmgrprotos.h"
#include "utils/memutils.h"
-#include "utils/relcache.h"
#include "varatt.h"
/*
diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index 47fd774c7d2..3a5422c7273 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -40,6 +40,7 @@
#include "mb/pg_wchar.h"
#include "miscadmin.h"
+#include "utils/hsearch.h"
#include "utils/memdebug.h"
#include "utils/memutils.h"
#include "utils/memutils_internal.h"
@@ -176,10 +177,6 @@ MemoryContext PortalContext = NULL;
static void MemoryContextDeleteOnly(MemoryContext context);
static void MemoryContextCallResetCallbacks(MemoryContext context);
-static void MemoryContextStatsInternal(MemoryContext context, int level,
- int max_level, int max_children,
- MemoryContextCounters *totals,
- bool print_to_stderr);
static void MemoryContextStatsPrint(MemoryContext context, void *passthru,
const char *stats_string,
bool print_to_stderr);
@@ -877,11 +874,19 @@ MemoryContextStatsDetail(MemoryContext context,
bool print_to_stderr)
{
MemoryContextCounters grand_totals;
+ int num_contexts = 0;
+ PrintDestination print_location;
memset(&grand_totals, 0, sizeof(grand_totals));
+ if (print_to_stderr)
+ print_location = PRINT_STATS_TO_STDERR;
+ else
+ print_location = PRINT_STATS_TO_LOGS;
+
+ /* num_contexts reports the number of contexts aggregated in the output */
MemoryContextStatsInternal(context, 1, max_level, max_children,
- &grand_totals, print_to_stderr);
+ &grand_totals, print_location, &num_contexts);
if (print_to_stderr)
fprintf(stderr,
@@ -916,13 +921,14 @@ MemoryContextStatsDetail(MemoryContext context,
* One recursion level for MemoryContextStats
*
* Print stats for this context if possible, but in any case accumulate counts
- * into *totals (if not NULL).
+ * into *totals (if not NULL). The callers should make sure that print_location
+ * is set to PRINT_STATS_TO_STDERR or PRINT_STATS_TO_LOGS or PRINT_STATS_NONE.
*/
-static void
+void
MemoryContextStatsInternal(MemoryContext context, int level,
int max_level, int max_children,
MemoryContextCounters *totals,
- bool print_to_stderr)
+ PrintDestination print_location, int *num_contexts)
{
MemoryContext child;
int ichild;
@@ -930,10 +936,39 @@ MemoryContextStatsInternal(MemoryContext context, int level,
Assert(MemoryContextIsValid(context));
/* Examine the context itself */
- context->methods->stats(context,
- MemoryContextStatsPrint,
- &level,
- totals, print_to_stderr);
+ switch (print_location)
+ {
+ case PRINT_STATS_TO_STDERR:
+ context->methods->stats(context,
+ MemoryContextStatsPrint,
+ &level,
+ totals, true);
+ break;
+
+ case PRINT_STATS_TO_LOGS:
+ context->methods->stats(context,
+ MemoryContextStatsPrint,
+ &level,
+ totals, false);
+ break;
+
+ case PRINT_STATS_NONE:
+
+ /*
+ * Do not print the statistics if print_location is
+ * PRINT_STATS_NONE, only compute totals. This is used in
+ * reporting of memory context statistics via a sql function. Last
+ * parameter is not relevant.
+ */
+ context->methods->stats(context,
+ NULL,
+ NULL,
+ totals, false);
+ break;
+ }
+
+ /* Increment the context count for each recursive call */
+ *num_contexts = *num_contexts + 1;
/*
* Examine children.
@@ -953,7 +988,7 @@ MemoryContextStatsInternal(MemoryContext context, int level,
MemoryContextStatsInternal(child, level + 1,
max_level, max_children,
totals,
- print_to_stderr);
+ print_location, num_contexts);
}
}
@@ -972,7 +1007,13 @@ MemoryContextStatsInternal(MemoryContext context, int level,
child = MemoryContextTraverseNext(child, context);
}
- if (print_to_stderr)
+ /*
+ * Add the count of child contexts that are traversed in a
+ * non-recursive manner.
+ */
+ *num_contexts = *num_contexts + ichild;
+
+ if (print_location == PRINT_STATS_TO_STDERR)
{
for (int i = 0; i < level; i++)
fprintf(stderr, " ");
@@ -985,7 +1026,7 @@ MemoryContextStatsInternal(MemoryContext context, int level,
local_totals.freechunks,
local_totals.totalspace - local_totals.freespace);
}
- else
+ else if (print_location == PRINT_STATS_TO_LOGS)
ereport(LOG_SERVER_ONLY,
(errhidestmt(true),
errhidecontext(true),
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 118d6da1ace..3d6f42606fd 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8597,6 +8597,16 @@
prorettype => 'bool', proargtypes => 'int4',
prosrc => 'pg_log_backend_memory_contexts' },
+# publishing memory contexts of the specified postgres process
+{ oid => '2173', descr => 'publish memory contexts of the specified backend',
+ proname => 'pg_get_process_memory_contexts', provolatile => 'v',
+ prorows => '100', proretset => 't', proparallel => 'r',
+ prorettype => 'record', proargtypes => 'int4 bool float8',
+ proallargtypes => '{int4,bool,float8,text,text,text,_int4,int4,int8,int8,int8,int8,int8,int4}',
+ proargmodes => '{i,i,i,o,o,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{pid, summary, timeout, name, ident, type, path, level, total_bytes, total_nblocks, free_bytes, free_chunks, used_bytes, num_agg_contexts}',
+ prosrc => 'pg_get_process_memory_contexts' },
+
# non-persistent series generator
{ oid => '1066', descr => 'non-persistent series generator',
proname => 'generate_series', prorows => '1000',
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 1bef98471c3..1e59a7f910f 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -96,6 +96,7 @@ extern PGDLLIMPORT volatile sig_atomic_t IdleSessionTimeoutPending;
extern PGDLLIMPORT volatile sig_atomic_t ProcSignalBarrierPending;
extern PGDLLIMPORT volatile sig_atomic_t LogMemoryContextPending;
extern PGDLLIMPORT volatile sig_atomic_t IdleStatsUpdateTimeoutPending;
+extern PGDLLIMPORT volatile sig_atomic_t PublishMemoryContextPending;
extern PGDLLIMPORT volatile sig_atomic_t CheckClientConnectionPending;
extern PGDLLIMPORT volatile sig_atomic_t ClientConnectionLost;
diff --git a/src/include/storage/lwlocklist.h b/src/include/storage/lwlocklist.h
index 208d2e3a8ed..72ace053e9d 100644
--- a/src/include/storage/lwlocklist.h
+++ b/src/include/storage/lwlocklist.h
@@ -135,3 +135,4 @@ PG_LWLOCKTRANCHE(SUBTRANS_SLRU, SubtransSLRU)
PG_LWLOCKTRANCHE(XACT_SLRU, XactSLRU)
PG_LWLOCKTRANCHE(PARALLEL_VACUUM_DSA, ParallelVacuumDSA)
PG_LWLOCKTRANCHE(AIO_URING_COMPLETION, AioUringCompletion)
+PG_LWLOCKTRANCHE(MEMORY_CONTEXT_KEYS, MemoryContextReportingKeys)
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index afeeb1ca019..345d5a0ecb1 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -35,6 +35,7 @@ typedef enum
PROCSIG_WALSND_INIT_STOPPING, /* ask walsenders to prepare for shutdown */
PROCSIG_BARRIER, /* global barrier interrupt */
PROCSIG_LOG_MEMORY_CONTEXT, /* ask backend to log the memory contexts */
+ PROCSIG_GET_MEMORY_CONTEXT, /* ask backend to send the memory contexts */
PROCSIG_PARALLEL_APPLY_MESSAGE, /* Message from parallel apply workers */
/* Recovery conflict reasons */
diff --git a/src/include/utils/memutils.h b/src/include/utils/memutils.h
index 8abc26abce2..01835d56021 100644
--- a/src/include/utils/memutils.h
+++ b/src/include/utils/memutils.h
@@ -18,7 +18,10 @@
#define MEMUTILS_H
#include "nodes/memnodes.h"
-
+#include "storage/condition_variable.h"
+#include "storage/lmgr.h"
+#include "utils/dsa.h"
+#include "lib/dshash.h"
/*
* MaxAllocSize, MaxAllocHugeSize
@@ -48,6 +51,23 @@
#define AllocHugeSizeIsValid(size) ((Size) (size) <= MaxAllocHugeSize)
+/*
+ * Memory Context reporting size limits.
+ */
+
+/* Max length of context name and ident */
+#define MEMORY_CONTEXT_IDENT_SHMEM_SIZE 64
+/* Maximum size (in bytes) of DSA area per process */
+#define MEMORY_CONTEXT_REPORT_MAX_PER_BACKEND ((size_t) (1 * 1024 * 1024))
+
+/*
+ * Maximum size per context. Actual size may be lower as this assumes the worst
+ * case of deepest path and longest identifiers (name and ident, thus the
+ * multiplication by 2). The path depth is limited to 100 like for memory
+ * context logging.
+ */
+#define MAX_MEMORY_CONTEXT_STATS_SIZE (sizeof(MemoryStatsEntry))
+#define MAX_MEMORY_CONTEXT_STATS_NUM (MEMORY_CONTEXT_REPORT_MAX_PER_BACKEND / MAX_MEMORY_CONTEXT_STATS_SIZE)
/*
* Standard top-level memory contexts.
@@ -319,4 +339,74 @@ pg_memory_is_all_zeros(const void *ptr, size_t len)
return true;
}
+/* Dynamic shared memory state for statistics per context */
+typedef struct MemoryStatsEntry
+{
+ char name[MEMORY_CONTEXT_IDENT_SHMEM_SIZE];
+ char ident[MEMORY_CONTEXT_IDENT_SHMEM_SIZE];
+ int path[100];
+ NodeTag type;
+ int path_length;
+ int levels;
+ int64 totalspace;
+ int64 nblocks;
+ int64 freespace;
+ int64 freechunks;
+ int num_agg_stats;
+} MemoryStatsEntry;
+
+/*
+ * Per backend dynamic shared hash entry for memory context statistics
+ * reporting.
+ */
+typedef struct MemoryStatsDSHashEntry
+{
+ char key[64];
+ ConditionVariable memcxt_cv;
+ int proc_id;
+ int total_stats;
+ bool summary;
+ dsa_pointer memstats_dsa_pointer;
+ TimestampTz stats_timestamp;
+} MemoryStatsDSHashEntry;
+
+static const dshash_parameters memctx_dsh_params = {
+ offsetof(MemoryStatsDSHashEntry, memcxt_cv),
+ sizeof(MemoryStatsDSHashEntry),
+ dshash_strcmp,
+ dshash_strhash,
+ dshash_strcpy
+};
+
+/*
+ * Used for storage of transient identifiers for pg_get_backend_memory_contexts
+ */
+typedef struct MemoryStatsContextId
+{
+ MemoryContext context;
+ int context_id;
+} MemoryStatsContextId;
+
+/*
+ * This is passed to MemoryContextStatsInternal to determine whether
+ * to print context statistics and, if so, whether to print them to
+ * the logs or to stderr.
+ */
+typedef enum PrintDestination
+{
+ PRINT_STATS_TO_STDERR = 0,
+ PRINT_STATS_TO_LOGS,
+ PRINT_STATS_NONE
+} PrintDestination;
+
+extern void ProcessGetMemoryContextInterrupt(void);
+extern void HandleGetMemoryContextInterrupt(void);
+extern void MemoryContextKeysShmemInit(void);
+extern Size MemoryContextKeysShmemSize(void);
+extern void MemoryContextStatsInternal(MemoryContext context, int level,
+ int max_level, int max_children,
+ MemoryContextCounters *totals,
+ PrintDestination print_location,
+ int *num_contexts);
+extern void AtProcExit_memstats_cleanup(int code, Datum arg);
#endif /* MEMUTILS_H */
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 83228cfca29..ae17d028ed3 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -232,3 +232,22 @@ select * from pg_timezone_abbrevs where abbrev = 'LMT';
LMT | @ 7 hours 52 mins 58 secs ago | f
(1 row)
+DO $$
+DECLARE
+ bg_writer_pid int;
+ r RECORD;
+BEGIN
+ SELECT pid from pg_stat_activity where backend_type='background writer'
+ INTO bg_writer_pid;
+
+ select type, name, ident
+ from pg_get_process_memory_contexts(bg_writer_pid, false, 20)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+ select type, name, ident
+ from pg_get_process_memory_contexts(pg_backend_pid(), false, 20)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+END $$;
+NOTICE: (AllocSet,TopMemoryContext,)
+NOTICE: (AllocSet,TopMemoryContext,)
diff --git a/src/test/regress/sql/sysviews.sql b/src/test/regress/sql/sysviews.sql
index 66179f026b3..d0917b6868e 100644
--- a/src/test/regress/sql/sysviews.sql
+++ b/src/test/regress/sql/sysviews.sql
@@ -101,3 +101,21 @@ select count(distinct utc_offset) >= 24 as ok from pg_timezone_abbrevs;
-- One specific case we can check without much fear of breakage
-- is the historical local-mean-time value used for America/Los_Angeles.
select * from pg_timezone_abbrevs where abbrev = 'LMT';
+
+DO $$
+DECLARE
+ bg_writer_pid int;
+ r RECORD;
+BEGIN
+ SELECT pid from pg_stat_activity where backend_type='background writer'
+ INTO bg_writer_pid;
+
+ select type, name, ident
+ from pg_get_process_memory_contexts(bg_writer_pid, false, 20)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+ select type, name, ident
+ from pg_get_process_memory_contexts(pg_backend_pid(), false, 20)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+END $$;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index e6f2e93b2d6..d2399465df6 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1680,6 +1680,9 @@ MemoryContextData
MemoryContextId
MemoryContextMethodID
MemoryContextMethods
+MemoryStatsContextId
+MemoryStatsEntry
+MemoryStatsDSHashEntry
MemoryStatsPrintFunc
MergeAction
MergeActionState
--
2.34.1
On 2025-08-08 18:26, Rahila Syed wrote:
Hi, thanks for working on this again.
Hi,
CFbot indicated that the patch requires a rebase, so I've attached an
updated version.
Here are some comments and questions for v32 patch:
--- a/doc/src/sgml/func/func-admin.sgml
+++ b/doc/src/sgml/func/func-admin.sgml
@@ -251,6 +251,137 @@
<literal>false</literal> is returned.
</para></entry>
</row>
+
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>pg_get_process_memory_contexts</primary>
This function is added at the end of Table "9.96. Server Signaling
Functions", but since pg_get_process_memory_contexts outputs essentially
the same information as pg_log_backend_memory_contexts, it might be
better to place them next to each other in the table.
+ <parameter>stats_timestamp</parameter> <type>timestamptz</type> )
+typedef struct MemoryStatsDSHashEntry
+{
+ char key[64];
+ ConditionVariable memcxt_cv;
+ int proc_id;
+ int total_stats;
+ bool summary;
+ dsa_pointer memstats_dsa_pointer;
+ TimestampTz stats_timestamp;
+} MemoryStatsDSHashEntry;
stats_timestamp appears only in the two places below in the patch, but
it does not seem to be actually output.
Is this column unnecessary?
=# select * from pg_get_process_memory_contexts(pg_backend_pid(),
true, 10);
-[ RECORD 1 ]----+-----------------------------
name | TopMemoryContext
ident | [NULL]
type | AllocSet
path | {1}
level | 1
total_bytes | 222400
total_nblocks | 8
free_bytes | 4776
free_chunks | 8
used_bytes | 217624
num_agg_contexts | 1
Specifying 0 for timeout causes a crash:
=# select * from pg_get_process_memory_contexts(74526, true, 0);
(0 rows)
=# select 1;
WARNING: terminating connection because of crash of another server
process
DETAIL: The postmaster has commanded this server process to roll back
the current transaction and exit, because another server process exited
abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and
repeat your command.
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
Should 0 be handled safely and treated as “no timeout”, or rejected as
an error?
Similarly, specifying a negative value for timeout still works:
=# select * from pg_get_process_memory_contexts(30590, true, -10);
It might be better to reject negative values similar to
pg_terminate_backend().
context_id_lookup =
hash_create("pg_get_remote_backend_memory_contexts",
+ /* Retreive the client key fo publishing statistics */
fo -> for?
+ * If the publishing backend does not respond before the condition variable
+ * times out, which is set to MEMSTATS_WAIT_TIMEOUT, retry given that there is
+ * time left within the timeout specified by the user, before giving up and
+ * returning previously published statistics, if any. If no previous statistics
+ * exist, return NULL.
+ */
+#define MEMSTATS_WAIT_TIMEOUT 100
MEMSTATS_WAIT_TIMEOUT is defined, but it doesn’t seem to be used.
--
Regards,
--
Atsushi Torikoshi
Seconded from NTT DATA Japan Corporation to SRA OSS K.K.
Hi Torikoshia,
Thank you for reviewing the patch.
This function is added at the end of Table "9.96. Server Signaling
Functions", but since pg_get_process_memory_contexts outputs essentially
the same information as pg_log_backend_memory_contexts, it might be
better to place them next to each other in the table.
The idea was to place the new addition at the end of the table instead of
in the middle. I’m fine with putting them together, though. I’ll do that in
the next version unless there’s a reason not to.
+ <parameter>stats_timestamp</parameter> <type>timestamptz</type> )
+typedef struct MemoryStatsDSHashEntry
+{
+ char key[64];
+ ConditionVariable memcxt_cv;
+ int proc_id;
+ int total_stats;
+ bool summary;
+ dsa_pointer memstats_dsa_pointer;
+ TimestampTz stats_timestamp;
+} MemoryStatsDSHashEntry;
stats_timestamp appears only in the two places below in the patch, but
it does not seem to be actually output.
Is this column unnecessary?
Thank you for pointing this out. This is removed in the attached patch, as
it was a remnant from the previous design. As old statistics are discarded
in the new design, a timestamp field is not needed anymore.
Specifying 0 for timeout causes a crash:
Should 0 be handled safely and treated as “no timeout”, or rejected as
an error?
Good catch.
The crash has been resolved in the attached patch. It was caused by a
missing ConditionVariableCancelSleep() call when exiting without statistics
due to a timeout value of 0.
A 0 timeout means that statistics should only be retrieved if they are
immediately available, without waiting. We could exit with a warning/error
saying "too low timeout", but I think it's worthwhile to try fetching the
statistics if possible.
Similarly, specifying a negative value for timeout still works:
=# select * from pg_get_process_memory_contexts(30590, true, -10);
It might be better to reject negative values similar to
pg_terminate_backend().
Fixed in the attached patch, as you suggested.
Previously, negative values were interpreted as an indefinite wait for
statistics. This could cause the client to hang if the server process exited
without providing statistics. To avoid this, it is better to exit after
displaying a warning when the user specifies a negative timeout.
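
To make the timeout semantics discussed above concrete, here is a minimal
standalone sketch. It is not the code from the patch: the function name
timeout_to_wait_ms, the -1 "reject" return value, and the MAX_WAIT_MS cap are
all hypothetical and exist only for illustration. The patch itself may report
a negative timeout differently (for example with a warning).

```c
/* Hypothetical upper bound on the wait; fits in a long on all platforms. */
#define MAX_WAIT_MS 2147483647L

/*
 * Hypothetical helper (not part of the patch): map the user-supplied
 * timeout in seconds to a wait time in milliseconds.
 *
 * Returns -1 for negative or NaN input (the caller should reject the
 * request), 0 for "do not wait, only take what is immediately available",
 * and a positive millisecond count otherwise.
 */
long
timeout_to_wait_ms(double timeout_secs)
{
	long		ms;

	/* NaN compares unequal to itself */
	if (timeout_secs != timeout_secs || timeout_secs < 0)
		return -1;

	/* Clamp very large values so the cast below cannot overflow */
	if (timeout_secs * 1000.0 >= (double) MAX_WAIT_MS)
		return MAX_WAIT_MS;

	ms = (long) (timeout_secs * 1000.0);

	/* A tiny but positive timeout should still wait a little */
	if (ms == 0 && timeout_secs > 0)
		ms = 1;

	return ms;
}
```

With this mapping, a call with timeout 0 never sleeps, a fractional timeout
such as 0.5 waits 500 ms, and -10 is rejected before any signal is sent.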
+ /* Retreive the client key fo publishing statistics */
fo -> for?
Fixed.
+ */
+#define MEMSTATS_WAIT_TIMEOUT 100
MEMSTATS_WAIT_TIMEOUT is defined, but it doesn’t seem to be used.
This is removed now as it was a leftover from the previous design.
The attached patch also fixes an assertion failure I observed when a client
times out before the last requested process can publish its statistics. A
client frees the memory reserved for storing the statistics when it exits the
function after a timeout. Since the server process has already been notified,
it might still attempt to read the same client entry and access the DSA
memory reserved for statistics, resulting in the assertion failure. I
resolved this by adding a check for this scenario and exiting the handler
function accordingly.
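
The shape of that check can be sketched in isolation as follows. This is a
simplified illustration under stated assumptions, not the patch code: the
struct ClientStatsSlot, its area_valid flag, and the function
publish_if_client_waiting are hypothetical stand-ins for the shared client
entry and the handler logic, and the locking that a real implementation
needs around the check is omitted.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the per-client shared entry. */
typedef struct ClientStatsSlot
{
	bool		area_valid;		/* false once the client has freed its area */
	int			total_stats;	/* number of entries published so far */
} ClientStatsSlot;

/*
 * Publish only if the requesting client is still waiting.
 *
 * Returns true when statistics were published, and false when the client
 * has already timed out and released its memory, in which case the handler
 * should exit without touching the freed area.
 */
bool
publish_if_client_waiting(ClientStatsSlot *slot, int nstats)
{
	if (slot == NULL || !slot->area_valid)
		return false;

	slot->total_stats = nstats;
	return true;
}
```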
Thank you,
Rahila Syed
Attachments:
v33-0001-Add-pg_get_process_memory_context-function.patch (application/octet-stream)
From f37e942fa7732795dc37ef5ef3759622f1b71eef Mon Sep 17 00:00:00 2001
From: Rahila Syed <rahilasyed.90@gmail.com>
Date: Mon, 30 Jun 2025 12:11:00 +0530
Subject: [PATCH] Add pg_get_process_memory_context function
This adds a function for retrieving memory context statistics
and information from backends as well as auxiliary processes.
The intended use case is cluster debugging when under memory
pressure or unanticipated memory usage characteristics.
When calling the function it sends a signal to the specified
process to submit statistics regarding its memory contexts
into dynamic shared memory. Each memory context is returned
in detail, followed by a cumulative total in case the number
of contexts exceeds the maximum allocated amount of shared memory.
Each process is limited to at most 1MB of memory for this.
A summary can also be explicitly requested by the user; this
will return the TopMemoryContext and a cumulative total of
all lower contexts.
In order to not block on busy processes the caller specifies
the number of seconds during which to retry before timing out.
In the case where no statistics are published within the set
timeout, NULL is returned.
---
doc/src/sgml/func/func-admin.sgml | 164 ++++
src/backend/catalog/system_views.sql | 5 +
src/backend/postmaster/autovacuum.c | 4 +
src/backend/postmaster/checkpointer.c | 4 +
src/backend/postmaster/interrupt.c | 4 +
src/backend/postmaster/pgarch.c | 4 +
src/backend/postmaster/startup.c | 4 +
src/backend/postmaster/walsummarizer.c | 4 +
src/backend/storage/ipc/ipci.c | 3 +
src/backend/storage/ipc/procsignal.c | 3 +
src/backend/storage/lmgr/proc.c | 1 +
src/backend/tcop/postgres.c | 3 +
.../utils/activity/wait_event_names.txt | 2 +
src/backend/utils/adt/mcxtfuncs.c | 887 +++++++++++++++++-
src/backend/utils/adt/pg_locale.c | 1 -
src/backend/utils/init/globals.c | 1 +
src/backend/utils/init/postinit.c | 7 +
src/backend/utils/mb/mbutils.c | 1 -
src/backend/utils/mmgr/mcxt.c | 71 +-
src/include/catalog/pg_proc.dat | 10 +
src/include/miscadmin.h | 1 +
src/include/storage/lwlocklist.h | 1 +
src/include/storage/procsignal.h | 1 +
src/include/utils/memutils.h | 92 +-
src/test/regress/expected/sysviews.out | 19 +
src/test/regress/sql/sysviews.sql | 18 +
src/tools/pgindent/typedefs.list | 3 +
27 files changed, 1277 insertions(+), 41 deletions(-)
diff --git a/doc/src/sgml/func/func-admin.sgml b/doc/src/sgml/func/func-admin.sgml
index 446fdfe56f4..689e9231a7e 100644
--- a/doc/src/sgml/func/func-admin.sgml
+++ b/doc/src/sgml/func/func-admin.sgml
@@ -251,6 +251,137 @@
<literal>false</literal> is returned.
</para></entry>
</row>
+
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>pg_get_process_memory_contexts</primary>
+ </indexterm>
+ <function>pg_get_process_memory_contexts</function> ( <parameter>pid</parameter> <type>integer</type>, <parameter>summary</parameter> <type>boolean</type>, <parameter>timeout</parameter> <type>float</type> )
+ <returnvalue>setof record</returnvalue>
+ ( <parameter>name</parameter> <type>text</type>,
+ <parameter>ident</parameter> <type>text</type>,
+ <parameter>type</parameter> <type>text</type>,
+ <parameter>path</parameter> <type>integer[]</type>,
+ <parameter>level</parameter> <type>integer</type>,
+ <parameter>total_bytes</parameter> <type>bigint</type>,
+ <parameter>total_nblocks</parameter> <type>bigint</type>,
+ <parameter>free_bytes</parameter> <type>bigint</type>,
+ <parameter>free_chunks</parameter> <type>bigint</type>,
+ <parameter>used_bytes</parameter> <type>bigint</type>,
+ <parameter>num_agg_contexts</parameter> <type>integer</type>,
+ <parameter>stats_timestamp</parameter> <type>timestamptz</type> )
+ </para>
+ <para>
+ This function handles requests to display the memory contexts of a
+ <productname>PostgreSQL</productname> process with the specified
+ process ID. The function can be used to send requests to backends as
+ well as <glossterm linkend="glossary-auxiliary-proc">auxiliary processes</glossterm>.
+ </para>
+ <para>
+ The returned record contains extended statistics for each memory
+ context:
+ <itemizedlist spacing="compact">
+ <listitem>
+ <para>
+ <parameter>name</parameter> - The name of the memory context.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>ident</parameter> - Memory context ID (if any).
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>type</parameter> - The type of memory context, possible
+ values are: AllocSet, Generation, Slab and Bump.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>path</parameter> - Memory contexts are organized in a
+ tree model with TopMemoryContext as the root, and all other memory
+ contexts as nodes in the tree. The <parameter>path</parameter>
+ displays the path from the root to the current memory context. The
+ path is limited to 100 children per node, with each node limited
+ to a max depth of 100, to preserve memory during reporting. The
+ printed path will also be limited to 100 nodes counting from the
+ TopMemoryContext.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>level</parameter> - The level in the tree of the current
+ memory context.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>total_bytes</parameter> - The total number of bytes
+ allocated to this memory context.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>total_nblocks</parameter> - The total number of blocks
+ used for the allocated memory.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>free_bytes</parameter> - The amount of free memory in
+ this memory context.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>free_chunks</parameter> - The number of chunks that
+ <parameter>free_bytes</parameter> corresponds to.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>used_bytes</parameter> - The total number of bytes
+ currently occupied.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>num_agg_contexts</parameter> - The number of memory
+ contexts aggregated in the displayed statistics.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+ <para>
+ When <parameter>summary</parameter> is <literal>true</literal>, statistics
+ for memory contexts at levels 1 and 2 are displayed, with level 1
+ representing the root node (i.e., <literal>TopMemoryContext</literal>).
+ Statistics for contexts on level 2 and below are aggregates of all
+ child contexts' statistics, where <literal>num_agg_contexts</literal>
+ indicates the number of aggregated child contexts. When
+ <parameter>summary</parameter> is <literal>false</literal>,
+ the <literal>num_agg_contexts</literal> value is <literal>1</literal>,
+ indicating that individual statistics are being displayed.
+ </para>
+ <para>
+ Busy processes can delay reporting memory context statistics;
+ <parameter>timeout</parameter> specifies the number of seconds
+ to wait for updated statistics. <parameter>timeout</parameter> can be
+ specified in fractions of a second.
+ </para>
+ <para>
+ After receiving memory context statistics from the target process, it
+ returns the results as one row per context. If not all contexts
+ fit within the pre-determined size limit, the remaining context
+ statistics are aggregated and a cumulative total is displayed. The
+ <literal>num_agg_contexts</literal> column indicates the number of
+ contexts aggregated in the displayed statistics. When
+ <literal>num_agg_contexts</literal> is <literal>1</literal> it means
+ that the context statistics are displayed separately.
+ </para></entry>
+ </row>
</tbody>
</tgroup>
</table>
@@ -302,6 +433,39 @@ LOG: Grand total: 1651920 bytes in 201 blocks; 622360 free (88 chunks); 1029560
because it may generate a large number of log messages.
</para>
+ <para>
+ <function>pg_get_process_memory_contexts</function> can be used to request
+ memory context statistics of any <productname>PostgreSQL</productname>
+ process. For example:
+<programlisting>
+postgres=# SELECT * FROM pg_get_process_memory_contexts(
+ (SELECT pid FROM pg_stat_activity
+ WHERE backend_type = 'checkpointer'),
+ false, 0.5) LIMIT 1;
+-[ RECORD 1 ]----+------------------------------
+name | TopMemoryContext
+ident |
+type | AllocSet
+path | {1}
+level | 1
+total_bytes | 90304
+total_nblocks | 3
+free_bytes | 2880
+free_chunks | 1
+used_bytes | 87424
+num_agg_contexts | 1
+</programlisting>
+ <note>
+ <para>
+ While <function>pg_get_process_memory_contexts</function> can be used to
+ query memory contexts of the local backend,
+ <structname>pg_backend_memory_contexts</structname>
+ (see <xref linkend="view-pg-backend-memory-contexts"/> for more details)
+ will be less resource intensive when only the local backend is of interest.
+ </para>
+ </note>
+ </para>
+
</sect2>
<sect2 id="functions-admin-backup">
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 1b3c5a55882..022586d6a55 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -682,6 +682,11 @@ GRANT SELECT ON pg_backend_memory_contexts TO pg_read_all_stats;
REVOKE EXECUTE ON FUNCTION pg_get_backend_memory_contexts() FROM PUBLIC;
GRANT EXECUTE ON FUNCTION pg_get_backend_memory_contexts() TO pg_read_all_stats;
+REVOKE EXECUTE ON FUNCTION
+ pg_get_process_memory_contexts(integer, boolean, float) FROM PUBLIC;
+GRANT EXECUTE ON FUNCTION
+ pg_get_process_memory_contexts(integer, boolean, float) TO pg_read_all_stats;
+
-- Statistics views
CREATE VIEW pg_stat_all_tables AS
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index ff96b36d710..3875f76564d 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -791,6 +791,10 @@ ProcessAutoVacLauncherInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
/* Process sinval catchup interrupts that happened while sleeping */
ProcessCatchupInterrupt();
}
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index e84e8663e96..5b3e08805bf 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -679,6 +679,10 @@ ProcessCheckpointerInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/postmaster/interrupt.c b/src/backend/postmaster/interrupt.c
index 0ae9bf906ec..f24f574e748 100644
--- a/src/backend/postmaster/interrupt.c
+++ b/src/backend/postmaster/interrupt.c
@@ -48,6 +48,10 @@ ProcessMainLoopInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 78e39e5f866..ac97a39447c 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -867,6 +867,10 @@ ProcessPgArchInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
if (ConfigReloadPending)
{
char *archiveLib = pstrdup(XLogArchiveLibrary);
diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c
index 27e86cf393f..7149a67fcbc 100644
--- a/src/backend/postmaster/startup.c
+++ b/src/backend/postmaster/startup.c
@@ -192,6 +192,10 @@ ProcessStartupProcInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
diff --git a/src/backend/postmaster/walsummarizer.c b/src/backend/postmaster/walsummarizer.c
index 777c9a8d555..5d14684f6b2 100644
--- a/src/backend/postmaster/walsummarizer.c
+++ b/src/backend/postmaster/walsummarizer.c
@@ -879,6 +879,10 @@ ProcessWalSummarizerInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 2fa045e6b0f..fe3d32e40b0 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -51,6 +51,7 @@
#include "storage/sinvaladt.h"
#include "utils/guc.h"
#include "utils/injection_point.h"
+#include "utils/memutils.h"
/* GUCs */
int shared_memory_type = DEFAULT_SHARED_MEMORY_TYPE;
@@ -150,6 +151,7 @@ CalculateShmemSize(int *num_semaphores)
size = add_size(size, InjectionPointShmemSize());
size = add_size(size, SlotSyncShmemSize());
size = add_size(size, AioShmemSize());
+ size = add_size(size, MemoryContextKeysShmemSize() + sizeof(LWLockPadded));
/* include additional requested shmem from preload libraries */
size = add_size(size, total_addin_request);
@@ -343,6 +345,7 @@ CreateOrAttachShmemStructs(void)
WaitEventCustomShmemInit();
InjectionPointShmemInit();
AioShmemInit();
+ MemoryContextKeysShmemInit();
}
/*
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index 087821311cc..8963285cc12 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -691,6 +691,9 @@ procsignal_sigusr1_handler(SIGNAL_ARGS)
if (CheckProcSignal(PROCSIG_LOG_MEMORY_CONTEXT))
HandleLogMemoryContextInterrupt();
+ if (CheckProcSignal(PROCSIG_GET_MEMORY_CONTEXT))
+ HandleGetMemoryContextInterrupt();
+
if (CheckProcSignal(PROCSIG_PARALLEL_APPLY_MESSAGE))
HandleParallelApplyMessageInterrupt();
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index e9ef0fbfe32..f194e6b3dcc 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -50,6 +50,7 @@
#include "storage/procsignal.h"
#include "storage/spin.h"
#include "storage/standby.h"
+#include "utils/memutils.h"
#include "utils/timeout.h"
#include "utils/timestamp.h"
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 0cecd464902..3933b6db607 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3534,6 +3534,9 @@ ProcessInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
if (ParallelApplyMessagePending)
ProcessParallelApplyMessages();
}
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 0be307d2ca0..54f91a76a1b 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -161,6 +161,7 @@ WAL_RECEIVER_EXIT "Waiting for the WAL receiver to exit."
WAL_RECEIVER_WAIT_START "Waiting for startup process to send initial data for streaming replication."
WAL_SUMMARY_READY "Waiting for a new WAL summary to be generated."
XACT_GROUP_UPDATE "Waiting for the group leader to update transaction status at transaction end."
+MEM_CXT_PUBLISH "Waiting for a process to publish memory information."
ABI_compatibility:
@@ -406,6 +407,7 @@ SubtransSLRU "Waiting to access the sub-transaction SLRU cache."
XactSLRU "Waiting to access the transaction status SLRU cache."
ParallelVacuumDSA "Waiting for parallel vacuum dynamic shared memory allocation."
AioUringCompletion "Waiting for another process to complete IO via io_uring."
+MemoryContextReportingKeys "Waiting for another process to complete reading or writing the memory reporting keys."
# No "ABI_compatibility" region here as WaitEventLWLock has its own C code.
diff --git a/src/backend/utils/adt/mcxtfuncs.c b/src/backend/utils/adt/mcxtfuncs.c
index fe6dce9cba3..ce507be3e85 100644
--- a/src/backend/utils/adt/mcxtfuncs.c
+++ b/src/backend/utils/adt/mcxtfuncs.c
@@ -15,13 +15,38 @@
#include "postgres.h"
+#include "access/twophase.h"
+#include "catalog/pg_authid_d.h"
#include "funcapi.h"
#include "mb/pg_wchar.h"
+#include "miscadmin.h"
+#include "storage/dsm_registry.h"
#include "storage/proc.h"
#include "storage/procarray.h"
+#include "utils/acl.h"
#include "utils/array.h"
#include "utils/builtins.h"
#include "utils/hsearch.h"
+#include "utils/memutils.h"
+#include "utils/wait_event_types.h"
+
+#define CLIENT_KEY_SIZE 64
+
+static LWLock *client_keys_lock = NULL;
+static int *client_keys = NULL;
+static dshash_table *MemoryStatsDsHash = NULL;
+static dsa_area *MemoryStatsDsaArea = NULL;
+
+static void memstats_dsa_cleanup(MemoryStatsDSHashEntry *entry);
+static const char *ContextTypeToString(NodeTag type);
+static void PublishMemoryContext(MemoryStatsEntry *memcxt_info,
+ int curr_id, MemoryContext context,
+ List *path,
+ MemoryContextCounters stat,
+ int num_contexts, int max_levels);
+static List *compute_context_path(MemoryContext c, HTAB *context_id_lookup);
+static void end_memorycontext_reporting(MemoryStatsDSHashEntry *entry, MemoryContext oldcontext,
+ HTAB *context_id_lookup);
/* ----------
* The max bytes for showing identifiers of MemoryContext.
@@ -89,7 +114,7 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
*/
for (MemoryContext cur = context; cur != NULL; cur = cur->parent)
{
- MemoryContextId *entry;
+ MemoryStatsContextId *entry;
bool found;
entry = hash_search(context_id_lookup, &cur, HASH_FIND, &found);
@@ -143,24 +168,7 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
else
nulls[1] = true;
- switch (context->type)
- {
- case T_AllocSetContext:
- type = "AllocSet";
- break;
- case T_GenerationContext:
- type = "Generation";
- break;
- case T_SlabContext:
- type = "Slab";
- break;
- case T_BumpContext:
- type = "Bump";
- break;
- default:
- type = "???";
- break;
- }
+ type = ContextTypeToString(context->type);
values[2] = CStringGetTextDatum(type);
values[3] = Int32GetDatum(list_length(path)); /* level */
@@ -175,6 +183,38 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
list_free(path);
}
+/*
+ * ContextTypeToString
+ * Returns a textual representation of a context type
+ *
+ * This should cover the same types as MemoryContextIsValid.
+ */
+const char *
+ContextTypeToString(NodeTag type)
+{
+ const char *context_type;
+
+ switch (type)
+ {
+ case T_AllocSetContext:
+ context_type = "AllocSet";
+ break;
+ case T_GenerationContext:
+ context_type = "Generation";
+ break;
+ case T_SlabContext:
+ context_type = "Slab";
+ break;
+ case T_BumpContext:
+ context_type = "Bump";
+ break;
+ default:
+ context_type = "???";
+ break;
+ }
+ return context_type;
+}
+
/*
* pg_get_backend_memory_contexts
* SQL SRF showing backend memory context.
@@ -189,7 +229,7 @@ pg_get_backend_memory_contexts(PG_FUNCTION_ARGS)
HTAB *context_id_lookup;
ctl.keysize = sizeof(MemoryContext);
- ctl.entrysize = sizeof(MemoryContextId);
+ ctl.entrysize = sizeof(MemoryStatsContextId);
ctl.hcxt = CurrentMemoryContext;
context_id_lookup = hash_create("pg_get_backend_memory_contexts",
@@ -216,7 +256,7 @@ pg_get_backend_memory_contexts(PG_FUNCTION_ARGS)
foreach_ptr(MemoryContextData, cur, contexts)
{
- MemoryContextId *entry;
+ MemoryStatsContextId *entry;
bool found;
/*
@@ -224,8 +264,8 @@ pg_get_backend_memory_contexts(PG_FUNCTION_ARGS)
* PutMemoryContextsStatsTupleStore needs this to populate the "path"
* column with the parent context_ids.
*/
- entry = (MemoryContextId *) hash_search(context_id_lookup, &cur,
- HASH_ENTER, &found);
+ entry = (MemoryStatsContextId *) hash_search(context_id_lookup, &cur,
+ HASH_ENTER, &found);
entry->context_id = context_id++;
Assert(!found);
@@ -305,3 +345,804 @@ pg_log_backend_memory_contexts(PG_FUNCTION_ARGS)
PG_RETURN_BOOL(true);
}
+
+/*
+ * pg_get_process_memory_contexts
+ * Signal a backend or an auxiliary process to send its memory contexts,
+ * wait for the results and display them.
+ *
+ * By default, only superusers or users with ROLE_PG_READ_ALL_STATS are allowed
+ * to signal a process to return the memory contexts. This is because allowing
+ * any user to issue this request at an unbounded rate would cause lots of
+ * requests to be sent, which can lead to denial of service. Additional roles
+ * can be permitted with GRANT.
+ *
+ * On receipt of this signal, a backend or an auxiliary process sets the flag
+ * in the signal handler, which causes the next CHECK_FOR_INTERRUPTS()
+ * or process-specific interrupt handler to copy the memory context details
+ * to a dynamic shared memory space.
+ *
+ * We have defined a limit on DSA memory that could be allocated per process -
+ * if the process has more memory contexts than what can fit in the allocated
+ * size, the excess contexts are summarized and represented as cumulative total
+ * at the end of the buffer.
+ *
+ * After sending the signal, wait on a condition variable. The publishing
+ * backend, after copying the data to shared memory, sends signal on that
+ * condition variable. There is one condition variable per client process.
+ * Once the condition variable is signalled, check if the latest memory context
+ * information is available and display it.
+ *
+ * If the publishing backend does not respond before the condition variable
+ * times out, which is set to the timeout value specified by the user, give up and
+ * return NULL.
+ */
+Datum
+pg_get_process_memory_contexts(PG_FUNCTION_ARGS)
+{
+ int pid = PG_GETARG_INT32(0);
+ bool summary = PG_GETARG_BOOL(1);
+ double timeout = PG_GETARG_FLOAT8(2);
+ PGPROC *proc;
+ ProcNumber procNumber = INVALID_PROC_NUMBER;
+ bool proc_is_aux = false;
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ MemoryStatsEntry *memcxt_info;
+ MemoryStatsDSHashEntry *entry;
+ bool found;
+ char key[CLIENT_KEY_SIZE];
+
+ if (timeout < 0)
+ {
+ /*
+ * This is just a warning so a loop-through-resultset will not abort
+ * because of an invalid timeout value.
+ */
+ ereport(WARNING,
+ (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+ errmsg("\"timeout\" must not be negative")));
+ PG_RETURN_NULL();
+ }
+
+ /*
+ * See if the process with given pid is a backend or an auxiliary process
+ * and remember the type for when we requery the process later.
+ */
+ proc = BackendPidGetProc(pid);
+ if (proc == NULL)
+ {
+ proc = AuxiliaryPidGetProc(pid);
+ proc_is_aux = true;
+ }
+
+ /*
+ * BackendPidGetProc() and AuxiliaryPidGetProc() return NULL if the pid
+ * isn't valid; this is not an error, so we just leave with a WARNING.
+ * See comment in pg_log_backend_memory_contexts for a discussion on this.
+ */
+ if (proc == NULL)
+ {
+ /*
+ * This is just a warning so a loop-through-resultset will not abort
+ * if one backend terminated on its own during the run.
+ */
+ ereport(WARNING,
+ errmsg("PID %d is not a PostgreSQL server process", pid));
+ PG_RETURN_NULL();
+ }
+
+ InitMaterializedSRF(fcinfo, 0);
+
+ procNumber = GetNumberFromPGProc(proc);
+
+ /*
+ * Create a DSA to allocate memory for copying memory contexts statistics.
+ * Allocate the memory in the DSA and send dsa pointer to the server
+ * process for storing the context statistics. If the statistics exceed
+ * a predefined size limit (1MB), a cumulative total is stored for the
+ * remaining contexts.
+ *
+ * The DSA is created once for the lifetime of the server, and only
+ * attached in subsequent calls.
+ */
+ if (MemoryStatsDsaArea == NULL)
+ MemoryStatsDsaArea = GetNamedDSA("memory_context_statistics_dsa", &found);
+
+ /*
+ * The dsa pointers containing statistics for each client are stored in a
+ * dshash table. In addition to dsa pointer, each entry in this table also
+ * contains information about the statistics, condition variable for
+ * signalling between client and the server and miscellaneous data
+ * specific to a request. There is one entry per client request in the
+ * hash table.
+ */
+ if (MemoryStatsDsHash == NULL)
+ MemoryStatsDsHash = GetNamedDSHash("memory_context_statistics_dshash", &memctx_dsh_params, &found);
+
+ snprintf(key, sizeof(key), "%d", MyProcNumber);
+
+ /*
+ * Check if the publishing process slot is empty and store this client's
+ * key, i.e. its ProcNumber. This informs the publishing process that it is
+ * supposed to write statistics in the hash entry corresponding to this
+ * client.
+ */
+ LWLockAcquire(client_keys_lock, LW_EXCLUSIVE);
+ if (client_keys[procNumber] == -1)
+ client_keys[procNumber] = MyProcNumber;
+ else
+ {
+ ereport(WARNING,
+ errmsg("server process %d is processing previous request", pid));
+ LWLockRelease(client_keys_lock);
+ PG_RETURN_NULL();
+ }
+ LWLockRelease(client_keys_lock);
+
+ /*
+ * Insert an entry for this client in DSHASH table the first time this
+ * function is called. This entry is deleted when the process exits in
+ * before_shmem_exit call.
+ *
+ * dshash_find_or_insert locks the entry to prevent the publisher from
+ * reading before client has updated the entry.
+ */
+ entry = dshash_find_or_insert(MemoryStatsDsHash, key, &found);
+ if (!found)
+ {
+ entry->stats_written = false;
+ ConditionVariableInit(&entry->memcxt_cv);
+ }
+
+ /*
+ * Allocate 1MB of memory for the backend to publish its statistics on
+ * every call to this function. The memory is freed at the end of the
+ * function.
+ */
+ entry->memstats_dsa_pointer =
+ dsa_allocate0(MemoryStatsDsaArea, MEMORY_CONTEXT_REPORT_MAX_PER_BACKEND);
+
+ /*
+ * Specify whether a summary of statistics is requested, before signalling
+ * the server.
+ */
+ entry->summary = summary;
+
+ /*
+ * Indicate which server process statistics are being requested from.
+ * If this client times out before the last requested process can publish its
+ * statistics, it may send a new request to another server process. Since the
+ * previous server was notified, it might attempt to read the same client entry
+ * and respond incorrectly with its statistics. By storing the server ID in the
+ * client entry, we prevent any previously signalled server process from writing
+ * its statistics in the space meant for the newly requested process.
+ */
+ entry->server_id = pid;
+ dshash_release_lock(MemoryStatsDsHash, entry);
+
+ PG_TRY();
+ {
+ /*
+ * Send a signal to a PostgreSQL process, informing it we want it to
+ * produce information about its memory contexts.
+ */
+ if (SendProcSignal(pid, PROCSIG_GET_MEMORY_CONTEXT, procNumber) < 0)
+ {
+ ereport(WARNING,
+ errmsg("could not send signal to process %d: %m", pid));
+ memstats_dsa_cleanup(entry);
+ PG_RETURN_NULL();
+ }
+
+ while (1)
+ {
+ entry = dshash_find_or_insert(MemoryStatsDsHash, key, &found);
+ Assert(found);
+
+ /*
+ * We expect to come out of sleep when the requested process has
+ * finished publishing the statistics, verified using a boolean
+ * stats_written.
+ */
+ if (entry->stats_written)
+ break;
+
+ dshash_release_lock(MemoryStatsDsHash, entry);
+
+ /*
+ * Recheck the state of the backend before sleeping on the
+ * condition variable to ensure the process is still alive. Only
+ * check the relevant process type based on the earlier PID check.
+ */
+ if (proc_is_aux)
+ proc = AuxiliaryPidGetProc(pid);
+ else
+ proc = BackendPidGetProc(pid);
+
+ /*
+ * The process ending during memory context processing is not an
+ * error.
+ */
+ if (proc == NULL)
+ {
+ ereport(WARNING,
+ errmsg("PID %d is no longer a PostgreSQL server process",
+ pid));
+ memstats_dsa_cleanup(entry);
+ ConditionVariableCancelSleep();
+ PG_RETURN_NULL();
+ }
+
+
+ /*
+ * Wait for the timeout as defined by the user. If no statistics
+ * are available within the allowed time then return NULL. The
+ * timer is defined in milliseconds since that's what the
+ * condition variable sleep uses.
+ */
+ if (ConditionVariableTimedSleep(&entry->memcxt_cv,
+ (timeout * 1000), WAIT_EVENT_MEM_CXT_PUBLISH))
+ {
+ /* Timeout has expired, return NULL */
+ memstats_dsa_cleanup(entry);
+ ConditionVariableCancelSleep();
+ PG_RETURN_NULL();
+ }
+ }
+
+ /*
+ * Backend has finished publishing the stats, project them.
+ */
+ memcxt_info = (MemoryStatsEntry *)
+ dsa_get_address(MemoryStatsDsaArea, entry->memstats_dsa_pointer);
+
+#define PG_GET_PROCESS_MEMORY_CONTEXTS_COLS 11
+ for (int i = 0; i < entry->total_stats; i++)
+ {
+ ArrayType *path_array;
+ int path_length;
+ Datum values[PG_GET_PROCESS_MEMORY_CONTEXTS_COLS];
+ bool nulls[PG_GET_PROCESS_MEMORY_CONTEXTS_COLS];
+ Datum *path_datum = NULL;
+
+ memset(values, 0, sizeof(values));
+ memset(nulls, 0, sizeof(nulls));
+
+ if (memcxt_info[i].name[0] != '\0')
+ {
+ values[0] = CStringGetTextDatum(memcxt_info[i].name);
+ }
+ else
+ nulls[0] = true;
+
+ if (memcxt_info[i].ident[0] != '\0')
+ {
+ values[1] = CStringGetTextDatum(memcxt_info[i].ident);
+ }
+ else
+ nulls[1] = true;
+
+ values[2] = CStringGetTextDatum(ContextTypeToString(memcxt_info[i].type));
+
+ path_length = memcxt_info[i].path_length;
+ path_datum = (Datum *) palloc(path_length * sizeof(Datum));
+ if (memcxt_info[i].path[0] != 0)
+ {
+ for (int j = 0; j < path_length; j++)
+ path_datum[j] = Int32GetDatum(memcxt_info[i].path[j]);
+ path_array = construct_array_builtin(path_datum, path_length, INT4OID);
+ values[3] = PointerGetDatum(path_array);
+ }
+ else
+ nulls[3] = true;
+
+ values[4] = Int32GetDatum(memcxt_info[i].levels);
+ values[5] = Int64GetDatum(memcxt_info[i].totalspace);
+ values[6] = Int64GetDatum(memcxt_info[i].nblocks);
+ values[7] = Int64GetDatum(memcxt_info[i].freespace);
+ values[8] = Int64GetDatum(memcxt_info[i].freechunks);
+ values[9] = Int64GetDatum(memcxt_info[i].totalspace -
+ memcxt_info[i].freespace);
+ values[10] = Int32GetDatum(memcxt_info[i].num_agg_stats);
+
+ tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc,
+ values, nulls);
+ }
+ memstats_dsa_cleanup(entry);
+ dshash_release_lock(MemoryStatsDsHash, entry);
+
+ ConditionVariableCancelSleep();
+
+ }
+ PG_CATCH();
+ {
+ memstats_dsa_cleanup(entry);
+ dshash_release_lock(MemoryStatsDsHash, entry);
+
+ ConditionVariableCancelSleep();
+ }
+ PG_END_TRY();
+
+ PG_RETURN_NULL();
+}
+
+static void
+memstats_dsa_cleanup(MemoryStatsDSHashEntry *entry)
+{
+ Assert(MemoryStatsDsaArea != NULL);
+ dsa_free(MemoryStatsDsaArea, entry->memstats_dsa_pointer);
+ entry->memstats_dsa_pointer = InvalidDsaPointer;
+ entry->stats_written = false;
+ entry->server_id = 0;
+}
+void
+MemoryContextKeysShmemInit(void)
+{
+ bool found;
+
+ client_keys = (int *)
+ ShmemInitStruct("MemoryContextKeys",
+ MemoryContextKeysShmemSize() + sizeof(LWLockPadded), &found);
+ client_keys_lock = (LWLock *) ((char *) client_keys + MemoryContextKeysShmemSize());
+
+ if (!found)
+ {
+ MemSet(client_keys, -1, MemoryContextKeysShmemSize());
+ LWLockInitialize(client_keys_lock, LWTRANCHE_MEMORY_CONTEXT_KEYS);
+ }
+}
+
+Size
+MemoryContextKeysShmemSize(void)
+{
+ Size sz = 0;
+ Size TotalProcs = 0;
+
+ TotalProcs = add_size(TotalProcs, NUM_AUXILIARY_PROCS);
+ TotalProcs = add_size(TotalProcs, MaxBackends);
+ sz = add_size(sz, mul_size(TotalProcs, sizeof(int)));
+
+ return sz;
+}
+
+/*
+ * HandleGetMemoryContextInterrupt
+ * Handle receipt of an interrupt indicating a request to publish memory
+ * contexts statistics.
+ *
+ * All the actual work is deferred to ProcessGetMemoryContextInterrupt() as
+ * this cannot be performed in a signal handler.
+ */
+void
+HandleGetMemoryContextInterrupt(void)
+{
+ InterruptPending = true;
+ PublishMemoryContextPending = true;
+ /* latch will be set by procsignal_sigusr1_handler */
+}
+
+/*
+ * ProcessGetMemoryContextInterrupt
+ * Generate information about memory contexts used by the process.
+ *
+ * Performs a breadth-first search on the memory context tree, so parents'
+ * statistics are reported before their children's in the monitoring function
+ * output.
+ *
+ * Statistics for all the processes are shared via the same dynamic shared
+ * area. Individual statistics are tracked independently in
+ * per-process DSA pointers. These pointers are stored in a dshash table with
+ * the requesting client's ProcNumber as the key.
+ *
+ * We calculate the maximum number of contexts whose statistics can be
+ * displayed from a pre-determined limit on the memory available per process
+ * for this utility and the maximum size of the statistics for each context.
+ * The remaining context statistics, if any, are captured as a cumulative
+ * total placed after the individual contexts' statistics.
+ *
+ * If summary is true, we capture the statistics of the level 1 and level 2
+ * contexts. For that we traverse the memory context tree recursively in a
+ * depth-first manner to cover all the children of a parent context, to
+ * be able to display a cumulative total of memory consumption by a parent at
+ * level 2 and all its children.
+ */
+void
+ProcessGetMemoryContextInterrupt(void)
+{
+ List *contexts;
+ HASHCTL ctl;
+ HTAB *context_id_lookup;
+ int context_id = 0;
+ MemoryStatsEntry *meminfo;
+ bool summary = false;
+ MemoryContextCounters stat;
+ int num_individual_stats = 0;
+ bool found;
+ MemoryStatsDSHashEntry *entry;
+ char key[CLIENT_KEY_SIZE];
+ int clientProcNumber;
+ MemoryContext memstats_ctx = NULL;
+ MemoryContext oldcontext = NULL;
+
+ PublishMemoryContextPending = false;
+
+ /*
+ * Create a new memory context which is not a part of TopMemoryContext
+ * tree. This context is used to allocate all memory in this function.
+ * This helps in keeping the memory allocation in this function to report
+ * memory consumption statistics separate. So that it does not affect the
+ * output of this function.
+ */
+ memstats_ctx = AllocSetContextCreate((MemoryContext) NULL, "publish_memory_context_statistics",
+ ALLOCSET_SMALL_SIZES);
+ oldcontext = MemoryContextSwitchTo(memstats_ctx);
+
+ /*
+ * The hash table is used for constructing "path" column of the view,
+ * similar to its local backend counterpart.
+ */
+ ctl.keysize = sizeof(MemoryContext);
+ ctl.entrysize = sizeof(MemoryStatsContextId);
+ ctl.hcxt = CurrentMemoryContext;
+
+ context_id_lookup = hash_create("pg_get_remote_backend_memory_contexts",
+ 256,
+ &ctl,
+ HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+
+ /* List of contexts to process in the next round - start at the top. */
+ contexts = list_make1(TopMemoryContext);
+
+ /*
+ * If DSA exists, created by another process requesting statistics, attach
+ * to it. We expect the client process to create required DSA and Dshash
+ * table.
+ */
+ if (MemoryStatsDsaArea == NULL)
+ MemoryStatsDsaArea = GetNamedDSA("memory_context_statistics_dsa", &found);
+
+ if (MemoryStatsDsHash == NULL)
+ MemoryStatsDsHash = GetNamedDSHash("memory_context_statistics_dshash", &memctx_dsh_params, &found);
+
+ /* Retrieve the client key for publishing statistics */
+ LWLockAcquire(client_keys_lock, LW_SHARED);
+ Assert(client_keys[MyProcNumber] != -1);
+ clientProcNumber = client_keys[MyProcNumber];
+ LWLockRelease(client_keys_lock);
+
+ snprintf(key, CLIENT_KEY_SIZE, "%d", clientProcNumber);
+
+ /*
+ * The entry lock is held by dshash_find_or_insert to protect writes to
+ * process specific memory. Two different processes publishing statistics
+ * do not block each other.
+ */
+ entry = dshash_find_or_insert(MemoryStatsDsHash, key, &found);
+
+ /* Entry has been deleted due to client process exit */
+ if (!found)
+ {
+ entry->stats_written = false;
+ end_memorycontext_reporting(entry, oldcontext, context_id_lookup);
+ return;
+ }
+
+ /* The client has timed out waiting for us to write statistics */
+ if (entry->server_id != MyProcPid)
+ {
+ entry->stats_written = false;
+ end_memorycontext_reporting(entry, oldcontext, context_id_lookup);
+ return;
+ }
+
+ summary = entry->summary;
+
+ /* Should be allocated by a client backend that is requesting statistics */
+ Assert(entry->memstats_dsa_pointer != InvalidDsaPointer);
+ meminfo = (MemoryStatsEntry *)
+ dsa_get_address(MemoryStatsDsaArea, entry->memstats_dsa_pointer);
+
+ if (summary)
+ {
+ int cxt_id = 0;
+ List *path = NIL;
+ MemoryStatsContextId *contextid_entry;
+
+ /* Copy TopMemoryContext statistics to DSA */
+ memset(&stat, 0, sizeof(stat));
+ (*TopMemoryContext->methods->stats) (TopMemoryContext, NULL, NULL,
+ &stat, true);
+ path = lcons_int(1, path);
+ PublishMemoryContext(meminfo, cxt_id, TopMemoryContext, path, stat,
+ 1, 100);
+
+ contextid_entry = (MemoryStatsContextId *) hash_search(context_id_lookup, &TopMemoryContext,
+ HASH_ENTER, &found);
+ Assert(!found);
+
+ /*
+ * context id starts with 1
+ */
+ contextid_entry->context_id = cxt_id + 1;
+
+ /*
+ * Copy statistics for each of TopMemoryContext's children. This
+ * includes statistics of at most 100 children per node, with each
+ * child node limited to a depth of 100 in its subtree.
+ */
+ for (MemoryContext c = TopMemoryContext->firstchild; c != NULL;
+ c = c->nextchild)
+ {
+ MemoryContextCounters grand_totals;
+ int num_contexts = 0;
+
+ path = NIL;
+ memset(&grand_totals, 0, sizeof(grand_totals));
+
+ cxt_id++;
+ contextid_entry = (MemoryStatsContextId *) hash_search(context_id_lookup, &c,
+ HASH_ENTER, &found);
+ Assert(!found);
+ contextid_entry->context_id = cxt_id + 1;
+
+ MemoryContextStatsInternal(c, 1, 100, 100, &grand_totals,
+ PRINT_STATS_NONE, &num_contexts);
+
+ path = compute_context_path(c, context_id_lookup);
+
+ PublishMemoryContext(meminfo, cxt_id, c, path,
+ grand_totals, num_contexts, 100);
+ }
+ entry->total_stats = cxt_id + 1;
+
+ /* Notify waiting backends and return */
+ end_memorycontext_reporting(entry, oldcontext, context_id_lookup);
+ return;
+ }
+ foreach_ptr(MemoryContextData, cur, contexts)
+ {
+ List *path = NIL;
+ MemoryStatsContextId *contextid_entry;
+
+ contextid_entry = (MemoryStatsContextId *) hash_search(context_id_lookup, &cur,
+ HASH_ENTER, &found);
+ Assert(!found);
+
+ /*
+ * context id starts with 1
+ */
+ contextid_entry->context_id = context_id + 1;
+
+ /*
+ * Figure out the transient context_id of this context and each of its
+ * ancestors, to compute a path for this context.
+ */
+ path = compute_context_path(cur, context_id_lookup);
+
+ /* Examine the context stats */
+ memset(&stat, 0, sizeof(stat));
+ (*cur->methods->stats) (cur, NULL, NULL, &stat, true);
+
+ /* Account for saving one statistics slot for cumulative reporting */
+ if (context_id < (MAX_MEMORY_CONTEXT_STATS_NUM - 1))
+ {
+ /* Copy statistics to DSA memory */
+ PublishMemoryContext(meminfo, context_id, cur, path, stat, 1, 100);
+ }
+ else
+ {
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].totalspace += stat.totalspace;
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].nblocks += stat.nblocks;
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].freespace += stat.freespace;
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].freechunks += stat.freechunks;
+ }
+
+ /*
+ * DSA max limit per process is reached, write aggregate of the
+ * remaining statistics.
+ *
+ * We can store contexts from 0 to max_stats - 1. When the number of
+ * contexts exceeds max_stats, we stop reporting individual statistics
+ * once context_id equals max_stats - 2, since the array slot
+ * max_stats - 1 is used for the cumulative statistics ("Remaining Totals").
+ */
+ if (context_id == (MAX_MEMORY_CONTEXT_STATS_NUM - 2))
+ {
+ int namelen = strlen("Remaining Totals");
+
+ num_individual_stats = context_id + 1;
+ strlcpy(meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].name, "Remaining Totals", namelen + 1);
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].ident[0] = '\0';
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].path[0] = 0;
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].type = 0;
+ }
+ context_id++;
+
+ for (MemoryContext c = cur->firstchild; c != NULL; c = c->nextchild)
+ contexts = lappend(contexts, c);
+ }
+
+ /*
+ * Statistics are not aggregated, i.e. individual statistics are reported,
+ * when context_id <= max_stats.
+ */
+ if (context_id <= MAX_MEMORY_CONTEXT_STATS_NUM)
+ {
+ entry->total_stats = context_id;
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].num_agg_stats = 1;
+ }
+ /* Report number of aggregated memory contexts */
+ else
+ {
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].num_agg_stats = context_id -
+ num_individual_stats;
+
+ /*
+ * Total stats equals num_individual_stats + 1 record for cumulative
+ * statistics.
+ */
+ entry->total_stats = num_individual_stats + 1;
+ }
+ entry->stats_written = true;
+ end_memorycontext_reporting(entry, oldcontext, context_id_lookup);
+ /* Notify waiting backends and return */
+ ConditionVariableBroadcast(&entry->memcxt_cv);
+}
+
+/*
+ * Release the entry lock, free this process's request slot and clean up
+ * local state when statistics reporting ends.
+ */
+static void
+end_memorycontext_reporting(MemoryStatsDSHashEntry *entry, MemoryContext oldcontext, HTAB *context_id_lookup)
+{
+ MemoryContext curr_ctx = CurrentMemoryContext;
+
+ dshash_release_lock(MemoryStatsDsHash, entry);
+
+ /*
+ * Empty this process's slot, so other clients can request memory
+ * statistics
+ */
+ LWLockAcquire(client_keys_lock, LW_EXCLUSIVE);
+ client_keys[MyProcNumber] = -1;
+ LWLockRelease(client_keys_lock);
+
+ hash_destroy(context_id_lookup);
+ MemoryContextSwitchTo(oldcontext);
+ MemoryContextReset(curr_ctx);
+}
+
+/*
+ * compute_context_path
+ *
+ * Append the transient context_id of this context and each of its ancestors
+ * to a list, in order to compute a path.
+ */
+static List *
+compute_context_path(MemoryContext c, HTAB *context_id_lookup)
+{
+ bool found;
+ List *path = NIL;
+ MemoryContext cur_context;
+
+ for (cur_context = c; cur_context != NULL; cur_context = cur_context->parent)
+ {
+ MemoryStatsContextId *cur_entry;
+
+ cur_entry = hash_search(context_id_lookup, &cur_context, HASH_FIND, &found);
+
+ if (!found)
+ elog(ERROR, "hash table corrupted, can't construct path value");
+
+ path = lcons_int(cur_entry->context_id, path);
+ }
+
+ return path;
+}
+
+/*
+ * PublishMemoryContext
+ *
+ * Copy the memory context statistics of a single context to DSA memory
+ */
+static void
+PublishMemoryContext(MemoryStatsEntry *memcxt_info, int curr_id,
+ MemoryContext context, List *path,
+ MemoryContextCounters stat, int num_contexts, int max_levels)
+{
+ const char *ident = context->ident;
+ const char *name = context->name;
+
+ /*
+ * To be consistent with logging output, we label dynahash contexts with
+ * just the hash table name as with MemoryContextStatsPrint().
+ */
+ if (context->ident && strncmp(context->name, "dynahash", 8) == 0)
+ {
+ name = context->ident;
+ ident = NULL;
+ }
+
+ if (name != NULL)
+ {
+ int namelen = strlen(name);
+
+ if (strlen(name) >= MEMORY_CONTEXT_IDENT_SHMEM_SIZE)
+ namelen = pg_mbcliplen(name, namelen,
+ MEMORY_CONTEXT_IDENT_SHMEM_SIZE - 1);
+
+ strlcpy(memcxt_info[curr_id].name, name, namelen + 1);
+ }
+ else
+ /* Clearing the array */
+ memcxt_info[curr_id].name[0] = '\0';
+
+ /* Trim and copy the identifier if it is not set to NULL */
+ if (ident != NULL)
+ {
+ int idlen = strlen(context->ident);
+
+ /*
+ * Some identifiers such as SQL query string can be very long,
+ * truncate oversize identifiers.
+ */
+ if (idlen >= MEMORY_CONTEXT_IDENT_SHMEM_SIZE)
+ idlen = pg_mbcliplen(ident, idlen,
+ MEMORY_CONTEXT_IDENT_SHMEM_SIZE - 1);
+
+ strlcpy(memcxt_info[curr_id].ident, ident, idlen + 1);
+ }
+ else
+ memcxt_info[curr_id].ident[0] = '\0';
+
+ /* Allocate DSA memory for storing path information */
+ if (path == NIL)
+ memcxt_info[curr_id].path[0] = 0;
+ else
+ {
+ int levels = Min(list_length(path), max_levels);
+
+ memcxt_info[curr_id].path_length = levels;
+ memcxt_info[curr_id].levels = list_length(path);
+
+ foreach_int(i, path)
+ {
+ memcxt_info[curr_id].path[foreach_current_index(i)] = i;
+ if (--levels == 0)
+ break;
+ }
+ }
+ memcxt_info[curr_id].type = context->type;
+ memcxt_info[curr_id].totalspace = stat.totalspace;
+ memcxt_info[curr_id].nblocks = stat.nblocks;
+ memcxt_info[curr_id].freespace = stat.freespace;
+ memcxt_info[curr_id].freechunks = stat.freechunks;
+ memcxt_info[curr_id].num_agg_stats = num_contexts;
+}
+
+void
+AtProcExit_memstats_cleanup(int code, Datum arg)
+{
+ int idx = MyProcNumber;
+ MemoryStatsDSHashEntry *entry;
+ char key[CLIENT_KEY_SIZE];
+ bool found;
+
+ if (MemoryStatsDsHash != NULL)
+ {
+ snprintf(key, CLIENT_KEY_SIZE, "%d", idx);
+ entry = dshash_find_or_insert(MemoryStatsDsHash, key, &found);
+
+ if (found)
+ {
+ if (MemoryStatsDsaArea != NULL &&
+ DsaPointerIsValid(entry->memstats_dsa_pointer))
+ dsa_free(MemoryStatsDsaArea, entry->memstats_dsa_pointer);
+ }
+ dshash_delete_entry(MemoryStatsDsHash, entry);
+ }
+ LWLockAcquire(client_keys_lock, LW_EXCLUSIVE);
+ client_keys[idx] = -1;
+ LWLockRelease(client_keys_lock);
+}
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 97c2ac1faf9..ab768a7a91f 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -45,7 +45,6 @@
#include "utils/lsyscache.h"
#include "utils/memutils.h"
#include "utils/pg_locale.h"
-#include "utils/relcache.h"
#include "utils/syscache.h"
#ifdef WIN32
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index d31cb45a058..92b0446b80c 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -39,6 +39,7 @@ volatile sig_atomic_t TransactionTimeoutPending = false;
volatile sig_atomic_t IdleSessionTimeoutPending = false;
volatile sig_atomic_t ProcSignalBarrierPending = false;
volatile sig_atomic_t LogMemoryContextPending = false;
+volatile sig_atomic_t PublishMemoryContextPending = false;
volatile sig_atomic_t IdleStatsUpdateTimeoutPending = false;
volatile uint32 InterruptHoldoffCount = 0;
volatile uint32 QueryCancelHoldoffCount = 0;
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 641e535a73c..fb3f2d21fa0 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -662,6 +662,13 @@ BaseInit(void)
* drop ephemeral slots, which in turn triggers stats reporting.
*/
ReplicationSlotInitialize();
+
+ /*
+ * The before shmem exit callback frees the DSA memory occupied by the
+ * latest memory context statistics that could be published by this proc
+ * if requested.
+ */
+ before_shmem_exit(AtProcExit_memstats_cleanup, 0);
}
diff --git a/src/backend/utils/mb/mbutils.c b/src/backend/utils/mb/mbutils.c
index 886ecbad871..308016d7763 100644
--- a/src/backend/utils/mb/mbutils.c
+++ b/src/backend/utils/mb/mbutils.c
@@ -39,7 +39,6 @@
#include "mb/pg_wchar.h"
#include "utils/fmgrprotos.h"
#include "utils/memutils.h"
-#include "utils/relcache.h"
#include "varatt.h"
/*
diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index 47fd774c7d2..3a5422c7273 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -40,6 +40,7 @@
#include "mb/pg_wchar.h"
#include "miscadmin.h"
+#include "utils/hsearch.h"
#include "utils/memdebug.h"
#include "utils/memutils.h"
#include "utils/memutils_internal.h"
@@ -176,10 +177,6 @@ MemoryContext PortalContext = NULL;
static void MemoryContextDeleteOnly(MemoryContext context);
static void MemoryContextCallResetCallbacks(MemoryContext context);
-static void MemoryContextStatsInternal(MemoryContext context, int level,
- int max_level, int max_children,
- MemoryContextCounters *totals,
- bool print_to_stderr);
static void MemoryContextStatsPrint(MemoryContext context, void *passthru,
const char *stats_string,
bool print_to_stderr);
@@ -877,11 +874,19 @@ MemoryContextStatsDetail(MemoryContext context,
bool print_to_stderr)
{
MemoryContextCounters grand_totals;
+ int num_contexts = 0;
+ PrintDestination print_location;
memset(&grand_totals, 0, sizeof(grand_totals));
+ if (print_to_stderr)
+ print_location = PRINT_STATS_TO_STDERR;
+ else
+ print_location = PRINT_STATS_TO_LOGS;
+
+ /* num_contexts reports the number of contexts aggregated in the output */
MemoryContextStatsInternal(context, 1, max_level, max_children,
- &grand_totals, print_to_stderr);
+ &grand_totals, print_location, &num_contexts);
if (print_to_stderr)
fprintf(stderr,
@@ -916,13 +921,14 @@ MemoryContextStatsDetail(MemoryContext context,
* One recursion level for MemoryContextStats
*
* Print stats for this context if possible, but in any case accumulate counts
- * into *totals (if not NULL).
+ * into *totals (if not NULL). The callers should make sure that print_location
+ * is set to PRINT_STATS_TO_STDERR or PRINT_STATS_TO_LOGS or PRINT_STATS_NONE.
*/
-static void
+void
MemoryContextStatsInternal(MemoryContext context, int level,
int max_level, int max_children,
MemoryContextCounters *totals,
- bool print_to_stderr)
+ PrintDestination print_location, int *num_contexts)
{
MemoryContext child;
int ichild;
@@ -930,10 +936,39 @@ MemoryContextStatsInternal(MemoryContext context, int level,
Assert(MemoryContextIsValid(context));
/* Examine the context itself */
- context->methods->stats(context,
- MemoryContextStatsPrint,
- &level,
- totals, print_to_stderr);
+ switch (print_location)
+ {
+ case PRINT_STATS_TO_STDERR:
+ context->methods->stats(context,
+ MemoryContextStatsPrint,
+ &level,
+ totals, true);
+ break;
+
+ case PRINT_STATS_TO_LOGS:
+ context->methods->stats(context,
+ MemoryContextStatsPrint,
+ &level,
+ totals, false);
+ break;
+
+ case PRINT_STATS_NONE:
+
+ /*
+ * Do not print the statistics if print_location is
+ * PRINT_STATS_NONE, only compute totals. This is used in
+ * reporting of memory context statistics via a sql function. Last
+ * parameter is not relevant.
+ */
+ context->methods->stats(context,
+ NULL,
+ NULL,
+ totals, false);
+ break;
+ }
+
+ /* Increment the context count for each recursive call */
+ *num_contexts = *num_contexts + 1;
/*
* Examine children.
@@ -953,7 +988,7 @@ MemoryContextStatsInternal(MemoryContext context, int level,
MemoryContextStatsInternal(child, level + 1,
max_level, max_children,
totals,
- print_to_stderr);
+ print_location, num_contexts);
}
}
@@ -972,7 +1007,13 @@ MemoryContextStatsInternal(MemoryContext context, int level,
child = MemoryContextTraverseNext(child, context);
}
- if (print_to_stderr)
+ /*
+ * Add the count of children contexts which are traversed in the
+ * non-recursive manner.
+ */
+ *num_contexts = *num_contexts + ichild;
+
+ if (print_location == PRINT_STATS_TO_STDERR)
{
for (int i = 0; i < level; i++)
fprintf(stderr, " ");
@@ -985,7 +1026,7 @@ MemoryContextStatsInternal(MemoryContext context, int level,
local_totals.freechunks,
local_totals.totalspace - local_totals.freespace);
}
- else
+ else if (print_location == PRINT_STATS_TO_LOGS)
ereport(LOG_SERVER_ONLY,
(errhidestmt(true),
errhidecontext(true),
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 118d6da1ace..3d6f42606fd 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8597,6 +8597,16 @@
prorettype => 'bool', proargtypes => 'int4',
prosrc => 'pg_log_backend_memory_contexts' },
+# publishing memory contexts of the specified postgres process
+{ oid => '2173', descr => 'publish memory contexts of the specified backend',
+ proname => 'pg_get_process_memory_contexts', provolatile => 'v',
+ prorows => '100', proretset => 't', proparallel => 'r',
+ prorettype => 'record', proargtypes => 'int4 bool float8',
+ proallargtypes => '{int4,bool,float8,text,text,text,_int4,int4,int8,int8,int8,int8,int8,int4}',
+ proargmodes => '{i,i,i,o,o,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{pid, summary, timeout, name, ident, type, path, level, total_bytes, total_nblocks, free_bytes, free_chunks, used_bytes, num_agg_contexts}',
+ prosrc => 'pg_get_process_memory_contexts' },
+
# non-persistent series generator
{ oid => '1066', descr => 'non-persistent series generator',
proname => 'generate_series', prorows => '1000',
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 1bef98471c3..1e59a7f910f 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -96,6 +96,7 @@ extern PGDLLIMPORT volatile sig_atomic_t IdleSessionTimeoutPending;
extern PGDLLIMPORT volatile sig_atomic_t ProcSignalBarrierPending;
extern PGDLLIMPORT volatile sig_atomic_t LogMemoryContextPending;
extern PGDLLIMPORT volatile sig_atomic_t IdleStatsUpdateTimeoutPending;
+extern PGDLLIMPORT volatile sig_atomic_t PublishMemoryContextPending;
extern PGDLLIMPORT volatile sig_atomic_t CheckClientConnectionPending;
extern PGDLLIMPORT volatile sig_atomic_t ClientConnectionLost;
diff --git a/src/include/storage/lwlocklist.h b/src/include/storage/lwlocklist.h
index 208d2e3a8ed..72ace053e9d 100644
--- a/src/include/storage/lwlocklist.h
+++ b/src/include/storage/lwlocklist.h
@@ -135,3 +135,4 @@ PG_LWLOCKTRANCHE(SUBTRANS_SLRU, SubtransSLRU)
PG_LWLOCKTRANCHE(XACT_SLRU, XactSLRU)
PG_LWLOCKTRANCHE(PARALLEL_VACUUM_DSA, ParallelVacuumDSA)
PG_LWLOCKTRANCHE(AIO_URING_COMPLETION, AioUringCompletion)
+PG_LWLOCKTRANCHE(MEMORY_CONTEXT_KEYS, MemoryContextReportingKeys)
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index afeeb1ca019..345d5a0ecb1 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -35,6 +35,7 @@ typedef enum
PROCSIG_WALSND_INIT_STOPPING, /* ask walsenders to prepare for shutdown */
PROCSIG_BARRIER, /* global barrier interrupt */
PROCSIG_LOG_MEMORY_CONTEXT, /* ask backend to log the memory contexts */
+ PROCSIG_GET_MEMORY_CONTEXT, /* ask backend to send the memory contexts */
PROCSIG_PARALLEL_APPLY_MESSAGE, /* Message from parallel apply workers */
/* Recovery conflict reasons */
diff --git a/src/include/utils/memutils.h b/src/include/utils/memutils.h
index 8abc26abce2..799cfe38cab 100644
--- a/src/include/utils/memutils.h
+++ b/src/include/utils/memutils.h
@@ -18,7 +18,10 @@
#define MEMUTILS_H
#include "nodes/memnodes.h"
-
+#include "storage/condition_variable.h"
+#include "storage/lmgr.h"
+#include "utils/dsa.h"
+#include "lib/dshash.h"
/*
* MaxAllocSize, MaxAllocHugeSize
@@ -48,6 +51,23 @@
#define AllocHugeSizeIsValid(size) ((Size) (size) <= MaxAllocHugeSize)
+/*
+ * Memory Context reporting size limits.
+ */
+
+/* Max length of context name and ident */
+#define MEMORY_CONTEXT_IDENT_SHMEM_SIZE 64
+/* Maximum size (in bytes) of DSA area per process */
+#define MEMORY_CONTEXT_REPORT_MAX_PER_BACKEND ((size_t) (1 * 1024 * 1024))
+
+/*
+ * Maximum size per context. Actual size may be lower as this assumes the worst
+ * case of deepest path and longest identifiers (name and ident, thus the
+ * multiplication by 2). The path depth is limited to 100 like for memory
+ * context logging.
+ */
+#define MAX_MEMORY_CONTEXT_STATS_SIZE (sizeof(MemoryStatsEntry))
+#define MAX_MEMORY_CONTEXT_STATS_NUM (MEMORY_CONTEXT_REPORT_MAX_PER_BACKEND / MAX_MEMORY_CONTEXT_STATS_SIZE)
/*
* Standard top-level memory contexts.
@@ -319,4 +339,74 @@ pg_memory_is_all_zeros(const void *ptr, size_t len)
return true;
}
+/* Dynamic shared memory state for statistics per context */
+typedef struct MemoryStatsEntry
+{
+ char name[MEMORY_CONTEXT_IDENT_SHMEM_SIZE];
+ char ident[MEMORY_CONTEXT_IDENT_SHMEM_SIZE];
+ int path[100];
+ NodeTag type;
+ int path_length;
+ int levels;
+ int64 totalspace;
+ int64 nblocks;
+ int64 freespace;
+ int64 freechunks;
+ int num_agg_stats;
+} MemoryStatsEntry;
+
+/*
+ * Per backend dynamic shared hash entry for memory context statistics
+ * reporting.
+ */
+typedef struct MemoryStatsDSHashEntry
+{
+ char key[64];
+ ConditionVariable memcxt_cv;
+ bool stats_written;
+ int server_id;
+ int total_stats;
+ bool summary;
+ dsa_pointer memstats_dsa_pointer;
+} MemoryStatsDSHashEntry;
+
+static const dshash_parameters memctx_dsh_params = {
+ offsetof(MemoryStatsDSHashEntry, memcxt_cv),
+ sizeof(MemoryStatsDSHashEntry),
+ dshash_strcmp,
+ dshash_strhash,
+ dshash_strcpy
+};
+
+/*
+ * Used for storage of transient identifiers for pg_get_backend_memory_contexts
+ */
+typedef struct MemoryStatsContextId
+{
+ MemoryContext context;
+ int context_id;
+} MemoryStatsContextId;
+
+/*
+ * This is passed to MemoryContextStatsInternal to determine whether
+ * to print context statistics and, if so, where to print them: the logs
+ * or stderr.
+ */
+typedef enum PrintDestination
+{
+ PRINT_STATS_TO_STDERR = 0,
+ PRINT_STATS_TO_LOGS,
+ PRINT_STATS_NONE
+} PrintDestination;
+
+extern void ProcessGetMemoryContextInterrupt(void);
+extern void HandleGetMemoryContextInterrupt(void);
+extern void MemoryContextKeysShmemInit(void);
+extern Size MemoryContextKeysShmemSize(void);
+extern void MemoryContextStatsInternal(MemoryContext context, int level,
+ int max_level, int max_children,
+ MemoryContextCounters *totals,
+ PrintDestination print_location,
+ int *num_contexts);
+extern void AtProcExit_memstats_cleanup(int code, Datum arg);
#endif /* MEMUTILS_H */
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 83228cfca29..ae17d028ed3 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -232,3 +232,22 @@ select * from pg_timezone_abbrevs where abbrev = 'LMT';
LMT | @ 7 hours 52 mins 58 secs ago | f
(1 row)
+DO $$
+DECLARE
+ bg_writer_pid int;
+ r RECORD;
+BEGIN
+ SELECT pid from pg_stat_activity where backend_type='background writer'
+ INTO bg_writer_pid;
+
+ select type, name, ident
+ from pg_get_process_memory_contexts(bg_writer_pid, false, 20)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+ select type, name, ident
+ from pg_get_process_memory_contexts(pg_backend_pid(), false, 20)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+END $$;
+NOTICE: (AllocSet,TopMemoryContext,)
+NOTICE: (AllocSet,TopMemoryContext,)
diff --git a/src/test/regress/sql/sysviews.sql b/src/test/regress/sql/sysviews.sql
index 66179f026b3..d0917b6868e 100644
--- a/src/test/regress/sql/sysviews.sql
+++ b/src/test/regress/sql/sysviews.sql
@@ -101,3 +101,21 @@ select count(distinct utc_offset) >= 24 as ok from pg_timezone_abbrevs;
-- One specific case we can check without much fear of breakage
-- is the historical local-mean-time value used for America/Los_Angeles.
select * from pg_timezone_abbrevs where abbrev = 'LMT';
+
+DO $$
+DECLARE
+ bg_writer_pid int;
+ r RECORD;
+BEGIN
+ SELECT pid from pg_stat_activity where backend_type='background writer'
+ INTO bg_writer_pid;
+
+ select type, name, ident
+ from pg_get_process_memory_contexts(bg_writer_pid, false, 20)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+ select type, name, ident
+ from pg_get_process_memory_contexts(pg_backend_pid(), false, 20)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+END $$;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index e6f2e93b2d6..d2399465df6 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1680,6 +1680,9 @@ MemoryContextData
MemoryContextId
MemoryContextMethodID
MemoryContextMethods
+MemoryStatsContextId
+MemoryStatsEntry
+MemoryStatsDSHashEntry
MemoryStatsPrintFunc
MergeAction
MergeActionState
--
2.34.1
On 2025-08-14 07:35, Rahila Syed wrote:
Hi Torikoshia,
Thank you for reviewing the patch.
This function is added at the end of Table "9.96. Server Signaling
Functions", but since pg_get_process_memory_contexts outputs
essentially
the same information as pg_log_backend_memory_contexts, it might be
better to place them next to each other in the table.
The idea was to place the new addition at the end of the table instead
of in the middle.
I’m fine with putting them together, though. I’ll do that in the next version
unless there’s a reason not to.
+ <parameter>stats_timestamp</parameter> <type>timestamptz</type> )
+typedef struct MemoryStatsDSHashEntry
+{
+ char key[64];
+ ConditionVariable memcxt_cv;
+ int proc_id;
+ int total_stats;
+ bool summary;
+ dsa_pointer memstats_dsa_pointer;
+ TimestampTz stats_timestamp;
+} MemoryStatsDSHashEntry;
stats_timestamp appears only in the two places below in the patch, but it does
not seem to be actually output.
Is this column unnecessary?
Thank you for pointing this out. This is removed in the attached patch, as it
was a remnant from the previous design. As old statistics are discarded in the
new design, a timestamp field is not needed anymore.
Specifying 0 for timeout causes a crash:
Should 0 be handled safely and treated as “no timeout”, or rejected as an error?
Good catch.
The crash has been resolved in the attached patch. It was caused by a missing
ConditionVariableCancelSleep() call when exiting without statistics due to a
timeout value of 0.
A 0 timeout means that statistics should only be retrieved if they are
immediately available, without waiting. We could exit with a warning/error
saying "too low timeout", but I think it's worthwhile to try fetching the
statistics if possible.
Similarly, specifying a negative value for timeout still works:
=# select * from pg_get_process_memory_contexts(30590, true, -10);
It might be better to reject negative values similar to pg_terminate_backend().
Fixed as suggested by you in the attached patch.
Currently, negative values are interpreted as an indefinite wait for statistics.
This could cause the client to hang if the server process exits without
providing statistics.
To avoid this, it would be better to exit after displaying a warning when the
user specifies negative timeouts.
+ /* Retreive the client key fo publishing statistics */
fo -> for?
Fixed.
+ */
+#define MEMSTATS_WAIT_TIMEOUT 100
MEMSTATS_WAIT_TIMEOUT is defined, but it doesn’t seem to be used.
This is removed now as it was a leftover from the previous design.
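For illustration only, the timeout semantics discussed above (reject negative values, treat 0 as "do not wait", otherwise wait for the given number of seconds) could be sketched in standalone C as below. This is a minimal sketch, not code from the patch: classify_timeout, TimeoutAction and the conversion to milliseconds are hypothetical names and choices introduced here, and how the patch actually reports a rejected value is not shown in this thread.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical outcome of validating the user-supplied timeout (seconds). */
typedef enum TimeoutAction
{
    TIMEOUT_REJECT,     /* negative (or NaN): refuse the request */
    TIMEOUT_NO_WAIT,    /* zero: use statistics only if already available */
    TIMEOUT_WAIT        /* positive: wait up to *wait_ms milliseconds */
} TimeoutAction;

/*
 * Classify a timeout given in (possibly fractional) seconds and, for the
 * waiting case, convert it to whole milliseconds.
 */
TimeoutAction
classify_timeout(double timeout_secs, long *wait_ms)
{
    assert(wait_ms != NULL);
    *wait_ms = 0;

    /* !(x >= 0) is true for negative values and also for NaN. */
    if (!(timeout_secs >= 0.0))
        return TIMEOUT_REJECT;
    if (timeout_secs == 0.0)
        return TIMEOUT_NO_WAIT;

    /* Clamp very large values so the conversion to long cannot overflow. */
    if (timeout_secs >= (double) (LONG_MAX / 1000))
        *wait_ms = LONG_MAX;
    else
        *wait_ms = (long) (timeout_secs * 1000.0);
    return TIMEOUT_WAIT;
}
```

Keeping the zero case separate from the negative case mirrors the distinction made above: 0 is a legitimate request for immediately available statistics, while a negative value is a user error.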
The attached patch also fixes an assertion failure I observed when a client
times out before the last requested process can publish its statistics. A
client frees the memory reserved for storing the statistics when it exits the
function after a timeout. Since a server process was already notified, it might
attempt to read the same client entry and access the DSA memory reserved for
statistics, resulting in the assertion failure. I resolved this by including a
check for this scenario and then exiting the handler function accordingly.
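The race described above can be modelled in a few lines of standalone C. This is only a sketch of the general idea (check that the request is still live before touching its buffer), not the check added in the patch: RequestSlot, client_abandon and server_publish are hypothetical names, a plain heap pointer stands in for the DSA allocation, and the locking that real shared-memory code would need around these steps is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-client request slot. */
typedef struct RequestSlot
{
    bool    active;     /* client is still waiting for a reply */
    int    *stats;      /* stand-in for the DSA buffer; NULL once freed */
    int     nstats;     /* number of entries written by the server */
} RequestSlot;

/* Client side: give up after a timeout and release the buffer. */
void
client_abandon(RequestSlot *slot)
{
    assert(slot != NULL);
    free(slot->stats);
    slot->stats = NULL;
    slot->nstats = 0;
    slot->active = false;
}

/*
 * Server side: publish only if the request is still live. Returns true if
 * the value was written, false if the client already went away.
 */
bool
server_publish(RequestSlot *slot, int value)
{
    assert(slot != NULL);

    /* The client may have timed out and freed its buffer: do nothing. */
    if (!slot->active || slot->stats == NULL)
        return false;

    slot->stats[0] = value;
    slot->nstats = 1;
    return true;
}
```

In this model a late server_publish() call after client_abandon() simply returns false instead of dereferencing freed memory.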
Thanks for updating the patch!
However, when I ran pg_get_process_memory_contexts() with summary =
true, it took a while and returned nothing:
=# select pg_get_process_memory_contexts(pg_backend_pid(), true, 1)
from pg_stat_activity ;
pg_get_process_memory_contexts
--------------------------------
(0 rows)
Time: 6026.291 ms (00:06.026)
Since v32 patch quickly returned the memory contexts as expected with
the same parameter specified, there seems to be some degradation. Could
you check it?
--
Regards,
--
Atsushi Torikoshi
Seconded from NTT DATA Japan Corporation to SRA OSS K.K.
Hi,
=# select pg_get_process_memory_contexts(pg_backend_pid(), true, 1)
from pg_stat_activity ;
pg_get_process_memory_contexts
--------------------------------
(0 rows)
Time: 6026.291 ms (00:06.026)
Since v32 patch quickly returned the memory contexts as expected with
the same parameter specified, there seems to be some degradation. Could
you check it?
Thank you for reporting this failure. This issue was a regression caused by
the absence of a ConditionVariableSignal() call in the summary = true code
path, which happened due to recent code refactoring.
Please find the fix attached.
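As a rough model of this class of bug, the sketch below routes both the summary and the detailed path through a single wakeup call, so a refactoring of one branch cannot silently drop the notification. It is standalone C written for illustration, not the patch code: StatsEntry, notify_waiters and publish_stats are hypothetical, the wakeups counter merely stands in for a real condition-variable signal, and the entry counts are placeholders.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical shared entry that a waiting client inspects. */
typedef struct StatsEntry
{
    bool    stats_written;  /* set once statistics are available */
    int     total_stats;    /* number of entries published */
    int     wakeups;        /* counts calls to the stand-in signal */
} StatsEntry;

/* Stand-in for signalling a condition variable. */
void
notify_waiters(StatsEntry *entry)
{
    entry->wakeups++;
}

/*
 * Publish statistics. Both branches fall through to the same wakeup, so
 * neither mode can return without notifying the waiting client.
 */
void
publish_stats(StatsEntry *entry, bool summary, int ncontexts)
{
    assert(entry != NULL);

    if (summary)
        entry->total_stats = 1;         /* placeholder for summary rows */
    else
        entry->total_stats = ncontexts; /* one row per context */

    entry->stats_written = true;

    /* Single exit point: always wake the client. */
    notify_waiters(entry);
}
```

Without the wakeup in one branch, a client in that mode would sleep until its timeout expires and return no rows, which matches the symptom reported above.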
Thank you,
Rahila Syed
Attachments:
v34-0001-Add-pg_get_process_memory_context-function.patch (application/octet-stream)
From ae939692b3bb507484136a3f32c239b6976ce0c2 Mon Sep 17 00:00:00 2001
From: Rahila Syed <rahilasyed.90@gmail.com>
Date: Mon, 30 Jun 2025 12:11:00 +0530
Subject: [PATCH] Add pg_get_process_memory_context function
This adds a function for retrieving memory context statistics
and information from backends as well as auxiliary processes.
The intended use case is cluster debugging under memory
pressure or with unanticipated memory usage characteristics.
When called, the function sends a signal to the specified
process, asking it to publish statistics about its memory contexts
in dynamic shared memory. Each memory context is returned
in detail, followed by a cumulative total if the number
of contexts exceeds what fits in the allocated shared memory.
Each process is limited to at most 1MB of memory for this.
A summary can also be explicitly requested by the user; this
returns the TopMemoryContext and a cumulative total of
all lower contexts.
To avoid blocking on busy processes, the caller specifies
the number of seconds during which to retry before timing out.
If no statistics are published within the set timeout, NULL
is returned.
---
doc/src/sgml/func/func-admin.sgml | 164 ++++
src/backend/catalog/system_views.sql | 5 +
src/backend/postmaster/autovacuum.c | 4 +
src/backend/postmaster/checkpointer.c | 4 +
src/backend/postmaster/interrupt.c | 4 +
src/backend/postmaster/pgarch.c | 4 +
src/backend/postmaster/startup.c | 4 +
src/backend/postmaster/walsummarizer.c | 4 +
src/backend/storage/ipc/ipci.c | 3 +
src/backend/storage/ipc/procsignal.c | 3 +
src/backend/storage/lmgr/proc.c | 1 +
src/backend/tcop/postgres.c | 3 +
.../utils/activity/wait_event_names.txt | 2 +
src/backend/utils/adt/mcxtfuncs.c | 889 +++++++++++++++++-
src/backend/utils/adt/pg_locale.c | 1 -
src/backend/utils/init/globals.c | 1 +
src/backend/utils/init/postinit.c | 7 +
src/backend/utils/mb/mbutils.c | 1 -
src/backend/utils/mmgr/mcxt.c | 71 +-
src/include/catalog/pg_proc.dat | 10 +
src/include/miscadmin.h | 1 +
src/include/storage/lwlocklist.h | 1 +
src/include/storage/procsignal.h | 1 +
src/include/utils/memutils.h | 92 +-
src/test/regress/expected/sysviews.out | 19 +
src/test/regress/sql/sysviews.sql | 18 +
src/tools/pgindent/typedefs.list | 3 +
27 files changed, 1279 insertions(+), 41 deletions(-)
diff --git a/doc/src/sgml/func/func-admin.sgml b/doc/src/sgml/func/func-admin.sgml
index 6347fe60b0c..43131c60882 100644
--- a/doc/src/sgml/func/func-admin.sgml
+++ b/doc/src/sgml/func/func-admin.sgml
@@ -251,6 +251,137 @@
<literal>false</literal> is returned.
</para></entry>
</row>
+
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>pg_get_process_memory_contexts</primary>
+ </indexterm>
+ <function>pg_get_process_memory_contexts</function> ( <parameter>pid</parameter> <type>integer</type>, <parameter>summary</parameter> <type>boolean</type>, <parameter>timeout</parameter> <type>float</type> )
+ <returnvalue>setof record</returnvalue>
+ ( <parameter>name</parameter> <type>text</type>,
+ <parameter>ident</parameter> <type>text</type>,
+ <parameter>type</parameter> <type>text</type>,
+ <parameter>path</parameter> <type>integer[]</type>,
+ <parameter>level</parameter> <type>integer</type>,
+ <parameter>total_bytes</parameter> <type>bigint</type>,
+ <parameter>total_nblocks</parameter> <type>bigint</type>,
+ <parameter>free_bytes</parameter> <type>bigint</type>,
+ <parameter>free_chunks</parameter> <type>bigint</type>,
+ <parameter>used_bytes</parameter> <type>bigint</type>,
+ <parameter>num_agg_contexts</parameter> <type>integer</type>,
+ <parameter>stats_timestamp</parameter> <type>timestamptz</type> )
+ </para>
+ <para>
+ This function handles requests to display the memory contexts of a
+ <productname>PostgreSQL</productname> process with the specified
+ process ID. The function can be used to send requests to backends as
+ well as <glossterm linkend="glossary-auxiliary-proc">auxiliary processes</glossterm>.
+ </para>
+ <para>
+ The returned record contains extended statistics for each memory
+ context:
+ <itemizedlist spacing="compact">
+ <listitem>
+ <para>
+ <parameter>name</parameter> - The name of the memory context.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>ident</parameter> - Memory context ID (if any).
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>type</parameter> - The type of memory context, possible
+ values are: AllocSet, Generation, Slab and Bump.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>path</parameter> - Memory contexts are organized in a
+ tree model with TopMemoryContext as the root, and all other memory
+ contexts as nodes in the tree. The <parameter>path</parameter>
+ displays the path from the root to the current memory context. The
+ path is limited to 100 children per node, with each node limited
+ to a max depth of 100, to preserve memory during reporting. The
+ printed path will also be limited to 100 nodes counting from the
+ TopMemoryContext.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>level</parameter> - The level in the tree of the current
+ memory context.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>total_bytes</parameter> - The total number of bytes
+ allocated to this memory context.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>total_nblocks</parameter> - The total number of blocks
+ used for the allocated memory.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>free_bytes</parameter> - The amount of free memory in
+ this memory context.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>free_chunks</parameter> - The number of chunks that
+ <parameter>free_bytes</parameter> corresponds to.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>used_bytes</parameter> - The total number of bytes
+ currently occupied.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>num_agg_contexts</parameter> - The number of memory
+ contexts aggregated in the displayed statistics.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+ <para>
+ When <parameter>summary</parameter> is <literal>true</literal>, statistics
+ for memory contexts at levels 1 and 2 are displayed, with level 1
+ representing the root node (i.e., <literal>TopMemoryContext</literal>).
+ Statistics for contexts on level 2 and below are aggregates of all
+ child contexts' statistics, where <literal>num_agg_contexts</literal>
+ indicates the number of aggregated child contexts. When
+ <parameter>summary</parameter> is <literal>false</literal>,
+ the <literal>num_agg_contexts</literal> value is <literal>1</literal>,
+ indicating that individual statistics are being displayed.
+ </para>
+ <para>
+ Busy processes can delay reporting memory context statistics;
+ <parameter>timeout</parameter> specifies the number of seconds
+ to wait for updated statistics. <parameter>timeout</parameter> can be
+ specified in fractions of a second.
+ </para>
+ <para>
+ After receiving memory context statistics from the target process, it
+ returns the results as one row per context. If all the contexts don't
+ fit within the pre-determined size limit, the remaining context
+ statistics are aggregated and a cumulative total is displayed. The
+ <literal>num_agg_contexts</literal> column indicates the number of
+ contexts aggregated in the displayed statistics. When
+ <literal>num_agg_contexts</literal> is <literal>1</literal> it means
+ that the context statistics are displayed separately.
+ </para></entry>
+ </row>
</tbody>
</tgroup>
</table>
@@ -302,6 +433,39 @@ LOG: Grand total: 1651920 bytes in 201 blocks; 622360 free (88 chunks); 1029560
because it may generate a large number of log messages.
</para>
+ <para>
+ <function>pg_get_process_memory_contexts</function> can be used to request
+ memory contexts statistics of any <productname>PostgreSQL</productname>
+ process. For example:
+<programlisting>
+postgres=# SELECT * FROM pg_get_process_memory_contexts(
+ (SELECT pid FROM pg_stat_activity
+ WHERE backend_type = 'checkpointer'),
+ false, 0.5) LIMIT 1;
+-[ RECORD 1 ]----+------------------------------
+name | TopMemoryContext
+ident |
+type | AllocSet
+path | {1}
+level | 1
+total_bytes | 90304
+total_nblocks | 3
+free_bytes | 2880
+free_chunks | 1
+used_bytes | 87424
+num_agg_contexts | 1
+</programlisting>
+ <note>
+ <para>
+ While <function>pg_get_process_memory_contexts</function> can be used to
+ query memory contexts of the local backend,
+ <structname>pg_backend_memory_contexts</structname>
+ (see <xref linkend="view-pg-backend-memory-contexts"/> for more details)
+ will be less resource intensive when only the local backend is of interest.
+ </para>
+ </note>
+ </para>
+
</sect2>
<sect2 id="functions-admin-backup">
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 1b3c5a55882..022586d6a55 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -682,6 +682,11 @@ GRANT SELECT ON pg_backend_memory_contexts TO pg_read_all_stats;
REVOKE EXECUTE ON FUNCTION pg_get_backend_memory_contexts() FROM PUBLIC;
GRANT EXECUTE ON FUNCTION pg_get_backend_memory_contexts() TO pg_read_all_stats;
+REVOKE EXECUTE ON FUNCTION
+ pg_get_process_memory_contexts(integer, boolean, float) FROM PUBLIC;
+GRANT EXECUTE ON FUNCTION
+ pg_get_process_memory_contexts(integer, boolean, float) TO pg_read_all_stats;
+
-- Statistics views
CREATE VIEW pg_stat_all_tables AS
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index ff96b36d710..3875f76564d 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -791,6 +791,10 @@ ProcessAutoVacLauncherInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
/* Process sinval catchup interrupts that happened while sleeping */
ProcessCatchupInterrupt();
}
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index e84e8663e96..5b3e08805bf 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -679,6 +679,10 @@ ProcessCheckpointerInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/postmaster/interrupt.c b/src/backend/postmaster/interrupt.c
index 0ae9bf906ec..f24f574e748 100644
--- a/src/backend/postmaster/interrupt.c
+++ b/src/backend/postmaster/interrupt.c
@@ -48,6 +48,10 @@ ProcessMainLoopInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 78e39e5f866..ac97a39447c 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -867,6 +867,10 @@ ProcessPgArchInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
if (ConfigReloadPending)
{
char *archiveLib = pstrdup(XLogArchiveLibrary);
diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c
index 27e86cf393f..7149a67fcbc 100644
--- a/src/backend/postmaster/startup.c
+++ b/src/backend/postmaster/startup.c
@@ -192,6 +192,10 @@ ProcessStartupProcInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
diff --git a/src/backend/postmaster/walsummarizer.c b/src/backend/postmaster/walsummarizer.c
index 777c9a8d555..5d14684f6b2 100644
--- a/src/backend/postmaster/walsummarizer.c
+++ b/src/backend/postmaster/walsummarizer.c
@@ -879,6 +879,10 @@ ProcessWalSummarizerInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 2fa045e6b0f..fe3d32e40b0 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -51,6 +51,7 @@
#include "storage/sinvaladt.h"
#include "utils/guc.h"
#include "utils/injection_point.h"
+#include "utils/memutils.h"
/* GUCs */
int shared_memory_type = DEFAULT_SHARED_MEMORY_TYPE;
@@ -150,6 +151,7 @@ CalculateShmemSize(int *num_semaphores)
size = add_size(size, InjectionPointShmemSize());
size = add_size(size, SlotSyncShmemSize());
size = add_size(size, AioShmemSize());
+ size = add_size(size, MemoryContextKeysShmemSize() + sizeof(LWLockPadded));
/* include additional requested shmem from preload libraries */
size = add_size(size, total_addin_request);
@@ -343,6 +345,7 @@ CreateOrAttachShmemStructs(void)
WaitEventCustomShmemInit();
InjectionPointShmemInit();
AioShmemInit();
+ MemoryContextKeysShmemInit();
}
/*
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index 087821311cc..8963285cc12 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -691,6 +691,9 @@ procsignal_sigusr1_handler(SIGNAL_ARGS)
if (CheckProcSignal(PROCSIG_LOG_MEMORY_CONTEXT))
HandleLogMemoryContextInterrupt();
+ if (CheckProcSignal(PROCSIG_GET_MEMORY_CONTEXT))
+ HandleGetMemoryContextInterrupt();
+
if (CheckProcSignal(PROCSIG_PARALLEL_APPLY_MESSAGE))
HandleParallelApplyMessageInterrupt();
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index e9ef0fbfe32..f194e6b3dcc 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -50,6 +50,7 @@
#include "storage/procsignal.h"
#include "storage/spin.h"
#include "storage/standby.h"
+#include "utils/memutils.h"
#include "utils/timeout.h"
#include "utils/timestamp.h"
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 0cecd464902..3933b6db607 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3534,6 +3534,9 @@ ProcessInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
if (ParallelApplyMessagePending)
ProcessParallelApplyMessages();
}
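The hunks above all follow the same deferred-work pattern: the signal handler only records that a request arrived, and the real work runs later from the process's interrupt-processing routine. A minimal standalone sketch of that pattern follows. It is not PostgreSQL code: `publish_pending`, `handle_publish_request`, `process_pending_publish` and `demo_deferred_publish` are hypothetical names, and plain `SIGINT` with `raise()` stands in for the `SIGUSR1`-based procsignal machinery.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>

/* Hypothetical stand-in for a flag such as PublishMemoryContextPending. */
static volatile sig_atomic_t publish_pending = 0;

/* Counts how often the deferred work actually ran. */
static int publish_count = 0;

/* Signal handler: only record that a request is pending, nothing else. */
static void
handle_publish_request(int signo)
{
    (void) signo;
    publish_pending = 1;
}

/*
 * Called from the main loop at a safe point. Clears the flag first, then
 * performs the real work (here just a counter increment as a placeholder
 * for copying statistics).
 */
static bool
process_pending_publish(void)
{
    if (!publish_pending)
        return false;

    publish_pending = 0;
    publish_count++;
    return true;
}

/* Install the handler, deliver one request to ourselves, and process it. */
static int
demo_deferred_publish(void)
{
    void        (*previous) (int) = signal(SIGINT, handle_publish_request);

    assert(previous != SIG_ERR);

    raise(SIGINT);              /* handler runs before raise() returns */
    process_pending_publish();
    return publish_count;
}
```

Clearing the flag before doing the work mirrors the order used in the patch, so a request that arrives while the work is running is picked up on the next pass instead of being lost.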
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 0be307d2ca0..54f91a76a1b 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -161,6 +161,7 @@ WAL_RECEIVER_EXIT "Waiting for the WAL receiver to exit."
WAL_RECEIVER_WAIT_START "Waiting for startup process to send initial data for streaming replication."
WAL_SUMMARY_READY "Waiting for a new WAL summary to be generated."
XACT_GROUP_UPDATE "Waiting for the group leader to update transaction status at transaction end."
+MEM_CXT_PUBLISH "Waiting for a process to publish memory information."
ABI_compatibility:
@@ -406,6 +407,7 @@ SubtransSLRU "Waiting to access the sub-transaction SLRU cache."
XactSLRU "Waiting to access the transaction status SLRU cache."
ParallelVacuumDSA "Waiting for parallel vacuum dynamic shared memory allocation."
AioUringCompletion "Waiting for another process to complete IO via io_uring."
+MemoryContextReportingKeys "Waiting for another process to complete reading or writing the memory reporting keys."
# No "ABI_compatibility" region here as WaitEventLWLock has its own C code.
diff --git a/src/backend/utils/adt/mcxtfuncs.c b/src/backend/utils/adt/mcxtfuncs.c
index fe6dce9cba3..e465e156f77 100644
--- a/src/backend/utils/adt/mcxtfuncs.c
+++ b/src/backend/utils/adt/mcxtfuncs.c
@@ -15,13 +15,38 @@
#include "postgres.h"
+#include "access/twophase.h"
+#include "catalog/pg_authid_d.h"
#include "funcapi.h"
#include "mb/pg_wchar.h"
+#include "miscadmin.h"
+#include "storage/dsm_registry.h"
#include "storage/proc.h"
#include "storage/procarray.h"
+#include "utils/acl.h"
#include "utils/array.h"
#include "utils/builtins.h"
#include "utils/hsearch.h"
+#include "utils/memutils.h"
+#include "utils/wait_event_types.h"
+
+#define CLIENT_KEY_SIZE 64
+
+static LWLock *client_keys_lock = NULL;
+static int *client_keys = NULL;
+static dshash_table *MemoryStatsDsHash = NULL;
+static dsa_area *MemoryStatsDsaArea = NULL;
+
+static void memstats_dsa_cleanup(MemoryStatsDSHashEntry *entry);
+static const char *ContextTypeToString(NodeTag type);
+static void PublishMemoryContext(MemoryStatsEntry *memcxt_info,
+ int curr_id, MemoryContext context,
+ List *path,
+ MemoryContextCounters stat,
+ int num_contexts, int max_levels);
+static List *compute_context_path(MemoryContext c, HTAB *context_id_lookup);
+static void end_memorycontext_reporting(MemoryStatsDSHashEntry *entry, MemoryContext oldcontext,
+ HTAB *context_id_lookup);
/* ----------
* The max bytes for showing identifiers of MemoryContext.
@@ -89,7 +114,7 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
*/
for (MemoryContext cur = context; cur != NULL; cur = cur->parent)
{
- MemoryContextId *entry;
+ MemoryStatsContextId *entry;
bool found;
entry = hash_search(context_id_lookup, &cur, HASH_FIND, &found);
@@ -143,24 +168,7 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
else
nulls[1] = true;
- switch (context->type)
- {
- case T_AllocSetContext:
- type = "AllocSet";
- break;
- case T_GenerationContext:
- type = "Generation";
- break;
- case T_SlabContext:
- type = "Slab";
- break;
- case T_BumpContext:
- type = "Bump";
- break;
- default:
- type = "???";
- break;
- }
+ type = ContextTypeToString(context->type);
values[2] = CStringGetTextDatum(type);
values[3] = Int32GetDatum(list_length(path)); /* level */
@@ -175,6 +183,38 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
list_free(path);
}
+/*
+ * ContextTypeToString
+ * Returns a textual representation of a context type
+ *
+ * This should cover the same types as MemoryContextIsValid.
+ */
+const char *
+ContextTypeToString(NodeTag type)
+{
+ const char *context_type;
+
+ switch (type)
+ {
+ case T_AllocSetContext:
+ context_type = "AllocSet";
+ break;
+ case T_GenerationContext:
+ context_type = "Generation";
+ break;
+ case T_SlabContext:
+ context_type = "Slab";
+ break;
+ case T_BumpContext:
+ context_type = "Bump";
+ break;
+ default:
+ context_type = "???";
+ break;
+ }
+ return context_type;
+}
+
/*
* pg_get_backend_memory_contexts
* SQL SRF showing backend memory context.
@@ -189,7 +229,7 @@ pg_get_backend_memory_contexts(PG_FUNCTION_ARGS)
HTAB *context_id_lookup;
ctl.keysize = sizeof(MemoryContext);
- ctl.entrysize = sizeof(MemoryContextId);
+ ctl.entrysize = sizeof(MemoryStatsContextId);
ctl.hcxt = CurrentMemoryContext;
context_id_lookup = hash_create("pg_get_backend_memory_contexts",
@@ -216,7 +256,7 @@ pg_get_backend_memory_contexts(PG_FUNCTION_ARGS)
foreach_ptr(MemoryContextData, cur, contexts)
{
- MemoryContextId *entry;
+ MemoryStatsContextId *entry;
bool found;
/*
@@ -224,8 +264,8 @@ pg_get_backend_memory_contexts(PG_FUNCTION_ARGS)
* PutMemoryContextsStatsTupleStore needs this to populate the "path"
* column with the parent context_ids.
*/
- entry = (MemoryContextId *) hash_search(context_id_lookup, &cur,
- HASH_ENTER, &found);
+ entry = (MemoryStatsContextId *) hash_search(context_id_lookup, &cur,
+ HASH_ENTER, &found);
entry->context_id = context_id++;
Assert(!found);
@@ -305,3 +345,806 @@ pg_log_backend_memory_contexts(PG_FUNCTION_ARGS)
PG_RETURN_BOOL(true);
}
+
+/*
+ * pg_get_process_memory_contexts
+ * Signal a backend or an auxiliary process to send its memory contexts,
+ * wait for the results and display them.
+ *
+ * By default, only superusers or users with ROLE_PG_READ_ALL_STATS are allowed
+ * to signal a process to return its memory contexts. Allowing any user to
+ * issue this request at an unbounded rate could flood the target process with
+ * requests and lead to denial of service. Additional roles can be permitted
+ * with GRANT.
+ *
+ * On receipt of this signal, a backend or an auxiliary process sets the flag
+ * in the signal handler, which causes the next CHECK_FOR_INTERRUPTS()
+ * or process-specific interrupt handler to copy the memory context details
+ * to a dynamic shared memory space.
+ *
+ * We have defined a limit on DSA memory that could be allocated per process -
+ * if the process has more memory contexts than what can fit in the allocated
+ * size, the excess contexts are summarized and represented as cumulative total
+ * at the end of the buffer.
+ *
+ * After sending the signal, wait on a condition variable. The publishing
+ * backend, after copying the data to shared memory, signals that condition
+ * variable. There is one condition variable per client process. Once the
+ * condition variable is signalled, check whether the latest memory context
+ * information is available and display it.
+ *
+ * If the publishing backend does not respond before the condition variable
+ * sleep times out, using the timeout value specified by the user, give up
+ * and return NULL.
+ */
+Datum
+pg_get_process_memory_contexts(PG_FUNCTION_ARGS)
+{
+ int pid = PG_GETARG_INT32(0);
+ bool summary = PG_GETARG_BOOL(1);
+ double timeout = PG_GETARG_FLOAT8(2);
+ PGPROC *proc;
+ ProcNumber procNumber = INVALID_PROC_NUMBER;
+ bool proc_is_aux = false;
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ MemoryStatsEntry *memcxt_info;
+ MemoryStatsDSHashEntry *entry;
+ bool found;
+ char key[CLIENT_KEY_SIZE];
+
+ if (timeout < 0)
+ {
+ /*
+ * A negative timeout is invalid. Report it as a WARNING rather than an
+ * ERROR so that a loop-through-resultset is not aborted.
+ */
+ ereport(WARNING,
+ (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+ errmsg("\"timeout\" must not be negative")));
+ PG_RETURN_NULL();
+ }
+
+ /*
+ * See if the process with given pid is a backend or an auxiliary process
+ * and remember the type for when we requery the process later.
+ */
+ proc = BackendPidGetProc(pid);
+ if (proc == NULL)
+ {
+ proc = AuxiliaryPidGetProc(pid);
+ proc_is_aux = true;
+ }
+
+ /*
+ * BackendPidGetProc() and AuxiliaryPidGetProc() return NULL if the pid
+ * isn't valid; this is not a problem, so return with a WARNING. See the
+ * comment in pg_log_backend_memory_contexts for a discussion of this.
+ */
+ if (proc == NULL)
+ {
+ /*
+ * This is just a warning so a loop-through-resultset will not abort
+ * if one backend terminated on its own during the run.
+ */
+ ereport(WARNING,
+ errmsg("PID %d is not a PostgreSQL server process", pid));
+ PG_RETURN_NULL();
+ }
+
+ InitMaterializedSRF(fcinfo, 0);
+
+ procNumber = GetNumberFromPGProc(proc);
+
+ /*
+ * Create a DSA to allocate memory for copying memory context statistics.
+ * Allocate the memory in the DSA and send the dsa pointer to the server
+ * process for storing the context statistics. If the statistics exceed
+ * the predefined size limit (1MB), a cumulative total is stored for the
+ * contexts that do not fit.
+ *
+ * The DSA is created once for the lifetime of the server, and only
+ * attached in subsequent calls.
+ */
+ if (MemoryStatsDsaArea == NULL)
+ MemoryStatsDsaArea = GetNamedDSA("memory_context_statistics_dsa", &found);
+
+ /*
+ * The dsa pointers containing statistics for each client are stored in a
+ * dshash table. In addition to dsa pointer, each entry in this table also
+ * contains information about the statistics, condition variable for
+ * signalling between client and the server and miscellaneous data
+ * specific to a request. There is one entry per client request in the
+ * hash table.
+ */
+ if (MemoryStatsDsHash == NULL)
+ MemoryStatsDsHash = GetNamedDSHash("memory_context_statistics_dshash", &memctx_dsh_params, &found);
+
+ snprintf(key, sizeof(key), "%d", MyProcNumber);
+
+ /*
+ * Check whether the publishing process's slot is empty and, if so, store
+ * this client's key, i.e. its ProcNumber. This informs the publishing
+ * process that it is supposed to write statistics in the hash entry
+ * corresponding to this client.
+ */
+ LWLockAcquire(client_keys_lock, LW_EXCLUSIVE);
+ if (client_keys[procNumber] == -1)
+ client_keys[procNumber] = MyProcNumber;
+ else
+ {
+ ereport(WARNING,
+ errmsg("server process %d is processing previous request", pid));
+ LWLockRelease(client_keys_lock);
+ PG_RETURN_NULL();
+ }
+ LWLockRelease(client_keys_lock);
+
+ /*
+ * Insert an entry for this client in the dshash table the first time
+ * this function is called. The entry is deleted at process exit by a
+ * before_shmem_exit callback.
+ *
+ * dshash_find_or_insert locks the entry to prevent the publisher from
+ * reading before client has updated the entry.
+ */
+ entry = dshash_find_or_insert(MemoryStatsDsHash, key, &found);
+ if (!found)
+ {
+ entry->stats_written = false;
+ ConditionVariableInit(&entry->memcxt_cv);
+ }
+
+ /*
+ * Allocate 1MB of memory for the backend to publish its statistics on
+ * every call to this function. The memory is freed at the end of the
+ * function.
+ */
+ entry->memstats_dsa_pointer =
+ dsa_allocate0(MemoryStatsDsaArea, MEMORY_CONTEXT_REPORT_MAX_PER_BACKEND);
+
+ /*
+ * Specify whether a summary of statistics is requested, before signalling
+ * the server.
+ */
+ entry->summary = summary;
+
+ /*
+ * Indicate which server process statistics are being requested from.
+ * If this client times out before the last requested process can publish its
+ * statistics, it may send a new request to another server process. Since the
+ * previous server was notified, it might attempt to read the same client entry
+ * and respond incorrectly with its statistics. By storing the server ID in the
+ * client entry, we prevent any previously signalled server process from writing
+ * its statistics in the space meant for the newly requested process.
+ */
+ entry->server_id = pid;
+ dshash_release_lock(MemoryStatsDsHash, entry);
+
+ PG_TRY();
+ {
+ /*
+ * Send a signal to a PostgreSQL process, informing it we want it to
+ * produce information about its memory contexts.
+ */
+ if (SendProcSignal(pid, PROCSIG_GET_MEMORY_CONTEXT, procNumber) < 0)
+ {
+ ereport(WARNING,
+ errmsg("could not send signal to process %d: %m", pid));
+ memstats_dsa_cleanup(entry);
+ PG_RETURN_NULL();
+ }
+
+ while (1)
+ {
+ entry = dshash_find_or_insert(MemoryStatsDsHash, key, &found);
+ Assert(found);
+
+ /*
+ * We expect to come out of sleep when the requested process has
+ * finished publishing the statistics, verified using a boolean
+ * stats_written.
+ */
+ if (entry->stats_written)
+ break;
+
+ dshash_release_lock(MemoryStatsDsHash, entry);
+
+ /*
+ * Recheck the state of the backend before sleeping on the
+ * condition variable to ensure the process is still alive. Only
+ * check the relevant process type based on the earlier PID check.
+ */
+ if (proc_is_aux)
+ proc = AuxiliaryPidGetProc(pid);
+ else
+ proc = BackendPidGetProc(pid);
+
+ /*
+ * The process ending during memory context processing is not an
+ * error.
+ */
+ if (proc == NULL)
+ {
+ ereport(WARNING,
+ errmsg("PID %d is no longer a PostgreSQL server process",
+ pid));
+ memstats_dsa_cleanup(entry);
+ ConditionVariableCancelSleep();
+ PG_RETURN_NULL();
+ }
+
+
+ /*
+ * Wait for the timeout as defined by the user. If no statistics
+ * are available within the allowed time then return NULL. The
+ * timer is defined in milliseconds since that's what the
+ * condition variable sleep uses.
+ */
+ if (ConditionVariableTimedSleep(&entry->memcxt_cv,
+ (timeout * 1000), WAIT_EVENT_MEM_CXT_PUBLISH))
+ {
+ /* Timeout has expired, return NULL */
+ memstats_dsa_cleanup(entry);
+ ConditionVariableCancelSleep();
+ PG_RETURN_NULL();
+ }
+ }
+
+ /*
+ * Backend has finished publishing the stats, project them.
+ */
+ memcxt_info = (MemoryStatsEntry *)
+ dsa_get_address(MemoryStatsDsaArea, entry->memstats_dsa_pointer);
+
+#define PG_GET_PROCESS_MEMORY_CONTEXTS_COLS 11
+ for (int i = 0; i < entry->total_stats; i++)
+ {
+ ArrayType *path_array;
+ int path_length;
+ Datum values[PG_GET_PROCESS_MEMORY_CONTEXTS_COLS];
+ bool nulls[PG_GET_PROCESS_MEMORY_CONTEXTS_COLS];
+ Datum *path_datum = NULL;
+
+ memset(values, 0, sizeof(values));
+ memset(nulls, 0, sizeof(nulls));
+
+ if (memcxt_info[i].name[0] != '\0')
+ {
+ values[0] = CStringGetTextDatum(memcxt_info[i].name);
+ }
+ else
+ nulls[0] = true;
+
+ if (memcxt_info[i].ident[0] != '\0')
+ {
+ values[1] = CStringGetTextDatum(memcxt_info[i].ident);
+ }
+ else
+ nulls[1] = true;
+
+ values[2] = CStringGetTextDatum(ContextTypeToString(memcxt_info[i].type));
+
+ path_length = memcxt_info[i].path_length;
+ path_datum = (Datum *) palloc(path_length * sizeof(Datum));
+ if (memcxt_info[i].path[0] != 0)
+ {
+ for (int j = 0; j < path_length; j++)
+ path_datum[j] = Int32GetDatum(memcxt_info[i].path[j]);
+ path_array = construct_array_builtin(path_datum, path_length, INT4OID);
+ values[3] = PointerGetDatum(path_array);
+ }
+ else
+ nulls[3] = true;
+
+ values[4] = Int32GetDatum(memcxt_info[i].levels);
+ values[5] = Int64GetDatum(memcxt_info[i].totalspace);
+ values[6] = Int64GetDatum(memcxt_info[i].nblocks);
+ values[7] = Int64GetDatum(memcxt_info[i].freespace);
+ values[8] = Int64GetDatum(memcxt_info[i].freechunks);
+ values[9] = Int64GetDatum(memcxt_info[i].totalspace -
+ memcxt_info[i].freespace);
+ values[10] = Int32GetDatum(memcxt_info[i].num_agg_stats);
+
+ tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc,
+ values, nulls);
+ }
+ memstats_dsa_cleanup(entry);
+ dshash_release_lock(MemoryStatsDsHash, entry);
+
+ ConditionVariableCancelSleep();
+
+ }
+ PG_CATCH();
+ {
+ memstats_dsa_cleanup(entry);
+ dshash_release_lock(MemoryStatsDsHash, entry);
+
+ ConditionVariableCancelSleep();
+ }
+ PG_END_TRY();
+
+ PG_RETURN_NULL();
+}
+
+static void
+memstats_dsa_cleanup(MemoryStatsDSHashEntry *entry)
+{
+ Assert(MemoryStatsDsaArea != NULL);
+ dsa_free(MemoryStatsDsaArea, entry->memstats_dsa_pointer);
+ entry->memstats_dsa_pointer = InvalidDsaPointer;
+ entry->stats_written = false;
+ entry->server_id = 0;
+}
+void
+MemoryContextKeysShmemInit(void)
+{
+ bool found;
+
+ client_keys = (int *)
+ ShmemInitStruct("MemoryContextKeys",
+ MemoryContextKeysShmemSize() + sizeof(LWLockPadded), &found);
+ client_keys_lock = (LWLock *) ((char *) client_keys + MemoryContextKeysShmemSize());
+
+ if (!found)
+ {
+ MemSet(client_keys, -1, MemoryContextKeysShmemSize());
+ LWLockInitialize(client_keys_lock, LWTRANCHE_MEMORY_CONTEXT_KEYS);
+ }
+}
+
+Size
+MemoryContextKeysShmemSize(void)
+{
+ Size sz = 0;
+ Size TotalProcs = 0;
+
+ TotalProcs = add_size(TotalProcs, NUM_AUXILIARY_PROCS);
+ TotalProcs = add_size(TotalProcs, MaxBackends);
+ sz = add_size(sz, mul_size(TotalProcs, sizeof(int)));
+
+ return sz;
+}
+
+/*
+ * HandleGetMemoryContextInterrupt
+ * Handle receipt of an interrupt indicating a request to publish memory
+ * contexts statistics.
+ *
+ * All the actual work is deferred to ProcessGetMemoryContextInterrupt() as
+ * this cannot be performed in a signal handler.
+ */
+void
+HandleGetMemoryContextInterrupt(void)
+{
+ InterruptPending = true;
+ PublishMemoryContextPending = true;
+ /* latch will be set by procsignal_sigusr1_handler */
+}
+
+/*
+ * ProcessGetMemoryContextInterrupt
+ * Generate information about memory contexts used by the process.
+ *
+ * Performs a breadth-first search on the memory context tree, so that a
+ * parent's statistics are reported before its children's in the monitoring
+ * function output.
+ *
+ * Statistics for all the processes are shared via the same dynamic shared
+ * area. Individual statistics are tracked independently in
+ * per-process DSA pointers. These pointers are stored in a dshash table
+ * keyed by the requesting client's ProcNumber.
+ *
+ * The maximum number of contexts whose statistics can be reported
+ * individually is derived from a predetermined limit on the memory available
+ * per process for this utility and the maximum size of the statistics for a
+ * single context. Any remaining contexts are captured as a cumulative total
+ * placed after the individual contexts' statistics.
+ *
+ * If summary is true, we capture the statistics of the level 1 and level 2
+ * contexts. For that, we traverse the memory context tree recursively,
+ * depth first, to cover all the children of a parent context, so that we can
+ * display a cumulative total of the memory consumed by a level 2 parent and
+ * all its children.
+ */
+void
+ProcessGetMemoryContextInterrupt(void)
+{
+ List *contexts;
+ HASHCTL ctl;
+ HTAB *context_id_lookup;
+ int context_id = 0;
+ MemoryStatsEntry *meminfo;
+ bool summary = false;
+ MemoryContextCounters stat;
+ int num_individual_stats = 0;
+ bool found;
+ MemoryStatsDSHashEntry *entry;
+ char key[CLIENT_KEY_SIZE];
+ int clientProcNumber;
+ MemoryContext memstats_ctx = NULL;
+ MemoryContext oldcontext = NULL;
+
+ PublishMemoryContextPending = false;
+
+ /*
+ * Create a new memory context which is not a part of the TopMemoryContext
+ * tree. This context is used to allocate all memory in this function.
+ * Keeping the allocations made while reporting separate from the tree
+ * being reported ensures that they do not affect the output of this
+ * function.
+ */
+ memstats_ctx = AllocSetContextCreate((MemoryContext) NULL, "publish_memory_context_statistics",
+ ALLOCSET_SMALL_SIZES);
+ oldcontext = MemoryContextSwitchTo(memstats_ctx);
+
+ /*
+ * The hash table is used for constructing "path" column of the view,
+ * similar to its local backend counterpart.
+ */
+ ctl.keysize = sizeof(MemoryContext);
+ ctl.entrysize = sizeof(MemoryStatsContextId);
+ ctl.hcxt = CurrentMemoryContext;
+
+ context_id_lookup = hash_create("pg_get_remote_backend_memory_contexts",
+ 256,
+ &ctl,
+ HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+
+ /* List of contexts to process in the next round - start at the top. */
+ contexts = list_make1(TopMemoryContext);
+
+ /*
+ * If DSA exists, created by another process requesting statistics, attach
+ * to it. We expect the client process to create required DSA and Dshash
+ * table.
+ */
+ if (MemoryStatsDsaArea == NULL)
+ MemoryStatsDsaArea = GetNamedDSA("memory_context_statistics_dsa", &found);
+
+ if (MemoryStatsDsHash == NULL)
+ MemoryStatsDsHash = GetNamedDSHash("memory_context_statistics_dshash", &memctx_dsh_params, &found);
+
+ /* Retrieve the client key for publishing statistics */
+ LWLockAcquire(client_keys_lock, LW_SHARED);
+ Assert(client_keys[MyProcNumber] != -1);
+ clientProcNumber = client_keys[MyProcNumber];
+ LWLockRelease(client_keys_lock);
+
+ snprintf(key, CLIENT_KEY_SIZE, "%d", clientProcNumber);
+
+ /*
+ * The entry lock is held by dshash_find_or_insert to protect writes to
+ * process specific memory. Two different processes publishing statistics
+ * do not block each other.
+ */
+ entry = dshash_find_or_insert(MemoryStatsDsHash, key, &found);
+
+ /* Entry has been deleted due to client process exit */
+ if (!found)
+ {
+ entry->stats_written = false;
+ end_memorycontext_reporting(entry, oldcontext, context_id_lookup);
+ return;
+ }
+
+ /* The client has timed out waiting for us to write statistics */
+ if (entry->server_id != MyProcPid)
+ {
+ entry->stats_written = false;
+ end_memorycontext_reporting(entry, oldcontext, context_id_lookup);
+ return;
+ }
+
+ summary = entry->summary;
+
+ /* Should be allocated by a client backend that is requesting statistics */
+ Assert(entry->memstats_dsa_pointer != InvalidDsaPointer);
+ meminfo = (MemoryStatsEntry *)
+ dsa_get_address(MemoryStatsDsaArea, entry->memstats_dsa_pointer);
+
+ if (summary)
+ {
+ int cxt_id = 0;
+ List *path = NIL;
+ MemoryStatsContextId *contextid_entry;
+
+ /* Copy TopMemoryContext statistics to DSA */
+ memset(&stat, 0, sizeof(stat));
+ (*TopMemoryContext->methods->stats) (TopMemoryContext, NULL, NULL,
+ &stat, true);
+ path = lcons_int(1, path);
+ PublishMemoryContext(meminfo, cxt_id, TopMemoryContext, path, stat,
+ 1, 100);
+
+ contextid_entry = (MemoryStatsContextId *) hash_search(context_id_lookup, &TopMemoryContext,
+ HASH_ENTER, &found);
+ Assert(!found);
+
+ /*
+ * context id starts with 1
+ */
+ contextid_entry->context_id = cxt_id + 1;
+
+ /*
+ * Copy statistics for each of TopMemoryContext's children. This
+ * includes statistics of at most 100 children per node, with each
+ * child node limited to a depth of 100 in its subtree.
+ */
+ for (MemoryContext c = TopMemoryContext->firstchild; c != NULL;
+ c = c->nextchild)
+ {
+ MemoryContextCounters grand_totals;
+ int num_contexts = 0;
+
+ path = NIL;
+ memset(&grand_totals, 0, sizeof(grand_totals));
+
+ cxt_id++;
+ contextid_entry = (MemoryStatsContextId *) hash_search(context_id_lookup, &c,
+ HASH_ENTER, &found);
+ Assert(!found);
+ contextid_entry->context_id = cxt_id + 1;
+
+ MemoryContextStatsInternal(c, 1, 100, 100, &grand_totals,
+ PRINT_STATS_NONE, &num_contexts);
+
+ path = compute_context_path(c, context_id_lookup);
+
+ PublishMemoryContext(meminfo, cxt_id, c, path,
+ grand_totals, num_contexts, 100);
+ }
+ entry->total_stats = cxt_id + 1;
+
+ entry->stats_written = true;
+ end_memorycontext_reporting(entry, oldcontext, context_id_lookup);
+ /* Notify waiting backends and return */
+ ConditionVariableBroadcast(&entry->memcxt_cv);
+ return;
+ }
+ foreach_ptr(MemoryContextData, cur, contexts)
+ {
+ List *path = NIL;
+ MemoryStatsContextId *contextid_entry;
+
+ contextid_entry = (MemoryStatsContextId *) hash_search(context_id_lookup, &cur,
+ HASH_ENTER, &found);
+ Assert(!found);
+
+ /*
+ * context id starts with 1
+ */
+ contextid_entry->context_id = context_id + 1;
+
+ /*
+ * Figure out the transient context_id of this context and each of its
+ * ancestors, to compute a path for this context.
+ */
+ path = compute_context_path(cur, context_id_lookup);
+
+ /* Examine the context stats */
+ memset(&stat, 0, sizeof(stat));
+ (*cur->methods->stats) (cur, NULL, NULL, &stat, true);
+
+ /* Account for saving one statistics slot for cumulative reporting */
+ if (context_id < (MAX_MEMORY_CONTEXT_STATS_NUM - 1))
+ {
+ /* Copy statistics to DSA memory */
+ PublishMemoryContext(meminfo, context_id, cur, path, stat, 1, 100);
+ }
+ else
+ {
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].totalspace += stat.totalspace;
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].nblocks += stat.nblocks;
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].freespace += stat.freespace;
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].freechunks += stat.freechunks;
+ }
+
+ /*
+ * The per-process DSA limit has been reached; from here on, write an
+ * aggregate of the remaining statistics.
+ *
+ * Slots 0 to max_stats - 1 are available. Once the number of contexts
+ * exceeds what fits, individual statistics stop at slot max_stats - 2,
+ * because slot max_stats - 1 is reserved for the cumulative
+ * statistics, reported as "Remaining Totals".
+ */
+ if (context_id == (MAX_MEMORY_CONTEXT_STATS_NUM - 2))
+ {
+ int namelen = strlen("Remaining Totals");
+
+ num_individual_stats = context_id + 1;
+ strlcpy(meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].name, "Remaining Totals", namelen + 1);
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].ident[0] = '\0';
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].path[0] = 0;
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].type = 0;
+ }
+ context_id++;
+
+ for (MemoryContext c = cur->firstchild; c != NULL; c = c->nextchild)
+ contexts = lappend(contexts, c);
+ }
+
+ /*
+ * Statistics are not aggregated, i.e. individual statistics are reported,
+ * when context_id <= max_stats.
+ */
+ if (context_id <= MAX_MEMORY_CONTEXT_STATS_NUM)
+ {
+ entry->total_stats = context_id;
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].num_agg_stats = 1;
+ }
+ /* Report number of aggregated memory contexts */
+ else
+ {
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].num_agg_stats = context_id -
+ num_individual_stats;
+
+ /*
+ * Total stats equals num_individual_stats + 1 record for cumulative
+ * statistics.
+ */
+ entry->total_stats = num_individual_stats + 1;
+ }
+ entry->stats_written = true;
+ end_memorycontext_reporting(entry, oldcontext, context_id_lookup);
+ /* Notify waiting backends and return */
+ ConditionVariableBroadcast(&entry->memcxt_cv);
+}
+
+/*
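+ * Release the entry lock, clear this process's client key slot and free the
+ * local resources used for reporting. Callers that published statistics
+ * broadcast on the condition variable afterwards.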
+ */
+static void
+end_memorycontext_reporting(MemoryStatsDSHashEntry *entry, MemoryContext oldcontext, HTAB *context_id_lookup)
+{
+ MemoryContext curr_ctx = CurrentMemoryContext;
+
+ dshash_release_lock(MemoryStatsDsHash, entry);
+
+ /*
+ * Empty this process's slot, so other clients can request memory
+ * statistics
+ */
+ LWLockAcquire(client_keys_lock, LW_EXCLUSIVE);
+ client_keys[MyProcNumber] = -1;
+ LWLockRelease(client_keys_lock);
+
+ hash_destroy(context_id_lookup);
+ MemoryContextSwitchTo(oldcontext);
+ MemoryContextReset(curr_ctx);
+}
+
+/*
+ * compute_context_path
+ *
+ * Append the transient context_id of this context and each of its ancestors
+ * to a list, in order to compute a path.
+ */
+static List *
+compute_context_path(MemoryContext c, HTAB *context_id_lookup)
+{
+ bool found;
+ List *path = NIL;
+ MemoryContext cur_context;
+
+ for (cur_context = c; cur_context != NULL; cur_context = cur_context->parent)
+ {
+ MemoryStatsContextId *cur_entry;
+
+ cur_entry = hash_search(context_id_lookup, &cur_context, HASH_FIND, &found);
+
+ if (!found)
+ elog(ERROR, "hash table corrupted, can't construct path value");
+
+ path = lcons_int(cur_entry->context_id, path);
+ }
+
+ return path;
+}
+
+/*
+ * PublishMemoryContext
+ *
+ * Copy the memory context statistics of a single context to a DSA memory
+ */
+static void
+PublishMemoryContext(MemoryStatsEntry *memcxt_info, int curr_id,
+ MemoryContext context, List *path,
+ MemoryContextCounters stat, int num_contexts, int max_levels)
+{
+ const char *ident = context->ident;
+ const char *name = context->name;
+
+ /*
+ * To be consistent with logging output, we label dynahash contexts with
+ * just the hash table name as with MemoryContextStatsPrint().
+ */
+ if (context->ident && strncmp(context->name, "dynahash", 8) == 0)
+ {
+ name = context->ident;
+ ident = NULL;
+ }
+
+ if (name != NULL)
+ {
+ int namelen = strlen(name);
+
+ if (strlen(name) >= MEMORY_CONTEXT_IDENT_SHMEM_SIZE)
+ namelen = pg_mbcliplen(name, namelen,
+ MEMORY_CONTEXT_IDENT_SHMEM_SIZE - 1);
+
+ strlcpy(memcxt_info[curr_id].name, name, namelen + 1);
+ }
+ else
+ /* Clearing the array */
+ memcxt_info[curr_id].name[0] = '\0';
+
+ /* Trim and copy the identifier if it is not set to NULL */
+ if (ident != NULL)
+ {
+ int idlen = strlen(context->ident);
+
+ /*
+ * Some identifiers such as SQL query string can be very long,
+ * truncate oversize identifiers.
+ */
+ if (idlen >= MEMORY_CONTEXT_IDENT_SHMEM_SIZE)
+ idlen = pg_mbcliplen(ident, idlen,
+ MEMORY_CONTEXT_IDENT_SHMEM_SIZE - 1);
+
+ strlcpy(memcxt_info[curr_id].ident, ident, idlen + 1);
+ }
+ else
+ memcxt_info[curr_id].ident[0] = '\0';
+
+ /* Copy the path, truncated to at most max_levels entries */
+ if (path == NIL)
+ memcxt_info[curr_id].path[0] = 0;
+ else
+ {
+ int levels = Min(list_length(path), max_levels);
+
+ memcxt_info[curr_id].path_length = levels;
+ memcxt_info[curr_id].levels = list_length(path);
+
+ foreach_int(i, path)
+ {
+ memcxt_info[curr_id].path[foreach_current_index(i)] = i;
+ if (--levels == 0)
+ break;
+ }
+ }
+ memcxt_info[curr_id].type = context->type;
+ memcxt_info[curr_id].totalspace = stat.totalspace;
+ memcxt_info[curr_id].nblocks = stat.nblocks;
+ memcxt_info[curr_id].freespace = stat.freespace;
+ memcxt_info[curr_id].freechunks = stat.freechunks;
+ memcxt_info[curr_id].num_agg_stats = num_contexts;
+}
+
+void
+AtProcExit_memstats_cleanup(int code, Datum arg)
+{
+ int idx = MyProcNumber;
+ MemoryStatsDSHashEntry *entry;
+ char key[CLIENT_KEY_SIZE];
+ bool found;
+
+ if (MemoryStatsDsHash != NULL)
+ {
+ snprintf(key, CLIENT_KEY_SIZE, "%d", idx);
+ entry = dshash_find_or_insert(MemoryStatsDsHash, key, &found);
+
+ if (found)
+ {
+ if (MemoryStatsDsaArea != NULL &&
+ DsaPointerIsValid(entry->memstats_dsa_pointer))
+ dsa_free(MemoryStatsDsaArea, entry->memstats_dsa_pointer);
+ }
+ dshash_delete_entry(MemoryStatsDsHash, entry);
+ }
+ LWLockAcquire(client_keys_lock, LW_EXCLUSIVE);
+ client_keys[idx] = -1;
+ LWLockRelease(client_keys_lock);
+}
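
Note on the truncation in PublishMemoryContext above: names and identifiers are clipped with pg_mbcliplen() so that a multibyte character is never cut in half. The following is a minimal standalone sketch of that idea, not code from the patch. It assumes UTF-8 only (pg_mbcliplen() handles every server encoding), and the helper names utf8_clip_len_sketch and copy_clipped_sketch are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return the largest length <= limit that does not split a UTF-8 sequence. */
static size_t
utf8_clip_len_sketch(const char *s, size_t len, size_t limit)
{
	size_t		n = (len < limit) ? len : limit;

	/*
	 * s[n] is the first byte that would be dropped.  If it is a continuation
	 * byte (10xxxxxx), the cut falls inside a character, so back up.
	 */
	while (n > 0 && n < len && (((unsigned char) s[n]) & 0xC0) == 0x80)
		n--;
	return n;
}

/* Copy src into dst (capacity dstsize), truncating on a character boundary. */
static size_t
copy_clipped_sketch(char *dst, size_t dstsize, const char *src)
{
	size_t		n;

	assert(dstsize > 0);
	n = utf8_clip_len_sketch(src, strlen(src), dstsize - 1);
	memcpy(dst, src, n);
	dst[n] = '\0';
	return n;
}
```

With a 3-byte buffer, the string "a" followed by a two-byte character is stored as just "a", because the two-byte character does not fit whole next to the terminating NUL.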
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 97c2ac1faf9..ab768a7a91f 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -45,7 +45,6 @@
#include "utils/lsyscache.h"
#include "utils/memutils.h"
#include "utils/pg_locale.h"
-#include "utils/relcache.h"
#include "utils/syscache.h"
#ifdef WIN32
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index d31cb45a058..92b0446b80c 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -39,6 +39,7 @@ volatile sig_atomic_t TransactionTimeoutPending = false;
volatile sig_atomic_t IdleSessionTimeoutPending = false;
volatile sig_atomic_t ProcSignalBarrierPending = false;
volatile sig_atomic_t LogMemoryContextPending = false;
+volatile sig_atomic_t PublishMemoryContextPending = false;
volatile sig_atomic_t IdleStatsUpdateTimeoutPending = false;
volatile uint32 InterruptHoldoffCount = 0;
volatile uint32 QueryCancelHoldoffCount = 0;
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 641e535a73c..fb3f2d21fa0 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -662,6 +662,13 @@ BaseInit(void)
* drop ephemeral slots, which in turn triggers stats reporting.
*/
ReplicationSlotInitialize();
+
+ /*
+ * The before shmem exit callback frees the DSA memory occupied by the
+ * latest memory context statistics that could be published by this proc
+ * if requested.
+ */
+ before_shmem_exit(AtProcExit_memstats_cleanup, 0);
}
diff --git a/src/backend/utils/mb/mbutils.c b/src/backend/utils/mb/mbutils.c
index 886ecbad871..308016d7763 100644
--- a/src/backend/utils/mb/mbutils.c
+++ b/src/backend/utils/mb/mbutils.c
@@ -39,7 +39,6 @@
#include "mb/pg_wchar.h"
#include "utils/fmgrprotos.h"
#include "utils/memutils.h"
-#include "utils/relcache.h"
#include "varatt.h"
/*
diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index 47fd774c7d2..3a5422c7273 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -40,6 +40,7 @@
#include "mb/pg_wchar.h"
#include "miscadmin.h"
+#include "utils/hsearch.h"
#include "utils/memdebug.h"
#include "utils/memutils.h"
#include "utils/memutils_internal.h"
@@ -176,10 +177,6 @@ MemoryContext PortalContext = NULL;
static void MemoryContextDeleteOnly(MemoryContext context);
static void MemoryContextCallResetCallbacks(MemoryContext context);
-static void MemoryContextStatsInternal(MemoryContext context, int level,
- int max_level, int max_children,
- MemoryContextCounters *totals,
- bool print_to_stderr);
static void MemoryContextStatsPrint(MemoryContext context, void *passthru,
const char *stats_string,
bool print_to_stderr);
@@ -877,11 +874,19 @@ MemoryContextStatsDetail(MemoryContext context,
bool print_to_stderr)
{
MemoryContextCounters grand_totals;
+ int num_contexts = 0;
+ PrintDestination print_location;
memset(&grand_totals, 0, sizeof(grand_totals));
+ if (print_to_stderr)
+ print_location = PRINT_STATS_TO_STDERR;
+ else
+ print_location = PRINT_STATS_TO_LOGS;
+
+ /* num_contexts reports the number of contexts aggregated in the output */
MemoryContextStatsInternal(context, 1, max_level, max_children,
- &grand_totals, print_to_stderr);
+ &grand_totals, print_location, &num_contexts);
if (print_to_stderr)
fprintf(stderr,
@@ -916,13 +921,14 @@ MemoryContextStatsDetail(MemoryContext context,
* One recursion level for MemoryContextStats
*
* Print stats for this context if possible, but in any case accumulate counts
- * into *totals (if not NULL).
+ * into *totals (if not NULL). The callers should make sure that print_location
+ * is set to PRINT_STATS_TO_STDERR or PRINT_STATS_TO_LOGS or PRINT_STATS_NONE.
*/
-static void
+void
MemoryContextStatsInternal(MemoryContext context, int level,
int max_level, int max_children,
MemoryContextCounters *totals,
- bool print_to_stderr)
+ PrintDestination print_location, int *num_contexts)
{
MemoryContext child;
int ichild;
@@ -930,10 +936,39 @@ MemoryContextStatsInternal(MemoryContext context, int level,
Assert(MemoryContextIsValid(context));
/* Examine the context itself */
- context->methods->stats(context,
- MemoryContextStatsPrint,
- &level,
- totals, print_to_stderr);
+ switch (print_location)
+ {
+ case PRINT_STATS_TO_STDERR:
+ context->methods->stats(context,
+ MemoryContextStatsPrint,
+ &level,
+ totals, true);
+ break;
+
+ case PRINT_STATS_TO_LOGS:
+ context->methods->stats(context,
+ MemoryContextStatsPrint,
+ &level,
+ totals, false);
+ break;
+
+ case PRINT_STATS_NONE:
+
+ /*
+ * Do not print the statistics if print_location is
+ * PRINT_STATS_NONE, only compute totals. This is used in
+ * reporting of memory context statistics via a sql function. Last
+ * parameter is not relevant.
+ */
+ context->methods->stats(context,
+ NULL,
+ NULL,
+ totals, false);
+ break;
+ }
+
+ /* Increment the context count for each recursive call */
+ *num_contexts = *num_contexts + 1;
/*
* Examine children.
@@ -953,7 +988,7 @@ MemoryContextStatsInternal(MemoryContext context, int level,
MemoryContextStatsInternal(child, level + 1,
max_level, max_children,
totals,
- print_to_stderr);
+ print_location, num_contexts);
}
}
@@ -972,7 +1007,13 @@ MemoryContextStatsInternal(MemoryContext context, int level,
child = MemoryContextTraverseNext(child, context);
}
- if (print_to_stderr)
+ /*
+ * Add the count of children contexts which are traversed in the
+ * non-recursive manner.
+ */
+ *num_contexts = *num_contexts + ichild;
+
+ if (print_location == PRINT_STATS_TO_STDERR)
{
for (int i = 0; i < level; i++)
fprintf(stderr, " ");
@@ -985,7 +1026,7 @@ MemoryContextStatsInternal(MemoryContext context, int level,
local_totals.freechunks,
local_totals.totalspace - local_totals.freespace);
}
- else
+ else if (print_location == PRINT_STATS_TO_LOGS)
ereport(LOG_SERVER_ONLY,
(errhidestmt(true),
errhidecontext(true),
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 118d6da1ace..3d6f42606fd 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8597,6 +8597,16 @@
prorettype => 'bool', proargtypes => 'int4',
prosrc => 'pg_log_backend_memory_contexts' },
+# publishing memory contexts of the specified postgres process
+{ oid => '2173', descr => 'publish memory contexts of the specified backend',
+ proname => 'pg_get_process_memory_contexts', provolatile => 'v',
+ prorows => '100', proretset => 't', proparallel => 'r',
+ prorettype => 'record', proargtypes => 'int4 bool float8',
+ proallargtypes => '{int4,bool,float8,text,text,text,_int4,int4,int8,int8,int8,int8,int8,int4}',
+ proargmodes => '{i,i,i,o,o,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{pid, summary, timeout, name, ident, type, path, level, total_bytes, total_nblocks, free_bytes, free_chunks, used_bytes, num_agg_contexts}',
+ prosrc => 'pg_get_process_memory_contexts' },
+
# non-persistent series generator
{ oid => '1066', descr => 'non-persistent series generator',
proname => 'generate_series', prorows => '1000',
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 1bef98471c3..1e59a7f910f 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -96,6 +96,7 @@ extern PGDLLIMPORT volatile sig_atomic_t IdleSessionTimeoutPending;
extern PGDLLIMPORT volatile sig_atomic_t ProcSignalBarrierPending;
extern PGDLLIMPORT volatile sig_atomic_t LogMemoryContextPending;
extern PGDLLIMPORT volatile sig_atomic_t IdleStatsUpdateTimeoutPending;
+extern PGDLLIMPORT volatile sig_atomic_t PublishMemoryContextPending;
extern PGDLLIMPORT volatile sig_atomic_t CheckClientConnectionPending;
extern PGDLLIMPORT volatile sig_atomic_t ClientConnectionLost;
diff --git a/src/include/storage/lwlocklist.h b/src/include/storage/lwlocklist.h
index 208d2e3a8ed..72ace053e9d 100644
--- a/src/include/storage/lwlocklist.h
+++ b/src/include/storage/lwlocklist.h
@@ -135,3 +135,4 @@ PG_LWLOCKTRANCHE(SUBTRANS_SLRU, SubtransSLRU)
PG_LWLOCKTRANCHE(XACT_SLRU, XactSLRU)
PG_LWLOCKTRANCHE(PARALLEL_VACUUM_DSA, ParallelVacuumDSA)
PG_LWLOCKTRANCHE(AIO_URING_COMPLETION, AioUringCompletion)
+PG_LWLOCKTRANCHE(MEMORY_CONTEXT_KEYS, MemoryContextReportingKeys)
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index afeeb1ca019..345d5a0ecb1 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -35,6 +35,7 @@ typedef enum
PROCSIG_WALSND_INIT_STOPPING, /* ask walsenders to prepare for shutdown */
PROCSIG_BARRIER, /* global barrier interrupt */
PROCSIG_LOG_MEMORY_CONTEXT, /* ask backend to log the memory contexts */
+ PROCSIG_GET_MEMORY_CONTEXT, /* ask backend to send the memory contexts */
PROCSIG_PARALLEL_APPLY_MESSAGE, /* Message from parallel apply workers */
/* Recovery conflict reasons */
diff --git a/src/include/utils/memutils.h b/src/include/utils/memutils.h
index 8abc26abce2..799cfe38cab 100644
--- a/src/include/utils/memutils.h
+++ b/src/include/utils/memutils.h
@@ -18,7 +18,10 @@
#define MEMUTILS_H
#include "nodes/memnodes.h"
-
+#include "storage/condition_variable.h"
+#include "storage/lmgr.h"
+#include "utils/dsa.h"
+#include "lib/dshash.h"
/*
* MaxAllocSize, MaxAllocHugeSize
@@ -48,6 +51,23 @@
#define AllocHugeSizeIsValid(size) ((Size) (size) <= MaxAllocHugeSize)
+/*
+ * Memory Context reporting size limits.
+ */
+
+/* Max length of context name and ident */
+#define MEMORY_CONTEXT_IDENT_SHMEM_SIZE 64
+/* Maximum size (in bytes) of DSA area per process */
+#define MEMORY_CONTEXT_REPORT_MAX_PER_BACKEND ((size_t) (1 * 1024 * 1024))
+
+/*
+ * Maximum size per context. Each entry reserves room for the worst case of
+ * deepest path and longest identifiers (name and ident), so the actual data
+ * may be smaller. The path depth is limited to 100 like for memory context
+ * logging.
+ */
+#define MAX_MEMORY_CONTEXT_STATS_SIZE (sizeof(MemoryStatsEntry))
+#define MAX_MEMORY_CONTEXT_STATS_NUM (MEMORY_CONTEXT_REPORT_MAX_PER_BACKEND / MAX_MEMORY_CONTEXT_STATS_SIZE)
/*
* Standard top-level memory contexts.
@@ -319,4 +339,74 @@ pg_memory_is_all_zeros(const void *ptr, size_t len)
return true;
}
+/* Dynamic shared memory state for statistics per context */
+typedef struct MemoryStatsEntry
+{
+ char name[MEMORY_CONTEXT_IDENT_SHMEM_SIZE];
+ char ident[MEMORY_CONTEXT_IDENT_SHMEM_SIZE];
+ int path[100];
+ NodeTag type;
+ int path_length;
+ int levels;
+ int64 totalspace;
+ int64 nblocks;
+ int64 freespace;
+ int64 freechunks;
+ int num_agg_stats;
+} MemoryStatsEntry;
+
+/*
+ * Per backend dynamic shared hash entry for memory context statistics
+ * reporting.
+ */
+typedef struct MemoryStatsDSHashEntry
+{
+ char key[64];
+ ConditionVariable memcxt_cv;
+ bool stats_written;
+ int server_id;
+ int total_stats;
+ bool summary;
+ dsa_pointer memstats_dsa_pointer;
+} MemoryStatsDSHashEntry;
+
+static const dshash_parameters memctx_dsh_params = {
+ offsetof(MemoryStatsDSHashEntry, memcxt_cv),
+ sizeof(MemoryStatsDSHashEntry),
+ dshash_strcmp,
+ dshash_strhash,
+ dshash_strcpy
+};
+
+/*
+ * Used for storage of transient identifiers for pg_get_backend_memory_contexts
+ */
+typedef struct MemoryStatsContextId
+{
+ MemoryContext context;
+ int context_id;
+} MemoryStatsContextId;
+
+/*
+ * This is passed to MemoryContextStatsInternal to determine whether
+ * to print context statistics or not, and where to print them: logs or
+ * stderr.
+ */
+typedef enum PrintDestination
+{
+ PRINT_STATS_TO_STDERR = 0,
+ PRINT_STATS_TO_LOGS,
+ PRINT_STATS_NONE
+} PrintDestination;
+
+extern void ProcessGetMemoryContextInterrupt(void);
+extern void HandleGetMemoryContextInterrupt(void);
+extern void MemoryContextKeysShmemInit(void);
+extern Size MemoryContextKeysShmemSize(void);
+extern void MemoryContextStatsInternal(MemoryContext context, int level,
+ int max_level, int max_children,
+ MemoryContextCounters *totals,
+ PrintDestination print_location,
+ int *num_contexts);
+extern void AtProcExit_memstats_cleanup(int code, Datum arg);
#endif /* MEMUTILS_H */
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 83228cfca29..ae17d028ed3 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -232,3 +232,22 @@ select * from pg_timezone_abbrevs where abbrev = 'LMT';
LMT | @ 7 hours 52 mins 58 secs ago | f
(1 row)
+DO $$
+DECLARE
+ bg_writer_pid int;
+ r RECORD;
+BEGIN
+ SELECT pid from pg_stat_activity where backend_type='background writer'
+ INTO bg_writer_pid;
+
+ select type, name, ident
+ from pg_get_process_memory_contexts(bg_writer_pid, false, 20)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+ select type, name, ident
+ from pg_get_process_memory_contexts(pg_backend_pid(), false, 20)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+END $$;
+NOTICE: (AllocSet,TopMemoryContext,)
+NOTICE: (AllocSet,TopMemoryContext,)
diff --git a/src/test/regress/sql/sysviews.sql b/src/test/regress/sql/sysviews.sql
index 66179f026b3..d0917b6868e 100644
--- a/src/test/regress/sql/sysviews.sql
+++ b/src/test/regress/sql/sysviews.sql
@@ -101,3 +101,21 @@ select count(distinct utc_offset) >= 24 as ok from pg_timezone_abbrevs;
-- One specific case we can check without much fear of breakage
-- is the historical local-mean-time value used for America/Los_Angeles.
select * from pg_timezone_abbrevs where abbrev = 'LMT';
+
+DO $$
+DECLARE
+ bg_writer_pid int;
+ r RECORD;
+BEGIN
+ SELECT pid from pg_stat_activity where backend_type='background writer'
+ INTO bg_writer_pid;
+
+ select type, name, ident
+ from pg_get_process_memory_contexts(bg_writer_pid, false, 20)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+ select type, name, ident
+ from pg_get_process_memory_contexts(pg_backend_pid(), false, 20)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+END $$;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index e4a9ec65ab4..e456af77206 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1680,6 +1680,9 @@ MemoryContextData
MemoryContextId
MemoryContextMethodID
MemoryContextMethods
+MemoryStatsContextId
+MemoryStatsEntry
+MemoryStatsDSHashEntry
MemoryStatsPrintFunc
MergeAction
MergeActionState
--
2.34.1
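
A note on the size limits in memutils.h above: how many entries fit into the 1MB per-process budget can be checked with a little arithmetic. The following is a minimal standalone sketch, not part of the patch. StatsEntrySketch mirrors the field layout of MemoryStatsEntry with int standing in for NodeTag (an assumption), and all names ending in _SKETCH or _sketch are hypothetical. It also illustrates why a division macro is safer when it is parenthesized.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same limits as in the patch. */
#define IDENT_SIZE_SKETCH 64
#define REPORT_MAX_SKETCH ((size_t) (1 * 1024 * 1024))

/* Stand-in for MemoryStatsEntry; int replaces NodeTag (assumption). */
typedef struct StatsEntrySketch
{
	char		name[IDENT_SIZE_SKETCH];
	char		ident[IDENT_SIZE_SKETCH];
	int			path[100];
	int			type;
	int			path_length;
	int			levels;
	int64_t		totalspace;
	int64_t		nblocks;
	int64_t		freespace;
	int64_t		freechunks;
	int			num_agg_stats;
} StatsEntrySketch;

/* An unparenthesized division macro and a parenthesized variant. */
#define STATS_NUM_BARE REPORT_MAX_SKETCH / sizeof(StatsEntrySketch)
#define STATS_NUM_SAFE (REPORT_MAX_SKETCH / sizeof(StatsEntrySketch))

/* Number of entries that fit in the per-process budget. */
static size_t
max_entries_sketch(void)
{
	assert(sizeof(StatsEntrySketch) <= REPORT_MAX_SKETCH);
	return STATS_NUM_SAFE;
}

/*
 * With the bare macro this expands to
 * (n % REPORT_MAX_SKETCH) / sizeof(StatsEntrySketch),
 * which is not the remainder modulo the entry limit.
 */
static size_t
mod_bare_sketch(size_t n)
{
	return n % STATS_NUM_BARE;
}

/* With the parenthesized macro the remainder is computed as intended. */
static size_t
mod_safe_sketch(size_t n)
{
	return n % STATS_NUM_SAFE;
}
```

The exact entry count depends on the platform's struct padding, so the sketch computes it with sizeof rather than hard-coding a number.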
On 2025-08-20 06:42, Rahila Syed wrote:
PFA the fix.
Thanks for updating the patch!
Specifying a very small timeout value (such as 0 or 0.0001) and
repeatedly executing the function seems to cause unexpected behavior. In
some cases, it even leads to a crash.
For example:
(session1)=# select pg_backend_pid();
pg_backend_pid
----------------
50917
(session2)=# select pg_get_process_memory_contexts(50917, true, 0.0001);
pg_get_process_memory_contexts
--------------------------------
(0 rows)
(session2)=# \watch 0.01
pg_get_process_memory_contexts
--------------------------------
(,,???,,0,0,0,0,0,0,0)
...
(21 rows)
(session2)=# \watch 0.01
pg_get_process_memory_contexts
--------------------------------
(0 rows)
...
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
This issue occurs on my M1 Mac, but I couldn’t reproduce it on Ubuntu,
so it might be environment-dependent.
Looking at the logs, Assert() is failing:
2025-10-07 08:48:26.766 JST [local] psql [23626] WARNING: 01000: server process 23646 is processing previous request
2025-10-07 08:48:26.766 JST [local] psql [23626] LOCATION: pg_get_process_memory_contexts, mcxtfuncs.c:476
TRAP: failed Assert("victim->magic == FREE_PAGE_SPAN_LEADER_MAGIC"), File: "freepage.c", Line: 1379, PID: 23626
0 postgres 0x000000010357fdf4 ExceptionalCondition + 216
1 postgres 0x00000001035cbe18 FreePageManagerGetInternal + 684
2 postgres 0x00000001035cbb18 FreePageManagerGet + 40
3 postgres 0x00000001035c84cc dsa_allocate_extended + 788
4 postgres 0x0000000103453af0 pg_get_process_memory_contexts + 992
5 postgres 0x0000000103007e94 ExecMakeFunctionResultSet + 616
6 postgres 0x00000001030506b8 ExecProjectSRF + 304
7 postgres 0x0000000103050434 ExecProjectSet + 268
8 postgres 0x0000000103003270 ExecProcNodeFirst + 92
9 postgres 0x0000000102ffa398 ExecProcNode + 60
10 postgres 0x0000000102ff5050 ExecutePlan + 244
11 postgres 0x0000000102ff4ee0 standard_ExecutorRun + 456
12 postgres 0x0000000102ff4d08 ExecutorRun + 84
13 postgres 0x0000000103341c84 PortalRunSelect + 296
14 postgres 0x0000000103341694 PortalRun + 656
15 postgres 0x000000010333c4bc exec_simple_query + 1388
16 postgres 0x000000010333b5d0 PostgresMain + 3252
17 postgres 0x0000000103332750 BackendInitialize + 0
18 postgres 0x0000000103209e48 postmaster_child_launch + 456
19 postgres 0x00000001032118c8 BackendStartup + 304
20 postgres 0x000000010320f72c ServerLoop + 372
21 postgres 0x000000010320e1e4 PostmasterMain + 6448
22 postgres 0x0000000103094b0c main + 924
23 dyld 0x0000000199dc2b98 start + 6076
Could you please check if you can reproduce this crash on your
environment?
And a few minor comments on the patch itself:
+ <parameter>stats_timestamp</parameter> <type>timestamptz</type> )
As discussed earlier, I believe we decided to remove stats_timestamp,
but it seems it’s still mentioned here.
+ * Update timestamp and signal all the waiting client backends after copying
+ * all the statistics.
+ */
+static void
+end_memorycontext_reporting(MemoryStatsDSHashEntry *entry, MemoryContext oldcontext, HTAB *context_id_lookup)
Should “Update timestamp” in this comment also be removed for
consistency?
The column order differs slightly from pg_backend_memory_contexts.
If there’s no strong reason for the difference, perhaps aligning the
order might improve consistency:
=# select * from pg_get_process_memory_contexts(pg_backend_pid(), true, 1);
name | TopMemoryContext
ident | [NULL]
type | AllocSet
path | {1}
level | 1
total_bytes | 222400
=# select * from pg_backend_memory_contexts;
name | TopMemoryContext
ident | [NULL]
type | AllocSet
level | 1
path | {1}
total_bytes | 99232
...
Regards,
--
Atsushi Torikoshi
Seconded from NTT DATA Japan Corporation to SRA OSS K.K.
Hi Torikoshia,
Thank you for testing and reviewing the patch.
This issue occurs on my M1 Mac, but I couldn’t reproduce it on Ubuntu,
so it might be environment-dependent.
Looking at the logs, Assert() is failing:
2025-10-07 08:48:26.766 JST [local] psql [23626] WARNING: 01000: server process 23646 is processing previous request
2025-10-07 08:48:26.766 JST [local] psql [23626] LOCATION: pg_get_process_memory_contexts, mcxtfuncs.c:476
TRAP: failed Assert("victim->magic == FREE_PAGE_SPAN_LEADER_MAGIC"), File: "freepage.c", Line: 1379, PID: 23626
0 postgres 0x000000010357fdf4 ExceptionalCondition + 216
1 postgres 0x00000001035cbe18 FreePageManagerGetInternal + 684
2 postgres 0x00000001035cbb18 FreePageManagerGet + 40
3 postgres 0x00000001035c84cc dsa_allocate_extended + 788
4 postgres 0x0000000103453af0 pg_get_process_memory_contexts + 992
5 postgres 0x0000000103007e94 ExecMakeFunctionResultSet + 616
6 postgres 0x00000001030506b8 ExecProjectSRF + 304
7 postgres 0x0000000103050434 ExecProjectSet + 268
8 postgres 0x0000000103003270 ExecProcNodeFirst + 92
9 postgres 0x0000000102ffa398 ExecProcNode + 60
10 postgres 0x0000000102ff5050 ExecutePlan + 244
11 postgres 0x0000000102ff4ee0 standard_ExecutorRun + 456
12 postgres 0x0000000102ff4d08 ExecutorRun + 84
13 postgres 0x0000000103341c84 PortalRunSelect + 296
14 postgres 0x0000000103341694 PortalRun + 656
15 postgres 0x000000010333c4bc exec_simple_query + 1388
16 postgres 0x000000010333b5d0 PostgresMain + 3252
17 postgres 0x0000000103332750 BackendInitialize + 0
18 postgres 0x0000000103209e48 postmaster_child_launch + 456
19 postgres 0x00000001032118c8 BackendStartup + 304
20 postgres 0x000000010320f72c ServerLoop + 372
21 postgres 0x000000010320e1e4 PostmasterMain + 6448
22 postgres 0x0000000103094b0c main + 924
23 dyld 0x0000000199dc2b98 start + 6076
Could you please check if you can reproduce this crash on your
environment?
I haven't been able to reproduce this issue on Ubuntu. A colleague also
tested it on their Mac
and didn't encounter the problem. I do have a fix in this area that I
believe should address an edge
case where data might be written to freed DSA memory.
Kindly test using the v35 patch and let me know if you still see the issue.
As discussed earlier, I believe we decided to remove stats_timestamp,
but it seems it’s still mentioned here.
Fixed.
+ * Update timestamp and signal all the waiting client backends after copying
+ * all the statistics.
+ */
+static void
+end_memorycontext_reporting(MemoryStatsDSHashEntry *entry, MemoryContext oldcontext, HTAB *context_id_lookup)
Should “Update timestamp” in this comment also be removed for
consistency?
Fixed.
The column order differs slightly from pg_backend_memory_contexts.
If there’s no strong reason for the difference, perhaps aligning the
order might improve consistency:
Makes sense. I will fix this in the next iteration of the patch.
I am also attaching a test which implements crash testing using injection
points.
I plan to improve the tests further to increase the test coverage of the
feature.
Thank you,
Rahila Syed
Attachments:
0001-v35-0001-Add-pg_get_process_memory_context-function.patch (application/octet-stream)
From 57a3627b90b4e05d5d15ddef1edad280296120ab Mon Sep 17 00:00:00 2001
From: Rahila Syed <rahilasyed.90@gmail.com>
Date: Tue, 7 Oct 2025 19:38:53 +0530
Subject: [PATCH 1/2] Add pg_get_process_memory_context function
This adds a function for retrieving memory context statistics
and information from backends as well as auxiliary processes.
The intended use case is cluster debugging when under memory
pressure or unanticipated memory usage characteristics.
When calling the function it sends a signal to the specified
process to submit statistics regarding its memory contexts
into dynamic shared memory. Each memory context is returned
in detail, followed by a cumulative total in case the number
of contexts exceeds the max allocated amount of shared memory.
Each process is limited to use at most 1MB of memory for this.
A summary can also be explicitly requested by the user; this
will return the TopMemoryContext and a cumulative total of
all lower contexts.
In order to not block on busy processes the caller specifies
the number of seconds during which to retry before timing out.
In the case where no statistics are published within the set
timeout, NULL is returned.
---
doc/src/sgml/func/func-admin.sgml | 163 ++++
src/backend/catalog/system_views.sql | 5 +
src/backend/postmaster/autovacuum.c | 4 +
src/backend/postmaster/checkpointer.c | 4 +
src/backend/postmaster/interrupt.c | 4 +
src/backend/postmaster/pgarch.c | 4 +
src/backend/postmaster/startup.c | 4 +
src/backend/postmaster/walsummarizer.c | 4 +
src/backend/storage/ipc/ipci.c | 3 +
src/backend/storage/ipc/procsignal.c | 3 +
src/backend/storage/lmgr/proc.c | 1 +
src/backend/tcop/postgres.c | 3 +
.../utils/activity/wait_event_names.txt | 2 +
src/backend/utils/adt/mcxtfuncs.c | 907 +++++++++++++++++-
src/backend/utils/adt/pg_locale.c | 1 -
src/backend/utils/init/globals.c | 1 +
src/backend/utils/init/postinit.c | 7 +
src/backend/utils/mb/mbutils.c | 1 -
src/backend/utils/mmgr/mcxt.c | 71 +-
src/include/catalog/pg_proc.dat | 10 +
src/include/miscadmin.h | 1 +
src/include/storage/lwlocklist.h | 1 +
src/include/storage/procsignal.h | 1 +
src/include/utils/memutils.h | 92 +-
src/test/regress/expected/sysviews.out | 19 +
src/test/regress/sql/sysviews.sql | 18 +
src/tools/pgindent/typedefs.list | 3 +
27 files changed, 1296 insertions(+), 41 deletions(-)
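
Note on the overflow behaviour described in the commit message (individual rows, followed by one cumulative row when the contexts do not all fit): the idea can be sketched in plain C. This is an illustration under assumptions, not the code in mcxtfuncs.c. CtxStatSketch, publish_with_overflow_sketch and the slot accounting are hypothetical, and the patch writes into DSA memory rather than a caller-supplied array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct CtxStatSketch
{
	int64_t		totalspace;
	int64_t		freespace;
	int			num_agg_contexts;	/* 1 for an individual context */
} CtxStatSketch;

/*
 * Copy up to max_slots entries from in[] to out[].  If all n_in contexts fit,
 * each gets its own slot.  Otherwise the first max_slots - 1 contexts are
 * copied individually and the remainder is summed into the last slot.
 * Returns the number of slots written.
 */
static size_t
publish_with_overflow_sketch(const CtxStatSketch *in, size_t n_in,
							 CtxStatSketch *out, size_t max_slots)
{
	size_t		n_detail;
	size_t		i;

	assert(max_slots > 0);

	n_detail = (n_in <= max_slots) ? n_in : max_slots - 1;

	for (i = 0; i < n_detail; i++)
	{
		out[i] = in[i];
		out[i].num_agg_contexts = 1;
	}

	if (n_detail == n_in)
		return n_detail;

	/* Aggregate everything that did not get its own slot. */
	out[n_detail].totalspace = 0;
	out[n_detail].freespace = 0;
	out[n_detail].num_agg_contexts = 0;
	for (i = n_detail; i < n_in; i++)
	{
		out[n_detail].totalspace += in[i].totalspace;
		out[n_detail].freespace += in[i].freespace;
		out[n_detail].num_agg_contexts++;
	}

	return n_detail + 1;
}
```

In this sketch a reader can tell an aggregated row from an individual one the same way the documentation describes for the SQL function: num_agg_contexts is 1 for individual rows and larger for the cumulative row.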
diff --git a/doc/src/sgml/func/func-admin.sgml b/doc/src/sgml/func/func-admin.sgml
index 1b465bc8ba7..a9d60202a57 100644
--- a/doc/src/sgml/func/func-admin.sgml
+++ b/doc/src/sgml/func/func-admin.sgml
@@ -251,6 +251,136 @@
<literal>false</literal> is returned.
</para></entry>
</row>
+
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>pg_get_process_memory_contexts</primary>
+ </indexterm>
+ <function>pg_get_process_memory_contexts</function> ( <parameter>pid</parameter> <type>integer</type>, <parameter>summary</parameter> <type>boolean</type>, <parameter>timeout</parameter> <type>float</type> )
+ <returnvalue>setof record</returnvalue>
+ ( <parameter>name</parameter> <type>text</type>,
+ <parameter>ident</parameter> <type>text</type>,
+ <parameter>type</parameter> <type>text</type>,
+ <parameter>path</parameter> <type>integer[]</type>,
+ <parameter>level</parameter> <type>integer</type>,
+ <parameter>total_bytes</parameter> <type>bigint</type>,
+ <parameter>total_nblocks</parameter> <type>bigint</type>,
+ <parameter>free_bytes</parameter> <type>bigint</type>,
+ <parameter>free_chunks</parameter> <type>bigint</type>,
+ <parameter>used_bytes</parameter> <type>bigint</type>,
+ <parameter>num_agg_contexts</parameter> <type>integer</type> )
+ </para>
+ <para>
+ This function handles requests to display the memory contexts of a
+ <productname>PostgreSQL</productname> process with the specified
+ process ID. The function can be used to send requests to backends as
+ well as <glossterm linkend="glossary-auxiliary-proc">auxiliary processes</glossterm>.
+ </para>
+ <para>
+ The returned record contains extended statistics for each memory
+ context:
+ <itemizedlist spacing="compact">
+ <listitem>
+ <para>
+ <parameter>name</parameter> - The name of the memory context.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>ident</parameter> - Identification information of the memory context (if any).
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>type</parameter> - The type of memory context, possible
+ values are: AllocSet, Generation, Slab and Bump.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>path</parameter> - Memory contexts are organized in a
+ tree model with TopMemoryContext as the root, and all other memory
+ contexts as nodes in the tree. The <parameter>path</parameter>
+ displays the path from the root to the current memory context. The
+ path is limited to 100 children per node, with each node limited
+ to a max depth of 100, to preserve memory during reporting. The
+ printed path will also be limited to 100 nodes counting from the
+ TopMemoryContext.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>level</parameter> - The level in the tree of the current
+ memory context.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>total_bytes</parameter> - The total number of bytes
+ allocated to this memory context.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>total_nblocks</parameter> - The total number of blocks
+ used for the allocated memory.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>free_bytes</parameter> - The amount of free memory in
+ this memory context.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>free_chunks</parameter> - The number of chunks that
+ <parameter>free_bytes</parameter> corresponds to.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>used_bytes</parameter> - The total number of bytes
+ currently occupied.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>num_agg_contexts</parameter> - The number of memory
+ contexts aggregated in the displayed statistics.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+ <para>
+ When <parameter>summary</parameter> is <literal>true</literal>, statistics
+ for memory contexts at levels 1 and 2 are displayed, with level 1
+ representing the root node (i.e., <literal>TopMemoryContext</literal>).
+ Statistics for contexts on level 2 and below are aggregates of all
+ child contexts' statistics, where <literal>num_agg_contexts</literal>
+ indicates the number of aggregated child contexts. When
+ <parameter>summary</parameter> is <literal>false</literal>,
+ the <literal>num_agg_contexts</literal> value is <literal>1</literal>,
+ indicating that individual statistics are being displayed.
+ </para>
+ <para>
+ Busy processes can delay reporting memory context statistics;
+ <parameter>timeout</parameter> specifies the number of seconds
+ to wait for updated statistics. <parameter>timeout</parameter> can be
+ specified in fractions of a second.
+ </para>
+ <para>
+ After receiving memory context statistics from the target process, the
+ function returns the results as one row per context. If not all contexts
+ fit within the pre-determined size limit, the remaining context
+ statistics are aggregated and a cumulative total is displayed. The
+ <literal>num_agg_contexts</literal> column indicates the number of
+ contexts aggregated in the displayed statistics. When
+ <literal>num_agg_contexts</literal> is <literal>1</literal> it means
+ that the context statistics are displayed separately.
+ </para></entry>
+ </row>
</tbody>
</tgroup>
</table>
@@ -302,6 +432,39 @@ LOG: Grand total: 1651920 bytes in 201 blocks; 622360 free (88 chunks); 1029560
because it may generate a large number of log messages.
</para>
+ <para>
+ <function>pg_get_process_memory_contexts</function> can be used to request
+ memory context statistics of any <productname>PostgreSQL</productname>
+ process. For example:
+<programlisting>
+postgres=# SELECT * FROM pg_get_process_memory_contexts(
+ (SELECT pid FROM pg_stat_activity
+ WHERE backend_type = 'checkpointer'),
+ false, 0.5) LIMIT 1;
+-[ RECORD 1 ]----+------------------------------
+name | TopMemoryContext
+ident |
+type | AllocSet
+path | {1}
+level | 1
+total_bytes | 90304
+total_nblocks | 3
+free_bytes | 2880
+free_chunks | 1
+used_bytes | 87424
+num_agg_contexts | 1
+</programlisting>
+ <note>
+ <para>
+ While <function>pg_get_process_memory_contexts</function> can be used to
+ query memory contexts of the local backend,
+ <structname>pg_backend_memory_contexts</structname>
+ (see <xref linkend="view-pg-backend-memory-contexts"/> for more details)
+ will be less resource intensive when only the local backend is of interest.
+ </para>
+ </note>
+ </para>
+
</sect2>
<sect2 id="functions-admin-backup">
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 884b6a23817..41d80661320 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -682,6 +682,11 @@ GRANT SELECT ON pg_backend_memory_contexts TO pg_read_all_stats;
REVOKE EXECUTE ON FUNCTION pg_get_backend_memory_contexts() FROM PUBLIC;
GRANT EXECUTE ON FUNCTION pg_get_backend_memory_contexts() TO pg_read_all_stats;
+REVOKE EXECUTE ON FUNCTION
+ pg_get_process_memory_contexts(integer, boolean, float) FROM PUBLIC;
+GRANT EXECUTE ON FUNCTION
+ pg_get_process_memory_contexts(integer, boolean, float) TO pg_read_all_stats;
+
-- Statistics views
CREATE VIEW pg_stat_all_tables AS
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index fb5d3b27224..16092f619b2 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -790,6 +790,10 @@ ProcessAutoVacLauncherInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
/* Process sinval catchup interrupts that happened while sleeping */
ProcessCatchupInterrupt();
}
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index e84e8663e96..5b3e08805bf 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -679,6 +679,10 @@ ProcessCheckpointerInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/postmaster/interrupt.c b/src/backend/postmaster/interrupt.c
index ba63b84dfc5..29454b8bf8a 100644
--- a/src/backend/postmaster/interrupt.c
+++ b/src/backend/postmaster/interrupt.c
@@ -48,6 +48,10 @@ ProcessMainLoopInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 78e39e5f866..ac97a39447c 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -867,6 +867,10 @@ ProcessPgArchInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
if (ConfigReloadPending)
{
char *archiveLib = pstrdup(XLogArchiveLibrary);
diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c
index 27e86cf393f..7149a67fcbc 100644
--- a/src/backend/postmaster/startup.c
+++ b/src/backend/postmaster/startup.c
@@ -192,6 +192,10 @@ ProcessStartupProcInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
diff --git a/src/backend/postmaster/walsummarizer.c b/src/backend/postmaster/walsummarizer.c
index e1f142f20c7..c711c887ef6 100644
--- a/src/backend/postmaster/walsummarizer.c
+++ b/src/backend/postmaster/walsummarizer.c
@@ -879,6 +879,10 @@ ProcessWalSummarizerInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 2fa045e6b0f..fe3d32e40b0 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -51,6 +51,7 @@
#include "storage/sinvaladt.h"
#include "utils/guc.h"
#include "utils/injection_point.h"
+#include "utils/memutils.h"
/* GUCs */
int shared_memory_type = DEFAULT_SHARED_MEMORY_TYPE;
@@ -150,6 +151,7 @@ CalculateShmemSize(int *num_semaphores)
size = add_size(size, InjectionPointShmemSize());
size = add_size(size, SlotSyncShmemSize());
size = add_size(size, AioShmemSize());
+ size = add_size(size, MemoryContextKeysShmemSize() + sizeof(LWLockPadded));
/* include additional requested shmem from preload libraries */
size = add_size(size, total_addin_request);
@@ -343,6 +345,7 @@ CreateOrAttachShmemStructs(void)
WaitEventCustomShmemInit();
InjectionPointShmemInit();
AioShmemInit();
+ MemoryContextKeysShmemInit();
}
/*
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index 087821311cc..8963285cc12 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -691,6 +691,9 @@ procsignal_sigusr1_handler(SIGNAL_ARGS)
if (CheckProcSignal(PROCSIG_LOG_MEMORY_CONTEXT))
HandleLogMemoryContextInterrupt();
+ if (CheckProcSignal(PROCSIG_GET_MEMORY_CONTEXT))
+ HandleGetMemoryContextInterrupt();
+
if (CheckProcSignal(PROCSIG_PARALLEL_APPLY_MESSAGE))
HandleParallelApplyMessageInterrupt();
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 96f29aafc39..550a3a77bb8 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -50,6 +50,7 @@
#include "storage/procsignal.h"
#include "storage/spin.h"
#include "storage/standby.h"
+#include "utils/memutils.h"
#include "utils/timeout.h"
#include "utils/timestamp.h"
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index d356830f756..0fdd2202ddd 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3538,6 +3538,9 @@ ProcessInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
if (ParallelApplyMessagePending)
ProcessParallelApplyMessages();
}
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 7553f6eacef..cdbc7309206 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -160,6 +160,7 @@ WAL_RECEIVER_EXIT "Waiting for the WAL receiver to exit."
WAL_RECEIVER_WAIT_START "Waiting for startup process to send initial data for streaming replication."
WAL_SUMMARY_READY "Waiting for a new WAL summary to be generated."
XACT_GROUP_UPDATE "Waiting for the group leader to update transaction status at transaction end."
+MEM_CXT_PUBLISH "Waiting for a process to publish memory information."
ABI_compatibility:
@@ -401,6 +402,7 @@ SubtransSLRU "Waiting to access the sub-transaction SLRU cache."
XactSLRU "Waiting to access the transaction status SLRU cache."
ParallelVacuumDSA "Waiting for parallel vacuum dynamic shared memory allocation."
AioUringCompletion "Waiting for another process to complete IO via io_uring."
+MemoryContextReportingKeys "Waiting for another process to complete reading or writing the memory reporting keys."
# No "ABI_compatibility" region here as WaitEventLWLock has its own C code.
diff --git a/src/backend/utils/adt/mcxtfuncs.c b/src/backend/utils/adt/mcxtfuncs.c
index fe6dce9cba3..17b50e853d5 100644
--- a/src/backend/utils/adt/mcxtfuncs.c
+++ b/src/backend/utils/adt/mcxtfuncs.c
@@ -15,13 +15,38 @@
#include "postgres.h"
+#include "access/twophase.h"
+#include "catalog/pg_authid_d.h"
#include "funcapi.h"
#include "mb/pg_wchar.h"
+#include "miscadmin.h"
+#include "storage/dsm_registry.h"
#include "storage/proc.h"
#include "storage/procarray.h"
+#include "utils/acl.h"
#include "utils/array.h"
#include "utils/builtins.h"
#include "utils/hsearch.h"
+#include "utils/memutils.h"
+#include "utils/wait_event_types.h"
+
+#define CLIENT_KEY_SIZE 64
+
+static LWLock *client_keys_lock = NULL;
+static int *client_keys = NULL;
+static dshash_table *MemoryStatsDsHash = NULL;
+static dsa_area *MemoryStatsDsaArea = NULL;
+
+static void memstats_dsa_cleanup(char *key);
+static const char *ContextTypeToString(NodeTag type);
+static void PublishMemoryContext(MemoryStatsEntry *memcxt_info,
+ int curr_id, MemoryContext context,
+ List *path,
+ MemoryContextCounters stat,
+ int num_contexts, int max_levels);
+static List *compute_context_path(MemoryContext c, HTAB *context_id_lookup);
+static void end_memorycontext_reporting(MemoryStatsDSHashEntry *entry, MemoryContext oldcontext,
+ HTAB *context_id_lookup);
/* ----------
* The max bytes for showing identifiers of MemoryContext.
@@ -89,7 +114,7 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
*/
for (MemoryContext cur = context; cur != NULL; cur = cur->parent)
{
- MemoryContextId *entry;
+ MemoryStatsContextId *entry;
bool found;
entry = hash_search(context_id_lookup, &cur, HASH_FIND, &found);
@@ -143,24 +168,7 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
else
nulls[1] = true;
- switch (context->type)
- {
- case T_AllocSetContext:
- type = "AllocSet";
- break;
- case T_GenerationContext:
- type = "Generation";
- break;
- case T_SlabContext:
- type = "Slab";
- break;
- case T_BumpContext:
- type = "Bump";
- break;
- default:
- type = "???";
- break;
- }
+ type = ContextTypeToString(context->type);
values[2] = CStringGetTextDatum(type);
values[3] = Int32GetDatum(list_length(path)); /* level */
@@ -175,6 +183,38 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
list_free(path);
}
+/*
+ * ContextTypeToString
+ * Returns a textual representation of a context type
+ *
+ * This should cover the same types as MemoryContextIsValid.
+ */
+const char *
+ContextTypeToString(NodeTag type)
+{
+ const char *context_type;
+
+ switch (type)
+ {
+ case T_AllocSetContext:
+ context_type = "AllocSet";
+ break;
+ case T_GenerationContext:
+ context_type = "Generation";
+ break;
+ case T_SlabContext:
+ context_type = "Slab";
+ break;
+ case T_BumpContext:
+ context_type = "Bump";
+ break;
+ default:
+ context_type = "???";
+ break;
+ }
+ return context_type;
+}
+
/*
* pg_get_backend_memory_contexts
* SQL SRF showing backend memory context.
@@ -189,7 +229,7 @@ pg_get_backend_memory_contexts(PG_FUNCTION_ARGS)
HTAB *context_id_lookup;
ctl.keysize = sizeof(MemoryContext);
- ctl.entrysize = sizeof(MemoryContextId);
+ ctl.entrysize = sizeof(MemoryStatsContextId);
ctl.hcxt = CurrentMemoryContext;
context_id_lookup = hash_create("pg_get_backend_memory_contexts",
@@ -216,7 +256,7 @@ pg_get_backend_memory_contexts(PG_FUNCTION_ARGS)
foreach_ptr(MemoryContextData, cur, contexts)
{
- MemoryContextId *entry;
+ MemoryStatsContextId *entry;
bool found;
/*
@@ -224,8 +264,8 @@ pg_get_backend_memory_contexts(PG_FUNCTION_ARGS)
* PutMemoryContextsStatsTupleStore needs this to populate the "path"
* column with the parent context_ids.
*/
- entry = (MemoryContextId *) hash_search(context_id_lookup, &cur,
- HASH_ENTER, &found);
+ entry = (MemoryStatsContextId *) hash_search(context_id_lookup, &cur,
+ HASH_ENTER, &found);
entry->context_id = context_id++;
Assert(!found);
@@ -305,3 +345,824 @@ pg_log_backend_memory_contexts(PG_FUNCTION_ARGS)
PG_RETURN_BOOL(true);
}
+
+/*
+ * pg_get_process_memory_contexts
+ * Signal a backend or an auxiliary process to send its memory contexts,
+ * wait for the results and display them.
+ *
+ * By default, only superusers or users with ROLE_PG_READ_ALL_STATS are allowed
+ * to signal a process to return the memory contexts. This is because allowing
+ * any user to issue this request at an unbounded rate would cause lots of
+ * requests to be sent, which can lead to denial of service. Additional roles
+ * can be permitted with GRANT.
+ *
+ * On receipt of this signal, a backend or an auxiliary process sets the flag
+ * in the signal handler, which causes the next CHECK_FOR_INTERRUPTS()
+ * or process-specific interrupt handler to copy the memory context details
+ * to a dynamic shared memory space.
+ *
+ * We have defined a limit on DSA memory that could be allocated per process -
+ * if the process has more memory contexts than what can fit in the allocated
+ * size, the excess contexts are summarized and represented as cumulative total
+ * at the end of the buffer.
+ *
+ * After sending the signal, wait on a condition variable. The publishing
+ * backend, after copying the data to shared memory, signals that
+ * condition variable. There is one condition variable per client process.
+ * Once the condition variable is signalled, check if the latest memory context
+ * information is available and display.
+ *
+ * If the publishing backend does not respond before the condition variable
+ * times out, which is set to the timeout value specified by the user, give up
+ * and return NULL.
+ */
+Datum
+pg_get_process_memory_contexts(PG_FUNCTION_ARGS)
+{
+ int pid = PG_GETARG_INT32(0);
+ bool summary = PG_GETARG_BOOL(1);
+ double timeout = PG_GETARG_FLOAT8(2);
+ PGPROC *proc;
+ ProcNumber procNumber = INVALID_PROC_NUMBER;
+ bool proc_is_aux = false;
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ MemoryStatsEntry *memcxt_info;
+ MemoryStatsDSHashEntry *entry;
+ bool found;
+ char key[CLIENT_KEY_SIZE];
+
+ if (timeout < 0)
+ {
+ /*
+ * A negative timeout is invalid. This is just a warning so that a
+ * loop-through-resultset will not abort because of it.
+ */
+ ereport(WARNING,
+ (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+ errmsg("\"timeout\" must not be negative")));
+ PG_RETURN_NULL();
+ }
+
+ /*
+ * See if the process with given pid is a backend or an auxiliary process
+ * and remember the type for when we requery the process later.
+ */
+ proc = BackendPidGetProc(pid);
+ if (proc == NULL)
+ {
+ proc = AuxiliaryPidGetProc(pid);
+ proc_is_aux = true;
+ }
+
+ /*
+ * BackendPidGetProc() and AuxiliaryPidGetProc() return NULL if the pid
+ * isn't valid; this is however not a problem, so just leave with a WARNING.
+ * See comment in pg_log_backend_memory_contexts for a discussion on this.
+ */
+ if (proc == NULL)
+ {
+ /*
+ * This is just a warning so a loop-through-resultset will not abort
+ * if one backend terminated on its own during the run.
+ */
+ ereport(WARNING,
+ errmsg("PID %d is not a PostgreSQL server process", pid));
+ PG_RETURN_NULL();
+ }
+
+ InitMaterializedSRF(fcinfo, 0);
+
+ procNumber = GetNumberFromPGProc(proc);
+
+ /*
+ * Create a DSA to allocate memory for copying memory context statistics.
+ * Allocate the memory in the DSA and send the dsa pointer to the server
+ * process for storing the context statistics. If the contexts do not fit
+ * within a predefined limit (1MB), a cumulative total is stored for the
+ * remaining contexts.
+ *
+ * The DSA is created once for the lifetime of the server, and only
+ * attached in subsequent calls.
+ */
+ if (MemoryStatsDsaArea == NULL)
+ MemoryStatsDsaArea = GetNamedDSA("memory_context_statistics_dsa", &found);
+
+ /*
+ * The dsa pointers containing statistics for each client are stored in a
+ * dshash table. In addition to dsa pointer, each entry in this table also
+ * contains information about the statistics, condition variable for
+ * signalling between client and the server and miscellaneous data
+ * specific to a request. There is one entry per client request in the
+ * hash table.
+ */
+ if (MemoryStatsDsHash == NULL)
+ MemoryStatsDsHash = GetNamedDSHash("memory_context_statistics_dshash", &memctx_dsh_params, &found);
+
+ snprintf(key, sizeof(key), "%d", MyProcNumber);
+
+ /*
+ * Check if the publishing process slot is empty and store this client's
+ * key, i.e. its procNumber. This informs the publishing process that it is
+ * supposed to write statistics in the hash entry corresponding to this
+ * client.
+ */
+ LWLockAcquire(client_keys_lock, LW_EXCLUSIVE);
+
+ /*
+ * XXX. If the process exits without cleaning up its slot, i.e. in case of
+ * an abrupt crash, the client_keys slot won't be reset, resulting in a
+ * false negative: a WARNING would be thrown when another process with the
+ * same slot index is queried for statistics.
+ */
+ if (client_keys[procNumber] == -1)
+ client_keys[procNumber] = MyProcNumber;
+ else
+ {
+ ereport(WARNING,
+ errmsg("server process %d is processing previous request", pid));
+ LWLockRelease(client_keys_lock);
+ PG_RETURN_NULL();
+ }
+ LWLockRelease(client_keys_lock);
+
+ /*
+ * Insert an entry for this client in DSHASH table the first time this
+ * function is called. This entry is deleted when the process exits in
+ * before_shmem_exit call.
+ *
+ * dshash_find_or_insert locks the entry to prevent the publisher from
+ * reading before client has updated the entry.
+ */
+ entry = dshash_find_or_insert(MemoryStatsDsHash, key, &found);
+ if (!found)
+ {
+ entry->stats_written = false;
+ ConditionVariableInit(&entry->memcxt_cv);
+ }
+
+ /*
+ * Allocate 1MB of memory for the backend to publish its statistics on
+ * every call to this function. The memory is freed at the end of the
+ * function.
+ */
+ entry->memstats_dsa_pointer =
+ dsa_allocate0(MemoryStatsDsaArea, MEMORY_CONTEXT_REPORT_MAX_PER_BACKEND);
+
+ /*
+ * Specify whether a summary of statistics is requested, before signalling
+ * the server.
+ */
+ entry->summary = summary;
+
+ /*
+ * Indicate which server process statistics are being requested from.
+ * If this client times out before the last requested process can publish its
+ * statistics, it may send a new request to another server process. Since the
+ * previous server was notified, it might attempt to read the same client entry
+ * and respond incorrectly with its statistics. By storing the server ID in the
+ * client entry, we prevent any previously signalled server process from writing
+ * its statistics in the space meant for the newly requested process.
+ */
+ entry->server_id = pid;
+ dshash_release_lock(MemoryStatsDsHash, entry);
+
+ PG_TRY();
+ {
+ /*
+ * Send a signal to a PostgreSQL process, informing it we want it to
+ * produce information about its memory contexts.
+ */
+ if (SendProcSignal(pid, PROCSIG_GET_MEMORY_CONTEXT, procNumber) < 0)
+ {
+ ereport(WARNING,
+ errmsg("could not send signal to process %d: %m", pid));
+ memstats_dsa_cleanup(key);
+ PG_RETURN_NULL();
+ }
+
+ while (1)
+ {
+ entry = dshash_find_or_insert(MemoryStatsDsHash, key, &found);
+ Assert(found);
+
+ /*
+ * We expect to come out of sleep when the requested process has
+ * finished publishing the statistics, verified using a boolean
+ * stats_written.
+ */
+ if (entry->stats_written)
+ break;
+
+ dshash_release_lock(MemoryStatsDsHash, entry);
+
+ /*
+ * Recheck the state of the backend before sleeping on the
+ * condition variable to ensure the process is still alive. Only
+ * check the relevant process type based on the earlier PID check.
+ */
+ if (proc_is_aux)
+ proc = AuxiliaryPidGetProc(pid);
+ else
+ proc = BackendPidGetProc(pid);
+
+ /*
+ * The process ending during memory context processing is not an
+ * error.
+ */
+ if (proc == NULL)
+ {
+ ereport(WARNING,
+ errmsg("PID %d is no longer a PostgreSQL server process",
+ pid));
+ memstats_dsa_cleanup(key);
+ ConditionVariableCancelSleep();
+ PG_RETURN_NULL();
+ }
+
+
+ /*
+ * Wait for the timeout as defined by the user. If no statistics
+ * are available within the allowed time then return NULL. The
+ * timer is defined in milliseconds since that's what the
+ * condition variable sleep uses.
+ */
+ if (ConditionVariableTimedSleep(&entry->memcxt_cv,
+ (timeout * 1000), WAIT_EVENT_MEM_CXT_PUBLISH))
+ {
+ /* Timeout has expired, return NULL */
+ memstats_dsa_cleanup(key);
+ ConditionVariableCancelSleep();
+ PG_RETURN_NULL();
+ }
+ }
+
+ /*
+ * Backend has finished publishing the stats, project them.
+ */
+ memcxt_info = (MemoryStatsEntry *)
+ dsa_get_address(MemoryStatsDsaArea, entry->memstats_dsa_pointer);
+
+#define PG_GET_PROCESS_MEMORY_CONTEXTS_COLS 11
+ for (int i = 0; i < entry->total_stats; i++)
+ {
+ ArrayType *path_array;
+ int path_length;
+ Datum values[PG_GET_PROCESS_MEMORY_CONTEXTS_COLS];
+ bool nulls[PG_GET_PROCESS_MEMORY_CONTEXTS_COLS];
+ Datum *path_datum = NULL;
+
+ memset(values, 0, sizeof(values));
+ memset(nulls, 0, sizeof(nulls));
+
+ if (memcxt_info[i].name[0] != '\0')
+ {
+ values[0] = CStringGetTextDatum(memcxt_info[i].name);
+ }
+ else
+ nulls[0] = true;
+
+ if (memcxt_info[i].ident[0] != '\0')
+ {
+ values[1] = CStringGetTextDatum(memcxt_info[i].ident);
+ }
+ else
+ nulls[1] = true;
+
+ values[2] = CStringGetTextDatum(ContextTypeToString(memcxt_info[i].type));
+
+ path_length = memcxt_info[i].path_length;
+ path_datum = (Datum *) palloc(path_length * sizeof(Datum));
+ if (memcxt_info[i].path[0] != 0)
+ {
+ for (int j = 0; j < path_length; j++)
+ path_datum[j] = Int32GetDatum(memcxt_info[i].path[j]);
+ path_array = construct_array_builtin(path_datum, path_length, INT4OID);
+ values[3] = PointerGetDatum(path_array);
+ }
+ else
+ nulls[3] = true;
+
+ values[4] = Int32GetDatum(memcxt_info[i].levels);
+ values[5] = Int64GetDatum(memcxt_info[i].totalspace);
+ values[6] = Int64GetDatum(memcxt_info[i].nblocks);
+ values[7] = Int64GetDatum(memcxt_info[i].freespace);
+ values[8] = Int64GetDatum(memcxt_info[i].freechunks);
+ values[9] = Int64GetDatum(memcxt_info[i].totalspace -
+ memcxt_info[i].freespace);
+ values[10] = Int32GetDatum(memcxt_info[i].num_agg_stats);
+
+ tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc,
+ values, nulls);
+ }
+ dshash_release_lock(MemoryStatsDsHash, entry);
+ memstats_dsa_cleanup(key);
+
+ ConditionVariableCancelSleep();
+
+ }
+ PG_CATCH();
+ {
+ memstats_dsa_cleanup(key);
+ ConditionVariableCancelSleep();
+ }
+ PG_END_TRY();
+
+ PG_RETURN_NULL();
+}
+
+static void
+memstats_dsa_cleanup(char *key)
+{
+ MemoryStatsDSHashEntry *entry;
+
+ entry = dshash_find(MemoryStatsDsHash, key, true);
+
+ Assert(MemoryStatsDsaArea != NULL);
+ dsa_free(MemoryStatsDsaArea, entry->memstats_dsa_pointer);
+ entry->memstats_dsa_pointer = InvalidDsaPointer;
+ entry->stats_written = false;
+ entry->server_id = 0;
+
+ dshash_release_lock(MemoryStatsDsHash, entry);
+}
+void
+MemoryContextKeysShmemInit(void)
+{
+ bool found;
+
+ client_keys = (int *)
+ ShmemInitStruct("MemoryContextKeys",
+ MemoryContextKeysShmemSize() + sizeof(LWLockPadded), &found);
+ client_keys_lock = (LWLock *) ((char *) client_keys + MemoryContextKeysShmemSize());
+
+ if (!found)
+ {
+ MemSet(client_keys, -1, MemoryContextKeysShmemSize());
+ LWLockInitialize(client_keys_lock, LWTRANCHE_MEMORY_CONTEXT_KEYS);
+ }
+}
+
+Size
+MemoryContextKeysShmemSize(void)
+{
+ Size sz = 0;
+ Size TotalProcs = 0;
+
+ TotalProcs = add_size(TotalProcs, NUM_AUXILIARY_PROCS);
+ TotalProcs = add_size(TotalProcs, MaxBackends);
+ sz = add_size(sz, mul_size(TotalProcs, sizeof(int)));
+
+ return sz;
+}
+
+/*
+ * HandleGetMemoryContextInterrupt
+ * Handle receipt of an interrupt indicating a request to publish memory
+ * contexts statistics.
+ *
+ * All the actual work is deferred to ProcessGetMemoryContextInterrupt() as
+ * this cannot be performed in a signal handler.
+ */
+void
+HandleGetMemoryContextInterrupt(void)
+{
+ InterruptPending = true;
+ PublishMemoryContextPending = true;
+ /* latch will be set by procsignal_sigusr1_handler */
+}
+
+/*
+ * ProcessGetMemoryContextInterrupt
+ * Generate information about memory contexts used by the process.
+ *
+ * Performs a breadth first search on the memory context tree, thus parents
+ * statistics are reported before their children in the monitoring function
+ * output.
+ *
+ * Statistics for all the processes are shared via the same dynamic shared
+ * area. Individual statistics are tracked independently in
+ * per-process DSA pointers. These pointers are stored in a dshash table
+ * keyed by the requesting client's ProcNumber.
+ *
+ * We calculate the maximum number of contexts whose statistics can be
+ * displayed using a pre-determined limit for memory available per process
+ * for this utility and the maximum size of statistics for each context. The
+ * remaining context statistics, if any, are captured as a cumulative total
+ * at the end of the individual contexts' statistics.
+ *
+ * If summary is true, we capture the level 1 and level 2 contexts
+ * statistics. For that we traverse the memory context tree recursively in
+ * depth first search manner to cover all the children of a parent context, to
+ * be able to display a cumulative total of memory consumption by a parent at
+ * level 2 and all its children.
+ */
+void
+ProcessGetMemoryContextInterrupt(void)
+{
+ List *contexts;
+ HASHCTL ctl;
+ HTAB *context_id_lookup;
+ int context_id = 0;
+ MemoryStatsEntry *meminfo;
+ bool summary = false;
+ MemoryContextCounters stat;
+ int num_individual_stats = 0;
+ bool found;
+ MemoryStatsDSHashEntry *entry;
+ char key[CLIENT_KEY_SIZE];
+ int clientProcNumber;
+ MemoryContext memstats_ctx = NULL;
+ MemoryContext oldcontext = NULL;
+
+ PublishMemoryContextPending = false;
+
+ /*
+ * Create a new memory context which is not a part of TopMemoryContext
+ * tree. This context is used to allocate all memory in this function.
+ * Keeping these allocations outside the tree being reported ensures that
+ * the work done to collect the statistics does not affect the output of
+ * this function.
+ */
+ memstats_ctx = AllocSetContextCreate((MemoryContext) NULL, "publish_memory_context_statistics",
+ ALLOCSET_SMALL_SIZES);
+ oldcontext = MemoryContextSwitchTo(memstats_ctx);
+
+ /*
+ * The hash table is used for constructing "path" column of the view,
+ * similar to its local backend counterpart.
+ */
+ ctl.keysize = sizeof(MemoryContext);
+ ctl.entrysize = sizeof(MemoryStatsContextId);
+ ctl.hcxt = CurrentMemoryContext;
+
+ context_id_lookup = hash_create("pg_get_remote_backend_memory_contexts",
+ 256,
+ &ctl,
+ HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+
+ /* List of contexts to process in the next round - start at the top. */
+ contexts = list_make1(TopMemoryContext);
+
+ /*
+ * If DSA exists, created by another process requesting statistics, attach
+ * to it. We expect the client process to create required DSA and Dshash
+ * table.
+ */
+ if (MemoryStatsDsaArea == NULL)
+ MemoryStatsDsaArea = GetNamedDSA("memory_context_statistics_dsa", &found);
+
+ if (MemoryStatsDsHash == NULL)
+ MemoryStatsDsHash = GetNamedDSHash("memory_context_statistics_dshash", &memctx_dsh_params, &found);
+
+ /* Retrieve the client key for publishing statistics */
+ LWLockAcquire(client_keys_lock, LW_SHARED);
+ Assert(client_keys[MyProcNumber] != -1);
+ clientProcNumber = client_keys[MyProcNumber];
+ LWLockRelease(client_keys_lock);
+
+ snprintf(key, CLIENT_KEY_SIZE, "%d", clientProcNumber);
+
+ /*
+ * The entry lock is held by dshash_find_or_insert to protect writes to
+ * process specific memory. Two different processes publishing statistics
+ * do not block each other.
+ */
+ entry = dshash_find_or_insert(MemoryStatsDsHash, key, &found);
+
+ /*
+ * Entry has been deleted due to client process exit.
+ * Make sure that the client always deletes the entry
+ * after taking the required lock, or this function may end up writing
+ * to unallocated memory.
+ */
+ if (!found)
+ {
+ entry->stats_written = false;
+ end_memorycontext_reporting(entry, oldcontext, context_id_lookup);
+ return;
+ }
+
+ /*
+ * The client has timed out waiting for us to write statistics and is
+ * requesting statistics from some other process.
+ */
+ if (entry->server_id != MyProcPid)
+ {
+ entry->stats_written = false;
+ end_memorycontext_reporting(entry, oldcontext, context_id_lookup);
+ return;
+ }
+
+ summary = entry->summary;
+
+ /* Should be allocated by a client backend that is requesting statistics */
+ Assert(entry->memstats_dsa_pointer != InvalidDsaPointer);
+ meminfo = (MemoryStatsEntry *)
+ dsa_get_address(MemoryStatsDsaArea, entry->memstats_dsa_pointer);
+
+ if (summary)
+ {
+ int cxt_id = 0;
+ List *path = NIL;
+ MemoryStatsContextId *contextid_entry;
+
+ /* Copy TopMemoryContext statistics to DSA */
+ memset(&stat, 0, sizeof(stat));
+ (*TopMemoryContext->methods->stats) (TopMemoryContext, NULL, NULL,
+ &stat, true);
+ path = lcons_int(1, path);
+ PublishMemoryContext(meminfo, cxt_id, TopMemoryContext, path, stat,
+ 1, 100);
+
+ contextid_entry = (MemoryStatsContextId *) hash_search(context_id_lookup, &TopMemoryContext,
+ HASH_ENTER, &found);
+ Assert(!found);
+
+ /*
+ * context id starts with 1
+ */
+ contextid_entry->context_id = cxt_id + 1;
+
+ /*
+ * Copy statistics for each of TopMemoryContext's children. This
+ * includes statistics of at most 100 children per node, with each
+ * child node limited to a depth of 100 in its subtree.
+ */
+ for (MemoryContext c = TopMemoryContext->firstchild; c != NULL;
+ c = c->nextchild)
+ {
+ MemoryContextCounters grand_totals;
+ int num_contexts = 0;
+
+ path = NIL;
+ memset(&grand_totals, 0, sizeof(grand_totals));
+
+ cxt_id++;
+ contextid_entry = (MemoryStatsContextId *) hash_search(context_id_lookup, &c,
+ HASH_ENTER, &found);
+ Assert(!found);
+ contextid_entry->context_id = cxt_id + 1;
+
+ MemoryContextStatsInternal(c, 1, 100, 100, &grand_totals,
+ PRINT_STATS_NONE, &num_contexts);
+
+ path = compute_context_path(c, context_id_lookup);
+
+ PublishMemoryContext(meminfo, cxt_id, c, path,
+ grand_totals, num_contexts, 100);
+ }
+ entry->total_stats = cxt_id + 1;
+
+ entry->stats_written = true;
+ end_memorycontext_reporting(entry, oldcontext, context_id_lookup);
+ /* Notify waiting backends and return */
+ ConditionVariableBroadcast(&entry->memcxt_cv);
+ return;
+ }
+ foreach_ptr(MemoryContextData, cur, contexts)
+ {
+ List *path = NIL;
+ MemoryStatsContextId *contextid_entry;
+
+ contextid_entry = (MemoryStatsContextId *) hash_search(context_id_lookup, &cur,
+ HASH_ENTER, &found);
+ Assert(!found);
+
+ /*
+ * context id starts with 1
+ */
+ contextid_entry->context_id = context_id + 1;
+
+ /*
+ * Figure out the transient context_id of this context and each of its
+ * ancestors, to compute a path for this context.
+ */
+ path = compute_context_path(cur, context_id_lookup);
+
+ /* Examine the context stats */
+ memset(&stat, 0, sizeof(stat));
+ (*cur->methods->stats) (cur, NULL, NULL, &stat, true);
+
+ /* Account for saving one statistics slot for cumulative reporting */
+ if (context_id < (MAX_MEMORY_CONTEXT_STATS_NUM - 1))
+ {
+ /* Copy statistics to DSA memory */
+ PublishMemoryContext(meminfo, context_id, cur, path, stat, 1, 100);
+ }
+ else
+ {
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].totalspace += stat.totalspace;
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].nblocks += stat.nblocks;
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].freespace += stat.freespace;
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].freechunks += stat.freechunks;
+ }
+
+ /*
+ * DSA max limit per process is reached, write aggregate of the
+ * remaining statistics.
+ *
+ * We can store contexts from 0 to max_stats - 1. When the number of
+ * contexts exceeds max_stats, we stop reporting individual statistics
+ * once context_id equals max_stats - 2, as we use the max_stats - 1 array
+ * slot for reporting cumulative statistics or "Remaining Totals".
+ */
+ if (context_id == (MAX_MEMORY_CONTEXT_STATS_NUM - 2))
+ {
+ int namelen = strlen("Remaining Totals");
+
+ num_individual_stats = context_id + 1;
+ strlcpy(meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].name, "Remaining Totals", namelen + 1);
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].ident[0] = '\0';
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].path[0] = 0;
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].type = 0;
+ }
+ context_id++;
+
+ for (MemoryContext c = cur->firstchild; c != NULL; c = c->nextchild)
+ contexts = lappend(contexts, c);
+ }
+
+ /*
+ * Statistics are not aggregated, i.e. individual statistics are reported,
+ * when context_id <= max_stats.
+ */
+ if (context_id <= MAX_MEMORY_CONTEXT_STATS_NUM)
+ {
+ entry->total_stats = context_id;
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].num_agg_stats = 1;
+ }
+ /* Report number of aggregated memory contexts */
+ else
+ {
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].num_agg_stats = context_id -
+ num_individual_stats;
+
+ /*
+ * Total stats equals num_individual_stats + 1 record for cumulative
+ * statistics.
+ */
+ entry->total_stats = num_individual_stats + 1;
+ }
+ entry->stats_written = true;
+ end_memorycontext_reporting(entry, oldcontext, context_id_lookup);
+ /* Notify waiting backends and return */
+ ConditionVariableBroadcast(&entry->memcxt_cv);
+}
+
+/*
+ * Clean up before exit from ProcessGetMemoryContextInterrupt
+ */
+static void
+end_memorycontext_reporting(MemoryStatsDSHashEntry *entry, MemoryContext oldcontext, HTAB *context_id_lookup)
+{
+ MemoryContext curr_ctx = CurrentMemoryContext;
+
+ dshash_release_lock(MemoryStatsDsHash, entry);
+
+ /*
+ * Empty this process's slot, so other clients can request memory
+ * statistics
+ */
+ LWLockAcquire(client_keys_lock, LW_EXCLUSIVE);
+ client_keys[MyProcNumber] = -1;
+ LWLockRelease(client_keys_lock);
+
+ hash_destroy(context_id_lookup);
+ MemoryContextSwitchTo(oldcontext);
+ MemoryContextReset(curr_ctx);
+}
+
+/*
+ * compute_context_path
+ *
+ * Append the transient context_id of this context and each of its ancestors
+ * to a list, in order to compute a path.
+ */
+static List *
+compute_context_path(MemoryContext c, HTAB *context_id_lookup)
+{
+ bool found;
+ List *path = NIL;
+ MemoryContext cur_context;
+
+ for (cur_context = c; cur_context != NULL; cur_context = cur_context->parent)
+ {
+ MemoryStatsContextId *cur_entry;
+
+ cur_entry = hash_search(context_id_lookup, &cur_context, HASH_FIND, &found);
+
+ if (!found)
+ elog(ERROR, "hash table corrupted, can't construct path value");
+
+ path = lcons_int(cur_entry->context_id, path);
+ }
+
+ return path;
+}
+
+/*
+ * PublishMemoryContext
+ *
+ * Copy the memory context statistics of a single context to a DSA memory
+ */
+static void
+PublishMemoryContext(MemoryStatsEntry *memcxt_info, int curr_id,
+ MemoryContext context, List *path,
+ MemoryContextCounters stat, int num_contexts, int max_levels)
+{
+ const char *ident = context->ident;
+ const char *name = context->name;
+
+ /*
+ * To be consistent with logging output, we label dynahash contexts with
+ * just the hash table name as with MemoryContextStatsPrint().
+ */
+ if (context->ident && strncmp(context->name, "dynahash", 8) == 0)
+ {
+ name = context->ident;
+ ident = NULL;
+ }
+
+ if (name != NULL)
+ {
+ int namelen = strlen(name);
+
+ if (strlen(name) >= MEMORY_CONTEXT_IDENT_SHMEM_SIZE)
+ namelen = pg_mbcliplen(name, namelen,
+ MEMORY_CONTEXT_IDENT_SHMEM_SIZE - 1);
+
+ strlcpy(memcxt_info[curr_id].name, name, namelen + 1);
+ }
+ else
+ /* Clearing the array */
+ memcxt_info[curr_id].name[0] = '\0';
+
+ /* Trim and copy the identifier if it is not set to NULL */
+ if (ident != NULL)
+ {
+ int idlen = strlen(ident);
+
+ /*
+ * Some identifiers such as SQL query string can be very long,
+ * truncate oversize identifiers.
+ */
+ if (idlen >= MEMORY_CONTEXT_IDENT_SHMEM_SIZE)
+ idlen = pg_mbcliplen(ident, idlen,
+ MEMORY_CONTEXT_IDENT_SHMEM_SIZE - 1);
+
+ strlcpy(memcxt_info[curr_id].ident, ident, idlen + 1);
+ }
+ else
+ memcxt_info[curr_id].ident[0] = '\0';
+
+ /* Allocate DSA memory for storing path information */
+ if (path == NIL)
+ memcxt_info[curr_id].path[0] = 0;
+ else
+ {
+ int levels = Min(list_length(path), max_levels);
+
+ memcxt_info[curr_id].path_length = levels;
+ memcxt_info[curr_id].levels = list_length(path);
+
+ foreach_int(i, path)
+ {
+ memcxt_info[curr_id].path[foreach_current_index(i)] = i;
+ if (--levels == 0)
+ break;
+ }
+ }
+ memcxt_info[curr_id].type = context->type;
+ memcxt_info[curr_id].totalspace = stat.totalspace;
+ memcxt_info[curr_id].nblocks = stat.nblocks;
+ memcxt_info[curr_id].freespace = stat.freespace;
+ memcxt_info[curr_id].freechunks = stat.freechunks;
+ memcxt_info[curr_id].num_agg_stats = num_contexts;
+}
+
+void
+AtProcExit_memstats_cleanup(int code, Datum arg)
+{
+ int idx = MyProcNumber;
+ MemoryStatsDSHashEntry *entry;
+ char key[CLIENT_KEY_SIZE];
+ bool found;
+
+ if (MemoryStatsDsHash != NULL)
+ {
+ snprintf(key, CLIENT_KEY_SIZE, "%d", idx);
+ entry = dshash_find_or_insert(MemoryStatsDsHash, key, &found);
+
+ if (found)
+ {
+ if (MemoryStatsDsaArea != NULL &&
+ DsaPointerIsValid(entry->memstats_dsa_pointer))
+ dsa_free(MemoryStatsDsaArea, entry->memstats_dsa_pointer);
+ }
+ dshash_delete_entry(MemoryStatsDsHash, entry);
+ }
+ LWLockAcquire(client_keys_lock, LW_EXCLUSIVE);
+ client_keys[idx] = -1;
+ LWLockRelease(client_keys_lock);
+}
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 97c2ac1faf9..ab768a7a91f 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -45,7 +45,6 @@
#include "utils/lsyscache.h"
#include "utils/memutils.h"
#include "utils/pg_locale.h"
-#include "utils/relcache.h"
#include "utils/syscache.h"
#ifdef WIN32
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index d31cb45a058..92b0446b80c 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -39,6 +39,7 @@ volatile sig_atomic_t TransactionTimeoutPending = false;
volatile sig_atomic_t IdleSessionTimeoutPending = false;
volatile sig_atomic_t ProcSignalBarrierPending = false;
volatile sig_atomic_t LogMemoryContextPending = false;
+volatile sig_atomic_t PublishMemoryContextPending = false;
volatile sig_atomic_t IdleStatsUpdateTimeoutPending = false;
volatile uint32 InterruptHoldoffCount = 0;
volatile uint32 QueryCancelHoldoffCount = 0;
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 641e535a73c..fb3f2d21fa0 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -662,6 +662,13 @@ BaseInit(void)
* drop ephemeral slots, which in turn triggers stats reporting.
*/
ReplicationSlotInitialize();
+
+ /*
+ * The before shmem exit callback frees the DSA memory occupied by the
+ * latest memory context statistics that could be published by this proc
+ * if requested.
+ */
+ before_shmem_exit(AtProcExit_memstats_cleanup, 0);
}
diff --git a/src/backend/utils/mb/mbutils.c b/src/backend/utils/mb/mbutils.c
index 886ecbad871..308016d7763 100644
--- a/src/backend/utils/mb/mbutils.c
+++ b/src/backend/utils/mb/mbutils.c
@@ -39,7 +39,6 @@
#include "mb/pg_wchar.h"
#include "utils/fmgrprotos.h"
#include "utils/memutils.h"
-#include "utils/relcache.h"
#include "varatt.h"
/*
diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index 47fd774c7d2..3a5422c7273 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -40,6 +40,7 @@
#include "mb/pg_wchar.h"
#include "miscadmin.h"
+#include "utils/hsearch.h"
#include "utils/memdebug.h"
#include "utils/memutils.h"
#include "utils/memutils_internal.h"
@@ -176,10 +177,6 @@ MemoryContext PortalContext = NULL;
static void MemoryContextDeleteOnly(MemoryContext context);
static void MemoryContextCallResetCallbacks(MemoryContext context);
-static void MemoryContextStatsInternal(MemoryContext context, int level,
- int max_level, int max_children,
- MemoryContextCounters *totals,
- bool print_to_stderr);
static void MemoryContextStatsPrint(MemoryContext context, void *passthru,
const char *stats_string,
bool print_to_stderr);
@@ -877,11 +874,19 @@ MemoryContextStatsDetail(MemoryContext context,
bool print_to_stderr)
{
MemoryContextCounters grand_totals;
+ int num_contexts = 0;
+ PrintDestination print_location;
memset(&grand_totals, 0, sizeof(grand_totals));
+ if (print_to_stderr)
+ print_location = PRINT_STATS_TO_STDERR;
+ else
+ print_location = PRINT_STATS_TO_LOGS;
+
+ /* num_contexts reports the number of contexts aggregated in the output */
MemoryContextStatsInternal(context, 1, max_level, max_children,
- &grand_totals, print_to_stderr);
+ &grand_totals, print_location, &num_contexts);
if (print_to_stderr)
fprintf(stderr,
@@ -916,13 +921,14 @@ MemoryContextStatsDetail(MemoryContext context,
* One recursion level for MemoryContextStats
*
* Print stats for this context if possible, but in any case accumulate counts
- * into *totals (if not NULL).
+ * into *totals (if not NULL). The callers should make sure that print_location
+ * is set to PRINT_STATS_TO_STDERR or PRINT_STATS_TO_LOGS or PRINT_STATS_NONE.
*/
-static void
+void
MemoryContextStatsInternal(MemoryContext context, int level,
int max_level, int max_children,
MemoryContextCounters *totals,
- bool print_to_stderr)
+ PrintDestination print_location, int *num_contexts)
{
MemoryContext child;
int ichild;
@@ -930,10 +936,39 @@ MemoryContextStatsInternal(MemoryContext context, int level,
Assert(MemoryContextIsValid(context));
/* Examine the context itself */
- context->methods->stats(context,
- MemoryContextStatsPrint,
- &level,
- totals, print_to_stderr);
+ switch (print_location)
+ {
+ case PRINT_STATS_TO_STDERR:
+ context->methods->stats(context,
+ MemoryContextStatsPrint,
+ &level,
+ totals, true);
+ break;
+
+ case PRINT_STATS_TO_LOGS:
+ context->methods->stats(context,
+ MemoryContextStatsPrint,
+ &level,
+ totals, false);
+ break;
+
+ case PRINT_STATS_NONE:
+
+ /*
+ * Do not print the statistics if print_location is
+ * PRINT_STATS_NONE, only compute totals. This is used in
+ * reporting of memory context statistics via a sql function. Last
+ * parameter is not relevant.
+ */
+ context->methods->stats(context,
+ NULL,
+ NULL,
+ totals, false);
+ break;
+ }
+
+ /* Increment the context count for each recursive call */
+ *num_contexts = *num_contexts + 1;
/*
* Examine children.
@@ -953,7 +988,7 @@ MemoryContextStatsInternal(MemoryContext context, int level,
MemoryContextStatsInternal(child, level + 1,
max_level, max_children,
totals,
- print_to_stderr);
+ print_location, num_contexts);
}
}
@@ -972,7 +1007,13 @@ MemoryContextStatsInternal(MemoryContext context, int level,
child = MemoryContextTraverseNext(child, context);
}
- if (print_to_stderr)
+ /*
+ * Add the count of child contexts that are traversed
+ * non-recursively.
+ */
+ *num_contexts = *num_contexts + ichild;
+
+ if (print_location == PRINT_STATS_TO_STDERR)
{
for (int i = 0; i < level; i++)
fprintf(stderr, " ");
@@ -985,7 +1026,7 @@ MemoryContextStatsInternal(MemoryContext context, int level,
local_totals.freechunks,
local_totals.totalspace - local_totals.freespace);
}
- else
+ else if (print_location == PRINT_STATS_TO_LOGS)
ereport(LOG_SERVER_ONLY,
(errhidestmt(true),
errhidecontext(true),
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 7c20180637f..8859ed449d0 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8613,6 +8613,16 @@
prorettype => 'bool', proargtypes => 'int4',
prosrc => 'pg_log_backend_memory_contexts' },
+# publishing memory contexts of the specified postgres process
+{ oid => '2173', descr => 'publish memory contexts of the specified backend',
+ proname => 'pg_get_process_memory_contexts', provolatile => 'v',
+ prorows => '100', proretset => 't', proparallel => 'r',
+ prorettype => 'record', proargtypes => 'int4 bool float8',
+ proallargtypes => '{int4,bool,float8,text,text,text,_int4,int4,int8,int8,int8,int8,int8,int4}',
+ proargmodes => '{i,i,i,o,o,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{pid, summary, timeout, name, ident, type, path, level, total_bytes, total_nblocks, free_bytes, free_chunks, used_bytes, num_agg_contexts}',
+ prosrc => 'pg_get_process_memory_contexts' },
+
# non-persistent series generator
{ oid => '1066', descr => 'non-persistent series generator',
proname => 'generate_series', prorows => '1000',
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 1bef98471c3..1e59a7f910f 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -96,6 +96,7 @@ extern PGDLLIMPORT volatile sig_atomic_t IdleSessionTimeoutPending;
extern PGDLLIMPORT volatile sig_atomic_t ProcSignalBarrierPending;
extern PGDLLIMPORT volatile sig_atomic_t LogMemoryContextPending;
extern PGDLLIMPORT volatile sig_atomic_t IdleStatsUpdateTimeoutPending;
+extern PGDLLIMPORT volatile sig_atomic_t PublishMemoryContextPending;
extern PGDLLIMPORT volatile sig_atomic_t CheckClientConnectionPending;
extern PGDLLIMPORT volatile sig_atomic_t ClientConnectionLost;
diff --git a/src/include/storage/lwlocklist.h b/src/include/storage/lwlocklist.h
index 06a1ffd4b08..4c71d756a2d 100644
--- a/src/include/storage/lwlocklist.h
+++ b/src/include/storage/lwlocklist.h
@@ -135,3 +135,4 @@ PG_LWLOCKTRANCHE(SUBTRANS_SLRU, SubtransSLRU)
PG_LWLOCKTRANCHE(XACT_SLRU, XactSLRU)
PG_LWLOCKTRANCHE(PARALLEL_VACUUM_DSA, ParallelVacuumDSA)
PG_LWLOCKTRANCHE(AIO_URING_COMPLETION, AioUringCompletion)
+PG_LWLOCKTRANCHE(MEMORY_CONTEXT_KEYS, MemoryContextReportingKeys)
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index afeeb1ca019..345d5a0ecb1 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -35,6 +35,7 @@ typedef enum
PROCSIG_WALSND_INIT_STOPPING, /* ask walsenders to prepare for shutdown */
PROCSIG_BARRIER, /* global barrier interrupt */
PROCSIG_LOG_MEMORY_CONTEXT, /* ask backend to log the memory contexts */
+ PROCSIG_GET_MEMORY_CONTEXT, /* ask backend to send the memory contexts */
PROCSIG_PARALLEL_APPLY_MESSAGE, /* Message from parallel apply workers */
/* Recovery conflict reasons */
diff --git a/src/include/utils/memutils.h b/src/include/utils/memutils.h
index 7bbe5a36959..38451035fd4 100644
--- a/src/include/utils/memutils.h
+++ b/src/include/utils/memutils.h
@@ -18,7 +18,10 @@
#define MEMUTILS_H
#include "nodes/memnodes.h"
-
+#include "storage/condition_variable.h"
+#include "storage/lmgr.h"
+#include "utils/dsa.h"
+#include "lib/dshash.h"
/*
* MaxAllocSize, MaxAllocHugeSize
@@ -48,6 +51,23 @@
#define AllocHugeSizeIsValid(size) ((Size) (size) <= MaxAllocHugeSize)
+/*
+ * Memory Context reporting size limits.
+ */
+
+/* Max length of context name and ident */
+#define MEMORY_CONTEXT_IDENT_SHMEM_SIZE 64
+/* Maximum size (in bytes) of DSA area per process */
+#define MEMORY_CONTEXT_REPORT_MAX_PER_BACKEND ((size_t) (1 * 1024 * 1024))
+
+/*
+ * Maximum size per context. Actual size may be lower as this assumes the worst
+ * case of deepest path and longest identifiers (name and ident, thus the
+ * multiplication by 2). The path depth is limited to 100 like for memory
+ * context logging.
+ */
+#define MAX_MEMORY_CONTEXT_STATS_SIZE (sizeof(MemoryStatsEntry))
+#define MAX_MEMORY_CONTEXT_STATS_NUM (MEMORY_CONTEXT_REPORT_MAX_PER_BACKEND / MAX_MEMORY_CONTEXT_STATS_SIZE)
/*
* Standard top-level memory contexts.
@@ -319,4 +339,74 @@ pg_memory_is_all_zeros(const void *ptr, size_t len)
return true;
}
+/* Dynamic shared memory state for statistics per context */
+typedef struct MemoryStatsEntry
+{
+ char name[MEMORY_CONTEXT_IDENT_SHMEM_SIZE];
+ char ident[MEMORY_CONTEXT_IDENT_SHMEM_SIZE];
+ int path[100];
+ NodeTag type;
+ int path_length;
+ int levels;
+ int64 totalspace;
+ int64 nblocks;
+ int64 freespace;
+ int64 freechunks;
+ int num_agg_stats;
+} MemoryStatsEntry;
+
+/*
+ * Per backend dynamic shared hash entry for memory context statistics
+ * reporting.
+ */
+typedef struct MemoryStatsDSHashEntry
+{
+ char key[64];
+ ConditionVariable memcxt_cv;
+ bool stats_written;
+ int server_id;
+ int total_stats;
+ bool summary;
+ dsa_pointer memstats_dsa_pointer;
+} MemoryStatsDSHashEntry;
+
+static const dshash_parameters memctx_dsh_params = {
+ offsetof(MemoryStatsDSHashEntry, memcxt_cv),
+ sizeof(MemoryStatsDSHashEntry),
+ dshash_strcmp,
+ dshash_strhash,
+ dshash_strcpy
+};
+
+/*
+ * Used for storage of transient identifiers for pg_get_backend_memory_contexts
+ */
+typedef struct MemoryStatsContextId
+{
+ MemoryContext context;
+ int context_id;
+} MemoryStatsContextId;
+
+/*
+ * This is passed to MemoryContextStatsInternal to determine whether
+ * to print context statistics and, if so, whether to print them to the
+ * logs or to stderr.
+ */
+typedef enum PrintDestination
+{
+ PRINT_STATS_TO_STDERR = 0,
+ PRINT_STATS_TO_LOGS,
+ PRINT_STATS_NONE
+} PrintDestination;
+
+extern void ProcessGetMemoryContextInterrupt(void);
+extern void HandleGetMemoryContextInterrupt(void);
+extern void MemoryContextKeysShmemInit(void);
+extern Size MemoryContextKeysShmemSize(void);
+extern void MemoryContextStatsInternal(MemoryContext context, int level,
+ int max_level, int max_children,
+ MemoryContextCounters *totals,
+ PrintDestination print_location,
+ int *num_contexts);
+extern void AtProcExit_memstats_cleanup(int code, Datum arg);
#endif /* MEMUTILS_H */
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 83228cfca29..ae17d028ed3 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -232,3 +232,22 @@ select * from pg_timezone_abbrevs where abbrev = 'LMT';
LMT | @ 7 hours 52 mins 58 secs ago | f
(1 row)
+DO $$
+DECLARE
+ bg_writer_pid int;
+ r RECORD;
+BEGIN
+ SELECT pid from pg_stat_activity where backend_type='background writer'
+ INTO bg_writer_pid;
+
+ select type, name, ident
+ from pg_get_process_memory_contexts(bg_writer_pid, false, 20)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+ select type, name, ident
+ from pg_get_process_memory_contexts(pg_backend_pid(), false, 20)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+END $$;
+NOTICE: (AllocSet,TopMemoryContext,)
+NOTICE: (AllocSet,TopMemoryContext,)
diff --git a/src/test/regress/sql/sysviews.sql b/src/test/regress/sql/sysviews.sql
index 66179f026b3..d0917b6868e 100644
--- a/src/test/regress/sql/sysviews.sql
+++ b/src/test/regress/sql/sysviews.sql
@@ -101,3 +101,21 @@ select count(distinct utc_offset) >= 24 as ok from pg_timezone_abbrevs;
-- One specific case we can check without much fear of breakage
-- is the historical local-mean-time value used for America/Los_Angeles.
select * from pg_timezone_abbrevs where abbrev = 'LMT';
+
+DO $$
+DECLARE
+ bg_writer_pid int;
+ r RECORD;
+BEGIN
+ SELECT pid from pg_stat_activity where backend_type='background writer'
+ INTO bg_writer_pid;
+
+ select type, name, ident
+ from pg_get_process_memory_contexts(bg_writer_pid, false, 20)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+ select type, name, ident
+ from pg_get_process_memory_contexts(pg_backend_pid(), false, 20)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+END $$;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 37f26f6c6b7..63fd382d4a0 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1680,6 +1680,9 @@ MemoryContextData
MemoryContextId
MemoryContextMethodID
MemoryContextMethods
+MemoryStatsContextId
+MemoryStatsEntry
+MemoryStatsDSHashEntry
MemoryStatsPrintFunc
MergeAction
MergeActionState
--
2.34.1
Attachment: 0002-Test-fixes.patch (application/octet-stream)
From b537ccf2b5f56eedcb22f066e00a6d61aececa9e Mon Sep 17 00:00:00 2001
From: Rahila Syed <rahilasyed.90@gmail.com>
Date: Thu, 9 Oct 2025 12:07:20 +0530
Subject: [PATCH] Test fixes
---
src/backend/utils/adt/mcxtfuncs.c | 2 +
src/test/modules/Makefile | 1 +
.../test_memcontext_reporting/Makefile | 32 +++++++++
.../t/001_memcontext_inj.pl | 45 ++++++++++++
.../test_memcontext_reporting--1.0.sql | 7 ++
.../test_memcontext_reporting.c | 68 +++++++++++++++++++
.../test_memcontext_reporting.control | 4 ++
7 files changed, 159 insertions(+)
create mode 100644 src/test/modules/test_memcontext_reporting/Makefile
create mode 100644 src/test/modules/test_memcontext_reporting/t/001_memcontext_inj.pl
create mode 100644 src/test/modules/test_memcontext_reporting/test_memcontext_reporting--1.0.sql
create mode 100644 src/test/modules/test_memcontext_reporting/test_memcontext_reporting.c
create mode 100644 src/test/modules/test_memcontext_reporting/test_memcontext_reporting.control
diff --git a/src/backend/utils/adt/mcxtfuncs.c b/src/backend/utils/adt/mcxtfuncs.c
index aa4e0a2e670..8c3d3f321aa 100644
--- a/src/backend/utils/adt/mcxtfuncs.c
+++ b/src/backend/utils/adt/mcxtfuncs.c
@@ -27,6 +27,7 @@
#include "utils/array.h"
#include "utils/builtins.h"
#include "utils/hsearch.h"
+#include "utils/injection_point.h"
#include "utils/memutils.h"
#include "utils/wait_event_types.h"
@@ -541,6 +542,7 @@ pg_get_process_memory_contexts(PG_FUNCTION_ARGS)
PG_RETURN_NULL();
}
+ INJECTION_POINT("memcontext-client-crash", NULL);
while (1)
{
entry = dshash_find_or_insert(MemoryStatsDsHash, key, &found);
diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile
index 902a7954101..a31a2578c18 100644
--- a/src/test/modules/Makefile
+++ b/src/test/modules/Makefile
@@ -31,6 +31,7 @@ SUBDIRS = \
test_json_parser \
test_lfind \
test_lwlock_tranches \
+ test_memcontext_reporting \
test_misc \
test_oat_hooks \
test_parser \
diff --git a/src/test/modules/test_memcontext_reporting/Makefile b/src/test/modules/test_memcontext_reporting/Makefile
new file mode 100644
index 00000000000..01a7baa0263
--- /dev/null
+++ b/src/test/modules/test_memcontext_reporting/Makefile
@@ -0,0 +1,32 @@
+# src/test/modules/test_memcontext_reporting/Makefile
+
+EXTRA_INSTALL = src/test/modules/injection_points
+
+export enable_injection_points
+MODULE_big = test_memcontext_reporting
+OBJS = \
+ $(WIN32RES) \
+ test_memcontext_reporting.o
+PGFILEDESC = "test_memcontext_reporting - test code for memory context reporting"
+
+EXTENSION = test_memcontext_reporting
+DATA = test_memcontext_reporting--1.0.sql
+
+REGRESS = test_memcontext_reporting
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/test_memcontext_reporting
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
+
+check:
+ $(prove_check)
+
+installcheck:
+ $(prove_installcheck)
diff --git a/src/test/modules/test_memcontext_reporting/t/001_memcontext_inj.pl b/src/test/modules/test_memcontext_reporting/t/001_memcontext_inj.pl
new file mode 100644
index 00000000000..842f32376fd
--- /dev/null
+++ b/src/test/modules/test_memcontext_reporting/t/001_memcontext_inj.pl
@@ -0,0 +1,45 @@
+# Copyright (c) 2025, PostgreSQL Global Development Group
+
+# Test error handling in memory context statistics reporting: crash the
+# requesting client with an injection point, then check that another
+# client can still fetch statistics from the same process
+
+use strict;
+use warnings FATAL => 'all';
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+if ($ENV{enable_injection_points} ne 'yes')
+{
+ plan skip_all => 'Injection points not supported by this build';
+}
+
+# Create and start a cluster with one node
+my $node = PostgreSQL::Test::Cluster->new('main');
+$node->init(allows_streaming => 1, no_data_checksums => 1);
+# max_connections need to be bumped in order to accommodate for pgbench clients
+# and log_statement is dialled down since it otherwise will generate enormous
+# amounts of logging. Page verification failures are still logged.
+$node->append_conf(
+ 'postgresql.conf',
+ qq[
+max_connections = 100
+log_statement = none
+]);
+$node->start;
+$node->safe_psql('postgres', 'CREATE EXTENSION test_memcontext_reporting;');
+# Attaching to an injection point that crashes memory context client
+$node->safe_psql('postgres', 'SELECT memcontext_crash_client();');
+
+my $pid = $node->safe_psql('postgres', "SELECT pid from pg_stat_activity where backend_type='checkpointer'");
+
+#Client should have crashed
+$node->safe_psql('postgres', "select pg_get_process_memory_contexts($pid, true, 5);");
+print "PID";
+print $pid;
+#Query the same process for memory context using some other client and it should succeed.
+my $topcontext_name = $node->safe_psql('postgres', "select name from pg_get_process_memory_contexts($pid, true, 5) where path = '{1}';");
+ok($topcontext_name eq 'TopMemoryContext');
+done_testing();
diff --git a/src/test/modules/test_memcontext_reporting/test_memcontext_reporting--1.0.sql b/src/test/modules/test_memcontext_reporting/test_memcontext_reporting--1.0.sql
new file mode 100644
index 00000000000..7f628bf24e2
--- /dev/null
+++ b/src/test/modules/test_memcontext_reporting/test_memcontext_reporting--1.0.sql
@@ -0,0 +1,7 @@
+CREATE FUNCTION memcontext_crash_server()
+RETURNS pg_catalog.void
+AS 'MODULE_PATHNAME' LANGUAGE C;
+
+CREATE FUNCTION memcontext_crash_client()
+RETURNS pg_catalog.void
+AS 'MODULE_PATHNAME' LANGUAGE C;
diff --git a/src/test/modules/test_memcontext_reporting/test_memcontext_reporting.c b/src/test/modules/test_memcontext_reporting/test_memcontext_reporting.c
new file mode 100644
index 00000000000..f77875b437f
--- /dev/null
+++ b/src/test/modules/test_memcontext_reporting/test_memcontext_reporting.c
@@ -0,0 +1,68 @@
+/*
+ * -------------------------------------------------------------------------
+ *
+ * Copyright (c) 2025, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/test/modules/test_memcontext_reporting/test_memcontext_reporting.c
+ *
+ * -------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+#include "utils/injection_point.h"
+#include "funcapi.h"
+#include "utils/injection_point.h"
+
+PG_MODULE_MAGIC;
+
+void crash(const char *name, const void *private_data, void *arg);
+
+void
+crash(const char *name, const void *private_data, void *arg)
+{
+ abort();
+}
+
+/*
+ * memcontext_crash_client
+ *
+ * Ensure that the client process aborts in between memory context
+ * reporting.
+ */
+PG_FUNCTION_INFO_V1(memcontext_crash_client);
+Datum
+memcontext_crash_client(PG_FUNCTION_ARGS)
+{
+#ifdef USE_INJECTION_POINTS
+ InjectionPointAttach("memcontext-client-crash",
+ "test_memcontext_reporting", "crash", NULL, 0);
+
+#else
+ elog(ERROR,
+ "test is not working as intended when injection points are disabled");
+#endif
+ PG_RETURN_VOID();
+}
+
+/*
+ * memcontext_crash_server
+ *
+ * Ensure that the server process crashes in between memory context
+ * reporting.
+ */
+PG_FUNCTION_INFO_V1(memcontext_crash_server);
+Datum
+memcontext_crash_server(PG_FUNCTION_ARGS)
+{
+#ifdef USE_INJECTION_POINTS
+ InjectionPointAttach("memcontext-server-crash",
+ "test_memcontext_reporting", "crash", NULL, 0);
+
+#else
+ elog(ERROR,
+ "test is not working as intended when injection points are disabled");
+#endif
+ PG_RETURN_VOID();
+}
+
diff --git a/src/test/modules/test_memcontext_reporting/test_memcontext_reporting.control b/src/test/modules/test_memcontext_reporting/test_memcontext_reporting.control
new file mode 100644
index 00000000000..48b501682d5
--- /dev/null
+++ b/src/test/modules/test_memcontext_reporting/test_memcontext_reporting.control
@@ -0,0 +1,4 @@
+comment = 'Test code for memcontext reporting'
+default_version = '1.0'
+module_pathname = '$libdir/test_memcontext_reporting'
+relocatable = true
--
2.34.1
On 2025-10-09 17:43, Rahila Syed wrote:
Hi Torikoshia,
Thank you for testing and reviewing the patch.
This issue occurs on my M1 Mac, but I couldn’t reproduce it on Ubuntu,
so it might be environment-dependent.
Looking at the logs, Assert() is failing:
2025-10-07 08:48:26.766 JST [local] psql [23626] WARNING: 01000: server process 23646 is processing previous request
2025-10-07 08:48:26.766 JST [local] psql [23626] LOCATION: pg_get_process_memory_contexts, mcxtfuncs.c:476
TRAP: failed Assert("victim->magic == FREE_PAGE_SPAN_LEADER_MAGIC"), File: "freepage.c", Line: 1379, PID: 23626
0 postgres 0x000000010357fdf4 ExceptionalCondition + 216
1 postgres 0x00000001035cbe18 FreePageManagerGetInternal + 684
2 postgres 0x00000001035cbb18 FreePageManagerGet + 40
3 postgres 0x00000001035c84cc dsa_allocate_extended + 788
4 postgres 0x0000000103453af0 pg_get_process_memory_contexts + 992
5 postgres 0x0000000103007e94 ExecMakeFunctionResultSet + 616
6 postgres 0x00000001030506b8 ExecProjectSRF + 304
7 postgres 0x0000000103050434 ExecProjectSet + 268
8 postgres 0x0000000103003270 ExecProcNodeFirst + 92
9 postgres 0x0000000102ffa398 ExecProcNode + 60
10 postgres 0x0000000102ff5050 ExecutePlan + 244
11 postgres 0x0000000102ff4ee0 standard_ExecutorRun + 456
12 postgres 0x0000000102ff4d08 ExecutorRun + 84
13 postgres 0x0000000103341c84 PortalRunSelect + 296
14 postgres 0x0000000103341694 PortalRun + 656
15 postgres 0x000000010333c4bc exec_simple_query + 1388
16 postgres 0x000000010333b5d0 PostgresMain + 3252
17 postgres 0x0000000103332750 BackendInitialize + 0
18 postgres 0x0000000103209e48 postmaster_child_launch + 456
19 postgres 0x00000001032118c8 BackendStartup + 304
20 postgres 0x000000010320f72c ServerLoop + 372
21 postgres 0x000000010320e1e4 PostmasterMain + 6448
22 postgres 0x0000000103094b0c main + 924
23 dyld 0x0000000199dc2b98 start + 6076
Could you please check if you can reproduce this crash on your environment?
I haven't been able to reproduce this issue on Ubuntu. A colleague also tested it on their Mac
and didn't encounter the problem. I do have a fix in this area that I believe should address an edge
case where data might be written to freed DSA memory.
Kindly test using the v35 patch and let me know if you still see the issue.
Thanks for the update.
v35 works fine on my environment.
I ran the same test and haven’t encountered the crash anymore.
The addition of the following code appears to have resolved the issue:
+memstats_dsa_cleanup(char *key)
+{
+ MemoryStatsDSHashEntry *entry;
+
+ entry = dshash_find(MemoryStatsDsHash, key, true);
Since you seem to be preparing the next version of the patch, I understand
v35 is an interim patch, so this isn’t a major concern, but I encountered
trailing whitespace warnings when applying the patches:
$ git apply 0001-v35-0001-Add-pg_get_process_memory_context-function.patch
0001-v35-0001-Add-pg_get_process_memory_context-function.patch:705: trailing whitespace.
0001-v35-0001-Add-pg_get_process_memory_context-function.patch:1066: trailing whitespace.
--
Regards,
--
Atsushi Torikoshi
Seconded from NTT DATA Japan Corporation to SRA OSS K.K.
Hi,
PFA an updated v39 patch which is ready for review in the upcoming
commitfest.
v35 works fine on my environment.
I ran the same test and haven’t encountered the crash anymore.
Thank you for testing and confirming the fix.
The addition of the following code appears to have resolved the issue:
+memstats_dsa_cleanup(char *key)
+{
+ MemoryStatsDSHashEntry *entry;
+
+ entry = dshash_find(MemoryStatsDsHash, key, true);
Yes, without this code, the dsa memory was being freed in the timeout path
without acquiring a lock.
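The locking rule described above can be sketched in isolation. This is only an illustration, not the patch's code: a pthread mutex stands in for the dshash entry lock, a malloc'd buffer stands in for the DSA allocation behind memstats_dsa_pointer, and the names StatsEntry, stats_entry_init, stats_entry_publish and stats_entry_cleanup are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a shared hash entry that owns a published buffer. */
typedef struct StatsEntry
{
    pthread_mutex_t lock;       /* plays the role of the dshash entry lock */
    int        *stats;          /* plays the role of memstats_dsa_pointer */
    bool        stats_written;
} StatsEntry;

static void
stats_entry_init(StatsEntry *entry)
{
    assert(entry != NULL);
    pthread_mutex_init(&entry->lock, NULL);
    entry->stats = NULL;
    entry->stats_written = false;
}

/* Publisher side: allocate and write only while holding the lock. */
static void
stats_entry_publish(StatsEntry *entry, int value)
{
    pthread_mutex_lock(&entry->lock);
    if (entry->stats == NULL)
        entry->stats = malloc(sizeof(int));
    if (entry->stats != NULL)
    {
        *entry->stats = value;
        entry->stats_written = true;
    }
    pthread_mutex_unlock(&entry->lock);
}

/*
 * Timeout path: free under the same lock, so a concurrent publisher can
 * never write into memory that is being released. Returns true if a
 * buffer was actually freed.
 */
static bool
stats_entry_cleanup(StatsEntry *entry)
{
    bool        freed = false;

    pthread_mutex_lock(&entry->lock);
    if (entry->stats != NULL)
    {
        free(entry->stats);
        entry->stats = NULL;
        entry->stats_written = false;
        freed = true;
    }
    pthread_mutex_unlock(&entry->lock);
    return freed;
}
```

Because both paths take the same lock and the cleanup path resets the pointer, a second cleanup call is a harmless no-op rather than a double free.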
Since you seem to be preparing the next version of the patch, I understand
v35 is an interim patch, so this isn’t a major concern, but I encountered
trailing whitespace warnings when applying the patches.
$ git apply 0001-v35-0001-Add-pg_get_process_memory_context-function.patch
0001-v35-0001-Add-pg_get_process_memory_context-function.patch:705: trailing whitespace.
0001-v35-0001-Add-pg_get_process_memory_context-function.patch:1066: trailing whitespace.
Thanks, should be fixed now.
The updated patch contains the following changes. These changes are
addressing some review comments
discussed off list and a couple of bugs found while doing injection points
tests.
1.
All the changes made to MemoryContextStatsInternal and
MemoryContextStatsDetail are removed.
Instead of modifying these functions, I have written a separate function
MemoryContextStatsCounter
that takes care of counting statistics. This approach ensures that the
existing functions remain unchanged.
2. Changes to ensure that the wait loop does not exceed the prescribed wait
time.
Additional exit condition has been added to the infinite loop that waits
for request completion.
This allows the pg_get_memoy_context_statistics function to return if the
elapsed time goes beyond
a set limit i.e the following timeout.
3. The user facing timeout is removed as that would complicate the user
interface. CFIs
are called frequently and the requests are likely to be addressed promptly.
A predefined macro MEMORY_CONTEXT_STATS_TIMEOUT 5 (secs) is used for
timeout
instead. This would also remove the possibility of a user setting very low
timeouts, which
could cause requests to be incomplete and result in NULL outputs.
4. Miscellaneous cleanups to improve comments and remove left over comments
from older
versions. Also, removed an unnecessary argument from the
PublishMemoryContext function.
5. Addressed Torikoshia's suggestion to change the order of columns to match
pg_backend_memory_contexts.
6. Attached is a test module that tests error handling by introducing
errors using injection points. I have resolved a few bugs, so the memory
monitoring function now runs correctly after a previous request has ended
with an error.
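
For readers following items 2 and 3, the bounded wait can be sketched in
plain C roughly as below. This is an illustration only, not code from the
patch: wait_for_stats, stats_ready_fn, the two example callbacks and
STATS_TIMEOUT_SECS are hypothetical names, and nanosleep() stands in for the
condition variable sleep that the backend would use.

```c
#define _POSIX_C_SOURCE 199309L

#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for the hardcoded timeout, in seconds. */
#define STATS_TIMEOUT_SECS 5

/* Callback reporting whether the publisher has written its statistics. */
typedef bool (*stats_ready_fn) (void *arg);

/* Milliseconds elapsed on the monotonic clock since "start". */
static long
elapsed_ms(const struct timespec *start)
{
	struct timespec now;

	clock_gettime(CLOCK_MONOTONIC, &now);
	return (now.tv_sec - start->tv_sec) * 1000L +
		(now.tv_nsec - start->tv_nsec) / 1000000L;
}

/*
 * Poll "ready" until it returns true or "timeout_ms" has elapsed overall.
 * Each sleep is capped at the time remaining, so repeated wakeups cannot
 * stretch the total wait beyond the limit.
 */
static bool
wait_for_stats(stats_ready_fn ready, void *arg, long timeout_ms, long slice_ms)
{
	struct timespec start;

	assert(timeout_ms >= 0 && slice_ms > 0);
	clock_gettime(CLOCK_MONOTONIC, &start);

	for (;;)
	{
		long		remaining;
		struct timespec nap;

		if (ready(arg))
			return true;

		remaining = timeout_ms - elapsed_ms(&start);
		if (remaining <= 0)
			return false;		/* the caller would report NULL here */

		if (remaining > slice_ms)
			remaining = slice_ms;
		nap.tv_sec = remaining / 1000;
		nap.tv_nsec = (remaining % 1000) * 1000000L;
		nanosleep(&nap, NULL);
	}
}

/* Example callback: the publisher never responds. */
static bool
never_ready(void *arg)
{
	(void) arg;
	return false;
}

/* Example callback: the publisher responds once *arg polls have happened. */
static bool
ready_after(void *arg)
{
	int		   *polls_left = arg;

	return --(*polls_left) <= 0;
}
```

A caller would pass STATS_TIMEOUT_SECS * 1000L as timeout_ms and treat a
false return as the "return NULL" case. Checking the elapsed time before
every sleep, rather than relying on a single sleep timeout, is what keeps
the total wait within the limit when the loop is woken up repeatedly.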
Thank you,
Rahila Syed
Attachments:
v39-0001-Add-function-to-report-memory-context-statistics.patch (application/octet-stream)
From 2bd097622985e2c800ee0ca00c9bb08887ea66f7 Mon Sep 17 00:00:00 2001
From: Rahila Syed <rahilasyed.90@gmail.com>
Date: Thu, 23 Oct 2025 17:31:52 +0530
Subject: [PATCH 1/2] Add function to report memory context statistics
This adds a function for retrieving memory context statistics
and information from backends as well as auxiliary processes.
The intended use case is cluster debugging when under memory
pressure or unanticipated memory usage characteristics.
When called, the function sends a signal to the specified
process, asking it to write statistics about its memory
contexts into dynamic shared memory. Each memory context is
returned in detail, followed by a cumulative total in case the
number of contexts exceeds what fits in the allocated shared
memory. Each process is limited to at most 1MB of memory for this.
A summary can also be explicitly requested by the user; this
returns the TopMemoryContext and a cumulative total of
all lower contexts.
In order to not block on busy processes, we have hardcoded
the number of seconds during which to retry before timing out.
If no statistics are published within that timeout, NULL is
returned.
---
doc/src/sgml/func/func-admin.sgml | 157 +++
src/backend/catalog/system_views.sql | 5 +
src/backend/postmaster/autovacuum.c | 4 +
src/backend/postmaster/checkpointer.c | 4 +
src/backend/postmaster/interrupt.c | 4 +
src/backend/postmaster/pgarch.c | 4 +
src/backend/postmaster/startup.c | 4 +
src/backend/postmaster/walsummarizer.c | 4 +
src/backend/storage/ipc/ipci.c | 3 +
src/backend/storage/ipc/procsignal.c | 3 +
src/backend/storage/lmgr/proc.c | 1 +
src/backend/tcop/postgres.c | 3 +
.../utils/activity/wait_event_names.txt | 2 +
src/backend/utils/adt/mcxtfuncs.c | 910 +++++++++++++++++-
src/backend/utils/init/globals.c | 1 +
src/backend/utils/init/postinit.c | 7 +
src/backend/utils/mmgr/mcxt.c | 29 +
src/include/catalog/pg_proc.dat | 10 +
src/include/miscadmin.h | 1 +
src/include/storage/lwlocklist.h | 1 +
src/include/storage/procsignal.h | 1 +
src/include/utils/memutils.h | 81 +-
src/test/regress/expected/sysviews.out | 19 +
src/test/regress/sql/sysviews.sql | 18 +
src/tools/pgindent/typedefs.list | 3 +
25 files changed, 1255 insertions(+), 24 deletions(-)
diff --git a/doc/src/sgml/func/func-admin.sgml b/doc/src/sgml/func/func-admin.sgml
index 1b465bc8ba7..a5c66837241 100644
--- a/doc/src/sgml/func/func-admin.sgml
+++ b/doc/src/sgml/func/func-admin.sgml
@@ -251,6 +251,130 @@
<literal>false</literal> is returned.
</para></entry>
</row>
+
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>pg_get_process_memory_contexts</primary>
+ </indexterm>
+ <function>pg_get_process_memory_contexts</function> ( <parameter>pid</parameter> <type>integer</type>, <parameter>summary</parameter> <type>boolean</type> )
+ <returnvalue>setof record</returnvalue>
+ ( <parameter>name</parameter> <type>text</type>,
+ <parameter>ident</parameter> <type>text</type>,
+ <parameter>type</parameter> <type>text</type>,
+ <parameter>level</parameter> <type>integer</type>,
+ <parameter>path</parameter> <type>integer[]</type>,
+ <parameter>total_bytes</parameter> <type>bigint</type>,
+ <parameter>total_nblocks</parameter> <type>bigint</type>,
+ <parameter>free_bytes</parameter> <type>bigint</type>,
+ <parameter>free_chunks</parameter> <type>bigint</type>,
+ <parameter>used_bytes</parameter> <type>bigint</type>,
+ <parameter>num_agg_contexts</parameter> <type>integer</type> )
+ </para>
+ <para>
+ This function handles requests to display the memory contexts of a
+ <productname>PostgreSQL</productname> process with the specified
+ process ID. The function can be used to send requests to backends as
+ well as <glossterm linkend="glossary-auxiliary-proc">auxiliary processes</glossterm>.
+ </para>
+ <para>
+ The returned record contains extended statistics for each memory
+ context:
+ <itemizedlist spacing="compact">
+ <listitem>
+ <para>
+ <parameter>name</parameter> - The name of the memory context.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>ident</parameter> - Memory context ID (if any).
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>type</parameter> - The type of memory context, possible
+ values are: AllocSet, Generation, Slab and Bump.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>level</parameter> - The level in the tree of the current
+ memory context.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>path</parameter> - Memory contexts are organized in a
+ tree model with TopMemoryContext as the root, and all other memory
+ contexts as nodes in the tree. The <parameter>path</parameter>
+ displays the path from the root to the current memory context. The
+ path is limited to 100 children per node, with each node limited
+ to a max depth of 100, to preserve memory during reporting. The
+ printed path will also be limited to 100 nodes counting from the
+ TopMemoryContext.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>total_bytes</parameter> - The total number of bytes
+ allocated to this memory context.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>total_nblocks</parameter> - The total number of blocks
+ used for the allocated memory.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>free_bytes</parameter> - The amount of free memory in
+ this memory context.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>free_chunks</parameter> - The number of chunks that
+ <parameter>free_bytes</parameter> corresponds to.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>used_bytes</parameter> - The total number of bytes
+ currently occupied.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>num_agg_contexts</parameter> - The number of memory
+ contexts aggregated in the displayed statistics.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+ <para>
+ When <parameter>summary</parameter> is <literal>true</literal>, statistics
+ for memory contexts at levels 1 and 2 are displayed, with level 1
+ representing the root node (i.e., <literal>TopMemoryContext</literal>).
+ Statistics for contexts on level 2 and below are aggregates of all
+ child contexts' statistics, where <literal>num_agg_contexts</literal>
+ indicates the number of aggregated child contexts. When
+ <parameter>summary</parameter> is <literal>false</literal>,
+ the <literal>num_agg_contexts</literal> value is <literal>1</literal>,
+ indicating that individual statistics are being displayed.
+ </para>
+ <para>
+ After receiving memory context statistics from the target process, it
+ returns the results as one row per context. If all the contexts don't
+ fit within the pre-determined size limit, the remaining context
+ statistics are aggregated and a cumulative total is displayed. The
+ <literal>num_agg_contexts</literal> column indicates the number of
+ contexts aggregated in the displayed statistics. When
+ <literal>num_agg_contexts</literal> is <literal>1</literal> it means
+ that the context statistics are displayed separately.
+ </para></entry>
+ </row>
</tbody>
</tgroup>
</table>
@@ -302,6 +426,39 @@ LOG: Grand total: 1651920 bytes in 201 blocks; 622360 free (88 chunks); 1029560
because it may generate a large number of log messages.
</para>
+ <para>
+ <function>pg_get_process_memory_contexts</function> can be used to request
+ memory context statistics of any <productname>PostgreSQL</productname>
+ process. For example:
+<programlisting>
+postgres=# SELECT * FROM pg_get_process_memory_contexts(
+ (SELECT pid FROM pg_stat_activity
+ WHERE backend_type = 'checkpointer'),
+ false) LIMIT 1;
+-[ RECORD 1 ]----+------------------------------
+name | TopMemoryContext
+ident |
+type | AllocSet
+level | 1
+path | {1}
+total_bytes | 90304
+total_nblocks | 3
+free_bytes | 2880
+free_chunks | 1
+used_bytes | 87424
+num_agg_contexts | 1
+</programlisting>
+ <note>
+ <para>
+ While <function>pg_get_process_memory_contexts</function> can be used to
+ query memory contexts of the local backend,
+ <structname>pg_backend_memory_contexts</structname>
+ (see <xref linkend="view-pg-backend-memory-contexts"/> for more details)
+ will be less resource intensive when only the local backend is of interest.
+ </para>
+ </note>
+ </para>
+
</sect2>
<sect2 id="functions-admin-backup">
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 823776c1498..3bb2a9d3c9b 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -692,6 +692,11 @@ GRANT SELECT ON pg_backend_memory_contexts TO pg_read_all_stats;
REVOKE EXECUTE ON FUNCTION pg_get_backend_memory_contexts() FROM PUBLIC;
GRANT EXECUTE ON FUNCTION pg_get_backend_memory_contexts() TO pg_read_all_stats;
+REVOKE EXECUTE ON FUNCTION
+ pg_get_process_memory_contexts(integer, boolean) FROM PUBLIC;
+GRANT EXECUTE ON FUNCTION
+ pg_get_process_memory_contexts(integer, boolean) TO pg_read_all_stats;
+
-- Statistics views
CREATE VIEW pg_stat_all_tables AS
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 5084af7dfb6..8bf6f6eb743 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -791,6 +791,10 @@ ProcessAutoVacLauncherInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
/* Process sinval catchup interrupts that happened while sleeping */
ProcessCatchupInterrupt();
}
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index e84e8663e96..5b3e08805bf 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -679,6 +679,10 @@ ProcessCheckpointerInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/postmaster/interrupt.c b/src/backend/postmaster/interrupt.c
index ba63b84dfc5..29454b8bf8a 100644
--- a/src/backend/postmaster/interrupt.c
+++ b/src/backend/postmaster/interrupt.c
@@ -48,6 +48,10 @@ ProcessMainLoopInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 78e39e5f866..ac97a39447c 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -867,6 +867,10 @@ ProcessPgArchInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
if (ConfigReloadPending)
{
char *archiveLib = pstrdup(XLogArchiveLibrary);
diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c
index 27e86cf393f..7149a67fcbc 100644
--- a/src/backend/postmaster/startup.c
+++ b/src/backend/postmaster/startup.c
@@ -192,6 +192,10 @@ ProcessStartupProcInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
diff --git a/src/backend/postmaster/walsummarizer.c b/src/backend/postmaster/walsummarizer.c
index e1f142f20c7..c711c887ef6 100644
--- a/src/backend/postmaster/walsummarizer.c
+++ b/src/backend/postmaster/walsummarizer.c
@@ -879,6 +879,10 @@ ProcessWalSummarizerInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 2fa045e6b0f..fe3d32e40b0 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -51,6 +51,7 @@
#include "storage/sinvaladt.h"
#include "utils/guc.h"
#include "utils/injection_point.h"
+#include "utils/memutils.h"
/* GUCs */
int shared_memory_type = DEFAULT_SHARED_MEMORY_TYPE;
@@ -150,6 +151,7 @@ CalculateShmemSize(int *num_semaphores)
size = add_size(size, InjectionPointShmemSize());
size = add_size(size, SlotSyncShmemSize());
size = add_size(size, AioShmemSize());
+ size = add_size(size, MemoryContextKeysShmemSize() + sizeof(LWLockPadded));
/* include additional requested shmem from preload libraries */
size = add_size(size, total_addin_request);
@@ -343,6 +345,7 @@ CreateOrAttachShmemStructs(void)
WaitEventCustomShmemInit();
InjectionPointShmemInit();
AioShmemInit();
+ MemoryContextKeysShmemInit();
}
/*
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index 087821311cc..8963285cc12 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -691,6 +691,9 @@ procsignal_sigusr1_handler(SIGNAL_ARGS)
if (CheckProcSignal(PROCSIG_LOG_MEMORY_CONTEXT))
HandleLogMemoryContextInterrupt();
+ if (CheckProcSignal(PROCSIG_GET_MEMORY_CONTEXT))
+ HandleGetMemoryContextInterrupt();
+
if (CheckProcSignal(PROCSIG_PARALLEL_APPLY_MESSAGE))
HandleParallelApplyMessageInterrupt();
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 96f29aafc39..550a3a77bb8 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -50,6 +50,7 @@
#include "storage/procsignal.h"
#include "storage/spin.h"
#include "storage/standby.h"
+#include "utils/memutils.h"
#include "utils/timeout.h"
#include "utils/timestamp.h"
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 7dd75a490aa..e726f40dfbb 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3539,6 +3539,9 @@ ProcessInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
if (ParallelApplyMessagePending)
ProcessParallelApplyMessages();
}
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 7553f6eacef..cdbc7309206 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -160,6 +160,7 @@ WAL_RECEIVER_EXIT "Waiting for the WAL receiver to exit."
WAL_RECEIVER_WAIT_START "Waiting for startup process to send initial data for streaming replication."
WAL_SUMMARY_READY "Waiting for a new WAL summary to be generated."
XACT_GROUP_UPDATE "Waiting for the group leader to update transaction status at transaction end."
+MEM_CXT_PUBLISH "Waiting for a process to publish memory information."
ABI_compatibility:
@@ -401,6 +402,7 @@ SubtransSLRU "Waiting to access the sub-transaction SLRU cache."
XactSLRU "Waiting to access the transaction status SLRU cache."
ParallelVacuumDSA "Waiting for parallel vacuum dynamic shared memory allocation."
AioUringCompletion "Waiting for another process to complete IO via io_uring."
+MemoryContextReportingKeys "Waiting for another process to complete reading or writing the memory reporting keys."
# No "ABI_compatibility" region here as WaitEventLWLock has its own C code.
diff --git a/src/backend/utils/adt/mcxtfuncs.c b/src/backend/utils/adt/mcxtfuncs.c
index fe6dce9cba3..a62f3d6dc93 100644
--- a/src/backend/utils/adt/mcxtfuncs.c
+++ b/src/backend/utils/adt/mcxtfuncs.c
@@ -15,20 +15,51 @@
#include "postgres.h"
+#include "access/twophase.h"
+#include "catalog/pg_authid_d.h"
#include "funcapi.h"
#include "mb/pg_wchar.h"
+#include "miscadmin.h"
+#include "storage/dsm_registry.h"
#include "storage/proc.h"
#include "storage/procarray.h"
+#include "utils/acl.h"
#include "utils/array.h"
#include "utils/builtins.h"
#include "utils/hsearch.h"
+#include "utils/injection_point.h"
+#include "utils/memutils.h"
+#include "utils/wait_event_types.h"
+
+#define CLIENT_KEY_SIZE 64
+
+static LWLock *client_keys_lock = NULL;
+static int *client_keys = NULL;
+static dshash_table *MemoryStatsDsHash = NULL;
+static dsa_area *MemoryStatsDsaArea = NULL;
+
+static void memstats_dsa_cleanup(char *key);
+static const char *ContextTypeToString(NodeTag type);
+static void PublishMemoryContext(MemoryStatsEntry *memcxt_info,
+ int curr_id, MemoryContext context,
+ List *path,
+ MemoryContextCounters stat,
+ int num_contexts);
+static List *compute_context_path(MemoryContext c, HTAB *context_id_lookup);
+static void end_memorycontext_reporting(MemoryStatsDSHashEntry *entry, MemoryContext oldcontext,
+ HTAB *context_id_lookup);
/* ----------
* The max bytes for showing identifiers of MemoryContext.
+ * This is used by pg_get_backend_memory_contexts, which serves the local backend.
* ----------
*/
#define MEMORY_CONTEXT_IDENT_DISPLAY_SIZE 1024
+#define MAX_PATH_DISPLAY_LENGTH 100
+/* Timeout in seconds */
+#define MEMORY_STATS_MAX_TIMEOUT 5
+
/*
* MemoryContextId
* Used for storage of transient identifiers for
@@ -89,7 +120,7 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
*/
for (MemoryContext cur = context; cur != NULL; cur = cur->parent)
{
- MemoryContextId *entry;
+ MemoryStatsContextId *entry;
bool found;
entry = hash_search(context_id_lookup, &cur, HASH_FIND, &found);
@@ -143,24 +174,7 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
else
nulls[1] = true;
- switch (context->type)
- {
- case T_AllocSetContext:
- type = "AllocSet";
- break;
- case T_GenerationContext:
- type = "Generation";
- break;
- case T_SlabContext:
- type = "Slab";
- break;
- case T_BumpContext:
- type = "Bump";
- break;
- default:
- type = "???";
- break;
- }
+ type = ContextTypeToString(context->type);
values[2] = CStringGetTextDatum(type);
values[3] = Int32GetDatum(list_length(path)); /* level */
@@ -175,6 +189,38 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
list_free(path);
}
+/*
+ * ContextTypeToString
+ * Returns a textual representation of a context type
+ *
+ * This should cover the same types as MemoryContextIsValid.
+ */
+const char *
+ContextTypeToString(NodeTag type)
+{
+ const char *context_type;
+
+ switch (type)
+ {
+ case T_AllocSetContext:
+ context_type = "AllocSet";
+ break;
+ case T_GenerationContext:
+ context_type = "Generation";
+ break;
+ case T_SlabContext:
+ context_type = "Slab";
+ break;
+ case T_BumpContext:
+ context_type = "Bump";
+ break;
+ default:
+ context_type = "???";
+ break;
+ }
+ return context_type;
+}
+
/*
* pg_get_backend_memory_contexts
* SQL SRF showing backend memory context.
@@ -189,7 +235,7 @@ pg_get_backend_memory_contexts(PG_FUNCTION_ARGS)
HTAB *context_id_lookup;
ctl.keysize = sizeof(MemoryContext);
- ctl.entrysize = sizeof(MemoryContextId);
+ ctl.entrysize = sizeof(MemoryStatsContextId);
ctl.hcxt = CurrentMemoryContext;
context_id_lookup = hash_create("pg_get_backend_memory_contexts",
@@ -216,7 +262,7 @@ pg_get_backend_memory_contexts(PG_FUNCTION_ARGS)
foreach_ptr(MemoryContextData, cur, contexts)
{
- MemoryContextId *entry;
+ MemoryStatsContextId *entry;
bool found;
/*
@@ -224,8 +270,8 @@ pg_get_backend_memory_contexts(PG_FUNCTION_ARGS)
* PutMemoryContextsStatsTupleStore needs this to populate the "path"
* column with the parent context_ids.
*/
- entry = (MemoryContextId *) hash_search(context_id_lookup, &cur,
- HASH_ENTER, &found);
+ entry = (MemoryStatsContextId *) hash_search(context_id_lookup, &cur,
+ HASH_ENTER, &found);
entry->context_id = context_id++;
Assert(!found);
@@ -305,3 +351,821 @@ pg_log_backend_memory_contexts(PG_FUNCTION_ARGS)
PG_RETURN_BOOL(true);
}
+
+/*
+ * pg_get_process_memory_contexts
+ * Signal a backend or an auxiliary process to send its memory contexts,
+ * wait for the results and display them.
+ *
+ * By default, only superusers or users with ROLE_PG_READ_ALL_STATS are allowed
+ * to signal a process to return the memory contexts. This is because allowing
+ * any users to issue this request at an unbounded rate would cause lots of
+ * requests to be sent, which can lead to denial of service. Additional roles
+ * can be permitted with GRANT.
+ *
+ * On receipt of this signal, a backend or an auxiliary process sets the flag
+ * in the signal handler, which causes the next CHECK_FOR_INTERRUPTS()
+ * or process-specific interrupt handler to copy the memory context details
+ * to a dynamic shared memory space.
+ *
+ * We have defined a limit on DSA memory that could be allocated per process -
+ * if the process has more memory contexts than what can fit in the allocated
+ * size, the excess contexts are summarized and represented as cumulative total
+ * at the end of the buffer.
+ *
+ * After sending the signal, wait on a condition variable. The publishing
+ * backend, after copying the data to shared memory, sends signal on that
+ * condition variable. There is one condition variable per client process.
+ * Once the condition variable is signalled, check if the latest memory context
+ * information is available and display.
+ *
+ * If the publishing backend does not respond before the condition variable
+ * times out, which is set to a predefined value MEMORY_STATS_MAX_TIMEOUT, give up
+ * and return NULL.
+ */
+Datum
+pg_get_process_memory_contexts(PG_FUNCTION_ARGS)
+{
+ int pid = PG_GETARG_INT32(0);
+ bool summary = PG_GETARG_BOOL(1);
+ PGPROC *proc;
+ ProcNumber procNumber = INVALID_PROC_NUMBER;
+ bool proc_is_aux = false;
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ MemoryStatsEntry *memcxt_info;
+ MemoryStatsDSHashEntry *entry;
+ bool found;
+ char key[CLIENT_KEY_SIZE];
+ TimestampTz start_timestamp;
+
+ /*
+ * See if the process with given pid is a backend or an auxiliary process
+ * and remember the type for when we requery the process later.
+ */
+ proc = BackendPidGetProc(pid);
+ if (proc == NULL)
+ {
+ proc = AuxiliaryPidGetProc(pid);
+ proc_is_aux = true;
+ }
+
+ /*
+ * BackendPidGetProc() and AuxiliaryPidGetProc() return NULL if the pid
+ * isn't valid; this is not treated as an error, so emit a WARNING and return.
+ * See comment in pg_log_backend_memory_contexts for a discussion on this.
+ */
+ if (proc == NULL)
+ {
+ /*
+ * This is a warning because we don't want to break loops.
+ */
+ ereport(WARNING,
+ errmsg("PID %d is not a PostgreSQL server process", pid));
+ PG_RETURN_NULL();
+ }
+
+ InitMaterializedSRF(fcinfo, 0);
+
+ procNumber = GetNumberFromPGProc(proc);
+
+ /*
+ * Create a DSA to allocate memory for copying memory context statistics.
+ * Allocate the memory in the DSA and send the dsa pointer to the server
+ * process for storing the context statistics. If the statistics exceed
+ * the predefined size limit (1MB), a cumulative total is stored for the
+ * remaining contexts.
+ *
+ * The DSA is created once for the lifetime of the server, and only
+ * attached in subsequent calls.
+ */
+ if (MemoryStatsDsaArea == NULL)
+ MemoryStatsDsaArea = GetNamedDSA("memory_context_statistics_dsa", &found);
+
+ /*
+ * The dsa pointers containing statistics for each client are stored in a
+ * dshash table. In addition to dsa pointer, each entry in this table also
+ * contains information about the statistics, condition variable for
+ * signalling between client and the server and miscellaneous data
+ * specific to a request. There is one entry per client request in the
+ * hash table.
+ */
+ if (MemoryStatsDsHash == NULL)
+ MemoryStatsDsHash = GetNamedDSHash("memory_context_statistics_dshash", &memctx_dsh_params, &found);
+
+ snprintf(key, sizeof(key), "%d", MyProcNumber);
+
+ /*
+ * Check if the publishing process slot is empty and store this client's
+ * key, i.e. its ProcNumber. This informs the publishing process that it is
+ * supposed to write statistics in the hash entry corresponding to this
+ * client.
+ */
+ LWLockAcquire(client_keys_lock, LW_EXCLUSIVE);
+
+ /*
+ * XXX. If the process exits without cleaning up its slot, e.g. after an
+ * abrupt crash, the client_keys slot won't be reset. A later request for
+ * another process with the same slot index would then be rejected with a
+ * spurious WARNING.
+ */
+ if (client_keys[procNumber] == -1)
+ client_keys[procNumber] = MyProcNumber;
+ else
+ {
+ LWLockRelease(client_keys_lock);
+ ereport(WARNING,
+ errmsg("server process %d is processing previous request", pid));
+ PG_RETURN_NULL();
+ }
+ LWLockRelease(client_keys_lock);
+
+ /*
+ * Insert an entry for this client in DSHASH table the first time this
+ * function is called. This entry is deleted when the process exits in
+ * before_shmem_exit call.
+ *
+ * dshash_find_or_insert locks the entry to prevent the publisher from
+ * reading before client has updated the entry.
+ */
+ entry = dshash_find_or_insert(MemoryStatsDsHash, key, &found);
+ if (!found)
+ {
+ entry->stats_written = false;
+ ConditionVariableInit(&entry->memcxt_cv);
+ }
+
+ /*
+ * Allocate 1MB of memory for the backend to publish its statistics on
+ * every call to this function. The memory is freed at the end of the
+ * function.
+ */
+ entry->memstats_dsa_pointer =
+ dsa_allocate0(MemoryStatsDsaArea, MEMORY_CONTEXT_REPORT_MAX_PER_BACKEND);
+
+ /*
+ * Specify whether a summary of statistics is requested, before signalling
+ * the server.
+ */
+ entry->summary = summary;
+
+ /*
+ * Indicate which server process statistics are being requested from. If
+ * this client times out before the last requested process can publish its
+ * statistics, it may send a new request to another server process. Since
+ * the previous server was notified, it might attempt to read the same
+ * client entry and respond incorrectly with its statistics. By storing
+ * the server ID in the client entry, we prevent any previously signalled
+ * server process from writing its statistics in the space meant for the
+ * newly requested process.
+ */
+ entry->target_server_id = pid;
+ dshash_release_lock(MemoryStatsDsHash, entry);
+
+ /*
+ * Send a signal to a PostgreSQL process, informing it we want it to
+ * produce information about its memory contexts.
+ */
+ if (SendProcSignal(pid, PROCSIG_GET_MEMORY_CONTEXT, procNumber) < 0)
+ {
+ memstats_dsa_cleanup(key);
+ ereport(WARNING,
+ errmsg("could not send signal to process %d: %m", pid));
+ PG_RETURN_NULL();
+ }
+ start_timestamp = GetCurrentTimestamp();
+
+ while (1)
+ {
+ long elapsed_time;
+
+ entry = dshash_find_or_insert(MemoryStatsDsHash, key, &found);
+ Assert(found);
+
+ INJECTION_POINT("memcontext-client-crash", NULL);
+
+ memcxt_info = (MemoryStatsEntry *)
+ dsa_get_address(MemoryStatsDsaArea, entry->memstats_dsa_pointer);
+
+ /*
+ * We expect to come out of sleep when the requested process has
+ * finished publishing the statistics, verified using a boolean
+ * stats_written.
+ *
+ * Make sure that the statistics are actually written by checking that
+ * the name of the context is not NULL. This is done to ensure that
+ * the subsequent waits for statistics do not return spuriously if the
+ * previous call to the function ended in error and thus could not
+ * clear the stats_written flag.
+ */
+ if (entry->stats_written && memcxt_info[0].name[0] != '\0')
+ break;
+
+ dshash_release_lock(MemoryStatsDsHash, entry);
+
+ elapsed_time = TimestampDifferenceMilliseconds(start_timestamp,
+ GetCurrentTimestamp());
+ /* Return if we have already exceeded the timeout */
+ if (elapsed_time >= MEMORY_STATS_MAX_TIMEOUT * 1000)
+ {
+ memstats_dsa_cleanup(key);
+ PG_RETURN_NULL();
+ }
+
+ /*
+ * Recheck the state of the backend before sleeping on the condition
+ * variable to ensure the process is still alive. Only check the
+ * relevant process type based on the earlier PID check.
+ */
+ if (proc_is_aux)
+ proc = AuxiliaryPidGetProc(pid);
+ else
+ proc = BackendPidGetProc(pid);
+
+ /*
+ * The target server process ending during memory context processing
+ * is not an error.
+ */
+ if (proc == NULL)
+ {
+ memstats_dsa_cleanup(key);
+ ConditionVariableCancelSleep();
+ ereport(WARNING,
+ errmsg("PID %d is no longer a PostgreSQL server process",
+ pid));
+ PG_RETURN_NULL();
+ }
+
+ INJECTION_POINT("memcontext-client-crash", NULL);
+
+ /*
+ * Wait for MEMORY_STATS_MAX_TIMEOUT. If no statistics are available
+ * within the allowed time then return NULL. The timer is defined in
+ * milliseconds since that's what the condition variable sleep uses.
+ */
+ if (ConditionVariableTimedSleep(&entry->memcxt_cv,
+ (MEMORY_STATS_MAX_TIMEOUT * 1000), WAIT_EVENT_MEM_CXT_PUBLISH))
+ {
+ /* Timeout has expired, return NULL */
+ memstats_dsa_cleanup(key);
+ ConditionVariableCancelSleep();
+ PG_RETURN_NULL();
+ }
+ }
+
+ /*
+ * Backend has finished publishing the stats, project them.
+ */
+#define PG_GET_PROCESS_MEMORY_CONTEXTS_COLS 11
+ for (int i = 0; i < entry->total_stats; i++)
+ {
+ ArrayType *path_array;
+ int path_length;
+ Datum values[PG_GET_PROCESS_MEMORY_CONTEXTS_COLS];
+ bool nulls[PG_GET_PROCESS_MEMORY_CONTEXTS_COLS];
+ Datum *path_datum = NULL;
+
+ memset(values, 0, sizeof(values));
+ memset(nulls, 0, sizeof(nulls));
+
+ Assert(memcxt_info[i].name[0] != '\0');
+ values[0] = CStringGetTextDatum(memcxt_info[i].name);
+
+ if (memcxt_info[i].ident[0] != '\0')
+ values[1] = CStringGetTextDatum(memcxt_info[i].ident);
+ else
+ nulls[1] = true;
+
+ values[2] = CStringGetTextDatum(ContextTypeToString(memcxt_info[i].type));
+ values[3] = Int32GetDatum(memcxt_info[i].levels);
+
+ path_length = memcxt_info[i].path_length;
+ path_datum = (Datum *) palloc(path_length * sizeof(Datum));
+ if (memcxt_info[i].path[0] != 0)
+ {
+ for (int j = 0; j < path_length; j++)
+ path_datum[j] = Int32GetDatum(memcxt_info[i].path[j]);
+ path_array = construct_array_builtin(path_datum, path_length, INT4OID);
+ values[4] = PointerGetDatum(path_array);
+ }
+ else
+ nulls[4] = true;
+
+ values[5] = Int64GetDatum(memcxt_info[i].totalspace);
+ values[6] = Int64GetDatum(memcxt_info[i].nblocks);
+ values[7] = Int64GetDatum(memcxt_info[i].freespace);
+ values[8] = Int64GetDatum(memcxt_info[i].freechunks);
+ values[9] = Int64GetDatum(memcxt_info[i].totalspace -
+ memcxt_info[i].freespace);
+ values[10] = Int32GetDatum(memcxt_info[i].num_agg_stats);
+
+ tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc,
+ values, nulls);
+ }
+ dshash_release_lock(MemoryStatsDsHash, entry);
+ memstats_dsa_cleanup(key);
+
+ ConditionVariableCancelSleep();
+
+ PG_RETURN_NULL();
+}
+
+static void
+memstats_dsa_cleanup(char *key)
+{
+ MemoryStatsDSHashEntry *entry;
+
+ entry = dshash_find(MemoryStatsDsHash, key, true);
+
+ Assert(MemoryStatsDsaArea != NULL);
+ dsa_free(MemoryStatsDsaArea, entry->memstats_dsa_pointer);
+ entry->memstats_dsa_pointer = InvalidDsaPointer;
+ entry->stats_written = false;
+ entry->target_server_id = 0;
+
+ dshash_release_lock(MemoryStatsDsHash, entry);
+}
+void
+MemoryContextKeysShmemInit(void)
+{
+ bool found;
+
+ client_keys = (int *)
+ ShmemInitStruct("MemoryContextKeys",
+ MemoryContextKeysShmemSize() + sizeof(LWLockPadded), &found);
+ client_keys_lock = (LWLock *) ((char *) client_keys + MemoryContextKeysShmemSize());
+
+ if (!found)
+ {
+ MemSet(client_keys, -1, MemoryContextKeysShmemSize());
+ LWLockInitialize(client_keys_lock, LWTRANCHE_MEMORY_CONTEXT_KEYS);
+ }
+}
+
+Size
+MemoryContextKeysShmemSize(void)
+{
+ Size sz = 0;
+ Size TotalProcs = 0;
+
+ TotalProcs = add_size(TotalProcs, NUM_AUXILIARY_PROCS);
+ TotalProcs = add_size(TotalProcs, MaxBackends);
+ sz = add_size(sz, mul_size(TotalProcs, sizeof(int)));
+
+ return sz;
+}
+
+/*
+ * HandleGetMemoryContextInterrupt
+ * Handle receipt of an interrupt indicating a request to publish memory
+ * contexts statistics.
+ *
+ * All the actual work is deferred to ProcessGetMemoryContextInterrupt() as
+ * this cannot be performed in a signal handler.
+ */
+void
+HandleGetMemoryContextInterrupt(void)
+{
+ InterruptPending = true;
+ PublishMemoryContextPending = true;
+ /* latch will be set by procsignal_sigusr1_handler */
+}
+
+/*
+ * ProcessGetMemoryContextInterrupt
+ * Generate information about memory contexts used by the process.
+ *
+ * Performs a breadth-first search on the memory context tree, so that a
+ * parent's statistics are reported before its children's in the monitoring
+ * function output.
+ *
+ * Statistics for all the processes are shared via the same dynamic shared
+ * area. Individual statistics are tracked independently in
+ * per-process DSA pointers. These pointers are stored in a dshash table
+ * keyed by the requesting client's ProcNumber.
+ *
+ * We calculate the maximum number of contexts whose statistics can be
+ * displayed from a pre-determined limit on the memory available per process
+ * for this utility and the maximum size of the statistics for one context.
+ * Statistics for any remaining contexts are captured as a cumulative total
+ * after the individual contexts' statistics.
+ *
+ * If summary is true, we capture the statistics of level 1 and level 2
+ * contexts. For that we traverse the subtree below each level 2 context
+ * depth first (see MemoryContextStatsCounter), so that we can display a
+ * cumulative total of memory consumption by that context and all its
+ * children.
+ */
+void
+ProcessGetMemoryContextInterrupt(void)
+{
+ List *contexts;
+ HASHCTL ctl;
+ HTAB *context_id_lookup;
+ int context_id = 0;
+ MemoryStatsEntry *meminfo;
+ bool summary = false;
+ MemoryContextCounters stat;
+ int num_individual_stats = 0;
+ bool found;
+ MemoryStatsDSHashEntry *entry;
+ char key[CLIENT_KEY_SIZE];
+ int clientProcNumber;
+ MemoryContext memstats_ctx = NULL;
+ MemoryContext oldcontext = NULL;
+
+ PublishMemoryContextPending = false;
+
+ /*
+ * Retrieve the client key for publishing statistics and reset it to -1,
+ * so other clients can request memory statistics from this process
+ */
+ LWLockAcquire(client_keys_lock, LW_EXCLUSIVE);
+ Assert(client_keys[MyProcNumber] != -1);
+ clientProcNumber = client_keys[MyProcNumber];
+ client_keys[MyProcNumber] = -1;
+ LWLockRelease(client_keys_lock);
+
+ /*
+ * Create a new memory context which is not a part of TopMemoryContext
+ * tree. This context is used to allocate all memory in this function.
+ * This helps in keeping the memory allocation in this function to report
+ * memory consumption statistics separate. So that it does not affect the
+ * output of this function.
+ */
+ memstats_ctx = AllocSetContextCreate((MemoryContext) NULL, "publish_memory_context_statistics",
+ ALLOCSET_SMALL_SIZES);
+ oldcontext = MemoryContextSwitchTo(memstats_ctx);
+
+ /*
+ * The hash table is used for constructing "path" column of the view,
+ * similar to its local backend counterpart.
+ */
+ ctl.keysize = sizeof(MemoryContext);
+ ctl.entrysize = sizeof(MemoryStatsContextId);
+ ctl.hcxt = CurrentMemoryContext;
+
+ context_id_lookup = hash_create("pg_get_remote_backend_memory_contexts",
+ 256,
+ &ctl,
+ HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+
+ /* List of contexts to process in the next round - start at the top. */
+ contexts = list_make1(TopMemoryContext);
+
+ /*
+ * If DSA exists, created by another process requesting statistics, attach
+ * to it. We expect the client process to create required DSA and Dshash
+ * table.
+ */
+ if (MemoryStatsDsaArea == NULL)
+ MemoryStatsDsaArea = GetNamedDSA("memory_context_statistics_dsa", &found);
+
+ if (MemoryStatsDsHash == NULL)
+ MemoryStatsDsHash = GetNamedDSHash("memory_context_statistics_dshash", &memctx_dsh_params, &found);
+
+
+ snprintf(key, CLIENT_KEY_SIZE, "%d", clientProcNumber);
+
+ /*
+ * The entry lock is held by dshash_find_or_insert to protect writes to
+ * process specific memory. Two different processes publishing statistics
+ * do not block each other.
+ */
+ INJECTION_POINT("memcontext-server-crash", NULL);
+ entry = dshash_find_or_insert(MemoryStatsDsHash, key, &found);
+
+ /*
+ * Entry has been deleted due to client process exit. Make sure that the
+ * client always deletes the entry after taking required lock or this
+ * function may end up writing to unallocated memory.
+ */
+ if (!found)
+ {
+ entry->stats_written = false;
+ end_memorycontext_reporting(entry, oldcontext, context_id_lookup);
+ return;
+ }
+
+ /*
+ * The client has timed out waiting for us to write statistics and is
+ * requesting statistics from some other process
+ */
+ if (entry->target_server_id != MyProcPid)
+ {
+ entry->stats_written = false;
+ end_memorycontext_reporting(entry, oldcontext, context_id_lookup);
+ return;
+ }
+ summary = entry->summary;
+
+ /* Should be allocated by a client backend that is requesting statistics */
+ Assert(entry->memstats_dsa_pointer != InvalidDsaPointer);
+ meminfo = (MemoryStatsEntry *)
+ dsa_get_address(MemoryStatsDsaArea, entry->memstats_dsa_pointer);
+
+ if (summary)
+ {
+ int cxt_id = 0;
+ List *path = NIL;
+ MemoryStatsContextId *contextid_entry;
+
+ /* Copy TopMemoryContext statistics to DSA */
+ memset(&stat, 0, sizeof(stat));
+ (*TopMemoryContext->methods->stats) (TopMemoryContext, NULL, NULL,
+ &stat, true);
+ path = lcons_int(1, path);
+ PublishMemoryContext(meminfo, cxt_id, TopMemoryContext, path, stat,
+ 1);
+
+ contextid_entry = (MemoryStatsContextId *) hash_search(context_id_lookup, &TopMemoryContext,
+ HASH_ENTER, &found);
+ Assert(!found);
+
+ /*
+ * context id starts with 1
+ */
+ contextid_entry->context_id = cxt_id + 1;
+
+ /*
+ * Copy statistics for each of TopMemoryContext's children. This
+ * includes statistics of at most 100 children per node, with each
+ * child node limited to a depth of 100 in its subtree.
+ */
+ for (MemoryContext c = TopMemoryContext->firstchild; c != NULL;
+ c = c->nextchild)
+ {
+ MemoryContextCounters grand_totals;
+ int num_contexts = 0;
+
+ path = NIL;
+ memset(&grand_totals, 0, sizeof(grand_totals));
+
+ cxt_id++;
+ contextid_entry = (MemoryStatsContextId *) hash_search(context_id_lookup, &c,
+ HASH_ENTER, &found);
+ Assert(!found);
+ contextid_entry->context_id = cxt_id + 1;
+
+ MemoryContextStatsCounter(c, &grand_totals, &num_contexts);
+
+ path = compute_context_path(c, context_id_lookup);
+
+ PublishMemoryContext(meminfo, cxt_id, c, path,
+ grand_totals, num_contexts);
+ }
+ entry->total_stats = cxt_id + 1;
+
+ entry->stats_written = true;
+ end_memorycontext_reporting(entry, oldcontext, context_id_lookup);
+ /* Notify waiting backends and return */
+ ConditionVariableBroadcast(&entry->memcxt_cv);
+ return;
+ }
+ foreach_ptr(MemoryContextData, cur, contexts)
+ {
+ List *path = NIL;
+ MemoryStatsContextId *contextid_entry;
+
+ contextid_entry = (MemoryStatsContextId *) hash_search(context_id_lookup, &cur,
+ HASH_ENTER, &found);
+ Assert(!found);
+
+ /*
+ * context id starts with 1
+ */
+ contextid_entry->context_id = context_id + 1;
+
+ /*
+ * Figure out the transient context_id of this context and each of its
+ * ancestors, to compute a path for this context.
+ */
+ path = compute_context_path(cur, context_id_lookup);
+
+ /* Examine the context stats */
+ memset(&stat, 0, sizeof(stat));
+ (*cur->methods->stats) (cur, NULL, NULL, &stat, true);
+
+ /* Account for saving one statistics slot for cumulative reporting */
+ if (context_id < (MAX_MEMORY_CONTEXT_STATS_NUM - 1))
+ {
+ /* Copy statistics to DSA memory */
+ PublishMemoryContext(meminfo, context_id, cur, path, stat, 1);
+ }
+ else
+ {
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].totalspace += stat.totalspace;
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].nblocks += stat.nblocks;
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].freespace += stat.freespace;
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].freechunks += stat.freechunks;
+ }
+
+ /*
+ * DSA max limit per process is reached, write aggregate of the
+ * remaining statistics.
+ *
+ * We can store contexts from 0 to max_stats - 1. Slot max_stats - 1 is
+ * reserved for the cumulative statistics or "Remaining Totals", so the
+ * last individually reported context is the one whose context_id equals
+ * max_stats - 2.
+ */
+ if (context_id == (MAX_MEMORY_CONTEXT_STATS_NUM - 2))
+ {
+ int namelen = strlen("Remaining Totals");
+
+ num_individual_stats = context_id + 1;
+ strlcpy(meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].name, "Remaining Totals", namelen + 1);
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].ident[0] = '\0';
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].path[0] = 0;
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].type = 0;
+ }
+ context_id++;
+
+ for (MemoryContext c = cur->firstchild; c != NULL; c = c->nextchild)
+ contexts = lappend(contexts, c);
+ }
+
+ /*
+ * Statistics are not aggregated, i.e., individual statistics are reported,
+ * when context_id <= max_stats.
+ */
+ if (context_id <= MAX_MEMORY_CONTEXT_STATS_NUM)
+ {
+ entry->total_stats = context_id;
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].num_agg_stats = 1;
+ }
+ /* Report number of aggregated memory contexts */
+ else
+ {
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].num_agg_stats = context_id -
+ num_individual_stats;
+
+ /*
+ * Total stats equals num_individual_stats + 1 record for cumulative
+ * statistics.
+ */
+ entry->total_stats = num_individual_stats + 1;
+ }
+ entry->stats_written = true;
+ end_memorycontext_reporting(entry, oldcontext, context_id_lookup);
+ /* Notify waiting backends and return */
+ ConditionVariableBroadcast(&entry->memcxt_cv);
+}
+
+/*
+ * Clean up before exit from ProcessGetMemoryContextInterrupt
+ */
+static void
+end_memorycontext_reporting(MemoryStatsDSHashEntry *entry, MemoryContext oldcontext, HTAB *context_id_lookup)
+{
+ MemoryContext curr_ctx = CurrentMemoryContext;
+
+ dshash_release_lock(MemoryStatsDsHash, entry);
+
+ hash_destroy(context_id_lookup);
+ MemoryContextSwitchTo(oldcontext);
+ MemoryContextReset(curr_ctx);
+}
+
+/*
+ * compute_context_path
+ *
+ * Append the transient context_id of this context and each of its ancestors
+ * to a list, in order to compute a path.
+ */
+static List *
+compute_context_path(MemoryContext c, HTAB *context_id_lookup)
+{
+ bool found;
+ List *path = NIL;
+ MemoryContext cur_context;
+
+ for (cur_context = c; cur_context != NULL; cur_context = cur_context->parent)
+ {
+ MemoryStatsContextId *cur_entry;
+
+ cur_entry = hash_search(context_id_lookup, &cur_context, HASH_FIND, &found);
+
+ if (!found)
+ elog(ERROR, "hash table corrupted, can't construct path value");
+
+ path = lcons_int(cur_entry->context_id, path);
+ }
+
+ return path;
+}
+
+/*
+ * PublishMemoryContext
+ *
+ * Copy the memory context statistics of a single context to a DSA memory
+ */
+static void
+PublishMemoryContext(MemoryStatsEntry *memcxt_info, int curr_id,
+ MemoryContext context, List *path,
+ MemoryContextCounters stat, int num_contexts)
+{
+ const char *ident = context->ident;
+ const char *name = context->name;
+
+ /*
+ * To be consistent with logging output, we label dynahash contexts with
+ * just the hash table name as with MemoryContextStatsPrint().
+ */
+ if (context->ident && strncmp(context->name, "dynahash", 8) == 0)
+ {
+ name = context->ident;
+ ident = NULL;
+ }
+
+ if (name != NULL)
+ {
+ int namelen = strlen(name);
+
+ if (namelen >= MEMORY_CONTEXT_NAME_SHMEM_SIZE)
+ namelen = pg_mbcliplen(name, namelen,
+ MEMORY_CONTEXT_NAME_SHMEM_SIZE - 1);
+
+ strlcpy(memcxt_info[curr_id].name, name, namelen + 1);
+ }
+ else
+ /* Clearing the array */
+ memcxt_info[curr_id].name[0] = '\0';
+
+ /* Trim and copy the identifier if it is not set to NULL */
+ if (ident != NULL)
+ {
+ int idlen = strlen(ident);
+
+ /*
+ * Some identifiers such as SQL query string can be very long,
+ * truncate oversize identifiers.
+ */
+ if (idlen >= MEMORY_CONTEXT_IDENT_SHMEM_SIZE)
+ idlen = pg_mbcliplen(ident, idlen,
+ MEMORY_CONTEXT_IDENT_SHMEM_SIZE - 1);
+
+ strlcpy(memcxt_info[curr_id].ident, ident, idlen + 1);
+ }
+ else
+ memcxt_info[curr_id].ident[0] = '\0';
+
+ /* Copy path information into the fixed-size path array of the entry */
+ if (path == NIL)
+ memcxt_info[curr_id].path[0] = 0;
+ else
+ {
+ int levels = Min(list_length(path), MAX_PATH_DISPLAY_LENGTH);
+
+ memcxt_info[curr_id].path_length = levels;
+ memcxt_info[curr_id].levels = list_length(path);
+
+ foreach_int(i, path)
+ {
+ memcxt_info[curr_id].path[foreach_current_index(i)] = i;
+ if (--levels == 0)
+ break;
+ }
+ }
+ memcxt_info[curr_id].type = context->type;
+ memcxt_info[curr_id].totalspace = stat.totalspace;
+ memcxt_info[curr_id].nblocks = stat.nblocks;
+ memcxt_info[curr_id].freespace = stat.freespace;
+ memcxt_info[curr_id].freechunks = stat.freechunks;
+ memcxt_info[curr_id].num_agg_stats = num_contexts;
+}
+
+void
+AtProcExit_memstats_cleanup(int code, Datum arg)
+{
+ int idx = MyProcNumber;
+ MemoryStatsDSHashEntry *entry;
+ char key[CLIENT_KEY_SIZE];
+ bool found;
+
+ if (MemoryStatsDsHash != NULL)
+ {
+ snprintf(key, CLIENT_KEY_SIZE, "%d", idx);
+ entry = dshash_find_or_insert(MemoryStatsDsHash, key, &found);
+
+ if (found)
+ {
+ if (MemoryStatsDsaArea != NULL &&
+ DsaPointerIsValid(entry->memstats_dsa_pointer))
+ dsa_free(MemoryStatsDsaArea, entry->memstats_dsa_pointer);
+ }
+ dshash_delete_entry(MemoryStatsDsHash, entry);
+ }
+ LWLockAcquire(client_keys_lock, LW_EXCLUSIVE);
+ client_keys[idx] = -1;
+ LWLockRelease(client_keys_lock);
+}
+
+/* Used for testing purposes */
+dsa_area *
+pg_get_memstats_dsa_area(void)
+{
+ if (MemoryStatsDsaArea != NULL)
+ return MemoryStatsDsaArea;
+ else
+ return NULL;
+}
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index d31cb45a058..92b0446b80c 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -39,6 +39,7 @@ volatile sig_atomic_t TransactionTimeoutPending = false;
volatile sig_atomic_t IdleSessionTimeoutPending = false;
volatile sig_atomic_t ProcSignalBarrierPending = false;
volatile sig_atomic_t LogMemoryContextPending = false;
+volatile sig_atomic_t PublishMemoryContextPending = false;
volatile sig_atomic_t IdleStatsUpdateTimeoutPending = false;
volatile uint32 InterruptHoldoffCount = 0;
volatile uint32 QueryCancelHoldoffCount = 0;
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 641e535a73c..fb3f2d21fa0 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -662,6 +662,13 @@ BaseInit(void)
* drop ephemeral slots, which in turn triggers stats reporting.
*/
ReplicationSlotInitialize();
+
+ /*
+ * The before shmem exit callback frees the DSA memory occupied by the
+ * latest memory context statistics that could be published by this proc
+ * if requested.
+ */
+ before_shmem_exit(AtProcExit_memstats_cleanup, 0);
}
diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index 47fd774c7d2..56c2048c67a 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -1008,6 +1008,35 @@ MemoryContextStatsInternal(MemoryContext context, int level,
}
}
+
+/*
+ * MemoryContextStatsCounter
+ *
+ * Accumulate statistics counts into *totals. totals should not be NULL.
+ * This involves a non-recursive tree traversal.
+ */
+void
+MemoryContextStatsCounter(MemoryContext context, MemoryContextCounters *totals,
+ int *num_contexts)
+{
+ int ichild = 1;
+
+ context->methods->stats(context, NULL, NULL, totals, false);
+
+ for (MemoryContext curr = context->firstchild;
+ curr != NULL;
+ curr = MemoryContextTraverseNext(curr, context))
+ {
+ curr->methods->stats(curr, NULL, NULL, totals, false);
+ ichild++;
+ }
+
+ /*
+ * Add the number of contexts traversed, including the starting context
+ */
+ *num_contexts = *num_contexts + ichild;
+}
+
/*
* MemoryContextStatsPrint
* Print callback used by MemoryContextStatsInternal
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index b51d2b17379..52e8e525d86 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8617,6 +8617,16 @@
prorettype => 'bool', proargtypes => 'int4',
prosrc => 'pg_log_backend_memory_contexts' },
+# publishing memory contexts of the specified postgres process
+{ oid => '2173', descr => 'publish memory contexts of the specified backend',
+ proname => 'pg_get_process_memory_contexts', provolatile => 'v',
+ prorows => '100', proretset => 't', proparallel => 'r',
+ prorettype => 'record', proargtypes => 'int4 bool',
+ proallargtypes => '{int4,bool,text,text,text,int4,_int4,int8,int8,int8,int8,int8,int4}',
+ proargmodes => '{i,i,o,o,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{pid, summary, name, ident, type, level, path, total_bytes, total_nblocks, free_bytes, free_chunks, used_bytes, num_agg_contexts}',
+ prosrc => 'pg_get_process_memory_contexts' },
+
# non-persistent series generator
{ oid => '1066', descr => 'non-persistent series generator',
proname => 'generate_series', prorows => '1000',
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 1bef98471c3..1e59a7f910f 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -96,6 +96,7 @@ extern PGDLLIMPORT volatile sig_atomic_t IdleSessionTimeoutPending;
extern PGDLLIMPORT volatile sig_atomic_t ProcSignalBarrierPending;
extern PGDLLIMPORT volatile sig_atomic_t LogMemoryContextPending;
extern PGDLLIMPORT volatile sig_atomic_t IdleStatsUpdateTimeoutPending;
+extern PGDLLIMPORT volatile sig_atomic_t PublishMemoryContextPending;
extern PGDLLIMPORT volatile sig_atomic_t CheckClientConnectionPending;
extern PGDLLIMPORT volatile sig_atomic_t ClientConnectionLost;
diff --git a/src/include/storage/lwlocklist.h b/src/include/storage/lwlocklist.h
index 06a1ffd4b08..4c71d756a2d 100644
--- a/src/include/storage/lwlocklist.h
+++ b/src/include/storage/lwlocklist.h
@@ -135,3 +135,4 @@ PG_LWLOCKTRANCHE(SUBTRANS_SLRU, SubtransSLRU)
PG_LWLOCKTRANCHE(XACT_SLRU, XactSLRU)
PG_LWLOCKTRANCHE(PARALLEL_VACUUM_DSA, ParallelVacuumDSA)
PG_LWLOCKTRANCHE(AIO_URING_COMPLETION, AioUringCompletion)
+PG_LWLOCKTRANCHE(MEMORY_CONTEXT_KEYS, MemoryContextReportingKeys)
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index afeeb1ca019..345d5a0ecb1 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -35,6 +35,7 @@ typedef enum
PROCSIG_WALSND_INIT_STOPPING, /* ask walsenders to prepare for shutdown */
PROCSIG_BARRIER, /* global barrier interrupt */
PROCSIG_LOG_MEMORY_CONTEXT, /* ask backend to log the memory contexts */
+ PROCSIG_GET_MEMORY_CONTEXT, /* ask backend to send the memory contexts */
PROCSIG_PARALLEL_APPLY_MESSAGE, /* Message from parallel apply workers */
/* Recovery conflict reasons */
diff --git a/src/include/utils/memutils.h b/src/include/utils/memutils.h
index 7bbe5a36959..2d7220cde45 100644
--- a/src/include/utils/memutils.h
+++ b/src/include/utils/memutils.h
@@ -18,7 +18,10 @@
#define MEMUTILS_H
#include "nodes/memnodes.h"
-
+#include "storage/condition_variable.h"
+#include "storage/lmgr.h"
+#include "utils/dsa.h"
+#include "lib/dshash.h"
/*
* MaxAllocSize, MaxAllocHugeSize
@@ -48,6 +51,26 @@
#define AllocHugeSizeIsValid(size) ((Size) (size) <= MaxAllocHugeSize)
+/*
+ * Memory Context reporting size limits.
+ */
+
+/* Max length of context name and ident, to keep it consistent
+ * with ProcessLogMemoryContext()
+ */
+#define MEMORY_CONTEXT_IDENT_SHMEM_SIZE 100
+#define MEMORY_CONTEXT_NAME_SHMEM_SIZE 100
+
+/* Maximum size (in bytes) of DSA area per process */
+#define MEMORY_CONTEXT_REPORT_MAX_PER_BACKEND ((size_t) (1 * 1024 * 1024))
+
+/*
+ * Maximum size per context statistics. The identifier and name are statically
+ * allocated arrays of size 100 bytes.
+ * The path depth is limited to 100 like for memory context logging.
+ */
+#define MAX_MEMORY_CONTEXT_STATS_SIZE (sizeof(MemoryStatsEntry))
+#define MAX_MEMORY_CONTEXT_STATS_NUM (MEMORY_CONTEXT_REPORT_MAX_PER_BACKEND / MAX_MEMORY_CONTEXT_STATS_SIZE)
/*
* Standard top-level memory contexts.
@@ -149,6 +172,7 @@ extern MemoryContext BumpContextCreate(MemoryContext parent,
Size minContextSize,
Size initBlockSize,
Size maxBlockSize);
+extern dsa_area *pg_get_memstats_dsa_area(void);
/*
* Recommended default alloc parameters, suitable for "ordinary" contexts
@@ -319,4 +343,59 @@ pg_memory_is_all_zeros(const void *ptr, size_t len)
return true;
}
+/* Dynamic shared memory state for statistics per context */
+typedef struct MemoryStatsEntry
+{
+ char name[MEMORY_CONTEXT_NAME_SHMEM_SIZE];
+ char ident[MEMORY_CONTEXT_IDENT_SHMEM_SIZE];
+ int path[100];
+ NodeTag type;
+ int path_length;
+ int levels;
+ int64 totalspace;
+ int64 nblocks;
+ int64 freespace;
+ int64 freechunks;
+ int num_agg_stats;
+} MemoryStatsEntry;
+
+/*
+ * Per backend dynamic shared hash entry for memory context statistics
+ * reporting.
+ */
+typedef struct MemoryStatsDSHashEntry
+{
+ char key[64];
+ ConditionVariable memcxt_cv;
+ bool stats_written;
+ int target_server_id;
+ int total_stats;
+ bool summary;
+ dsa_pointer memstats_dsa_pointer;
+} MemoryStatsDSHashEntry;
+
+static const dshash_parameters memctx_dsh_params = {
+ offsetof(MemoryStatsDSHashEntry, memcxt_cv),
+ sizeof(MemoryStatsDSHashEntry),
+ dshash_strcmp,
+ dshash_strhash,
+ dshash_strcpy
+};
+
+/*
+ * Used for storage of transient identifiers for pg_get_backend_memory_contexts
+ */
+typedef struct MemoryStatsContextId
+{
+ MemoryContext context;
+ int context_id;
+} MemoryStatsContextId;
+
+extern void ProcessGetMemoryContextInterrupt(void);
+extern void HandleGetMemoryContextInterrupt(void);
+extern void MemoryContextKeysShmemInit(void);
+extern Size MemoryContextKeysShmemSize(void);
+extern void MemoryContextStatsCounter(MemoryContext context, MemoryContextCounters *totals,
+ int *num_contexts);
+extern void AtProcExit_memstats_cleanup(int code, Datum arg);
#endif /* MEMUTILS_H */
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 3b37fafa65b..21c65ad2d10 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -233,3 +233,22 @@ select * from pg_timezone_abbrevs where abbrev = 'LMT';
LMT | @ 7 hours 52 mins 58 secs ago | f
(1 row)
+DO $$
+DECLARE
+ bg_writer_pid int;
+ r RECORD;
+BEGIN
+ SELECT pid from pg_stat_activity where backend_type='background writer'
+ INTO bg_writer_pid;
+
+ select type, name, ident
+ from pg_get_process_memory_contexts(bg_writer_pid, false)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+ select type, name, ident
+ from pg_get_process_memory_contexts(pg_backend_pid(), false)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+END $$;
+NOTICE: (AllocSet,TopMemoryContext,)
+NOTICE: (AllocSet,TopMemoryContext,)
diff --git a/src/test/regress/sql/sysviews.sql b/src/test/regress/sql/sysviews.sql
index 66179f026b3..c9da4fc8c90 100644
--- a/src/test/regress/sql/sysviews.sql
+++ b/src/test/regress/sql/sysviews.sql
@@ -101,3 +101,21 @@ select count(distinct utc_offset) >= 24 as ok from pg_timezone_abbrevs;
-- One specific case we can check without much fear of breakage
-- is the historical local-mean-time value used for America/Los_Angeles.
select * from pg_timezone_abbrevs where abbrev = 'LMT';
+
+DO $$
+DECLARE
+ bg_writer_pid int;
+ r RECORD;
+BEGIN
+ SELECT pid from pg_stat_activity where backend_type='background writer'
+ INTO bg_writer_pid;
+
+ select type, name, ident
+ from pg_get_process_memory_contexts(bg_writer_pid, false)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+ select type, name, ident
+ from pg_get_process_memory_contexts(pg_backend_pid(), false)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+END $$;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 377a7946585..e9a7adfd380 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1682,6 +1682,9 @@ MemoryContextData
MemoryContextId
MemoryContextMethodID
MemoryContextMethods
+MemoryStatsContextId
+MemoryStatsEntry
+MemoryStatsDSHashEntry
MemoryStatsPrintFunc
MergeAction
MergeActionState
--
2.34.1
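
The slot accounting in the non-summary path of ProcessGetMemoryContextInterrupt() is easy to misread, so here is a minimal standalone sketch of how I understand it. It is a model of the behaviour, not code from the patch: ReportShape and report_shape() are hypothetical names, max_slots stands in for MAX_MEMORY_CONTEXT_STATS_NUM, and it assumes max_slots is at least 2.

```c
#include <stddef.h>

/* Hypothetical summary of one published report; not part of the patch. */
typedef struct ReportShape
{
	size_t		individual_rows;		/* contexts published one per slot */
	size_t		aggregated_contexts;	/* contexts folded into "Remaining Totals" */
	size_t		total_rows;				/* rows the requesting client reads back */
} ReportShape;

/*
 * Model of the non-summary path: slots 0 .. max_slots - 2 hold individual
 * contexts and slot max_slots - 1 is reserved for the cumulative row.
 * Assumes max_slots >= 2.
 */
ReportShape
report_shape(size_t ncontexts, size_t max_slots)
{
	ReportShape shape;
	size_t		individual_cap = max_slots - 1;

	/* Everything that fits below the reserved slot is reported as is. */
	shape.individual_rows =
		(ncontexts < individual_cap) ? ncontexts : individual_cap;

	/* Whatever is left is summed into the single cumulative row. */
	shape.aggregated_contexts = ncontexts - shape.individual_rows;

	/* The cumulative row only counts when something was folded into it. */
	shape.total_rows =
		shape.individual_rows + (shape.aggregated_contexts > 0 ? 1 : 0);

	return shape;
}
```

With 10 slots, for example, a process with 25 contexts would publish 9 individual rows plus one cumulative row covering the other 16, while a process with exactly 10 contexts would publish 9 individual rows plus a cumulative row covering a single context.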
v2-0002-Test-module-to-test-memory-context-reporting-with-in.patch (application/octet-stream)
From e21832c610574f48496c961a7318070b720568de Mon Sep 17 00:00:00 2001
From: Rahila Syed <rahilasyed.90@gmail.com>
Date: Thu, 23 Oct 2025 18:01:36 +0530
Subject: [PATCH 2/2] Test module to test memory context reporting with
injection points
---
src/test/modules/Makefile | 1 +
.../test_memcontext_reporting/Makefile | 32 +++++
.../t/001_memcontext_inj.pl | 58 +++++++++
.../test_memcontext_reporting--1.0.sql | 11 ++
.../test_memcontext_reporting.c | 123 ++++++++++++++++++
.../test_memcontext_reporting.control | 4 +
6 files changed, 229 insertions(+)
create mode 100644 src/test/modules/test_memcontext_reporting/Makefile
create mode 100644 src/test/modules/test_memcontext_reporting/t/001_memcontext_inj.pl
create mode 100644 src/test/modules/test_memcontext_reporting/test_memcontext_reporting--1.0.sql
create mode 100644 src/test/modules/test_memcontext_reporting/test_memcontext_reporting.c
create mode 100644 src/test/modules/test_memcontext_reporting/test_memcontext_reporting.control
diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile
index 902a7954101..a31a2578c18 100644
--- a/src/test/modules/Makefile
+++ b/src/test/modules/Makefile
@@ -31,6 +31,7 @@ SUBDIRS = \
test_json_parser \
test_lfind \
test_lwlock_tranches \
+ test_memcontext_reporting \
test_misc \
test_oat_hooks \
test_parser \
diff --git a/src/test/modules/test_memcontext_reporting/Makefile b/src/test/modules/test_memcontext_reporting/Makefile
new file mode 100644
index 00000000000..01a7baa0263
--- /dev/null
+++ b/src/test/modules/test_memcontext_reporting/Makefile
@@ -0,0 +1,32 @@
+# src/test/modules/test_memcontext_reporting/Makefile
+
+EXTRA_INSTALL = src/test/modules/injection_points
+
+export enable_injection_points
+MODULE_big = test_memcontext_reporting
+OBJS = \
+ $(WIN32RES) \
+ test_memcontext_reporting.o
+PGFILEDESC = "test_memcontext_reporting - test code for memory context reporting"
+
+EXTENSION = test_memcontext_reporting
+DATA = test_memcontext_reporting--1.0.sql
+
+REGRESS = test_memcontext_reporting
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/test_memcontext_reporting
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
+
+check:
+ $(prove_check)
+
+installcheck:
+ $(prove_installcheck)
diff --git a/src/test/modules/test_memcontext_reporting/t/001_memcontext_inj.pl b/src/test/modules/test_memcontext_reporting/t/001_memcontext_inj.pl
new file mode 100644
index 00000000000..69d8489eb37
--- /dev/null
+++ b/src/test/modules/test_memcontext_reporting/t/001_memcontext_inj.pl
@@ -0,0 +1,58 @@
+# Copyright (c) 2025, PostgreSQL Global Development Group
+
+# Test suite for testing memory context statistics reporting
+
+use strict;
+use warnings FATAL => 'all';
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+if ($ENV{enable_injection_points} ne 'yes')
+{
+ plan skip_all => 'Injection points not supported by this build';
+}
+my $psql_err;
+# Create and start a cluster with one node
+my $node = PostgreSQL::Test::Cluster->new('main');
+$node->init(allows_streaming => 1);
+# Set max_connections explicitly and turn log_statement off, since statement
+# logging would otherwise add noise to the server log that this test does
+# not need.
+$node->append_conf(
+ 'postgresql.conf',
+ qq[
+max_connections = 100
+log_statement = none
+]);
+$node->start;
+$node->safe_psql('postgres', 'CREATE EXTENSION test_memcontext_reporting;');
+$node->safe_psql('postgres', 'CREATE EXTENSION injection_points;');
+# Attaching to a client process injection point that throws an error
+$node->safe_psql('postgres', "select injection_points_attach('memcontext-client-crash', 'error');");
+
+my $pid = $node->safe_psql('postgres', "SELECT pid from pg_stat_activity where backend_type='checkpointer'");
+print "PID";
+print $pid;
+
+#Client should have thrown error
+$node->psql('postgres', qq(select pg_get_process_memory_contexts($pid, true);), stderr => \$psql_err);
+like ( $psql_err, qr/error triggered for injection point memcontext-client-crash/);
+
+#Query the same process after detaching the injection point, using some other client and it should succeed.
+$node->safe_psql('postgres', "select injection_points_detach('memcontext-client-crash');");
+my $topcontext_name = $node->safe_psql('postgres', "select name from pg_get_process_memory_contexts($pid, true) where path = '{1}';");
+ok($topcontext_name eq 'TopMemoryContext');
+
+# Attaching to a target process injection point that throws an error
+$node->safe_psql('postgres', "select injection_points_attach('memcontext-server-crash', 'error');");
+
+#Server should have thrown error
+$node->psql('postgres', qq(select pg_get_process_memory_contexts($pid, true);), stderr => \$psql_err);
+
+#Query the same process after detaching the injection point, using some other client and it should succeed.
+$node->safe_psql('postgres', "select injection_points_detach('memcontext-server-crash');");
+$topcontext_name = $node->safe_psql('postgres', "select name from pg_get_process_memory_contexts($pid, true) where path = '{1}';");
+ok($topcontext_name eq 'TopMemoryContext');
+done_testing();
diff --git a/src/test/modules/test_memcontext_reporting/test_memcontext_reporting--1.0.sql b/src/test/modules/test_memcontext_reporting/test_memcontext_reporting--1.0.sql
new file mode 100644
index 00000000000..181daf429d0
--- /dev/null
+++ b/src/test/modules/test_memcontext_reporting/test_memcontext_reporting--1.0.sql
@@ -0,0 +1,11 @@
+CREATE FUNCTION memcontext_crash_server()
+RETURNS pg_catalog.void
+AS 'MODULE_PATHNAME' LANGUAGE C;
+
+CREATE FUNCTION memcontext_crash_client()
+RETURNS pg_catalog.void
+AS 'MODULE_PATHNAME' LANGUAGE C;
+
+CREATE FUNCTION dsa_dump_sql()
+RETURNS bigint
+AS 'MODULE_PATHNAME' LANGUAGE C;
diff --git a/src/test/modules/test_memcontext_reporting/test_memcontext_reporting.c b/src/test/modules/test_memcontext_reporting/test_memcontext_reporting.c
new file mode 100644
index 00000000000..955155524c2
--- /dev/null
+++ b/src/test/modules/test_memcontext_reporting/test_memcontext_reporting.c
@@ -0,0 +1,123 @@
+/*
+ * -------------------------------------------------------------------------
+ *
+ * Copyright (c) 2025, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/test/modules/test_memcontext_reporting/test_memcontext_reporting.c
+ *
+ * -------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+#include "utils/injection_point.h"
+#include "funcapi.h"
+#include "utils/memutils.h"
+#include "storage/dsm_registry.h"
+
+PG_MODULE_MAGIC;
+
+extern PGDLLEXPORT void crash(const char *name, const void *private_data, void *arg);
+
+void
+crash(const char *name, const void *private_data, void *arg)
+{
+ abort();
+}
+
+/*
+ * memcontext_crash_client
+ *
+ * Ensure that the client process aborts in between memory context
+ * reporting.
+ */
+PG_FUNCTION_INFO_V1(memcontext_crash_client);
+Datum
+memcontext_crash_client(PG_FUNCTION_ARGS)
+{
+#ifdef USE_INJECTION_POINTS
+ InjectionPointAttach("memcontext-client-crash",
+ "test_memcontext_reporting", "crash", NULL, 0);
+
+#else
+ elog(ERROR,
+ "test is not working as intended when injection points are disabled");
+#endif
+ PG_RETURN_VOID();
+}
+
+PG_FUNCTION_INFO_V1(memcontext_detach_client);
+Datum
+memcontext_detach_client(PG_FUNCTION_ARGS)
+{
+#ifdef USE_INJECTION_POINTS
+ InjectionPointDetach("memcontext-client-crash");
+
+#else
+ elog(ERROR,
+ "test is not working as intended when injection points are disabled");
+#endif
+ PG_RETURN_VOID();
+}
+
+/*
+ * memcontext_crash_server
+ *
+ * Ensure that the server process crashes in between memory context
+ * reporting.
+ */
+PG_FUNCTION_INFO_V1(memcontext_crash_server);
+Datum
+memcontext_crash_server(PG_FUNCTION_ARGS)
+{
+#ifdef USE_INJECTION_POINTS
+ InjectionPointAttach("memcontext-server-crash",
+ "test_memcontext_reporting", "crash", NULL, 0);
+
+#else
+ elog(ERROR,
+ "test is not working as intended when injection points are disabled");
+#endif
+ PG_RETURN_VOID();
+}
+
+/*
+ * memcontext_detach_server
+ *
+ * Detach the injection point which crashes the server
+ * reporting.
+ */
+PG_FUNCTION_INFO_V1(memcontext_detach_server);
+Datum
+memcontext_detach_server(PG_FUNCTION_ARGS)
+{
+#ifdef USE_INJECTION_POINTS
+ InjectionPointDetach("memcontext-server-crash");
+
+#else
+ elog(ERROR,
+ "test is not working as intended when injection points are disabled");
+#endif
+ PG_RETURN_VOID();
+}
+
+/*
+ * dsa_dump_sql
+ */
+PG_FUNCTION_INFO_V1(dsa_dump_sql);
+Datum
+dsa_dump_sql(PG_FUNCTION_ARGS)
+{
+ bool found;
+ size_t tot_size;
+ dsa_area *memstats_dsa_area;
+
+ memstats_dsa_area = pg_get_memstats_dsa_area();
+
+ if (memstats_dsa_area == NULL)
+ memstats_dsa_area = GetNamedDSA("memory_context_statistics_dsa", &found);
+
+ tot_size = dsa_get_total_size(memstats_dsa_area);
+ dsa_detach(memstats_dsa_area);
+ PG_RETURN_INT64(tot_size);
+}
diff --git a/src/test/modules/test_memcontext_reporting/test_memcontext_reporting.control b/src/test/modules/test_memcontext_reporting/test_memcontext_reporting.control
new file mode 100644
index 00000000000..48b501682d5
--- /dev/null
+++ b/src/test/modules/test_memcontext_reporting/test_memcontext_reporting.control
@@ -0,0 +1,4 @@
+comment = 'Test code for memcontext reporting'
+default_version = '1.0'
+module_pathname = '$libdir/test_memcontext_reporting'
+relocatable = true
--
2.34.1
Hi,
I have attached a version 40 patch that has been rebased onto the
latest master branch, as CFbot indicated a rebase was needed.
The test module patch is unchanged.
Thank you,
Rahila Syed
On Tue, Oct 28, 2025 at 11:06 AM Rahila Syed <rahilasyed90@gmail.com> wrote:
Hi,
PFA an updated v39 patch which is ready for review in the upcoming
commitfest.
v35 works fine on my environment.
I ran the same test and haven’t encountered the crash anymore.
Thank you for testing and confirming the fix.
The addition of the following code appears to have resolved the issue:
+memstats_dsa_cleanup(char *key)
+{
+ MemoryStatsDSHashEntry *entry;
+
+ entry = dshash_find(MemoryStatsDsHash, key, true);
Yes, without this code, the dsa memory was being freed in the timeout path
without acquiring a lock.
Since you seem to make a next version patch, I understand v35 is an
interim patch,
so this isn’t a major concern, but I encountered trailing whitespace
warnings when applying the patches.
$ git apply
0001-v35-0001-Add-pg_get_process_memory_context-function.patch
0001-v35-0001-Add-pg_get_process_memory_context-function.patch:705:
trailing whitespace.
0001-v35-0001-Add-pg_get_process_memory_context-function.patch:1066:
trailing whitespace.
Thanks, should be fixed now.
The updated patch contains the following changes. These changes
address some review comments discussed off list and a couple of bugs
found while doing injection point tests.
1. All the changes made to MemoryContextStatsInternal and
MemoryContextStatsDetail are removed.
Instead of modifying these functions, I have written a separate function
MemoryContextStatsCounter
that takes care of counting statistics. This approach ensures that the
existing functions remain unchanged.
2. Changes to ensure that the wait loop does not exceed the prescribed
wait time.
An additional exit condition has been added to the infinite loop that waits
for request completion.
This allows the pg_get_memory_context_statistics function to return if the
elapsed time goes beyond
a set limit, i.e. the following timeout.
3. The user-facing timeout is removed as that would complicate the user
interface. CFIs
are called frequently and the requests are likely to be addressed promptly.
A predefined macro MEMORY_CONTEXT_STATS_TIMEOUT 5 (secs) is used for
timeout
instead. This would also remove the possibility of a user setting very
low timeouts, which
could cause requests to be incomplete and result in NULL outputs.
4. Miscellaneous cleanups to improve comments and remove leftover
comments from older
versions. Also, removed an unnecessary argument from the
PublishMemoryContext function.
5. Addressed Torikoshia's suggestion to change the order of columns to match
pg_backend_memory_contexts.
6. Attached is a test module that tests error handling by introducing
errors using
injection points. I have resolved a few bugs, so the memory monitoring
function
now runs correctly after a previous request has ended with an error.
Thank you,
Rahila Syed
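To illustrate points 2 and 3 above, here is a minimal standalone sketch of a wait loop with an elapsed-time exit condition. It is not code from the patch: it polls with plain POSIX clock_gettime/nanosleep rather than sleeping on a condition variable, and the names STATS_TIMEOUT_SECS, stats_ready_fn, wait_for_stats and the two example callbacks are made up for this example.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for the hardcoded 5 second limit from point 3. */
#define STATS_TIMEOUT_SECS 5

/* Hypothetical callback: returns true once the statistics are available. */
typedef bool (*stats_ready_fn) (void *arg);

/* Milliseconds elapsed on the monotonic clock since *start. */
static long
elapsed_ms(const struct timespec *start)
{
	struct timespec now;

	clock_gettime(CLOCK_MONOTONIC, &now);
	return (long) (now.tv_sec - start->tv_sec) * 1000L +
		(now.tv_nsec - start->tv_nsec) / 1000000L;
}

/*
 * Poll ready() until it returns true or timeout_ms has elapsed.
 * Each sleep is capped at the remaining time, so the total wait
 * overshoots the limit only by scheduling delay.
 * Returns true on success, false on timeout.
 */
static bool
wait_for_stats(stats_ready_fn ready, void *arg, long timeout_ms, long poll_ms)
{
	struct timespec start;

	assert(timeout_ms >= 0 && poll_ms > 0);
	clock_gettime(CLOCK_MONOTONIC, &start);

	for (;;)
	{
		long		remaining;
		struct timespec nap;

		if (ready(arg))
			return true;

		/* Extra exit condition: stop once the time budget is used up. */
		remaining = timeout_ms - elapsed_ms(&start);
		if (remaining <= 0)
			return false;

		if (remaining > poll_ms)
			remaining = poll_ms;
		nap.tv_sec = remaining / 1000;
		nap.tv_nsec = (remaining % 1000) * 1000000L;
		nanosleep(&nap, NULL);
	}
}

/* Example callbacks, for illustration only. */
static bool
never_ready(void *arg)
{
	(void) arg;
	return false;
}

static bool
ready_after_countdown(void *arg)
{
	int		   *calls_left = arg;

	return --(*calls_left) <= 0;
}
```

With these definitions, wait_for_stats(never_ready, NULL, STATS_TIMEOUT_SECS * 1000, 100) gives up and returns false once the five seconds are used up, while a callback such as ready_after_countdown ends the wait as soon as it reports success. Keeping the limit fixed in the code, rather than user-settable, avoids the very low timeouts mentioned in point 3.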
Attachments:
v40-0001-Add-function-to-report-memory-context-statistics.patch (application/octet-stream)
From c8cddf59c3d9b1f6967acd86ec79945bac06c2f0 Mon Sep 17 00:00:00 2001
From: Rahila Syed <rahilasyed.90@gmail.com>
Date: Sat, 8 Nov 2025 04:06:21 +0530
Subject: [PATCH] Add function to report memory context statistics
This adds a function for retrieving memory context statistics
and information from backends as well as auxiliary processes.
The intended use case is cluster debugging when under memory
pressure or unanticipated memory usage characteristics.
When calling the function it sends a signal to the specified
process to submit statistics regarding its memory contexts
into dynamic shared memory. Each memory context is returned
in detail, followed by a cumulative total in case the number
of contexts exceeds the max allocated amount of shared memory.
Each process is limited to use at most 1MB memory for this.
A summary can also be explicitly requested by the user, this
will return the TopMemoryContext and a cumulative total of
all lower contexts.
In order to not block on busy processes, we have hardcoded
the number of seconds during which to retry before timing out.
In the case where no statistics are published within the set
timeout, NULL is returned.
---
doc/src/sgml/func/func-admin.sgml | 157 +++
src/backend/catalog/system_views.sql | 5 +
src/backend/postmaster/autovacuum.c | 4 +
src/backend/postmaster/checkpointer.c | 4 +
src/backend/postmaster/interrupt.c | 4 +
src/backend/postmaster/pgarch.c | 4 +
src/backend/postmaster/startup.c | 4 +
src/backend/postmaster/walsummarizer.c | 4 +
src/backend/storage/ipc/ipci.c | 3 +
src/backend/storage/ipc/procsignal.c | 3 +
src/backend/storage/lmgr/proc.c | 1 +
src/backend/tcop/postgres.c | 3 +
.../utils/activity/wait_event_names.txt | 2 +
src/backend/utils/adt/mcxtfuncs.c | 910 +++++++++++++++++-
src/backend/utils/init/globals.c | 1 +
src/backend/utils/init/postinit.c | 7 +
src/backend/utils/mmgr/mcxt.c | 29 +
src/include/catalog/pg_proc.dat | 10 +
src/include/miscadmin.h | 1 +
src/include/storage/lwlocklist.h | 1 +
src/include/storage/procsignal.h | 1 +
src/include/utils/memutils.h | 81 +-
src/test/regress/expected/sysviews.out | 19 +
src/test/regress/sql/sysviews.sql | 18 +
src/tools/pgindent/typedefs.list | 3 +
25 files changed, 1255 insertions(+), 24 deletions(-)
diff --git a/doc/src/sgml/func/func-admin.sgml b/doc/src/sgml/func/func-admin.sgml
index 1b465bc8ba7..a5c66837241 100644
--- a/doc/src/sgml/func/func-admin.sgml
+++ b/doc/src/sgml/func/func-admin.sgml
@@ -251,6 +251,130 @@
<literal>false</literal> is returned.
</para></entry>
</row>
+
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>pg_get_process_memory_contexts</primary>
+ </indexterm>
+ <function>pg_get_process_memory_contexts</function> ( <parameter>pid</parameter> <type>integer</type>, <parameter>summary</parameter> <type>boolean</type> )
+ <returnvalue>setof record</returnvalue>
+ ( <parameter>name</parameter> <type>text</type>,
+ <parameter>ident</parameter> <type>text</type>,
+ <parameter>type</parameter> <type>text</type>,
+ <parameter>level</parameter> <type>integer</type>,
+ <parameter>path</parameter> <type>integer[]</type>,
+ <parameter>total_bytes</parameter> <type>bigint</type>,
+ <parameter>total_nblocks</parameter> <type>bigint</type>,
+ <parameter>free_bytes</parameter> <type>bigint</type>,
+ <parameter>free_chunks</parameter> <type>bigint</type>,
+ <parameter>used_bytes</parameter> <type>bigint</type>,
+ <parameter>num_agg_contexts</parameter> <type>integer</type> )
+ </para>
+ <para>
+ This function handles requests to display the memory contexts of a
+ <productname>PostgreSQL</productname> process with the specified
+ process ID. The function can be used to send requests to backends as
+ well as <glossterm linkend="glossary-auxiliary-proc">auxiliary processes</glossterm>.
+ </para>
+ <para>
+ The returned record contains extended statistics per each memory
+ context:
+ <itemizedlist spacing="compact">
+ <listitem>
+ <para>
+ <parameter>name</parameter> - The name of the memory context.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>ident</parameter> - Memory context ID (if any).
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>type</parameter> - The type of memory context, possible
+ values are: AllocSet, Generation, Slab and Bump.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>level</parameter> - The level in the tree of the current
+ memory context.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>path</parameter> - Memory contexts are organized in a
+ tree model with TopMemoryContext as the root, and all other memory
+ contexts as nodes in the tree. The <parameter>path</parameter>
+ displays the path from the root to the current memory context. The
+ path is limited to 100 children per node, with each node limited
+ to a max depth of 100, to preserve memory during reporting. The
+ printed path will also be limited to 100 nodes counting from the
+ TopMemoryContext.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>total_bytes</parameter> - The total number of bytes
+ allocated to this memory context.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>total_nblocks</parameter> - The total number of blocks
+ used for the allocated memory.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>free_bytes</parameter> - The amount of free memory in
+ this memory context.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>free_chunks</parameter> - The number of chunks that
+ <parameter>free_bytes</parameter> corresponds to.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>used_bytes</parameter> - The total number of bytes
+ currently occupied.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>num_agg_contexts</parameter> - The number of memory
+ contexts aggregated in the displayed statistics.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+ <para>
+ When <parameter>summary</parameter> is <literal>true</literal>, statistics
+ for memory contexts at levels 1 and 2 are displayed, with level 1
+ representing the root node (i.e., <literal>TopMemoryContext</literal>).
+ Statistics for contexts on level 2 and below are aggregates of all
+ child contexts' statistics, where <literal>num_agg_contexts</literal>
+ indicates the number of aggregated child contexts. When
+ <parameter>summary</parameter> is <literal>false</literal>,
+ the <literal>num_agg_contexts</literal> value is <literal>1</literal>,
+ indicating that individual statistics are being displayed.
+ </para>
+ <para>
+ After receiving memory context statistics from the target process, it
+ returns the results as one row per context. If all the contexts don't
+ fit within the pre-determined size limit, the remaining context
+ statistics are aggregated and a cumulative total is displayed. The
+ <literal>num_agg_contexts</literal> column indicates the number of
+ contexts aggregated in the displayed statistics. When
+ <literal>num_agg_contexts</literal> is <literal>1</literal> it means
+ that the context statistics are displayed separately.
+ </para></entry>
+ </row>
</tbody>
</tgroup>
</table>
@@ -302,6 +426,39 @@ LOG: Grand total: 1651920 bytes in 201 blocks; 622360 free (88 chunks); 1029560
because it may generate a large number of log messages.
</para>
+ <para>
+ <function>pg_get_process_memory_contexts</function> can be used to request
+ memory contexts statistics of any <productname>PostgreSQL</productname>
+ process. For example:
+<programlisting>
+postgres=# SELECT * FROM pg_get_process_memory_contexts(
+ (SELECT pid FROM pg_stat_activity
+ WHERE backend_type = 'checkpointer'),
+ false) LIMIT 1;
+-[ RECORD 1 ]----+------------------------------
+name | TopMemoryContext
+ident |
+type | AllocSet
+level | 1
+path | {1}
+total_bytes | 90304
+total_nblocks | 3
+free_bytes | 2880
+free_chunks | 1
+used_bytes | 87424
+num_agg_contexts | 1
+</programlisting>
+ <note>
+ <para>
+ While <function>pg_get_process_memory_contexts</function> can be used to
+ query memory contexts of the local backend,
+ <structname>pg_backend_memory_contexts</structname>
+ (see <xref linkend="view-pg-backend-memory-contexts"/> for more details)
+ will be less resource intensive when only the local backend is of interest.
+ </para>
+ </note>
+ </para>
+
</sect2>
<sect2 id="functions-admin-backup">
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 059e8778ca7..c63fd6783bd 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -692,6 +692,11 @@ GRANT SELECT ON pg_backend_memory_contexts TO pg_read_all_stats;
REVOKE EXECUTE ON FUNCTION pg_get_backend_memory_contexts() FROM PUBLIC;
GRANT EXECUTE ON FUNCTION pg_get_backend_memory_contexts() TO pg_read_all_stats;
+REVOKE EXECUTE ON FUNCTION
+ pg_get_process_memory_contexts(integer, boolean) FROM PUBLIC;
+GRANT EXECUTE ON FUNCTION
+ pg_get_process_memory_contexts(integer, boolean) TO pg_read_all_stats;
+
-- Statistics views
CREATE VIEW pg_stat_all_tables AS
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index ed19c74bb19..34bdb88fa7f 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -791,6 +791,10 @@ ProcessAutoVacLauncherInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
/* Process sinval catchup interrupts that happened while sleeping */
ProcessCatchupInterrupt();
}
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index e84e8663e96..5b3e08805bf 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -679,6 +679,10 @@ ProcessCheckpointerInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/postmaster/interrupt.c b/src/backend/postmaster/interrupt.c
index ba63b84dfc5..29454b8bf8a 100644
--- a/src/backend/postmaster/interrupt.c
+++ b/src/backend/postmaster/interrupt.c
@@ -48,6 +48,10 @@ ProcessMainLoopInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index ce6b5299324..fdd385e492d 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -871,6 +871,10 @@ ProcessPgArchInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
if (ConfigReloadPending)
{
char *archiveLib = pstrdup(XLogArchiveLibrary);
diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c
index 27e86cf393f..7149a67fcbc 100644
--- a/src/backend/postmaster/startup.c
+++ b/src/backend/postmaster/startup.c
@@ -192,6 +192,10 @@ ProcessStartupProcInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
diff --git a/src/backend/postmaster/walsummarizer.c b/src/backend/postmaster/walsummarizer.c
index c4a888a081c..00f03b36ed8 100644
--- a/src/backend/postmaster/walsummarizer.c
+++ b/src/backend/postmaster/walsummarizer.c
@@ -879,6 +879,10 @@ ProcessWalSummarizerInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index b23d0c19360..a5ed58a18c5 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -52,6 +52,7 @@
#include "storage/sinvaladt.h"
#include "utils/guc.h"
#include "utils/injection_point.h"
+#include "utils/memutils.h"
/* GUCs */
int shared_memory_type = DEFAULT_SHARED_MEMORY_TYPE;
@@ -140,6 +141,7 @@ CalculateShmemSize(void)
size = add_size(size, SlotSyncShmemSize());
size = add_size(size, AioShmemSize());
size = add_size(size, WaitLSNShmemSize());
+ size = add_size(size, MemoryContextKeysShmemSize() + sizeof(LWLockPadded));
/* include additional requested shmem from preload libraries */
size = add_size(size, total_addin_request);
@@ -328,6 +330,7 @@ CreateOrAttachShmemStructs(void)
InjectionPointShmemInit();
AioShmemInit();
WaitLSNShmemInit();
+ MemoryContextKeysShmemInit();
}
/*
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index 087821311cc..8963285cc12 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -691,6 +691,9 @@ procsignal_sigusr1_handler(SIGNAL_ARGS)
if (CheckProcSignal(PROCSIG_LOG_MEMORY_CONTEXT))
HandleLogMemoryContextInterrupt();
+ if (CheckProcSignal(PROCSIG_GET_MEMORY_CONTEXT))
+ HandleGetMemoryContextInterrupt();
+
if (CheckProcSignal(PROCSIG_PARALLEL_APPLY_MESSAGE))
HandleParallelApplyMessageInterrupt();
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 1504fafe6d8..c5e69151756 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -51,6 +51,7 @@
#include "storage/procsignal.h"
#include "storage/spin.h"
#include "storage/standby.h"
+#include "utils/memutils.h"
#include "utils/timeout.h"
#include "utils/timestamp.h"
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 2bd89102686..da8f2b97986 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3549,6 +3549,9 @@ ProcessInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
if (ParallelApplyMessagePending)
ProcessParallelApplyMessages();
}
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index c1ac71ff7f2..644d8d988e1 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -162,6 +162,7 @@ WAL_RECEIVER_EXIT "Waiting for the WAL receiver to exit."
WAL_RECEIVER_WAIT_START "Waiting for startup process to send initial data for streaming replication."
WAL_SUMMARY_READY "Waiting for a new WAL summary to be generated."
XACT_GROUP_UPDATE "Waiting for the group leader to update transaction status at transaction end."
+MEM_CXT_PUBLISH "Waiting for a process to publish memory information."
ABI_compatibility:
@@ -404,6 +405,7 @@ SubtransSLRU "Waiting to access the sub-transaction SLRU cache."
XactSLRU "Waiting to access the transaction status SLRU cache."
ParallelVacuumDSA "Waiting for parallel vacuum dynamic shared memory allocation."
AioUringCompletion "Waiting for another process to complete IO via io_uring."
+MemoryContextReportingKeys "Waiting for another process to complete reading or writing the memory reporting keys."
# No "ABI_compatibility" region here as WaitEventLWLock has its own C code.
diff --git a/src/backend/utils/adt/mcxtfuncs.c b/src/backend/utils/adt/mcxtfuncs.c
index fe6dce9cba3..a62f3d6dc93 100644
--- a/src/backend/utils/adt/mcxtfuncs.c
+++ b/src/backend/utils/adt/mcxtfuncs.c
@@ -15,20 +15,51 @@
#include "postgres.h"
+#include "access/twophase.h"
+#include "catalog/pg_authid_d.h"
#include "funcapi.h"
#include "mb/pg_wchar.h"
+#include "miscadmin.h"
+#include "storage/dsm_registry.h"
#include "storage/proc.h"
#include "storage/procarray.h"
+#include "utils/acl.h"
#include "utils/array.h"
#include "utils/builtins.h"
#include "utils/hsearch.h"
+#include "utils/injection_point.h"
+#include "utils/memutils.h"
+#include "utils/wait_event_types.h"
+
+#define CLIENT_KEY_SIZE 64
+
+static LWLock *client_keys_lock = NULL;
+static int *client_keys = NULL;
+static dshash_table *MemoryStatsDsHash = NULL;
+static dsa_area *MemoryStatsDsaArea = NULL;
+
+static void memstats_dsa_cleanup(char *key);
+static const char *ContextTypeToString(NodeTag type);
+static void PublishMemoryContext(MemoryStatsEntry *memcxt_info,
+ int curr_id, MemoryContext context,
+ List *path,
+ MemoryContextCounters stat,
+ int num_contexts);
+static List *compute_context_path(MemoryContext c, HTAB *context_id_lookup);
+static void end_memorycontext_reporting(MemoryStatsDSHashEntry *entry, MemoryContext oldcontext,
+ HTAB *context_id_lookup);
/* ----------
* The max bytes for showing identifiers of MemoryContext.
+ * This is used by pg_get_backend_memory_context - view used for local backend.
* ----------
*/
#define MEMORY_CONTEXT_IDENT_DISPLAY_SIZE 1024
+#define MAX_PATH_DISPLAY_LENGTH 100
+/* Timeout in seconds */
+#define MEMORY_STATS_MAX_TIMEOUT 5
+
/*
* MemoryContextId
* Used for storage of transient identifiers for
@@ -89,7 +120,7 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
*/
for (MemoryContext cur = context; cur != NULL; cur = cur->parent)
{
- MemoryContextId *entry;
+ MemoryStatsContextId *entry;
bool found;
entry = hash_search(context_id_lookup, &cur, HASH_FIND, &found);
@@ -143,24 +174,7 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
else
nulls[1] = true;
- switch (context->type)
- {
- case T_AllocSetContext:
- type = "AllocSet";
- break;
- case T_GenerationContext:
- type = "Generation";
- break;
- case T_SlabContext:
- type = "Slab";
- break;
- case T_BumpContext:
- type = "Bump";
- break;
- default:
- type = "???";
- break;
- }
+ type = ContextTypeToString(context->type);
values[2] = CStringGetTextDatum(type);
values[3] = Int32GetDatum(list_length(path)); /* level */
@@ -175,6 +189,38 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
list_free(path);
}
+/*
+ * ContextTypeToString
+ * Returns a textual representation of a context type
+ *
+ * This should cover the same types as MemoryContextIsValid.
+ */
+const char *
+ContextTypeToString(NodeTag type)
+{
+ const char *context_type;
+
+ switch (type)
+ {
+ case T_AllocSetContext:
+ context_type = "AllocSet";
+ break;
+ case T_GenerationContext:
+ context_type = "Generation";
+ break;
+ case T_SlabContext:
+ context_type = "Slab";
+ break;
+ case T_BumpContext:
+ context_type = "Bump";
+ break;
+ default:
+ context_type = "???";
+ break;
+ }
+ return context_type;
+}
+
/*
* pg_get_backend_memory_contexts
* SQL SRF showing backend memory context.
@@ -189,7 +235,7 @@ pg_get_backend_memory_contexts(PG_FUNCTION_ARGS)
HTAB *context_id_lookup;
ctl.keysize = sizeof(MemoryContext);
- ctl.entrysize = sizeof(MemoryContextId);
+ ctl.entrysize = sizeof(MemoryStatsContextId);
ctl.hcxt = CurrentMemoryContext;
context_id_lookup = hash_create("pg_get_backend_memory_contexts",
@@ -216,7 +262,7 @@ pg_get_backend_memory_contexts(PG_FUNCTION_ARGS)
foreach_ptr(MemoryContextData, cur, contexts)
{
- MemoryContextId *entry;
+ MemoryStatsContextId *entry;
bool found;
/*
@@ -224,8 +270,8 @@ pg_get_backend_memory_contexts(PG_FUNCTION_ARGS)
* PutMemoryContextsStatsTupleStore needs this to populate the "path"
* column with the parent context_ids.
*/
- entry = (MemoryContextId *) hash_search(context_id_lookup, &cur,
- HASH_ENTER, &found);
+ entry = (MemoryStatsContextId *) hash_search(context_id_lookup, &cur,
+ HASH_ENTER, &found);
entry->context_id = context_id++;
Assert(!found);
@@ -305,3 +351,821 @@ pg_log_backend_memory_contexts(PG_FUNCTION_ARGS)
PG_RETURN_BOOL(true);
}
+
+/*
+ * pg_get_process_memory_contexts
+ * Signal a backend or an auxiliary process to send its memory contexts,
+ * wait for the results and display them.
+ *
+ * By default, only superusers or users with ROLE_PG_READ_ALL_STATS are allowed
+ * to signal a process to return the memory contexts. This is because allowing
+ * any users to issue this request at an unbounded rate would cause lots of
+ * requests to be sent, which can lead to denial of service. Additional roles
+ * can be permitted with GRANT.
+ *
+ * On receipt of this signal, a backend or an auxiliary process sets the flag
+ * in the signal handler, which causes the next CHECK_FOR_INTERRUPTS()
+ * or process-specific interrupt handler to copy the memory context details
+ * to a dynamic shared memory space.
+ *
+ * We have defined a limit on DSA memory that could be allocated per process -
+ * if the process has more memory contexts than what can fit in the allocated
+ * size, the excess contexts are summarized and represented as cumulative total
+ * at the end of the buffer.
+ *
+ * After sending the signal, wait on a condition variable. The publishing
+ * backend, after copying the data to shared memory, sends signal on that
+ * condition variable. There is one condition variable per client process.
+ * Once the condition variable is signalled, check if the latest memory context
+ * information is available and display.
+ *
+ * If the publishing backend does not respond before the condition variable
+ * times out, which is set to a predefined value MEMORY_STATS_MAX_TIMEOUT, give up
+ * and return NULL.
+ */
+Datum
+pg_get_process_memory_contexts(PG_FUNCTION_ARGS)
+{
+ int pid = PG_GETARG_INT32(0);
+ bool summary = PG_GETARG_BOOL(1);
+ PGPROC *proc;
+ ProcNumber procNumber = INVALID_PROC_NUMBER;
+ bool proc_is_aux = false;
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ MemoryStatsEntry *memcxt_info;
+ MemoryStatsDSHashEntry *entry;
+ bool found;
+ char key[CLIENT_KEY_SIZE];
+ TimestampTz start_timestamp;
+
+ /*
+ * See if the process with given pid is a backend or an auxiliary process
+ * and remember the type for when we requery the process later.
+ */
+ proc = BackendPidGetProc(pid);
+ if (proc == NULL)
+ {
+ proc = AuxiliaryPidGetProc(pid);
+ proc_is_aux = true;
+ }
+
+ /*
+ * BackendPidGetProc() and AuxiliaryPidGetProc() return NULL if the pid
+ * isn't valid; this is however not a problem and leave with a WARNING.
+ * See comment in pg_log_backend_memory_contexts for a discussion on this.
+ */
+ if (proc == NULL)
+ {
+ /*
+ * This is a warning because we don't want to break loops.
+ */
+ ereport(WARNING,
+ errmsg("PID %d is not a PostgreSQL server process", pid));
+ PG_RETURN_NULL();
+ }
+
+ InitMaterializedSRF(fcinfo, 0);
+
+ procNumber = GetNumberFromPGProc(proc);
+
+ /*
+ * Create a DSA to allocate memory for copying memory contexts statistics.
+ * Allocate the memory in the DSA and send dsa pointer to the server
+ * process for storing the context statistics. If number of contexts
+ * exceed a predefined limit (1MB), a cumulative total is stored for such
+ * contexts.
+ *
+ * The DSA is created once for the lifetime of the server, and only
+ * attached in subsequent calls.
+ */
+ if (MemoryStatsDsaArea == NULL)
+ MemoryStatsDsaArea = GetNamedDSA("memory_context_statistics_dsa", &found);
+
+ /*
+ * The dsa pointers containing statistics for each client are stored in a
+ * dshash table. In addition to dsa pointer, each entry in this table also
+ * contains information about the statistics, condition variable for
+ * signalling between client and the server and miscellaneous data
+ * specific to a request. There is one entry per client request in the
+ * hash table.
+ */
+ if (MemoryStatsDsHash == NULL)
+ MemoryStatsDsHash = GetNamedDSHash("memory_context_statistics_dshash", &memctx_dsh_params, &found);
+
+ snprintf(key, sizeof(key), "%d", MyProcNumber);
+
+ /*
+ * Check if the publishing process slot is empty and store this clients
+ * key i.e its procNumber. This informs the publishing process that it is
+ * supposed to write statistics in the hash entry corresponding to this
+ * client.
+ */
+ LWLockAcquire(client_keys_lock, LW_EXCLUSIVE);
+
+ /*
+ * XXX. If the process exits without cleaning up its slot, i.e in case of
+ * an abrupt crash the client_keys slot won't be reset thus resulting in
+ * false negative and WARNING would be thrown in case another process with
+ * same slot index is queried for statistics.
+ */
+ if (client_keys[procNumber] == -1)
+ client_keys[procNumber] = MyProcNumber;
+ else
+ {
+ LWLockRelease(client_keys_lock);
+ ereport(WARNING,
+ errmsg("server process %d is processing previous request", pid));
+ PG_RETURN_NULL();
+ }
+ LWLockRelease(client_keys_lock);
+
+ /*
+ * Insert an entry for this client in DSHASH table the first time this
+ * function is called. This entry is deleted when the process exits in
+ * before_shmem_exit call.
+ *
+ * dshash_find_or_insert locks the entry to prevent the publisher from
+ * reading before client has updated the entry.
+ */
+ entry = dshash_find_or_insert(MemoryStatsDsHash, key, &found);
+ if (!found)
+ {
+ entry->stats_written = false;
+ ConditionVariableInit(&entry->memcxt_cv);
+ }
+
+ /*
+ * Allocate 1MB of memory for the backend to publish its statistics on
+ * every call to this function. The memory is freed at the end of the
+ * function.
+ */
+ entry->memstats_dsa_pointer =
+ dsa_allocate0(MemoryStatsDsaArea, MEMORY_CONTEXT_REPORT_MAX_PER_BACKEND);
+
+ /*
+ * Specify whether a summary of statistics is requested, before signalling
+ * the server.
+ */
+ entry->summary = summary;
+
+ /*
+ * Indicate which server process statistics are being requested from. If
+ * this client times out before the last requested process can publish its
+ * statistics, it may send a new request to another server process. Since
+ * the previous server was notified, it might attempt to read the same
+ * client entry and respond incorrectly with its statistics. By storing
+ * the server ID in the client entry, we prevent any previously signalled
+ * server process from writing its statistics in the space meant for the
+ * newly requested process.
+ */
+ entry->target_server_id = pid;
+ dshash_release_lock(MemoryStatsDsHash, entry);
+
+ /*
+ * Send a signal to a PostgreSQL process, informing it we want it to
+ * produce information about its memory contexts.
+ */
+ if (SendProcSignal(pid, PROCSIG_GET_MEMORY_CONTEXT, procNumber) < 0)
+ {
+ memstats_dsa_cleanup(key);
+ ereport(WARNING,
+ errmsg("could not send signal to process %d: %m", pid));
+ PG_RETURN_NULL();
+ }
+ start_timestamp = GetCurrentTimestamp();
+
+ while (1)
+ {
+ long elapsed_time;
+
+ entry = dshash_find_or_insert(MemoryStatsDsHash, key, &found);
+ Assert(found);
+
+ INJECTION_POINT("memcontext-client-crash", NULL);
+
+ memcxt_info = (MemoryStatsEntry *)
+ dsa_get_address(MemoryStatsDsaArea, entry->memstats_dsa_pointer);
+
+ /*
+ * We expect to come out of sleep when the requested process has
+ * finished publishing the statistics, verified using a boolean
+ * stats_written.
+ *
+ * Make sure that the statistics are actually written by checking that
+ * the name of the context is not NULL. This is done to ensure that
+ * the subsequent waits for statistics do not return spuriously if the
+ * previous call to the function ended in error and thus could not
+ * clear the stats_written flag.
+ */
+ if (entry->stats_written && memcxt_info[0].name[0] != '\0')
+ break;
+
+ dshash_release_lock(MemoryStatsDsHash, entry);
+
+ elapsed_time = TimestampDifferenceMilliseconds(start_timestamp,
+ GetCurrentTimestamp());
+ /* Return if we have already exceeded the timeout */
+ if (elapsed_time >= MEMORY_STATS_MAX_TIMEOUT * 1000)
+ {
+ memstats_dsa_cleanup(key);
+ PG_RETURN_NULL();
+ }
+
+ /*
+ * Recheck the state of the backend before sleeping on the condition
+ * variable to ensure the process is still alive. Only check the
+ * relevant process type based on the earlier PID check.
+ */
+ if (proc_is_aux)
+ proc = AuxiliaryPidGetProc(pid);
+ else
+ proc = BackendPidGetProc(pid);
+
+ /*
+ * The target server process ending during memory context processing
+ * is not an error.
+ */
+ if (proc == NULL)
+ {
+ memstats_dsa_cleanup(key);
+ ConditionVariableCancelSleep();
+ ereport(WARNING,
+ errmsg("PID %d is no longer a PostgreSQL server process",
+ pid));
+ PG_RETURN_NULL();
+ }
+
+ INJECTION_POINT("memcontext-client-crash", NULL);
+
+ /*
+ * Wait for MEMORY_STATS_MAX_TIMEOUT. If no statistics are available
+ * within the allowed time then return NULL. The timer is defined in
+ * milliseconds since that's what the condition variable sleep uses.
+ */
+ if (ConditionVariableTimedSleep(&entry->memcxt_cv,
+ (MEMORY_STATS_MAX_TIMEOUT * 1000), WAIT_EVENT_MEM_CXT_PUBLISH))
+ {
+ /* Timeout has expired, return NULL */
+ memstats_dsa_cleanup(key);
+ ConditionVariableCancelSleep();
+ PG_RETURN_NULL();
+ }
+ }
+
+ /*
+ * Backend has finished publishing the stats, project them.
+ */
+#define PG_GET_PROCESS_MEMORY_CONTEXTS_COLS 11
+ for (int i = 0; i < entry->total_stats; i++)
+ {
+ ArrayType *path_array;
+ int path_length;
+ Datum values[PG_GET_PROCESS_MEMORY_CONTEXTS_COLS];
+ bool nulls[PG_GET_PROCESS_MEMORY_CONTEXTS_COLS];
+ Datum *path_datum = NULL;
+
+ memset(values, 0, sizeof(values));
+ memset(nulls, 0, sizeof(nulls));
+
+ Assert(memcxt_info[i].name[0] != '\0');
+ values[0] = CStringGetTextDatum(memcxt_info[i].name);
+
+ if (memcxt_info[i].ident[0] != '\0')
+ values[1] = CStringGetTextDatum(memcxt_info[i].ident);
+ else
+ nulls[1] = true;
+
+ values[2] = CStringGetTextDatum(ContextTypeToString(memcxt_info[i].type));
+ values[3] = Int32GetDatum(memcxt_info[i].levels);
+
+ path_length = memcxt_info[i].path_length;
+ path_datum = (Datum *) palloc(path_length * sizeof(Datum));
+ if (memcxt_info[i].path[0] != 0)
+ {
+ for (int j = 0; j < path_length; j++)
+ path_datum[j] = Int32GetDatum(memcxt_info[i].path[j]);
+ path_array = construct_array_builtin(path_datum, path_length, INT4OID);
+ values[4] = PointerGetDatum(path_array);
+ }
+ else
+ nulls[4] = true;
+
+ values[5] = Int64GetDatum(memcxt_info[i].totalspace);
+ values[6] = Int64GetDatum(memcxt_info[i].nblocks);
+ values[7] = Int64GetDatum(memcxt_info[i].freespace);
+ values[8] = Int64GetDatum(memcxt_info[i].freechunks);
+ values[9] = Int64GetDatum(memcxt_info[i].totalspace -
+ memcxt_info[i].freespace);
+ values[10] = Int32GetDatum(memcxt_info[i].num_agg_stats);
+
+ tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc,
+ values, nulls);
+ }
+ dshash_release_lock(MemoryStatsDsHash, entry);
+ memstats_dsa_cleanup(key);
+
+ ConditionVariableCancelSleep();
+
+ PG_RETURN_NULL();
+}
+
+static void
+memstats_dsa_cleanup(char *key)
+{
+ MemoryStatsDSHashEntry *entry;
+
+ entry = dshash_find(MemoryStatsDsHash, key, true);
+
+ Assert(MemoryStatsDsaArea != NULL);
+ dsa_free(MemoryStatsDsaArea, entry->memstats_dsa_pointer);
+ entry->memstats_dsa_pointer = InvalidDsaPointer;
+ entry->stats_written = false;
+ entry->target_server_id = 0;
+
+ dshash_release_lock(MemoryStatsDsHash, entry);
+}
+void
+MemoryContextKeysShmemInit(void)
+{
+ bool found;
+
+ client_keys = (int *)
+ ShmemInitStruct("MemoryContextKeys",
+ MemoryContextKeysShmemSize() + sizeof(LWLockPadded), &found);
+ client_keys_lock = (LWLock *) ((char *) client_keys + MemoryContextKeysShmemSize());
+
+ if (!found)
+ {
+ MemSet(client_keys, -1, MemoryContextKeysShmemSize());
+ LWLockInitialize(client_keys_lock, LWTRANCHE_MEMORY_CONTEXT_KEYS);
+ }
+}
+
+Size
+MemoryContextKeysShmemSize(void)
+{
+ Size sz = 0;
+ Size TotalProcs = 0;
+
+ TotalProcs = add_size(TotalProcs, NUM_AUXILIARY_PROCS);
+ TotalProcs = add_size(TotalProcs, MaxBackends);
+ sz = add_size(sz, mul_size(TotalProcs, sizeof(int)));
+
+ return sz;
+}
+
+/*
+ * HandleGetMemoryContextInterrupt
+ * Handle receipt of an interrupt indicating a request to publish memory
+ * contexts statistics.
+ *
+ * All the actual work is deferred to ProcessGetMemoryContextInterrupt() as
+ * this cannot be performed in a signal handler.
+ */
+void
+HandleGetMemoryContextInterrupt(void)
+{
+ InterruptPending = true;
+ PublishMemoryContextPending = true;
+ /* latch will be set by procsignal_sigusr1_handler */
+}
+
+/*
+ * ProcessGetMemoryContextInterrupt
+ * Generate information about memory contexts used by the process.
+ *
+ * Performs a breadth-first search on the memory context tree, so a parent's
+ * statistics are reported before its children's in the monitoring function
+ * output.
+ *
+ * Statistics for all the processes are shared via the same dynamic shared
+ * area. Individual statistics are tracked independently in
+ * per-process DSA pointers. These pointers are stored in a dshash table
+ * keyed by the requesting client's ProcNumber.
+ *
+ * We calculate the maximum number of contexts whose statistics can be
+ * displayed from a pre-determined limit on the memory available per process
+ * for this utility and the maximum size of the statistics for each context.
+ * The remaining context statistics, if any, are captured as a cumulative
+ * total at the end of the individual contexts' statistics.
+ *
+ * If summary is true, we capture the level 1 and level 2 contexts
+ * statistics. For that we traverse the memory context tree recursively in
+ * depth first search manner to cover all the children of a parent context, to
+ * be able to display a cumulative total of memory consumption by a parent at
+ * level 2 and all its children.
+ */
+void
+ProcessGetMemoryContextInterrupt(void)
+{
+ List *contexts;
+ HASHCTL ctl;
+ HTAB *context_id_lookup;
+ int context_id = 0;
+ MemoryStatsEntry *meminfo;
+ bool summary = false;
+ MemoryContextCounters stat;
+ int num_individual_stats = 0;
+ bool found;
+ MemoryStatsDSHashEntry *entry;
+ char key[CLIENT_KEY_SIZE];
+ int clientProcNumber;
+ MemoryContext memstats_ctx = NULL;
+ MemoryContext oldcontext = NULL;
+
+ PublishMemoryContextPending = false;
+
+ /*
+ * Retrieve the client key for publishing statistics and reset it to -1,
+ * so other clients can request memory statistics from this process
+ */
+ LWLockAcquire(client_keys_lock, LW_EXCLUSIVE);
+ Assert(client_keys[MyProcNumber] != -1);
+ clientProcNumber = client_keys[MyProcNumber];
+ client_keys[MyProcNumber] = -1;
+ LWLockRelease(client_keys_lock);
+
+ /*
+ * Create a new memory context which is not a part of TopMemoryContext
+ * tree. This context is used to allocate all memory in this function.
+ * This helps in keeping the memory allocation in this function to report
+ * memory consumption statistics separate. So that it does not affect the
+ * output of this function.
+ */
+ memstats_ctx = AllocSetContextCreate((MemoryContext) NULL, "publish_memory_context_statistics",
+ ALLOCSET_SMALL_SIZES);
+ oldcontext = MemoryContextSwitchTo(memstats_ctx);
+
+ /*
+ * The hash table is used for constructing "path" column of the view,
+ * similar to its local backend counterpart.
+ */
+ ctl.keysize = sizeof(MemoryContext);
+ ctl.entrysize = sizeof(MemoryStatsContextId);
+ ctl.hcxt = CurrentMemoryContext;
+
+ context_id_lookup = hash_create("pg_get_remote_backend_memory_contexts",
+ 256,
+ &ctl,
+ HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+
+ /* List of contexts to process in the next round - start at the top. */
+ contexts = list_make1(TopMemoryContext);
+
+ /*
+ * If DSA exists, created by another process requesting statistics, attach
+ * to it. We expect the client process to create required DSA and Dshash
+ * table.
+ */
+ if (MemoryStatsDsaArea == NULL)
+ MemoryStatsDsaArea = GetNamedDSA("memory_context_statistics_dsa", &found);
+
+ if (MemoryStatsDsHash == NULL)
+ MemoryStatsDsHash = GetNamedDSHash("memory_context_statistics_dshash", &memctx_dsh_params, &found);
+
+
+ snprintf(key, CLIENT_KEY_SIZE, "%d", clientProcNumber);
+
+ /*
+ * The entry lock is held by dshash_find_or_insert to protect writes to
+ * process specific memory. Two different processes publishing statistics
+ * do not block each other.
+ */
+ INJECTION_POINT("memcontext-server-crash", NULL);
+ entry = dshash_find_or_insert(MemoryStatsDsHash, key, &found);
+
+ /*
+ * Entry has been deleted due to client process exit. Make sure that the
+ * client always deletes the entry after taking required lock or this
+ * function may end up writing to unallocated memory.
+ */
+ if (!found)
+ {
+ entry->stats_written = false;
+ end_memorycontext_reporting(entry, oldcontext, context_id_lookup);
+ return;
+ }
+
+ /*
+ * The client has timed out waiting for us to write statistics and is
+ * requesting statistics from some other process
+ */
+ if (entry->target_server_id != MyProcPid)
+ {
+ entry->stats_written = false;
+ end_memorycontext_reporting(entry, oldcontext, context_id_lookup);
+ return;
+ }
+ summary = entry->summary;
+
+ /* Should be allocated by a client backend that is requesting statistics */
+ Assert(entry->memstats_dsa_pointer != InvalidDsaPointer);
+ meminfo = (MemoryStatsEntry *)
+ dsa_get_address(MemoryStatsDsaArea, entry->memstats_dsa_pointer);
+
+ if (summary)
+ {
+ int cxt_id = 0;
+ List *path = NIL;
+ MemoryStatsContextId *contextid_entry;
+
+ /* Copy TopMemoryContext statistics to DSA */
+ memset(&stat, 0, sizeof(stat));
+ (*TopMemoryContext->methods->stats) (TopMemoryContext, NULL, NULL,
+ &stat, true);
+ path = lcons_int(1, path);
+ PublishMemoryContext(meminfo, cxt_id, TopMemoryContext, path, stat,
+ 1);
+
+ contextid_entry = (MemoryStatsContextId *) hash_search(context_id_lookup, &TopMemoryContext,
+ HASH_ENTER, &found);
+ Assert(!found);
+
+ /*
+ * context id starts with 1
+ */
+ contextid_entry->context_id = cxt_id + 1;
+
+ /*
+ * Copy statistics for each of TopMemoryContext's children. This
+ * includes statistics of at most 100 children per node, with each
+ * child node limited to a depth of 100 in its subtree.
+ */
+ for (MemoryContext c = TopMemoryContext->firstchild; c != NULL;
+ c = c->nextchild)
+ {
+ MemoryContextCounters grand_totals;
+ int num_contexts = 0;
+
+ path = NIL;
+ memset(&grand_totals, 0, sizeof(grand_totals));
+
+ cxt_id++;
+ contextid_entry = (MemoryStatsContextId *) hash_search(context_id_lookup, &c,
+ HASH_ENTER, &found);
+ Assert(!found);
+ contextid_entry->context_id = cxt_id + 1;
+
+ MemoryContextStatsCounter(c, &grand_totals, &num_contexts);
+
+ path = compute_context_path(c, context_id_lookup);
+
+ PublishMemoryContext(meminfo, cxt_id, c, path,
+ grand_totals, num_contexts);
+ }
+ entry->total_stats = cxt_id + 1;
+
+ entry->stats_written = true;
+ end_memorycontext_reporting(entry, oldcontext, context_id_lookup);
+ /* Notify waiting backends and return */
+ ConditionVariableBroadcast(&entry->memcxt_cv);
+ return;
+ }
+ foreach_ptr(MemoryContextData, cur, contexts)
+ {
+ List *path = NIL;
+ MemoryStatsContextId *contextid_entry;
+
+ contextid_entry = (MemoryStatsContextId *) hash_search(context_id_lookup, &cur,
+ HASH_ENTER, &found);
+ Assert(!found);
+
+ /*
+ * context id starts with 1
+ */
+ contextid_entry->context_id = context_id + 1;
+
+ /*
+ * Figure out the transient context_id of this context and each of its
+ * ancestors, to compute a path for this context.
+ */
+ path = compute_context_path(cur, context_id_lookup);
+
+ /* Examine the context stats */
+ memset(&stat, 0, sizeof(stat));
+ (*cur->methods->stats) (cur, NULL, NULL, &stat, true);
+
+ /* Account for saving one statistics slot for cumulative reporting */
+ if (context_id < (MAX_MEMORY_CONTEXT_STATS_NUM - 1))
+ {
+ /* Copy statistics to DSA memory */
+ PublishMemoryContext(meminfo, context_id, cur, path, stat, 1);
+ }
+ else
+ {
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].totalspace += stat.totalspace;
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].nblocks += stat.nblocks;
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].freespace += stat.freespace;
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].freechunks += stat.freechunks;
+ }
+
+ /*
+ * DSA max limit per process is reached, write aggregate of the
+ * remaining statistics.
+ *
+ * We can store contexts from 0 to max_stats - 1. When there are more
+ * contexts than max_stats, individual reporting stops once context_id
+ * equals max_stats - 2, because array slot max_stats - 1 is reserved
+ * for the cumulative statistics, i.e. "Remaining Totals".
+ */
+ if (context_id == (MAX_MEMORY_CONTEXT_STATS_NUM - 2))
+ {
+ int namelen = strlen("Remaining Totals");
+
+ num_individual_stats = context_id + 1;
+ strlcpy(meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].name, "Remaining Totals", namelen + 1);
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].ident[0] = '\0';
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].path[0] = 0;
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].type = 0;
+ }
+ context_id++;
+
+ for (MemoryContext c = cur->firstchild; c != NULL; c = c->nextchild)
+ contexts = lappend(contexts, c);
+ }
+
+ /*
+ * Statistics are not aggregated, i.e. individual statistics are reported,
+ * when context_id <= max_stats.
+ */
+ if (context_id <= MAX_MEMORY_CONTEXT_STATS_NUM)
+ {
+ entry->total_stats = context_id;
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].num_agg_stats = 1;
+ }
+ /* Report number of aggregated memory contexts */
+ else
+ {
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].num_agg_stats = context_id -
+ num_individual_stats;
+
+ /*
+ * Total stats equals num_individual_stats + 1 record for cumulative
+ * statistics.
+ */
+ entry->total_stats = num_individual_stats + 1;
+ }
+ entry->stats_written = true;
+ end_memorycontext_reporting(entry, oldcontext, context_id_lookup);
+ /* Notify waiting backends and return */
+ ConditionVariableBroadcast(&entry->memcxt_cv);
+}
+
+/*
+ * Clean up before exit from ProcessGetMemoryContextInterrupt
+ */
+static void
+end_memorycontext_reporting(MemoryStatsDSHashEntry *entry, MemoryContext oldcontext, HTAB *context_id_lookup)
+{
+ MemoryContext curr_ctx = CurrentMemoryContext;
+
+ dshash_release_lock(MemoryStatsDsHash, entry);
+
+ hash_destroy(context_id_lookup);
+ MemoryContextSwitchTo(oldcontext);
+ MemoryContextReset(curr_ctx);
+}
+
+/*
+ * compute_context_path
+ *
+ * Append the transient context_id of this context and each of its ancestors
+ * to a list, in order to compute a path.
+ */
+static List *
+compute_context_path(MemoryContext c, HTAB *context_id_lookup)
+{
+ bool found;
+ List *path = NIL;
+ MemoryContext cur_context;
+
+ for (cur_context = c; cur_context != NULL; cur_context = cur_context->parent)
+ {
+ MemoryStatsContextId *cur_entry;
+
+ cur_entry = hash_search(context_id_lookup, &cur_context, HASH_FIND, &found);
+
+ if (!found)
+ elog(ERROR, "hash table corrupted, can't construct path value");
+
+ path = lcons_int(cur_entry->context_id, path);
+ }
+
+ return path;
+}
+
+/*
+ * PublishMemoryContext
+ *
+ * Copy the memory context statistics of a single context to a DSA memory
+ */
+static void
+PublishMemoryContext(MemoryStatsEntry *memcxt_info, int curr_id,
+ MemoryContext context, List *path,
+ MemoryContextCounters stat, int num_contexts)
+{
+ const char *ident = context->ident;
+ const char *name = context->name;
+
+ /*
+ * To be consistent with logging output, we label dynahash contexts with
+ * just the hash table name as with MemoryContextStatsPrint().
+ */
+ if (context->ident && strncmp(context->name, "dynahash", 8) == 0)
+ {
+ name = context->ident;
+ ident = NULL;
+ }
+
+ if (name != NULL)
+ {
+ int namelen = strlen(name);
+
+ if (strlen(name) >= MEMORY_CONTEXT_NAME_SHMEM_SIZE)
+ namelen = pg_mbcliplen(name, namelen,
+ MEMORY_CONTEXT_NAME_SHMEM_SIZE - 1);
+
+ strlcpy(memcxt_info[curr_id].name, name, namelen + 1);
+ }
+ else
+ /* Clearing the array */
+ memcxt_info[curr_id].name[0] = '\0';
+
+ /* Trim and copy the identifier if it is not set to NULL */
+ if (ident != NULL)
+ {
+ int idlen = strlen(context->ident);
+
+ /*
+ * Some identifiers such as SQL query string can be very long,
+ * truncate oversize identifiers.
+ */
+ if (idlen >= MEMORY_CONTEXT_IDENT_SHMEM_SIZE)
+ idlen = pg_mbcliplen(ident, idlen,
+ MEMORY_CONTEXT_IDENT_SHMEM_SIZE - 1);
+
+ strlcpy(memcxt_info[curr_id].ident, ident, idlen + 1);
+ }
+ else
+ memcxt_info[curr_id].ident[0] = '\0';
+
+ /* Copy the path into the fixed-size array, truncating it if necessary */
+ if (path == NIL)
+ memcxt_info[curr_id].path[0] = 0;
+ else
+ {
+ int levels = Min(list_length(path), MAX_PATH_DISPLAY_LENGTH);
+
+ memcxt_info[curr_id].path_length = levels;
+ memcxt_info[curr_id].levels = list_length(path);
+
+ foreach_int(i, path)
+ {
+ memcxt_info[curr_id].path[foreach_current_index(i)] = i;
+ if (--levels == 0)
+ break;
+ }
+ }
+ memcxt_info[curr_id].type = context->type;
+ memcxt_info[curr_id].totalspace = stat.totalspace;
+ memcxt_info[curr_id].nblocks = stat.nblocks;
+ memcxt_info[curr_id].freespace = stat.freespace;
+ memcxt_info[curr_id].freechunks = stat.freechunks;
+ memcxt_info[curr_id].num_agg_stats = num_contexts;
+}
+
+void
+AtProcExit_memstats_cleanup(int code, Datum arg)
+{
+ int idx = MyProcNumber;
+ MemoryStatsDSHashEntry *entry;
+ char key[CLIENT_KEY_SIZE];
+ bool found;
+
+ if (MemoryStatsDsHash != NULL)
+ {
+ snprintf(key, CLIENT_KEY_SIZE, "%d", idx);
+ entry = dshash_find_or_insert(MemoryStatsDsHash, key, &found);
+
+ if (found)
+ {
+ if (MemoryStatsDsaArea != NULL &&
+ DsaPointerIsValid(entry->memstats_dsa_pointer))
+ dsa_free(MemoryStatsDsaArea, entry->memstats_dsa_pointer);
+ }
+ dshash_delete_entry(MemoryStatsDsHash, entry);
+ }
+ LWLockAcquire(client_keys_lock, LW_EXCLUSIVE);
+ client_keys[idx] = -1;
+ LWLockRelease(client_keys_lock);
+}
+
+/* Used for testing purposes */
+dsa_area *
+pg_get_memstats_dsa_area(void)
+{
+ if (MemoryStatsDsaArea != NULL)
+ return MemoryStatsDsaArea;
+ else
+ return NULL;
+}
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index d31cb45a058..92b0446b80c 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -39,6 +39,7 @@ volatile sig_atomic_t TransactionTimeoutPending = false;
volatile sig_atomic_t IdleSessionTimeoutPending = false;
volatile sig_atomic_t ProcSignalBarrierPending = false;
volatile sig_atomic_t LogMemoryContextPending = false;
+volatile sig_atomic_t PublishMemoryContextPending = false;
volatile sig_atomic_t IdleStatsUpdateTimeoutPending = false;
volatile uint32 InterruptHoldoffCount = 0;
volatile uint32 QueryCancelHoldoffCount = 0;
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 98f9598cd78..202403ebc63 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -658,6 +658,13 @@ BaseInit(void)
* drop ephemeral slots, which in turn triggers stats reporting.
*/
ReplicationSlotInitialize();
+
+ /*
+ * The before shmem exit callback frees the DSA memory occupied by the
+ * latest memory context statistics that could be published by this proc
+ * if requested.
+ */
+ before_shmem_exit(AtProcExit_memstats_cleanup, 0);
}
diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index 47fd774c7d2..56c2048c67a 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -1008,6 +1008,35 @@ MemoryContextStatsInternal(MemoryContext context, int level,
}
}
+
+/*
+ * MemoryContextStatsCounter
+ *
+ * Accumulate statistics counts into *totals. totals should not be NULL.
+ * This involves a non-recursive tree traversal.
+ */
+void
+MemoryContextStatsCounter(MemoryContext context, MemoryContextCounters *totals,
+ int *num_contexts)
+{
+ int ichild = 1;
+
+ context->methods->stats(context, NULL, NULL, totals, false);
+
+ for (MemoryContext curr = context->firstchild;
+ curr != NULL;
+ curr = MemoryContextTraverseNext(curr, context))
+ {
+ curr->methods->stats(curr, NULL, NULL, totals, false);
+ ichild++;
+ }
+
+ /*
+ * Add the count of children contexts which are traversed
+ */
+ *num_contexts = *num_contexts + ichild;
+}
+
/*
* MemoryContextStatsPrint
* Print callback used by MemoryContextStatsInternal
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 5cf9e12fcb9..bb72b85457d 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8617,6 +8617,16 @@
prorettype => 'bool', proargtypes => 'int4',
prosrc => 'pg_log_backend_memory_contexts' },
+# publishing memory contexts of the specified postgres process
+{ oid => '2173', descr => 'publish memory contexts of the specified backend',
+ proname => 'pg_get_process_memory_contexts', provolatile => 'v',
+ prorows => '100', proretset => 't', proparallel => 'r',
+ prorettype => 'record', proargtypes => 'int4 bool',
+ proallargtypes => '{int4,bool,text,text,text,int4,_int4,int8,int8,int8,int8,int8,int4}',
+ proargmodes => '{i,i,o,o,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{pid, summary, name, ident, type, level, path, total_bytes, total_nblocks, free_bytes, free_chunks, used_bytes, num_agg_contexts}',
+ prosrc => 'pg_get_process_memory_contexts' },
+
# non-persistent series generator
{ oid => '1066', descr => 'non-persistent series generator',
proname => 'generate_series', prorows => '1000',
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 9a7d733ddef..b76f24baed6 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -96,6 +96,7 @@ extern PGDLLIMPORT volatile sig_atomic_t IdleSessionTimeoutPending;
extern PGDLLIMPORT volatile sig_atomic_t ProcSignalBarrierPending;
extern PGDLLIMPORT volatile sig_atomic_t LogMemoryContextPending;
extern PGDLLIMPORT volatile sig_atomic_t IdleStatsUpdateTimeoutPending;
+extern PGDLLIMPORT volatile sig_atomic_t PublishMemoryContextPending;
extern PGDLLIMPORT volatile sig_atomic_t CheckClientConnectionPending;
extern PGDLLIMPORT volatile sig_atomic_t ClientConnectionLost;
diff --git a/src/include/storage/lwlocklist.h b/src/include/storage/lwlocklist.h
index 5b0ce383408..613e769c84e 100644
--- a/src/include/storage/lwlocklist.h
+++ b/src/include/storage/lwlocklist.h
@@ -136,3 +136,4 @@ PG_LWLOCKTRANCHE(SUBTRANS_SLRU, SubtransSLRU)
PG_LWLOCKTRANCHE(XACT_SLRU, XactSLRU)
PG_LWLOCKTRANCHE(PARALLEL_VACUUM_DSA, ParallelVacuumDSA)
PG_LWLOCKTRANCHE(AIO_URING_COMPLETION, AioUringCompletion)
+PG_LWLOCKTRANCHE(MEMORY_CONTEXT_KEYS, MemoryContextReportingKeys)
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index afeeb1ca019..345d5a0ecb1 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -35,6 +35,7 @@ typedef enum
PROCSIG_WALSND_INIT_STOPPING, /* ask walsenders to prepare for shutdown */
PROCSIG_BARRIER, /* global barrier interrupt */
PROCSIG_LOG_MEMORY_CONTEXT, /* ask backend to log the memory contexts */
+ PROCSIG_GET_MEMORY_CONTEXT, /* ask backend to send the memory contexts */
PROCSIG_PARALLEL_APPLY_MESSAGE, /* Message from parallel apply workers */
/* Recovery conflict reasons */
diff --git a/src/include/utils/memutils.h b/src/include/utils/memutils.h
index 7bbe5a36959..2d7220cde45 100644
--- a/src/include/utils/memutils.h
+++ b/src/include/utils/memutils.h
@@ -18,7 +18,10 @@
#define MEMUTILS_H
#include "nodes/memnodes.h"
-
+#include "storage/condition_variable.h"
+#include "storage/lmgr.h"
+#include "utils/dsa.h"
+#include "lib/dshash.h"
/*
* MaxAllocSize, MaxAllocHugeSize
@@ -48,6 +51,26 @@
#define AllocHugeSizeIsValid(size) ((Size) (size) <= MaxAllocHugeSize)
+/*
+ * Memory Context reporting size limits.
+ */
+
+/* Max length of context name and ident, to keep it consistent
+ * with ProcessLogMemoryContext()
+ */
+#define MEMORY_CONTEXT_IDENT_SHMEM_SIZE 100
+#define MEMORY_CONTEXT_NAME_SHMEM_SIZE 100
+
+/* Maximum size (in bytes) of DSA area per process */
+#define MEMORY_CONTEXT_REPORT_MAX_PER_BACKEND ((size_t) (1 * 1024 * 1024))
+
+/*
+ * Maximum size per context statistics. The identifier and name are statically
+ * allocated arrays of size 100 bytes.
+ * The path depth is limited to 100 like for memory context logging.
+ */
+#define MAX_MEMORY_CONTEXT_STATS_SIZE (sizeof(MemoryStatsEntry))
+#define MAX_MEMORY_CONTEXT_STATS_NUM (MEMORY_CONTEXT_REPORT_MAX_PER_BACKEND / MAX_MEMORY_CONTEXT_STATS_SIZE)
/*
* Standard top-level memory contexts.
@@ -149,6 +172,7 @@ extern MemoryContext BumpContextCreate(MemoryContext parent,
Size minContextSize,
Size initBlockSize,
Size maxBlockSize);
+extern dsa_area *pg_get_memstats_dsa_area(void);
/*
* Recommended default alloc parameters, suitable for "ordinary" contexts
@@ -319,4 +343,59 @@ pg_memory_is_all_zeros(const void *ptr, size_t len)
return true;
}
+/* Dynamic shared memory state for statistics per context */
+typedef struct MemoryStatsEntry
+{
+ char name[MEMORY_CONTEXT_NAME_SHMEM_SIZE];
+ char ident[MEMORY_CONTEXT_IDENT_SHMEM_SIZE];
+ int path[100];
+ NodeTag type;
+ int path_length;
+ int levels;
+ int64 totalspace;
+ int64 nblocks;
+ int64 freespace;
+ int64 freechunks;
+ int num_agg_stats;
+} MemoryStatsEntry;
+
+/*
+ * Per backend dynamic shared hash entry for memory context statistics
+ * reporting.
+ */
+typedef struct MemoryStatsDSHashEntry
+{
+ char key[64];
+ ConditionVariable memcxt_cv;
+ bool stats_written;
+ int target_server_id;
+ int total_stats;
+ bool summary;
+ dsa_pointer memstats_dsa_pointer;
+} MemoryStatsDSHashEntry;
+
+static const dshash_parameters memctx_dsh_params = {
+ offsetof(MemoryStatsDSHashEntry, memcxt_cv),
+ sizeof(MemoryStatsDSHashEntry),
+ dshash_strcmp,
+ dshash_strhash,
+ dshash_strcpy
+};
+
+/*
+ * Used for storage of transient identifiers for pg_get_backend_memory_contexts
+ */
+typedef struct MemoryStatsContextId
+{
+ MemoryContext context;
+ int context_id;
+} MemoryStatsContextId;
+
+extern void ProcessGetMemoryContextInterrupt(void);
+extern void HandleGetMemoryContextInterrupt(void);
+extern void MemoryContextKeysShmemInit(void);
+extern Size MemoryContextKeysShmemSize(void);
+extern void MemoryContextStatsCounter(MemoryContext context, MemoryContextCounters *totals,
+ int *num_contexts);
+extern void AtProcExit_memstats_cleanup(int code, Datum arg);
#endif /* MEMUTILS_H */
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 3b37fafa65b..21c65ad2d10 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -233,3 +233,22 @@ select * from pg_timezone_abbrevs where abbrev = 'LMT';
LMT | @ 7 hours 52 mins 58 secs ago | f
(1 row)
+DO $$
+DECLARE
+ bg_writer_pid int;
+ r RECORD;
+BEGIN
+ SELECT pid from pg_stat_activity where backend_type='background writer'
+ INTO bg_writer_pid;
+
+ select type, name, ident
+ from pg_get_process_memory_contexts(bg_writer_pid, false)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+ select type, name, ident
+ from pg_get_process_memory_contexts(pg_backend_pid(), false)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+END $$;
+NOTICE: (AllocSet,TopMemoryContext,)
+NOTICE: (AllocSet,TopMemoryContext,)
diff --git a/src/test/regress/sql/sysviews.sql b/src/test/regress/sql/sysviews.sql
index 66179f026b3..c9da4fc8c90 100644
--- a/src/test/regress/sql/sysviews.sql
+++ b/src/test/regress/sql/sysviews.sql
@@ -101,3 +101,21 @@ select count(distinct utc_offset) >= 24 as ok from pg_timezone_abbrevs;
-- One specific case we can check without much fear of breakage
-- is the historical local-mean-time value used for America/Los_Angeles.
select * from pg_timezone_abbrevs where abbrev = 'LMT';
+
+DO $$
+DECLARE
+ bg_writer_pid int;
+ r RECORD;
+BEGIN
+ SELECT pid from pg_stat_activity where backend_type='background writer'
+ INTO bg_writer_pid;
+
+ select type, name, ident
+ from pg_get_process_memory_contexts(bg_writer_pid, false)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+ select type, name, ident
+ from pg_get_process_memory_contexts(pg_backend_pid(), false)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+END $$;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 432509277c9..2990c807f45 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1684,6 +1684,9 @@ MemoryContextData
MemoryContextId
MemoryContextMethodID
MemoryContextMethods
+MemoryStatsContextId
+MemoryStatsEntry
+MemoryStatsDSHashEntry
MemoryStatsPrintFunc
MergeAction
MergeActionState
--
2.34.1
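
As a reading aid for the slot accounting in ProcessGetMemoryContextInterrupt() above, here is a minimal standalone sketch of the "individual slots plus one Remaining Totals slot" scheme. It is not the patch's code: SketchSlot, SKETCH_MAX_SLOTS and sketch_publish() are hypothetical names, the limit of 4 slots is chosen only for illustration (the patch derives its limit from the 1MB per-backend budget), and only totalspace is tracked.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical limit, for illustration only. */
#define SKETCH_MAX_SLOTS 4

typedef struct SketchSlot
{
	char		name[32];
	int64_t		totalspace;
	int			num_agg;		/* number of contexts folded into this slot */
} SketchSlot;

/*
 * Publish ncontexts sizes into at most SKETCH_MAX_SLOTS slots.  The first
 * SKETCH_MAX_SLOTS - 1 contexts get a slot each; every later context is
 * accumulated into the last slot, labelled "Remaining Totals".
 * Returns the number of slots filled in.
 */
static int
sketch_publish(const int64_t *sizes, int ncontexts, SketchSlot *slots)
{
	const int	last = SKETCH_MAX_SLOTS - 1;

	assert(slots != NULL);
	memset(slots, 0, sizeof(SketchSlot) * SKETCH_MAX_SLOTS);

	for (int id = 0; id < ncontexts; id++)
	{
		if (id < last)
		{
			/* Individual statistics; context ids start at 1. */
			snprintf(slots[id].name, sizeof(slots[id].name),
					 "context %d", id + 1);
			slots[id].totalspace = sizes[id];
			slots[id].num_agg = 1;
		}
		else
		{
			/* Label the cumulative slot the first time it is used. */
			if (id == last)
				snprintf(slots[last].name, sizeof(slots[last].name),
						 "Remaining Totals");
			slots[last].totalspace += sizes[id];
			slots[last].num_agg++;
		}
	}

	return ncontexts < SKETCH_MAX_SLOTS ? ncontexts : SKETCH_MAX_SLOTS;
}
```

With six contexts and four slots, the first three are reported individually and the last slot carries the sum of the remaining three, which is the shape a caller would see in the num_agg_contexts column.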
Attachment: v2-0002-Test-module-to-test-memory-context-reporting-with-in.patch
From e21832c610574f48496c961a7318070b720568de Mon Sep 17 00:00:00 2001
From: Rahila Syed <rahilasyed.90@gmail.com>
Date: Thu, 23 Oct 2025 18:01:36 +0530
Subject: [PATCH 2/2] Test module to test memory context reporting with
injection points
---
src/test/modules/Makefile | 1 +
.../test_memcontext_reporting/Makefile | 32 +++++
.../t/001_memcontext_inj.pl | 58 +++++++++
.../test_memcontext_reporting--1.0.sql | 11 ++
.../test_memcontext_reporting.c | 123 ++++++++++++++++++
.../test_memcontext_reporting.control | 4 +
6 files changed, 229 insertions(+)
create mode 100644 src/test/modules/test_memcontext_reporting/Makefile
create mode 100644 src/test/modules/test_memcontext_reporting/t/001_memcontext_inj.pl
create mode 100644 src/test/modules/test_memcontext_reporting/test_memcontext_reporting--1.0.sql
create mode 100644 src/test/modules/test_memcontext_reporting/test_memcontext_reporting.c
create mode 100644 src/test/modules/test_memcontext_reporting/test_memcontext_reporting.control
diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile
index 902a7954101..a31a2578c18 100644
--- a/src/test/modules/Makefile
+++ b/src/test/modules/Makefile
@@ -31,6 +31,7 @@ SUBDIRS = \
test_json_parser \
test_lfind \
test_lwlock_tranches \
+ test_memcontext_reporting \
test_misc \
test_oat_hooks \
test_parser \
diff --git a/src/test/modules/test_memcontext_reporting/Makefile b/src/test/modules/test_memcontext_reporting/Makefile
new file mode 100644
index 00000000000..01a7baa0263
--- /dev/null
+++ b/src/test/modules/test_memcontext_reporting/Makefile
@@ -0,0 +1,32 @@
+# src/test/modules/test_memcontext_reporting/Makefile
+
+EXTRA_INSTALL = src/test/modules/injection_points
+
+export enable_injection_points
+MODULE_big = test_memcontext_reporting
+OBJS = \
+ $(WIN32RES) \
+ test_memcontext_reporting.o
+PGFILEDESC = "test_memcontext_reporting - test code for memory context reporting"
+
+EXTENSION = test_memcontext_reporting
+DATA = test_memcontext_reporting--1.0.sql
+
+REGRESS = test_memcontext_reporting
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/test_memcontext_reporting
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
+
+check:
+ $(prove_check)
+
+installcheck:
+ $(prove_installcheck)
diff --git a/src/test/modules/test_memcontext_reporting/t/001_memcontext_inj.pl b/src/test/modules/test_memcontext_reporting/t/001_memcontext_inj.pl
new file mode 100644
index 00000000000..69d8489eb37
--- /dev/null
+++ b/src/test/modules/test_memcontext_reporting/t/001_memcontext_inj.pl
@@ -0,0 +1,58 @@
+# Copyright (c) 2025, PostgreSQL Global Development Group
+
+# Test suite for testing memory context statistics reporting
+
+use strict;
+use warnings FATAL => 'all';
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+if ($ENV{enable_injection_points} ne 'yes')
+{
+ plan skip_all => 'Injection points not supported by this build';
+}
+my $psql_err;
+# Create and start a cluster with one node
+my $node = PostgreSQL::Test::Cluster->new('main');
+$node->init(allows_streaming => 1);
+# max_connections need to be bumped in order to accommodate for pgbench clients
+# and log_statement is dialled down since it otherwise will generate enormous
+# amounts of logging. Page verification failures are still logged.
+$node->append_conf(
+ 'postgresql.conf',
+ qq[
+max_connections = 100
+log_statement = none
+]);
+$node->start;
+$node->safe_psql('postgres', 'CREATE EXTENSION test_memcontext_reporting;');
+$node->safe_psql('postgres', 'CREATE EXTENSION injection_points;');
+# Attaching to a client process injection point that throws an error
+$node->safe_psql('postgres', "select injection_points_attach('memcontext-client-crash', 'error');");
+
+my $pid = $node->safe_psql('postgres', "SELECT pid from pg_stat_activity where backend_type='checkpointer'");
+print "PID";
+print $pid;
+
+#Client should have thrown error
+$node->psql('postgres', qq(select pg_get_process_memory_contexts($pid, true);), stderr => \$psql_err);
+like ( $psql_err, qr/error triggered for injection point memcontext-client-crash/);
+
+#Query the same process after detaching the injection point, using some other client and it should succeed.
+$node->safe_psql('postgres', "select injection_points_detach('memcontext-client-crash');");
+my $topcontext_name = $node->safe_psql('postgres', "select name from pg_get_process_memory_contexts($pid, true) where path = '{1}';");
+ok($topcontext_name eq 'TopMemoryContext');
+
+# Attaching to a target process injection point that throws an error
+$node->safe_psql('postgres', "select injection_points_attach('memcontext-server-crash', 'error');");
+
+#Server should have thrown error
+$node->psql('postgres', qq(select pg_get_process_memory_contexts($pid, true);), stderr => \$psql_err);
+
+#Query the same process after detaching the injection point, using some other client and it should succeed.
+$node->safe_psql('postgres', "select injection_points_detach('memcontext-server-crash');");
+$topcontext_name = $node->safe_psql('postgres', "select name from pg_get_process_memory_contexts($pid, true) where path = '{1}';");
+ok($topcontext_name eq 'TopMemoryContext');
+done_testing();
diff --git a/src/test/modules/test_memcontext_reporting/test_memcontext_reporting--1.0.sql b/src/test/modules/test_memcontext_reporting/test_memcontext_reporting--1.0.sql
new file mode 100644
index 00000000000..181daf429d0
--- /dev/null
+++ b/src/test/modules/test_memcontext_reporting/test_memcontext_reporting--1.0.sql
@@ -0,0 +1,11 @@
+CREATE FUNCTION memcontext_crash_server()
+RETURNS pg_catalog.void
+AS 'MODULE_PATHNAME' LANGUAGE C;
+
+CREATE FUNCTION memcontext_crash_client()
+RETURNS pg_catalog.void
+AS 'MODULE_PATHNAME' LANGUAGE C;
+
+CREATE FUNCTION dsa_dump_sql()
+RETURNS bigint
+AS 'MODULE_PATHNAME' LANGUAGE C;
diff --git a/src/test/modules/test_memcontext_reporting/test_memcontext_reporting.c b/src/test/modules/test_memcontext_reporting/test_memcontext_reporting.c
new file mode 100644
index 00000000000..955155524c2
--- /dev/null
+++ b/src/test/modules/test_memcontext_reporting/test_memcontext_reporting.c
@@ -0,0 +1,123 @@
+/*
+ * -------------------------------------------------------------------------
+ *
+ * Copyright (c) 2025, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/test/modules/test_memcontext_reporting/test_memcontext_reporting.c
+ *
+ * -------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+#include "utils/injection_point.h"
+#include "funcapi.h"
+#include "utils/injection_point.h"
+#include "storage/dsm_registry.h"
+
+PG_MODULE_MAGIC;
+
+extern PGDLLEXPORT void crash(const char *name, const void *private_data, void *arg);
+
+void
+crash(const char *name, const void *private_data, void *arg)
+{
+ abort();
+}
+
+/*
+ * memcontext_crash_client
+ *
+ * Ensure that the client process aborts in between memory context
+ * reporting.
+ */
+PG_FUNCTION_INFO_V1(memcontext_crash_client);
+Datum
+memcontext_crash_client(PG_FUNCTION_ARGS)
+{
+#ifdef USE_INJECTION_POINTS
+ InjectionPointAttach("memcontext-client-crash",
+ "test_memcontext_reporting", "crash", NULL, 0);
+
+#else
+ elog(ERROR,
+ "test is not working as intended when injection points are disabled");
+#endif
+ PG_RETURN_VOID();
+}
+
+PG_FUNCTION_INFO_V1(memcontext_detach_client);
+Datum
+memcontext_detach_client(PG_FUNCTION_ARGS)
+{
+#ifdef USE_INJECTION_POINTS
+ InjectionPointDetach("memcontext-client-crash");
+
+#else
+ elog(ERROR,
+ "test is not working as intended when injection points are disabled");
+#endif
+ PG_RETURN_VOID();
+}
+
+/*
+ * memcontext_crash_server
+ *
+ * Ensure that the server process crashes in between memory context
+ * reporting.
+ */
+PG_FUNCTION_INFO_V1(memcontext_crash_server);
+Datum
+memcontext_crash_server(PG_FUNCTION_ARGS)
+{
+#ifdef USE_INJECTION_POINTS
+ InjectionPointAttach("memcontext-server-crash",
+ "test_memcontext_reporting", "crash", NULL, 0);
+
+#else
+ elog(ERROR,
+ "test is not working as intended when injection points are disabled");
+#endif
+ PG_RETURN_VOID();
+}
+
+/*
+ * memcontext_detach_server
+ *
+ * Detach the injection point which crashes the server
+ * reporting.
+ */
+PG_FUNCTION_INFO_V1(memcontext_detach_server);
+Datum
+memcontext_detach_server(PG_FUNCTION_ARGS)
+{
+#ifdef USE_INJECTION_POINTS
+ InjectionPointDetach("memcontext-server-crash");
+
+#else
+ elog(ERROR,
+ "test is not working as intended when injection points are disabled");
+#endif
+ PG_RETURN_VOID();
+}
+
+/*
+ * dsa_dump_sql
+ */
+PG_FUNCTION_INFO_V1(dsa_dump_sql);
+Datum
+dsa_dump_sql(PG_FUNCTION_ARGS)
+{
+ bool found;
+ size_t tot_size;
+ dsa_area *memstats_dsa_area;
+
+ memstats_dsa_area = pg_get_memstats_dsa_area();
+
+ if (memstats_dsa_area == NULL)
+ memstats_dsa_area = GetNamedDSA("memory_context_statistics_dsa", &found);
+
+ tot_size = dsa_get_total_size(memstats_dsa_area);
+ dsa_detach(memstats_dsa_area);
+ PG_RETURN_INT64(tot_size);
+}
diff --git a/src/test/modules/test_memcontext_reporting/test_memcontext_reporting.control b/src/test/modules/test_memcontext_reporting/test_memcontext_reporting.control
new file mode 100644
index 00000000000..48b501682d5
--- /dev/null
+++ b/src/test/modules/test_memcontext_reporting/test_memcontext_reporting.control
@@ -0,0 +1,4 @@
+comment = 'Test code for memcontext reporting'
+default_version = '1.0'
+module_pathname = '$libdir/test_memcontext_reporting'
+relocatable = true
--
2.34.1
On 7 Nov 2025, at 23:55, Rahila Syed <rahilasyed90@gmail.com> wrote:
I have attached a version 40 patch that has been rebased onto the
latest master branch, as CFbot indicated a rebase was needed.
Thanks for the rebase, below are a few mostly superficial comments on the
patch:
+#include "access/twophase.h"
+#include "catalog/pg_authid_d.h"
...
+#include "utils/acl.h"
Are these actually required to be included?
- MemoryContextId *entry;
+ MemoryStatsContextId *entry;
Why is this needed? MemoryStatsContextId is identical to MemoryContextId and
is also only used in mcxtfuncs.c, so there is no need to expose it in memutils.h.
Can't you just use MemoryContextId everywhere or am I missing something?
+#define CLIENT_KEY_SIZE 64
+
+static LWLock *client_keys_lock = NULL;
+static int *client_keys = NULL;
+static dshash_table *MemoryStatsDsHash = NULL;
+static dsa_area *MemoryStatsDsaArea = NULL;
These new additions have in some cases too generic names (client_keys etc) and
they all lack comments explaining why they're needed. Maybe a leading block
comment explaining they are used for process memory context reporting, and then
inline comments on each with their use?
+#define CLIENT_KEY_SIZE 64
...
+ char key[CLIENT_KEY_SIZE];
...
+ snprintf(key, sizeof(key), "%d", MyProcNumber);
Given that MyProcNumber is an index into the proc array, it seems excessive to
use 64 bytes to store it, can't we get away with a small stack allocation?
+ * Retreive the client key for publishing statistics and reset it to -1,
s/Retreive/Retrieve/
+ ProcNumber procNumber = INVALID_PROC_NUMBER;
This variable is never accessed before getting re-assigned, so this assignment
in the variable definition can be removed per project style.
+ InitMaterializedSRF(fcinfo, 0);
Can this initialization be postponed till when we know the ResultSetInfo is
needed? While a micro optimization, it seems we can avoid that overhead in
case the query errors out?
+ if (MemoryStatsDsHash == NULL)
+ MemoryStatsDsHash = GetNamedDSHash("memory_context_statistics_dshash", &memctx_dsh_params, &found);
Nitpick, but there are a few oversize lines, like this one, which need to be
wrapped to match project style.
+ /*
+ * XXX. If the process exits without cleaning up its slot, i.e in case of
+ * an abrupt crash the client_keys slot won't be reset thus resulting in
+ * false negative and WARNING would be thrown in case another process with
+ * same slot index is queried for statistics.
+ */
+ if (client_keys[procNumber] == -1)
+ client_keys[procNumber] = MyProcNumber;
+ else
+ {
+ LWLockRelease(client_keys_lock);
+ ereport(WARNING,
+ errmsg("server process %d is processing previous request", pid));
+ PG_RETURN_NULL();
+ }
AFAICT this means that a failure to clean up (through a crash for example) can
block a future backend from reporting which isn't entirely ideal. Is there
anything we can do to mitigate this?
+ bool summary = false;
In ProcessGetMemoryContextInterrupt(), can't we just read entry->summary rather
than define a local variable and assign it? We already read lots of other
fields from entry directly so it seems more readable to be consistent.
+ /*
+ * Add the count of children contexts which are traversed
+ */
+ *num_contexts = *num_contexts + ichild;
Isn't this really the number of children + the parent context? ichild starts
at one to (AIUI) include the parent context. Also, MemoryContextStatsCounter
should also make sure to set num_contexts to zero before adding to it.
+#define MAX_MEMORY_CONTEXT_STATS_SIZE (sizeof(MemoryStatsEntry))
+#define MAX_MEMORY_CONTEXT_STATS_NUM MEMORY_CONTEXT_REPORT_MAX_PER_BACKEND / MAX_MEMORY_CONTEXT_STATS_SIZE
I don't think MAX_MEMORY_CONTEXT_STATS_SIZE adds any value as it's only used
once, on the line directly after its definition. We can just use the expansion
of (sizeof(MemoryStatsEntry)) when defining MAX_MEMORY_CONTEXT_STATS_NUM.
The test module patch is unchanged.
Please include all commits in the series even if they aren't updated since the
CFBot cannot pick them up otherwise.
--
Daniel Gustafsson
Hi Daniel,
Thank you for your comments. Please find attached v41 with all the comments
addressed.
+#include "access/twophase.h"
+#include "catalog/pg_authid_d.h"
...
+#include "utils/acl.h"
Are these actually required to be included?
Removed these.
- MemoryContextId *entry;
+ MemoryStatsContextId *entry;
Why is this needed? MemoryStatsContextId is identical to MemoryContextId and
is also only used in mcxtfuncs.c, so there is no need to expose it in memutils.h.
Can't you just use MemoryContextId everywhere or am I missing something?
MemoryContextId has been renamed to MemoryStatsContextId for better
code readability. I removed the leftover MemoryContextId definition.
Also, I moved it out of memutils.h, and did the same with some other structures
and definitions that are only used in mcxtfuncs.c.
+#define CLIENT_KEY_SIZE 64
+
+static LWLock *client_keys_lock = NULL;
+static int *client_keys = NULL;
+static dshash_table *MemoryStatsDsHash = NULL;
+static dsa_area *MemoryStatsDsaArea = NULL;
These new additions have in some cases too generic names (client_keys etc) and
they all lack comments explaining why they're needed. Maybe a leading block
comment explaining they are used for process memory context reporting, and then
inline comments on each with their use?
Added comments.
+#define CLIENT_KEY_SIZE 64
...
+ char key[CLIENT_KEY_SIZE];
...
+ snprintf(key, sizeof(key), "%d", MyProcNumber);
Given that MyProcNumber is an index into the proc array, it seems excessive to
use 64 bytes to store it, can't we get away with a small stack allocation?
I agree. Defined it as 32 bytes as MyProcNumber is of size uint32. Kindly
let me know if you think it can be reduced further.
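On the question of how far the key buffer could shrink, here is a minimal standalone sketch, not code from the patch. It assumes a 32-bit `int`, which is the type "%d" formats; `PROCNUMBER_KEY_BUFSIZE` and `format_proc_key` are hypothetical names used only for illustration. The worst-case decimal rendering is "-2147483648", which is 11 characters plus the terminating NUL.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical constant: 11 characters for "-2147483648" plus the
 * terminating NUL. Assumes int is 32 bits wide.
 */
#define PROCNUMBER_KEY_BUFSIZE 12

/*
 * Format a process number as a decimal string key.
 * Returns the number of characters written, excluding the NUL.
 */
static int
format_proc_key(char *buf, size_t buflen, int procno)
{
	int			len = snprintf(buf, buflen, "%d", procno);

	/* snprintf returns the untruncated length, so this catches overflow */
	assert(len > 0 && (size_t) len < buflen);
	return len;
}
```

Whether the dshash key size needs to match this buffer size depends on how the hash parameters are defined in the patch, which this sketch does not model.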
+ * Retreive the client key for publishing statistics and reset it to -1,
s/Retreive/Retrieve/
Fixed.
+ ProcNumber procNumber = INVALID_PROC_NUMBER;
This variable is never accessed before getting re-assigned, so this assignment
in the variable definition can be removed per project style.
Fixed too.
+ InitMaterializedSRF(fcinfo, 0);
Can this initialization be postponed till when we know the ResultSetInfo is
needed? While a micro optimization, it seems we can avoid that overhead in
case the query errors out?
Good point. Added this just before the result set is getting populated.
+ if (MemoryStatsDsHash == NULL)
+ MemoryStatsDsHash = GetNamedDSHash("memory_context_statistics_dshash", &memctx_dsh_params, &found);
Nitpick, but there are a few oversize lines, like this one, which need to be
wrapped to match project style.
I have edited this accordingly.
+ /*
+ * XXX. If the process exits without cleaning up its slot, i.e in case of
+ * an abrupt crash the client_keys slot won't be reset thus resulting in
+ * false negative and WARNING would be thrown in case another process with
+ * same slot index is queried for statistics.
+ */
+ if (client_keys[procNumber] == -1)
+ client_keys[procNumber] = MyProcNumber;
+ else
+ {
+ LWLockRelease(client_keys_lock);
+ ereport(WARNING,
+ errmsg("server process %d is processing previous request", pid));
+ PG_RETURN_NULL();
+ }
AFAICT this means that a failure to clean up (through a crash for example) can
block a future backend from reporting which isn't entirely ideal. Is there
anything we can do to mitigate this?
Yes, we can reset it when the client times out, as long as we verify that
the value corresponds to our ProcNumber and not another client's request.
Fixed accordingly.
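The reset-on-timeout rule described here can be sketched in isolation as follows. This is an illustrative, single-process model rather than the patch's code: `slots`, `NUM_SLOTS`, `slot_claim` and `slot_release_if_owner` are hypothetical names, and the LWLock that protects the real array is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_SLOTS 8				/* hypothetical number of process slots */
#define SLOT_FREE (-1)			/* same sentinel the quoted code uses */

/* One entry per target process; holds the requester's number or SLOT_FREE */
static int	slots[NUM_SLOTS];

static void
slots_init(void)
{
	for (int i = 0; i < NUM_SLOTS; i++)
		slots[i] = SLOT_FREE;
}

/* Claim the target's slot for a requester; fails if a request is pending */
static bool
slot_claim(int target, int requester)
{
	assert(target >= 0 && target < NUM_SLOTS);
	if (slots[target] != SLOT_FREE)
		return false;
	slots[target] = requester;
	return true;
}

/*
 * Called when the requester times out: free the slot only if it still
 * carries our own number, so another client's request is never clobbered.
 */
static bool
slot_release_if_owner(int target, int requester)
{
	assert(target >= 0 && target < NUM_SLOTS);
	if (slots[target] != requester)
		return false;
	slots[target] = SLOT_FREE;
	return true;
}
```

In the real code both the check and the reset would presumably have to happen while holding the lock on the keys array; the sketch only shows the ownership test.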
+ bool summary = false;
In ProcessGetMemoryContextInterrupt(), can't we just read entry->summary rather
than define a local variable and assign it? We already read lots of other
fields from entry directly so it seems more readable to be consistent.
Fixed.
+ /*
+ * Add the count of children contexts which are traversed
+ */
+ *num_contexts = *num_contexts + ichild;
Isn't this really the number of children + the parent context? ichild starts
at one to (AIUI) include the parent context. Also, MemoryContextStatsCounter
should also make sure to set num_contexts to zero before adding to it.
Yes. Adjusted the comment to match this and set num_contexts to zero.
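As a standalone illustration of the counting convention discussed above (the context itself plus all of its descendants, with the output reset before accumulating), consider the sketch below. `Node`, `count_subtree` and `count_contexts` are hypothetical stand-ins and do not reproduce MemoryContextStatsCounter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a memory context's tree links */
typedef struct Node
{
	struct Node *firstchild;
	struct Node *nextchild;
} Node;

/* Return the number of nodes in the subtree rooted at ctx, ctx included */
static int
count_subtree(const Node *ctx)
{
	int			count = 1;		/* starts at one to include ctx itself */

	for (const Node *child = ctx->firstchild; child != NULL;
		 child = child->nextchild)
		count += count_subtree(child);
	return count;
}

/* Reset the caller's counter first, so stale values are never added to */
static void
count_contexts(const Node *ctx, int *num_contexts)
{
	*num_contexts = 0;
	*num_contexts += count_subtree(ctx);
}
```

With a root that has two children, one of which has a child of its own, the counter ends up at four regardless of what it held before the call.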
+#define MAX_MEMORY_CONTEXT_STATS_SIZE (sizeof(MemoryStatsEntry))
+#define MAX_MEMORY_CONTEXT_STATS_NUM MEMORY_CONTEXT_REPORT_MAX_PER_BACKEND / MAX_MEMORY_CONTEXT_STATS_SIZE
I don't think MAX_MEMORY_CONTEXT_STATS_SIZE adds any value as it's only used
once, on the line directly after its definition. We can just use the expansion
of (sizeof(MemoryStatsEntry)) when defining MAX_MEMORY_CONTEXT_STATS_NUM.
Fixed.
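For reference, the per-process entry cap can be derived directly from the size limit, as in this rough sketch. `StatsEntrySketch` mirrors the field list of MemoryStatsEntry with plain C types (`int` stands in for NodeTag), and `REPORT_MAX_BYTES`, `MAX_STATS_NUM` and `bytes_needed` are hypothetical names. The outer parentheses around the division are an addition of this sketch, so the macro expands safely inside larger expressions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 1 MB limit per process, as stated in the commit message */
#define REPORT_MAX_BYTES ((size_t) (1 * 1024 * 1024))

/* Stand-in for MemoryStatsEntry; int replaces NodeTag for portability */
typedef struct StatsEntrySketch
{
	char		name[100];
	char		ident[100];
	int			path[100];
	int			type;
	int			path_length;
	int			levels;
	int64_t		totalspace;
	int64_t		nblocks;
	int64_t		freespace;
	int64_t		freechunks;
	int			num_agg_stats;
} StatsEntrySketch;

/* Fully parenthesized so that e.g. "2 * MAX_STATS_NUM" groups as intended */
#define MAX_STATS_NUM (REPORT_MAX_BYTES / sizeof(StatsEntrySketch))

/* Bytes required to hold nentries entries */
static size_t
bytes_needed(size_t nentries)
{
	return nentries * sizeof(StatsEntrySketch);
}
```

The exact cap depends on the platform's struct padding, so the sketch does not hard-code a number.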
I've attached the test patch as is; I will clean it up and improve it
further.
Thank you,
Rahila Syed
Attachments:
v41-0001-Add-function-to-report-memory-context-statistics.patch (application/octet-stream)
From f37252238a63e4023a06beac1b6e3dd0887fffcf Mon Sep 17 00:00:00 2001
From: Rahila Syed <rahilasyed.90@gmail.com>
Date: Mon, 24 Nov 2025 17:05:47 +0530
Subject: [PATCH] Add function to report memory context statistics
This adds a function for retrieving memory context statistics
and information from backends as well as auxiliary processes.
The intended use case is cluster debugging when under memory
pressure or unanticipated memory usage characteristics.
When calling the function it sends a signal to the specified
process to submit statistics regarding its memory contexts
into dynamic shared memory. Each memory context is returned
in detail, followed by a cumulative total in case the number
of contexts exceeds the max allocated amount of shared memory.
Each process is limited to use at most 1MB memory for this.
A summary can also be explicitly requested by the user; this
will return the TopMemoryContext and a cumulative total of
all lower contexts.
In order to not block on busy processes, we have hardcoded
the number of seconds during which to retry before timing out.
In the case where no statistics are published within the set
timeout, NULL is returned.
---
doc/src/sgml/func/func-admin.sgml | 157 +++
src/backend/catalog/system_views.sql | 5 +
src/backend/postmaster/autovacuum.c | 4 +
src/backend/postmaster/checkpointer.c | 4 +
src/backend/postmaster/interrupt.c | 4 +
src/backend/postmaster/pgarch.c | 4 +
src/backend/postmaster/startup.c | 4 +
src/backend/postmaster/walsummarizer.c | 4 +
src/backend/storage/ipc/ipci.c | 3 +
src/backend/storage/ipc/procsignal.c | 3 +
src/backend/storage/lmgr/proc.c | 1 +
src/backend/tcop/postgres.c | 3 +
.../utils/activity/wait_event_names.txt | 2 +
src/backend/utils/adt/mcxtfuncs.c | 1013 ++++++++++++++++-
src/backend/utils/init/globals.c | 1 +
src/backend/utils/init/postinit.c | 7 +
src/backend/utils/mmgr/mcxt.c | 31 +
src/include/catalog/pg_proc.dat | 10 +
src/include/miscadmin.h | 1 +
src/include/storage/lwlocklist.h | 1 +
src/include/storage/procsignal.h | 1 +
src/include/utils/memutils.h | 11 +-
src/test/regress/expected/sysviews.out | 19 +
src/test/regress/sql/sysviews.sql | 18 +
src/tools/pgindent/typedefs.list | 4 +-
25 files changed, 1285 insertions(+), 30 deletions(-)
diff --git a/doc/src/sgml/func/func-admin.sgml b/doc/src/sgml/func/func-admin.sgml
index 1b465bc8ba7..a5c66837241 100644
--- a/doc/src/sgml/func/func-admin.sgml
+++ b/doc/src/sgml/func/func-admin.sgml
@@ -251,6 +251,130 @@
<literal>false</literal> is returned.
</para></entry>
</row>
+
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>pg_get_process_memory_contexts</primary>
+ </indexterm>
+ <function>pg_get_process_memory_contexts</function> ( <parameter>pid</parameter> <type>integer</type>, <parameter>summary</parameter> <type>boolean</type> )
+ <returnvalue>setof record</returnvalue>
+ ( <parameter>name</parameter> <type>text</type>,
+ <parameter>ident</parameter> <type>text</type>,
+ <parameter>type</parameter> <type>text</type>,
+ <parameter>level</parameter> <type>integer</type>,
+ <parameter>path</parameter> <type>integer[]</type>,
+ <parameter>total_bytes</parameter> <type>bigint</type>,
+ <parameter>total_nblocks</parameter> <type>bigint</type>,
+ <parameter>free_bytes</parameter> <type>bigint</type>,
+ <parameter>free_chunks</parameter> <type>bigint</type>,
+ <parameter>used_bytes</parameter> <type>bigint</type>,
+ <parameter>num_agg_contexts</parameter> <type>integer</type> )
+ </para>
+ <para>
+ This function handles requests to display the memory contexts of a
+ <productname>PostgreSQL</productname> process with the specified
+ process ID. The function can be used to send requests to backends as
+ well as <glossterm linkend="glossary-auxiliary-proc">auxiliary processes</glossterm>.
+ </para>
+ <para>
+ The returned record contains extended statistics for each memory
+ context:
+ <itemizedlist spacing="compact">
+ <listitem>
+ <para>
+ <parameter>name</parameter> - The name of the memory context.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>ident</parameter> - Memory context ID (if any).
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>type</parameter> - The type of memory context, possible
+ values are: AllocSet, Generation, Slab and Bump.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>level</parameter> - The level in the tree of the current
+ memory context.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>path</parameter> - Memory contexts are organized in a
+ tree model with TopMemoryContext as the root, and all other memory
+ contexts as nodes in the tree. The <parameter>path</parameter>
+ displays the path from the root to the current memory context. The
+ path is limited to 100 children per node, with each node limited
+ to a max depth of 100, to preserve memory during reporting. The
+ printed path will also be limited to 100 nodes counting from the
+ TopMemoryContext.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>total_bytes</parameter> - The total number of bytes
+ allocated to this memory context.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>total_nblocks</parameter> - The total number of blocks
+ used for the allocated memory.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>free_bytes</parameter> - The amount of free memory in
+ this memory context.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>free_chunks</parameter> - The number of chunks that
+ <parameter>free_bytes</parameter> corresponds to.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>used_bytes</parameter> - The total number of bytes
+ currently occupied.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>num_agg_contexts</parameter> - The number of memory
+ contexts aggregated in the displayed statistics.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+ <para>
+ When <parameter>summary</parameter> is <literal>true</literal>, statistics
+ for memory contexts at levels 1 and 2 are displayed, with level 1
+ representing the root node (i.e., <literal>TopMemoryContext</literal>).
+ Statistics for contexts on level 2 and below are aggregates of all
+ child contexts' statistics, where <literal>num_agg_contexts</literal>
+ indicates the number of aggregated child contexts. When
+ <parameter>summary</parameter> is <literal>false</literal>, the
+ <literal>num_agg_contexts</literal> value is <literal>1</literal>,
+ indicating that individual statistics are being displayed.
+ </para>
+ <para>
+ After receiving memory context statistics from the target process, it
+ returns the results as one row per context. If all the contexts don't
+ fit within the pre-determined size limit, the remaining context
+ statistics are aggregated and a cumulative total is displayed. The
+ <literal>num_agg_contexts</literal> column indicates the number of
+ contexts aggregated in the displayed statistics. When
+ <literal>num_agg_contexts</literal> is <literal>1</literal> it means
+ that the context statistics are displayed separately.
+ </para></entry>
+ </row>
</tbody>
</tgroup>
</table>
@@ -302,6 +426,39 @@ LOG: Grand total: 1651920 bytes in 201 blocks; 622360 free (88 chunks); 1029560
because it may generate a large number of log messages.
</para>
+ <para>
+ <function>pg_get_process_memory_contexts</function> can be used to request
+ memory context statistics of any <productname>PostgreSQL</productname>
+ process. For example:
+<programlisting>
+postgres=# SELECT * FROM pg_get_process_memory_contexts(
+ (SELECT pid FROM pg_stat_activity
+ WHERE backend_type = 'checkpointer'),
+ false) LIMIT 1;
+-[ RECORD 1 ]----+------------------------------
+name | TopMemoryContext
+ident |
+type | AllocSet
+level | 1
+path | {1}
+total_bytes | 90304
+total_nblocks | 3
+free_bytes | 2880
+free_chunks | 1
+used_bytes | 87424
+num_agg_contexts | 1
+</programlisting>
+ <note>
+ <para>
+ While <function>pg_get_process_memory_contexts</function> can be used to
+ query memory contexts of the local backend,
+ <structname>pg_backend_memory_contexts</structname>
+ (see <xref linkend="view-pg-backend-memory-contexts"/> for more details)
+ will be less resource intensive when only the local backend is of interest.
+ </para>
+ </note>
+ </para>
+
</sect2>
<sect2 id="functions-admin-backup">
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 95ad29a64b9..6fe67e950bf 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -692,6 +692,11 @@ GRANT SELECT ON pg_backend_memory_contexts TO pg_read_all_stats;
REVOKE EXECUTE ON FUNCTION pg_get_backend_memory_contexts() FROM PUBLIC;
GRANT EXECUTE ON FUNCTION pg_get_backend_memory_contexts() TO pg_read_all_stats;
+REVOKE EXECUTE ON FUNCTION
+ pg_get_process_memory_contexts(integer, boolean) FROM PUBLIC;
+GRANT EXECUTE ON FUNCTION
+ pg_get_process_memory_contexts(integer, boolean) TO pg_read_all_stats;
+
-- Statistics views
CREATE VIEW pg_stat_all_tables AS
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 1c38488f2cb..561d88ebb4d 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -791,6 +791,10 @@ ProcessAutoVacLauncherInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
/* Process sinval catchup interrupts that happened while sleeping */
ProcessCatchupInterrupt();
}
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index e84e8663e96..5b3e08805bf 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -679,6 +679,10 @@ ProcessCheckpointerInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/postmaster/interrupt.c b/src/backend/postmaster/interrupt.c
index ba63b84dfc5..29454b8bf8a 100644
--- a/src/backend/postmaster/interrupt.c
+++ b/src/backend/postmaster/interrupt.c
@@ -48,6 +48,10 @@ ProcessMainLoopInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index ce6b5299324..fdd385e492d 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -871,6 +871,10 @@ ProcessPgArchInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
if (ConfigReloadPending)
{
char *archiveLib = pstrdup(XLogArchiveLibrary);
diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c
index 27e86cf393f..7149a67fcbc 100644
--- a/src/backend/postmaster/startup.c
+++ b/src/backend/postmaster/startup.c
@@ -192,6 +192,10 @@ ProcessStartupProcInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
diff --git a/src/backend/postmaster/walsummarizer.c b/src/backend/postmaster/walsummarizer.c
index c4a888a081c..00f03b36ed8 100644
--- a/src/backend/postmaster/walsummarizer.c
+++ b/src/backend/postmaster/walsummarizer.c
@@ -879,6 +879,10 @@ ProcessWalSummarizerInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index b23d0c19360..a5ed58a18c5 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -52,6 +52,7 @@
#include "storage/sinvaladt.h"
#include "utils/guc.h"
#include "utils/injection_point.h"
+#include "utils/memutils.h"
/* GUCs */
int shared_memory_type = DEFAULT_SHARED_MEMORY_TYPE;
@@ -140,6 +141,7 @@ CalculateShmemSize(void)
size = add_size(size, SlotSyncShmemSize());
size = add_size(size, AioShmemSize());
size = add_size(size, WaitLSNShmemSize());
+ size = add_size(size, MemoryContextKeysShmemSize() + sizeof(LWLockPadded));
/* include additional requested shmem from preload libraries */
size = add_size(size, total_addin_request);
@@ -328,6 +330,7 @@ CreateOrAttachShmemStructs(void)
InjectionPointShmemInit();
AioShmemInit();
WaitLSNShmemInit();
+ MemoryContextKeysShmemInit();
}
/*
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index 087821311cc..8963285cc12 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -691,6 +691,9 @@ procsignal_sigusr1_handler(SIGNAL_ARGS)
if (CheckProcSignal(PROCSIG_LOG_MEMORY_CONTEXT))
HandleLogMemoryContextInterrupt();
+ if (CheckProcSignal(PROCSIG_GET_MEMORY_CONTEXT))
+ HandleGetMemoryContextInterrupt();
+
if (CheckProcSignal(PROCSIG_PARALLEL_APPLY_MESSAGE))
HandleParallelApplyMessageInterrupt();
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 1504fafe6d8..c5e69151756 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -51,6 +51,7 @@
#include "storage/procsignal.h"
#include "storage/spin.h"
#include "storage/standby.h"
+#include "utils/memutils.h"
#include "utils/timeout.h"
#include "utils/timestamp.h"
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 7dd75a490aa..e726f40dfbb 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3539,6 +3539,9 @@ ProcessInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
if (ParallelApplyMessagePending)
ProcessParallelApplyMessages();
}
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index c1ac71ff7f2..644d8d988e1 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -162,6 +162,7 @@ WAL_RECEIVER_EXIT "Waiting for the WAL receiver to exit."
WAL_RECEIVER_WAIT_START "Waiting for startup process to send initial data for streaming replication."
WAL_SUMMARY_READY "Waiting for a new WAL summary to be generated."
XACT_GROUP_UPDATE "Waiting for the group leader to update transaction status at transaction end."
+MEM_CXT_PUBLISH "Waiting for a process to publish memory information."
ABI_compatibility:
@@ -404,6 +405,7 @@ SubtransSLRU "Waiting to access the sub-transaction SLRU cache."
XactSLRU "Waiting to access the transaction status SLRU cache."
ParallelVacuumDSA "Waiting for parallel vacuum dynamic shared memory allocation."
AioUringCompletion "Waiting for another process to complete IO via io_uring."
+MemoryContextReportingKeys "Waiting for another process to complete reading or writing the memory reporting keys."
# No "ABI_compatibility" region here as WaitEventLWLock has its own C code.
diff --git a/src/backend/utils/adt/mcxtfuncs.c b/src/backend/utils/adt/mcxtfuncs.c
index fe6dce9cba3..8e0e3116f1f 100644
--- a/src/backend/utils/adt/mcxtfuncs.c
+++ b/src/backend/utils/adt/mcxtfuncs.c
@@ -17,28 +17,138 @@
#include "funcapi.h"
#include "mb/pg_wchar.h"
+#include "miscadmin.h"
+#include "storage/dsm_registry.h"
#include "storage/proc.h"
#include "storage/procarray.h"
#include "utils/array.h"
#include "utils/builtins.h"
#include "utils/hsearch.h"
+#include "utils/injection_point.h"
+#include "utils/memutils.h"
+#include "utils/wait_event_types.h"
+
+/*
+ * Memory Context reporting size limits.
+ */
+
+/* Max length of context name and ident, to keep them consistent
+ * with ProcessLogMemoryContextInterrupt().
+ */
+#define MEMORY_CONTEXT_IDENT_SHMEM_SIZE 100
+#define MEMORY_CONTEXT_NAME_SHMEM_SIZE 100
+
+/* Maximum size (in bytes) of DSA area per process */
+#define MEMORY_CONTEXT_REPORT_MAX_PER_BACKEND ((size_t) (1 * 1024 * 1024))
+
+/*
+ * The maximum number of memory context statistics entries is the per-backend
+ * memory limit divided by the size of one statistics entry.
+ * The identifier and name are statically allocated arrays of 100 bytes each.
+ * The path depth is limited to 100, as for memory context logging.
+ */
+#define MAX_MEMORY_CONTEXT_STATS_NUM (MEMORY_CONTEXT_REPORT_MAX_PER_BACKEND / sizeof(MemoryStatsEntry))
+
+/* Size of dshash key */
+#define CLIENT_KEY_SIZE 32
+
+/* Dynamic shared memory state for reporting statistics per context */
+typedef struct MemoryStatsEntry
+{
+ char name[MEMORY_CONTEXT_NAME_SHMEM_SIZE];
+ char ident[MEMORY_CONTEXT_IDENT_SHMEM_SIZE];
+ int path[100];
+ NodeTag type;
+ int path_length;
+ int levels;
+ int64 totalspace;
+ int64 nblocks;
+ int64 freespace;
+ int64 freechunks;
+ int num_agg_stats;
+} MemoryStatsEntry;
+
+/*
+ * Per backend dynamic shared hash entry for memory context statistics
+ * reporting.
+ */
+typedef struct MemoryStatsDSHashEntry
+{
+ char key[64];
+ ConditionVariable memcxt_cv;
+ bool stats_written;
+ int target_server_id;
+ int total_stats;
+ bool summary;
+ dsa_pointer memstats_dsa_pointer;
+} MemoryStatsDSHashEntry;
+
+static const dshash_parameters memctx_dsh_params = {
+ offsetof(MemoryStatsDSHashEntry, memcxt_cv),
+ sizeof(MemoryStatsDSHashEntry),
+ dshash_strcmp,
+ dshash_strhash,
+ dshash_strcpy
+};
+
+/*
+ * These are used for reporting memory context
+ * statistics of a process.
+ */
+
+/* Lock to control access to client_keys array */
+static LWLock *client_keys_lock = NULL;
+
+/* Array to store the keys of MemoryStatsDsHash */
+static int *client_keys = NULL;
+
+/*
+ * Table to store pointers to DSA memory containing
+ * memory statistics and other metadata. There is one
+ * entry per client backend request, keyed by the ProcNumber of
+ * the client, obtained from the client_keys array above.
+ */
+static dshash_table *MemoryStatsDsHash = NULL;
+
+/*
+ * Dsa area which stores the actual memory context
+ * statistics.
+ */
+static dsa_area *MemoryStatsDsaArea = NULL;
+
+static void memstats_dsa_cleanup(char *key);
+static void memstats_client_key_reset(int procNumber);
+static const char *ContextTypeToString(NodeTag type);
+static void PublishMemoryContext(MemoryStatsEntry *memcxt_info,
+ int curr_id, MemoryContext context,
+ List *path,
+ MemoryContextCounters stat,
+ int num_contexts);
+static List *compute_context_path(MemoryContext c, HTAB *context_id_lookup);
+static void end_memorycontext_reporting(MemoryStatsDSHashEntry *entry, MemoryContext oldcontext,
+ HTAB *context_id_lookup);
/* ----------
* The max bytes for showing identifiers of MemoryContext.
+ * This is used by pg_get_backend_memory_contexts, the view for the local backend.
* ----------
*/
#define MEMORY_CONTEXT_IDENT_DISPLAY_SIZE 1024
+#define MAX_PATH_DISPLAY_LENGTH 100
+/* Timeout in seconds */
+#define MEMORY_STATS_MAX_TIMEOUT 5
+
/*
- * MemoryContextId
+ * MemoryStatsContextId
* Used for storage of transient identifiers for
- * pg_get_backend_memory_contexts.
+ * pg_get_backend_memory_contexts and the like.
*/
-typedef struct MemoryContextId
+typedef struct MemoryStatsContextId
{
MemoryContext context;
int context_id;
-} MemoryContextId;
+} MemoryStatsContextId;
/*
* int_list_to_array
@@ -89,7 +199,7 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
*/
for (MemoryContext cur = context; cur != NULL; cur = cur->parent)
{
- MemoryContextId *entry;
+ MemoryStatsContextId *entry;
bool found;
entry = hash_search(context_id_lookup, &cur, HASH_FIND, &found);
@@ -143,24 +253,7 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
else
nulls[1] = true;
- switch (context->type)
- {
- case T_AllocSetContext:
- type = "AllocSet";
- break;
- case T_GenerationContext:
- type = "Generation";
- break;
- case T_SlabContext:
- type = "Slab";
- break;
- case T_BumpContext:
- type = "Bump";
- break;
- default:
- type = "???";
- break;
- }
+ type = ContextTypeToString(context->type);
values[2] = CStringGetTextDatum(type);
values[3] = Int32GetDatum(list_length(path)); /* level */
@@ -175,6 +268,38 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
list_free(path);
}
+/*
+ * ContextTypeToString
+ * Returns a textual representation of a context type
+ *
+ * This should cover the same types as MemoryContextIsValid.
+ */
+const char *
+ContextTypeToString(NodeTag type)
+{
+ const char *context_type;
+
+ switch (type)
+ {
+ case T_AllocSetContext:
+ context_type = "AllocSet";
+ break;
+ case T_GenerationContext:
+ context_type = "Generation";
+ break;
+ case T_SlabContext:
+ context_type = "Slab";
+ break;
+ case T_BumpContext:
+ context_type = "Bump";
+ break;
+ default:
+ context_type = "???";
+ break;
+ }
+ return context_type;
+}
+
/*
* pg_get_backend_memory_contexts
* SQL SRF showing backend memory context.
@@ -189,7 +314,7 @@ pg_get_backend_memory_contexts(PG_FUNCTION_ARGS)
HTAB *context_id_lookup;
ctl.keysize = sizeof(MemoryContext);
- ctl.entrysize = sizeof(MemoryContextId);
+ ctl.entrysize = sizeof(MemoryStatsContextId);
ctl.hcxt = CurrentMemoryContext;
context_id_lookup = hash_create("pg_get_backend_memory_contexts",
@@ -216,7 +341,7 @@ pg_get_backend_memory_contexts(PG_FUNCTION_ARGS)
foreach_ptr(MemoryContextData, cur, contexts)
{
- MemoryContextId *entry;
+ MemoryStatsContextId *entry;
bool found;
/*
@@ -224,8 +349,8 @@ pg_get_backend_memory_contexts(PG_FUNCTION_ARGS)
* PutMemoryContextsStatsTupleStore needs this to populate the "path"
* column with the parent context_ids.
*/
- entry = (MemoryContextId *) hash_search(context_id_lookup, &cur,
- HASH_ENTER, &found);
+ entry = (MemoryStatsContextId *) hash_search(context_id_lookup, &cur,
+ HASH_ENTER, &found);
entry->context_id = context_id++;
Assert(!found);
@@ -305,3 +430,837 @@ pg_log_backend_memory_contexts(PG_FUNCTION_ARGS)
PG_RETURN_BOOL(true);
}
+
+/*
+ * pg_get_process_memory_contexts
+ * Signal a backend or an auxiliary process to send its memory contexts,
+ * wait for the results and display them.
+ *
+ * By default, only superusers or users with ROLE_PG_READ_ALL_STATS are allowed
+ * to signal a process to return the memory contexts. This is because allowing
+ * any user to issue this request at an unbounded rate would cause many
+ * requests to be sent, which can lead to denial of service. Additional roles
+ * can be permitted with GRANT.
+ *
+ * On receipt of this signal, a backend or an auxiliary process sets the flag
+ * in the signal handler, which causes the next CHECK_FOR_INTERRUPTS()
+ * or process-specific interrupt handler to copy the memory context details
+ * to a dynamic shared memory space.
+ *
+ * We have defined a limit on DSA memory that could be allocated per process -
+ * if the process has more memory contexts than what can fit in the allocated
+ * size, the excess contexts are summarized and represented as cumulative total
+ * at the end of the buffer.
+ *
+ * After sending the signal, wait on a condition variable. The publishing
+ * backend, after copying the data to shared memory, broadcasts on that
+ * condition variable. There is one condition variable per client process.
+ * Once the condition variable is signalled, check whether the latest memory
+ * context information is available and display it.
+ *
+ * If the publishing backend does not respond before the condition variable
+ * sleep times out, after the predefined MEMORY_STATS_MAX_TIMEOUT seconds,
+ * give up and return NULL.
+ */
+Datum
+pg_get_process_memory_contexts(PG_FUNCTION_ARGS)
+{
+ int pid = PG_GETARG_INT32(0);
+ bool summary = PG_GETARG_BOOL(1);
+ PGPROC *proc;
+ ProcNumber procNumber;
+ bool proc_is_aux = false;
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ MemoryStatsEntry *memcxt_info;
+ MemoryStatsDSHashEntry *entry;
+ bool found;
+ char key[CLIENT_KEY_SIZE];
+ TimestampTz start_timestamp;
+
+ /*
+ * See if the process with given pid is a backend or an auxiliary process
+ * and remember the type for when we requery the process later.
+ */
+ proc = BackendPidGetProc(pid);
+ if (proc == NULL)
+ {
+ proc = AuxiliaryPidGetProc(pid);
+ proc_is_aux = true;
+ }
+
+ /*
+ * BackendPidGetProc() and AuxiliaryPidGetProc() return NULL if the pid
+ * isn't valid; this is not an error, so just emit a WARNING and return.
+ * See comment in pg_log_backend_memory_contexts for a discussion on this.
+ */
+ if (proc == NULL)
+ {
+ /*
+ * This is a warning because we don't want to break loops.
+ */
+ ereport(WARNING,
+ errmsg("PID %d is not a PostgreSQL server process", pid));
+ PG_RETURN_NULL();
+ }
+
+ procNumber = GetNumberFromPGProc(proc);
+
+ /*
+ * Create a DSA to allocate memory for copying memory context statistics.
+ * Allocate the memory in the DSA and send the DSA pointer to the server
+ * process for storing the context statistics. If the statistics do not
+ * fit in the predefined limit (1MB), a cumulative total is stored for the
+ * excess contexts.
+ *
+ * The DSA is created once for the lifetime of the server, and only
+ * attached in subsequent calls.
+ */
+ if (MemoryStatsDsaArea == NULL)
+ MemoryStatsDsaArea = GetNamedDSA("memory_context_statistics_dsa", &found);
+
+ /*
+ * The dsa pointers containing statistics for each client are stored in a
+ * dshash table. In addition to dsa pointer, each entry in this table also
+ * contains information about the statistics, condition variable for
+ * signalling between client and the server and miscellaneous data
+ * specific to a request. There is one entry per client request in the
+ * hash table.
+ */
+ if (MemoryStatsDsHash == NULL)
+ MemoryStatsDsHash = GetNamedDSHash("memory_context_statistics_dshash",
+ &memctx_dsh_params, &found);
+
+ snprintf(key, sizeof(key), "%d", MyProcNumber);
+
+ /*
+ * Check if the publishing process's slot is empty and store this client's
+ * key, i.e. its ProcNumber. This informs the publishing process that it is
+ * supposed to write statistics in the hash entry corresponding to this
+ * client.
+ */
+ LWLockAcquire(client_keys_lock, LW_EXCLUSIVE);
+
+ if (client_keys[procNumber] == -1)
+ client_keys[procNumber] = MyProcNumber;
+ else
+ {
+ LWLockRelease(client_keys_lock);
+ ereport(WARNING,
+ errmsg("server process %d is processing previous request", pid));
+ PG_RETURN_NULL();
+ }
+ LWLockRelease(client_keys_lock);
+
+ /*
+ * Insert an entry for this client in the dshash table the first time this
+ * function is called. This entry is deleted when the process exits, in
+ * a before_shmem_exit callback.
+ *
+ * dshash_find_or_insert locks the entry to prevent the publisher from
+ * reading before client has updated the entry.
+ */
+ entry = dshash_find_or_insert(MemoryStatsDsHash, key, &found);
+ if (!found)
+ {
+ entry->stats_written = false;
+ ConditionVariableInit(&entry->memcxt_cv);
+ }
+
+ /*
+ * Allocate 1MB of memory for the backend to publish its statistics on
+ * every call to this function. The memory is freed at the end of the
+ * function.
+ */
+ entry->memstats_dsa_pointer =
+ dsa_allocate0(MemoryStatsDsaArea, MEMORY_CONTEXT_REPORT_MAX_PER_BACKEND);
+
+ /*
+ * Specify whether a summary of statistics is requested, before signalling
+ * the server.
+ */
+ entry->summary = summary;
+
+ /*
+ * Indicate which server process statistics are being requested from. If
+ * this client times out before the last requested process can publish its
+ * statistics, it may send a new request to another server process. Since
+ * the previous server was notified, it might attempt to read the same
+ * client entry and respond incorrectly with its statistics. By storing
+ * the server ID in the client entry, we prevent any previously signalled
+ * server process from writing its statistics in the space meant for the
+ * newly requested process.
+ */
+ entry->target_server_id = pid;
+ dshash_release_lock(MemoryStatsDsHash, entry);
+
+ /*
+ * Send a signal to a PostgreSQL process, informing it we want it to
+ * produce information about its memory contexts.
+ */
+ if (SendProcSignal(pid, PROCSIG_GET_MEMORY_CONTEXT, procNumber) < 0)
+ {
+ memstats_dsa_cleanup(key);
+ memstats_client_key_reset(procNumber);
+ ereport(WARNING,
+ errmsg("could not send signal to process %d: %m", pid));
+ PG_RETURN_NULL();
+ }
+ start_timestamp = GetCurrentTimestamp();
+
+ while (1)
+ {
+ long elapsed_time;
+
+ entry = dshash_find_or_insert(MemoryStatsDsHash, key, &found);
+ Assert(found);
+
+ INJECTION_POINT("memcontext-client-crash", NULL);
+
+ memcxt_info = (MemoryStatsEntry *)
+ dsa_get_address(MemoryStatsDsaArea, entry->memstats_dsa_pointer);
+
+ /*
+ * We expect to come out of sleep when the requested process has
+ * finished publishing the statistics, verified using a boolean
+ * stats_written.
+ *
+ * Make sure that the statistics are actually written by checking that
+ * the name of the first context is not empty. This ensures that
+ * subsequent waits for statistics do not return spuriously if the
+ * previous call to the function ended in error and thus could not
+ * clear the stats_written flag.
+ */
+ if (entry->stats_written && memcxt_info[0].name[0] != '\0')
+ break;
+
+ dshash_release_lock(MemoryStatsDsHash, entry);
+
+ elapsed_time = TimestampDifferenceMilliseconds(start_timestamp,
+ GetCurrentTimestamp());
+ /* Return if we have already exceeded the timeout */
+ if (elapsed_time >= MEMORY_STATS_MAX_TIMEOUT * 1000)
+ {
+ memstats_dsa_cleanup(key);
+ memstats_client_key_reset(procNumber);
+ PG_RETURN_NULL();
+ }
+
+ /*
+ * Recheck the state of the backend before sleeping on the condition
+ * variable to ensure the process is still alive. Only check the
+ * relevant process type based on the earlier PID check.
+ */
+ if (proc_is_aux)
+ proc = AuxiliaryPidGetProc(pid);
+ else
+ proc = BackendPidGetProc(pid);
+
+ /*
+ * The target server process ending during memory context processing
+ * is not an error.
+ */
+ if (proc == NULL)
+ {
+ memstats_dsa_cleanup(key);
+ memstats_client_key_reset(procNumber);
+ ConditionVariableCancelSleep();
+ ereport(WARNING,
+ errmsg("PID %d is no longer a PostgreSQL server process",
+ pid));
+ PG_RETURN_NULL();
+ }
+
+ INJECTION_POINT("memcontext-client-crash", NULL);
+
+ /*
+ * Wait for MEMORY_STATS_MAX_TIMEOUT. If no statistics are available
+ * within the allowed time then return NULL. The timeout is converted to
+ * milliseconds since that's what the condition variable sleep uses.
+ */
+ if (ConditionVariableTimedSleep(&entry->memcxt_cv,
+ (MEMORY_STATS_MAX_TIMEOUT * 1000), WAIT_EVENT_MEM_CXT_PUBLISH))
+ {
+ /* Timeout has expired, return NULL */
+ memstats_dsa_cleanup(key);
+ memstats_client_key_reset(procNumber);
+ ConditionVariableCancelSleep();
+ PG_RETURN_NULL();
+ }
+ }
+
+ InitMaterializedSRF(fcinfo, 0);
+
+ /*
+ * Backend has finished publishing the stats, project them.
+ */
+#define PG_GET_PROCESS_MEMORY_CONTEXTS_COLS 11
+ for (int i = 0; i < entry->total_stats; i++)
+ {
+ ArrayType *path_array;
+ int path_length;
+ Datum values[PG_GET_PROCESS_MEMORY_CONTEXTS_COLS];
+ bool nulls[PG_GET_PROCESS_MEMORY_CONTEXTS_COLS];
+ Datum *path_datum = NULL;
+
+ memset(values, 0, sizeof(values));
+ memset(nulls, 0, sizeof(nulls));
+
+ Assert(memcxt_info[i].name[0] != '\0');
+ values[0] = CStringGetTextDatum(memcxt_info[i].name);
+
+ if (memcxt_info[i].ident[0] != '\0')
+ values[1] = CStringGetTextDatum(memcxt_info[i].ident);
+ else
+ nulls[1] = true;
+
+ values[2] = CStringGetTextDatum(ContextTypeToString(memcxt_info[i].type));
+ values[3] = Int32GetDatum(memcxt_info[i].levels);
+
+ path_length = memcxt_info[i].path_length;
+ path_datum = (Datum *) palloc(path_length * sizeof(Datum));
+ if (memcxt_info[i].path[0] != 0)
+ {
+ for (int j = 0; j < path_length; j++)
+ path_datum[j] = Int32GetDatum(memcxt_info[i].path[j]);
+ path_array = construct_array_builtin(path_datum, path_length, INT4OID);
+ values[4] = PointerGetDatum(path_array);
+ }
+ else
+ nulls[4] = true;
+
+ values[5] = Int64GetDatum(memcxt_info[i].totalspace);
+ values[6] = Int64GetDatum(memcxt_info[i].nblocks);
+ values[7] = Int64GetDatum(memcxt_info[i].freespace);
+ values[8] = Int64GetDatum(memcxt_info[i].freechunks);
+ values[9] = Int64GetDatum(memcxt_info[i].totalspace -
+ memcxt_info[i].freespace);
+ values[10] = Int32GetDatum(memcxt_info[i].num_agg_stats);
+
+ tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc,
+ values, nulls);
+ }
+ dshash_release_lock(MemoryStatsDsHash, entry);
+ memstats_dsa_cleanup(key);
+
+ ConditionVariableCancelSleep();
+
+ PG_RETURN_NULL();
+}
+
+static void
+memstats_dsa_cleanup(char *key)
+{
+ MemoryStatsDSHashEntry *entry;
+
+ entry = dshash_find(MemoryStatsDsHash, key, true);
+
+ Assert(MemoryStatsDsaArea != NULL);
+ dsa_free(MemoryStatsDsaArea, entry->memstats_dsa_pointer);
+ entry->memstats_dsa_pointer = InvalidDsaPointer;
+ entry->stats_written = false;
+ entry->target_server_id = 0;
+
+ dshash_release_lock(MemoryStatsDsHash, entry);
+}
+
+/*
+ * Remove this process from the publishing process'
+ * client key slot, if the stats publishing process has failed to do so.
+ */
+static void
+memstats_client_key_reset(int procNumber)
+{
+ LWLockAcquire(client_keys_lock, LW_EXCLUSIVE);
+
+ if (client_keys[procNumber] == MyProcNumber)
+ client_keys[procNumber] = -1;
+ LWLockRelease(client_keys_lock);
+}
+
+void
+MemoryContextKeysShmemInit(void)
+{
+ bool found;
+
+ client_keys = (int *)
+ ShmemInitStruct("MemoryContextKeys",
+ MemoryContextKeysShmemSize() + sizeof(LWLockPadded), &found);
+ client_keys_lock = (LWLock *) ((char *) client_keys + MemoryContextKeysShmemSize());
+
+ if (!found)
+ {
+ MemSet(client_keys, -1, MemoryContextKeysShmemSize());
+ LWLockInitialize(client_keys_lock, LWTRANCHE_MEMORY_CONTEXT_KEYS);
+ }
+}
+
+Size
+MemoryContextKeysShmemSize(void)
+{
+ Size sz = 0;
+ Size TotalProcs = 0;
+
+ TotalProcs = add_size(TotalProcs, NUM_AUXILIARY_PROCS);
+ TotalProcs = add_size(TotalProcs, MaxBackends);
+ sz = add_size(sz, mul_size(TotalProcs, sizeof(int)));
+
+ return sz;
+}
+
+/*
+ * HandleGetMemoryContextInterrupt
+ * Handle receipt of an interrupt indicating a request to publish memory
+ * contexts statistics.
+ *
+ * All the actual work is deferred to ProcessGetMemoryContextInterrupt() as
+ * this cannot be performed in a signal handler.
+ */
+void
+HandleGetMemoryContextInterrupt(void)
+{
+ InterruptPending = true;
+ PublishMemoryContextPending = true;
+ /* latch will be set by procsignal_sigusr1_handler */
+}
+
+/*
+ * ProcessGetMemoryContextInterrupt
+ * Generate information about memory contexts used by the process.
+ *
+ * Performs a breadth first search on the memory context tree, thus parents'
+ * statistics are reported before their children's in the monitoring function
+ * output.
+ *
+ * Statistics for all the processes are shared via the same dynamic shared
+ * area. Individual statistics are tracked independently in
+ * per-process DSA pointers. These pointers are stored in a dshash table
+ * keyed by the requesting client's ProcNumber.
+ *
+ * We calculate the maximum number of contexts' statistics that can be
+ * displayed using a pre-determined limit for memory available per process for
+ * this utility and the maximum size of statistics for each context. The
+ * remaining context statistics, if any, are captured as a cumulative total at
+ * the end of the individual contexts' statistics.
+ *
+ * If summary is true, we capture the level 1 and level 2 contexts'
+ * statistics. For that we traverse the subtree below each level 2 context
+ * to cover all the children of a parent context, to
+ * be able to display a cumulative total of memory consumption by a parent at
+ * level 2 and all its children.
+ */
+void
+ProcessGetMemoryContextInterrupt(void)
+{
+ List *contexts;
+ HASHCTL ctl;
+ HTAB *context_id_lookup;
+ int context_id = 0;
+ MemoryStatsEntry *meminfo;
+ MemoryContextCounters stat;
+ int num_individual_stats = 0;
+ bool found;
+ MemoryStatsDSHashEntry *entry;
+ char key[CLIENT_KEY_SIZE];
+ int clientProcNumber;
+ MemoryContext memstats_ctx = NULL;
+ MemoryContext oldcontext = NULL;
+
+ PublishMemoryContextPending = false;
+
+ /*
+ * Retrieve the client key for publishing statistics and reset it to -1,
+ * so other clients can request memory statistics from this process.
+ */
+ LWLockAcquire(client_keys_lock, LW_EXCLUSIVE);
+ Assert(client_keys[MyProcNumber] != -1);
+ clientProcNumber = client_keys[MyProcNumber];
+ client_keys[MyProcNumber] = -1;
+ LWLockRelease(client_keys_lock);
+
+ /*
+ * Create a new memory context which is not a part of the TopMemoryContext
+ * tree. This context is used to allocate all memory in this function.
+ * Keeping the allocations made while reporting memory consumption
+ * statistics separate ensures that they do not affect the output of
+ * this function.
+ */
+ memstats_ctx = AllocSetContextCreate((MemoryContext) NULL, "publish_memory_context_statistics",
+ ALLOCSET_SMALL_SIZES);
+ oldcontext = MemoryContextSwitchTo(memstats_ctx);
+
+ /*
+ * The hash table is used for constructing "path" column of the view,
+ * similar to its local backend counterpart.
+ */
+ ctl.keysize = sizeof(MemoryContext);
+ ctl.entrysize = sizeof(MemoryStatsContextId);
+ ctl.hcxt = CurrentMemoryContext;
+
+ context_id_lookup = hash_create("pg_get_remote_backend_memory_contexts",
+ 256,
+ &ctl,
+ HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+
+ /* List of contexts to process in the next round - start at the top. */
+ contexts = list_make1(TopMemoryContext);
+
+ /*
+ * If DSA exists, created by another process requesting statistics, attach
+ * to it. We expect the client process to create required DSA and Dshash
+ * table.
+ */
+ if (MemoryStatsDsaArea == NULL)
+ MemoryStatsDsaArea = GetNamedDSA("memory_context_statistics_dsa",
+ &found);
+
+ if (MemoryStatsDsHash == NULL)
+ MemoryStatsDsHash = GetNamedDSHash("memory_context_statistics_dshash",
+ &memctx_dsh_params, &found);
+
+
+ snprintf(key, CLIENT_KEY_SIZE, "%d", clientProcNumber);
+
+ /*
+ * The entry lock is held by dshash_find_or_insert to protect writes to
+ * process specific memory. Two different processes publishing statistics
+ * do not block each other.
+ */
+ INJECTION_POINT("memcontext-server-crash", NULL);
+ entry = dshash_find_or_insert(MemoryStatsDsHash, key, &found);
+
+ /*
+ * Entry has been deleted due to client process exit. Make sure that the
+ * client always deletes the entry after taking required lock or this
+ * function may end up writing to unallocated memory.
+ */
+ if (!found)
+ {
+ entry->stats_written = false;
+ end_memorycontext_reporting(entry, oldcontext, context_id_lookup);
+ return;
+ }
+
+ /*
+ * The client has timed out waiting for us to write statistics and is
+ * requesting statistics from some other process.
+ */
+ if (entry->target_server_id != MyProcPid)
+ {
+ entry->stats_written = false;
+ end_memorycontext_reporting(entry, oldcontext, context_id_lookup);
+ return;
+ }
+
+ /* Should be allocated by a client backend that is requesting statistics */
+ Assert(entry->memstats_dsa_pointer != InvalidDsaPointer);
+ meminfo = (MemoryStatsEntry *)
+ dsa_get_address(MemoryStatsDsaArea, entry->memstats_dsa_pointer);
+
+ if (entry->summary)
+ {
+ int cxt_id = 0;
+ List *path = NIL;
+ MemoryStatsContextId *contextid_entry;
+
+ /* Copy TopMemoryContext statistics to DSA */
+ memset(&stat, 0, sizeof(stat));
+ (*TopMemoryContext->methods->stats) (TopMemoryContext, NULL, NULL,
+ &stat, true);
+ path = lcons_int(1, path);
+ PublishMemoryContext(meminfo, cxt_id, TopMemoryContext, path, stat,
+ 1);
+
+ contextid_entry = (MemoryStatsContextId *) hash_search(context_id_lookup, &TopMemoryContext,
+ HASH_ENTER, &found);
+ Assert(!found);
+
+ /*
+ * context id starts with 1
+ */
+ contextid_entry->context_id = cxt_id + 1;
+
+ /*
+ * Copy statistics for each of TopMemoryContext's children. This
+ * includes statistics of at most 100 children per node, with each
+ * child node limited to a depth of 100 in its subtree.
+ */
+ for (MemoryContext c = TopMemoryContext->firstchild; c != NULL;
+ c = c->nextchild)
+ {
+ MemoryContextCounters grand_totals;
+ int num_contexts = 0;
+
+ path = NIL;
+ memset(&grand_totals, 0, sizeof(grand_totals));
+
+ cxt_id++;
+ contextid_entry = (MemoryStatsContextId *) hash_search(context_id_lookup, &c,
+ HASH_ENTER, &found);
+ Assert(!found);
+ contextid_entry->context_id = cxt_id + 1;
+
+ MemoryContextStatsCounter(c, &grand_totals, &num_contexts);
+
+ path = compute_context_path(c, context_id_lookup);
+
+ PublishMemoryContext(meminfo, cxt_id, c, path,
+ grand_totals, num_contexts);
+ }
+ entry->total_stats = cxt_id + 1;
+
+ entry->stats_written = true;
+ end_memorycontext_reporting(entry, oldcontext, context_id_lookup);
+ /* Notify waiting backends and return */
+ ConditionVariableBroadcast(&entry->memcxt_cv);
+ return;
+ }
+ foreach_ptr(MemoryContextData, cur, contexts)
+ {
+ List *path = NIL;
+ MemoryStatsContextId *contextid_entry;
+
+ contextid_entry = (MemoryStatsContextId *) hash_search(context_id_lookup, &cur,
+ HASH_ENTER, &found);
+ Assert(!found);
+
+ /*
+ * context id starts with 1
+ */
+ contextid_entry->context_id = context_id + 1;
+
+ /*
+ * Figure out the transient context_id of this context and each of its
+ * ancestors, to compute a path for this context.
+ */
+ path = compute_context_path(cur, context_id_lookup);
+
+ /* Examine the context stats */
+ memset(&stat, 0, sizeof(stat));
+ (*cur->methods->stats) (cur, NULL, NULL, &stat, true);
+
+ /* Account for saving one statistics slot for cumulative reporting */
+ if (context_id < (MAX_MEMORY_CONTEXT_STATS_NUM - 1))
+ {
+ /* Copy statistics to DSA memory */
+ PublishMemoryContext(meminfo, context_id, cur, path, stat, 1);
+ }
+ else
+ {
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].totalspace += stat.totalspace;
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].nblocks += stat.nblocks;
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].freespace += stat.freespace;
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].freechunks += stat.freechunks;
+ }
+
+ /*
+ * The last individual slot has been filled, so label the final slot
+ * that will hold the aggregate of any remaining statistics.
+ *
+ * Array slots run from 0 to max_stats - 1. Individual statistics are
+ * stored up to and including context_id == max_stats - 2, because
+ * the slot at max_stats - 1 is reserved for the cumulative
+ * statistics, labelled "Remaining Totals".
+ */
+ if (context_id == (MAX_MEMORY_CONTEXT_STATS_NUM - 2))
+ {
+ int namelen = strlen("Remaining Totals");
+
+ num_individual_stats = context_id + 1;
+ strlcpy(meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].name,
+ "Remaining Totals", namelen + 1);
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].ident[0] = '\0';
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].path[0] = 0;
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].type = 0;
+ }
+ context_id++;
+
+ for (MemoryContext c = cur->firstchild; c != NULL; c = c->nextchild)
+ contexts = lappend(contexts, c);
+ }
+
+ /*
+ * Statistics are not aggregated, i.e. individual statistics are reported,
+ * when context_id <= max_stats.
+ */
+ if (context_id <= MAX_MEMORY_CONTEXT_STATS_NUM)
+ {
+ entry->total_stats = context_id;
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].num_agg_stats = 1;
+ }
+ /* Report number of aggregated memory contexts */
+ else
+ {
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].num_agg_stats = context_id -
+ num_individual_stats;
+
+ /*
+ * Total stats equals num_individual_stats + 1 record for cumulative
+ * statistics.
+ */
+ entry->total_stats = num_individual_stats + 1;
+ }
+ entry->stats_written = true;
+ end_memorycontext_reporting(entry, oldcontext, context_id_lookup);
+ /* Notify waiting backends and return */
+ ConditionVariableBroadcast(&entry->memcxt_cv);
+}
+
+/*
+ * Clean up before exit from ProcessGetMemoryContextInterrupt
+ */
+static void
+end_memorycontext_reporting(MemoryStatsDSHashEntry *entry,
+ MemoryContext oldcontext, HTAB *context_id_lookup)
+{
+ MemoryContext curr_ctx = CurrentMemoryContext;
+
+ dshash_release_lock(MemoryStatsDsHash, entry);
+
+ hash_destroy(context_id_lookup);
+ MemoryContextSwitchTo(oldcontext);
+ MemoryContextDelete(curr_ctx);
+}
+
+/*
+ * compute_context_path
+ *
+ * Append the transient context_id of this context and each of its ancestors
+ * to a list, in order to compute a path.
+ */
+static List *
+compute_context_path(MemoryContext c, HTAB *context_id_lookup)
+{
+ bool found;
+ List *path = NIL;
+ MemoryContext cur_context;
+
+ for (cur_context = c; cur_context != NULL; cur_context = cur_context->parent)
+ {
+ MemoryStatsContextId *cur_entry;
+
+ cur_entry = hash_search(context_id_lookup, &cur_context, HASH_FIND, &found);
+
+ if (!found)
+ elog(ERROR, "hash table corrupted, can't construct path value");
+
+ path = lcons_int(cur_entry->context_id, path);
+ }
+
+ return path;
+}
+
+/*
+ * PublishMemoryContext
+ *
+ * Copy the memory context statistics of a single context to a DSA memory
+ */
+static void
+PublishMemoryContext(MemoryStatsEntry *memcxt_info, int curr_id,
+ MemoryContext context, List *path,
+ MemoryContextCounters stat, int num_contexts)
+{
+ const char *ident = context->ident;
+ const char *name = context->name;
+
+ /*
+ * To be consistent with logging output, we label dynahash contexts with
+ * just the hash table name as with MemoryContextStatsPrint().
+ */
+ if (context->ident && strncmp(context->name, "dynahash", 8) == 0)
+ {
+ name = context->ident;
+ ident = NULL;
+ }
+
+ if (name != NULL)
+ {
+ int namelen = strlen(name);
+
+ if (namelen >= MEMORY_CONTEXT_NAME_SHMEM_SIZE)
+ namelen = pg_mbcliplen(name, namelen,
+ MEMORY_CONTEXT_NAME_SHMEM_SIZE - 1);
+
+ strlcpy(memcxt_info[curr_id].name, name, namelen + 1);
+ }
+ else
+ /* Clearing the array */
+ memcxt_info[curr_id].name[0] = '\0';
+
+ /* Trim and copy the identifier if it is not set to NULL */
+ if (ident != NULL)
+ {
+ int idlen = strlen(ident);
+
+ /*
+ * Some identifiers such as SQL query string can be very long,
+ * truncate oversize identifiers.
+ */
+ if (idlen >= MEMORY_CONTEXT_IDENT_SHMEM_SIZE)
+ idlen = pg_mbcliplen(ident, idlen,
+ MEMORY_CONTEXT_IDENT_SHMEM_SIZE - 1);
+
+ strlcpy(memcxt_info[curr_id].ident, ident, idlen + 1);
+ }
+ else
+ memcxt_info[curr_id].ident[0] = '\0';
+
+ /* Copy the path, truncated to MAX_PATH_DISPLAY_LENGTH entries */
+ if (path == NIL)
+ memcxt_info[curr_id].path[0] = 0;
+ else
+ {
+ int levels = Min(list_length(path), MAX_PATH_DISPLAY_LENGTH);
+
+ memcxt_info[curr_id].path_length = levels;
+ memcxt_info[curr_id].levels = list_length(path);
+
+ foreach_int(i, path)
+ {
+ memcxt_info[curr_id].path[foreach_current_index(i)] = i;
+ if (--levels == 0)
+ break;
+ }
+ }
+ memcxt_info[curr_id].type = context->type;
+ memcxt_info[curr_id].totalspace = stat.totalspace;
+ memcxt_info[curr_id].nblocks = stat.nblocks;
+ memcxt_info[curr_id].freespace = stat.freespace;
+ memcxt_info[curr_id].freechunks = stat.freechunks;
+ memcxt_info[curr_id].num_agg_stats = num_contexts;
+}
+
+void
+AtProcExit_memstats_cleanup(int code, Datum arg)
+{
+ int idx = MyProcNumber;
+ MemoryStatsDSHashEntry *entry;
+ char key[CLIENT_KEY_SIZE];
+ bool found;
+
+ if (MemoryStatsDsHash != NULL)
+ {
+ snprintf(key, CLIENT_KEY_SIZE, "%d", idx);
+ entry = dshash_find_or_insert(MemoryStatsDsHash, key, &found);
+
+ if (found)
+ {
+ if (MemoryStatsDsaArea != NULL &&
+ DsaPointerIsValid(entry->memstats_dsa_pointer))
+ dsa_free(MemoryStatsDsaArea, entry->memstats_dsa_pointer);
+ }
+ dshash_delete_entry(MemoryStatsDsHash, entry);
+ }
+ LWLockAcquire(client_keys_lock, LW_EXCLUSIVE);
+ client_keys[idx] = -1;
+ LWLockRelease(client_keys_lock);
+}
+
+/* Used for testing purposes */
+dsa_area *
+pg_get_memstats_dsa_area(void)
+{
+ if (MemoryStatsDsaArea != NULL)
+ return MemoryStatsDsaArea;
+ else
+ return NULL;
+}
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index d31cb45a058..92b0446b80c 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -39,6 +39,7 @@ volatile sig_atomic_t TransactionTimeoutPending = false;
volatile sig_atomic_t IdleSessionTimeoutPending = false;
volatile sig_atomic_t ProcSignalBarrierPending = false;
volatile sig_atomic_t LogMemoryContextPending = false;
+volatile sig_atomic_t PublishMemoryContextPending = false;
volatile sig_atomic_t IdleStatsUpdateTimeoutPending = false;
volatile uint32 InterruptHoldoffCount = 0;
volatile uint32 QueryCancelHoldoffCount = 0;
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 98f9598cd78..202403ebc63 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -658,6 +658,13 @@ BaseInit(void)
* drop ephemeral slots, which in turn triggers stats reporting.
*/
ReplicationSlotInitialize();
+
+ /*
+ * The before shmem exit callback frees the DSA memory occupied by the
+ * latest memory context statistics that could be published by this proc
+ * if requested.
+ */
+ before_shmem_exit(AtProcExit_memstats_cleanup, 0);
}
diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index 47fd774c7d2..31c4de9f0b4 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -1008,6 +1008,37 @@ MemoryContextStatsInternal(MemoryContext context, int level,
}
}
+
+/*
+ * MemoryContextStatsCounter
+ *
+ * Accumulate statistics counts into *totals. totals should not be NULL.
+ * This involves a non-recursive tree traversal.
+ */
+void
+MemoryContextStatsCounter(MemoryContext context, MemoryContextCounters *totals,
+ int *num_contexts)
+{
+ int ichild = 1;
+
+ *num_contexts = 0;
+ context->methods->stats(context, NULL, NULL, totals, false);
+
+ for (MemoryContext curr = context->firstchild;
+ curr != NULL;
+ curr = MemoryContextTraverseNext(curr, context))
+ {
+ curr->methods->stats(curr, NULL, NULL, totals, false);
+ ichild++;
+ }
+
+ /*
+ * Add the count of all the children contexts which are traversed
+ * including the parent.
+ */
+ *num_contexts = *num_contexts + ichild;
+}
+
/*
* MemoryContextStatsPrint
* Print callback used by MemoryContextStatsInternal
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 1edb18958f7..5e532b6df21 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8617,6 +8617,16 @@
prorettype => 'bool', proargtypes => 'int4',
prosrc => 'pg_log_backend_memory_contexts' },
+# publishing memory contexts of the specified postgres process
+{ oid => '2173', descr => 'publish memory contexts of the specified backend',
+ proname => 'pg_get_process_memory_contexts', provolatile => 'v',
+ prorows => '100', proretset => 't', proparallel => 'r',
+ prorettype => 'record', proargtypes => 'int4 bool',
+ proallargtypes => '{int4,bool,text,text,text,int4,_int4,int8,int8,int8,int8,int8,int4}',
+ proargmodes => '{i,i,o,o,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{pid, summary, name, ident, type, level, path, total_bytes, total_nblocks, free_bytes, free_chunks, used_bytes, num_agg_contexts}',
+ prosrc => 'pg_get_process_memory_contexts' },
+
# non-persistent series generator
{ oid => '1066', descr => 'non-persistent series generator',
proname => 'generate_series', prorows => '1000',
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 9a7d733ddef..b76f24baed6 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -96,6 +96,7 @@ extern PGDLLIMPORT volatile sig_atomic_t IdleSessionTimeoutPending;
extern PGDLLIMPORT volatile sig_atomic_t ProcSignalBarrierPending;
extern PGDLLIMPORT volatile sig_atomic_t LogMemoryContextPending;
extern PGDLLIMPORT volatile sig_atomic_t IdleStatsUpdateTimeoutPending;
+extern PGDLLIMPORT volatile sig_atomic_t PublishMemoryContextPending;
extern PGDLLIMPORT volatile sig_atomic_t CheckClientConnectionPending;
extern PGDLLIMPORT volatile sig_atomic_t ClientConnectionLost;
diff --git a/src/include/storage/lwlocklist.h b/src/include/storage/lwlocklist.h
index 5b0ce383408..613e769c84e 100644
--- a/src/include/storage/lwlocklist.h
+++ b/src/include/storage/lwlocklist.h
@@ -136,3 +136,4 @@ PG_LWLOCKTRANCHE(SUBTRANS_SLRU, SubtransSLRU)
PG_LWLOCKTRANCHE(XACT_SLRU, XactSLRU)
PG_LWLOCKTRANCHE(PARALLEL_VACUUM_DSA, ParallelVacuumDSA)
PG_LWLOCKTRANCHE(AIO_URING_COMPLETION, AioUringCompletion)
+PG_LWLOCKTRANCHE(MEMORY_CONTEXT_KEYS, MemoryContextReportingKeys)
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index afeeb1ca019..345d5a0ecb1 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -35,6 +35,7 @@ typedef enum
PROCSIG_WALSND_INIT_STOPPING, /* ask walsenders to prepare for shutdown */
PROCSIG_BARRIER, /* global barrier interrupt */
PROCSIG_LOG_MEMORY_CONTEXT, /* ask backend to log the memory contexts */
+ PROCSIG_GET_MEMORY_CONTEXT, /* ask backend to send the memory contexts */
PROCSIG_PARALLEL_APPLY_MESSAGE, /* Message from parallel apply workers */
/* Recovery conflict reasons */
diff --git a/src/include/utils/memutils.h b/src/include/utils/memutils.h
index 7bbe5a36959..70df562c6dd 100644
--- a/src/include/utils/memutils.h
+++ b/src/include/utils/memutils.h
@@ -18,7 +18,7 @@
#define MEMUTILS_H
#include "nodes/memnodes.h"
-
+#include "utils/dsa.h"
/*
* MaxAllocSize, MaxAllocHugeSize
@@ -48,7 +48,6 @@
#define AllocHugeSizeIsValid(size) ((Size) (size) <= MaxAllocHugeSize)
-
/*
* Standard top-level memory contexts.
*
@@ -149,6 +148,7 @@ extern MemoryContext BumpContextCreate(MemoryContext parent,
Size minContextSize,
Size initBlockSize,
Size maxBlockSize);
+extern dsa_area *pg_get_memstats_dsa_area(void);
/*
* Recommended default alloc parameters, suitable for "ordinary" contexts
@@ -319,4 +319,11 @@ pg_memory_is_all_zeros(const void *ptr, size_t len)
return true;
}
+extern void ProcessGetMemoryContextInterrupt(void);
+extern void HandleGetMemoryContextInterrupt(void);
+extern void MemoryContextKeysShmemInit(void);
+extern Size MemoryContextKeysShmemSize(void);
+extern void MemoryContextStatsCounter(MemoryContext context, MemoryContextCounters *totals,
+ int *num_contexts);
+extern void AtProcExit_memstats_cleanup(int code, Datum arg);
#endif /* MEMUTILS_H */
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 3b37fafa65b..21c65ad2d10 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -233,3 +233,22 @@ select * from pg_timezone_abbrevs where abbrev = 'LMT';
LMT | @ 7 hours 52 mins 58 secs ago | f
(1 row)
+DO $$
+DECLARE
+ bg_writer_pid int;
+ r RECORD;
+BEGIN
+ SELECT pid from pg_stat_activity where backend_type='background writer'
+ INTO bg_writer_pid;
+
+ select type, name, ident
+ from pg_get_process_memory_contexts(bg_writer_pid, false)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+ select type, name, ident
+ from pg_get_process_memory_contexts(pg_backend_pid(), false)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+END $$;
+NOTICE: (AllocSet,TopMemoryContext,)
+NOTICE: (AllocSet,TopMemoryContext,)
diff --git a/src/test/regress/sql/sysviews.sql b/src/test/regress/sql/sysviews.sql
index 66179f026b3..c9da4fc8c90 100644
--- a/src/test/regress/sql/sysviews.sql
+++ b/src/test/regress/sql/sysviews.sql
@@ -101,3 +101,21 @@ select count(distinct utc_offset) >= 24 as ok from pg_timezone_abbrevs;
-- One specific case we can check without much fear of breakage
-- is the historical local-mean-time value used for America/Los_Angeles.
select * from pg_timezone_abbrevs where abbrev = 'LMT';
+
+DO $$
+DECLARE
+ bg_writer_pid int;
+ r RECORD;
+BEGIN
+ SELECT pid from pg_stat_activity where backend_type='background writer'
+ INTO bg_writer_pid;
+
+ select type, name, ident
+ from pg_get_process_memory_contexts(bg_writer_pid, false)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+ select type, name, ident
+ from pg_get_process_memory_contexts(pg_backend_pid(), false)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+END $$;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 57a8f0366a5..9bd241af59d 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1684,9 +1684,11 @@ MemoryContextCallback
MemoryContextCallbackFunction
MemoryContextCounters
MemoryContextData
-MemoryContextId
MemoryContextMethodID
MemoryContextMethods
+MemoryStatsContextId
+MemoryStatsEntry
+MemoryStatsDSHashEntry
MemoryStatsPrintFunc
MergeAction
MergeActionState
--
2.34.1
v2-0002-Test-module-to-test-memory-context-reporting-with-in (1).patch (application/octet-stream)
From e21832c610574f48496c961a7318070b720568de Mon Sep 17 00:00:00 2001
From: Rahila Syed <rahilasyed.90@gmail.com>
Date: Thu, 23 Oct 2025 18:01:36 +0530
Subject: [PATCH 2/2] Test module to test memory context reporting with
injection points
---
src/test/modules/Makefile | 1 +
.../test_memcontext_reporting/Makefile | 32 +++++
.../t/001_memcontext_inj.pl | 58 +++++++++
.../test_memcontext_reporting--1.0.sql | 11 ++
.../test_memcontext_reporting.c | 123 ++++++++++++++++++
.../test_memcontext_reporting.control | 4 +
6 files changed, 229 insertions(+)
create mode 100644 src/test/modules/test_memcontext_reporting/Makefile
create mode 100644 src/test/modules/test_memcontext_reporting/t/001_memcontext_inj.pl
create mode 100644 src/test/modules/test_memcontext_reporting/test_memcontext_reporting--1.0.sql
create mode 100644 src/test/modules/test_memcontext_reporting/test_memcontext_reporting.c
create mode 100644 src/test/modules/test_memcontext_reporting/test_memcontext_reporting.control
diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile
index 902a7954101..a31a2578c18 100644
--- a/src/test/modules/Makefile
+++ b/src/test/modules/Makefile
@@ -31,6 +31,7 @@ SUBDIRS = \
test_json_parser \
test_lfind \
test_lwlock_tranches \
+ test_memcontext_reporting \
test_misc \
test_oat_hooks \
test_parser \
diff --git a/src/test/modules/test_memcontext_reporting/Makefile b/src/test/modules/test_memcontext_reporting/Makefile
new file mode 100644
index 00000000000..01a7baa0263
--- /dev/null
+++ b/src/test/modules/test_memcontext_reporting/Makefile
@@ -0,0 +1,32 @@
+# src/test/modules/test_memcontext_reporting/Makefile
+
+EXTRA_INSTALL = src/test/modules/injection_points
+
+export enable_injection_points
+MODULE_big = test_memcontext_reporting
+OBJS = \
+ $(WIN32RES) \
+ test_memcontext_reporting.o
+PGFILEDESC = "test_memcontext_reporting - test code for memory context reporting"
+
+EXTENSION = test_memcontext_reporting
+DATA = test_memcontext_reporting--1.0.sql
+
+REGRESS = test_memcontext_reporting
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/test_memcontext_reporting
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
+
+check:
+ $(prove_check)
+
+installcheck:
+ $(prove_installcheck)
diff --git a/src/test/modules/test_memcontext_reporting/t/001_memcontext_inj.pl b/src/test/modules/test_memcontext_reporting/t/001_memcontext_inj.pl
new file mode 100644
index 00000000000..69d8489eb37
--- /dev/null
+++ b/src/test/modules/test_memcontext_reporting/t/001_memcontext_inj.pl
@@ -0,0 +1,58 @@
+# Copyright (c) 2025, PostgreSQL Global Development Group
+
+# Test suite for testing memory context statistics reporting
+
+use strict;
+use warnings FATAL => 'all';
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+if ($ENV{enable_injection_points} ne 'yes')
+{
+ plan skip_all => 'Injection points not supported by this build';
+}
+my $psql_err;
+# Create and start a cluster with one node
+my $node = PostgreSQL::Test::Cluster->new('main');
+$node->init(allows_streaming => 1);
+# max_connections need to be bumped in order to accommodate for pgbench clients
+# and log_statement is dialled down since it otherwise will generate enormous
+# amounts of logging. Page verification failures are still logged.
+$node->append_conf(
+ 'postgresql.conf',
+ qq[
+max_connections = 100
+log_statement = none
+]);
+$node->start;
+$node->safe_psql('postgres', 'CREATE EXTENSION test_memcontext_reporting;');
+$node->safe_psql('postgres', 'CREATE EXTENSION injection_points;');
+# Attaching to a client process injection point that throws an error
+$node->safe_psql('postgres', "select injection_points_attach('memcontext-client-crash', 'error');");
+
+my $pid = $node->safe_psql('postgres', "SELECT pid from pg_stat_activity where backend_type='checkpointer'");
+print "PID";
+print $pid;
+
+#Client should have thrown error
+$node->psql('postgres', qq(select pg_get_process_memory_contexts($pid, true);), stderr => \$psql_err);
+like ( $psql_err, qr/error triggered for injection point memcontext-client-crash/);
+
+#Query the same process after detaching the injection point, using some other client and it should succeed.
+$node->safe_psql('postgres', "select injection_points_detach('memcontext-client-crash');");
+my $topcontext_name = $node->safe_psql('postgres', "select name from pg_get_process_memory_contexts($pid, true) where path = '{1}';");
+ok($topcontext_name eq 'TopMemoryContext');
+
+# Attaching to a target process injection point that throws an error
+$node->safe_psql('postgres', "select injection_points_attach('memcontext-server-crash', 'error');");
+
+#Server should have thrown error
+$node->psql('postgres', qq(select pg_get_process_memory_contexts($pid, true);), stderr => \$psql_err);
+
+#Query the same process after detaching the injection point, using some other client and it should succeed.
+$node->safe_psql('postgres', "select injection_points_detach('memcontext-server-crash');");
+$topcontext_name = $node->safe_psql('postgres', "select name from pg_get_process_memory_contexts($pid, true) where path = '{1}';");
+ok($topcontext_name eq 'TopMemoryContext');
+done_testing();
diff --git a/src/test/modules/test_memcontext_reporting/test_memcontext_reporting--1.0.sql b/src/test/modules/test_memcontext_reporting/test_memcontext_reporting--1.0.sql
new file mode 100644
index 00000000000..181daf429d0
--- /dev/null
+++ b/src/test/modules/test_memcontext_reporting/test_memcontext_reporting--1.0.sql
@@ -0,0 +1,11 @@
+CREATE FUNCTION memcontext_crash_server()
+RETURNS pg_catalog.void
+AS 'MODULE_PATHNAME' LANGUAGE C;
+
+CREATE FUNCTION memcontext_crash_client()
+RETURNS pg_catalog.void
+AS 'MODULE_PATHNAME' LANGUAGE C;
+
+CREATE FUNCTION dsa_dump_sql()
+RETURNS bigint
+AS 'MODULE_PATHNAME' LANGUAGE C;
diff --git a/src/test/modules/test_memcontext_reporting/test_memcontext_reporting.c b/src/test/modules/test_memcontext_reporting/test_memcontext_reporting.c
new file mode 100644
index 00000000000..955155524c2
--- /dev/null
+++ b/src/test/modules/test_memcontext_reporting/test_memcontext_reporting.c
@@ -0,0 +1,123 @@
+/*
+ * -------------------------------------------------------------------------
+ *
+ * Copyright (c) 2025, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/test/modules/test_memcontext_reporting/test_memcontext_reporting.c
+ *
+ * -------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+#include "utils/injection_point.h"
+#include "funcapi.h"
+#include "utils/injection_point.h"
+#include "storage/dsm_registry.h"
+
+PG_MODULE_MAGIC;
+
+extern PGDLLEXPORT void crash(const char *name, const void *private_data, void *arg);
+
+void
+crash(const char *name, const void *private_data, void *arg)
+{
+ abort();
+}
+
+/*
+ * memcontext_crash_client
+ *
+ * Ensure that the client process aborts in between memory context
+ * reporting.
+ */
+PG_FUNCTION_INFO_V1(memcontext_crash_client);
+Datum
+memcontext_crash_client(PG_FUNCTION_ARGS)
+{
+#ifdef USE_INJECTION_POINTS
+ InjectionPointAttach("memcontext-client-crash",
+ "test_memcontext_reporting", "crash", NULL, 0);
+
+#else
+ elog(ERROR,
+ "test is not working as intended when injection points are disabled");
+#endif
+ PG_RETURN_VOID();
+}
+
+PG_FUNCTION_INFO_V1(memcontext_detach_client);
+Datum
+memcontext_detach_client(PG_FUNCTION_ARGS)
+{
+#ifdef USE_INJECTION_POINTS
+ InjectionPointDetach("memcontext-client-crash");
+
+#else
+ elog(ERROR,
+ "test is not working as intended when injection points are disabled");
+#endif
+ PG_RETURN_VOID();
+}
+
+/*
+ * memcontext_crash_server
+ *
+ * Ensure that the server process crashes in between memory context
+ * reporting.
+ */
+PG_FUNCTION_INFO_V1(memcontext_crash_server);
+Datum
+memcontext_crash_server(PG_FUNCTION_ARGS)
+{
+#ifdef USE_INJECTION_POINTS
+ InjectionPointAttach("memcontext-server-crash",
+ "test_memcontext_reporting", "crash", NULL, 0);
+
+#else
+ elog(ERROR,
+ "test is not working as intended when injection points are disabled");
+#endif
+ PG_RETURN_VOID();
+}
+
+/*
+ * memcontext_detach_server
+ *
+ * Detach the injection point which crashes the server
+ * reporting.
+ */
+PG_FUNCTION_INFO_V1(memcontext_detach_server);
+Datum
+memcontext_detach_server(PG_FUNCTION_ARGS)
+{
+#ifdef USE_INJECTION_POINTS
+ InjectionPointDetach("memcontext-server-crash");
+
+#else
+ elog(ERROR,
+ "test is not working as intended when injection points are disabled");
+#endif
+ PG_RETURN_VOID();
+}
+
+/*
+ * dsa_dump_sql
+ */
+PG_FUNCTION_INFO_V1(dsa_dump_sql);
+Datum
+dsa_dump_sql(PG_FUNCTION_ARGS)
+{
+ bool found;
+ size_t tot_size;
+ dsa_area *memstats_dsa_area;
+
+ memstats_dsa_area = pg_get_memstats_dsa_area();
+
+ if (memstats_dsa_area == NULL)
+ memstats_dsa_area = GetNamedDSA("memory_context_statistics_dsa", &found);
+
+ tot_size = dsa_get_total_size(memstats_dsa_area);
+ dsa_detach(memstats_dsa_area);
+ PG_RETURN_INT64(tot_size);
+}
diff --git a/src/test/modules/test_memcontext_reporting/test_memcontext_reporting.control b/src/test/modules/test_memcontext_reporting/test_memcontext_reporting.control
new file mode 100644
index 00000000000..48b501682d5
--- /dev/null
+++ b/src/test/modules/test_memcontext_reporting/test_memcontext_reporting.control
@@ -0,0 +1,4 @@
+comment = 'Test code for memcontext reporting'
+default_version = '1.0'
+module_pathname = '$libdir/test_memcontext_reporting'
+relocatable = true
--
2.34.1
Hi,
I'm attaching the updated patches, which primarily include cleanup and have
been rebased following the CFbot report.
Thank you,
Rahila Syed
On Tue, Nov 25, 2025 at 12:50 PM Rahila Syed <rahilasyed90@gmail.com> wrote:
Hi Daniel,
Thank you for your comments. Please find attached v41 with all the
comments addressed.

> +#include "access/twophase.h"
> +#include "catalog/pg_authid_d.h"
> ...
> +#include "utils/acl.h"
>
> Are these actually required to be included?

Removed these.

> - MemoryContextId *entry;
> + MemoryStatsContextId *entry;
>
> Why is this needed? MemoryStatsContextId is identical to MemoryContextId and
> is too only used in mcxtfuncs.c so there is no need to expose it in
> memutils.h. Can't you just use MemoryContextId everywhere or am I missing
> something?

MemoryContextId has been renamed to MemoryStatsContextId for better
code readability. I removed the leftover MemoryContextId definition.
Also, I moved it out of memutils.h. Did the same with some other structures
and definitions which were only used in mcxtfuncs.c.

> +#define CLIENT_KEY_SIZE 64
> +
> +static LWLock *client_keys_lock = NULL;
> +static int *client_keys = NULL;
> +static dshash_table *MemoryStatsDsHash = NULL;
> +static dsa_area *MemoryStatsDsaArea = NULL;
>
> These new additions have in some cases too generic names (client_keys etc)
> and they all lack comments explaining why they're needed. Maybe a leading
> block comment explaining they are used for process memory context reporting,
> and then inline comments on each with their use?

Added comments.

> +#define CLIENT_KEY_SIZE 64
> ...
> + char key[CLIENT_KEY_SIZE];
> ...
> + snprintf(key, sizeof(key), "%d", MyProcNumber);
>
> Given that MyProcNumber is an index into the proc array, it seems excessive
> to use 64 bytes to store it, can't we get away with a small stack allocation?

I agree. Defined it as 32 bytes as MyProcNumber is of size uint32. Kindly
let me know if you think it can be reduced further.

> + * Retreive the client key for publishing statistics and reset it to -1,
>
> s/Retreive/Retrieve/

Fixed.
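
As a rough illustration of the CLIENT_KEY_SIZE question above, here is a minimal sketch of how small a buffer can be while still holding a decimal process number. It assumes a 32-bit int; KEY_SIZE and format_key are hypothetical names that do not appear in the patch, and the sketch does not check whether the dshash key parameters used by the patch permit a smaller key.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/*
 * Hypothetical size: 11 characters for "-2147483648" plus the terminating
 * NUL byte. Assumes int is 32 bits wide.
 */
#define KEY_SIZE 12

/*
 * Format a process number as a decimal string key. Returns the number of
 * characters written, excluding the terminator.
 */
static int
format_key(char *key, size_t keysize, int procno)
{
	int			len = snprintf(key, keysize, "%d", procno);

	/* snprintf reports the length it needed; make sure nothing was cut off */
	assert(len >= 0 && (size_t) len < keysize);
	return len;
}
```

Under that assumption, a 12-byte stack buffer holds any int value without truncation, so 32 bytes leaves ample headroom.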

> + ProcNumber procNumber = INVALID_PROC_NUMBER;
>
> This variable is never accessed before getting re-assigned, so this assignment
> in the variable definition can be removed per project style.

Fixed too.

> + InitMaterializedSRF(fcinfo, 0);
>
> Can this initialization be postponed till when we know the ResultSetInfo is
> needed? While a micro optimization, it seems we can avoid that overhead in
> case the query errors out?

Good point. Added this just before the result set is getting populated.

> + if (MemoryStatsDsHash == NULL)
> + MemoryStatsDsHash = GetNamedDSHash("memory_context_statistics_dshash", &memctx_dsh_params, &found);
>
> Nitpick, but there are a few oversize lines, like this one, which need to be
> wrapped to match project style.

I have edited this accordingly.

> + /*
> + * XXX. If the process exits without cleaning up its slot, i.e in case of
> + * an abrupt crash the client_keys slot won't be reset thus resulting in
> + * false negative and WARNING would be thrown in case another process with
> + * same slot index is queried for statistics.
> + */
> + if (client_keys[procNumber] == -1)
> + client_keys[procNumber] = MyProcNumber;
> + else
> + {
> + LWLockRelease(client_keys_lock);
> + ereport(WARNING,
> + errmsg("server process %d is processing previous request", pid));
> + PG_RETURN_NULL();
> + }
>
> AFAICT this mean that a failure to clean up (through a crash for example) can
> block a future backend from reporting which isn't entirely ideal. Is there
> anything we can do to mitigate this?

Yes, we can reset it when the client times out, as long as we verify that
the value corresponds to our ProcNumber and not another client's request.
Fixed accordingly.

> + bool summary = false;
>
> In ProcessGetMemoryContextInterrupt(), can't we just read entry->summary rather
> than define a local variable and assign it? We already read lots of other
> fields from entry directly so it seems more readable to be consistent.

Fixed.
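
The client_keys mitigation discussed above (reset the slot on timeout, but only if it still holds our own request) can be sketched roughly as follows. This is a single-process illustration, not the patch's code: NUM_SLOTS, SLOT_FREE, and the slot_* functions are hypothetical, and the LWLock that the patch holds around these accesses is left out.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_SLOTS 8				/* hypothetical; stands in for the proc array size */
#define SLOT_FREE (-1)			/* no request pending for this target */

/* One slot per target process, holding the requesting client's number. */
static int	client_slots[NUM_SLOTS];

static void
slots_init(void)
{
	for (int i = 0; i < NUM_SLOTS; i++)
		client_slots[i] = SLOT_FREE;
}

/* Claim the target's slot for client "me"; false if a request is pending. */
static bool
slot_claim(int target, int me)
{
	assert(target >= 0 && target < NUM_SLOTS);
	if (client_slots[target] != SLOT_FREE)
		return false;
	client_slots[target] = me;
	return true;
}

/*
 * Called when the client times out: release the slot only if it still
 * records our own request, so another client's request is never clobbered.
 */
static bool
slot_release_if_owner(int target, int me)
{
	assert(target >= 0 && target < NUM_SLOTS);
	if (client_slots[target] != me)
		return false;
	client_slots[target] = SLOT_FREE;
	return true;
}
```

The ownership check keeps a timed-out client from freeing a slot that a different client has since claimed.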

> + /*
> + * Add the count of children contexts which are traversed
> + */
> + *num_contexts = *num_contexts + ichild;
>
> Isn't this really the number of children + the parent context? ichild starts
> at one to (AIUI) include the parent context. Also, MemoryContextStatsCounter
> should also make sure to set num_contexts to zero before adding to it.

Yes. Adjusted the comment to match this and set num_contexts to zero.
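
To make the counting convention above concrete, here is a small sketch of a non-recursive traversal that counts a context together with all of its descendants. Node, traverse_next, and count_contexts are hypothetical stand-ins written for this illustration; they only mimic the shape of the loop in MemoryContextStatsCounter and are not PostgreSQL's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a memory context node. */
typedef struct Node
{
	struct Node *parent;
	struct Node *firstchild;
	struct Node *nextchild;
} Node;

/* Next node in pre-order below "top", or NULL once the subtree is exhausted. */
static Node *
traverse_next(Node *curr, Node *top)
{
	if (curr->firstchild != NULL)
		return curr->firstchild;

	/* No children: climb until a sibling exists or we are back at "top". */
	while (curr != top)
	{
		if (curr->nextchild != NULL)
			return curr->nextchild;
		curr = curr->parent;
	}
	return NULL;
}

/* Count "top" plus every context below it, without recursion. */
static void
count_contexts(Node *top, int *num_contexts)
{
	int			ichild = 1;		/* starts at one to include the parent */

	assert(top != NULL && num_contexts != NULL);
	*num_contexts = 0;			/* reset before adding, as requested in review */

	for (Node *curr = top->firstchild; curr != NULL;
		 curr = traverse_next(curr, top))
		ichild++;

	*num_contexts += ichild;
}
```

A context with no children therefore reports a count of one, which is why the adjusted comment says the parent is included.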

> +#define MAX_MEMORY_CONTEXT_STATS_SIZE (sizeof(MemoryStatsEntry))
> +#define MAX_MEMORY_CONTEXT_STATS_NUM MEMORY_CONTEXT_REPORT_MAX_PER_BACKEND / MAX_MEMORY_CONTEXT_STATS_SIZE
>
> I don't think MAX_MEMORY_CONTEXT_STATS_SIZE adds any value as it's only used
> once, on the line directly after its definition. We can just use the
> expansion of (sizeof(MemoryStatsEntry)) when defining
> MAX_MEMORY_CONTEXT_STATS_NUM.

Fixed.
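
A minimal sketch of the single-macro form suggested above, with the whole expansion parenthesized so it behaves as one value inside larger expressions. StatsEntry, REPORT_MAX_PER_BACKEND, MAX_STATS_NUM, and stats_bytes_needed are hypothetical names chosen for this illustration; the 1 MB budget follows the per-process limit mentioned in the patch description, and the struct layout is invented.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical entry layout; the patch's MemoryStatsEntry has other fields. */
typedef struct StatsEntry
{
	char		name[128];
	long		totalspace;
	long		freespace;
} StatsEntry;

/* Per-process budget for published statistics (assumed to be 1 MB). */
#define REPORT_MAX_PER_BACKEND ((size_t) (1 * 1024 * 1024))

/*
 * Number of entries that fit in the budget. The outer parentheses keep the
 * division intact when the macro is used inside a larger expression.
 */
#define MAX_STATS_NUM (REPORT_MAX_PER_BACKEND / sizeof(StatsEntry))

/* Bytes needed to publish "nentries" entries; must stay within the budget. */
static size_t
stats_bytes_needed(size_t nentries)
{
	assert(nentries <= MAX_STATS_NUM);
	return nentries * sizeof(StatsEntry);
}
```

Without the outer parentheses, an expression such as `x % MAX_STATS_NUM` would expand to `x % BUDGET / sizeof(...)` and group differently than intended.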

I've attached the test patch as is; I will clean it up and improve it
further.

Thank you,
Rahila Syed
Attachments:
v42-0002-Test-module-to-test-memory-context-reporting-with-in.patch (application/octet-stream)
From d11c41bf3055a3a8a4c109e49a87a49ffaeaaa88 Mon Sep 17 00:00:00 2001
From: Rahila Syed <rahilasyed.90@gmail.com>
Date: Fri, 28 Nov 2025 14:46:38 +0530
Subject: [PATCH 2/2] Test module to test memory context reporting with
injection points
---
src/test/modules/Makefile | 1 +
.../test_memcontext_reporting/Makefile | 32 ++++++
.../t/001_memcontext_inj.pl | 58 ++++++++++
.../test_memcontext_reporting--1.0.sql | 7 ++
.../test_memcontext_reporting.c | 102 ++++++++++++++++++
.../test_memcontext_reporting.control | 4 +
6 files changed, 204 insertions(+)
create mode 100644 src/test/modules/test_memcontext_reporting/Makefile
create mode 100644 src/test/modules/test_memcontext_reporting/t/001_memcontext_inj.pl
create mode 100644 src/test/modules/test_memcontext_reporting/test_memcontext_reporting--1.0.sql
create mode 100644 src/test/modules/test_memcontext_reporting/test_memcontext_reporting.c
create mode 100644 src/test/modules/test_memcontext_reporting/test_memcontext_reporting.control
diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile
index d079b91b1a2..1ed0cdc66b3 100644
--- a/src/test/modules/Makefile
+++ b/src/test/modules/Makefile
@@ -32,6 +32,7 @@ SUBDIRS = \
test_json_parser \
test_lfind \
test_lwlock_tranches \
+ test_memcontext_reporting \
test_misc \
test_oat_hooks \
test_parser \
diff --git a/src/test/modules/test_memcontext_reporting/Makefile b/src/test/modules/test_memcontext_reporting/Makefile
new file mode 100644
index 00000000000..01a7baa0263
--- /dev/null
+++ b/src/test/modules/test_memcontext_reporting/Makefile
@@ -0,0 +1,32 @@
+# src/test/modules/test_memcontext_reporting/Makefile
+
+EXTRA_INSTALL = src/test/modules/injection_points
+
+export enable_injection_points
+MODULE_big = test_memcontext_reporting
+OBJS = \
+ $(WIN32RES) \
+ test_memcontext_reporting.o
+PGFILEDESC = "test_memcontext_reporting - test code for memory context reporting"
+
+EXTENSION = test_memcontext_reporting
+DATA = test_memcontext_reporting--1.0.sql
+
+REGRESS = test_memcontext_reporting
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/test_memcontext_reporting
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
+
+check:
+ $(prove_check)
+
+installcheck:
+ $(prove_installcheck)
diff --git a/src/test/modules/test_memcontext_reporting/t/001_memcontext_inj.pl b/src/test/modules/test_memcontext_reporting/t/001_memcontext_inj.pl
new file mode 100644
index 00000000000..69d8489eb37
--- /dev/null
+++ b/src/test/modules/test_memcontext_reporting/t/001_memcontext_inj.pl
@@ -0,0 +1,58 @@
+# Copyright (c) 2025, PostgreSQL Global Development Group
+
+# Test suite for testing memory context statistics reporting
+
+use strict;
+use warnings FATAL => 'all';
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+if ($ENV{enable_injection_points} ne 'yes')
+{
+ plan skip_all => 'Injection points not supported by this build';
+}
+my $psql_err;
+# Create and start a cluster with one node
+my $node = PostgreSQL::Test::Cluster->new('main');
+$node->init(allows_streaming => 1);
+# max_connections need to be bumped in order to accommodate for pgbench clients
+# and log_statement is dialled down since it otherwise will generate enormous
+# amounts of logging. Page verification failures are still logged.
+$node->append_conf(
+ 'postgresql.conf',
+ qq[
+max_connections = 100
+log_statement = none
+]);
+$node->start;
+$node->safe_psql('postgres', 'CREATE EXTENSION test_memcontext_reporting;');
+$node->safe_psql('postgres', 'CREATE EXTENSION injection_points;');
+# Attaching to a client process injection point that throws an error
+$node->safe_psql('postgres', "select injection_points_attach('memcontext-client-crash', 'error');");
+
+my $pid = $node->safe_psql('postgres', "SELECT pid from pg_stat_activity where backend_type='checkpointer'");
+print "PID";
+print $pid;
+
+#Client should have thrown error
+$node->psql('postgres', qq(select pg_get_process_memory_contexts($pid, true);), stderr => \$psql_err);
+like ( $psql_err, qr/error triggered for injection point memcontext-client-crash/);
+
+#Query the same process after detaching the injection point, using some other client and it should succeed.
+$node->safe_psql('postgres', "select injection_points_detach('memcontext-client-crash');");
+my $topcontext_name = $node->safe_psql('postgres', "select name from pg_get_process_memory_contexts($pid, true) where path = '{1}';");
+ok($topcontext_name eq 'TopMemoryContext');
+
+# Attaching to a target process injection point that throws an error
+$node->safe_psql('postgres', "select injection_points_attach('memcontext-server-crash', 'error');");
+
+#Server should have thrown error
+$node->psql('postgres', qq(select pg_get_process_memory_contexts($pid, true);), stderr => \$psql_err);
+
+#Query the same process after detaching the injection point, using some other client and it should succeed.
+$node->safe_psql('postgres', "select injection_points_detach('memcontext-server-crash');");
+$topcontext_name = $node->safe_psql('postgres', "select name from pg_get_process_memory_contexts($pid, true) where path = '{1}';");
+ok($topcontext_name eq 'TopMemoryContext');
+done_testing();
diff --git a/src/test/modules/test_memcontext_reporting/test_memcontext_reporting--1.0.sql b/src/test/modules/test_memcontext_reporting/test_memcontext_reporting--1.0.sql
new file mode 100644
index 00000000000..4f787cf789e
--- /dev/null
+++ b/src/test/modules/test_memcontext_reporting/test_memcontext_reporting--1.0.sql
@@ -0,0 +1,7 @@
+CREATE FUNCTION memcontext_crash_server()
+RETURNS pg_catalog.void
+AS 'MODULE_PATHNAME' LANGUAGE C;
+
+CREATE FUNCTION memcontext_crash_client()
+RETURNS pg_catalog.void
+AS 'MODULE_PATHNAME' LANGUAGE C;
diff --git a/src/test/modules/test_memcontext_reporting/test_memcontext_reporting.c b/src/test/modules/test_memcontext_reporting/test_memcontext_reporting.c
new file mode 100644
index 00000000000..774ae7df49d
--- /dev/null
+++ b/src/test/modules/test_memcontext_reporting/test_memcontext_reporting.c
@@ -0,0 +1,102 @@
+/*
+ * -------------------------------------------------------------------------
+ *
+ * Copyright (c) 2025, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/test/modules/test_memcontext_reporting/test_memcontext_reporting.c
+ *
+ * -------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+#include "utils/injection_point.h"
+#include "funcapi.h"
+#include "utils/injection_point.h"
+#include "storage/dsm_registry.h"
+
+PG_MODULE_MAGIC;
+
+extern PGDLLEXPORT void crash(const char *name, const void *private_data, void *arg);
+
+void
+crash(const char *name, const void *private_data, void *arg)
+{
+ abort();
+}
+
+/*
+ * memcontext_crash_client
+ *
+ * Ensure that the client process aborts in between memory context
+ * reporting.
+ */
+PG_FUNCTION_INFO_V1(memcontext_crash_client);
+Datum
+memcontext_crash_client(PG_FUNCTION_ARGS)
+{
+#ifdef USE_INJECTION_POINTS
+ InjectionPointAttach("memcontext-client-crash",
+ "test_memcontext_reporting", "crash", NULL, 0);
+
+#else
+ elog(ERROR,
+ "test is not working as intended when injection points are disabled");
+#endif
+ PG_RETURN_VOID();
+}
+
+PG_FUNCTION_INFO_V1(memcontext_detach_client);
+Datum
+memcontext_detach_client(PG_FUNCTION_ARGS)
+{
+#ifdef USE_INJECTION_POINTS
+ InjectionPointDetach("memcontext-client-crash");
+
+#else
+ elog(ERROR,
+ "test is not working as intended when injection points are disabled");
+#endif
+ PG_RETURN_VOID();
+}
+
+/*
+ * memcontext_crash_server
+ *
+ * Ensure that the server process crashes in between memory context
+ * reporting.
+ */
+PG_FUNCTION_INFO_V1(memcontext_crash_server);
+Datum
+memcontext_crash_server(PG_FUNCTION_ARGS)
+{
+#ifdef USE_INJECTION_POINTS
+ InjectionPointAttach("memcontext-server-crash",
+ "test_memcontext_reporting", "crash", NULL, 0);
+
+#else
+ elog(ERROR,
+ "test is not working as intended when injection points are disabled");
+#endif
+ PG_RETURN_VOID();
+}
+
+/*
+ * memcontext_detach_server
+ *
+ * Detach the injection point which crashes the server
+ * reporting.
+ */
+PG_FUNCTION_INFO_V1(memcontext_detach_server);
+Datum
+memcontext_detach_server(PG_FUNCTION_ARGS)
+{
+#ifdef USE_INJECTION_POINTS
+ InjectionPointDetach("memcontext-server-crash");
+
+#else
+ elog(ERROR,
+ "test is not working as intended when injection points are disabled");
+#endif
+ PG_RETURN_VOID();
+}
diff --git a/src/test/modules/test_memcontext_reporting/test_memcontext_reporting.control b/src/test/modules/test_memcontext_reporting/test_memcontext_reporting.control
new file mode 100644
index 00000000000..48b501682d5
--- /dev/null
+++ b/src/test/modules/test_memcontext_reporting/test_memcontext_reporting.control
@@ -0,0 +1,4 @@
+comment = 'Test code for memcontext reporting'
+default_version = '1.0'
+module_pathname = '$libdir/test_memcontext_reporting'
+relocatable = true
--
2.34.1
Attachment: v42-0001-Add-function-to-report-memory-context-statistics.patch (application/octet-stream)
From a929a519d994a4ced1fd69ce64dd83d5bfc8ff19 Mon Sep 17 00:00:00 2001
From: Rahila Syed <rahilasyed.90@gmail.com>
Date: Thu, 27 Nov 2025 14:39:43 +0530
Subject: [PATCH] Add function to report memory context statistics
This adds a function for retrieving memory context statistics
and information from backends as well as auxiliary processes.
The intended use case is cluster debugging when under memory
pressure or unanticipated memory usage characteristics.

When calling the function it sends a signal to the specified
process to submit statistics regarding its memory contexts
into dynamic shared memory. Each memory context is returned
in detail, followed by a cumulative total in case the number
of contexts exceeds the max allocated amount of shared memory.
Each process is limited to use at most 1MB of memory for this.

A summary can also be explicitly requested by the user; this
will return the TopMemoryContext and a cumulative total of
all lower contexts.

In order to not block on busy processes, we have hardcoded
the number of seconds during which to retry before timing out.
In the case where no statistics are published within the set
timeout, NULL is returned.
---
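Note on the 1MB per-process limit: the number of contexts that can be
reported individually follows from dividing the limit by the size of one
statistics entry. The sketch below reproduces that arithmetic outside the
server. It is only an illustration: MemoryStatsEntrySketch mirrors the
field layout of MemoryStatsEntry in this patch, NodeTagStandIn is a
hypothetical int-sized stand-in for NodeTag, and the exact entry size
depends on platform padding, so no specific count is claimed here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for PostgreSQL's NodeTag enum (assumed int-sized). */
typedef int NodeTagStandIn;

/* Mirrors the field layout of MemoryStatsEntry from the patch. */
typedef struct MemoryStatsEntrySketch
{
	char		name[100];
	char		ident[100];
	int			path[100];
	NodeTagStandIn type;
	int			path_length;
	int			levels;
	int64_t		totalspace;
	int64_t		nblocks;
	int64_t		freespace;
	int64_t		freechunks;
	int			num_agg_stats;
} MemoryStatsEntrySketch;

/* Mirrors MEMORY_CONTEXT_REPORT_MAX_PER_BACKEND (1MB). */
#define REPORT_MAX_BYTES ((size_t) (1 * 1024 * 1024))

/*
 * Number of whole entries that fit in the per-process budget. The whole
 * expression is parenthesized by virtue of being a function body; a macro
 * doing the same division needs explicit outer parentheses.
 */
static size_t
max_stats_entries(void)
{
	return REPORT_MAX_BYTES / sizeof(MemoryStatsEntrySketch);
}

/* Bytes left unused at the end of the 1MB area after the last whole entry. */
static size_t
unused_tail_bytes(void)
{
	return REPORT_MAX_BYTES - max_stats_entries() * sizeof(MemoryStatsEntrySketch);
}
```

Calling max_stats_entries() on the build platform gives the point at
which further contexts would be folded into the cumulative total,
under the assumptions stated above.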
doc/src/sgml/func/func-admin.sgml | 157 +++
src/backend/catalog/system_views.sql | 5 +
src/backend/postmaster/autovacuum.c | 4 +
src/backend/postmaster/checkpointer.c | 4 +
src/backend/postmaster/interrupt.c | 4 +
src/backend/postmaster/pgarch.c | 4 +
src/backend/postmaster/startup.c | 4 +
src/backend/postmaster/walsummarizer.c | 4 +
src/backend/storage/ipc/ipci.c | 3 +
src/backend/storage/ipc/procsignal.c | 3 +
src/backend/storage/lmgr/proc.c | 1 +
src/backend/tcop/postgres.c | 3 +
.../utils/activity/wait_event_names.txt | 2 +
src/backend/utils/adt/mcxtfuncs.c | 1011 ++++++++++++++++-
src/backend/utils/init/globals.c | 1 +
src/backend/utils/init/postinit.c | 7 +
src/backend/utils/mmgr/mcxt.c | 31 +
src/include/catalog/pg_proc.dat | 10 +
src/include/miscadmin.h | 1 +
src/include/storage/lwlocklist.h | 1 +
src/include/storage/procsignal.h | 1 +
src/include/utils/memutils.h | 10 +-
src/test/regress/expected/sysviews.out | 19 +
src/test/regress/sql/sysviews.sql | 18 +
src/tools/pgindent/typedefs.list | 4 +-
25 files changed, 1282 insertions(+), 30 deletions(-)
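Note on the timeout: the wait loop in pg_get_process_memory_contexts
checks the elapsed time against MEMORY_STATS_MAX_TIMEOUT and then sleeps
on the condition variable with the full timeout again, so an early wakeup
can extend the total wait beyond the nominal limit. One possible way to
keep the total bounded is to sleep only for the remaining budget. The
helper below is a hypothetical sketch of that bookkeeping, not code from
the patch; remaining_wait_ms and STATS_TIMEOUT_MS are names invented for
this illustration, and timestamps are plain milliseconds rather than
TimestampTz.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors MEMORY_STATS_MAX_TIMEOUT (5 seconds), expressed in milliseconds. */
#define STATS_TIMEOUT_MS ((int64_t) 5 * 1000)

/*
 * Hypothetical helper: return how many milliseconds of the wait budget are
 * left, or 0 once the budget is spent. A caller would give up when this
 * returns 0 and otherwise pass the result as the sleep timeout.
 */
static int64_t
remaining_wait_ms(int64_t start_ms, int64_t now_ms)
{
	int64_t		elapsed = now_ms - start_ms;

	/* Treat a clock that moved backwards as no time elapsed. */
	if (elapsed < 0)
		elapsed = 0;

	if (elapsed >= STATS_TIMEOUT_MS)
		return 0;

	return STATS_TIMEOUT_MS - elapsed;
}
```

With this shape, a wakeup that arrives four seconds into the wait would
lead to a follow-up sleep of at most one second instead of five.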
diff --git a/doc/src/sgml/func/func-admin.sgml b/doc/src/sgml/func/func-admin.sgml
index 1b465bc8ba7..a5c66837241 100644
--- a/doc/src/sgml/func/func-admin.sgml
+++ b/doc/src/sgml/func/func-admin.sgml
@@ -251,6 +251,130 @@
<literal>false</literal> is returned.
</para></entry>
</row>
+
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>pg_get_process_memory_contexts</primary>
+ </indexterm>
+ <function>pg_get_process_memory_contexts</function> ( <parameter>pid</parameter> <type>integer</type>, <parameter>summary</parameter> <type>boolean</type> )
+ <returnvalue>setof record</returnvalue>
+ ( <parameter>name</parameter> <type>text</type>,
+ <parameter>ident</parameter> <type>text</type>,
+ <parameter>type</parameter> <type>text</type>,
+ <parameter>level</parameter> <type>integer</type>,
+ <parameter>path</parameter> <type>integer[]</type>,
+ <parameter>total_bytes</parameter> <type>bigint</type>,
+ <parameter>total_nblocks</parameter> <type>bigint</type>,
+ <parameter>free_bytes</parameter> <type>bigint</type>,
+ <parameter>free_chunks</parameter> <type>bigint</type>,
+ <parameter>used_bytes</parameter> <type>bigint</type>,
+ <parameter>num_agg_contexts</parameter> <type>integer</type> )
+ </para>
+ <para>
+ This function handles requests to display the memory contexts of a
+ <productname>PostgreSQL</productname> process with the specified
+ process ID. The function can be used to send requests to backends as
+ well as <glossterm linkend="glossary-auxiliary-proc">auxiliary processes</glossterm>.
+ </para>
+ <para>
+ The returned record contains extended statistics for each memory
+ context:
+ <itemizedlist spacing="compact">
+ <listitem>
+ <para>
+ <parameter>name</parameter> - The name of the memory context.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>ident</parameter> - Identification information of the memory context (if any).
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>type</parameter> - The type of memory context, possible
+ values are: AllocSet, Generation, Slab and Bump.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>level</parameter> - The level in the tree of the current
+ memory context.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>path</parameter> - Memory contexts are organized in a
+ tree model with TopMemoryContext as the root, and all other memory
+ contexts as nodes in the tree. The <parameter>path</parameter>
+ displays the path from the root to the current memory context. To
+ preserve memory during reporting, the path is limited to 100
+ children per node, with each node limited to a max depth of 100. The
+ printed path will also be limited to 100 nodes counting from the
+ TopMemoryContext.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>total_bytes</parameter> - The total number of bytes
+ allocated to this memory context.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>total_nblocks</parameter> - The total number of blocks
+ used for the allocated memory.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>free_bytes</parameter> - The amount of free memory in
+ this memory context.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>free_chunks</parameter> - The number of chunks that
+ <parameter>free_bytes</parameter> corresponds to.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>used_bytes</parameter> - The total number of bytes
+ currently occupied.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>num_agg_contexts</parameter> - The number of memory
+ contexts aggregated in the displayed statistics.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+ <para>
+ When <parameter>summary</parameter> is <literal>true</literal>, statistics
+ for memory contexts at levels 1 and 2 are displayed, with level 1
+ representing the root node (i.e., <literal>TopMemoryContext</literal>).
+ Statistics for contexts on level 2 and below are aggregates of all
+ child contexts' statistics, where <literal>num_agg_contexts</literal>
+ indicates the number of aggregated child contexts. When
+ <parameter>summary</parameter> is <literal>false</literal>,
+ the <literal>num_agg_contexts</literal> value is <literal>1</literal>,
+ indicating that individual statistics are being displayed.
+ </para>
+ <para>
+ After receiving memory context statistics from the target process, the
+ function returns the results as one row per context. If not all contexts
+ fit within the pre-determined size limit, the remaining context
+ statistics are aggregated and a cumulative total is displayed. The
+ <literal>num_agg_contexts</literal> column indicates the number of
+ contexts aggregated in the displayed statistics. When
+ <literal>num_agg_contexts</literal> is <literal>1</literal> it means
+ that the context statistics are displayed separately.
+ </para></entry>
+ </row>
</tbody>
</tgroup>
</table>
@@ -302,6 +426,39 @@ LOG: Grand total: 1651920 bytes in 201 blocks; 622360 free (88 chunks); 1029560
because it may generate a large number of log messages.
</para>
+ <para>
+ <function>pg_get_process_memory_contexts</function> can be used to request
+ memory context statistics of any <productname>PostgreSQL</productname>
+ process. For example:
+<programlisting>
+postgres=# SELECT * FROM pg_get_process_memory_contexts(
+ (SELECT pid FROM pg_stat_activity
+ WHERE backend_type = 'checkpointer'),
+ false) LIMIT 1;
+-[ RECORD 1 ]----+------------------------------
+name | TopMemoryContext
+ident |
+type | AllocSet
+level | 1
+path | {1}
+total_bytes | 90304
+total_nblocks | 3
+free_bytes | 2880
+free_chunks | 1
+used_bytes | 87424
+num_agg_contexts | 1
+</programlisting>
+ <note>
+ <para>
+ While <function>pg_get_process_memory_contexts</function> can be used to
+ query memory contexts of the local backend,
+ <structname>pg_backend_memory_contexts</structname>
+ (see <xref linkend="view-pg-backend-memory-contexts"/> for more details)
+ will be less resource intensive when only the local backend is of interest.
+ </para>
+ </note>
+ </para>
+
</sect2>
<sect2 id="functions-admin-backup">
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 6fffdb9398e..47c5422f4ad 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -692,6 +692,11 @@ GRANT SELECT ON pg_backend_memory_contexts TO pg_read_all_stats;
REVOKE EXECUTE ON FUNCTION pg_get_backend_memory_contexts() FROM PUBLIC;
GRANT EXECUTE ON FUNCTION pg_get_backend_memory_contexts() TO pg_read_all_stats;
+REVOKE EXECUTE ON FUNCTION
+ pg_get_process_memory_contexts(integer, boolean) FROM PUBLIC;
+GRANT EXECUTE ON FUNCTION
+ pg_get_process_memory_contexts(integer, boolean) TO pg_read_all_stats;
+
-- Statistics views
CREATE VIEW pg_stat_all_tables AS
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 1c38488f2cb..561d88ebb4d 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -791,6 +791,10 @@ ProcessAutoVacLauncherInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
/* Process sinval catchup interrupts that happened while sleeping */
ProcessCatchupInterrupt();
}
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index e84e8663e96..5b3e08805bf 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -679,6 +679,10 @@ ProcessCheckpointerInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/postmaster/interrupt.c b/src/backend/postmaster/interrupt.c
index ba63b84dfc5..29454b8bf8a 100644
--- a/src/backend/postmaster/interrupt.c
+++ b/src/backend/postmaster/interrupt.c
@@ -48,6 +48,10 @@ ProcessMainLoopInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index ce6b5299324..fdd385e492d 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -871,6 +871,10 @@ ProcessPgArchInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
if (ConfigReloadPending)
{
char *archiveLib = pstrdup(XLogArchiveLibrary);
diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c
index 27e86cf393f..7149a67fcbc 100644
--- a/src/backend/postmaster/startup.c
+++ b/src/backend/postmaster/startup.c
@@ -192,6 +192,10 @@ ProcessStartupProcInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
diff --git a/src/backend/postmaster/walsummarizer.c b/src/backend/postmaster/walsummarizer.c
index c4a888a081c..00f03b36ed8 100644
--- a/src/backend/postmaster/walsummarizer.c
+++ b/src/backend/postmaster/walsummarizer.c
@@ -879,6 +879,10 @@ ProcessWalSummarizerInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index b23d0c19360..a5ed58a18c5 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -52,6 +52,7 @@
#include "storage/sinvaladt.h"
#include "utils/guc.h"
#include "utils/injection_point.h"
+#include "utils/memutils.h"
/* GUCs */
int shared_memory_type = DEFAULT_SHARED_MEMORY_TYPE;
@@ -140,6 +141,7 @@ CalculateShmemSize(void)
size = add_size(size, SlotSyncShmemSize());
size = add_size(size, AioShmemSize());
size = add_size(size, WaitLSNShmemSize());
+ size = add_size(size, MemoryContextKeysShmemSize() + sizeof(LWLockPadded));
/* include additional requested shmem from preload libraries */
size = add_size(size, total_addin_request);
@@ -328,6 +330,7 @@ CreateOrAttachShmemStructs(void)
InjectionPointShmemInit();
AioShmemInit();
WaitLSNShmemInit();
+ MemoryContextKeysShmemInit();
}
/*
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index 087821311cc..8963285cc12 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -691,6 +691,9 @@ procsignal_sigusr1_handler(SIGNAL_ARGS)
if (CheckProcSignal(PROCSIG_LOG_MEMORY_CONTEXT))
HandleLogMemoryContextInterrupt();
+ if (CheckProcSignal(PROCSIG_GET_MEMORY_CONTEXT))
+ HandleGetMemoryContextInterrupt();
+
if (CheckProcSignal(PROCSIG_PARALLEL_APPLY_MESSAGE))
HandleParallelApplyMessageInterrupt();
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 1504fafe6d8..c5e69151756 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -51,6 +51,7 @@
#include "storage/procsignal.h"
#include "storage/spin.h"
#include "storage/standby.h"
+#include "utils/memutils.h"
#include "utils/timeout.h"
#include "utils/timestamp.h"
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 7dd75a490aa..e726f40dfbb 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3539,6 +3539,9 @@ ProcessInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
if (ParallelApplyMessagePending)
ProcessParallelApplyMessages();
}
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index c1ac71ff7f2..644d8d988e1 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -162,6 +162,7 @@ WAL_RECEIVER_EXIT "Waiting for the WAL receiver to exit."
WAL_RECEIVER_WAIT_START "Waiting for startup process to send initial data for streaming replication."
WAL_SUMMARY_READY "Waiting for a new WAL summary to be generated."
XACT_GROUP_UPDATE "Waiting for the group leader to update transaction status at transaction end."
+MEM_CXT_PUBLISH "Waiting for a process to publish memory information."
ABI_compatibility:
@@ -404,6 +405,7 @@ SubtransSLRU "Waiting to access the sub-transaction SLRU cache."
XactSLRU "Waiting to access the transaction status SLRU cache."
ParallelVacuumDSA "Waiting for parallel vacuum dynamic shared memory allocation."
AioUringCompletion "Waiting for another process to complete IO via io_uring."
+MemoryContextReportingKeys "Waiting for another process to complete reading or writing the memory reporting keys."
# No "ABI_compatibility" region here as WaitEventLWLock has its own C code.
diff --git a/src/backend/utils/adt/mcxtfuncs.c b/src/backend/utils/adt/mcxtfuncs.c
index fe6dce9cba3..c661eef7ae9 100644
--- a/src/backend/utils/adt/mcxtfuncs.c
+++ b/src/backend/utils/adt/mcxtfuncs.c
@@ -17,28 +17,138 @@
#include "funcapi.h"
#include "mb/pg_wchar.h"
+#include "miscadmin.h"
+#include "storage/dsm_registry.h"
#include "storage/proc.h"
#include "storage/procarray.h"
#include "utils/array.h"
#include "utils/builtins.h"
#include "utils/hsearch.h"
+#include "utils/injection_point.h"
+#include "utils/memutils.h"
+#include "utils/wait_event_types.h"
+
+/*
+ * Memory Context reporting size limits.
+ */
+
+/* Max length of context name and ident, to keep it consistent
+ * with ProcessLogMemoryContext()
+ */
+#define MEMORY_CONTEXT_IDENT_SHMEM_SIZE 100
+#define MEMORY_CONTEXT_NAME_SHMEM_SIZE 100
+
+/* Maximum size (in bytes) of DSA area per process */
+#define MEMORY_CONTEXT_REPORT_MAX_PER_BACKEND ((size_t) (1 * 1024 * 1024))
+
+/*
+ * Maximum number of memory context statistics is calculated by dividing
+ * max memory allocated per backend with maximum size per context statistics.
+ * The identifier and name are statically allocated arrays of size 100 bytes.
+ * The path depth is limited to 100 like for memory context logging.
+ */
+#define MAX_MEMORY_CONTEXT_STATS_NUM (MEMORY_CONTEXT_REPORT_MAX_PER_BACKEND / sizeof(MemoryStatsEntry))
+
+/* Size of dshash key */
+#define CLIENT_KEY_SIZE 32
+
+/* Dynamic shared memory state for reporting statistics per context */
+typedef struct MemoryStatsEntry
+{
+ char name[MEMORY_CONTEXT_NAME_SHMEM_SIZE];
+ char ident[MEMORY_CONTEXT_IDENT_SHMEM_SIZE];
+ int path[100];
+ NodeTag type;
+ int path_length;
+ int levels;
+ int64 totalspace;
+ int64 nblocks;
+ int64 freespace;
+ int64 freechunks;
+ int num_agg_stats;
+} MemoryStatsEntry;
+
+/*
+ * Per backend dynamic shared hash entry for memory context statistics
+ * reporting.
+ */
+typedef struct MemoryStatsDSHashEntry
+{
+ char key[64];
+ ConditionVariable memcxt_cv;
+ bool stats_written;
+ int target_server_id;
+ int total_stats;
+ bool summary;
+ dsa_pointer memstats_dsa_pointer;
+} MemoryStatsDSHashEntry;
+
+static const dshash_parameters memctx_dsh_params = {
+ offsetof(MemoryStatsDSHashEntry, memcxt_cv),
+ sizeof(MemoryStatsDSHashEntry),
+ dshash_strcmp,
+ dshash_strhash,
+ dshash_strcpy
+};
+
+/*
+ * These are used for reporting memory context
+ * statistics of a process.
+ */
+
+/* Lock to control access to client_keys array */
+static LWLock *client_keys_lock = NULL;
+
+/* Array to store the keys of MemoryStatsDsHash */
+static int *client_keys = NULL;
+
+/*
+ * Table to store pointers to dsa memory containing
+ * memory statistics and other meta data. There is one
+ * entry per client backend request, keyed by ProcNumber of
+ * the client obtained from client_keys array above.
+ */
+static dshash_table *MemoryStatsDsHash = NULL;
+
+/*
+ * Dsa area which stores the actual memory context
+ * statistics.
+ */
+static dsa_area *MemoryStatsDsaArea = NULL;
+
+static void memstats_dsa_cleanup(char *key);
+static void memstats_client_key_reset(int procNumber);
+static const char *ContextTypeToString(NodeTag type);
+static void PublishMemoryContext(MemoryStatsEntry *memcxt_info,
+ int curr_id, MemoryContext context,
+ List *path,
+ MemoryContextCounters stat,
+ int num_contexts);
+static List *compute_context_path(MemoryContext c, HTAB *context_id_lookup);
+static void end_memorycontext_reporting(MemoryStatsDSHashEntry *entry, MemoryContext oldcontext,
+ HTAB *context_id_lookup);
/* ----------
* The max bytes for showing identifiers of MemoryContext.
+ * This is used by pg_get_backend_memory_contexts, which serves the local backend.
* ----------
*/
#define MEMORY_CONTEXT_IDENT_DISPLAY_SIZE 1024
+#define MAX_PATH_DISPLAY_LENGTH 100
+/* Timeout in seconds */
+#define MEMORY_STATS_MAX_TIMEOUT 5
+
/*
- * MemoryContextId
+ * MemoryStatsContextId
* Used for storage of transient identifiers for
- * pg_get_backend_memory_contexts.
+ * pg_get_backend_memory_contexts and the like.
*/
-typedef struct MemoryContextId
+typedef struct MemoryStatsContextId
{
MemoryContext context;
int context_id;
-} MemoryContextId;
+} MemoryStatsContextId;
/*
* int_list_to_array
@@ -89,7 +199,7 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
*/
for (MemoryContext cur = context; cur != NULL; cur = cur->parent)
{
- MemoryContextId *entry;
+ MemoryStatsContextId *entry;
bool found;
entry = hash_search(context_id_lookup, &cur, HASH_FIND, &found);
@@ -143,24 +253,7 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
else
nulls[1] = true;
- switch (context->type)
- {
- case T_AllocSetContext:
- type = "AllocSet";
- break;
- case T_GenerationContext:
- type = "Generation";
- break;
- case T_SlabContext:
- type = "Slab";
- break;
- case T_BumpContext:
- type = "Bump";
- break;
- default:
- type = "???";
- break;
- }
+ type = ContextTypeToString(context->type);
values[2] = CStringGetTextDatum(type);
values[3] = Int32GetDatum(list_length(path)); /* level */
@@ -175,6 +268,38 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
list_free(path);
}
+/*
+ * ContextTypeToString
+ * Returns a textual representation of a context type
+ *
+ * This should cover the same types as MemoryContextIsValid.
+ */
+const char *
+ContextTypeToString(NodeTag type)
+{
+ const char *context_type;
+
+ switch (type)
+ {
+ case T_AllocSetContext:
+ context_type = "AllocSet";
+ break;
+ case T_GenerationContext:
+ context_type = "Generation";
+ break;
+ case T_SlabContext:
+ context_type = "Slab";
+ break;
+ case T_BumpContext:
+ context_type = "Bump";
+ break;
+ default:
+ context_type = "???";
+ break;
+ }
+ return context_type;
+}
+
/*
* pg_get_backend_memory_contexts
* SQL SRF showing backend memory context.
@@ -189,7 +314,7 @@ pg_get_backend_memory_contexts(PG_FUNCTION_ARGS)
HTAB *context_id_lookup;
ctl.keysize = sizeof(MemoryContext);
- ctl.entrysize = sizeof(MemoryContextId);
+ ctl.entrysize = sizeof(MemoryStatsContextId);
ctl.hcxt = CurrentMemoryContext;
context_id_lookup = hash_create("pg_get_backend_memory_contexts",
@@ -216,7 +341,7 @@ pg_get_backend_memory_contexts(PG_FUNCTION_ARGS)
foreach_ptr(MemoryContextData, cur, contexts)
{
- MemoryContextId *entry;
+ MemoryStatsContextId *entry;
bool found;
/*
@@ -224,8 +349,8 @@ pg_get_backend_memory_contexts(PG_FUNCTION_ARGS)
* PutMemoryContextsStatsTupleStore needs this to populate the "path"
* column with the parent context_ids.
*/
- entry = (MemoryContextId *) hash_search(context_id_lookup, &cur,
- HASH_ENTER, &found);
+ entry = (MemoryStatsContextId *) hash_search(context_id_lookup, &cur,
+ HASH_ENTER, &found);
entry->context_id = context_id++;
Assert(!found);
@@ -305,3 +430,835 @@ pg_log_backend_memory_contexts(PG_FUNCTION_ARGS)
PG_RETURN_BOOL(true);
}
+
+/*
+ * pg_get_process_memory_contexts
+ * Signal a backend or an auxiliary process to send its memory contexts,
+ * wait for the results and display them.
+ *
+ * By default, only superusers or users with ROLE_PG_READ_ALL_STATS are allowed
+ * to signal a process to return the memory contexts. This is because allowing
+ * any user to issue this request at an unbounded rate would cause lots of
+ * requests to be sent, which can lead to denial of service. Additional roles
+ * can be permitted with GRANT.
+ *
+ * On receipt of this signal, a backend or an auxiliary process sets the flag
+ * in the signal handler, which causes the next CHECK_FOR_INTERRUPTS()
+ * or process-specific interrupt handler to copy the memory context details
+ * to a dynamic shared memory space.
+ *
+ * We have defined a limit on DSA memory that could be allocated per process -
+ * if the process has more memory contexts than what can fit in the allocated
+ * size, the excess contexts are summarized and represented as cumulative total
+ * at the end of the buffer.
+ *
+ * After sending the signal, wait on a condition variable. The publishing
+ * backend, after copying the data to shared memory, signals that
+ * condition variable. There is one condition variable per client process.
+ * Once the condition variable is signalled, check if the latest memory context
+ * information is available and display it.
+ *
+ * If the publishing backend does not respond before the condition variable
+ * times out, which is set to a predefined value MEMORY_STATS_MAX_TIMEOUT, give up
+ * and return NULL.
+ */
+Datum
+pg_get_process_memory_contexts(PG_FUNCTION_ARGS)
+{
+ int pid = PG_GETARG_INT32(0);
+ bool summary = PG_GETARG_BOOL(1);
+ PGPROC *proc;
+ ProcNumber procNumber;
+ bool proc_is_aux = false;
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ MemoryStatsEntry *memcxt_info;
+ MemoryStatsDSHashEntry *entry;
+ bool found;
+ char key[CLIENT_KEY_SIZE];
+ TimestampTz start_timestamp;
+
+ /*
+ * See if the process with given pid is a backend or an auxiliary process
+ * and remember the type for when we requery the process later.
+ */
+ proc = BackendPidGetProc(pid);
+ if (proc == NULL)
+ {
+ proc = AuxiliaryPidGetProc(pid);
+ proc_is_aux = true;
+ }
+
+ /*
+ * BackendPidGetProc() and AuxiliaryPidGetProc() return NULL if the pid
+ * isn't valid; this is not a problem, so emit a WARNING and return NULL.
+ * See comment in pg_log_backend_memory_contexts for a discussion on this.
+ */
+ if (proc == NULL)
+ {
+ /*
+ * This is a warning because we don't want to break loops.
+ */
+ ereport(WARNING,
+ errmsg("PID %d is not a PostgreSQL server process",
+ pid));
+ PG_RETURN_NULL();
+ }
+
+ procNumber = GetNumberFromPGProc(proc);
+
+ /*
+ * Create a DSA to allocate memory for copying memory contexts statistics.
+ * Allocate the memory in the DSA and send dsa pointer to the server
+ * process for storing the context statistics. If the number of contexts
+ * exceeds what fits in the predefined limit (1MB), a cumulative total is
+ * stored for the remaining contexts.
+ *
+ * The DSA is created once for the lifetime of the server, and only
+ * attached in subsequent calls.
+ */
+ if (MemoryStatsDsaArea == NULL)
+ MemoryStatsDsaArea = GetNamedDSA("memory_context_statistics_dsa",
+ &found);
+
+ /*
+ * The dsa pointers containing statistics for each client are stored in a
+ * dshash table. In addition to dsa pointer, each entry in this table also
+ * contains information about the statistics, condition variable for
+ * signalling between client and the server and miscellaneous data
+ * specific to a request. There is one entry per client request in the
+ * hash table.
+ */
+ if (MemoryStatsDsHash == NULL)
+ MemoryStatsDsHash = GetNamedDSHash("memory_context_statistics_dshash",
+ &memctx_dsh_params, &found);
+
+ snprintf(key, sizeof(key), "%d", MyProcNumber);
+
+ /*
+ * Check if the publishing process slot is empty and store this client's
+ * key, i.e. its ProcNumber. This informs the publishing process that it is
+ * supposed to write statistics in the hash entry corresponding to this
+ * client.
+ */
+ LWLockAcquire(client_keys_lock, LW_EXCLUSIVE);
+
+ if (client_keys[procNumber] == -1)
+ client_keys[procNumber] = MyProcNumber;
+ else
+ {
+ LWLockRelease(client_keys_lock);
+ ereport(WARNING,
+ errmsg("server process %d is processing previous request",
+ pid));
+ PG_RETURN_NULL();
+ }
+ LWLockRelease(client_keys_lock);
+
+ /*
+ * Insert an entry for this client in the dshash table the first time this
+ * function is called. This entry is deleted when the process exits, in a
+ * before_shmem_exit callback.
+ *
+ * dshash_find_or_insert locks the entry to prevent the publisher from
+ * reading before client has updated the entry.
+ */
+ entry = dshash_find_or_insert(MemoryStatsDsHash, key, &found);
+ if (!found)
+ {
+ entry->stats_written = false;
+ ConditionVariableInit(&entry->memcxt_cv);
+ }
+
+ /*
+ * Allocate 1MB of memory for the backend to publish its statistics on
+ * every call to this function. The memory is freed at the end of the
+ * function.
+ */
+ entry->memstats_dsa_pointer =
+ dsa_allocate0(MemoryStatsDsaArea, MEMORY_CONTEXT_REPORT_MAX_PER_BACKEND);
+
+ /*
+ * Specify whether a summary of statistics is requested, before signalling
+ * the server.
+ */
+ entry->summary = summary;
+
+ /*
+ * Indicate which server process statistics are being requested from. If
+ * this client times out before the last requested process can publish its
+ * statistics, it may send a new request to another server process. Since
+ * the previous server was notified, it might attempt to read the same
+ * client entry and respond incorrectly with its statistics. By storing
+ * the server ID in the client entry, we prevent any previously signalled
+ * server process from writing its statistics in the space meant for the
+ * newly requested process.
+ */
+ entry->target_server_id = pid;
+ dshash_release_lock(MemoryStatsDsHash, entry);
+
+ /*
+ * Send a signal to a PostgreSQL process, informing it we want it to
+ * produce information about its memory contexts.
+ */
+ if (SendProcSignal(pid, PROCSIG_GET_MEMORY_CONTEXT, procNumber) < 0)
+ {
+ memstats_dsa_cleanup(key);
+ memstats_client_key_reset(procNumber);
+ ereport(WARNING,
+ errmsg("could not send signal to process %d: %m",
+ pid));
+ PG_RETURN_NULL();
+ }
+ start_timestamp = GetCurrentTimestamp();
+
+ while (1)
+ {
+ long elapsed_time;
+
+ entry = dshash_find_or_insert(MemoryStatsDsHash, key, &found);
+ Assert(found);
+
+ INJECTION_POINT("memcontext-client-crash", NULL);
+
+ memcxt_info = (MemoryStatsEntry *)
+ dsa_get_address(MemoryStatsDsaArea, entry->memstats_dsa_pointer);
+
+ /*
+ * We expect to come out of sleep when the requested process has
+ * finished publishing the statistics, verified using a boolean
+ * stats_written.
+ *
+ * Make sure that the statistics are actually written by checking that
+ * the name of the first context is not empty. This is done to ensure that
+ * the subsequent waits for statistics do not return spuriously if the
+ * previous call to the function ended in error and thus could not
+ * clear the stats_written flag.
+ */
+ if (entry->stats_written && memcxt_info[0].name[0] != '\0')
+ break;
+
+ dshash_release_lock(MemoryStatsDsHash, entry);
+
+ elapsed_time = TimestampDifferenceMilliseconds(start_timestamp,
+ GetCurrentTimestamp());
+ /* Return if we have already exceeded the timeout */
+ if (elapsed_time >= MEMORY_STATS_MAX_TIMEOUT * 1000)
+ {
+ memstats_dsa_cleanup(key);
+ memstats_client_key_reset(procNumber);
+ ConditionVariableCancelSleep();
+ PG_RETURN_NULL();
+ }
+
+ /*
+ * Recheck the state of the backend before sleeping on the condition
+ * variable to ensure the process is still alive. Only check the
+ * relevant process type based on the earlier PID check.
+ */
+ if (proc_is_aux)
+ proc = AuxiliaryPidGetProc(pid);
+ else
+ proc = BackendPidGetProc(pid);
+
+ /*
+ * The target server process ending during memory context processing
+ * is not an error.
+ */
+ if (proc == NULL)
+ {
+ memstats_dsa_cleanup(key);
+ memstats_client_key_reset(procNumber);
+ ConditionVariableCancelSleep();
+ ereport(WARNING,
+ errmsg("PID %d is no longer a PostgreSQL server process",
+ pid));
+ PG_RETURN_NULL();
+ }
+
+ /*
+ * Wait for MEMORY_STATS_MAX_TIMEOUT. If no statistics are available
+ * within the allowed time then return NULL. The timer is defined in
+ * milliseconds since that's what the condition variable sleep uses.
+ */
+ if (ConditionVariableTimedSleep(&entry->memcxt_cv,
+ (MEMORY_STATS_MAX_TIMEOUT * 1000),
+ WAIT_EVENT_MEM_CXT_PUBLISH))
+ {
+ /* Timeout has expired, return NULL */
+ memstats_dsa_cleanup(key);
+ memstats_client_key_reset(procNumber);
+ ConditionVariableCancelSleep();
+ PG_RETURN_NULL();
+ }
+ }
+
+ InitMaterializedSRF(fcinfo, 0);
+
+ /*
+ * Backend has finished publishing the stats, project them.
+ */
+#define PG_GET_PROCESS_MEMORY_CONTEXTS_COLS 11
+ for (int i = 0; i < entry->total_stats; i++)
+ {
+ ArrayType *path_array;
+ int path_length;
+ Datum values[PG_GET_PROCESS_MEMORY_CONTEXTS_COLS];
+ bool nulls[PG_GET_PROCESS_MEMORY_CONTEXTS_COLS];
+ Datum *path_datum = NULL;
+
+ memset(values, 0, sizeof(values));
+ memset(nulls, 0, sizeof(nulls));
+
+ Assert(memcxt_info[i].name[0] != '\0');
+ values[0] = CStringGetTextDatum(memcxt_info[i].name);
+
+ if (memcxt_info[i].ident[0] != '\0')
+ values[1] = CStringGetTextDatum(memcxt_info[i].ident);
+ else
+ nulls[1] = true;
+
+ values[2] = CStringGetTextDatum(ContextTypeToString(memcxt_info[i].type));
+ values[3] = Int32GetDatum(memcxt_info[i].levels);
+
+ path_length = memcxt_info[i].path_length;
+ path_datum = (Datum *) palloc(path_length * sizeof(Datum));
+ if (memcxt_info[i].path[0] != 0)
+ {
+ for (int j = 0; j < path_length; j++)
+ path_datum[j] = Int32GetDatum(memcxt_info[i].path[j]);
+ path_array = construct_array_builtin(path_datum,
+ path_length,
+ INT4OID);
+ values[4] = PointerGetDatum(path_array);
+ }
+ else
+ nulls[4] = true;
+
+ values[5] = Int64GetDatum(memcxt_info[i].totalspace);
+ values[6] = Int64GetDatum(memcxt_info[i].nblocks);
+ values[7] = Int64GetDatum(memcxt_info[i].freespace);
+ values[8] = Int64GetDatum(memcxt_info[i].freechunks);
+ values[9] = Int64GetDatum(memcxt_info[i].totalspace -
+ memcxt_info[i].freespace);
+ values[10] = Int32GetDatum(memcxt_info[i].num_agg_stats);
+
+ tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc,
+ values, nulls);
+ }
+ dshash_release_lock(MemoryStatsDsHash, entry);
+ memstats_dsa_cleanup(key);
+
+ ConditionVariableCancelSleep();
+
+ PG_RETURN_NULL();
+}
+
+static void
+memstats_dsa_cleanup(char *key)
+{
+ MemoryStatsDSHashEntry *entry;
+
+ entry = dshash_find(MemoryStatsDsHash, key, true);
+
+ Assert(MemoryStatsDsaArea != NULL);
+ dsa_free(MemoryStatsDsaArea, entry->memstats_dsa_pointer);
+ entry->memstats_dsa_pointer = InvalidDsaPointer;
+ entry->stats_written = false;
+ entry->target_server_id = 0;
+
+ dshash_release_lock(MemoryStatsDsHash, entry);
+}
+
+/*
+ * Remove this process from the publishing process'
+ * client key slot, if the stats publishing process has failed to do so.
+ */
+static void
+memstats_client_key_reset(int procNumber)
+{
+ LWLockAcquire(client_keys_lock, LW_EXCLUSIVE);
+
+ if (client_keys[procNumber] == MyProcNumber)
+ client_keys[procNumber] = -1;
+ LWLockRelease(client_keys_lock);
+}
+
+void
+MemoryContextKeysShmemInit(void)
+{
+ bool found;
+
+ client_keys = (int *)
+ ShmemInitStruct("MemoryContextKeys",
+ MemoryContextKeysShmemSize() + sizeof(LWLockPadded), &found);
+ client_keys_lock = (LWLock *) ((char *) client_keys + MemoryContextKeysShmemSize());
+
+ if (!found)
+ {
+ MemSet(client_keys, -1, MemoryContextKeysShmemSize());
+ LWLockInitialize(client_keys_lock, LWTRANCHE_MEMORY_CONTEXT_KEYS);
+ }
+}
+
+Size
+MemoryContextKeysShmemSize(void)
+{
+ Size sz = 0;
+ Size TotalProcs = 0;
+
+ TotalProcs = add_size(TotalProcs, NUM_AUXILIARY_PROCS);
+ TotalProcs = add_size(TotalProcs, MaxBackends);
+ sz = add_size(sz, mul_size(TotalProcs, sizeof(int)));
+
+ return sz;
+}
+
+/*
+ * HandleGetMemoryContextInterrupt
+ * Handle receipt of an interrupt indicating a request to publish memory
+ * contexts statistics.
+ *
+ * All the actual work is deferred to ProcessGetMemoryContextInterrupt() as
+ * this cannot be performed in a signal handler.
+ */
+void
+HandleGetMemoryContextInterrupt(void)
+{
+ InterruptPending = true;
+ PublishMemoryContextPending = true;
+ /* latch will be set by procsignal_sigusr1_handler */
+}
+
+/*
+ * ProcessGetMemoryContextInterrupt
+ * Generate information about memory contexts used by the process.
+ *
+ * Performs a breadth first search on the memory context tree, thus parents
+ * statistics are reported before their children in the monitoring function
+ * output.
+ *
+ * Statistics for all the processes are shared via the same dynamic shared
+ * area. Individual statistics are tracked independently in
+ * per-process DSA pointers. These pointers are stored in a dshash table with
+ * key as requesting clients ProcNumber.
+ *
+ * We calculate maximum number of context's statistics that can be displayed
+ * using a pre-determined limit for memory available per process for this
+ * utility and maximum size of statistics for each context. The remaining
+ * context statistics if any are captured as a cumulative total at the end of
+ * individual context's statistics.
+ *
+ * If summary is true, we capture the level 1 and level 2 contexts
+ * statistics. For that we traverse the memory context tree recursively in
+ * depth first search manner to cover all the children of a parent context, to
+ * be able to display a cumulative total of memory consumption by a parent at
+ * level 2 and all its children.
+ */
+void
+ProcessGetMemoryContextInterrupt(void)
+{
+ List *contexts;
+ HASHCTL ctl;
+ HTAB *context_id_lookup;
+ int context_id = 0;
+ MemoryStatsEntry *meminfo;
+ MemoryContextCounters stat;
+ int num_individual_stats = 0;
+ bool found;
+ MemoryStatsDSHashEntry *entry;
+ char key[CLIENT_KEY_SIZE];
+ int clientProcNumber;
+ MemoryContext memstats_ctx = NULL;
+ MemoryContext oldcontext = NULL;
+
+ PublishMemoryContextPending = false;
+
+ /*
+ * Retrieve the client key for publishing statistics and reset it to -1,
+ * so other clients can request memory statistics from this process
+ */
+ LWLockAcquire(client_keys_lock, LW_SHARED);
+ Assert(client_keys[MyProcNumber] != -1);
+ clientProcNumber = client_keys[MyProcNumber];
+ client_keys[MyProcNumber] = -1;
+ LWLockRelease(client_keys_lock);
+
+ /*
+ * Create a new memory context which is not a part of TopMemoryContext
+ * tree. This context is used to allocate all memory in this function.
+ * This helps in keeping the memory allocation in this function to report
+ * memory consumption statistics separate. So that it does not affect the
+ * output of this function.
+ */
+ memstats_ctx = AllocSetContextCreate((MemoryContext) NULL, "publish_memory_context_statistics",
+ ALLOCSET_SMALL_SIZES);
+ oldcontext = MemoryContextSwitchTo(memstats_ctx);
+
+ /*
+ * The hash table is used for constructing "path" column of the view,
+ * similar to its local backend counterpart.
+ */
+ ctl.keysize = sizeof(MemoryContext);
+ ctl.entrysize = sizeof(MemoryStatsContextId);
+ ctl.hcxt = CurrentMemoryContext;
+
+ context_id_lookup = hash_create("pg_get_remote_backend_memory_contexts",
+ 256,
+ &ctl,
+ HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+
+ /* List of contexts to process in the next round - start at the top. */
+ contexts = list_make1(TopMemoryContext);
+
+ /*
+ * If DSA exists, created by another process requesting statistics, attach
+ * to it. We expect the client process to create required DSA and Dshash
+ * table.
+ */
+ if (MemoryStatsDsaArea == NULL)
+ MemoryStatsDsaArea = GetNamedDSA("memory_context_statistics_dsa",
+ &found);
+
+ if (MemoryStatsDsHash == NULL)
+ MemoryStatsDsHash = GetNamedDSHash("memory_context_statistics_dshash",
+ &memctx_dsh_params, &found);
+
+
+ snprintf(key, CLIENT_KEY_SIZE, "%d", clientProcNumber);
+
+ /*
+ * The entry lock is held by dshash_find_or_insert to protect writes to
+ * process specific memory. Two different processes publishing statistics
+ * do not block each other.
+ */
+ INJECTION_POINT("memcontext-server-crash", NULL);
+ entry = dshash_find_or_insert(MemoryStatsDsHash, key, &found);
+
+ /*
+ * Entry has been deleted due to client process exit. Make sure that the
+ * client always deletes the entry after taking required lock or this
+ * function may end up writing to unallocated memory.
+ */
+ if (!found)
+ {
+ entry->stats_written = false;
+ end_memorycontext_reporting(entry, oldcontext, context_id_lookup);
+ return;
+ }
+
+ /*
+ * The client has timed out waiting for us to write statistics and is
+ * requesting statistics from some other process
+ */
+ if (entry->target_server_id != MyProcPid)
+ {
+ entry->stats_written = false;
+ end_memorycontext_reporting(entry, oldcontext, context_id_lookup);
+ return;
+ }
+
+ /* Should be allocated by a client backend that is requesting statistics */
+ Assert(entry->memstats_dsa_pointer != InvalidDsaPointer);
+ meminfo = (MemoryStatsEntry *)
+ dsa_get_address(MemoryStatsDsaArea, entry->memstats_dsa_pointer);
+
+ if (entry->summary)
+ {
+ int cxt_id = 0;
+ List *path = NIL;
+ MemoryStatsContextId *contextid_entry;
+
+ /* Copy TopMemoryContext statistics to DSA */
+ memset(&stat, 0, sizeof(stat));
+ (*TopMemoryContext->methods->stats) (TopMemoryContext, NULL, NULL,
+ &stat, true);
+ path = lcons_int(1, path);
+ PublishMemoryContext(meminfo, cxt_id, TopMemoryContext, path, stat,
+ 1);
+
+ contextid_entry = (MemoryStatsContextId *) hash_search(context_id_lookup,
+ &TopMemoryContext,
+ HASH_ENTER, &found);
+ Assert(!found);
+
+ /*
+ * context id starts with 1
+ */
+ contextid_entry->context_id = cxt_id + 1;
+
+ /*
+ * Copy statistics for each of TopMemoryContexts children. This
+ * includes statistics of at most 100 children per node, with each
+ * child node limited to a depth of 100 in its subtree.
+ */
+ for (MemoryContext c = TopMemoryContext->firstchild; c != NULL;
+ c = c->nextchild)
+ {
+ MemoryContextCounters grand_totals;
+ int num_contexts = 0;
+
+ path = NIL;
+ memset(&grand_totals, 0, sizeof(grand_totals));
+
+ cxt_id++;
+ contextid_entry = (MemoryStatsContextId *) hash_search(context_id_lookup,
+ &c, HASH_ENTER, &found);
+ Assert(!found);
+ contextid_entry->context_id = cxt_id + 1;
+
+ MemoryContextStatsCounter(c, &grand_totals, &num_contexts);
+
+ path = compute_context_path(c, context_id_lookup);
+
+ PublishMemoryContext(meminfo, cxt_id, c, path,
+ grand_totals, num_contexts);
+ }
+ entry->total_stats = cxt_id + 1;
+
+ entry->stats_written = true;
+ end_memorycontext_reporting(entry, oldcontext, context_id_lookup);
+ /* Notify waiting client backend and return */
+ ConditionVariableSignal(&entry->memcxt_cv);
+ return;
+ }
+ foreach_ptr(MemoryContextData, cur, contexts)
+ {
+ List *path = NIL;
+ MemoryStatsContextId *contextid_entry;
+
+ contextid_entry = (MemoryStatsContextId *) hash_search(context_id_lookup,
+ &cur,
+ HASH_ENTER, &found);
+ Assert(!found);
+
+ /*
+ * context id starts with 1
+ */
+ contextid_entry->context_id = context_id + 1;
+
+ /*
+ * Figure out the transient context_id of this context and each of its
+ * ancestors, to compute a path for this context.
+ */
+ path = compute_context_path(cur, context_id_lookup);
+
+ /* Examine the context stats */
+ memset(&stat, 0, sizeof(stat));
+ (*cur->methods->stats) (cur, NULL, NULL, &stat, true);
+
+ /* Account for saving one statistics slot for cumulative reporting */
+ if (context_id < (MAX_MEMORY_CONTEXT_STATS_NUM - 1))
+ {
+ /* Copy statistics to DSA memory */
+ PublishMemoryContext(meminfo, context_id, cur, path, stat, 1);
+ }
+ else
+ {
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].totalspace += stat.totalspace;
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].nblocks += stat.nblocks;
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].freespace += stat.freespace;
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].freechunks += stat.freechunks;
+ }
+
+ /*
+ * DSA max limit per process is reached, write aggregate of the
+ * remaining statistics.
+ *
+ * We can store contexts from 0 to max_stats - 1. When context_id is
+ * greater than max_stats, we stop reporting individual statistics
+ * when context_id equals max_stats - 2. As we use max_stats - 1 array
+ * slot for reporting cumulative statistics or "Remaining Totals".
+ */
+ if (context_id == (MAX_MEMORY_CONTEXT_STATS_NUM - 2))
+ {
+ int namelen = strlen("Remaining Totals");
+
+ num_individual_stats = context_id + 1;
+ strlcpy(meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].name,
+ "Remaining Totals", namelen + 1);
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].ident[0] = '\0';
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].path[0] = 0;
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].type = 0;
+ }
+ context_id++;
+
+ for (MemoryContext c = cur->firstchild; c != NULL; c = c->nextchild)
+ contexts = lappend(contexts, c);
+ }
+
+ /*
+ * Statistics are not aggregated, i.e individual statistics reported when
+ * context_id <= max_stats.
+ */
+ if (context_id <= MAX_MEMORY_CONTEXT_STATS_NUM)
+ {
+ entry->total_stats = context_id;
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].num_agg_stats = 1;
+ }
+ /* Report number of aggregated memory contexts */
+ else
+ {
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].num_agg_stats = context_id
+ - num_individual_stats;
+
+ /*
+ * Total stats equals num_individual_stats + 1 record for cumulative
+ * statistics.
+ */
+ entry->total_stats = num_individual_stats + 1;
+ }
+ entry->stats_written = true;
+ end_memorycontext_reporting(entry, oldcontext, context_id_lookup);
+ /* Notify waiting client backend and return */
+ ConditionVariableSignal(&entry->memcxt_cv);
+}
+
+/*
+ * Clean up before exit from ProcessGetMemoryContextInterrupt
+ */
+static void
+end_memorycontext_reporting(MemoryStatsDSHashEntry *entry,
+ MemoryContext oldcontext, HTAB *context_id_lookup)
+{
+ MemoryContext curr_ctx = CurrentMemoryContext;
+
+ dshash_release_lock(MemoryStatsDsHash, entry);
+
+ hash_destroy(context_id_lookup);
+ MemoryContextSwitchTo(oldcontext);
+ MemoryContextReset(curr_ctx);
+}
+
+/*
+ * compute_context_path
+ *
+ * Append the transient context_id of this context and each of its ancestors
+ * to a list, in order to compute a path.
+ */
+static List *
+compute_context_path(MemoryContext c, HTAB *context_id_lookup)
+{
+ bool found;
+ List *path = NIL;
+ MemoryContext cur_context;
+
+ for (cur_context = c; cur_context != NULL; cur_context = cur_context->parent)
+ {
+ MemoryStatsContextId *cur_entry;
+
+ cur_entry = hash_search(context_id_lookup, &cur_context, HASH_FIND, &found);
+
+ if (!found)
+ elog(ERROR, "hash table corrupted, can't construct path value");
+
+ path = lcons_int(cur_entry->context_id, path);
+ }
+
+ return path;
+}
+
+/*
+ * PublishMemoryContext
+ *
+ * Copy the memory context statistics of a single context to a DSA memory
+ */
+static void
+PublishMemoryContext(MemoryStatsEntry *memcxt_info, int curr_id,
+ MemoryContext context, List *path,
+ MemoryContextCounters stat, int num_contexts)
+{
+ const char *ident = context->ident;
+ const char *name = context->name;
+
+ /*
+ * To be consistent with logging output, we label dynahash contexts with
+ * just the hash table name as with MemoryContextStatsPrint().
+ */
+ if (context->ident && strncmp(context->name, "dynahash", 8) == 0)
+ {
+ name = context->ident;
+ ident = NULL;
+ }
+
+ if (name != NULL)
+ {
+ int namelen = strlen(name);
+
+ if (strlen(name) >= MEMORY_CONTEXT_NAME_SHMEM_SIZE)
+ namelen = pg_mbcliplen(name, namelen,
+ MEMORY_CONTEXT_NAME_SHMEM_SIZE - 1);
+
+ strlcpy(memcxt_info[curr_id].name, name, namelen + 1);
+ }
+ else
+ /* Clearing the array */
+ memcxt_info[curr_id].name[0] = '\0';
+
+ /* Trim and copy the identifier if it is not set to NULL */
+ if (ident != NULL)
+ {
+ int idlen = strlen(context->ident);
+
+ /*
+ * Some identifiers such as SQL query string can be very long,
+ * truncate oversize identifiers.
+ */
+ if (idlen >= MEMORY_CONTEXT_IDENT_SHMEM_SIZE)
+ idlen = pg_mbcliplen(ident, idlen,
+ MEMORY_CONTEXT_IDENT_SHMEM_SIZE - 1);
+
+ strlcpy(memcxt_info[curr_id].ident, ident, idlen + 1);
+ }
+ else
+ memcxt_info[curr_id].ident[0] = '\0';
+
+ /* Allocate DSA memory for storing path information */
+ if (path == NIL)
+ memcxt_info[curr_id].path[0] = 0;
+ else
+ {
+ int levels = Min(list_length(path), MAX_PATH_DISPLAY_LENGTH);
+
+ memcxt_info[curr_id].path_length = levels;
+ memcxt_info[curr_id].levels = list_length(path);
+
+ foreach_int(i, path)
+ {
+ memcxt_info[curr_id].path[foreach_current_index(i)] = i;
+ if (--levels == 0)
+ break;
+ }
+ }
+ memcxt_info[curr_id].type = context->type;
+ memcxt_info[curr_id].totalspace = stat.totalspace;
+ memcxt_info[curr_id].nblocks = stat.nblocks;
+ memcxt_info[curr_id].freespace = stat.freespace;
+ memcxt_info[curr_id].freechunks = stat.freechunks;
+ memcxt_info[curr_id].num_agg_stats = num_contexts;
+}
+
+void
+AtProcExit_memstats_cleanup(int code, Datum arg)
+{
+ int idx = MyProcNumber;
+ MemoryStatsDSHashEntry *entry;
+ char key[CLIENT_KEY_SIZE];
+ bool found;
+
+ if (MemoryStatsDsHash != NULL)
+ {
+ snprintf(key, CLIENT_KEY_SIZE, "%d", idx);
+ entry = dshash_find_or_insert(MemoryStatsDsHash, key, &found);
+
+ if (found)
+ {
+ if (MemoryStatsDsaArea != NULL &&
+ DsaPointerIsValid(entry->memstats_dsa_pointer))
+ dsa_free(MemoryStatsDsaArea, entry->memstats_dsa_pointer);
+ }
+ dshash_delete_entry(MemoryStatsDsHash, entry);
+ }
+ LWLockAcquire(client_keys_lock, LW_EXCLUSIVE);
+ client_keys[idx] = -1;
+ LWLockRelease(client_keys_lock);
+}
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index d31cb45a058..92b0446b80c 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -39,6 +39,7 @@ volatile sig_atomic_t TransactionTimeoutPending = false;
volatile sig_atomic_t IdleSessionTimeoutPending = false;
volatile sig_atomic_t ProcSignalBarrierPending = false;
volatile sig_atomic_t LogMemoryContextPending = false;
+volatile sig_atomic_t PublishMemoryContextPending = false;
volatile sig_atomic_t IdleStatsUpdateTimeoutPending = false;
volatile uint32 InterruptHoldoffCount = 0;
volatile uint32 QueryCancelHoldoffCount = 0;
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 98f9598cd78..202403ebc63 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -658,6 +658,13 @@ BaseInit(void)
* drop ephemeral slots, which in turn triggers stats reporting.
*/
ReplicationSlotInitialize();
+
+ /*
+ * The before shmem exit callback frees the DSA memory occupied by the
+ * latest memory context statistics that could be published by this proc
+ * if requested.
+ */
+ before_shmem_exit(AtProcExit_memstats_cleanup, 0);
}
diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index 47fd774c7d2..31c4de9f0b4 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -1008,6 +1008,37 @@ MemoryContextStatsInternal(MemoryContext context, int level,
}
}
+
+/*
+ * MemoryContextStatsCounter
+ *
+ * Accumulate statistics counts into *totals. totals should not be NULL.
+ * This involves a non-recursive tree traversal.
+ */
+void
+MemoryContextStatsCounter(MemoryContext context, MemoryContextCounters *totals,
+ int *num_contexts)
+{
+ int ichild = 1;
+
+ *num_contexts = 0;
+ context->methods->stats(context, NULL, NULL, totals, false);
+
+ for (MemoryContext curr = context->firstchild;
+ curr != NULL;
+ curr = MemoryContextTraverseNext(curr, context))
+ {
+ curr->methods->stats(curr, NULL, NULL, totals, false);
+ ichild++;
+ }
+
+ /*
+ * Add the count of all the children contexts which are traversed
+ * including the parent.
+ */
+ *num_contexts = *num_contexts + ichild;
+}
+
/*
* MemoryContextStatsPrint
* Print callback used by MemoryContextStatsInternal
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 66431940700..8af5c016365 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8617,6 +8617,16 @@
prorettype => 'bool', proargtypes => 'int4',
prosrc => 'pg_log_backend_memory_contexts' },
+# publishing memory contexts of the specified postgres process
+{ oid => '2173', descr => 'publish memory contexts of the specified backend',
+ proname => 'pg_get_process_memory_contexts', provolatile => 'v',
+ prorows => '100', proretset => 't', proparallel => 'r',
+ prorettype => 'record', proargtypes => 'int4 bool',
+ proallargtypes => '{int4,bool,text,text,text,int4,_int4,int8,int8,int8,int8,int8,int4}',
+ proargmodes => '{i,i,o,o,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{pid, summary, name, ident, type, level, path, total_bytes, total_nblocks, free_bytes, free_chunks, used_bytes, num_agg_contexts}',
+ prosrc => 'pg_get_process_memory_contexts' },
+
# non-persistent series generator
{ oid => '1066', descr => 'non-persistent series generator',
proname => 'generate_series', prorows => '1000',
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 9a7d733ddef..b76f24baed6 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -96,6 +96,7 @@ extern PGDLLIMPORT volatile sig_atomic_t IdleSessionTimeoutPending;
extern PGDLLIMPORT volatile sig_atomic_t ProcSignalBarrierPending;
extern PGDLLIMPORT volatile sig_atomic_t LogMemoryContextPending;
extern PGDLLIMPORT volatile sig_atomic_t IdleStatsUpdateTimeoutPending;
+extern PGDLLIMPORT volatile sig_atomic_t PublishMemoryContextPending;
extern PGDLLIMPORT volatile sig_atomic_t CheckClientConnectionPending;
extern PGDLLIMPORT volatile sig_atomic_t ClientConnectionLost;
diff --git a/src/include/storage/lwlocklist.h b/src/include/storage/lwlocklist.h
index 5b0ce383408..613e769c84e 100644
--- a/src/include/storage/lwlocklist.h
+++ b/src/include/storage/lwlocklist.h
@@ -136,3 +136,4 @@ PG_LWLOCKTRANCHE(SUBTRANS_SLRU, SubtransSLRU)
PG_LWLOCKTRANCHE(XACT_SLRU, XactSLRU)
PG_LWLOCKTRANCHE(PARALLEL_VACUUM_DSA, ParallelVacuumDSA)
PG_LWLOCKTRANCHE(AIO_URING_COMPLETION, AioUringCompletion)
+PG_LWLOCKTRANCHE(MEMORY_CONTEXT_KEYS, MemoryContextReportingKeys)
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index afeeb1ca019..345d5a0ecb1 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -35,6 +35,7 @@ typedef enum
PROCSIG_WALSND_INIT_STOPPING, /* ask walsenders to prepare for shutdown */
PROCSIG_BARRIER, /* global barrier interrupt */
PROCSIG_LOG_MEMORY_CONTEXT, /* ask backend to log the memory contexts */
+ PROCSIG_GET_MEMORY_CONTEXT, /* ask backend to send the memory contexts */
PROCSIG_PARALLEL_APPLY_MESSAGE, /* Message from parallel apply workers */
/* Recovery conflict reasons */
diff --git a/src/include/utils/memutils.h b/src/include/utils/memutils.h
index 7bbe5a36959..4296667cbf0 100644
--- a/src/include/utils/memutils.h
+++ b/src/include/utils/memutils.h
@@ -18,7 +18,7 @@
#define MEMUTILS_H
#include "nodes/memnodes.h"
-
+#include "utils/dsa.h"
/*
* MaxAllocSize, MaxAllocHugeSize
@@ -48,7 +48,6 @@
#define AllocHugeSizeIsValid(size) ((Size) (size) <= MaxAllocHugeSize)
-
/*
* Standard top-level memory contexts.
*
@@ -319,4 +318,11 @@ pg_memory_is_all_zeros(const void *ptr, size_t len)
return true;
}
+extern void ProcessGetMemoryContextInterrupt(void);
+extern void HandleGetMemoryContextInterrupt(void);
+extern void MemoryContextKeysShmemInit(void);
+extern Size MemoryContextKeysShmemSize(void);
+extern void MemoryContextStatsCounter(MemoryContext context, MemoryContextCounters *totals,
+ int *num_contexts);
+extern void AtProcExit_memstats_cleanup(int code, Datum arg);
#endif /* MEMUTILS_H */
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 3b37fafa65b..21c65ad2d10 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -233,3 +233,22 @@ select * from pg_timezone_abbrevs where abbrev = 'LMT';
LMT | @ 7 hours 52 mins 58 secs ago | f
(1 row)
+DO $$
+DECLARE
+ bg_writer_pid int;
+ r RECORD;
+BEGIN
+ SELECT pid from pg_stat_activity where backend_type='background writer'
+ INTO bg_writer_pid;
+
+ select type, name, ident
+ from pg_get_process_memory_contexts(bg_writer_pid, false)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+ select type, name, ident
+ from pg_get_process_memory_contexts(pg_backend_pid(), false)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+END $$;
+NOTICE: (AllocSet,TopMemoryContext,)
+NOTICE: (AllocSet,TopMemoryContext,)
diff --git a/src/test/regress/sql/sysviews.sql b/src/test/regress/sql/sysviews.sql
index 66179f026b3..c9da4fc8c90 100644
--- a/src/test/regress/sql/sysviews.sql
+++ b/src/test/regress/sql/sysviews.sql
@@ -101,3 +101,21 @@ select count(distinct utc_offset) >= 24 as ok from pg_timezone_abbrevs;
-- One specific case we can check without much fear of breakage
-- is the historical local-mean-time value used for America/Los_Angeles.
select * from pg_timezone_abbrevs where abbrev = 'LMT';
+
+DO $$
+DECLARE
+ bg_writer_pid int;
+ r RECORD;
+BEGIN
+ SELECT pid from pg_stat_activity where backend_type='background writer'
+ INTO bg_writer_pid;
+
+ select type, name, ident
+ from pg_get_process_memory_contexts(bg_writer_pid, false)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+ select type, name, ident
+ from pg_get_process_memory_contexts(pg_backend_pid(), false)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+END $$;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index e3c3523b5b2..d9b45bdd721 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1686,9 +1686,11 @@ MemoryContextCallback
MemoryContextCallbackFunction
MemoryContextCounters
MemoryContextData
-MemoryContextId
MemoryContextMethodID
MemoryContextMethods
+MemoryStatsContextId
+MemoryStatsEntry
+MemoryStatsDSHashEntry
MemoryStatsPrintFunc
MergeAction
MergeActionState
--
2.34.1
On 2025-11-28 18:22, Rahila Syed wrote:
Hi,
I'm attaching the updated patches, which primarily include cleanup and
have been rebased following the CFbot report.
Thanks for updating the patch!
I observed an assertion failure when forcing a timeout as follows:
```
$ psql
(pid:38587)=#
$ kill -s SIGSTOP 38587
$ psql
(pid:38618) =# select * from pg_get_process_memory_contexts(38587, false);
name | ident | type | level | path | total_bytes | total_nblocks | free_bytes | free_chunks | used_bytes | num_agg_contexts
--------+--------+--------+--------+--------+-------------+---------------+------------+-------------+------------+------------------
[NULL] | [NULL] | [NULL] | [NULL] | [NULL] | [NULL] | [NULL] | [NULL] | [NULL] | [NULL] | [NULL]
(1 row)
Time: 5013.515 ms (00:05.014)
$ kill -s SIGCONT 38587
$ tail postgresql.log
TRAP: failed Assert("client_keys[MyProcNumber] != -1"), File: "mcxtfuncs.c", Line: 881, PID: 38587
0 postgres 0x0000000104943400 ExceptionalCondition + 216
1 postgres 0x000000010480f738 ProcessGetMemoryContextInterrupt + 140
2 postgres 0x00000001046f2710 ProcessInterrupts + 3008
3 postgres 0x00000001046f1a78 ProcessClientReadInterrupt + 80
4 postgres 0x0000000104433994 secure_read + 404
5 postgres 0x00000001044411dc pq_recvbuf + 260
6 postgres 0x0000000104441088 pq_getbyte + 96
7 postgres 0x00000001046fa0fc SocketBackend + 44
8 postgres 0x00000001046f6d3c ReadCommand + 44
9 postgres 0x00000001046f6284 PostgresMain + 2900
10 postgres 0x00000001046ed558 BackendInitialize + 0
11 postgres 0x00000001045c0a48 postmaster_child_launch + 456
12 postgres 0x00000001045c8520 BackendStartup + 304
13 postgres 0x00000001045c636c ServerLoop + 372
14 postgres 0x00000001045c4e24 PostmasterMain + 6448
15 postgres 0x0000000104445b4c main + 924
16 dyld 0x0000000188662b98 start + 6076
2025-12-08 07:35:32.608 JST [38540] LOG: 00000: client backend (PID 38587) was terminated by signal 6: Abort trap: 6
2025-12-08 07:35:32.608 JST [38540] LOCATION: LogChildExit, postmaster.c:2872
```
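One plausible reading of this trace, going only by the v42 code posted in this thread: when the client times out it calls memstats_client_key_reset(), which sets the target's client_keys slot back to -1, and once the stopped target resumes, ProcessGetMemoryContextInterrupt() still runs and trips over Assert(client_keys[MyProcNumber] != -1). The standalone sketch below models that sequence with a plain array. All names in it (NUM_PROCS, NO_CLIENT, request_stats, client_timeout_reset, target_take_request) are hypothetical, and treating a withdrawn request as "nothing to do" is only one possible way to handle it, not necessarily what the patch should do.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_PROCS 4		/* hypothetical; stands in for backends + aux procs */
#define NO_CLIENT (-1)	/* same "empty slot" marker the patch uses */

/* One slot per target process, holding the requesting client's number. */
static int	client_keys[NUM_PROCS] = {NO_CLIENT, NO_CLIENT, NO_CLIENT, NO_CLIENT};

/* Client side: register a request in the target's slot. */
static void
request_stats(int target, int client)
{
	assert(target >= 0 && target < NUM_PROCS);
	client_keys[target] = client;
}

/* Client side: on timeout, withdraw the request if it is still ours. */
static void
client_timeout_reset(int target, int client)
{
	assert(target >= 0 && target < NUM_PROCS);
	if (client_keys[target] == client)
		client_keys[target] = NO_CLIENT;
}

/*
 * Target side: fetch and clear the pending request.  Returns false instead
 * of asserting when the client has already withdrawn it.
 */
static bool
target_take_request(int target, int *client)
{
	assert(target >= 0 && target < NUM_PROCS);
	if (client_keys[target] == NO_CLIENT)
		return false;
	*client = client_keys[target];
	client_keys[target] = NO_CLIENT;
	return true;
}
```

Calling request_stats(1, 3), then client_timeout_reset(1, 3), then target_take_request(1, &c) reproduces the order of events in the report above: the late interrupt finds an empty slot.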
Below are comments regarding the v42-0001 patch:
In order to not block on busy processes, we have hardcoded
the number of seconds during which to retry before timing out.
In the case where no statistics are published within the set
timeout, NULL is returned.
It might be good to also document in func-admin.sgml that the function
times out after 5 seconds when the target backend does not respond, and
that in such a case NULLs are returned.
+ * If DSA exists, created by another process requesting statistics, attach
+ * to it. We expect the client process to create required DSA and Dshash
+ * table.
+ */
+ if (MemoryStatsDsaArea == NULL)
+ MemoryStatsDsaArea = GetNamedDSA("memory_context_statistics_dsa",
+ &found);
+
+ if (MemoryStatsDsHash == NULL)
+ MemoryStatsDsHash = GetNamedDSHash("memory_context_statistics_dshash",
+ &memctx_dsh_params, &found);
From the comment, it sounded to me as if the client executing
pg_get_process_memory_contexts() might not create the DSA in some cases.
Is it correct to assume that such a situation can happen?
In [1], as a response to concerns about using DSA inside a CFI handler,
you wrote that “all the dynamic shared memory needed to store the
statistics is created and deleted in the client function”.
So I understood that it would never create the DSA inside the CFI
handler.
If that understanding is correct, perhaps the comment should be reworded
to make that clear.
+ context_id_lookup = hash_create("pg_get_remote_backend_memory_contexts",
This appears to use the old function name. Should this be updated to
"pg_get_process_memory_contexts" instead?
[1]: /messages/by-id/CAH2L28sc-rEhyntPLoaC2XUa0ZjS5ka6KzEbuSVxQBBnUYu1KQ@mail.gmail.com
--
Regards,
--
Atsushi Torikoshi
Seconded from NTT DATA Japan Corporation to SRA OSS K.K.
On 28 Nov 2025, at 10:22, Rahila Syed <rahilasyed90@gmail.com> wrote:
Hi,
I'm attaching the updated patches, which primarily include cleanup and
have been rebased following the CFbot report.
Thanks for the patch, below are a few comments and suggestions. As I was
reviewing I tweaked the below and have attached the comments as changes in
0003.
== in func-admin.sgml
+ <function>pg_get_process_memory_contexts</function> ( <parameter>pid</parameter> <type>integer</type>, <parameter>summary</parameter> <type>boolean</type> )
We recently simplified the UI by removing the timeout, and the more I think
about it the more I am convinced that there is more simplification to be had.
The most likely usage pattern, IMO, will be to get all the contexts and not the
summary, so we can make the summary parameter DEFAULT to false. This allows
most uses to just pass the pid, without complicating the code at all.
== in mcxtfuncs.c
+/* Size of dshash key */
+#define CLIENT_KEY_SIZE 32
That's still pretty generous, isn't it? We are printing a uint32 into it, so the
highest number it can reach is ~4 billion (which, while the upper limit, is
quite theoretical in this case). 10 + 1 bytes should suffice to store it, right?
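The arithmetic can be checked with a small standalone sketch. The helper names below are hypothetical, and the second helper is included only because the posted patch formats the key with "%d" on an int; it assumes a 32-bit int, where the signed worst case needs one byte more than the unsigned one.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Bytes needed for the decimal form of v plus the terminating NUL. */
static int
key_bytes_for_uint32(uint32_t v)
{
	/* With a zero-size buffer, snprintf only reports the needed length. */
	int			n = snprintf(NULL, 0, "%" PRIu32, v);

	assert(n > 0);
	return n + 1;
}

/* Same computation for a signed int formatted with "%d". */
static int
key_bytes_for_int(int v)
{
	int			n = snprintf(NULL, 0, "%d", v);

	assert(n > 0);
	return n + 1;
}
```

For UINT32_MAX this gives 10 digits plus the NUL, matching the 10 + 1 estimate. For INT_MIN formatted with "%d" the minus sign brings it to 12, which only matters if a negative value could ever be printed into the key.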
- * MemoryContextId
+ * MemoryStatsContextId
Sorry, but I still don't agree with this rename and I think we should skip it,
if only to avoid changes to existing parts of the code.
+ * Entry has been deleted due to client process exit. Make sure that the
+ * client always deletes the entry after taking required lock or this
+ * function may end up writing to unallocated memory.
Can you explain this a bit further, I'm not sure I get it. The code goes on to
release a lock immediately and then destroys the hash. Who is responsible for
destroying the entry?
== In system-views.sql
+REVOKE EXECUTE ON FUNCTION
+ pg_get_process_memory_contexts(integer, boolean) FROM PUBLIC;
+GRANT EXECUTE ON FUNCTION
+ pg_get_process_memory_contexts(integer, boolean) TO pg_read_all_stats;
This is not a view, and the functions aren't used to drive a view, so these
should not be defined here. The above mentioned change to add DEFAULT handling
to the summary parameter fixes this in the attached.
== In ProcessGetMemoryContextInterrupt()
I'm not a fan of having two exits from the function doing duplicate cleanups,
so in the attached I've wrapped them in a conditional to have just one exit path.
What do you think about that?
== In PublishMemoryContext()
+ end_memorycontext_reporting(entry, oldcontext, context_id_lookup);
Looking at this more I don't really like that resetting the memory context is
done via a separate function, when that function must be called from the exact
right place to ensure CurrentMemoryContext is what it thinks it is. It's all a
bit too magic. Since this is only called in 3 places I would prefer to inline
the code in PublishMemoryContext().
== In PublishMemoryContext()
+ const char *ident = context->ident;
+ const char *name = context->name;
ident and name are defined as const, but they are later assigned to after the
initial assignment. I think we need to unconstify these.
== In PublishMemoryContext()
+ if (strlen(name) >= MEMORY_CONTEXT_NAME_SHMEM_SIZE)
We already have namelen which is set to exactly strlen(name), so let's reuse
that for readability.
== In PublishMemoryContext()
+ /* Allocate DSA memory for storing path information */
This comment is no longer accurate, is it? The DSA has already been allocated
at this point.
== In memutils.h
+#include "utils/dsa.h"
This is not needed.
I also did some smaller comment rewording and reflowing, some smaller cleanups
and a fresh pgindent/pgperltidy run. The attached 0003 contains the above.
--
Daniel Gustafsson
Attachments:
v43-0001-Add-function-to-report-memory-context-statistics.patch (application/octet-stream)
From 8f50ab607db07dc125fda2831e1d234c2a630ccb Mon Sep 17 00:00:00 2001
From: Rahila Syed <rahilasyed.90@gmail.com>
Date: Thu, 27 Nov 2025 14:39:43 +0530
Subject: [PATCH v43 1/3] Add function to report memory context statistics
This adds a function for retrieving memory context statistics
and information from backends as well as auxiliary processes.
The intended use case is cluster debugging when under memory
pressure or unanticipated memory usage characteristics.
When calling the function it sends a signal to the specified
process to submit statistics regarding its memory contexts
into dynamic shared memory. Each memory context is returned
in detail, followed by a cumulative total in case the number
of contexts exceeds the max allocated amount of shared memory.
Each process is limited to use at most 1MB of memory for this.
A summary can also be explicitly requested by the user; this
will return the TopMemoryContext and a cumulative total of
all lower contexts.
In order to not block on busy processes, we have hardcoded
the number of seconds during which to retry before timing out.
In the case where no statistics are published within the set
timeout, NULL is returned.
---
doc/src/sgml/func/func-admin.sgml | 157 +++
src/backend/catalog/system_views.sql | 5 +
src/backend/postmaster/autovacuum.c | 4 +
src/backend/postmaster/checkpointer.c | 4 +
src/backend/postmaster/interrupt.c | 4 +
src/backend/postmaster/pgarch.c | 4 +
src/backend/postmaster/startup.c | 4 +
src/backend/postmaster/walsummarizer.c | 4 +
src/backend/storage/ipc/ipci.c | 3 +
src/backend/storage/ipc/procsignal.c | 3 +
src/backend/storage/lmgr/proc.c | 1 +
src/backend/tcop/postgres.c | 3 +
.../utils/activity/wait_event_names.txt | 2 +
src/backend/utils/adt/mcxtfuncs.c | 1011 ++++++++++++++++-
src/backend/utils/init/globals.c | 1 +
src/backend/utils/init/postinit.c | 7 +
src/backend/utils/mmgr/mcxt.c | 31 +
src/include/catalog/pg_proc.dat | 10 +
src/include/miscadmin.h | 1 +
src/include/storage/lwlocklist.h | 1 +
src/include/storage/procsignal.h | 1 +
src/include/utils/memutils.h | 10 +-
src/test/regress/expected/sysviews.out | 19 +
src/test/regress/sql/sysviews.sql | 18 +
src/tools/pgindent/typedefs.list | 4 +-
25 files changed, 1282 insertions(+), 30 deletions(-)
diff --git a/doc/src/sgml/func/func-admin.sgml b/doc/src/sgml/func/func-admin.sgml
index 1b465bc8ba7..a5c66837241 100644
--- a/doc/src/sgml/func/func-admin.sgml
+++ b/doc/src/sgml/func/func-admin.sgml
@@ -251,6 +251,130 @@
<literal>false</literal> is returned.
</para></entry>
</row>
+
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>pg_get_process_memory_contexts</primary>
+ </indexterm>
+ <function>pg_get_process_memory_contexts</function> ( <parameter>pid</parameter> <type>integer</type>, <parameter>summary</parameter> <type>boolean</type> )
+ <returnvalue>setof record</returnvalue>
+ ( <parameter>name</parameter> <type>text</type>,
+ <parameter>ident</parameter> <type>text</type>,
+ <parameter>type</parameter> <type>text</type>,
+ <parameter>level</parameter> <type>integer</type>,
+ <parameter>path</parameter> <type>integer[]</type>,
+ <parameter>total_bytes</parameter> <type>bigint</type>,
+ <parameter>total_nblocks</parameter> <type>bigint</type>,
+ <parameter>free_bytes</parameter> <type>bigint</type>,
+ <parameter>free_chunks</parameter> <type>bigint</type>,
+ <parameter>used_bytes</parameter> <type>bigint</type>,
+ <parameter>num_agg_contexts</parameter> <type>integer</type> )
+ </para>
+ <para>
+ This function handles requests to display the memory contexts of a
+ <productname>PostgreSQL</productname> process with the specified
+ process ID. The function can be used to send requests to backends as
+ well as <glossterm linkend="glossary-auxiliary-proc">auxiliary processes</glossterm>.
+ </para>
+ <para>
+ The returned record contains extended statistics for each memory
+ context:
+ <itemizedlist spacing="compact">
+ <listitem>
+ <para>
+ <parameter>name</parameter> - The name of the memory context.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>ident</parameter> - Memory context ID (if any).
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>type</parameter> - The type of memory context, possible
+ values are: AllocSet, Generation, Slab and Bump.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>level</parameter> - The level in the tree of the current
+ memory context.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>path</parameter> - Memory contexts are organized in a
+ tree model with TopMemoryContext as the root, and all other memory
+ contexts as nodes in the tree. The <parameter>path</parameter>
+ displays the path from the root to the current memory context. The
+ path is limited to 100 children per node, with each node limited
+ to a max depth of 100, to preserve memory during reporting. The
+ printed path will also be limited to 100 nodes counting from the
+ TopMemoryContext.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>total_bytes</parameter> - The total number of bytes
+ allocated to this memory context.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>total_nblocks</parameter> - The total number of blocks
+ used for the allocated memory.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>free_bytes</parameter> - The amount of free memory in
+ this memory context.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>free_chunks</parameter> - The number of chunks that
+ <parameter>free_bytes</parameter> corresponds to.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>used_bytes</parameter> - The total number of bytes
+ currently occupied.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>num_agg_contexts</parameter> - The number of memory
+ contexts aggregated in the displayed statistics.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+ <para>
+ When <parameter>summary</parameter> is <literal>true</literal>, statistics
+ for memory contexts at levels 1 and 2 are displayed, with level 1
+ representing the root node (i.e., <literal>TopMemoryContext</literal>).
+ Statistics for contexts on level 2 and below are aggregates of all
+ child contexts' statistics, where <literal>num_agg_contexts</literal>
+ indicates the number of aggregated child contexts. When
+ <parameter>summary</parameter> is <literal>false</literal>,
+ the <literal>num_agg_contexts</literal> value is <literal>1</literal>,
+ indicating that individual statistics are being displayed.
+ </para>
+ <para>
+ After receiving memory context statistics from the target process, it
+ returns the results as one row per context. If all the contexts don't
+ fit within the pre-determined size limit, the remaining context
+ statistics are aggregated and a cumulative total is displayed. The
+ <literal>num_agg_contexts</literal> column indicates the number of
+ contexts aggregated in the displayed statistics. When
+ <literal>num_agg_contexts</literal> is <literal>1</literal> it means
+ that the context statistics are displayed separately.
+ </para></entry>
+ </row>
</tbody>
</tgroup>
</table>
@@ -302,6 +426,39 @@ LOG: Grand total: 1651920 bytes in 201 blocks; 622360 free (88 chunks); 1029560
because it may generate a large number of log messages.
</para>
+ <para>
+ <function>pg_get_process_memory_contexts</function> can be used to request
+ memory contexts statistics of any <productname>PostgreSQL</productname>
+ process. For example:
+<programlisting>
+postgres=# SELECT * FROM pg_get_process_memory_contexts(
+ (SELECT pid FROM pg_stat_activity
+ WHERE backend_type = 'checkpointer'),
+ false) LIMIT 1;
+-[ RECORD 1 ]----+------------------------------
+name | TopMemoryContext
+ident |
+type | AllocSet
+level | 1
+path | {1}
+total_bytes | 90304
+total_nblocks | 3
+free_bytes | 2880
+free_chunks | 1
+used_bytes | 87424
+num_agg_contexts | 1
+</programlisting>
+ <note>
+ <para>
+ While <function>pg_get_process_memory_contexts</function> can be used to
+ query memory contexts of the local backend,
+ <structname>pg_backend_memory_contexts</structname>
+ (see <xref linkend="view-pg-backend-memory-contexts"/> for more details)
+ will be less resource intensive when only the local backend is of interest.
+ </para>
+ </note>
+ </para>
+
</sect2>
<sect2 id="functions-admin-backup">
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 086c4c8fb6f..17ec512622b 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -692,6 +692,11 @@ GRANT SELECT ON pg_backend_memory_contexts TO pg_read_all_stats;
REVOKE EXECUTE ON FUNCTION pg_get_backend_memory_contexts() FROM PUBLIC;
GRANT EXECUTE ON FUNCTION pg_get_backend_memory_contexts() TO pg_read_all_stats;
+REVOKE EXECUTE ON FUNCTION
+ pg_get_process_memory_contexts(integer, boolean) FROM PUBLIC;
+GRANT EXECUTE ON FUNCTION
+ pg_get_process_memory_contexts(integer, boolean) TO pg_read_all_stats;
+
-- Statistics views
CREATE VIEW pg_stat_all_tables AS
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 1c38488f2cb..561d88ebb4d 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -791,6 +791,10 @@ ProcessAutoVacLauncherInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
/* Process sinval catchup interrupts that happened while sleeping */
ProcessCatchupInterrupt();
}
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index e84e8663e96..5b3e08805bf 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -679,6 +679,10 @@ ProcessCheckpointerInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/postmaster/interrupt.c b/src/backend/postmaster/interrupt.c
index ba63b84dfc5..29454b8bf8a 100644
--- a/src/backend/postmaster/interrupt.c
+++ b/src/backend/postmaster/interrupt.c
@@ -48,6 +48,10 @@ ProcessMainLoopInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index ce6b5299324..fdd385e492d 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -871,6 +871,10 @@ ProcessPgArchInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
if (ConfigReloadPending)
{
char *archiveLib = pstrdup(XLogArchiveLibrary);
diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c
index 27e86cf393f..7149a67fcbc 100644
--- a/src/backend/postmaster/startup.c
+++ b/src/backend/postmaster/startup.c
@@ -192,6 +192,10 @@ ProcessStartupProcInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
diff --git a/src/backend/postmaster/walsummarizer.c b/src/backend/postmaster/walsummarizer.c
index c4a888a081c..00f03b36ed8 100644
--- a/src/backend/postmaster/walsummarizer.c
+++ b/src/backend/postmaster/walsummarizer.c
@@ -879,6 +879,10 @@ ProcessWalSummarizerInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index b23d0c19360..a5ed58a18c5 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -52,6 +52,7 @@
#include "storage/sinvaladt.h"
#include "utils/guc.h"
#include "utils/injection_point.h"
+#include "utils/memutils.h"
/* GUCs */
int shared_memory_type = DEFAULT_SHARED_MEMORY_TYPE;
@@ -140,6 +141,7 @@ CalculateShmemSize(void)
size = add_size(size, SlotSyncShmemSize());
size = add_size(size, AioShmemSize());
size = add_size(size, WaitLSNShmemSize());
+ size = add_size(size, MemoryContextKeysShmemSize() + sizeof(LWLockPadded));
/* include additional requested shmem from preload libraries */
size = add_size(size, total_addin_request);
@@ -328,6 +330,7 @@ CreateOrAttachShmemStructs(void)
InjectionPointShmemInit();
AioShmemInit();
WaitLSNShmemInit();
+ MemoryContextKeysShmemInit();
}
/*
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index 087821311cc..8963285cc12 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -691,6 +691,9 @@ procsignal_sigusr1_handler(SIGNAL_ARGS)
if (CheckProcSignal(PROCSIG_LOG_MEMORY_CONTEXT))
HandleLogMemoryContextInterrupt();
+ if (CheckProcSignal(PROCSIG_GET_MEMORY_CONTEXT))
+ HandleGetMemoryContextInterrupt();
+
if (CheckProcSignal(PROCSIG_PARALLEL_APPLY_MESSAGE))
HandleParallelApplyMessageInterrupt();
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index ebc3f4ca457..27b3b51cf2d 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -51,6 +51,7 @@
#include "storage/procsignal.h"
#include "storage/spin.h"
#include "storage/standby.h"
+#include "utils/memutils.h"
#include "utils/timeout.h"
#include "utils/timestamp.h"
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 7dd75a490aa..e726f40dfbb 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3539,6 +3539,9 @@ ProcessInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
if (ParallelApplyMessagePending)
ProcessParallelApplyMessages();
}
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index c1ac71ff7f2..644d8d988e1 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -162,6 +162,7 @@ WAL_RECEIVER_EXIT "Waiting for the WAL receiver to exit."
WAL_RECEIVER_WAIT_START "Waiting for startup process to send initial data for streaming replication."
WAL_SUMMARY_READY "Waiting for a new WAL summary to be generated."
XACT_GROUP_UPDATE "Waiting for the group leader to update transaction status at transaction end."
+MEM_CXT_PUBLISH "Waiting for a process to publish memory information."
ABI_compatibility:
@@ -404,6 +405,7 @@ SubtransSLRU "Waiting to access the sub-transaction SLRU cache."
XactSLRU "Waiting to access the transaction status SLRU cache."
ParallelVacuumDSA "Waiting for parallel vacuum dynamic shared memory allocation."
AioUringCompletion "Waiting for another process to complete IO via io_uring."
+MemoryContextReportingKeys "Waiting for another process to complete reading or writing the memory reporting keys."
# No "ABI_compatibility" region here as WaitEventLWLock has its own C code.
diff --git a/src/backend/utils/adt/mcxtfuncs.c b/src/backend/utils/adt/mcxtfuncs.c
index fe6dce9cba3..c661eef7ae9 100644
--- a/src/backend/utils/adt/mcxtfuncs.c
+++ b/src/backend/utils/adt/mcxtfuncs.c
@@ -17,28 +17,138 @@
#include "funcapi.h"
#include "mb/pg_wchar.h"
+#include "miscadmin.h"
+#include "storage/dsm_registry.h"
#include "storage/proc.h"
#include "storage/procarray.h"
#include "utils/array.h"
#include "utils/builtins.h"
#include "utils/hsearch.h"
+#include "utils/injection_point.h"
+#include "utils/memutils.h"
+#include "utils/wait_event_types.h"
+
+/*
+ * Memory Context reporting size limits.
+ */
+
+/* Max length of context name and ident, to keep it consistent
+ * with ProcessLogMemoryContext()
+ */
+#define MEMORY_CONTEXT_IDENT_SHMEM_SIZE 100
+#define MEMORY_CONTEXT_NAME_SHMEM_SIZE 100
+
+/* Maximum size (in bytes) of DSA area per process */
+#define MEMORY_CONTEXT_REPORT_MAX_PER_BACKEND ((size_t) (1 * 1024 * 1024))
+
+/*
+ * Maximum number of memory context statistics is calculated by dividing
+ * max memory allocated per backend with maximum size per context statistics.
+ * The identifier and name are statically allocated arrays of size 100 bytes.
+ * The path depth is limited to 100 like for memory context logging.
+ */
+#define MAX_MEMORY_CONTEXT_STATS_NUM (MEMORY_CONTEXT_REPORT_MAX_PER_BACKEND / sizeof(MemoryStatsEntry))
+
+/* Size of dshash key */
+#define CLIENT_KEY_SIZE 32
+
+/* Dynamic shared memory state for reporting statistics per context */
+typedef struct MemoryStatsEntry
+{
+ char name[MEMORY_CONTEXT_NAME_SHMEM_SIZE];
+ char ident[MEMORY_CONTEXT_IDENT_SHMEM_SIZE];
+ int path[100];
+ NodeTag type;
+ int path_length;
+ int levels;
+ int64 totalspace;
+ int64 nblocks;
+ int64 freespace;
+ int64 freechunks;
+ int num_agg_stats;
+} MemoryStatsEntry;
+
+/*
+ * Per backend dynamic shared hash entry for memory context statistics
+ * reporting.
+ */
+typedef struct MemoryStatsDSHashEntry
+{
+ char key[64];
+ ConditionVariable memcxt_cv;
+ bool stats_written;
+ int target_server_id;
+ int total_stats;
+ bool summary;
+ dsa_pointer memstats_dsa_pointer;
+} MemoryStatsDSHashEntry;
+
+static const dshash_parameters memctx_dsh_params = {
+ offsetof(MemoryStatsDSHashEntry, memcxt_cv),
+ sizeof(MemoryStatsDSHashEntry),
+ dshash_strcmp,
+ dshash_strhash,
+ dshash_strcpy
+};
+
+/*
+ * These are used for reporting memory context
+ * statistics of a process.
+ */
+
+/* Lock to control access to client_keys array */
+static LWLock *client_keys_lock = NULL;
+
+/* Array to store the keys of MemoryStatsDsHash */
+static int *client_keys = NULL;
+
+/*
+ * Table to store pointers to dsa memory containing
+ * memory statistics and other meta data. There is one
+ * entry per client backend request, keyed by ProcNumber of
+ * the client obtained from client_keys array above.
+ */
+static dshash_table *MemoryStatsDsHash = NULL;
+
+/*
+ * Dsa area which stores the actual memory context
+ * statistics.
+ */
+static dsa_area *MemoryStatsDsaArea = NULL;
+
+static void memstats_dsa_cleanup(char *key);
+static void memstats_client_key_reset(int ProcNumber);
+static const char *ContextTypeToString(NodeTag type);
+static void PublishMemoryContext(MemoryStatsEntry *memcxt_info,
+ int curr_id, MemoryContext context,
+ List *path,
+ MemoryContextCounters stat,
+ int num_contexts);
+static List *compute_context_path(MemoryContext c, HTAB *context_id_lookup);
+static void end_memorycontext_reporting(MemoryStatsDSHashEntry *entry, MemoryContext oldcontext,
+ HTAB *context_id_lookup);
/* ----------
* The max bytes for showing identifiers of MemoryContext.
+ * This is used by pg_get_backend_memory_contexts, which backs the view for the local backend.
* ----------
*/
#define MEMORY_CONTEXT_IDENT_DISPLAY_SIZE 1024
+#define MAX_PATH_DISPLAY_LENGTH 100
+/* Timeout in seconds */
+#define MEMORY_STATS_MAX_TIMEOUT 5
+
/*
- * MemoryContextId
+ * MemoryStatsContextId
* Used for storage of transient identifiers for
- * pg_get_backend_memory_contexts.
+ * pg_get_backend_memory_contexts and the like.
*/
-typedef struct MemoryContextId
+typedef struct MemoryStatsContextId
{
MemoryContext context;
int context_id;
-} MemoryContextId;
+} MemoryStatsContextId;
/*
* int_list_to_array
@@ -89,7 +199,7 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
*/
for (MemoryContext cur = context; cur != NULL; cur = cur->parent)
{
- MemoryContextId *entry;
+ MemoryStatsContextId *entry;
bool found;
entry = hash_search(context_id_lookup, &cur, HASH_FIND, &found);
@@ -143,24 +253,7 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
else
nulls[1] = true;
- switch (context->type)
- {
- case T_AllocSetContext:
- type = "AllocSet";
- break;
- case T_GenerationContext:
- type = "Generation";
- break;
- case T_SlabContext:
- type = "Slab";
- break;
- case T_BumpContext:
- type = "Bump";
- break;
- default:
- type = "???";
- break;
- }
+ type = ContextTypeToString(context->type);
values[2] = CStringGetTextDatum(type);
values[3] = Int32GetDatum(list_length(path)); /* level */
@@ -175,6 +268,38 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
list_free(path);
}
+/*
+ * ContextTypeToString
+ * Returns a textual representation of a context type
+ *
+ * This should cover the same types as MemoryContextIsValid.
+ */
+const char *
+ContextTypeToString(NodeTag type)
+{
+ const char *context_type;
+
+ switch (type)
+ {
+ case T_AllocSetContext:
+ context_type = "AllocSet";
+ break;
+ case T_GenerationContext:
+ context_type = "Generation";
+ break;
+ case T_SlabContext:
+ context_type = "Slab";
+ break;
+ case T_BumpContext:
+ context_type = "Bump";
+ break;
+ default:
+ context_type = "???";
+ break;
+ }
+ return context_type;
+}
+
/*
* pg_get_backend_memory_contexts
* SQL SRF showing backend memory context.
@@ -189,7 +314,7 @@ pg_get_backend_memory_contexts(PG_FUNCTION_ARGS)
HTAB *context_id_lookup;
ctl.keysize = sizeof(MemoryContext);
- ctl.entrysize = sizeof(MemoryContextId);
+ ctl.entrysize = sizeof(MemoryStatsContextId);
ctl.hcxt = CurrentMemoryContext;
context_id_lookup = hash_create("pg_get_backend_memory_contexts",
@@ -216,7 +341,7 @@ pg_get_backend_memory_contexts(PG_FUNCTION_ARGS)
foreach_ptr(MemoryContextData, cur, contexts)
{
- MemoryContextId *entry;
+ MemoryStatsContextId *entry;
bool found;
/*
@@ -224,8 +349,8 @@ pg_get_backend_memory_contexts(PG_FUNCTION_ARGS)
* PutMemoryContextsStatsTupleStore needs this to populate the "path"
* column with the parent context_ids.
*/
- entry = (MemoryContextId *) hash_search(context_id_lookup, &cur,
- HASH_ENTER, &found);
+ entry = (MemoryStatsContextId *) hash_search(context_id_lookup, &cur,
+ HASH_ENTER, &found);
entry->context_id = context_id++;
Assert(!found);
@@ -305,3 +430,835 @@ pg_log_backend_memory_contexts(PG_FUNCTION_ARGS)
PG_RETURN_BOOL(true);
}
+
+/*
+ * pg_get_process_memory_contexts
+ * Signal a backend or an auxiliary process to send its memory contexts,
+ * wait for the results and display them.
+ *
+ * By default, only superusers or users with ROLE_PG_READ_ALL_STATS are allowed
+ * to signal a process to return the memory contexts. This is because allowing
+ * any users to issue this request at an unbounded rate would cause lots of
+ * requests to be sent, which can lead to denial of service. Additional roles
+ * can be permitted with GRANT.
+ *
+ * On receipt of this signal, a backend or an auxiliary process sets the flag
+ * in the signal handler, which causes the next CHECK_FOR_INTERRUPTS()
+ * or process-specific interrupt handler to copy the memory context details
+ * to a dynamic shared memory space.
+ *
+ * We have defined a limit on DSA memory that could be allocated per process -
+ * if the process has more memory contexts than what can fit in the allocated
+ * size, the excess contexts are summarized and represented as cumulative total
+ * at the end of the buffer.
+ *
+ * After sending the signal, wait on a condition variable. The publishing
+ * backend, after copying the data to shared memory, sends signal on that
+ * condition variable. There is one condition variable per client process.
+ * Once the condition variable is signalled, check if the latest memory context
+ * information is available and display.
+ *
+ * If the publishing backend does not respond before the condition variable
+ * times out, which is set to a predefined value MEMORY_STATS_MAX_TIMEOUT, give up
+ * and return NULL.
+ */
+Datum
+pg_get_process_memory_contexts(PG_FUNCTION_ARGS)
+{
+ int pid = PG_GETARG_INT32(0);
+ bool summary = PG_GETARG_BOOL(1);
+ PGPROC *proc;
+ ProcNumber procNumber;
+ bool proc_is_aux = false;
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ MemoryStatsEntry *memcxt_info;
+ MemoryStatsDSHashEntry *entry;
+ bool found;
+ char key[CLIENT_KEY_SIZE];
+ TimestampTz start_timestamp;
+
+ /*
+ * See if the process with given pid is a backend or an auxiliary process
+ * and remember the type for when we requery the process later.
+ */
+ proc = BackendPidGetProc(pid);
+ if (proc == NULL)
+ {
+ proc = AuxiliaryPidGetProc(pid);
+ proc_is_aux = true;
+ }
+
+ /*
+ * BackendPidGetProc() and AuxiliaryPidGetProc() return NULL if the pid
+ * isn't valid; this is however not a problem, so we leave with a WARNING.
+ * See comment in pg_log_backend_memory_contexts for a discussion on this.
+ */
+ if (proc == NULL)
+ {
+ /*
+ * This is a warning because we don't want to break loops.
+ */
+ ereport(WARNING,
+ errmsg("PID %d is not a PostgreSQL server process",
+ pid));
+ PG_RETURN_NULL();
+ }
+
+ procNumber = GetNumberFromPGProc(proc);
+
+ /*
+ * Create a DSA to allocate memory for copying memory contexts statistics.
+ * Allocate the memory in the DSA and send dsa pointer to the server
+ * process for storing the context statistics. If the number of contexts
+ * exceeds what fits within the predefined limit (1MB), a cumulative total
+ * is stored for the remaining contexts.
+ *
+ * The DSA is created once for the lifetime of the server, and only
+ * attached in subsequent calls.
+ */
+ if (MemoryStatsDsaArea == NULL)
+ MemoryStatsDsaArea = GetNamedDSA("memory_context_statistics_dsa",
+ &found);
+
+ /*
+ * The dsa pointers containing statistics for each client are stored in a
+ * dshash table. In addition to dsa pointer, each entry in this table also
+ * contains information about the statistics, condition variable for
+ * signalling between client and the server and miscellaneous data
+ * specific to a request. There is one entry per client request in the
+ * hash table.
+ */
+ if (MemoryStatsDsHash == NULL)
+ MemoryStatsDsHash = GetNamedDSHash("memory_context_statistics_dshash",
+ &memctx_dsh_params, &found);
+
+ snprintf(key, sizeof(key), "%d", MyProcNumber);
+
+ /*
+ * Check if the publishing process slot is empty and store this client's
+ * key, i.e. its procNumber. This informs the publishing process that it is
+ * supposed to write statistics in the hash entry corresponding to this
+ * client.
+ */
+ LWLockAcquire(client_keys_lock, LW_EXCLUSIVE);
+
+ if (client_keys[procNumber] == -1)
+ client_keys[procNumber] = MyProcNumber;
+ else
+ {
+ LWLockRelease(client_keys_lock);
+ ereport(WARNING,
+ errmsg("server process %d is processing previous request",
+ pid));
+ PG_RETURN_NULL();
+ }
+ LWLockRelease(client_keys_lock);
+
+ /*
+ * Insert an entry for this client in DSHASH table the first time this
+ * function is called. This entry is deleted when the process exits in
+ * before_shmem_exit call.
+ *
+ * dshash_find_or_insert locks the entry to prevent the publisher from
+ * reading before client has updated the entry.
+ */
+ entry = dshash_find_or_insert(MemoryStatsDsHash, key, &found);
+ if (!found)
+ {
+ entry->stats_written = false;
+ ConditionVariableInit(&entry->memcxt_cv);
+ }
+
+ /*
+ * Allocate 1MB of memory for the backend to publish its statistics on
+ * every call to this function. The memory is freed at the end of the
+ * function.
+ */
+ entry->memstats_dsa_pointer =
+ dsa_allocate0(MemoryStatsDsaArea, MEMORY_CONTEXT_REPORT_MAX_PER_BACKEND);
+
+ /*
+ * Specify whether a summary of statistics is requested, before signalling
+ * the server.
+ */
+ entry->summary = summary;
+
+ /*
+ * Indicate which server process statistics are being requested from. If
+ * this client times out before the last requested process can publish its
+ * statistics, it may send a new request to another server process. Since
+ * the previous server was notified, it might attempt to read the same
+ * client entry and respond incorrectly with its statistics. By storing
+ * the server ID in the client entry, we prevent any previously signalled
+ * server process from writing its statistics in the space meant for the
+ * newly requested process.
+ */
+ entry->target_server_id = pid;
+ dshash_release_lock(MemoryStatsDsHash, entry);
+
+ /*
+ * Send a signal to a PostgreSQL process, informing it we want it to
+ * produce information about its memory contexts.
+ */
+ if (SendProcSignal(pid, PROCSIG_GET_MEMORY_CONTEXT, procNumber) < 0)
+ {
+ memstats_dsa_cleanup(key);
+ memstats_client_key_reset(procNumber);
+ ereport(WARNING,
+ errmsg("could not send signal to process %d: %m",
+ pid));
+ PG_RETURN_NULL();
+ }
+ start_timestamp = GetCurrentTimestamp();
+
+ while (1)
+ {
+ long elapsed_time;
+
+ entry = dshash_find_or_insert(MemoryStatsDsHash, key, &found);
+ Assert(found);
+
+ INJECTION_POINT("memcontext-client-crash", NULL);
+
+ memcxt_info = (MemoryStatsEntry *)
+ dsa_get_address(MemoryStatsDsaArea, entry->memstats_dsa_pointer);
+
+ /*
+ * We expect to come out of sleep when the requested process has
+ * finished publishing the statistics, verified using a boolean
+ * stats_written.
+ *
+ * Make sure that the statistics are actually written by checking that
+ * the name of the context is not NULL. This is done to ensure that
+ * the subsequent waits for statistics do not return spuriously if the
+ * previous call to the function ended in error and thus could not
+ * clear the stats_written flag.
+ */
+ if (entry->stats_written && memcxt_info[0].name[0] != '\0')
+ break;
+
+ dshash_release_lock(MemoryStatsDsHash, entry);
+
+ elapsed_time = TimestampDifferenceMilliseconds(start_timestamp,
+ GetCurrentTimestamp());
+ /* Return if we have already exceeded the timeout */
+ if (elapsed_time >= MEMORY_STATS_MAX_TIMEOUT * 1000)
+ {
+ memstats_dsa_cleanup(key);
+ memstats_client_key_reset(procNumber);
+ ConditionVariableCancelSleep();
+ PG_RETURN_NULL();
+ }
+
+ /*
+ * Recheck the state of the backend before sleeping on the condition
+ * variable to ensure the process is still alive. Only check the
+ * relevant process type based on the earlier PID check.
+ */
+ if (proc_is_aux)
+ proc = AuxiliaryPidGetProc(pid);
+ else
+ proc = BackendPidGetProc(pid);
+
+ /*
+ * The target server process ending during memory context processing
+ * is not an error.
+ */
+ if (proc == NULL)
+ {
+ memstats_dsa_cleanup(key);
+ memstats_client_key_reset(procNumber);
+ ConditionVariableCancelSleep();
+ ereport(WARNING,
+ errmsg("PID %d is no longer a PostgreSQL server process",
+ pid));
+ PG_RETURN_NULL();
+ }
+
+ /*
+ * Wait for MEMORY_STATS_MAX_TIMEOUT. If no statistics are available
+ * within the allowed time then return NULL. The timer is defined in
+ * milliseconds since that's what the condition variable sleep uses.
+ */
+ if (ConditionVariableTimedSleep(&entry->memcxt_cv,
+ (MEMORY_STATS_MAX_TIMEOUT * 1000),
+ WAIT_EVENT_MEM_CXT_PUBLISH))
+ {
+ /* Timeout has expired, return NULL */
+ memstats_dsa_cleanup(key);
+ memstats_client_key_reset(procNumber);
+ ConditionVariableCancelSleep();
+ PG_RETURN_NULL();
+ }
+ }
+
+ InitMaterializedSRF(fcinfo, 0);
+
+ /*
+ * Backend has finished publishing the stats, project them.
+ */
+#define PG_GET_PROCESS_MEMORY_CONTEXTS_COLS 11
+ for (int i = 0; i < entry->total_stats; i++)
+ {
+ ArrayType *path_array;
+ int path_length;
+ Datum values[PG_GET_PROCESS_MEMORY_CONTEXTS_COLS];
+ bool nulls[PG_GET_PROCESS_MEMORY_CONTEXTS_COLS];
+ Datum *path_datum = NULL;
+
+ memset(values, 0, sizeof(values));
+ memset(nulls, 0, sizeof(nulls));
+
+ Assert(memcxt_info[i].name[0] != '\0');
+ values[0] = CStringGetTextDatum(memcxt_info[i].name);
+
+ if (memcxt_info[i].ident[0] != '\0')
+ values[1] = CStringGetTextDatum(memcxt_info[i].ident);
+ else
+ nulls[1] = true;
+
+ values[2] = CStringGetTextDatum(ContextTypeToString(memcxt_info[i].type));
+ values[3] = Int32GetDatum(memcxt_info[i].levels);
+
+ if (memcxt_info[i].path[0] != 0)
+ {
+ path_length = memcxt_info[i].path_length;
+ path_datum = (Datum *) palloc(path_length * sizeof(Datum));
+ for (int j = 0; j < path_length; j++)
+ path_datum[j] = Int32GetDatum(memcxt_info[i].path[j]);
+ path_array = construct_array_builtin(path_datum,
+ path_length,
+ INT4OID);
+ values[4] = PointerGetDatum(path_array);
+ }
+ else
+ nulls[4] = true;
+
+ values[5] = Int64GetDatum(memcxt_info[i].totalspace);
+ values[6] = Int64GetDatum(memcxt_info[i].nblocks);
+ values[7] = Int64GetDatum(memcxt_info[i].freespace);
+ values[8] = Int64GetDatum(memcxt_info[i].freechunks);
+ values[9] = Int64GetDatum(memcxt_info[i].totalspace -
+ memcxt_info[i].freespace);
+ values[10] = Int32GetDatum(memcxt_info[i].num_agg_stats);
+
+ tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc,
+ values, nulls);
+ }
+ dshash_release_lock(MemoryStatsDsHash, entry);
+ memstats_dsa_cleanup(key);
+
+ ConditionVariableCancelSleep();
+
+ PG_RETURN_NULL();
+}
+
+static void
+memstats_dsa_cleanup(char *key)
+{
+ MemoryStatsDSHashEntry *entry;
+
+ entry = dshash_find(MemoryStatsDsHash, key, true);
+
+ Assert(MemoryStatsDsaArea != NULL);
+ dsa_free(MemoryStatsDsaArea, entry->memstats_dsa_pointer);
+ entry->memstats_dsa_pointer = InvalidDsaPointer;
+ entry->stats_written = false;
+ entry->target_server_id = 0;
+
+ dshash_release_lock(MemoryStatsDsHash, entry);
+}
+
+/*
+ * Remove this process from the publishing process'
+ * client key slot, if the stats publishing process has failed to do so.
+ */
+static void
+memstats_client_key_reset(int procNumber)
+{
+ LWLockAcquire(client_keys_lock, LW_EXCLUSIVE);
+
+ if (client_keys[procNumber] == MyProcNumber)
+ client_keys[procNumber] = -1;
+ LWLockRelease(client_keys_lock);
+}
+
+void
+MemoryContextKeysShmemInit(void)
+{
+ bool found;
+
+ client_keys = (int *)
+ ShmemInitStruct("MemoryContextKeys",
+ MemoryContextKeysShmemSize() + sizeof(LWLockPadded), &found);
+ client_keys_lock = (LWLock *) ((char *) client_keys + MemoryContextKeysShmemSize());
+
+ if (!found)
+ {
+ MemSet(client_keys, -1, MemoryContextKeysShmemSize());
+ LWLockInitialize(client_keys_lock, LWTRANCHE_MEMORY_CONTEXT_KEYS);
+ }
+}
+
+Size
+MemoryContextKeysShmemSize(void)
+{
+ Size sz = 0;
+ Size TotalProcs = 0;
+
+ TotalProcs = add_size(TotalProcs, NUM_AUXILIARY_PROCS);
+ TotalProcs = add_size(TotalProcs, MaxBackends);
+ sz = add_size(sz, mul_size(TotalProcs, sizeof(int)));
+
+ return sz;
+}
+
+/*
+ * HandleGetMemoryContextInterrupt
+ * Handle receipt of an interrupt indicating a request to publish memory
+ * contexts statistics.
+ *
+ * All the actual work is deferred to ProcessGetMemoryContextInterrupt() as
+ * this cannot be performed in a signal handler.
+ */
+void
+HandleGetMemoryContextInterrupt(void)
+{
+ InterruptPending = true;
+ PublishMemoryContextPending = true;
+ /* latch will be set by procsignal_sigusr1_handler */
+}
+
+/*
+ * ProcessGetMemoryContextInterrupt
+ * Generate information about memory contexts used by the process.
+ *
+ * Performs a breadth-first search on the memory context tree, so that a
+ * parent's statistics are reported before its children's in the monitoring
+ * function output.
+ *
+ * Statistics for all the processes are shared via the same dynamic shared
+ * area. Individual statistics are tracked independently in
+ * per-process DSA pointers. These pointers are stored in a dshash table
+ * keyed by the requesting client's ProcNumber.
+ *
+ * We calculate the maximum number of contexts whose statistics can be
+ * displayed using a pre-determined limit on the memory available per process
+ * for this utility and the maximum size of the statistics for each context.
+ * The remaining contexts' statistics, if any, are captured as a cumulative
+ * total at the end of the individual contexts' statistics.
+ *
+ * If summary is true, we capture the statistics of the level 1 and level 2
+ * contexts. For that we traverse the memory context tree depth-first to
+ * cover all the children of a parent context, so that we can display a
+ * cumulative total of the memory consumed by each level 2 context and all
+ * of its children.
+ */
+void
+ProcessGetMemoryContextInterrupt(void)
+{
+ List *contexts;
+ HASHCTL ctl;
+ HTAB *context_id_lookup;
+ int context_id = 0;
+ MemoryStatsEntry *meminfo;
+ MemoryContextCounters stat;
+ int num_individual_stats = 0;
+ bool found;
+ MemoryStatsDSHashEntry *entry;
+ char key[CLIENT_KEY_SIZE];
+ int clientProcNumber;
+ MemoryContext memstats_ctx = NULL;
+ MemoryContext oldcontext = NULL;
+
+ PublishMemoryContextPending = false;
+
+ /*
+ * Retrieve the client key for publishing statistics and reset it to -1,
+ * so other clients can request memory statistics from this process
+ */
+ LWLockAcquire(client_keys_lock, LW_EXCLUSIVE);
+ Assert(client_keys[MyProcNumber] != -1);
+ clientProcNumber = client_keys[MyProcNumber];
+ client_keys[MyProcNumber] = -1;
+ LWLockRelease(client_keys_lock);
+
+ /*
+ * Create a new memory context which is not a part of TopMemoryContext
+ * tree. This context is used to allocate all memory in this function.
+ * This helps in keeping the memory allocation in this function to report
+ * memory consumption statistics separate. So that it does not affect the
+ * output of this function.
+ */
+ memstats_ctx = AllocSetContextCreate((MemoryContext) NULL, "publish_memory_context_statistics",
+ ALLOCSET_SMALL_SIZES);
+ oldcontext = MemoryContextSwitchTo(memstats_ctx);
+
+ /*
+ * The hash table is used for constructing "path" column of the view,
+ * similar to its local backend counterpart.
+ */
+ ctl.keysize = sizeof(MemoryContext);
+ ctl.entrysize = sizeof(MemoryStatsContextId);
+ ctl.hcxt = CurrentMemoryContext;
+
+ context_id_lookup = hash_create("pg_get_remote_backend_memory_contexts",
+ 256,
+ &ctl,
+ HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+
+ /* List of contexts to process in the next round - start at the top. */
+ contexts = list_make1(TopMemoryContext);
+
+ /*
+ * If DSA exists, created by another process requesting statistics, attach
+ * to it. We expect the client process to create required DSA and Dshash
+ * table.
+ */
+ if (MemoryStatsDsaArea == NULL)
+ MemoryStatsDsaArea = GetNamedDSA("memory_context_statistics_dsa",
+ &found);
+
+ if (MemoryStatsDsHash == NULL)
+ MemoryStatsDsHash = GetNamedDSHash("memory_context_statistics_dshash",
+ &memctx_dsh_params, &found);
+
+
+ snprintf(key, CLIENT_KEY_SIZE, "%d", clientProcNumber);
+
+ /*
+ * The entry lock is held by dshash_find_or_insert to protect writes to
+ * process specific memory. Two different processes publishing statistics
+ * do not block each other.
+ */
+ INJECTION_POINT("memcontext-server-crash", NULL);
+ entry = dshash_find_or_insert(MemoryStatsDsHash, key, &found);
+
+ /*
+ * Entry has been deleted due to client process exit. Make sure that the
+ * client always deletes the entry after taking required lock or this
+ * function may end up writing to unallocated memory.
+ */
+ if (!found)
+ {
+ entry->stats_written = false;
+ end_memorycontext_reporting(entry, oldcontext, context_id_lookup);
+ return;
+ }
+
+ /*
+ * The client has timed out waiting for us to write statistics and is
+ * requesting statistics from some other process
+ */
+ if (entry->target_server_id != MyProcPid)
+ {
+ entry->stats_written = false;
+ end_memorycontext_reporting(entry, oldcontext, context_id_lookup);
+ return;
+ }
+
+ /* Should be allocated by a client backend that is requesting statistics */
+ Assert(entry->memstats_dsa_pointer != InvalidDsaPointer);
+ meminfo = (MemoryStatsEntry *)
+ dsa_get_address(MemoryStatsDsaArea, entry->memstats_dsa_pointer);
+
+ if (entry->summary)
+ {
+ int cxt_id = 0;
+ List *path = NIL;
+ MemoryStatsContextId *contextid_entry;
+
+ /* Copy TopMemoryContext statistics to DSA */
+ memset(&stat, 0, sizeof(stat));
+ (*TopMemoryContext->methods->stats) (TopMemoryContext, NULL, NULL,
+ &stat, true);
+ path = lcons_int(1, path);
+ PublishMemoryContext(meminfo, cxt_id, TopMemoryContext, path, stat,
+ 1);
+
+ contextid_entry = (MemoryStatsContextId *) hash_search(context_id_lookup,
+ &TopMemoryContext,
+ HASH_ENTER, &found);
+ Assert(!found);
+
+ /*
+ * context id starts with 1
+ */
+ contextid_entry->context_id = cxt_id + 1;
+
+ /*
+ * Copy statistics for each of TopMemoryContext's children. This
+ * includes statistics of at most 100 children per node, with each
+ * child node limited to a depth of 100 in its subtree.
+ */
+ for (MemoryContext c = TopMemoryContext->firstchild; c != NULL;
+ c = c->nextchild)
+ {
+ MemoryContextCounters grand_totals;
+ int num_contexts = 0;
+
+ path = NIL;
+ memset(&grand_totals, 0, sizeof(grand_totals));
+
+ cxt_id++;
+ contextid_entry = (MemoryStatsContextId *) hash_search(context_id_lookup,
+ &c, HASH_ENTER, &found);
+ Assert(!found);
+ contextid_entry->context_id = cxt_id + 1;
+
+ MemoryContextStatsCounter(c, &grand_totals, &num_contexts);
+
+ path = compute_context_path(c, context_id_lookup);
+
+ PublishMemoryContext(meminfo, cxt_id, c, path,
+ grand_totals, num_contexts);
+ }
+ entry->total_stats = cxt_id + 1;
+
+ entry->stats_written = true;
+ end_memorycontext_reporting(entry, oldcontext, context_id_lookup);
+ /* Notify waiting client backend and return */
+ ConditionVariableSignal(&entry->memcxt_cv);
+ return;
+ }
+ foreach_ptr(MemoryContextData, cur, contexts)
+ {
+ List *path = NIL;
+ MemoryStatsContextId *contextid_entry;
+
+ contextid_entry = (MemoryStatsContextId *) hash_search(context_id_lookup,
+ &cur,
+ HASH_ENTER, &found);
+ Assert(!found);
+
+ /*
+ * context id starts with 1
+ */
+ contextid_entry->context_id = context_id + 1;
+
+ /*
+ * Figure out the transient context_id of this context and each of its
+ * ancestors, to compute a path for this context.
+ */
+ path = compute_context_path(cur, context_id_lookup);
+
+ /* Examine the context stats */
+ memset(&stat, 0, sizeof(stat));
+ (*cur->methods->stats) (cur, NULL, NULL, &stat, true);
+
+ /* Account for saving one statistics slot for cumulative reporting */
+ if (context_id < (MAX_MEMORY_CONTEXT_STATS_NUM - 1))
+ {
+ /* Copy statistics to DSA memory */
+ PublishMemoryContext(meminfo, context_id, cur, path, stat, 1);
+ }
+ else
+ {
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].totalspace += stat.totalspace;
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].nblocks += stat.nblocks;
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].freespace += stat.freespace;
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].freechunks += stat.freechunks;
+ }
+
+ /*
+ * The per-process DSA limit is reached, so write an aggregate of the
+ * remaining statistics.
+ *
+ * Array slots run from 0 to max_stats - 1. The last slot, max_stats - 1,
+ * is reserved for the cumulative "Remaining Totals" entry, so the last
+ * individually reported context is the one with context_id equal to
+ * max_stats - 2.
+ */
+ if (context_id == (MAX_MEMORY_CONTEXT_STATS_NUM - 2))
+ {
+ int namelen = strlen("Remaining Totals");
+
+ num_individual_stats = context_id + 1;
+ strlcpy(meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].name,
+ "Remaining Totals", namelen + 1);
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].ident[0] = '\0';
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].path[0] = 0;
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].type = 0;
+ }
+ context_id++;
+
+ for (MemoryContext c = cur->firstchild; c != NULL; c = c->nextchild)
+ contexts = lappend(contexts, c);
+ }
+
+ /*
+ * Statistics are not aggregated, i.e. only individual statistics are
+ * reported, when context_id <= max_stats.
+ */
+ if (context_id <= MAX_MEMORY_CONTEXT_STATS_NUM)
+ {
+ entry->total_stats = context_id;
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].num_agg_stats = 1;
+ }
+ /* Report number of aggregated memory contexts */
+ else
+ {
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].num_agg_stats = context_id
+ - num_individual_stats;
+
+ /*
+ * Total stats equals num_individual_stats + 1 record for cumulative
+ * statistics.
+ */
+ entry->total_stats = num_individual_stats + 1;
+ }
+ entry->stats_written = true;
+ end_memorycontext_reporting(entry, oldcontext, context_id_lookup);
+ /* Notify waiting client backend and return */
+ ConditionVariableSignal(&entry->memcxt_cv);
+}
+
+/*
+ * Clean up before exit from ProcessGetMemoryContextInterrupt
+ */
+static void
+end_memorycontext_reporting(MemoryStatsDSHashEntry *entry,
+ MemoryContext oldcontext, HTAB *context_id_lookup)
+{
+ MemoryContext curr_ctx = CurrentMemoryContext;
+
+ dshash_release_lock(MemoryStatsDsHash, entry);
+
+ hash_destroy(context_id_lookup);
+ MemoryContextSwitchTo(oldcontext);
+ MemoryContextDelete(curr_ctx);
+}
+
+/*
+ * compute_context_path
+ *
+ * Append the transient context_id of this context and each of its ancestors
+ * to a list, in order to compute a path.
+ */
+static List *
+compute_context_path(MemoryContext c, HTAB *context_id_lookup)
+{
+ bool found;
+ List *path = NIL;
+ MemoryContext cur_context;
+
+ for (cur_context = c; cur_context != NULL; cur_context = cur_context->parent)
+ {
+ MemoryStatsContextId *cur_entry;
+
+ cur_entry = hash_search(context_id_lookup, &cur_context, HASH_FIND, &found);
+
+ if (!found)
+ elog(ERROR, "hash table corrupted, can't construct path value");
+
+ path = lcons_int(cur_entry->context_id, path);
+ }
+
+ return path;
+}
+
+/*
+ * PublishMemoryContext
+ *
+ * Copy the memory context statistics of a single context to a DSA memory
+ */
+static void
+PublishMemoryContext(MemoryStatsEntry *memcxt_info, int curr_id,
+ MemoryContext context, List *path,
+ MemoryContextCounters stat, int num_contexts)
+{
+ const char *ident = context->ident;
+ const char *name = context->name;
+
+ /*
+ * To be consistent with logging output, we label dynahash contexts with
+ * just the hash table name as with MemoryContextStatsPrint().
+ */
+ if (context->ident && strncmp(context->name, "dynahash", 8) == 0)
+ {
+ name = context->ident;
+ ident = NULL;
+ }
+
+ if (name != NULL)
+ {
+ int namelen = strlen(name);
+
+ if (namelen >= MEMORY_CONTEXT_NAME_SHMEM_SIZE)
+ namelen = pg_mbcliplen(name, namelen,
+ MEMORY_CONTEXT_NAME_SHMEM_SIZE - 1);
+
+ strlcpy(memcxt_info[curr_id].name, name, namelen + 1);
+ }
+ else
+ /* Clearing the array */
+ memcxt_info[curr_id].name[0] = '\0';
+
+ /* Trim and copy the identifier if it is not set to NULL */
+ if (ident != NULL)
+ {
+ int idlen = strlen(ident);
+
+ /*
+ * Some identifiers such as SQL query string can be very long,
+ * truncate oversize identifiers.
+ */
+ if (idlen >= MEMORY_CONTEXT_IDENT_SHMEM_SIZE)
+ idlen = pg_mbcliplen(ident, idlen,
+ MEMORY_CONTEXT_IDENT_SHMEM_SIZE - 1);
+
+ strlcpy(memcxt_info[curr_id].ident, ident, idlen + 1);
+ }
+ else
+ memcxt_info[curr_id].ident[0] = '\0';
+
+ /* Copy the path, truncated to MAX_PATH_DISPLAY_LENGTH entries */
+ if (path == NIL)
+ memcxt_info[curr_id].path[0] = 0;
+ else
+ {
+ int levels = Min(list_length(path), MAX_PATH_DISPLAY_LENGTH);
+
+ memcxt_info[curr_id].path_length = levels;
+ memcxt_info[curr_id].levels = list_length(path);
+
+ foreach_int(i, path)
+ {
+ memcxt_info[curr_id].path[foreach_current_index(i)] = i;
+ if (--levels == 0)
+ break;
+ }
+ }
+ memcxt_info[curr_id].type = context->type;
+ memcxt_info[curr_id].totalspace = stat.totalspace;
+ memcxt_info[curr_id].nblocks = stat.nblocks;
+ memcxt_info[curr_id].freespace = stat.freespace;
+ memcxt_info[curr_id].freechunks = stat.freechunks;
+ memcxt_info[curr_id].num_agg_stats = num_contexts;
+}
+
+void
+AtProcExit_memstats_cleanup(int code, Datum arg)
+{
+ int idx = MyProcNumber;
+ MemoryStatsDSHashEntry *entry;
+ char key[CLIENT_KEY_SIZE];
+ bool found;
+
+ if (MemoryStatsDsHash != NULL)
+ {
+ snprintf(key, CLIENT_KEY_SIZE, "%d", idx);
+ entry = dshash_find_or_insert(MemoryStatsDsHash, key, &found);
+
+ if (found)
+ {
+ if (MemoryStatsDsaArea != NULL &&
+ DsaPointerIsValid(entry->memstats_dsa_pointer))
+ dsa_free(MemoryStatsDsaArea, entry->memstats_dsa_pointer);
+ }
+ dshash_delete_entry(MemoryStatsDsHash, entry);
+ }
+ LWLockAcquire(client_keys_lock, LW_EXCLUSIVE);
+ client_keys[idx] = -1;
+ LWLockRelease(client_keys_lock);
+}
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index d31cb45a058..92b0446b80c 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -39,6 +39,7 @@ volatile sig_atomic_t TransactionTimeoutPending = false;
volatile sig_atomic_t IdleSessionTimeoutPending = false;
volatile sig_atomic_t ProcSignalBarrierPending = false;
volatile sig_atomic_t LogMemoryContextPending = false;
+volatile sig_atomic_t PublishMemoryContextPending = false;
volatile sig_atomic_t IdleStatsUpdateTimeoutPending = false;
volatile uint32 InterruptHoldoffCount = 0;
volatile uint32 QueryCancelHoldoffCount = 0;
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 98f9598cd78..202403ebc63 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -658,6 +658,13 @@ BaseInit(void)
* drop ephemeral slots, which in turn triggers stats reporting.
*/
ReplicationSlotInitialize();
+
+ /*
+ * The before shmem exit callback frees the DSA memory occupied by the
+ * latest memory context statistics that could be published by this proc
+ * if requested.
+ */
+ before_shmem_exit(AtProcExit_memstats_cleanup, 0);
}
diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index 47fd774c7d2..31c4de9f0b4 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -1008,6 +1008,37 @@ MemoryContextStatsInternal(MemoryContext context, int level,
}
}
+
+/*
+ * MemoryContextStatsCounter
+ *
+ * Accumulate statistics counts into *totals. totals should not be NULL.
+ * This involves a non-recursive tree traversal.
+ */
+void
+MemoryContextStatsCounter(MemoryContext context, MemoryContextCounters *totals,
+ int *num_contexts)
+{
+ int ichild = 1;
+
+ *num_contexts = 0;
+ context->methods->stats(context, NULL, NULL, totals, false);
+
+ for (MemoryContext curr = context->firstchild;
+ curr != NULL;
+ curr = MemoryContextTraverseNext(curr, context))
+ {
+ curr->methods->stats(curr, NULL, NULL, totals, false);
+ ichild++;
+ }
+
+ /*
+ * Add the count of all the children contexts which are traversed
+ * including the parent.
+ */
+ *num_contexts = *num_contexts + ichild;
+}
+
/*
* MemoryContextStatsPrint
* Print callback used by MemoryContextStatsInternal
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 66af2d96d67..5713bbb1550 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8617,6 +8617,16 @@
prorettype => 'bool', proargtypes => 'int4',
prosrc => 'pg_log_backend_memory_contexts' },
+# publishing memory contexts of the specified postgres process
+{ oid => '2173', descr => 'publish memory contexts of the specified backend',
+ proname => 'pg_get_process_memory_contexts', provolatile => 'v',
+ prorows => '100', proretset => 't', proparallel => 'r',
+ prorettype => 'record', proargtypes => 'int4 bool',
+ proallargtypes => '{int4,bool,text,text,text,int4,_int4,int8,int8,int8,int8,int8,int4}',
+ proargmodes => '{i,i,o,o,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{pid, summary, name, ident, type, level, path, total_bytes, total_nblocks, free_bytes, free_chunks, used_bytes, num_agg_contexts}',
+ prosrc => 'pg_get_process_memory_contexts' },
+
# non-persistent series generator
{ oid => '1066', descr => 'non-persistent series generator',
proname => 'generate_series', prorows => '1000',
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 9a7d733ddef..b76f24baed6 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -96,6 +96,7 @@ extern PGDLLIMPORT volatile sig_atomic_t IdleSessionTimeoutPending;
extern PGDLLIMPORT volatile sig_atomic_t ProcSignalBarrierPending;
extern PGDLLIMPORT volatile sig_atomic_t LogMemoryContextPending;
extern PGDLLIMPORT volatile sig_atomic_t IdleStatsUpdateTimeoutPending;
+extern PGDLLIMPORT volatile sig_atomic_t PublishMemoryContextPending;
extern PGDLLIMPORT volatile sig_atomic_t CheckClientConnectionPending;
extern PGDLLIMPORT volatile sig_atomic_t ClientConnectionLost;
diff --git a/src/include/storage/lwlocklist.h b/src/include/storage/lwlocklist.h
index 5b0ce383408..613e769c84e 100644
--- a/src/include/storage/lwlocklist.h
+++ b/src/include/storage/lwlocklist.h
@@ -136,3 +136,4 @@ PG_LWLOCKTRANCHE(SUBTRANS_SLRU, SubtransSLRU)
PG_LWLOCKTRANCHE(XACT_SLRU, XactSLRU)
PG_LWLOCKTRANCHE(PARALLEL_VACUUM_DSA, ParallelVacuumDSA)
PG_LWLOCKTRANCHE(AIO_URING_COMPLETION, AioUringCompletion)
+PG_LWLOCKTRANCHE(MEMORY_CONTEXT_KEYS, MemoryContextReportingKeys)
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index afeeb1ca019..345d5a0ecb1 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -35,6 +35,7 @@ typedef enum
PROCSIG_WALSND_INIT_STOPPING, /* ask walsenders to prepare for shutdown */
PROCSIG_BARRIER, /* global barrier interrupt */
PROCSIG_LOG_MEMORY_CONTEXT, /* ask backend to log the memory contexts */
+ PROCSIG_GET_MEMORY_CONTEXT, /* ask backend to send the memory contexts */
PROCSIG_PARALLEL_APPLY_MESSAGE, /* Message from parallel apply workers */
/* Recovery conflict reasons */
diff --git a/src/include/utils/memutils.h b/src/include/utils/memutils.h
index 7bbe5a36959..4296667cbf0 100644
--- a/src/include/utils/memutils.h
+++ b/src/include/utils/memutils.h
@@ -18,7 +18,7 @@
#define MEMUTILS_H
#include "nodes/memnodes.h"
-
+#include "utils/dsa.h"
/*
* MaxAllocSize, MaxAllocHugeSize
@@ -48,7 +48,6 @@
#define AllocHugeSizeIsValid(size) ((Size) (size) <= MaxAllocHugeSize)
-
/*
* Standard top-level memory contexts.
*
@@ -319,4 +318,11 @@ pg_memory_is_all_zeros(const void *ptr, size_t len)
return true;
}
+extern void ProcessGetMemoryContextInterrupt(void);
+extern void HandleGetMemoryContextInterrupt(void);
+extern void MemoryContextKeysShmemInit(void);
+extern Size MemoryContextKeysShmemSize(void);
+extern void MemoryContextStatsCounter(MemoryContext context, MemoryContextCounters *totals,
+ int *num_contexts);
+extern void AtProcExit_memstats_cleanup(int code, Datum arg);
#endif /* MEMUTILS_H */
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 3b37fafa65b..21c65ad2d10 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -233,3 +233,22 @@ select * from pg_timezone_abbrevs where abbrev = 'LMT';
LMT | @ 7 hours 52 mins 58 secs ago | f
(1 row)
+DO $$
+DECLARE
+ bg_writer_pid int;
+ r RECORD;
+BEGIN
+ SELECT pid from pg_stat_activity where backend_type='background writer'
+ INTO bg_writer_pid;
+
+ select type, name, ident
+ from pg_get_process_memory_contexts(bg_writer_pid, false)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+ select type, name, ident
+ from pg_get_process_memory_contexts(pg_backend_pid(), false)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+END $$;
+NOTICE: (AllocSet,TopMemoryContext,)
+NOTICE: (AllocSet,TopMemoryContext,)
diff --git a/src/test/regress/sql/sysviews.sql b/src/test/regress/sql/sysviews.sql
index 66179f026b3..c9da4fc8c90 100644
--- a/src/test/regress/sql/sysviews.sql
+++ b/src/test/regress/sql/sysviews.sql
@@ -101,3 +101,21 @@ select count(distinct utc_offset) >= 24 as ok from pg_timezone_abbrevs;
-- One specific case we can check without much fear of breakage
-- is the historical local-mean-time value used for America/Los_Angeles.
select * from pg_timezone_abbrevs where abbrev = 'LMT';
+
+DO $$
+DECLARE
+ bg_writer_pid int;
+ r RECORD;
+BEGIN
+ SELECT pid from pg_stat_activity where backend_type='background writer'
+ INTO bg_writer_pid;
+
+ select type, name, ident
+ from pg_get_process_memory_contexts(bg_writer_pid, false)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+ select type, name, ident
+ from pg_get_process_memory_contexts(pg_backend_pid(), false)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+END $$;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index cf3f6a7dafd..aa898f025c0 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1686,9 +1686,11 @@ MemoryContextCallback
MemoryContextCallbackFunction
MemoryContextCounters
MemoryContextData
-MemoryContextId
MemoryContextMethodID
MemoryContextMethods
+MemoryStatsContextId
+MemoryStatsEntry
+MemoryStatsDSHashEntry
MemoryStatsPrintFunc
MergeAction
MergeActionState
--
2.39.3 (Apple Git-146)
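
For anyone following the slot accounting in ProcessGetMemoryContextInterrupt() in the patch above, here is a minimal standalone sketch of how I read the "Remaining Totals" logic. It is not code from the patch: MAX_STATS, StatSlot and publish_stats() are hypothetical stand-ins for MAX_MEMORY_CONTEXT_STATS_NUM, MemoryStatsEntry and the publishing loop, and the explicit zeroing of the slots is a choice made in this sketch only.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical small limit; stands in for MAX_MEMORY_CONTEXT_STATS_NUM. */
#define MAX_STATS 4

/* Simplified stand-in for MemoryStatsEntry (illustrative fields only). */
typedef struct StatSlot
{
	char		name[32];
	long		totalspace;
	int			num_agg_stats;
} StatSlot;

/*
 * Publish n per-context sizes into slots[MAX_STATS].
 *
 * Slots 0 .. MAX_STATS - 2 hold individual contexts.  The last slot is
 * reserved for the cumulative entry and is only used when there are more
 * contexts than individual slots.  Returns the number of valid slots, the
 * analogue of entry->total_stats.
 */
static int
publish_stats(const long *sizes, int n, StatSlot *slots)
{
	assert(n >= 0);

	/* Zero every slot so the aggregate slot starts from a known state. */
	memset(slots, 0, sizeof(StatSlot) * MAX_STATS);

	for (int i = 0; i < n; i++)
	{
		if (i < MAX_STATS - 1)
		{
			/* Individual report: one context per slot. */
			snprintf(slots[i].name, sizeof(slots[i].name), "context %d", i);
			slots[i].totalspace = sizes[i];
			slots[i].num_agg_stats = 1;
		}
		else
		{
			/* Out of individual slots: fold into the cumulative entry. */
			StatSlot   *agg = &slots[MAX_STATS - 1];

			agg->totalspace += sizes[i];
			agg->num_agg_stats++;
		}
	}

	/* Everything fit into individual slots, nothing was aggregated. */
	if (n < MAX_STATS)
		return n;

	snprintf(slots[MAX_STATS - 1].name, sizeof(slots[MAX_STATS - 1].name),
			 "Remaining Totals");
	return MAX_STATS;
}
```

With MAX_STATS set to 4 and six contexts of sizes 10, 20, 30, 40, 50 and 60, the first three are reported individually and the fourth slot carries the total of the remaining three, with num_agg_stats equal to 3.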
v43-0002-Test-module-to-test-memory-context-reporting-wit.patch
From 00c38517db21893543f0af5b0c2f4eccc0a23658 Mon Sep 17 00:00:00 2001
From: Rahila Syed <rahilasyed.90@gmail.com>
Date: Fri, 28 Nov 2025 14:46:38 +0530
Subject: [PATCH v43 2/3] Test module to test memory context reporting with
injection points
---
src/test/modules/Makefile | 1 +
.../test_memcontext_reporting/Makefile | 32 ++++++
.../t/001_memcontext_inj.pl | 58 ++++++++++
.../test_memcontext_reporting--1.0.sql | 7 ++
.../test_memcontext_reporting.c | 102 ++++++++++++++++++
.../test_memcontext_reporting.control | 4 +
6 files changed, 204 insertions(+)
create mode 100644 src/test/modules/test_memcontext_reporting/Makefile
create mode 100644 src/test/modules/test_memcontext_reporting/t/001_memcontext_inj.pl
create mode 100644 src/test/modules/test_memcontext_reporting/test_memcontext_reporting--1.0.sql
create mode 100644 src/test/modules/test_memcontext_reporting/test_memcontext_reporting.c
create mode 100644 src/test/modules/test_memcontext_reporting/test_memcontext_reporting.control
diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile
index d079b91b1a2..1ed0cdc66b3 100644
--- a/src/test/modules/Makefile
+++ b/src/test/modules/Makefile
@@ -32,6 +32,7 @@ SUBDIRS = \
test_json_parser \
test_lfind \
test_lwlock_tranches \
+ test_memcontext_reporting \
test_misc \
test_oat_hooks \
test_parser \
diff --git a/src/test/modules/test_memcontext_reporting/Makefile b/src/test/modules/test_memcontext_reporting/Makefile
new file mode 100644
index 00000000000..01a7baa0263
--- /dev/null
+++ b/src/test/modules/test_memcontext_reporting/Makefile
@@ -0,0 +1,32 @@
+# src/test/modules/test_memcontext_reporting/Makefile
+
+EXTRA_INSTALL = src/test/modules/injection_points
+
+export enable_injection_points
+MODULE_big = test_memcontext_reporting
+OBJS = \
+ $(WIN32RES) \
+ test_memcontext_reporting.o
+PGFILEDESC = "test_memcontext_reporting - test code for memory context reporting"
+
+EXTENSION = test_memcontext_reporting
+DATA = test_memcontext_reporting--1.0.sql
+
+REGRESS = test_memcontext_reporting
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/test_memcontext_reporting
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
+
+check:
+ $(prove_check)
+
+installcheck:
+ $(prove_installcheck)
diff --git a/src/test/modules/test_memcontext_reporting/t/001_memcontext_inj.pl b/src/test/modules/test_memcontext_reporting/t/001_memcontext_inj.pl
new file mode 100644
index 00000000000..69d8489eb37
--- /dev/null
+++ b/src/test/modules/test_memcontext_reporting/t/001_memcontext_inj.pl
@@ -0,0 +1,58 @@
+# Copyright (c) 2025, PostgreSQL Global Development Group
+
+# Test suite for testing memory context statistics reporting
+
+use strict;
+use warnings FATAL => 'all';
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+if ($ENV{enable_injection_points} ne 'yes')
+{
+ plan skip_all => 'Injection points not supported by this build';
+}
+my $psql_err;
+# Create and start a cluster with one node
+my $node = PostgreSQL::Test::Cluster->new('main');
+$node->init(allows_streaming => 1);
+# Set max_connections explicitly and turn log_statement off, since statement
+# logging would otherwise add a lot of noise to the server log while the
+# test runs.
+$node->append_conf(
+ 'postgresql.conf',
+ qq[
+max_connections = 100
+log_statement = none
+]);
+$node->start;
+$node->safe_psql('postgres', 'CREATE EXTENSION test_memcontext_reporting;');
+$node->safe_psql('postgres', 'CREATE EXTENSION injection_points;');
+# Attaching to a client process injection point that throws an error
+$node->safe_psql('postgres', "select injection_points_attach('memcontext-client-crash', 'error');");
+
+my $pid = $node->safe_psql('postgres', "SELECT pid from pg_stat_activity where backend_type='checkpointer'");
+# Record the checkpointer PID in the test log
+note "checkpointer PID: $pid";
+
+#Client should have thrown error
+$node->psql('postgres', qq(select pg_get_process_memory_contexts($pid, true);), stderr => \$psql_err);
+like ( $psql_err, qr/error triggered for injection point memcontext-client-crash/);
+
+#Query the same process after detaching the injection point, using some other client and it should succeed.
+$node->safe_psql('postgres', "select injection_points_detach('memcontext-client-crash');");
+my $topcontext_name = $node->safe_psql('postgres', "select name from pg_get_process_memory_contexts($pid, true) where path = '{1}';");
+ok($topcontext_name eq 'TopMemoryContext');
+
+# Attaching to a target process injection point that throws an error
+$node->safe_psql('postgres', "select injection_points_attach('memcontext-server-crash', 'error');");
+
+#Server should have thrown error
+$node->psql('postgres', qq(select pg_get_process_memory_contexts($pid, true);), stderr => \$psql_err);
+
+#Query the same process after detaching the injection point, using some other client and it should succeed.
+$node->safe_psql('postgres', "select injection_points_detach('memcontext-server-crash');");
+$topcontext_name = $node->safe_psql('postgres', "select name from pg_get_process_memory_contexts($pid, true) where path = '{1}';");
+ok($topcontext_name eq 'TopMemoryContext');
+done_testing();
diff --git a/src/test/modules/test_memcontext_reporting/test_memcontext_reporting--1.0.sql b/src/test/modules/test_memcontext_reporting/test_memcontext_reporting--1.0.sql
new file mode 100644
index 00000000000..4f787cf789e
--- /dev/null
+++ b/src/test/modules/test_memcontext_reporting/test_memcontext_reporting--1.0.sql
@@ -0,0 +1,7 @@
+CREATE FUNCTION memcontext_crash_server()
+RETURNS pg_catalog.void
+AS 'MODULE_PATHNAME' LANGUAGE C;
+
+CREATE FUNCTION memcontext_crash_client()
+RETURNS pg_catalog.void
+AS 'MODULE_PATHNAME' LANGUAGE C;
diff --git a/src/test/modules/test_memcontext_reporting/test_memcontext_reporting.c b/src/test/modules/test_memcontext_reporting/test_memcontext_reporting.c
new file mode 100644
index 00000000000..774ae7df49d
--- /dev/null
+++ b/src/test/modules/test_memcontext_reporting/test_memcontext_reporting.c
@@ -0,0 +1,102 @@
+/*
+ * -------------------------------------------------------------------------
+ *
+ * Copyright (c) 2025, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/test/modules/test_memcontext_reporting/test_memcontext_reporting.c
+ *
+ * -------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+#include "utils/injection_point.h"
+#include "funcapi.h"
+#include "utils/injection_point.h"
+#include "storage/dsm_registry.h"
+
+PG_MODULE_MAGIC;
+
+extern PGDLLEXPORT void crash(const char *name, const void *private_data, void *arg);
+
+void
+crash(const char *name, const void *private_data, void *arg)
+{
+ abort();
+}
+
+/*
+ * memcontext_crash_client
+ *
+ * Ensure that the client process aborts in between memory context
+ * reporting.
+ */
+PG_FUNCTION_INFO_V1(memcontext_crash_client);
+Datum
+memcontext_crash_client(PG_FUNCTION_ARGS)
+{
+#ifdef USE_INJECTION_POINTS
+ InjectionPointAttach("memcontext-client-crash",
+ "test_memcontext_reporting", "crash", NULL, 0);
+
+#else
+ elog(ERROR,
+ "test is not working as intended when injection points are disabled");
+#endif
+ PG_RETURN_VOID();
+}
+
+PG_FUNCTION_INFO_V1(memcontext_detach_client);
+Datum
+memcontext_detach_client(PG_FUNCTION_ARGS)
+{
+#ifdef USE_INJECTION_POINTS
+ InjectionPointDetach("memcontext-client-crash");
+
+#else
+ elog(ERROR,
+ "test is not working as intended when injection points are disabled");
+#endif
+ PG_RETURN_VOID();
+}
+
+/*
+ * memcontext_crash_server
+ *
+ * Ensure that the server process crashes in between memory context
+ * reporting.
+ */
+PG_FUNCTION_INFO_V1(memcontext_crash_server);
+Datum
+memcontext_crash_server(PG_FUNCTION_ARGS)
+{
+#ifdef USE_INJECTION_POINTS
+ InjectionPointAttach("memcontext-server-crash",
+ "test_memcontext_reporting", "crash", NULL, 0);
+
+#else
+ elog(ERROR,
+ "test is not working as intended when injection points are disabled");
+#endif
+ PG_RETURN_VOID();
+}
+
+/*
+ * memcontext_detach_server
+ *
+ * Detach the injection point which crashes the server
+ * reporting.
+ */
+PG_FUNCTION_INFO_V1(memcontext_detach_server);
+Datum
+memcontext_detach_server(PG_FUNCTION_ARGS)
+{
+#ifdef USE_INJECTION_POINTS
+ InjectionPointDetach("memcontext-server-crash");
+
+#else
+ elog(ERROR,
+ "test is not working as intended when injection points are disabled");
+#endif
+ PG_RETURN_VOID();
+}
diff --git a/src/test/modules/test_memcontext_reporting/test_memcontext_reporting.control b/src/test/modules/test_memcontext_reporting/test_memcontext_reporting.control
new file mode 100644
index 00000000000..48b501682d5
--- /dev/null
+++ b/src/test/modules/test_memcontext_reporting/test_memcontext_reporting.control
@@ -0,0 +1,4 @@
+comment = 'Test code for memcontext reporting'
+default_version = '1.0'
+module_pathname = '$libdir/test_memcontext_reporting'
+relocatable = true
--
2.39.3 (Apple Git-146)
Attachment: v43-0003-Review-comments.patch (application/octet-stream)
From 257b7d87aaa55c415e205afad9ba6ae90e8d6195 Mon Sep 17 00:00:00 2001
From: Daniel Gustafsson <daniel@yesql.se>
Date: Mon, 8 Dec 2025 10:38:43 +0100
Subject: [PATCH v43 3/3] Review comments
---
doc/src/sgml/func/func-admin.sgml | 4 +-
src/backend/catalog/system_functions.sql | 14 +
src/backend/catalog/system_views.sql | 5 -
src/backend/utils/adt/mcxtfuncs.c | 314 +++++++++---------
src/include/utils/memutils.h | 2 +-
.../t/001_memcontext_inj.pl | 42 ++-
src/tools/pgindent/typedefs.list | 2 +-
7 files changed, 195 insertions(+), 188 deletions(-)
diff --git a/doc/src/sgml/func/func-admin.sgml b/doc/src/sgml/func/func-admin.sgml
index a5c66837241..3e71ced60a2 100644
--- a/doc/src/sgml/func/func-admin.sgml
+++ b/doc/src/sgml/func/func-admin.sgml
@@ -257,7 +257,7 @@
<indexterm>
<primary>pg_get_process_memory_contexts</primary>
</indexterm>
- <function>pg_get_process_memory_contexts</function> ( <parameter>pid</parameter> <type>integer</type>, <parameter>summary</parameter> <type>boolean</type> )
+ <function>pg_get_process_memory_contexts</function> ( <parameter>pid</parameter> <type>integer</type> <optional>,<parameter>summary</parameter> <type>boolean</type> <literal>DEFAULT</literal> <literal>false</literal></optional> )
<returnvalue>setof record</returnvalue>
( <parameter>name</parameter> <type>text</type>,
<parameter>ident</parameter> <type>text</type>,
@@ -360,7 +360,7 @@
Statistics for contexts on level 2 and below are aggregates of all
child contexts' statistics, where <literal>num_agg_contexts</literal>
indicate the number aggregated child contexts. When
- <parameter>summary</parameter> is <literal>false</literal>,
+ <parameter>summary</parameter> is <literal>false</literal> (the default),
<literal>the num_agg_contexts</literal> value is <literal>1</literal>,
indicating that individual statistics are being displayed.
</para>
diff --git a/src/backend/catalog/system_functions.sql b/src/backend/catalog/system_functions.sql
index 2d946d6d9e9..7b40bac5f57 100644
--- a/src/backend/catalog/system_functions.sql
+++ b/src/backend/catalog/system_functions.sql
@@ -657,6 +657,17 @@ LANGUAGE INTERNAL
STRICT VOLATILE PARALLEL UNSAFE
AS 'pg_replication_origin_session_setup';
+CREATE OR REPLACE FUNCTION
+ pg_get_process_memory_contexts(IN pid integer, IN summary boolean DEFAULT false,
+ OUT name text, OUT ident text, OUT type text, OUT level integer,
+ OUT path integer[], OUT total_bytes bigint, OUT total_nblocks bigint,
+ OUT free_bytes bigint, OUT free_chunks bigint, OUT used_bytes bigint,
+ OUT num_agg_contexts integer)
+RETURNS SETOF RECORD
+LANGUAGE INTERNAL
+STRICT VOLATILE PARALLEL UNSAFE
+AS 'pg_get_process_memory_contexts';
+
--
-- The default permissions for functions mean that anyone can execute them.
-- A number of functions shouldn't be executable by just anyone, but rather
@@ -782,6 +793,7 @@ REVOKE EXECUTE ON FUNCTION pg_ls_logicalmapdir() FROM PUBLIC;
REVOKE EXECUTE ON FUNCTION pg_ls_replslotdir(text) FROM PUBLIC;
+REVOKE EXECUTE ON FUNCTION pg_get_process_memory_contexts(integer, boolean) FROM PUBLIC;
--
-- We also set up some things as accessible to standard roles.
--
@@ -808,6 +820,8 @@ GRANT EXECUTE ON FUNCTION pg_current_logfile() TO pg_monitor;
GRANT EXECUTE ON FUNCTION pg_current_logfile(text) TO pg_monitor;
+GRANT EXECUTE ON FUNCTION pg_get_process_memory_contexts(integer, boolean) TO pg_read_all_stats;
+
GRANT pg_read_all_settings TO pg_monitor;
GRANT pg_read_all_stats TO pg_monitor;
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 17ec512622b..086c4c8fb6f 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -692,11 +692,6 @@ GRANT SELECT ON pg_backend_memory_contexts TO pg_read_all_stats;
REVOKE EXECUTE ON FUNCTION pg_get_backend_memory_contexts() FROM PUBLIC;
GRANT EXECUTE ON FUNCTION pg_get_backend_memory_contexts() TO pg_read_all_stats;
-REVOKE EXECUTE ON FUNCTION
- pg_get_process_memory_contexts(integer, boolean) FROM PUBLIC;
-GRANT EXECUTE ON FUNCTION
- pg_get_process_memory_contexts(integer, boolean) TO pg_read_all_stats;
-
-- Statistics views
CREATE VIEW pg_stat_all_tables AS
diff --git a/src/backend/utils/adt/mcxtfuncs.c b/src/backend/utils/adt/mcxtfuncs.c
index c661eef7ae9..f1b5c3a0887 100644
--- a/src/backend/utils/adt/mcxtfuncs.c
+++ b/src/backend/utils/adt/mcxtfuncs.c
@@ -49,8 +49,11 @@
*/
#define MAX_MEMORY_CONTEXT_STATS_NUM MEMORY_CONTEXT_REPORT_MAX_PER_BACKEND / (sizeof(MemoryStatsEntry))
-/* Size of dshash key */
-#define CLIENT_KEY_SIZE 32
+/*
+ * Size of dshash key. The key is a uint32 rendered as a string, 10 chars
+ * plus space for a NULL terminator can hold all the values.
+ */
+#define CLIENT_KEY_SIZE (10 + 1)
/* Dynamic shared memory state for reporting statistics per context */
typedef struct MemoryStatsEntry
@@ -92,8 +95,7 @@ static const dshash_parameters memctx_dsh_params = {
};
/*
- * These are used for reporting memory context
- * statistics of a process.
+ * These are used for reporting memory context statistics of a process.
*/
/* Lock to control access to client_keys array */
@@ -103,10 +105,9 @@ static LWLock *client_keys_lock = NULL;
static int *client_keys = NULL;
/*
- * Table to store pointers to dsa memory containing
- * memory statistics and other meta data. There is one
- * entry per client backend request, keyed by ProcNumber of
- * the client obtained from client_keys array above.
+ * Table to store pointers to DSA memory containing memory statistics and other
+ * metadata. There is one entry per client backend request, keyed by ProcNumber
+ * of the client obtained from client_keys array above.
*/
static dshash_table *MemoryStatsDsHash = NULL;
@@ -125,8 +126,6 @@ static void PublishMemoryContext(MemoryStatsEntry *memcxt_info,
MemoryContextCounters stat,
int num_contexts);
static List *compute_context_path(MemoryContext c, HTAB *context_id_lookup);
-static void end_memorycontext_reporting(MemoryStatsDSHashEntry *entry, MemoryContext oldcontext,
- HTAB *context_id_lookup);
/* ----------
* The max bytes for showing identifiers of MemoryContext.
@@ -140,15 +139,14 @@ static void end_memorycontext_reporting(MemoryStatsDSHashEntry *entry, MemoryCon
#define MEMORY_STATS_MAX_TIMEOUT 5
/*
- * MemoryStatsContextId
- * Used for storage of transient identifiers for
- * pg_get_backend_memory_contexts and the likes.
+ * MemoryContextId
+ * Used for storage of transient identifiers for memory context reporting
*/
-typedef struct MemoryStatsContextId
+typedef struct MemoryContextId
{
MemoryContext context;
int context_id;
-} MemoryStatsContextId;
+} MemoryContextId;
/*
* int_list_to_array
@@ -199,7 +197,7 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
*/
for (MemoryContext cur = context; cur != NULL; cur = cur->parent)
{
- MemoryStatsContextId *entry;
+ MemoryContextId *entry;
bool found;
entry = hash_search(context_id_lookup, &cur, HASH_FIND, &found);
@@ -314,7 +312,7 @@ pg_get_backend_memory_contexts(PG_FUNCTION_ARGS)
HTAB *context_id_lookup;
ctl.keysize = sizeof(MemoryContext);
- ctl.entrysize = sizeof(MemoryStatsContextId);
+ ctl.entrysize = sizeof(MemoryContextId);
ctl.hcxt = CurrentMemoryContext;
context_id_lookup = hash_create("pg_get_backend_memory_contexts",
@@ -341,7 +339,7 @@ pg_get_backend_memory_contexts(PG_FUNCTION_ARGS)
foreach_ptr(MemoryContextData, cur, contexts)
{
- MemoryStatsContextId *entry;
+ MemoryContextId *entry;
bool found;
/*
@@ -349,8 +347,8 @@ pg_get_backend_memory_contexts(PG_FUNCTION_ARGS)
* PutMemoryContextsStatsTupleStore needs this to populate the "path"
* column with the parent context_ids.
*/
- entry = (MemoryStatsContextId *) hash_search(context_id_lookup, &cur,
- HASH_ENTER, &found);
+ entry = (MemoryContextId *) hash_search(context_id_lookup, &cur,
+ HASH_ENTER, &found);
entry->context_id = context_id++;
Assert(!found);
@@ -437,10 +435,8 @@ pg_log_backend_memory_contexts(PG_FUNCTION_ARGS)
* wait for the results and display them.
*
* By default, only superusers or users with ROLE_PG_READ_ALL_STATS are allowed
- * to signal a process to return the memory contexts. This is because allowing
- * any users to issue this request at an unbounded rate would cause lots of
- * requests to be sent, which can lead to denial of service. Additional roles
- * can be permitted with GRANT.
+ * to signal a process to return the memory contexts. Additional roles can be
+ * permitted with GRANT.
*
* On receipt of this signal, a backend or an auxiliary process sets the flag
* in the signal handler, which causes the next CHECK_FOR_INTERRUPTS()
@@ -453,14 +449,14 @@ pg_log_backend_memory_contexts(PG_FUNCTION_ARGS)
* at the end of the buffer.
*
* After sending the signal, wait on a condition variable. The publishing
- * backend, after copying the data to shared memory, sends signal on that
+ * backend, after copying the data to shared memory, sends a signal on that
* condition variable. There is one condition variable per client process.
* Once the condition variable is signalled, check if the latest memory context
* information is available and display.
*
* If the publishing backend does not respond before the condition variable
- * times out, which is set to a predefined value MEMORY_STATS_MAX_TIMEOUT, give up
- * and return NULL.
+ * times out, which is set to a predefined value MEMORY_STATS_MAX_TIMEOUT, give
+ * up and return NULL.
*/
Datum
pg_get_process_memory_contexts(PG_FUNCTION_ARGS)
@@ -495,12 +491,8 @@ pg_get_process_memory_contexts(PG_FUNCTION_ARGS)
*/
if (proc == NULL)
{
- /*
- * This is a warning because we don't want to break loops.
- */
ereport(WARNING,
- errmsg("PID %d is not a PostgreSQL server process",
- pid));
+ errmsg("PID %d is not a PostgreSQL server process", pid));
PG_RETURN_NULL();
}
@@ -508,7 +500,7 @@ pg_get_process_memory_contexts(PG_FUNCTION_ARGS)
/*
* Create a DSA to allocate memory for copying memory contexts statistics.
- * Allocate the memory in the DSA and send dsa pointer to the server
+ * Allocate the memory in the DSA and send DSA pointer to the server
* process for storing the context statistics. If number of contexts
* exceed a predefined limit (1MB), a cumulative total is stored for such
* contexts.
@@ -521,8 +513,8 @@ pg_get_process_memory_contexts(PG_FUNCTION_ARGS)
&found);
/*
- * The dsa pointers containing statistics for each client are stored in a
- * dshash table. In addition to dsa pointer, each entry in this table also
+ * The DSA pointers containing statistics for each client are stored in a
+ * dshash table. In addition to DSA pointer, each entry in this table also
* contains information about the statistics, condition variable for
* signalling between client and the server and miscellaneous data
* specific to a request. There is one entry per client request in the
@@ -838,9 +830,9 @@ HandleGetMemoryContextInterrupt(void)
* output.
*
* Statistics for all the processes are shared via the same dynamic shared
- * area. Individual statistics are tracked independently in
- * per-process DSA pointers. These pointers are stored in a dshash table with
- * key as requesting clients ProcNumber.
+ * area. Individual statistics are tracked independently in per-process DSA
+ * pointers. These pointers are stored in a dshash table with key as requesting
+ * clients ProcNumber.
*
* We calculate maximum number of context's statistics that can be displayed
* using a pre-determined limit for memory available per process for this
@@ -848,11 +840,11 @@ HandleGetMemoryContextInterrupt(void)
* context statistics if any are captured as a cumulative total at the end of
* individual context's statistics.
*
- * If summary is true, we capture the level 1 and level 2 contexts
- * statistics. For that we traverse the memory context tree recursively in
- * depth first search manner to cover all the children of a parent context, to
- * be able to display a cumulative total of memory consumption by a parent at
- * level 2 and all its children.
+ * If summary is true, we capture the level 1 and level 2 contexts statistics.
+ * For that we traverse the memory context tree recursively in depth first
+ * search manner to cover all the children of a parent context, to be able to
+ * display a cumulative total of memory consumption by a parent at level 2 and
+ * all its children.
*/
void
ProcessGetMemoryContextInterrupt(void)
@@ -899,7 +891,7 @@ ProcessGetMemoryContextInterrupt(void)
* similar to its local backend counterpart.
*/
ctl.keysize = sizeof(MemoryContext);
- ctl.entrysize = sizeof(MemoryStatsContextId);
+ ctl.entrysize = sizeof(MemoryContextId);
ctl.hcxt = CurrentMemoryContext;
context_id_lookup = hash_create("pg_get_remote_backend_memory_contexts",
@@ -912,7 +904,7 @@ ProcessGetMemoryContextInterrupt(void)
/*
* If DSA exists, created by another process requesting statistics, attach
- * to it. We expect the client process to create required DSA and Dshash
+ * to it. We expect the client process to create required DSA and DSHash
* table.
*/
if (MemoryStatsDsaArea == NULL)
@@ -923,7 +915,6 @@ ProcessGetMemoryContextInterrupt(void)
MemoryStatsDsHash = GetNamedDSHash("memory_context_statistics_dshash",
&memctx_dsh_params, &found);
-
snprintf(key, CLIENT_KEY_SIZE, "%d", clientProcNumber);
/*
@@ -935,25 +926,24 @@ ProcessGetMemoryContextInterrupt(void)
entry = dshash_find_or_insert(MemoryStatsDsHash, key, &found);
/*
- * Entry has been deleted due to client process exit. Make sure that the
- * client always deletes the entry after taking required lock or this
- * function may end up writing to unallocated memory.
+ * Check if the entry has been deleted due to calling process exiting, or
+ * if the caller has timed out waiting for us and have issued a request to
+ * another backend.
+ *
+ * XXX ?: Make sure that the client always deletes the entry after taking
+ * required lock or this function may end up writing to unallocated
+ * memory.
*/
- if (!found)
+ if (!found || entry->target_server_id != MyProcPid)
{
entry->stats_written = false;
- end_memorycontext_reporting(entry, oldcontext, context_id_lookup);
- return;
- }
- /*
- * The client has timed out waiting for us to write statistics and is
- * requesting statistics from some other process
- */
- if (entry->target_server_id != MyProcPid)
- {
- entry->stats_written = false;
- end_memorycontext_reporting(entry, oldcontext, context_id_lookup);
+ dshash_release_lock(MemoryStatsDsHash, entry);
+
+ hash_destroy(context_id_lookup);
+ MemoryContextSwitchTo(oldcontext);
+ MemoryContextReset(memstats_ctx);
+
return;
}
@@ -966,7 +956,7 @@ ProcessGetMemoryContextInterrupt(void)
{
int cxt_id = 0;
List *path = NIL;
- MemoryStatsContextId *contextid_entry;
+ MemoryContextId *contextid_entry;
/* Copy TopMemoryContext statistics to DSA */
memset(&stat, 0, sizeof(stat));
@@ -976,9 +966,9 @@ ProcessGetMemoryContextInterrupt(void)
PublishMemoryContext(meminfo, cxt_id, TopMemoryContext, path, stat,
1);
- contextid_entry = (MemoryStatsContextId *) hash_search(context_id_lookup,
- &TopMemoryContext,
- HASH_ENTER, &found);
+ contextid_entry = (MemoryContextId *) hash_search(context_id_lookup,
+ &TopMemoryContext,
+ HASH_ENTER, &found);
Assert(!found);
/*
@@ -1001,8 +991,8 @@ ProcessGetMemoryContextInterrupt(void)
memset(&grand_totals, 0, sizeof(grand_totals));
cxt_id++;
- contextid_entry = (MemoryStatsContextId *) hash_search(context_id_lookup,
- &c, HASH_ENTER, &found);
+ contextid_entry = (MemoryContextId *) hash_search(context_id_lookup,
+ &c, HASH_ENTER, &found);
Assert(!found);
contextid_entry->context_id = cxt_id + 1;
@@ -1014,119 +1004,111 @@ ProcessGetMemoryContextInterrupt(void)
grand_totals, num_contexts);
}
entry->total_stats = cxt_id + 1;
-
- entry->stats_written = true;
- end_memorycontext_reporting(entry, oldcontext, context_id_lookup);
- /* Notify waiting client backend and return */
- ConditionVariableSignal(&entry->memcxt_cv);
- return;
}
- foreach_ptr(MemoryContextData, cur, contexts)
+ else
{
- List *path = NIL;
- MemoryStatsContextId *contextid_entry;
+ foreach_ptr(MemoryContextData, cur, contexts)
+ {
+ List *path = NIL;
+ MemoryContextId *contextid_entry;
- contextid_entry = (MemoryStatsContextId *) hash_search(context_id_lookup,
- &cur,
- HASH_ENTER, &found);
- Assert(!found);
+ contextid_entry = (MemoryContextId *) hash_search(context_id_lookup,
+ &cur,
+ HASH_ENTER, &found);
+ Assert(!found);
- /*
- * context id starts with 1
- */
- contextid_entry->context_id = context_id + 1;
+ /*
+ * context id starts with 1
+ */
+ contextid_entry->context_id = context_id + 1;
+
+ /*
+ * Figure out the transient context_id of this context and each of
+ * its ancestors, to compute a path for this context.
+ */
+ path = compute_context_path(cur, context_id_lookup);
+
+ /* Examine the context stats */
+ memset(&stat, 0, sizeof(stat));
+ (*cur->methods->stats) (cur, NULL, NULL, &stat, true);
+
+ /* Account for saving one statistics slot for cumulative reporting */
+ if (context_id < (MAX_MEMORY_CONTEXT_STATS_NUM - 1))
+ {
+ /* Copy statistics to DSA memory */
+ PublishMemoryContext(meminfo, context_id, cur, path, stat, 1);
+ }
+ else
+ {
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].totalspace += stat.totalspace;
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].nblocks += stat.nblocks;
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].freespace += stat.freespace;
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].freechunks += stat.freechunks;
+ }
+
+ /*
+ * DSA max limit per process is reached, write aggregate of the
+ * remaining statistics.
+ *
+ * We can store contexts from 0 to max_stats - 1. When context_id
+ * is greater than max_stats, we stop reporting individual
+ * statistics when context_id equals max_stats - 2. As we use
+ * max_stats - 1 array slot for reporting cumulative statistics or
+ * "Remaining Totals".
+ */
+ if (context_id == (MAX_MEMORY_CONTEXT_STATS_NUM - 2))
+ {
+ int namelen = strlen("Remaining Totals");
+
+ num_individual_stats = context_id + 1;
+ strlcpy(meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].name,
+ "Remaining Totals", namelen + 1);
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].ident[0] = '\0';
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].path[0] = 0;
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].type = 0;
+ }
+ context_id++;
+
+ for (MemoryContext c = cur->firstchild; c != NULL; c = c->nextchild)
+ contexts = lappend(contexts, c);
+ }
/*
- * Figure out the transient context_id of this context and each of its
- * ancestors, to compute a path for this context.
+ * Check if there are aggregated statistics or not in the result set.
+ * Statistics are individually reported when context_id <= max_stats,
+ * only if context_id > max_stats will there be aggregates.
*/
- path = compute_context_path(cur, context_id_lookup);
-
- /* Examine the context stats */
- memset(&stat, 0, sizeof(stat));
- (*cur->methods->stats) (cur, NULL, NULL, &stat, true);
-
- /* Account for saving one statistics slot for cumulative reporting */
- if (context_id < (MAX_MEMORY_CONTEXT_STATS_NUM - 1))
+ if (context_id <= MAX_MEMORY_CONTEXT_STATS_NUM)
{
- /* Copy statistics to DSA memory */
- PublishMemoryContext(meminfo, context_id, cur, path, stat, 1);
- }
- else
- {
- meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].totalspace += stat.totalspace;
- meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].nblocks += stat.nblocks;
- meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].freespace += stat.freespace;
- meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].freechunks += stat.freechunks;
+ entry->total_stats = context_id;
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].num_agg_stats = 1;
}
/*
- * DSA max limit per process is reached, write aggregate of the
- * remaining statistics.
- *
- * We can store contexts from 0 to max_stats - 1. When context_id is
- * greater than max_stats, we stop reporting individual statistics
- * when context_id equals max_stats - 2. As we use max_stats - 1 array
- * slot for reporting cumulative statistics or "Remaining Totals".
+ * The number of contexts exceeded the space available, so report the
+ * number of aggregated memory contexts
*/
- if (context_id == (MAX_MEMORY_CONTEXT_STATS_NUM - 2))
+ else
{
- int namelen = strlen("Remaining Totals");
-
- num_individual_stats = context_id + 1;
- strlcpy(meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].name,
- "Remaining Totals", namelen + 1);
- meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].ident[0] = '\0';
- meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].path[0] = 0;
- meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].type = 0;
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].num_agg_stats =
+ context_id - num_individual_stats;
+
+ /*
+ * Total stats equals num_individual_stats + 1 record for
+ * cumulative statistics.
+ */
+ entry->total_stats = num_individual_stats + 1;
}
- context_id++;
-
- for (MemoryContext c = cur->firstchild; c != NULL; c = c->nextchild)
- contexts = lappend(contexts, c);
}
- /*
- * Statistics are not aggregated, i.e individual statistics reported when
- * context_id <= max_stats.
- */
- if (context_id <= MAX_MEMORY_CONTEXT_STATS_NUM)
- {
- entry->total_stats = context_id;
- meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].num_agg_stats = 1;
- }
- /* Report number of aggregated memory contexts */
- else
- {
- meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].num_agg_stats = context_id
- - num_individual_stats;
-
- /*
- * Total stats equals num_individual_stats + 1 record for cumulative
- * statistics.
- */
- entry->total_stats = num_individual_stats + 1;
- }
entry->stats_written = true;
- end_memorycontext_reporting(entry, oldcontext, context_id_lookup);
- /* Notify waiting client backend and return */
- ConditionVariableSignal(&entry->memcxt_cv);
-}
-
-/*
- * Clean up before exit from ProcessGetMemoryContextInterrupt
- */
-static void
-end_memorycontext_reporting(MemoryStatsDSHashEntry *entry,
- MemoryContext oldcontext, HTAB *context_id_lookup)
-{
- MemoryContext curr_ctx = CurrentMemoryContext;
-
dshash_release_lock(MemoryStatsDsHash, entry);
-
hash_destroy(context_id_lookup);
+
MemoryContextSwitchTo(oldcontext);
- MemoryContextReset(curr_ctx);
+ MemoryContextReset(memstats_ctx);
+ /* Notify waiting client backend and return */
+ ConditionVariableSignal(&entry->memcxt_cv);
}
/*
@@ -1144,7 +1126,7 @@ compute_context_path(MemoryContext c, HTAB *context_id_lookup)
for (cur_context = c; cur_context != NULL; cur_context = cur_context->parent)
{
- MemoryStatsContextId *cur_entry;
+ MemoryContextId *cur_entry;
cur_entry = hash_search(context_id_lookup, &cur_context, HASH_FIND, &found);
@@ -1167,8 +1149,8 @@ PublishMemoryContext(MemoryStatsEntry *memcxt_info, int curr_id,
MemoryContext context, List *path,
MemoryContextCounters stat, int num_contexts)
{
- const char *ident = context->ident;
- const char *name = context->name;
+ char *ident = unconstify(char *, context->ident);
+ char *name = unconstify(char *, context->name);
/*
* To be consistent with logging output, we label dynahash contexts with
@@ -1176,7 +1158,7 @@ PublishMemoryContext(MemoryStatsEntry *memcxt_info, int curr_id,
*/
if (context->ident && strncmp(context->name, "dynahash", 8) == 0)
{
- name = context->ident;
+ name = unconstify(char *, context->ident);
ident = NULL;
}
@@ -1184,7 +1166,7 @@ PublishMemoryContext(MemoryStatsEntry *memcxt_info, int curr_id,
{
int namelen = strlen(name);
- if (strlen(name) >= MEMORY_CONTEXT_NAME_SHMEM_SIZE)
+ if (namelen >= MEMORY_CONTEXT_NAME_SHMEM_SIZE)
namelen = pg_mbcliplen(name, namelen,
MEMORY_CONTEXT_NAME_SHMEM_SIZE - 1);
@@ -1212,7 +1194,7 @@ PublishMemoryContext(MemoryStatsEntry *memcxt_info, int curr_id,
else
memcxt_info[curr_id].ident[0] = '\0';
- /* Allocate DSA memory for storing path information */
+ /* Store the path */
if (path == NIL)
memcxt_info[curr_id].path[0] = 0;
else
diff --git a/src/include/utils/memutils.h b/src/include/utils/memutils.h
index 4296667cbf0..617de0ebf91 100644
--- a/src/include/utils/memutils.h
+++ b/src/include/utils/memutils.h
@@ -18,7 +18,6 @@
#define MEMUTILS_H
#include "nodes/memnodes.h"
-#include "utils/dsa.h"
/*
* MaxAllocSize, MaxAllocHugeSize
@@ -48,6 +47,7 @@
#define AllocHugeSizeIsValid(size) ((Size) (size) <= MaxAllocHugeSize)
+
/*
* Standard top-level memory contexts.
*
diff --git a/src/test/modules/test_memcontext_reporting/t/001_memcontext_inj.pl b/src/test/modules/test_memcontext_reporting/t/001_memcontext_inj.pl
index 69d8489eb37..8fa12d1f693 100644
--- a/src/test/modules/test_memcontext_reporting/t/001_memcontext_inj.pl
+++ b/src/test/modules/test_memcontext_reporting/t/001_memcontext_inj.pl
@@ -11,7 +11,7 @@ use Test::More;
if ($ENV{enable_injection_points} ne 'yes')
{
- plan skip_all => 'Injection points not supported by this build';
+ plan skip_all => 'Injection points not supported by this build';
}
my $psql_err;
# Create and start a cluster with one node
@@ -21,8 +21,8 @@ $node->init(allows_streaming => 1);
# and log_statement is dialled down since it otherwise will generate enormous
# amounts of logging. Page verification failures are still logged.
$node->append_conf(
- 'postgresql.conf',
- qq[
+ 'postgresql.conf',
+ qq[
max_connections = 100
log_statement = none
]);
@@ -30,29 +30,45 @@ $node->start;
$node->safe_psql('postgres', 'CREATE EXTENSION test_memcontext_reporting;');
$node->safe_psql('postgres', 'CREATE EXTENSION injection_points;');
# Attaching to a client process injection point that throws an error
-$node->safe_psql('postgres', "select injection_points_attach('memcontext-client-crash', 'error');");
+$node->safe_psql('postgres',
+ "select injection_points_attach('memcontext-client-crash', 'error');");
-my $pid = $node->safe_psql('postgres', "SELECT pid from pg_stat_activity where backend_type='checkpointer'");
+my $pid = $node->safe_psql('postgres',
+ "SELECT pid from pg_stat_activity where backend_type='checkpointer'");
print "PID";
print $pid;
#Client should have thrown error
-$node->psql('postgres', qq(select pg_get_process_memory_contexts($pid, true);), stderr => \$psql_err);
-like ( $psql_err, qr/error triggered for injection point memcontext-client-crash/);
+$node->psql(
+ 'postgres',
+ qq(select pg_get_process_memory_contexts($pid, true);),
+ stderr => \$psql_err);
+like($psql_err,
+ qr/error triggered for injection point memcontext-client-crash/);
#Query the same process after detaching the injection point, using some other client and it should succeed.
-$node->safe_psql('postgres', "select injection_points_detach('memcontext-client-crash');");
-my $topcontext_name = $node->safe_psql('postgres', "select name from pg_get_process_memory_contexts($pid, true) where path = '{1}';");
+$node->safe_psql('postgres',
+ "select injection_points_detach('memcontext-client-crash');");
+my $topcontext_name = $node->safe_psql('postgres',
+ "select name from pg_get_process_memory_contexts($pid, true) where path = '{1}';"
+);
ok($topcontext_name = 'TopMemoryContext');
# Attaching to a target process injection point that throws an error
-$node->safe_psql('postgres', "select injection_points_attach('memcontext-server-crash', 'error');");
+$node->safe_psql('postgres',
+ "select injection_points_attach('memcontext-server-crash', 'error');");
#Server should have thrown error
-$node->psql('postgres', qq(select pg_get_process_memory_contexts($pid, true);), stderr => \$psql_err);
+$node->psql(
+ 'postgres',
+ qq(select pg_get_process_memory_contexts($pid, true);),
+ stderr => \$psql_err);
#Query the same process after detaching the injection point, using some other client and it should succeed.
-$node->safe_psql('postgres', "select injection_points_detach('memcontext-server-crash');");
-$topcontext_name = $node->safe_psql('postgres', "select name from pg_get_process_memory_contexts($pid, true) where path = '{1}';");
+$node->safe_psql('postgres',
+ "select injection_points_detach('memcontext-server-crash');");
+$topcontext_name = $node->safe_psql('postgres',
+ "select name from pg_get_process_memory_contexts($pid, true) where path = '{1}';"
+);
ok($topcontext_name = 'TopMemoryContext');
done_testing();
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index aa898f025c0..5e3122a468b 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1686,9 +1686,9 @@ MemoryContextCallback
MemoryContextCallbackFunction
MemoryContextCounters
MemoryContextData
+MemoryContextId
MemoryContextMethodID
MemoryContextMethods
-MemoryStatsContextId
MemoryStatsEntry
MemoryStatsDSHashEntry
MemoryStatsPrintFunc
--
2.39.3 (Apple Git-146)
Hi,
Sorry for the late response. Thank you for reviewing and testing the
patch.
On Mon, Dec 8, 2025 at 6:56 AM torikoshia <torikoshia@oss.nttdata.com>
wrote:
On 2025-11-28 18:22, Rahila Syed wrote:
Hi,
I'm attaching the updated patches, which primarily include cleanup and
have been rebased following the CFbot report.
Thanks for updating the patch!
I observed an assertion failure when forcing a timeout as follows:
Good catch. This assertion is no longer valid because of recent updates
that reset the client_keys slot for a request when the client exits with a
timeout.
To address this, I've replaced the assertion with a check for -1, and the
function now returns early in that case.
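As a rough illustration of that change (not the patch's actual code), the sketch below assumes a simplified stand-in for the client_keys array in which -1 marks a slot that the client reset after timing out. The names SLOT_UNUSED and get_requesting_client are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical marker: the client timed out (or exited) and reset its slot. */
#define SLOT_UNUSED (-1)

/*
 * Hypothetical helper. Returns true and stores the client's number in
 * *client if the slot still holds a live request. Returns false if the slot
 * was reset, so the caller can return early instead of tripping an
 * assertion on a state that is now legitimate.
 */
bool
get_requesting_client(const int *client_keys, size_t nslots, size_t slot,
                      int *client)
{
    /* Assertions remain only for genuine programming errors. */
    assert(client_keys != NULL && client != NULL);

    if (slot >= nslots || client_keys[slot] == SLOT_UNUSED)
        return false;           /* request withdrawn: nothing to publish */

    *client = client_keys[slot];
    return true;
}
```

The point of the design is that a reset slot is an expected runtime condition after a client timeout, so it is handled with an ordinary branch rather than an assertion.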
It might be good to also document in func-admin.sgml that the function
times out after 5 seconds when the target backend does not respond, and
that in such a case NULLs are returned.
Added this.
From the comment, it sounded to me as if the client executing
pg_get_process_memory_contexts() might not create the DSA in some cases.
Is it correct to assume that such a situation can happen?
In [1], as a response to concerns about using DSA inside a CFI handler,
you wrote that “all the dynamic shared memory needed to store the
statistics is created and deleted in the client function”.
So I understood that it would never create the DSA inside the CFI
handler.
If that understanding is correct, perhaps the comment should be reworded
to make that clear.
Yes, your understanding is correct. I reworded the comment accordingly.
+ context_id_lookup = hash_create("pg_get_remote_backend_memory_contexts",
This appears to use the old function name. Should this be updated to
"pg_get_process_memory_contexts" instead?
Modified this.
I will post the updated patch in response to Daniel's message that follows
your email.
Thank you,
Rahila Syed
Hi Daniel,
Thanks for the patch, below are a few comments and suggestions. As I was
reviewing I tweaked the below and have attached the comments as changes in
0003.
Thank you for the improvements.
All your changes look good to me. I have incorporated those in the v44
patch.
+ * Entry has been deleted due to client process exit. Make sure that the
+ * client always deletes the entry after taking required lock or this
+ * function may end up writing to unallocated memory.
Can you explain this a bit further, I'm not sure I get it. The code goes on to
release a lock immediately and then destroys the hash. Who is responsible for
destroying the entry?
This just points to the general requirements of taking a lock before
writing to a shared variable.
This serves as a warning to other processes not to delete the entry without
taking a lock, since
we are about to write to the entry.
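To make the rule concrete, here is a minimal sketch of "take the lock before writing or deleting". It uses a C11 atomic_flag as a toy lock rather than the dshash partition lock the patch relies on, and DemoEntry and the demo_* functions are hypothetical names:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical shared entry; "present" is false once the client deleted it. */
typedef struct DemoEntry
{
    atomic_flag lock;
    bool        present;
    int         total_stats;
} DemoEntry;

static void
demo_init_entry(DemoEntry *e)
{
    assert(e != NULL);
    atomic_flag_clear(&e->lock);
    e->present = true;
    e->total_stats = 0;
}

/* Toy spinlock, standing in for the real lock on the hash entry. */
static void
demo_lock(DemoEntry *e)
{
    while (atomic_flag_test_and_set(&e->lock))
        ;                       /* spin until the holder releases */
}

static void
demo_unlock(DemoEntry *e)
{
    atomic_flag_clear(&e->lock);
}

/* Writer: only touches the entry while holding the lock and if it still exists. */
static bool
demo_write_stats(DemoEntry *e, int total_stats)
{
    bool written = false;

    assert(e != NULL);
    demo_lock(e);
    if (e->present)
    {
        e->total_stats = total_stats;
        written = true;
    }
    demo_unlock(e);
    return written;
}

/* Deleter: must take the same lock, so it cannot race with a writer. */
static void
demo_delete_entry(DemoEntry *e)
{
    assert(e != NULL);
    demo_lock(e);
    e->present = false;
    demo_unlock(e);
}
```

Because both sides take the same lock, a writer either sees the entry and finishes its write before the delete, or sees that it is gone and skips the write.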
== In ProcessGetMemoryContextInterrupt()
I'm not a fan of having two exits from the function doing duplicative
cleanups;
in the attached I've wrapped them in a conditional to just have one exit
path.
What do you think about that?
I agree with your approach. It certainly makes the code more concise and
easier to read.
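The shape of that refactoring, as I understand it, is roughly the following. This is only a sketch with hypothetical names (DemoState, demo_process_request), not the code from 0003:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bookkeeping so the single cleanup path can be observed. */
typedef struct DemoState
{
    bool lock_held;
    int  cleanups;
} DemoState;

/*
 * Do the work only when all preconditions hold, then fall through to one
 * shared cleanup and one return, instead of repeating the cleanup before
 * each early return.
 */
static bool
demo_process_request(DemoState *st, bool entry_found, bool target_matches)
{
    bool published = false;

    assert(st != NULL);
    st->lock_held = true;       /* stands in for acquiring the entry lock */

    if (entry_found && target_matches)
    {
        /* ... publish the statistics here ... */
        published = true;
    }

    /* Single exit path: cleanup happens exactly once per call. */
    st->lock_held = false;
    st->cleanups++;
    return published;
}
```

Whichever branch is taken, the lock is released and the cleanup counter advances exactly once per call.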
== In PublishMemoryContext()
+ /* Allocate DSA memory for storing path information */
This comment is no longer accurate is it? The DSA has already been
allocated
at this point.
Yes, it is not valid anymore. Fixed accordingly.
Apart from this, I cleaned up the test module by removing unnecessary SQL
functions, added some more injection-point-based tests, and made a few
minor tweaks.
Please find attached updated and rebased patches.
Thank you,
Rahila Syed
Attachments:
v44-0001-Add-function-to-report-memory-context-statistics.patch (application/octet-stream)
From 4b20c9f6da3c9d92d3af85ddbb761843eb3d04b1 Mon Sep 17 00:00:00 2001
From: Rahila Syed <rahilasyed.90@gmail.com>
Date: Thu, 27 Nov 2025 14:39:43 +0530
Subject: [PATCH 1/2] Add function to report memory context statistics
This adds a function for retrieving memory context statistics
and information from backends as well as auxiliary processes.
The intended use case is cluster debugging when under memory
pressure or unanticipated memory usage characteristics.
When calling the function it sends a signal to the specified
process to submit statistics regarding its memory contexts
into dynamic shared memory. Each memory context is returned
in detail, followed by a cumulative total in case the number
of contexts exceeds the max allocated amount of shared memory.
Each process is limited to using at most 1MB of memory for this.
A summary can also be explicitly requested by the user; this
will return the TopMemoryContext and a cumulative total of
all lower contexts.
In order to not block on busy processes, we have hardcoded
the number of seconds during which to retry before timing out.
In the case where no statistics are published within the set
timeout, NULL is returned.
---
doc/src/sgml/func/func-admin.sgml | 159 +++
src/backend/catalog/system_functions.sql | 14 +
src/backend/postmaster/autovacuum.c | 4 +
src/backend/postmaster/checkpointer.c | 4 +
src/backend/postmaster/interrupt.c | 4 +
src/backend/postmaster/pgarch.c | 4 +
src/backend/postmaster/startup.c | 4 +
src/backend/postmaster/walsummarizer.c | 4 +
src/backend/storage/ipc/ipci.c | 3 +
src/backend/storage/ipc/procsignal.c | 3 +
src/backend/storage/lmgr/proc.c | 1 +
src/backend/tcop/postgres.c | 3 +
.../utils/activity/wait_event_names.txt | 2 +
src/backend/utils/adt/mcxtfuncs.c | 1005 ++++++++++++++++-
src/backend/utils/init/globals.c | 1 +
src/backend/utils/init/postinit.c | 7 +
src/backend/utils/mmgr/mcxt.c | 31 +
src/include/catalog/pg_proc.dat | 10 +
src/include/miscadmin.h | 1 +
src/include/storage/lwlocklist.h | 1 +
src/include/storage/procsignal.h | 1 +
src/include/utils/memutils.h | 8 +-
src/test/regress/expected/sysviews.out | 19 +
src/test/regress/sql/sysviews.sql | 18 +
src/tools/pgindent/typedefs.list | 2 +
25 files changed, 1292 insertions(+), 21 deletions(-)
diff --git a/doc/src/sgml/func/func-admin.sgml b/doc/src/sgml/func/func-admin.sgml
index 1b465bc8ba7..3e849a132e9 100644
--- a/doc/src/sgml/func/func-admin.sgml
+++ b/doc/src/sgml/func/func-admin.sgml
@@ -251,6 +251,132 @@
<literal>false</literal> is returned.
</para></entry>
</row>
+
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>pg_get_process_memory_contexts</primary>
+ </indexterm>
+ <function>pg_get_process_memory_contexts</function> ( <parameter>pid</parameter> <type>integer</type> <optional>,<parameter>summary</parameter> <type>boolean</type> <literal>DEFAULT</literal> <literal>false</literal></optional> )
+ <returnvalue>setof record</returnvalue>
+ ( <parameter>name</parameter> <type>text</type>,
+ <parameter>ident</parameter> <type>text</type>,
+ <parameter>type</parameter> <type>text</type>,
+ <parameter>level</parameter> <type>integer</type>,
+ <parameter>path</parameter> <type>integer[]</type>,
+ <parameter>total_bytes</parameter> <type>bigint</type>,
+ <parameter>total_nblocks</parameter> <type>bigint</type>,
+ <parameter>free_bytes</parameter> <type>bigint</type>,
+ <parameter>free_chunks</parameter> <type>bigint</type>,
+ <parameter>used_bytes</parameter> <type>bigint</type>,
+ <parameter>num_agg_contexts</parameter> <type>integer</type> )
+ </para>
+ <para>
+ This function handles requests to display the memory contexts of a
+ <productname>PostgreSQL</productname> process with the specified
+ process ID. The function can be used to send requests to backends as
+ well as <glossterm linkend="glossary-auxiliary-proc">auxiliary processes</glossterm>.
+ If the process does not respond with memory context statistics within 5 seconds,
+ the function returns <literal>NULL</literal>.
+ </para>
+ <para>
+ The returned record contains extended statistics for each memory
+ context:
+ <itemizedlist spacing="compact">
+ <listitem>
+ <para>
+ <parameter>name</parameter> - The name of the memory context.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>ident</parameter> - Memory context ID (if any).
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>type</parameter> - The type of memory context, possible
+ values are: AllocSet, Generation, Slab and Bump.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>level</parameter> - The level in the tree of the current
+ memory context.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>path</parameter> - Memory contexts are organized in a
+ tree model with TopMemoryContext as the root, and all other memory
+ contexts as nodes in the tree. The <parameter>path</parameter>
+ displays the path from the root to the current memory context. The
+ path is limited to 100 children per node, with each node limited
+ to a max depth of 100, to preserve memory during reporting. The
+ printed path will also be limited to 100 nodes counting from the
+ TopMemoryContext.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>total_bytes</parameter> - The total number of bytes
+ allocated to this memory context.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>total_nblocks</parameter> - The total number of blocks
+ used for the allocated memory.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>free_bytes</parameter> - The amount of free memory in
+ this memory context.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>free_chunks</parameter> - The number of chunks that
+ <parameter>free_bytes</parameter> corresponds to.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>used_bytes</parameter> - The total number of bytes
+ currently occupied.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>num_agg_contexts</parameter> - The number of memory
+ contexts aggregated in the displayed statistics.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+ <para>
+ When <parameter>summary</parameter> is <literal>true</literal>, statistics
+ for memory contexts at levels 1 and 2 are displayed, with level 1
+ representing the root node (i.e., <literal>TopMemoryContext</literal>).
+ Statistics for contexts on level 2 and below are aggregates of all
+ child contexts' statistics, where <literal>num_agg_contexts</literal>
+ indicates the number of aggregated child contexts. When
+ <parameter>summary</parameter> is <literal>false</literal> (the default),
+ the <literal>num_agg_contexts</literal> value is <literal>1</literal>,
+ indicating that individual statistics are being displayed.
+ </para>
+ <para>
+ After receiving memory context statistics from the target process, it
+ returns the results as one row per context. If all the contexts don't
+ fit within the pre-determined size limit, the remaining context
+ statistics are aggregated and a cumulative total is displayed. The
+ <literal>num_agg_contexts</literal> column indicates the number of
+ contexts aggregated in the displayed statistics. When
+ <literal>num_agg_contexts</literal> is <literal>1</literal> it means
+ that the context statistics are displayed separately.
+ </para></entry>
+ </row>
</tbody>
</tgroup>
</table>
@@ -302,6 +428,39 @@ LOG: Grand total: 1651920 bytes in 201 blocks; 622360 free (88 chunks); 1029560
because it may generate a large number of log messages.
</para>
+ <para>
+ <function>pg_get_process_memory_contexts</function> can be used to request
+ memory contexts statistics of any <productname>PostgreSQL</productname>
+ process. For example:
+<programlisting>
+postgres=# SELECT * FROM pg_get_process_memory_contexts(
+ (SELECT pid FROM pg_stat_activity
+ WHERE backend_type = 'checkpointer'),
+ false) LIMIT 1;
+-[ RECORD 1 ]----+------------------------------
+name | TopMemoryContext
+ident |
+type | AllocSet
+level | 1
+path | {1}
+total_bytes | 90304
+total_nblocks | 3
+free_bytes | 2880
+free_chunks | 1
+used_bytes | 87424
+num_agg_contexts | 1
+</programlisting>
+ <note>
+ <para>
+ While <function>pg_get_process_memory_contexts</function> can be used to
+ query memory contexts of the local backend,
+ <structname>pg_backend_memory_contexts</structname>
+ (see <xref linkend="view-pg-backend-memory-contexts"/> for more details)
+ will be less resource intensive when only the local backend is of interest.
+ </para>
+ </note>
+ </para>
+
</sect2>
<sect2 id="functions-admin-backup">
diff --git a/src/backend/catalog/system_functions.sql b/src/backend/catalog/system_functions.sql
index 2d946d6d9e9..7b40bac5f57 100644
--- a/src/backend/catalog/system_functions.sql
+++ b/src/backend/catalog/system_functions.sql
@@ -657,6 +657,17 @@ LANGUAGE INTERNAL
STRICT VOLATILE PARALLEL UNSAFE
AS 'pg_replication_origin_session_setup';
+CREATE OR REPLACE FUNCTION
+ pg_get_process_memory_contexts(IN pid integer, IN summary boolean DEFAULT false,
+ OUT name text, OUT ident text, OUT type text, OUT level integer,
+ OUT path integer[], OUT total_bytes bigint, OUT total_nblocks bigint,
+ OUT free_bytes bigint, OUT free_chunks bigint, OUT used_bytes bigint,
+ OUT num_agg_contexts integer)
+RETURNS SETOF RECORD
+LANGUAGE INTERNAL
+STRICT VOLATILE PARALLEL UNSAFE
+AS 'pg_get_process_memory_contexts';
+
--
-- The default permissions for functions mean that anyone can execute them.
-- A number of functions shouldn't be executable by just anyone, but rather
@@ -782,6 +793,7 @@ REVOKE EXECUTE ON FUNCTION pg_ls_logicalmapdir() FROM PUBLIC;
REVOKE EXECUTE ON FUNCTION pg_ls_replslotdir(text) FROM PUBLIC;
+REVOKE EXECUTE ON FUNCTION pg_get_process_memory_contexts(integer, boolean) FROM PUBLIC;
--
-- We also set up some things as accessible to standard roles.
--
@@ -808,6 +820,8 @@ GRANT EXECUTE ON FUNCTION pg_current_logfile() TO pg_monitor;
GRANT EXECUTE ON FUNCTION pg_current_logfile(text) TO pg_monitor;
+GRANT EXECUTE ON FUNCTION pg_get_process_memory_contexts(integer, boolean) TO pg_read_all_stats;
+
GRANT pg_read_all_settings TO pg_monitor;
GRANT pg_read_all_stats TO pg_monitor;
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 1bd3924e35e..baba657904b 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -791,6 +791,10 @@ ProcessAutoVacLauncherInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
/* Process sinval catchup interrupts that happened while sleeping */
ProcessCatchupInterrupt();
}
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index 7f8cf1fa2ec..749e68553b9 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -679,6 +679,10 @@ ProcessCheckpointerInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/postmaster/interrupt.c b/src/backend/postmaster/interrupt.c
index ba63b84dfc5..29454b8bf8a 100644
--- a/src/backend/postmaster/interrupt.c
+++ b/src/backend/postmaster/interrupt.c
@@ -48,6 +48,10 @@ ProcessMainLoopInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 3a65d841725..b89617d78db 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -871,6 +871,10 @@ ProcessPgArchInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
if (ConfigReloadPending)
{
char *archiveLib = pstrdup(XLogArchiveLibrary);
diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c
index 27e86cf393f..7149a67fcbc 100644
--- a/src/backend/postmaster/startup.c
+++ b/src/backend/postmaster/startup.c
@@ -192,6 +192,10 @@ ProcessStartupProcInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
diff --git a/src/backend/postmaster/walsummarizer.c b/src/backend/postmaster/walsummarizer.c
index e7e4d652f97..eb86648f7b7 100644
--- a/src/backend/postmaster/walsummarizer.c
+++ b/src/backend/postmaster/walsummarizer.c
@@ -879,6 +879,10 @@ ProcessWalSummarizerInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index b23d0c19360..a5ed58a18c5 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -52,6 +52,7 @@
#include "storage/sinvaladt.h"
#include "utils/guc.h"
#include "utils/injection_point.h"
+#include "utils/memutils.h"
/* GUCs */
int shared_memory_type = DEFAULT_SHARED_MEMORY_TYPE;
@@ -140,6 +141,7 @@ CalculateShmemSize(void)
size = add_size(size, SlotSyncShmemSize());
size = add_size(size, AioShmemSize());
size = add_size(size, WaitLSNShmemSize());
+ size = add_size(size, MemoryContextKeysShmemSize() + sizeof(LWLockPadded));
/* include additional requested shmem from preload libraries */
size = add_size(size, total_addin_request);
@@ -328,6 +330,7 @@ CreateOrAttachShmemStructs(void)
InjectionPointShmemInit();
AioShmemInit();
WaitLSNShmemInit();
+ MemoryContextKeysShmemInit();
}
/*
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index 087821311cc..8963285cc12 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -691,6 +691,9 @@ procsignal_sigusr1_handler(SIGNAL_ARGS)
if (CheckProcSignal(PROCSIG_LOG_MEMORY_CONTEXT))
HandleLogMemoryContextInterrupt();
+ if (CheckProcSignal(PROCSIG_GET_MEMORY_CONTEXT))
+ HandleGetMemoryContextInterrupt();
+
if (CheckProcSignal(PROCSIG_PARALLEL_APPLY_MESSAGE))
HandleParallelApplyMessageInterrupt();
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index ebc3f4ca457..27b3b51cf2d 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -51,6 +51,7 @@
#include "storage/procsignal.h"
#include "storage/spin.h"
#include "storage/standby.h"
+#include "utils/memutils.h"
#include "utils/timeout.h"
#include "utils/timestamp.h"
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 7dd75a490aa..e726f40dfbb 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3539,6 +3539,9 @@ ProcessInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
if (ParallelApplyMessagePending)
ProcessParallelApplyMessages();
}
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index f39830dbb34..3889228b1ed 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -162,6 +162,7 @@ WAL_RECEIVER_EXIT "Waiting for the WAL receiver to exit."
WAL_RECEIVER_WAIT_START "Waiting for startup process to send initial data for streaming replication."
WAL_SUMMARY_READY "Waiting for a new WAL summary to be generated."
XACT_GROUP_UPDATE "Waiting for the group leader to update transaction status at transaction end."
+MEM_CXT_PUBLISH "Waiting for a process to publish memory information."
ABI_compatibility:
@@ -405,6 +406,7 @@ SubtransSLRU "Waiting to access the sub-transaction SLRU cache."
XactSLRU "Waiting to access the transaction status SLRU cache."
ParallelVacuumDSA "Waiting for parallel vacuum dynamic shared memory allocation."
AioUringCompletion "Waiting for another process to complete IO via io_uring."
+MemoryContextReportingKeys "Waiting for another process to complete reading or writing the memory reporting keys."
# No "ABI_compatibility" region here as WaitEventLWLock has its own C code.
diff --git a/src/backend/utils/adt/mcxtfuncs.c b/src/backend/utils/adt/mcxtfuncs.c
index 46dfb3dd133..c3ec83bd1ed 100644
--- a/src/backend/utils/adt/mcxtfuncs.c
+++ b/src/backend/utils/adt/mcxtfuncs.c
@@ -17,22 +17,130 @@
#include "funcapi.h"
#include "mb/pg_wchar.h"
+#include "miscadmin.h"
+#include "storage/dsm_registry.h"
#include "storage/proc.h"
#include "storage/procarray.h"
#include "utils/array.h"
#include "utils/builtins.h"
#include "utils/hsearch.h"
+#include "utils/injection_point.h"
+#include "utils/memutils.h"
+#include "utils/wait_event_types.h"
+
+/*
+ * Memory Context reporting size limits.
+ */
+
+/* Max length of context name and ident, to keep it consistent
+ * with ProcessLogMemoryContext()
+ */
+#define MEMORY_CONTEXT_IDENT_SHMEM_SIZE 100
+#define MEMORY_CONTEXT_NAME_SHMEM_SIZE 100
+
+/* Maximum size (in bytes) of DSA area per process */
+#define MEMORY_CONTEXT_REPORT_MAX_PER_BACKEND ((size_t) (1 * 1024 * 1024))
+
+/*
+ * Maximum number of memory context statistics is calculated by dividing
+ * max memory allocated per backend with maximum size per context statistics.
+ * The identifier and name are statically allocated arrays of size 100 bytes.
+ * The path depth is limited to 100 like for memory context logging.
+ */
+#define MAX_MEMORY_CONTEXT_STATS_NUM (MEMORY_CONTEXT_REPORT_MAX_PER_BACKEND / sizeof(MemoryStatsEntry))
+
+/*
+ * Size of dshash key. The key is a uint32 rendered as a string, 10 chars
+ * plus space for a NUL terminator can hold all the values.
+ */
+#define CLIENT_KEY_SIZE (10 + 1)
+
+/* Dynamic shared memory state for reporting statistics per context */
+typedef struct MemoryStatsEntry
+{
+ char name[MEMORY_CONTEXT_NAME_SHMEM_SIZE];
+ char ident[MEMORY_CONTEXT_IDENT_SHMEM_SIZE];
+ int path[100];
+ NodeTag type;
+ int path_length;
+ int levels;
+ int64 totalspace;
+ int64 nblocks;
+ int64 freespace;
+ int64 freechunks;
+ int num_agg_stats;
+} MemoryStatsEntry;
+
+/*
+ * Per backend dynamic shared hash entry for memory context statistics
+ * reporting.
+ */
+typedef struct MemoryStatsDSHashEntry
+{
+ char key[64];
+ ConditionVariable memcxt_cv;
+ bool stats_written;
+ int target_server_id;
+ int total_stats;
+ bool summary;
+ dsa_pointer memstats_dsa_pointer;
+} MemoryStatsDSHashEntry;
+
+static const dshash_parameters memctx_dsh_params = {
+ offsetof(MemoryStatsDSHashEntry, memcxt_cv),
+ sizeof(MemoryStatsDSHashEntry),
+ dshash_strcmp,
+ dshash_strhash,
+ dshash_strcpy
+};
+
+/*
+ * These are used for reporting memory context statistics of a process.
+ */
+
+/* Lock to control access to client_keys array */
+static LWLock *client_keys_lock = NULL;
+
+/* Array to store the keys of MemoryStatsDsHash */
+static int *client_keys = NULL;
+
+/*
+ * Table to store pointers to DSA memory containing memory statistics and other
+ * metadata. There is one entry per client backend request, keyed by ProcNumber
+ * of the client obtained from client_keys array above.
+ */
+static dshash_table *MemoryStatsDsHash = NULL;
+
+/*
+ * Dsa area which stores the actual memory context
+ * statistics.
+ */
+static dsa_area *MemoryStatsDsaArea = NULL;
+
+static void memstats_dsa_cleanup(char *key);
+static void memstats_client_key_reset(int ProcNumber);
+static const char *ContextTypeToString(NodeTag type);
+static void PublishMemoryContext(MemoryStatsEntry *memcxt_info,
+ int curr_id, MemoryContext context,
+ List *path,
+ MemoryContextCounters stat,
+ int num_contexts);
+static List *compute_context_path(MemoryContext c, HTAB *context_id_lookup);
/* ----------
* The max bytes for showing identifiers of MemoryContext.
+ * This is used by pg_get_backend_memory_contexts - view used for local backend.
* ----------
*/
#define MEMORY_CONTEXT_IDENT_DISPLAY_SIZE 1024
+#define MAX_PATH_DISPLAY_LENGTH 100
+/* Timeout in seconds */
+#define MEMORY_STATS_MAX_TIMEOUT 5
+
/*
* MemoryContextId
- * Used for storage of transient identifiers for
- * pg_get_backend_memory_contexts.
+ * Used for storage of transient identifiers for memory context reporting
*/
typedef struct MemoryContextId
{
@@ -143,24 +251,7 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
else
nulls[1] = true;
- switch (context->type)
- {
- case T_AllocSetContext:
- type = "AllocSet";
- break;
- case T_GenerationContext:
- type = "Generation";
- break;
- case T_SlabContext:
- type = "Slab";
- break;
- case T_BumpContext:
- type = "Bump";
- break;
- default:
- type = "???";
- break;
- }
+ type = ContextTypeToString(context->type);
values[2] = CStringGetTextDatum(type);
values[3] = Int32GetDatum(list_length(path)); /* level */
@@ -175,6 +266,38 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
list_free(path);
}
+/*
+ * ContextTypeToString
+ * Returns a textual representation of a context type
+ *
+ * This should cover the same types as MemoryContextIsValid.
+ */
+const char *
+ContextTypeToString(NodeTag type)
+{
+ const char *context_type;
+
+ switch (type)
+ {
+ case T_AllocSetContext:
+ context_type = "AllocSet";
+ break;
+ case T_GenerationContext:
+ context_type = "Generation";
+ break;
+ case T_SlabContext:
+ context_type = "Slab";
+ break;
+ case T_BumpContext:
+ context_type = "Bump";
+ break;
+ default:
+ context_type = "???";
+ break;
+ }
+ return context_type;
+}
+
/*
* pg_get_backend_memory_contexts
* SQL SRF showing backend memory context.
@@ -305,3 +428,845 @@ pg_log_backend_memory_contexts(PG_FUNCTION_ARGS)
PG_RETURN_BOOL(true);
}
+
+/*
+ * pg_get_process_memory_contexts
+ * Signal a backend or an auxiliary process to send its memory contexts,
+ * wait for the results and display them.
+ *
+ * By default, only superusers or users with ROLE_PG_READ_ALL_STATS are allowed
+ * to signal a process to return the memory contexts. Additional roles can be
+ * permitted with GRANT.
+ *
+ * On receipt of this signal, a backend or an auxiliary process sets the flag
+ * in the signal handler, which causes the next CHECK_FOR_INTERRUPTS()
+ * or process-specific interrupt handler to copy the memory context details
+ * to a dynamic shared memory space.
+ *
+ * We have defined a limit on DSA memory that could be allocated per process -
+ * if the process has more memory contexts than what can fit in the allocated
+ * size, the excess contexts are summarized and represented as cumulative total
+ * at the end of the buffer.
+ *
+ * After sending the signal, wait on a condition variable. The publishing
+ * backend, after copying the data to shared memory, sends a signal on that
+ * condition variable. There is one condition variable per client process.
+ * Once the condition variable is signalled, check if the latest memory context
+ * information is available and display.
+ *
+ * If the publishing backend does not respond before the condition variable
+ * times out, which is set to a predefined value MEMORY_STATS_MAX_TIMEOUT, give
+ * up and return NULL.
+ */
+Datum
+pg_get_process_memory_contexts(PG_FUNCTION_ARGS)
+{
+ int pid = PG_GETARG_INT32(0);
+ bool summary = PG_GETARG_BOOL(1);
+ PGPROC *proc;
+ ProcNumber procNumber;
+ bool proc_is_aux = false;
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ MemoryStatsEntry *memcxt_info;
+ MemoryStatsDSHashEntry *entry;
+ bool found;
+ char key[CLIENT_KEY_SIZE];
+ TimestampTz start_timestamp;
+
+ /*
+ * See if the process with given pid is a backend or an auxiliary process
+ * and remember the type for when we requery the process later.
+ */
+ proc = BackendPidGetProc(pid);
+ if (proc == NULL)
+ {
+ proc = AuxiliaryPidGetProc(pid);
+ proc_is_aux = true;
+ }
+
+ /*
+ * BackendPidGetProc() and AuxiliaryPidGetProc() return NULL if the pid
+ * isn't valid; this is however not a problem, so just leave with a WARNING.
+ * See comment in pg_log_backend_memory_contexts for a discussion on this.
+ */
+ if (proc == NULL)
+ {
+ ereport(WARNING,
+ errmsg("PID %d is not a PostgreSQL server process", pid));
+ PG_RETURN_NULL();
+ }
+
+ procNumber = GetNumberFromPGProc(proc);
+
+ /*
+ * Check if the server process slot is not empty and exit early. A non-empty
+ * slot means some other client backend is requesting the statistics from
+ * the same server process.
+ */
+ LWLockAcquire(client_keys_lock, LW_EXCLUSIVE);
+ if (client_keys[procNumber] != -1)
+ {
+ LWLockRelease(client_keys_lock);
+ ereport(WARNING,
+ errmsg("server process %d is processing previous request",
+ pid));
+ PG_RETURN_NULL();
+ }
+ LWLockRelease(client_keys_lock);
+
+ /*
+ * Create a DSA to allocate memory for copying memory contexts statistics.
+ * Allocate the memory in the DSA and send DSA pointer to the server
+ * process for storing the context statistics. If the number of contexts
+ * exceeds a predefined limit (1MB), a cumulative total is stored for such
+ * contexts.
+ *
+ * The DSA is created once for the lifetime of the server, and only
+ * attached in subsequent calls.
+ */
+ if (MemoryStatsDsaArea == NULL)
+ MemoryStatsDsaArea = GetNamedDSA("memory_context_statistics_dsa",
+ &found);
+
+ /*
+ * The DSA pointers containing statistics for each client are stored in a
+ * dshash table. In addition to DSA pointer, each entry in this table also
+ * contains information about the statistics, condition variable for
+ * signalling between client and the server and miscellaneous data
+ * specific to a request. There is one entry per client request in the
+ * hash table.
+ */
+ if (MemoryStatsDsHash == NULL)
+ MemoryStatsDsHash = GetNamedDSHash("memory_context_statistics_dshash",
+ &memctx_dsh_params, &found);
+
+ snprintf(key, sizeof(key), "%d", MyProcNumber);
+
+ /*
+ * Insert an entry for this client in DSHASH table the first time this
+ * function is called. This entry is deleted when the process exits in
+ * before_shmem_exit call.
+ *
+ * dshash_find_or_insert locks the entry to prevent the publisher from
+ * reading before client has updated the entry.
+ */
+ entry = dshash_find_or_insert(MemoryStatsDsHash, key, &found);
+ if (!found)
+ {
+ entry->stats_written = false;
+ ConditionVariableInit(&entry->memcxt_cv);
+ }
+
+ /*
+ * Allocate 1MB of memory for the backend to publish its statistics on
+ * every call to this function. The memory is freed at the end of the
+ * function.
+ */
+ entry->memstats_dsa_pointer =
+ dsa_allocate0(MemoryStatsDsaArea, MEMORY_CONTEXT_REPORT_MAX_PER_BACKEND);
+
+ /*
+ * Specify whether a summary of statistics is requested, before signalling
+ * the server.
+ */
+ entry->summary = summary;
+
+ /*
+ * Indicate which server process statistics are being requested from. If
+ * this client times out before the last requested process can publish its
+ * statistics, it may send a new request to another server process. Since
+ * the previous server was notified, it might attempt to read the same
+ * client entry and respond incorrectly with its statistics. By storing
+ * the server ID in the client entry, we prevent any previously signalled
+ * server process from writing its statistics in the space meant for the
+ * newly requested process.
+ */
+ entry->target_server_id = pid;
+ dshash_release_lock(MemoryStatsDsHash, entry);
+
+ /*
+ * Check if the publishing process slot is empty and store this client's
+ * key, i.e. its ProcNumber. This informs the publishing process that it is
+ * supposed to write statistics in the hash entry corresponding to this
+ * client.
+ */
+ LWLockAcquire(client_keys_lock, LW_EXCLUSIVE);
+ if (client_keys[procNumber] == -1)
+ client_keys[procNumber] = MyProcNumber;
+ else
+ {
+ LWLockRelease(client_keys_lock);
+ ereport(WARNING,
+ errmsg("server process %d is processing previous request",
+ pid));
+ PG_RETURN_NULL();
+ }
+ LWLockRelease(client_keys_lock);
+
+ /*
+ * Send a signal to a PostgreSQL process, informing it we want it to
+ * produce information about its memory contexts.
+ */
+ if (SendProcSignal(pid, PROCSIG_GET_MEMORY_CONTEXT, procNumber) < 0)
+ {
+ memstats_dsa_cleanup(key);
+ memstats_client_key_reset(procNumber);
+ ereport(WARNING,
+ errmsg("could not send signal to process %d: %m",
+ pid));
+ PG_RETURN_NULL();
+ }
+ start_timestamp = GetCurrentTimestamp();
+
+ while (1)
+ {
+ long elapsed_time;
+
+ INJECTION_POINT("memcontext-client-injection", NULL);
+
+ elapsed_time = TimestampDifferenceMilliseconds(start_timestamp,
+ GetCurrentTimestamp());
+ /* Return if we have already exceeded the timeout */
+ if (elapsed_time >= MEMORY_STATS_MAX_TIMEOUT * 1000)
+ {
+ memstats_dsa_cleanup(key);
+ memstats_client_key_reset(procNumber);
+ ConditionVariableCancelSleep();
+ PG_RETURN_NULL();
+ }
+
+ /*
+ * Recheck the state of the backend before sleeping on the condition
+ * variable to ensure the process is still alive. Only check the
+ * relevant process type based on the earlier PID check.
+ */
+ if (proc_is_aux)
+ proc = AuxiliaryPidGetProc(pid);
+ else
+ proc = BackendPidGetProc(pid);
+
+ /*
+ * The target server process ending during memory context processing
+ * is not an error.
+ */
+ if (proc == NULL)
+ {
+ memstats_dsa_cleanup(key);
+ memstats_client_key_reset(procNumber);
+ ConditionVariableCancelSleep();
+ ereport(WARNING,
+ errmsg("PID %d is no longer a PostgreSQL server process",
+ pid));
+ PG_RETURN_NULL();
+ }
+
+ /*
+ * Wait for MEMORY_STATS_MAX_TIMEOUT. If no statistics are available
+ * within the allowed time then return NULL. The timer is defined in
+ * milliseconds since that's what the condition variable sleep uses.
+ */
+ if (ConditionVariableTimedSleep(&entry->memcxt_cv,
+ (MEMORY_STATS_MAX_TIMEOUT * 1000),
+ WAIT_EVENT_MEM_CXT_PUBLISH))
+ {
+ /* Timeout has expired, return NULL */
+ memstats_dsa_cleanup(key);
+ memstats_client_key_reset(procNumber);
+ ConditionVariableCancelSleep();
+ PG_RETURN_NULL();
+ }
+ entry = dshash_find_or_insert(MemoryStatsDsHash, key, &found);
+ Assert(found);
+
+ memcxt_info = (MemoryStatsEntry *)
+ dsa_get_address(MemoryStatsDsaArea, entry->memstats_dsa_pointer);
+
+ /*
+ * We expect to come out of sleep when the requested process has
+ * finished publishing the statistics, verified using a boolean
+ * stats_written.
+ *
+ * Make sure that the statistics are actually written by checking that
+ * the name of the context is not NULL. This is done to ensure that
+ * the subsequent waits for statistics do not return spuriously if the
+ * previous call to the function ended in error and thus could not
+ * clear the stats_written flag.
+ */
+ if (entry->stats_written && memcxt_info[0].name[0] != '\0')
+ break;
+
+ dshash_release_lock(MemoryStatsDsHash, entry);
+
+ }
+
+ InitMaterializedSRF(fcinfo, 0);
+
+ /*
+ * Backend has finished publishing the stats, project them.
+ */
+#define PG_GET_PROCESS_MEMORY_CONTEXTS_COLS 11
+ for (int i = 0; i < entry->total_stats; i++)
+ {
+ ArrayType *path_array;
+ int path_length;
+ Datum values[PG_GET_PROCESS_MEMORY_CONTEXTS_COLS];
+ bool nulls[PG_GET_PROCESS_MEMORY_CONTEXTS_COLS];
+ Datum *path_datum = NULL;
+
+ memset(values, 0, sizeof(values));
+ memset(nulls, 0, sizeof(nulls));
+
+ Assert(memcxt_info[i].name[0] != '\0');
+ values[0] = CStringGetTextDatum(memcxt_info[i].name);
+
+ if (memcxt_info[i].ident[0] != '\0')
+ values[1] = CStringGetTextDatum(memcxt_info[i].ident);
+ else
+ nulls[1] = true;
+
+ values[2] = CStringGetTextDatum(ContextTypeToString(memcxt_info[i].type));
+ values[3] = Int32GetDatum(memcxt_info[i].levels);
+
+ path_length = memcxt_info[i].path_length;
+ path_datum = (Datum *) palloc(path_length * sizeof(Datum));
+ if (memcxt_info[i].path[0] != 0)
+ {
+ for (int j = 0; j < path_length; j++)
+ path_datum[j] = Int32GetDatum(memcxt_info[i].path[j]);
+ path_array = construct_array_builtin(path_datum,
+ path_length,
+ INT4OID);
+ values[4] = PointerGetDatum(path_array);
+ }
+ else
+ nulls[4] = true;
+
+ values[5] = Int64GetDatum(memcxt_info[i].totalspace);
+ values[6] = Int64GetDatum(memcxt_info[i].nblocks);
+ values[7] = Int64GetDatum(memcxt_info[i].freespace);
+ values[8] = Int64GetDatum(memcxt_info[i].freechunks);
+ values[9] = Int64GetDatum(memcxt_info[i].totalspace -
+ memcxt_info[i].freespace);
+ values[10] = Int32GetDatum(memcxt_info[i].num_agg_stats);
+
+ tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc,
+ values, nulls);
+ }
+ dshash_release_lock(MemoryStatsDsHash, entry);
+ memstats_dsa_cleanup(key);
+
+ ConditionVariableCancelSleep();
+
+ PG_RETURN_NULL();
+}
+
+static void
+memstats_dsa_cleanup(char *key)
+{
+ MemoryStatsDSHashEntry *entry;
+
+ entry = dshash_find(MemoryStatsDsHash, key, true);
+
+ Assert(MemoryStatsDsaArea != NULL);
+ dsa_free(MemoryStatsDsaArea, entry->memstats_dsa_pointer);
+ entry->memstats_dsa_pointer = InvalidDsaPointer;
+ entry->stats_written = false;
+ entry->target_server_id = 0;
+
+ dshash_release_lock(MemoryStatsDsHash, entry);
+}
+
+/*
+ * Remove this process from the publishing process'
+ * client key slot, if the stats publishing process has failed to do so.
+ */
+static void
+memstats_client_key_reset(int procNumber)
+{
+ LWLockAcquire(client_keys_lock, LW_EXCLUSIVE);
+
+ if (client_keys[procNumber] == MyProcNumber)
+ client_keys[procNumber] = -1;
+ LWLockRelease(client_keys_lock);
+}
+
+void
+MemoryContextKeysShmemInit(void)
+{
+ bool found;
+
+ client_keys = (int *)
+ ShmemInitStruct("MemoryContextKeys",
+ MemoryContextKeysShmemSize() + sizeof(LWLockPadded), &found);
+ client_keys_lock = (LWLock *) ((char *) client_keys + MemoryContextKeysShmemSize());
+
+ if (!found)
+ {
+ MemSet(client_keys, -1, MemoryContextKeysShmemSize());
+ LWLockInitialize(client_keys_lock, LWTRANCHE_MEMORY_CONTEXT_KEYS);
+ }
+}
+
+Size
+MemoryContextKeysShmemSize(void)
+{
+ Size sz = 0;
+ Size TotalProcs = 0;
+
+ TotalProcs = add_size(TotalProcs, NUM_AUXILIARY_PROCS);
+ TotalProcs = add_size(TotalProcs, MaxBackends);
+ sz = add_size(sz, mul_size(TotalProcs, sizeof(int)));
+
+ return sz;
+}
+
+/*
+ * HandleGetMemoryContextInterrupt
+ * Handle receipt of an interrupt indicating a request to publish memory
+ * contexts statistics.
+ *
+ * All the actual work is deferred to ProcessGetMemoryContextInterrupt() as
+ * this cannot be performed in a signal handler.
+ */
+void
+HandleGetMemoryContextInterrupt(void)
+{
+ InterruptPending = true;
+ PublishMemoryContextPending = true;
+ /* latch will be set by procsignal_sigusr1_handler */
+}
+
+/*
+ * ProcessGetMemoryContextInterrupt
+ * Generate information about memory contexts used by the process.
+ *
+ * Performs a breadth-first search on the memory context tree, so a parent's
+ * statistics are reported before those of its children in the monitoring
+ * function output.
+ *
+ * Statistics for all the processes are shared via the same dynamic shared
+ * area. Individual statistics are tracked independently in per-process DSA
+ * pointers. These pointers are stored in a dshash table keyed by the
+ * requesting client's ProcNumber.
+ *
+ * We calculate the maximum number of contexts whose statistics can be
+ * displayed from a pre-determined limit on the memory available per process
+ * for this utility and the maximum size of the statistics for each context.
+ * Statistics for any remaining contexts are captured as a cumulative total
+ * at the end of the individual contexts' statistics.
+ *
+ * If summary is true, we capture the statistics of the level 1 and level 2
+ * contexts. For that we traverse the memory context tree depth-first to
+ * cover all the children of a parent context, so that we can display a
+ * cumulative total of the memory consumed by a level 2 parent and all its
+ * children.
+ */
+void
+ProcessGetMemoryContextInterrupt(void)
+{
+ List *contexts;
+ HASHCTL ctl;
+ HTAB *context_id_lookup;
+ int context_id = 0;
+ MemoryStatsEntry *meminfo;
+ MemoryContextCounters stat;
+ int num_individual_stats = 0;
+ bool found;
+ MemoryStatsDSHashEntry *entry;
+ char key[CLIENT_KEY_SIZE];
+ int clientProcNumber;
+ MemoryContext memstats_ctx = NULL;
+ MemoryContext oldcontext = NULL;
+
+ PublishMemoryContextPending = false;
+
+ INJECTION_POINT("memcontext-server-wait", NULL);
+
+ /*
+ * Retrieve the client key for publishing statistics and reset it to -1,
+ * so other clients can request memory statistics from this process.
+ * Return if the client_key is -1, which means the requesting client has
+ * timed out.
+ */
+ LWLockAcquire(client_keys_lock, LW_EXCLUSIVE);
+ if (client_keys[MyProcNumber] == -1)
+ {
+ LWLockRelease(client_keys_lock);
+ return;
+ }
+ else
+ {
+ clientProcNumber = client_keys[MyProcNumber];
+ client_keys[MyProcNumber] = -1;
+ LWLockRelease(client_keys_lock);
+ }
+
+ /*
+ * Create a new memory context which is not a part of TopMemoryContext
+ * tree. This context is used to allocate all memory in this function.
+ * This helps in keeping the memory allocation in this function to report
+ * memory consumption statistics separate. So that it does not affect the
+ * output of this function.
+ */
+ memstats_ctx = AllocSetContextCreate((MemoryContext) NULL,
+ "publish_memory_context_statistics",
+ ALLOCSET_SMALL_SIZES);
+ oldcontext = MemoryContextSwitchTo(memstats_ctx);
+
+ /*
+ * The hash table is used for constructing "path" column of the view,
+ * similar to its local backend counterpart.
+ */
+ ctl.keysize = sizeof(MemoryContext);
+ ctl.entrysize = sizeof(MemoryContextId);
+ ctl.hcxt = CurrentMemoryContext;
+
+ context_id_lookup = hash_create("pg_get_process_memory_contexts",
+ 256,
+ &ctl,
+ HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+
+ /* List of contexts to process in the next round - start at the top. */
+ contexts = list_make1(TopMemoryContext);
+
+ /*
+ * The client process should have created the required DSA and DSHash
+ * table. Here we just attach to those.
+ */
+ if (MemoryStatsDsaArea == NULL)
+ MemoryStatsDsaArea = GetNamedDSA("memory_context_statistics_dsa",
+ &found);
+
+ if (MemoryStatsDsHash == NULL)
+ MemoryStatsDsHash = GetNamedDSHash("memory_context_statistics_dshash",
+ &memctx_dsh_params, &found);
+
+ snprintf(key, CLIENT_KEY_SIZE, "%d", clientProcNumber);
+
+ /*
+ * The entry lock is held by dshash_find_or_insert to protect writes to
+ * process specific memory. Two different processes publishing statistics
+ * do not block each other.
+ */
+ INJECTION_POINT("memcontext-server-injection", NULL);
+ entry = dshash_find_or_insert(MemoryStatsDsHash, key, &found);
+
+ /*
+ * Check if the entry has been deleted due to the calling process exiting,
+ * or if the caller has timed out waiting for us and has issued a request
+ * to another backend.
+ *
+ * Make sure that the client always deletes the entry after taking
+ * required lock or this function may end up writing to unallocated
+ * memory.
+ */
+ if (!found || entry->target_server_id != MyProcPid)
+ {
+ entry->stats_written = false;
+
+ dshash_release_lock(MemoryStatsDsHash, entry);
+
+ hash_destroy(context_id_lookup);
+ MemoryContextSwitchTo(oldcontext);
+ MemoryContextReset(memstats_ctx);
+
+ return;
+ }
+
+ /* Should be allocated by a client backend that is requesting statistics */
+ Assert(entry->memstats_dsa_pointer != InvalidDsaPointer);
+ meminfo = (MemoryStatsEntry *)
+ dsa_get_address(MemoryStatsDsaArea, entry->memstats_dsa_pointer);
+
+ if (entry->summary)
+ {
+ int cxt_id = 0;
+ List *path = NIL;
+ MemoryContextId *contextid_entry;
+
+ /* Copy TopMemoryContext statistics to DSA */
+ memset(&stat, 0, sizeof(stat));
+ (*TopMemoryContext->methods->stats) (TopMemoryContext, NULL, NULL,
+ &stat, true);
+ path = lcons_int(1, path);
+ PublishMemoryContext(meminfo, cxt_id, TopMemoryContext, path, stat,
+ 1);
+
+ contextid_entry = (MemoryContextId *) hash_search(context_id_lookup,
+ &TopMemoryContext,
+ HASH_ENTER, &found);
+ Assert(!found);
+
+ /*
+ * context id starts with 1
+ */
+ contextid_entry->context_id = cxt_id + 1;
+
+ /*
+ * Copy statistics for each of TopMemoryContext's children. This
+ * includes statistics of at most 100 children per node, with each
+ * child node limited to a depth of 100 in its subtree.
+ */
+ for (MemoryContext c = TopMemoryContext->firstchild; c != NULL;
+ c = c->nextchild)
+ {
+ MemoryContextCounters grand_totals;
+ int num_contexts = 0;
+
+ path = NIL;
+ memset(&grand_totals, 0, sizeof(grand_totals));
+
+ cxt_id++;
+ contextid_entry = (MemoryContextId *) hash_search(context_id_lookup,
+ &c, HASH_ENTER, &found);
+ Assert(!found);
+ contextid_entry->context_id = cxt_id + 1;
+
+ MemoryContextStatsCounter(c, &grand_totals, &num_contexts);
+
+ path = compute_context_path(c, context_id_lookup);
+
+ PublishMemoryContext(meminfo, cxt_id, c, path,
+ grand_totals, num_contexts);
+ }
+ entry->total_stats = cxt_id + 1;
+ }
+ else
+ {
+ foreach_ptr(MemoryContextData, cur, contexts)
+ {
+ List *path = NIL;
+ MemoryContextId *contextid_entry;
+
+ contextid_entry = (MemoryContextId *) hash_search(context_id_lookup,
+ &cur,
+ HASH_ENTER, &found);
+ Assert(!found);
+
+ /*
+ * context id starts with 1
+ */
+ contextid_entry->context_id = context_id + 1;
+
+ /*
+ * Figure out the transient context_id of this context and each of
+ * its ancestors, to compute a path for this context.
+ */
+ path = compute_context_path(cur, context_id_lookup);
+
+ /* Examine the context stats */
+ memset(&stat, 0, sizeof(stat));
+ (*cur->methods->stats) (cur, NULL, NULL, &stat, true);
+
+ /* Account for saving one statistics slot for cumulative reporting */
+ if (context_id < (MAX_MEMORY_CONTEXT_STATS_NUM - 1))
+ {
+ /* Copy statistics to DSA memory */
+ PublishMemoryContext(meminfo, context_id, cur, path, stat, 1);
+ }
+ else
+ {
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].totalspace += stat.totalspace;
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].nblocks += stat.nblocks;
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].freespace += stat.freespace;
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].freechunks += stat.freechunks;
+ }
+
+ /*
+ * DSA max limit per process is reached, write aggregate of the
+ * remaining statistics.
+ *
+ * We can store contexts in slots 0 to max_stats - 1. When there are
+ * more contexts than slots, individual statistics stop after
+ * context_id max_stats - 2, because the max_stats - 1 array slot is
+ * reserved for reporting the cumulative statistics, labelled
+ * "Remaining Totals".
+ */
+ if (context_id == (MAX_MEMORY_CONTEXT_STATS_NUM - 2))
+ {
+ int namelen = strlen("Remaining Totals");
+
+ num_individual_stats = context_id + 1;
+ strlcpy(meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].name,
+ "Remaining Totals", namelen + 1);
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].ident[0] = '\0';
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].path[0] = 0;
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].type = 0;
+ }
+ context_id++;
+
+ for (MemoryContext c = cur->firstchild; c != NULL; c = c->nextchild)
+ contexts = lappend(contexts, c);
+ }
+
+ /*
+ * Check if there are aggregated statistics or not in the result set.
+ * Statistics are individually reported when context_id <= max_stats,
+ * only if context_id > max_stats will there be aggregates.
+ */
+ if (context_id <= MAX_MEMORY_CONTEXT_STATS_NUM)
+ {
+ entry->total_stats = context_id;
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].num_agg_stats = 1;
+ }
+
+ /*
+ * The number of contexts exceeded the space available, so report the
+ * number of aggregated memory contexts
+ */
+ else
+ {
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].num_agg_stats =
+ context_id - num_individual_stats;
+
+ /*
+ * Total stats equals num_individual_stats + 1 record for
+ * cumulative statistics.
+ */
+ entry->total_stats = num_individual_stats + 1;
+ }
+ }
+
+ entry->stats_written = true;
+ dshash_release_lock(MemoryStatsDsHash, entry);
+ hash_destroy(context_id_lookup);
+
+ MemoryContextSwitchTo(oldcontext);
+ MemoryContextReset(memstats_ctx);
+ /* Notify waiting client backend and return */
+ ConditionVariableSignal(&entry->memcxt_cv);
+}
+
+/*
+ * compute_context_path
+ *
+ * Append the transient context_id of this context and each of its ancestors
+ * to a list, in order to compute a path.
+ */
+static List *
+compute_context_path(MemoryContext c, HTAB *context_id_lookup)
+{
+ bool found;
+ List *path = NIL;
+ MemoryContext cur_context;
+
+ for (cur_context = c; cur_context != NULL; cur_context = cur_context->parent)
+ {
+ MemoryContextId *cur_entry;
+
+ cur_entry = hash_search(context_id_lookup, &cur_context, HASH_FIND, &found);
+
+ if (!found)
+ elog(ERROR, "hash table corrupted, can't construct path value");
+
+ path = lcons_int(cur_entry->context_id, path);
+ }
+
+ return path;
+}
+
+/*
+ * PublishMemoryContext
+ *
+ * Copy the memory context statistics of a single context to a DSA memory
+ */
+static void
+PublishMemoryContext(MemoryStatsEntry *memcxt_info, int curr_id,
+ MemoryContext context, List *path,
+ MemoryContextCounters stat, int num_contexts)
+{
+ char *ident = unconstify(char *, context->ident);
+ char *name = unconstify(char *, context->name);
+
+ /*
+ * To be consistent with logging output, we label dynahash contexts with
+ * just the hash table name as with MemoryContextStatsPrint().
+ */
+ if (context->ident && strncmp(context->name, "dynahash", 8) == 0)
+ {
+ name = unconstify(char *, context->ident);
+ ident = NULL;
+ }
+
+ if (name != NULL)
+ {
+ int namelen = strlen(name);
+
+ if (namelen >= MEMORY_CONTEXT_NAME_SHMEM_SIZE)
+ namelen = pg_mbcliplen(name, namelen,
+ MEMORY_CONTEXT_NAME_SHMEM_SIZE - 1);
+
+ strlcpy(memcxt_info[curr_id].name, name, namelen + 1);
+ }
+ else
+ /* Clearing the array */
+ memcxt_info[curr_id].name[0] = '\0';
+
+ /* Trim and copy the identifier if it is not set to NULL */
+ if (ident != NULL)
+ {
+ int idlen = strlen(context->ident);
+
+ /*
+ * Some identifiers such as SQL query string can be very long,
+ * truncate oversize identifiers.
+ */
+ if (idlen >= MEMORY_CONTEXT_IDENT_SHMEM_SIZE)
+ idlen = pg_mbcliplen(ident, idlen,
+ MEMORY_CONTEXT_IDENT_SHMEM_SIZE - 1);
+
+ strlcpy(memcxt_info[curr_id].ident, ident, idlen + 1);
+ }
+ else
+ memcxt_info[curr_id].ident[0] = '\0';
+
+ /* Store the path */
+ if (path == NIL)
+ memcxt_info[curr_id].path[0] = 0;
+ else
+ {
+ int levels = Min(list_length(path), MAX_PATH_DISPLAY_LENGTH);
+
+ memcxt_info[curr_id].path_length = levels;
+ memcxt_info[curr_id].levels = list_length(path);
+
+ foreach_int(i, path)
+ {
+ memcxt_info[curr_id].path[foreach_current_index(i)] = i;
+ if (--levels == 0)
+ break;
+ }
+ }
+ memcxt_info[curr_id].type = context->type;
+ memcxt_info[curr_id].totalspace = stat.totalspace;
+ memcxt_info[curr_id].nblocks = stat.nblocks;
+ memcxt_info[curr_id].freespace = stat.freespace;
+ memcxt_info[curr_id].freechunks = stat.freechunks;
+ memcxt_info[curr_id].num_agg_stats = num_contexts;
+}
+
+void
+AtProcExit_memstats_cleanup(int code, Datum arg)
+{
+ int idx = MyProcNumber;
+ MemoryStatsDSHashEntry *entry;
+ char key[CLIENT_KEY_SIZE];
+ bool found;
+
+ if (MemoryStatsDsHash != NULL)
+ {
+ snprintf(key, CLIENT_KEY_SIZE, "%d", idx);
+ entry = dshash_find_or_insert(MemoryStatsDsHash, key, &found);
+
+ if (found)
+ {
+ if (MemoryStatsDsaArea != NULL &&
+ DsaPointerIsValid(entry->memstats_dsa_pointer))
+ dsa_free(MemoryStatsDsaArea, entry->memstats_dsa_pointer);
+ }
+ dshash_delete_entry(MemoryStatsDsHash, entry);
+ }
+ LWLockAcquire(client_keys_lock, LW_EXCLUSIVE);
+ client_keys[idx] = -1;
+ LWLockRelease(client_keys_lock);
+}
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index d31cb45a058..92b0446b80c 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -39,6 +39,7 @@ volatile sig_atomic_t TransactionTimeoutPending = false;
volatile sig_atomic_t IdleSessionTimeoutPending = false;
volatile sig_atomic_t ProcSignalBarrierPending = false;
volatile sig_atomic_t LogMemoryContextPending = false;
+volatile sig_atomic_t PublishMemoryContextPending = false;
volatile sig_atomic_t IdleStatsUpdateTimeoutPending = false;
volatile uint32 InterruptHoldoffCount = 0;
volatile uint32 QueryCancelHoldoffCount = 0;
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 4ed69ac7ba2..c5a36dcbc95 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -658,6 +658,13 @@ BaseInit(void)
* drop ephemeral slots, which in turn triggers stats reporting.
*/
ReplicationSlotInitialize();
+
+ /*
+ * The before shmem exit callback frees the DSA memory occupied by the
+ * latest memory context statistics that could be published by this proc
+ * if requested.
+ */
+ before_shmem_exit(AtProcExit_memstats_cleanup, 0);
}
diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index 47fd774c7d2..31c4de9f0b4 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -1008,6 +1008,37 @@ MemoryContextStatsInternal(MemoryContext context, int level,
}
}
+
+/*
+ * MemoryContextStatsCounter
+ *
+ * Accumulate statistics counts into *totals. totals should not be NULL.
+ * This involves a non-recursive tree traversal.
+ */
+void
+MemoryContextStatsCounter(MemoryContext context, MemoryContextCounters *totals,
+ int *num_contexts)
+{
+ int ichild = 1;
+
+ *num_contexts = 0;
+ context->methods->stats(context, NULL, NULL, totals, false);
+
+ for (MemoryContext curr = context->firstchild;
+ curr != NULL;
+ curr = MemoryContextTraverseNext(curr, context))
+ {
+ curr->methods->stats(curr, NULL, NULL, totals, false);
+ ichild++;
+ }
+
+ /*
+ * Add the count of all the children contexts which are traversed
+ * including the parent.
+ */
+ *num_contexts = *num_contexts + ichild;
+}
+
/*
* MemoryContextStatsPrint
* Print callback used by MemoryContextStatsInternal
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index fd9448ec7b9..bef24d625d9 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8617,6 +8617,16 @@
prorettype => 'bool', proargtypes => 'int4',
prosrc => 'pg_log_backend_memory_contexts' },
+# publishing memory contexts of the specified postgres process
+{ oid => '2173', descr => 'publish memory contexts of the specified backend',
+ proname => 'pg_get_process_memory_contexts', provolatile => 'v',
+ prorows => '100', proretset => 't', proparallel => 'r',
+ prorettype => 'record', proargtypes => 'int4 bool',
+ proallargtypes => '{int4,bool,text,text,text,int4,_int4,int8,int8,int8,int8,int8,int4}',
+ proargmodes => '{i,i,o,o,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{pid, summary, name, ident, type, level, path, total_bytes, total_nblocks, free_bytes, free_chunks, used_bytes, num_agg_contexts}',
+ prosrc => 'pg_get_process_memory_contexts' },
+
# non-persistent series generator
{ oid => '1066', descr => 'non-persistent series generator',
proname => 'generate_series', prorows => '1000',
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 9a7d733ddef..b76f24baed6 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -96,6 +96,7 @@ extern PGDLLIMPORT volatile sig_atomic_t IdleSessionTimeoutPending;
extern PGDLLIMPORT volatile sig_atomic_t ProcSignalBarrierPending;
extern PGDLLIMPORT volatile sig_atomic_t LogMemoryContextPending;
extern PGDLLIMPORT volatile sig_atomic_t IdleStatsUpdateTimeoutPending;
+extern PGDLLIMPORT volatile sig_atomic_t PublishMemoryContextPending;
extern PGDLLIMPORT volatile sig_atomic_t CheckClientConnectionPending;
extern PGDLLIMPORT volatile sig_atomic_t ClientConnectionLost;
diff --git a/src/include/storage/lwlocklist.h b/src/include/storage/lwlocklist.h
index 5b0ce383408..613e769c84e 100644
--- a/src/include/storage/lwlocklist.h
+++ b/src/include/storage/lwlocklist.h
@@ -136,3 +136,4 @@ PG_LWLOCKTRANCHE(SUBTRANS_SLRU, SubtransSLRU)
PG_LWLOCKTRANCHE(XACT_SLRU, XactSLRU)
PG_LWLOCKTRANCHE(PARALLEL_VACUUM_DSA, ParallelVacuumDSA)
PG_LWLOCKTRANCHE(AIO_URING_COMPLETION, AioUringCompletion)
+PG_LWLOCKTRANCHE(MEMORY_CONTEXT_KEYS, MemoryContextReportingKeys)
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index afeeb1ca019..345d5a0ecb1 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -35,6 +35,7 @@ typedef enum
PROCSIG_WALSND_INIT_STOPPING, /* ask walsenders to prepare for shutdown */
PROCSIG_BARRIER, /* global barrier interrupt */
PROCSIG_LOG_MEMORY_CONTEXT, /* ask backend to log the memory contexts */
+ PROCSIG_GET_MEMORY_CONTEXT, /* ask backend to send the memory contexts */
PROCSIG_PARALLEL_APPLY_MESSAGE, /* Message from parallel apply workers */
/* Recovery conflict reasons */
diff --git a/src/include/utils/memutils.h b/src/include/utils/memutils.h
index 7bbe5a36959..617de0ebf91 100644
--- a/src/include/utils/memutils.h
+++ b/src/include/utils/memutils.h
@@ -19,7 +19,6 @@
#include "nodes/memnodes.h"
-
/*
* MaxAllocSize, MaxAllocHugeSize
* Quasi-arbitrary limits on size of allocations.
@@ -319,4 +318,11 @@ pg_memory_is_all_zeros(const void *ptr, size_t len)
return true;
}
+extern void ProcessGetMemoryContextInterrupt(void);
+extern void HandleGetMemoryContextInterrupt(void);
+extern void MemoryContextKeysShmemInit(void);
+extern Size MemoryContextKeysShmemSize(void);
+extern void MemoryContextStatsCounter(MemoryContext context, MemoryContextCounters *totals,
+ int *num_contexts);
+extern void AtProcExit_memstats_cleanup(int code, Datum arg);
#endif /* MEMUTILS_H */
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 0411db832f1..3799ef7c862 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -233,3 +233,22 @@ select * from pg_timezone_abbrevs where abbrev = 'LMT';
LMT | @ 7 hours 52 mins 58 secs ago | f
(1 row)
+DO $$
+DECLARE
+ bg_writer_pid int;
+ r RECORD;
+BEGIN
+ SELECT pid from pg_stat_activity where backend_type='background writer'
+ INTO bg_writer_pid;
+
+ select type, name, ident
+ from pg_get_process_memory_contexts(bg_writer_pid, false)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+ select type, name, ident
+ from pg_get_process_memory_contexts(pg_backend_pid(), false)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+END $$;
+NOTICE: (AllocSet,TopMemoryContext,)
+NOTICE: (AllocSet,TopMemoryContext,)
diff --git a/src/test/regress/sql/sysviews.sql b/src/test/regress/sql/sysviews.sql
index 66179f026b3..c9da4fc8c90 100644
--- a/src/test/regress/sql/sysviews.sql
+++ b/src/test/regress/sql/sysviews.sql
@@ -101,3 +101,21 @@ select count(distinct utc_offset) >= 24 as ok from pg_timezone_abbrevs;
-- One specific case we can check without much fear of breakage
-- is the historical local-mean-time value used for America/Los_Angeles.
select * from pg_timezone_abbrevs where abbrev = 'LMT';
+
+DO $$
+DECLARE
+ bg_writer_pid int;
+ r RECORD;
+BEGIN
+ SELECT pid from pg_stat_activity where backend_type='background writer'
+ INTO bg_writer_pid;
+
+ select type, name, ident
+ from pg_get_process_memory_contexts(bg_writer_pid, false)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+ select type, name, ident
+ from pg_get_process_memory_contexts(pg_backend_pid(), false)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+END $$;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 9dd65b10254..eb25f426cf2 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1690,6 +1690,8 @@ MemoryContextData
MemoryContextId
MemoryContextMethodID
MemoryContextMethods
+MemoryStatsEntry
+MemoryStatsDSHashEntry
MemoryStatsPrintFunc
MergeAction
MergeActionState
--
2.34.1
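
For readers following the "Remaining Totals" handling in ProcessGetMemoryContextInterrupt() above, here is a minimal standalone sketch of the slot-capping idea. It is not the patch code: MAX_STATS, StatSlot and publish_stats() are hypothetical names used only for illustration, and the real limit is MAX_MEMORY_CONTEXT_STATS_NUM. The sketch assumes the same rule as the patch: the last slot is reserved, contexts that do not fit are summed into it, and the aggregate row is projected only when the contexts outnumber the individual slots.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical small limit; the patch uses MAX_MEMORY_CONTEXT_STATS_NUM. */
#define MAX_STATS 4

typedef struct StatSlot
{
	char		name[32];
	long		totalspace;
	int			num_agg_stats;	/* how many contexts this row stands for */
} StatSlot;

/*
 * Fill slots[0 .. MAX_STATS - 1] from n context sizes and return the number
 * of rows a reader should project.
 */
static int
publish_stats(const long *sizes, int n, StatSlot *slots)
{
	StatSlot   *last = &slots[MAX_STATS - 1];

	assert(sizes != NULL && slots != NULL && n >= 0);
	memset(slots, 0, sizeof(StatSlot) * MAX_STATS);

	for (int id = 0; id < n; id++)
	{
		if (id < MAX_STATS - 1)
		{
			/* Room left: one row per context. */
			snprintf(slots[id].name, sizeof(slots[id].name),
					 "context %d", id + 1);
			slots[id].totalspace = sizes[id];
			slots[id].num_agg_stats = 1;
		}
		else
		{
			/* Out of room: fold into the reserved last slot. */
			last->totalspace += sizes[id];
			last->num_agg_stats++;
		}
	}

	if (n < MAX_STATS)
		return n;				/* every context got its own row */

	strcpy(last->name, "Remaining Totals");
	return MAX_STATS;			/* individual rows plus one aggregate row */
}
```

With MAX_STATS set to 4 and six contexts of sizes 10 through 60, the first three contexts get their own rows and the fourth row reports 150 bytes aggregated over 3 contexts. With three contexts or fewer, no aggregate row is returned.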
Attachment: v44-0002-Test-module-to-test-memory-context-reporting-wit.patch (application/octet-stream)
From b2315034628f0ae6572f9a4887a54f14974105c2 Mon Sep 17 00:00:00 2001
From: Rahila Syed <rahilasyed.90@gmail.com>
Date: Fri, 28 Nov 2025 14:46:38 +0530
Subject: [PATCH 2/2] Test module to test memory context reporting with
injection points
---
src/test/modules/Makefile | 1 +
.../test_memcontext_reporting/Makefile | 29 ++++
.../t/001_memcontext_inj.pl | 150 ++++++++++++++++++
.../test_memcontext_reporting.c | 12 ++
.../test_memcontext_reporting.control | 4 +
5 files changed, 196 insertions(+)
create mode 100644 src/test/modules/test_memcontext_reporting/Makefile
create mode 100644 src/test/modules/test_memcontext_reporting/t/001_memcontext_inj.pl
create mode 100644 src/test/modules/test_memcontext_reporting/test_memcontext_reporting.c
create mode 100644 src/test/modules/test_memcontext_reporting/test_memcontext_reporting.control
diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile
index 4c6d56d97d8..1156d731014 100644
--- a/src/test/modules/Makefile
+++ b/src/test/modules/Makefile
@@ -34,6 +34,7 @@ SUBDIRS = \
test_json_parser \
test_lfind \
test_lwlock_tranches \
+ test_memcontext_reporting \
test_misc \
test_oat_hooks \
test_parser \
diff --git a/src/test/modules/test_memcontext_reporting/Makefile b/src/test/modules/test_memcontext_reporting/Makefile
new file mode 100644
index 00000000000..0a2dfc44f1c
--- /dev/null
+++ b/src/test/modules/test_memcontext_reporting/Makefile
@@ -0,0 +1,29 @@
+# src/test/modules/test_memcontext_reporting/Makefile
+
+EXTRA_INSTALL = src/test/modules/injection_points
+
+export enable_injection_points
+MODULE_big = test_memcontext_reporting
+OBJS = \
+ $(WIN32RES) \
+ test_memcontext_reporting.o
+PGFILEDESC = "test_memcontext_reporting - test code for memory context reporting"
+
+REGRESS = test_memcontext_reporting
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/test_memcontext_reporting
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
+
+check:
+ $(prove_check)
+
+installcheck:
+ $(prove_installcheck)
diff --git a/src/test/modules/test_memcontext_reporting/t/001_memcontext_inj.pl b/src/test/modules/test_memcontext_reporting/t/001_memcontext_inj.pl
new file mode 100644
index 00000000000..b491d6ebc0a
--- /dev/null
+++ b/src/test/modules/test_memcontext_reporting/t/001_memcontext_inj.pl
@@ -0,0 +1,150 @@
+# Copyright (c) 2025, PostgreSQL Global Development Group
+
+# Test suite for testing memory context statistics reporting
+
+use strict;
+use warnings FATAL => 'all';
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+if ($ENV{enable_injection_points} ne 'yes')
+{
+ plan skip_all => 'Injection points not supported by this build';
+}
+my $psql_err;
+my $psql_out;
+# Create and start a cluster with one node
+my $node = PostgreSQL::Test::Cluster->new('main');
+$node->init;
+$node->append_conf(
+ 'postgresql.conf',
+ qq[
+max_connections = 100
+log_statement = none
+restart_after_crash = false
+]);
+$node->start;
+$node->safe_psql('postgres', 'CREATE EXTENSION injection_points;');
+
+# Attaching to a client process's injection point that throws an error
+$node->safe_psql('postgres',
+ "select injection_points_attach('memcontext-client-injection', 'error');"
+);
+
+my $pid = $node->safe_psql('postgres',
+ "SELECT pid from pg_stat_activity where backend_type='checkpointer'");
+
+#Client should have thrown error
+$node->psql(
+ 'postgres',
+ qq(select pg_get_process_memory_contexts($pid, true);),
+ stderr => \$psql_err);
+like($psql_err,
+ qr/error triggered for injection point memcontext-client-injection/);
+
+#Query the same process after detaching the injection point, using some other client and it should succeed.
+$node->safe_psql('postgres',
+ "select injection_points_detach('memcontext-client-injection');");
+my $topcontext_name = $node->safe_psql('postgres',
+ "select name from pg_get_process_memory_contexts($pid, true) where path = '{1}';"
+);
+ok($topcontext_name eq 'TopMemoryContext');
+
+# Attaching to a target process injection point that throws an error
+$node->safe_psql('postgres',
+ "select injection_points_attach('memcontext-server-injection', 'error');"
+);
+
+#Server should have thrown error
+$node->psql(
+ 'postgres',
+ qq(select pg_get_process_memory_contexts($pid, true);),
+ stderr => \$psql_err);
+
+#Query the same process after detaching the injection point, using some other client and it should succeed.
+$node->safe_psql('postgres',
+ "select injection_points_detach('memcontext-server-injection');");
+$topcontext_name = $node->safe_psql('postgres',
+ "select name from pg_get_process_memory_contexts($pid, true) where path = '{1}';"
+);
+ok($topcontext_name eq 'TopMemoryContext');
+
+# Test that two concurrent requests to the same process result in a warning
+# for one of them
+
+$node->safe_psql('postgres',
+ "SELECT injection_points_attach('memcontext-client-injection', 'wait');");
+$node->safe_psql('postgres',
+ "SELECT injection_points_attach('memcontext-server-wait', 'wait');");
+my $psql_session1 = $node->background_psql('postgres');
+$psql_session1->query_until(
+ qr//,
+ qq(
+ SELECT pg_get_process_memory_contexts($pid, true);
+));
+$node->wait_for_event('client backend', 'memcontext-client-injection');
+$node->psql(
+ 'postgres',
+ qq(select pg_get_process_memory_contexts($pid, true);),
+ stderr => \$psql_err);
+ok($psql_err =~
+ /WARNING: server process $pid is processing previous request/);
+#Wake the client up.
+$node->safe_psql('postgres',
+ "SELECT injection_points_wakeup('memcontext-client-injection');");
+
+$node->safe_psql('postgres',
+ "select injection_points_detach('memcontext-client-injection');");
+$node->safe_psql('postgres',
+ "select injection_points_detach('memcontext-server-wait');");
+
+# Test the client process exiting with timeout does not break the server process
+
+$node->safe_psql('postgres',
+ "SELECT injection_points_attach('memcontext-server-wait', 'wait');");
+# Following client query times out, returning NULL as output
+$node->psql(
+ 'postgres',
+ qq(select name from pg_get_process_memory_contexts($pid, true) where path = '{1}';),
+ stdout => \$psql_out);
+ok($psql_out = 'NULL');
+#Wake the server process up and detach the injection point.
+$node->safe_psql('postgres',
+ "SELECT injection_points_wakeup('memcontext-server-wait');");
+$node->safe_psql('postgres',
+ "select injection_points_detach('memcontext-server-wait');");
+#Query the same server process again and it should succeed.
+$topcontext_name = $node->safe_psql('postgres',
+ "select name from pg_get_process_memory_contexts($pid, true) where path = '{1}';"
+);
+ok($topcontext_name eq 'TopMemoryContext');
+
+# Test if the monitoring works fine, when the client backend crashes.
+$node->safe_psql('postgres',
+ "select injection_points_attach('memcontext-client-injection', 'test_memcontext_reporting', 'crash', NULL);"
+);
+
+#Client will crash
+$node->psql(
+ 'postgres',
+ qq(select name from pg_get_process_memory_contexts($pid, true) where path = '{1}';),
+ stderr => \$psql_err);
+like($psql_err,
+ qr/WARNING: terminating connection because of crash of another server process|server closed the connection unexpectedly|connection to server was lost|could not send data to server/
+);
+
+# Wait till server restarts
+$node->restart;
+$node->poll_query_until('postgres', "SELECT 1;", '1');
+
+#Querying memory stats should succeed after server start
+$pid = $node->safe_psql('postgres',
+ "SELECT pid from pg_stat_activity where backend_type='checkpointer'");
+$topcontext_name = $node->safe_psql('postgres',
+ "select name from pg_get_process_memory_contexts($pid, true) where path = '{1}';"
+);
+ok($topcontext_name eq 'TopMemoryContext');
+
+done_testing();
diff --git a/src/test/modules/test_memcontext_reporting/test_memcontext_reporting.c b/src/test/modules/test_memcontext_reporting/test_memcontext_reporting.c
new file mode 100644
index 00000000000..d641f3616dc
--- /dev/null
+++ b/src/test/modules/test_memcontext_reporting/test_memcontext_reporting.c
@@ -0,0 +1,12 @@
+#include "postgres.h"
+#include "funcapi.h"
+
+PG_MODULE_MAGIC;
+
+extern PGDLLEXPORT void crash(const char *name, const void *private_data, void *arg);
+
+void
+crash(const char *name, const void *private_data, void *arg)
+{
+ abort();
+}
diff --git a/src/test/modules/test_memcontext_reporting/test_memcontext_reporting.control b/src/test/modules/test_memcontext_reporting/test_memcontext_reporting.control
new file mode 100644
index 00000000000..48b501682d5
--- /dev/null
+++ b/src/test/modules/test_memcontext_reporting/test_memcontext_reporting.control
@@ -0,0 +1,4 @@
+comment = 'Test code for memcontext reporting'
+default_version = '1.0'
+module_pathname = '$libdir/test_memcontext_reporting'
+relocatable = true
--
2.34.1
Hi,
PFA the updated and rebased patches.
To summarize the work done in this thread, here are the key concerns
discussed in threads [1] and [2], and the steps taken to address them:
*DSA APIs and CFI Handler Safety*: DSA APIs, being high-level, are
unsafe to call from the CFI (CHECK_FOR_INTERRUPTS) handler, which can be
invoked from low-level code. This concern was raised in particular for
APIs like `dsa_allocate()` and `dsa_create()`.
To resolve this, these APIs have been moved out of the CFI handler
function. The dynamic shared memory needed to store the statistics is
now allocated and deleted in the client function. The only operation
performed in the CFI handler is `dsm_attach()`, which attaches to the DSA
for copying statistics. Since `dsm_attach()` only maps the existing DSM
into the current process address space and does not create a new DSM, I
don't see any specific reason why it would be unsafe to call it from the
CFI handler.
*Memory Leak in TopMemoryContext*: A memory leak was reported in the
TopMemoryContext, and there were concerns that memory allocated while
executing the memory statistics reporting function could impact its
output.
To address this, we create an exclusive memory context under the NULL
context to handle all memory allocations in
`ProcessGetMemoryContextInterrupt`. This context does not fall under the
TopMemoryContext tree, ensuring that allocations do not affect the
function's outcome. The memory context is reset at the end of the
function, preventing leaks.
*Error Reported in Thread [2]*: This issue has been fixed by switching
to a NULL resource owner before attaching to DSM in the CFI handler.
*Other Improvements*:
1. Simplified the user interface by removing the `timeout` argument and
using a constant value instead.
2. Provided a default for the `get_summary` argument so users do not
need to pass a value if they choose not to.
3. The dsm_registry APIs are used to create and attach to DSA and DSHASH
tables, which helps avoid code duplication.
4. Replaced the static shared memory array with a DSHASH table, which
holds metadata such as pointers to memory containing statistics for each
process.
5. The updates previously made to mcxt.c have been moved to mcxtfuncs.c,
which now includes all the existing memory statistics functions as well
as the code for the new proposed function.
6. One function that relies on an unexported API is added in mcxt.c.
Thank you,
Rahila Syed
[1]: PostgreSQL: Re: pgsql: Add function to get memory context stats for processes </messages/by-id/CA+Tgmoaey-kOP1k5FaUnQFd1fR0majVebWcL8ogfLbG_nt-Ytg@mail.gmail.com>
[2]: PostgreSQL: Re: Prevent an error on attaching/creating a DSM/DSA from an interrupt handler. </messages/by-id/8B873D49-E0E5-4F9F-B8D6-CA4836B825CD@yesql.se>
Attachments:
v45-0001-Add-function-to-report-memory-context-statistics.patch (application/octet-stream)
From 024a1ce3cf4a75d025ecf6f906beac3b3d634bb2 Mon Sep 17 00:00:00 2001
From: Rahila Syed <rahilasyed.90@gmail.com>
Date: Thu, 27 Nov 2025 14:39:43 +0530
Subject: [PATCH 1/2] Add function to report memory context statistics
This adds a function for retrieving memory context statistics
and information from backends as well as auxiliary processes.
The intended use case is cluster debugging when under memory
pressure or unanticipated memory usage characteristics.
When calling the function it sends a signal to the specified
process to submit statistics regarding its memory contexts
into dynamic shared memory. Each memory context is returned
in detail, followed by a cumulative total in case the number
of contexts exceeds the max allocated amount of shared memory.
Each process is limited to use at most 1MB of memory for this.
A summary can also be explicitly requested by the user; this
will return the TopMemoryContext and a cumulative total of
all lower contexts.
In order to not block on busy processes, we have hardcoded
the number of seconds during which to retry before timing out.
In the case where no statistics are published within the set
timeout, NULL is returned.
---
doc/src/sgml/func/func-admin.sgml | 159 +++
src/backend/catalog/system_functions.sql | 14 +
src/backend/postmaster/autovacuum.c | 4 +
src/backend/postmaster/checkpointer.c | 4 +
src/backend/postmaster/interrupt.c | 4 +
src/backend/postmaster/pgarch.c | 4 +
src/backend/postmaster/startup.c | 4 +
src/backend/postmaster/walsummarizer.c | 4 +
src/backend/storage/ipc/ipci.c | 3 +
src/backend/storage/ipc/procsignal.c | 3 +
src/backend/storage/lmgr/proc.c | 1 +
src/backend/tcop/postgres.c | 3 +
.../utils/activity/wait_event_names.txt | 2 +
src/backend/utils/adt/mcxtfuncs.c | 1009 ++++++++++++++++-
src/backend/utils/init/globals.c | 1 +
src/backend/utils/init/postinit.c | 7 +
src/backend/utils/mmgr/mcxt.c | 31 +
src/include/catalog/pg_proc.dat | 10 +
src/include/miscadmin.h | 1 +
src/include/storage/lwlocklist.h | 1 +
src/include/storage/procsignal.h | 1 +
src/include/utils/memutils.h | 8 +-
src/test/regress/expected/sysviews.out | 19 +
src/test/regress/sql/sysviews.sql | 18 +
src/tools/pgindent/typedefs.list | 2 +
25 files changed, 1296 insertions(+), 21 deletions(-)
diff --git a/doc/src/sgml/func/func-admin.sgml b/doc/src/sgml/func/func-admin.sgml
index 2896cd9e429..5eac0e5f73c 100644
--- a/doc/src/sgml/func/func-admin.sgml
+++ b/doc/src/sgml/func/func-admin.sgml
@@ -251,6 +251,132 @@
<literal>false</literal> is returned.
</para></entry>
</row>
+
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>pg_get_process_memory_contexts</primary>
+ </indexterm>
+ <function>pg_get_process_memory_contexts</function> ( <parameter>pid</parameter> <type>integer</type> <optional>,<parameter>summary</parameter> <type>boolean</type> <literal>DEFAULT</literal> <literal>false</literal></optional> )
+ <returnvalue>setof record</returnvalue>
+ ( <parameter>name</parameter> <type>text</type>,
+ <parameter>ident</parameter> <type>text</type>,
+ <parameter>type</parameter> <type>text</type>,
+ <parameter>level</parameter> <type>integer</type>,
+ <parameter>path</parameter> <type>integer[]</type>,
+ <parameter>total_bytes</parameter> <type>bigint</type>,
+ <parameter>total_nblocks</parameter> <type>bigint</type>,
+ <parameter>free_bytes</parameter> <type>bigint</type>,
+ <parameter>free_chunks</parameter> <type>bigint</type>,
+ <parameter>used_bytes</parameter> <type>bigint</type>,
+ <parameter>num_agg_contexts</parameter> <type>integer</type> )
+ </para>
+ <para>
+ This function handles requests to display the memory contexts of a
+ <productname>PostgreSQL</productname> process with the specified
+ process ID. The function can be used to send requests to backends as
+ well as <glossterm linkend="glossary-auxiliary-proc">auxiliary processes</glossterm>.
+ If the process does not respond with memory context statistics within 5 seconds,
+ the function returns <literal>NULL</literal>.
+ </para>
+ <para>
+ Each returned record contains extended statistics for one memory
+ context:
+ <itemizedlist spacing="compact">
+ <listitem>
+ <para>
+ <parameter>name</parameter> - The name of the memory context.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>ident</parameter> - Memory context ID (if any).
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>type</parameter> - The type of memory context, possible
+ values are: AllocSet, Generation, Slab and Bump.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>level</parameter> - The level in the tree of the current
+ memory context.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>path</parameter> - Memory contexts are organized in a
+ tree model with TopMemoryContext as the root, and all other memory
+ contexts as nodes in the tree. The <parameter>path</parameter>
+ displays the path from the root to the current memory context. The
+ path is limited to 100 children per node, with each node limited
+ to a max depth of 100, to preserve memory during reporting. The
+ printed path will also be limited to 100 nodes counting from the
+ TopMemoryContext.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>total_bytes</parameter> - The total number of bytes
+ allocated to this memory context.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>total_nblocks</parameter> - The total number of blocks
+ used for the allocated memory.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>free_bytes</parameter> - The amount of free memory in
+ this memory context.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>free_chunks</parameter> - The number of chunks that
+ <parameter>free_bytes</parameter> corresponds to.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>used_bytes</parameter> - The total number of bytes
+ currently occupied.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>num_agg_contexts</parameter> - The number of memory
+ contexts aggregated in the displayed statistics.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+ <para>
+ When <parameter>summary</parameter> is <literal>true</literal>, statistics
+ for memory contexts at levels 1 and 2 are displayed, with level 1
+ representing the root node (i.e., <literal>TopMemoryContext</literal>).
+ Statistics for contexts on level 2 and below are aggregates of all
+ child contexts' statistics, where <literal>num_agg_contexts</literal>
+ indicates the number of aggregated child contexts. When
+ <parameter>summary</parameter> is <literal>false</literal> (the default),
+ the <literal>num_agg_contexts</literal> value is <literal>1</literal>,
+ indicating that individual statistics are being displayed.
+ </para>
+ <para>
+ After receiving memory context statistics from the target process, it
+ returns the results as one row per context. If all the contexts don't
+ fit within the pre-determined size limit, the remaining context
+ statistics are aggregated and a cumulative total is displayed. The
+ <literal>num_agg_contexts</literal> column indicates the number of
+ contexts aggregated in the displayed statistics. When
+ <literal>num_agg_contexts</literal> is <literal>1</literal> it means
+ that the context statistics are displayed separately.
+ </para></entry>
+ </row>
</tbody>
</tgroup>
</table>
@@ -302,6 +428,39 @@ LOG: Grand total: 1651920 bytes in 201 blocks; 622360 free (88 chunks); 1029560
because it may generate a large number of log messages.
</para>
+ <para>
+ <function>pg_get_process_memory_contexts</function> can be used to request
+ memory contexts statistics of any <productname>PostgreSQL</productname>
+ process. For example:
+<programlisting>
+postgres=# SELECT * FROM pg_get_process_memory_contexts(
+ (SELECT pid FROM pg_stat_activity
+ WHERE backend_type = 'checkpointer'),
+ false) LIMIT 1;
+-[ RECORD 1 ]----+------------------------------
+name | TopMemoryContext
+ident |
+type | AllocSet
+level | 1
+path | {1}
+total_bytes | 90304
+total_nblocks | 3
+free_bytes | 2880
+free_chunks | 1
+used_bytes | 87424
+num_agg_contexts | 1
+</programlisting>
+ <note>
+ <para>
+ While <function>pg_get_process_memory_contexts</function> can be used to
+ query memory contexts of the local backend,
+ <structname>pg_backend_memory_contexts</structname>
+ (see <xref linkend="view-pg-backend-memory-contexts"/> for more details)
+ will be less resource intensive when only the local backend is of interest.
+ </para>
+ </note>
+ </para>
+
</sect2>
<sect2 id="functions-admin-backup">
diff --git a/src/backend/catalog/system_functions.sql b/src/backend/catalog/system_functions.sql
index 2d946d6d9e9..7b40bac5f57 100644
--- a/src/backend/catalog/system_functions.sql
+++ b/src/backend/catalog/system_functions.sql
@@ -657,6 +657,17 @@ LANGUAGE INTERNAL
STRICT VOLATILE PARALLEL UNSAFE
AS 'pg_replication_origin_session_setup';
+CREATE OR REPLACE FUNCTION
+ pg_get_process_memory_contexts(IN pid integer, IN summary boolean DEFAULT false,
+ OUT name text, OUT ident text, OUT type text, OUT level integer,
+ OUT path integer[], OUT total_bytes bigint, OUT total_nblocks bigint,
+ OUT free_bytes bigint, OUT free_chunks bigint, OUT used_bytes bigint,
+ OUT num_agg_contexts integer)
+RETURNS SETOF RECORD
+LANGUAGE INTERNAL
+STRICT VOLATILE PARALLEL UNSAFE
+AS 'pg_get_process_memory_contexts';
+
--
-- The default permissions for functions mean that anyone can execute them.
-- A number of functions shouldn't be executable by just anyone, but rather
@@ -782,6 +793,7 @@ REVOKE EXECUTE ON FUNCTION pg_ls_logicalmapdir() FROM PUBLIC;
REVOKE EXECUTE ON FUNCTION pg_ls_replslotdir(text) FROM PUBLIC;
+REVOKE EXECUTE ON FUNCTION pg_get_process_memory_contexts(integer, boolean) FROM PUBLIC;
--
-- We also set up some things as accessible to standard roles.
--
@@ -808,6 +820,8 @@ GRANT EXECUTE ON FUNCTION pg_current_logfile() TO pg_monitor;
GRANT EXECUTE ON FUNCTION pg_current_logfile(text) TO pg_monitor;
+GRANT EXECUTE ON FUNCTION pg_get_process_memory_contexts(integer, boolean) TO pg_read_all_stats;
+
GRANT pg_read_all_settings TO pg_monitor;
GRANT pg_read_all_stats TO pg_monitor;
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 1bd3924e35e..baba657904b 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -791,6 +791,10 @@ ProcessAutoVacLauncherInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
/* Process sinval catchup interrupts that happened while sleeping */
ProcessCatchupInterrupt();
}
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index 7f8cf1fa2ec..749e68553b9 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -679,6 +679,10 @@ ProcessCheckpointerInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/postmaster/interrupt.c b/src/backend/postmaster/interrupt.c
index ba63b84dfc5..29454b8bf8a 100644
--- a/src/backend/postmaster/interrupt.c
+++ b/src/backend/postmaster/interrupt.c
@@ -48,6 +48,10 @@ ProcessMainLoopInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 3a65d841725..b89617d78db 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -871,6 +871,10 @@ ProcessPgArchInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
if (ConfigReloadPending)
{
char *archiveLib = pstrdup(XLogArchiveLibrary);
diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c
index 27e86cf393f..7149a67fcbc 100644
--- a/src/backend/postmaster/startup.c
+++ b/src/backend/postmaster/startup.c
@@ -192,6 +192,10 @@ ProcessStartupProcInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
diff --git a/src/backend/postmaster/walsummarizer.c b/src/backend/postmaster/walsummarizer.c
index e7e4d652f97..eb86648f7b7 100644
--- a/src/backend/postmaster/walsummarizer.c
+++ b/src/backend/postmaster/walsummarizer.c
@@ -879,6 +879,10 @@ ProcessWalSummarizerInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index b23d0c19360..a5ed58a18c5 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -52,6 +52,7 @@
#include "storage/sinvaladt.h"
#include "utils/guc.h"
#include "utils/injection_point.h"
+#include "utils/memutils.h"
/* GUCs */
int shared_memory_type = DEFAULT_SHARED_MEMORY_TYPE;
@@ -140,6 +141,7 @@ CalculateShmemSize(void)
size = add_size(size, SlotSyncShmemSize());
size = add_size(size, AioShmemSize());
size = add_size(size, WaitLSNShmemSize());
+ size = add_size(size, MemoryContextKeysShmemSize() + sizeof(LWLockPadded));
/* include additional requested shmem from preload libraries */
size = add_size(size, total_addin_request);
@@ -328,6 +330,7 @@ CreateOrAttachShmemStructs(void)
InjectionPointShmemInit();
AioShmemInit();
WaitLSNShmemInit();
+ MemoryContextKeysShmemInit();
}
/*
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index 087821311cc..8963285cc12 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -691,6 +691,9 @@ procsignal_sigusr1_handler(SIGNAL_ARGS)
if (CheckProcSignal(PROCSIG_LOG_MEMORY_CONTEXT))
HandleLogMemoryContextInterrupt();
+ if (CheckProcSignal(PROCSIG_GET_MEMORY_CONTEXT))
+ HandleGetMemoryContextInterrupt();
+
if (CheckProcSignal(PROCSIG_PARALLEL_APPLY_MESSAGE))
HandleParallelApplyMessageInterrupt();
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index ebc3f4ca457..27b3b51cf2d 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -51,6 +51,7 @@
#include "storage/procsignal.h"
#include "storage/spin.h"
#include "storage/standby.h"
+#include "utils/memutils.h"
#include "utils/timeout.h"
#include "utils/timestamp.h"
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 7dd75a490aa..e726f40dfbb 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3539,6 +3539,9 @@ ProcessInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
if (ParallelApplyMessagePending)
ProcessParallelApplyMessages();
}
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index c0632bf901a..bf75b891495 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -162,6 +162,7 @@ WAL_RECEIVER_EXIT "Waiting for the WAL receiver to exit."
WAL_RECEIVER_WAIT_START "Waiting for startup process to send initial data for streaming replication."
WAL_SUMMARY_READY "Waiting for a new WAL summary to be generated."
XACT_GROUP_UPDATE "Waiting for the group leader to update transaction status at transaction end."
+MEM_CXT_PUBLISH "Waiting for a process to publish memory information."
ABI_compatibility:
@@ -405,6 +406,7 @@ SubtransSLRU "Waiting to access the sub-transaction SLRU cache."
XactSLRU "Waiting to access the transaction status SLRU cache."
ParallelVacuumDSA "Waiting for parallel vacuum dynamic shared memory allocation."
AioUringCompletion "Waiting for another process to complete IO via io_uring."
+MemoryContextReportingKeys "Waiting for another process to complete reading or writing the memory reporting keys."
# No "ABI_compatibility" region here as WaitEventLWLock has its own C code.
diff --git a/src/backend/utils/adt/mcxtfuncs.c b/src/backend/utils/adt/mcxtfuncs.c
index 46dfb3dd133..babf8706513 100644
--- a/src/backend/utils/adt/mcxtfuncs.c
+++ b/src/backend/utils/adt/mcxtfuncs.c
@@ -17,22 +17,130 @@
#include "funcapi.h"
#include "mb/pg_wchar.h"
+#include "miscadmin.h"
+#include "storage/dsm_registry.h"
#include "storage/proc.h"
#include "storage/procarray.h"
#include "utils/array.h"
#include "utils/builtins.h"
#include "utils/hsearch.h"
+#include "utils/injection_point.h"
+#include "utils/memutils.h"
+#include "utils/wait_event_types.h"
+
+/*
+ * Memory Context reporting size limits.
+ */
+
+/* Max length of context name and ident, to keep it consistent
+ * with ProcessLogMemoryContext()
+ */
+#define MEMORY_CONTEXT_IDENT_SHMEM_SIZE 100
+#define MEMORY_CONTEXT_NAME_SHMEM_SIZE 100
+
+/* Maximum size (in bytes) of DSA area per process */
+#define MEMORY_CONTEXT_REPORT_MAX_PER_BACKEND ((size_t) (1 * 1024 * 1024))
+
+/*
+ * Maximum number of memory context statistics is calculated by dividing
+ * max memory allocated per backend with maximum size per context statistics.
+ * The identifier and name are statically allocated arrays of size 100 bytes.
+ * The path depth is limited to 100 like for memory context logging.
+ */
+#define MAX_MEMORY_CONTEXT_STATS_NUM (MEMORY_CONTEXT_REPORT_MAX_PER_BACKEND / sizeof(MemoryStatsEntry))
+
+/*
+ * Size of dshash key. The key is a uint32 rendered as a string; 10 chars
+ * plus space for a NUL terminator can hold all the values.
+ */
+#define CLIENT_KEY_SIZE (10 + 1)
+
+/* Dynamic shared memory state for reporting statistics per context */
+typedef struct MemoryStatsEntry
+{
+ char name[MEMORY_CONTEXT_NAME_SHMEM_SIZE];
+ char ident[MEMORY_CONTEXT_IDENT_SHMEM_SIZE];
+ int path[100];
+ NodeTag type;
+ int path_length;
+ int levels;
+ int64 totalspace;
+ int64 nblocks;
+ int64 freespace;
+ int64 freechunks;
+ int num_agg_stats;
+} MemoryStatsEntry;
+
+/*
+ * Per backend dynamic shared hash entry for memory context statistics
+ * reporting.
+ */
+typedef struct MemoryStatsDSHashEntry
+{
+ char key[64];
+ ConditionVariable memcxt_cv;
+ bool stats_written;
+ int target_server_id;
+ int total_stats;
+ bool summary;
+ dsa_pointer memstats_dsa_pointer;
+} MemoryStatsDSHashEntry;
+
+static const dshash_parameters memctx_dsh_params = {
+ offsetof(MemoryStatsDSHashEntry, memcxt_cv),
+ sizeof(MemoryStatsDSHashEntry),
+ dshash_strcmp,
+ dshash_strhash,
+ dshash_strcpy
+};
+
+/*
+ * These are used for reporting memory context statistics of a process.
+ */
+
+/* Lock to control access to client_keys array */
+static LWLock *client_keys_lock = NULL;
+
+/* Array to store the keys of MemoryStatsDsHash */
+static int *client_keys = NULL;
+
+/*
+ * Table to store pointers to DSA memory containing memory statistics and other
+ * metadata. There is one entry per client backend request, keyed by ProcNumber
+ * of the client obtained from client_keys array above.
+ */
+static dshash_table *MemoryStatsDsHash = NULL;
+
+/*
+ * Dsa area which stores the actual memory context
+ * statistics.
+ */
+static dsa_area *MemoryStatsDsaArea = NULL;
+
+static void memstats_dsa_cleanup(char *key);
+static void memstats_client_key_reset(int ProcNumber);
+static const char *ContextTypeToString(NodeTag type);
+static void PublishMemoryContext(MemoryStatsEntry *memcxt_info,
+ int curr_id, MemoryContext context,
+ List *path,
+ MemoryContextCounters stat,
+ int num_contexts);
+static List *compute_context_path(MemoryContext c, HTAB *context_id_lookup);
/* ----------
* The max bytes for showing identifiers of MemoryContext.
+ * This is used by pg_get_backend_memory_contexts, the view used for the local backend.
* ----------
*/
#define MEMORY_CONTEXT_IDENT_DISPLAY_SIZE 1024
+#define MAX_PATH_DISPLAY_LENGTH 100
+/* Timeout in seconds */
+#define MEMORY_STATS_MAX_TIMEOUT 5
+
/*
* MemoryContextId
- * Used for storage of transient identifiers for
- * pg_get_backend_memory_contexts.
+ * Used for storage of transient identifiers for memory context reporting
*/
typedef struct MemoryContextId
{
@@ -143,24 +251,7 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
else
nulls[1] = true;
- switch (context->type)
- {
- case T_AllocSetContext:
- type = "AllocSet";
- break;
- case T_GenerationContext:
- type = "Generation";
- break;
- case T_SlabContext:
- type = "Slab";
- break;
- case T_BumpContext:
- type = "Bump";
- break;
- default:
- type = "???";
- break;
- }
+ type = ContextTypeToString(context->type);
values[2] = CStringGetTextDatum(type);
values[3] = Int32GetDatum(list_length(path)); /* level */
@@ -175,6 +266,38 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
list_free(path);
}
+/*
+ * ContextTypeToString
+ * Returns a textual representation of a context type
+ *
+ * This should cover the same types as MemoryContextIsValid.
+ */
+const char *
+ContextTypeToString(NodeTag type)
+{
+ const char *context_type;
+
+ switch (type)
+ {
+ case T_AllocSetContext:
+ context_type = "AllocSet";
+ break;
+ case T_GenerationContext:
+ context_type = "Generation";
+ break;
+ case T_SlabContext:
+ context_type = "Slab";
+ break;
+ case T_BumpContext:
+ context_type = "Bump";
+ break;
+ default:
+ context_type = "???";
+ break;
+ }
+ return context_type;
+}
+
/*
* pg_get_backend_memory_contexts
* SQL SRF showing backend memory context.
@@ -305,3 +428,849 @@ pg_log_backend_memory_contexts(PG_FUNCTION_ARGS)
PG_RETURN_BOOL(true);
}
+
+/*
+ * pg_get_process_memory_contexts
+ * Signal a backend or an auxiliary process to send its memory contexts,
+ * wait for the results and display them.
+ *
+ * By default, only superusers or users with ROLE_PG_READ_ALL_STATS are allowed
+ * to signal a process to return the memory contexts. Additional roles can be
+ * permitted with GRANT.
+ *
+ * On receipt of this signal, a backend or an auxiliary process sets the flag
+ * in the signal handler, which causes the next CHECK_FOR_INTERRUPTS()
+ * or process-specific interrupt handler to copy the memory context details
+ * to a dynamic shared memory space.
+ *
+ * We have defined a limit on DSA memory that could be allocated per process -
+ * if the process has more memory contexts than what can fit in the allocated
+ * size, the excess contexts are summarized and represented as cumulative total
+ * at the end of the buffer.
+ *
+ * After sending the signal, wait on a condition variable. The publishing
+ * backend, after copying the data to shared memory, sends a signal on that
+ * condition variable. There is one condition variable per client process.
+ * Once the condition variable is signalled, check if the latest memory context
+ * information is available and display.
+ *
+ * If the publishing backend does not respond before the condition variable
+ * times out, which is set to a predefined value MEMORY_STATS_MAX_TIMEOUT, give
+ * up and return NULL.
+ */
+Datum
+pg_get_process_memory_contexts(PG_FUNCTION_ARGS)
+{
+ int pid = PG_GETARG_INT32(0);
+ bool summary = PG_GETARG_BOOL(1);
+ PGPROC *proc;
+ ProcNumber procNumber;
+ bool proc_is_aux = false;
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ MemoryStatsEntry *memcxt_info;
+ MemoryStatsDSHashEntry *entry;
+ bool found;
+ char key[CLIENT_KEY_SIZE];
+ TimestampTz start_timestamp;
+
+ /*
+ * See if the process with given pid is a backend or an auxiliary process
+ * and remember the type for when we requery the process later.
+ */
+ proc = BackendPidGetProc(pid);
+ if (proc == NULL)
+ {
+ proc = AuxiliaryPidGetProc(pid);
+ proc_is_aux = true;
+ }
+
+ /*
+ * BackendPidGetProc() and AuxiliaryPidGetProc() return NULL if the pid
+ * isn't valid; this is not a problem, so we just leave with a WARNING.
+ * See comment in pg_log_backend_memory_contexts for a discussion on this.
+ */
+ if (proc == NULL)
+ {
+ ereport(WARNING,
+ errmsg("PID %d is not a PostgreSQL server process", pid));
+ PG_RETURN_NULL();
+ }
+
+ procNumber = GetNumberFromPGProc(proc);
+
+ /*
+ * Check if the server process slot is not empty and exit early. A non-empty
+ * slot means some other client backend is requesting the statistics from
+ * the same server process.
+ */
+ LWLockAcquire(client_keys_lock, LW_EXCLUSIVE);
+ if (client_keys[procNumber] != -1)
+ {
+ LWLockRelease(client_keys_lock);
+ ereport(WARNING,
+ errmsg("server process %d is processing previous request",
+ pid));
+ PG_RETURN_NULL();
+ }
+ LWLockRelease(client_keys_lock);
+
+ /*
+ * Create a DSA to allocate memory for copying memory contexts statistics.
+ * Allocate the memory in the DSA and send DSA pointer to the server
+ * process for storing the context statistics. If the statistics for all
+ * contexts do not fit in the predefined limit (1MB), a cumulative total
+ * is stored for the excess contexts.
+ *
+ * The DSA is created once for the lifetime of the server, and only
+ * attached in subsequent calls.
+ */
+ if (MemoryStatsDsaArea == NULL)
+ MemoryStatsDsaArea = GetNamedDSA("memory_context_statistics_dsa",
+ &found);
+
+ /*
+ * The DSA pointers containing statistics for each client are stored in a
+ * dshash table. In addition to DSA pointer, each entry in this table also
+ * contains information about the statistics, condition variable for
+ * signalling between client and the server and miscellaneous data
+ * specific to a request. There is one entry per client request in the
+ * hash table.
+ */
+ if (MemoryStatsDsHash == NULL)
+ MemoryStatsDsHash = GetNamedDSHash("memory_context_statistics_dshash",
+ &memctx_dsh_params, &found);
+
+ snprintf(key, sizeof(key), "%d", MyProcNumber);
+
+ /*
+ * Insert an entry for this client in the dshash table the first time this
+ * function is called. This entry is deleted by a before_shmem_exit
+ * callback when the process exits.
+ *
+ * dshash_find_or_insert locks the entry to prevent the publisher from
+ * reading before client has updated the entry.
+ */
+ entry = dshash_find_or_insert(MemoryStatsDsHash, key, &found);
+ if (!found)
+ {
+ entry->stats_written = false;
+ ConditionVariableInit(&entry->memcxt_cv);
+ }
+
+ /*
+ * Allocate 1MB of memory for the backend to publish its statistics on
+ * every call to this function. The memory is freed at the end of the
+ * function.
+ */
+ entry->memstats_dsa_pointer =
+ dsa_allocate0(MemoryStatsDsaArea, MEMORY_CONTEXT_REPORT_MAX_PER_BACKEND);
+
+ /*
+ * Specify whether a summary of statistics is requested, before signalling
+ * the server.
+ */
+ entry->summary = summary;
+
+ /*
+ * Indicate which server process statistics are being requested from. If
+ * this client times out before the last requested process can publish its
+ * statistics, it may send a new request to another server process. Since
+ * the previous server was notified, it might attempt to read the same
+ * client entry and respond incorrectly with its statistics. By storing
+ * the server ID in the client entry, we prevent any previously signalled
+ * server process from writing its statistics in the space meant for the
+ * newly requested process.
+ */
+ entry->target_server_id = pid;
+ dshash_release_lock(MemoryStatsDsHash, entry);
+
+ /*
+ * Check if the publishing process slot is empty and store this client's
+ * key, i.e. its procNumber. This informs the publishing process that it
+ * is supposed to write statistics in the hash entry corresponding to
+ * this client.
+ */
+ LWLockAcquire(client_keys_lock, LW_EXCLUSIVE);
+ if (client_keys[procNumber] == -1)
+ client_keys[procNumber] = MyProcNumber;
+ else
+ {
+ LWLockRelease(client_keys_lock);
+ ereport(WARNING,
+ errmsg("server process %d is processing previous request",
+ pid));
+ PG_RETURN_NULL();
+ }
+ LWLockRelease(client_keys_lock);
+
+ /*
+ * Send a signal to a PostgreSQL process, informing it we want it to
+ * produce information about its memory contexts.
+ */
+ if (SendProcSignal(pid, PROCSIG_GET_MEMORY_CONTEXT, procNumber) < 0)
+ {
+ memstats_dsa_cleanup(key);
+ memstats_client_key_reset(procNumber);
+ ereport(WARNING,
+ errmsg("could not send signal to process %d: %m",
+ pid));
+ PG_RETURN_NULL();
+ }
+ start_timestamp = GetCurrentTimestamp();
+
+ while (1)
+ {
+ long elapsed_time;
+
+ INJECTION_POINT("memcontext-client-injection", NULL);
+
+ elapsed_time = TimestampDifferenceMilliseconds(start_timestamp,
+ GetCurrentTimestamp());
+ /* Return if we have already exceeded the timeout */
+ if (elapsed_time >= MEMORY_STATS_MAX_TIMEOUT * 1000)
+ {
+ memstats_dsa_cleanup(key);
+ memstats_client_key_reset(procNumber);
+ ConditionVariableCancelSleep();
+ PG_RETURN_NULL();
+ }
+
+ /*
+ * Recheck the state of the backend before sleeping on the condition
+ * variable to ensure the process is still alive. Only check the
+ * relevant process type based on the earlier PID check.
+ */
+ if (proc_is_aux)
+ proc = AuxiliaryPidGetProc(pid);
+ else
+ proc = BackendPidGetProc(pid);
+
+ /*
+ * The target server process ending during memory context processing
+ * is not an error.
+ */
+ if (proc == NULL)
+ {
+ memstats_dsa_cleanup(key);
+ memstats_client_key_reset(procNumber);
+ ConditionVariableCancelSleep();
+ ereport(WARNING,
+ errmsg("PID %d is no longer a PostgreSQL server process",
+ pid));
+ PG_RETURN_NULL();
+ }
+
+ /*
+ * Wait for MEMORY_STATS_MAX_TIMEOUT. If no statistics are available
+ * within the allowed time then return NULL. The timeout is converted to
+ * milliseconds since that's what the condition variable sleep uses.
+ */
+ if (ConditionVariableTimedSleep(&entry->memcxt_cv,
+ (MEMORY_STATS_MAX_TIMEOUT * 1000),
+ WAIT_EVENT_MEM_CXT_PUBLISH))
+ {
+ /* Timeout has expired, return NULL */
+ memstats_dsa_cleanup(key);
+ memstats_client_key_reset(procNumber);
+ ConditionVariableCancelSleep();
+ PG_RETURN_NULL();
+ }
+ entry = dshash_find_or_insert(MemoryStatsDsHash, key, &found);
+ Assert(found);
+
+ memcxt_info = (MemoryStatsEntry *)
+ dsa_get_address(MemoryStatsDsaArea, entry->memstats_dsa_pointer);
+
+ /*
+ * We expect to come out of sleep when the requested process has
+ * finished publishing the statistics, verified using a boolean
+ * stats_written.
+ *
+ * Make sure that the statistics are actually written by checking that
+ * the name of the context is not NULL. This is done to ensure that
+ * the subsequent waits for statistics do not return spuriously if the
+ * previous call to the function ended in error and thus could not
+ * clear the stats_written flag.
+ */
+ if (entry->stats_written && memcxt_info[0].name[0] != '\0')
+ break;
+
+ dshash_release_lock(MemoryStatsDsHash, entry);
+
+ }
+
+ InitMaterializedSRF(fcinfo, 0);
+
+ /*
+ * Backend has finished publishing the stats, project them.
+ */
+#define PG_GET_PROCESS_MEMORY_CONTEXTS_COLS 11
+ for (int i = 0; i < entry->total_stats; i++)
+ {
+ ArrayType *path_array;
+ int path_length;
+ Datum values[PG_GET_PROCESS_MEMORY_CONTEXTS_COLS];
+ bool nulls[PG_GET_PROCESS_MEMORY_CONTEXTS_COLS];
+ Datum *path_datum = NULL;
+
+ memset(values, 0, sizeof(values));
+ memset(nulls, 0, sizeof(nulls));
+
+ Assert(memcxt_info[i].name[0] != '\0');
+ values[0] = CStringGetTextDatum(memcxt_info[i].name);
+
+ if (memcxt_info[i].ident[0] != '\0')
+ values[1] = CStringGetTextDatum(memcxt_info[i].ident);
+ else
+ nulls[1] = true;
+
+ values[2] = CStringGetTextDatum(ContextTypeToString(memcxt_info[i].type));
+ values[3] = Int32GetDatum(memcxt_info[i].levels);
+
+ path_length = memcxt_info[i].path_length;
+ path_datum = (Datum *) palloc(path_length * sizeof(Datum));
+ if (memcxt_info[i].path[0] != 0)
+ {
+ for (int j = 0; j < path_length; j++)
+ path_datum[j] = Int32GetDatum(memcxt_info[i].path[j]);
+ path_array = construct_array_builtin(path_datum,
+ path_length,
+ INT4OID);
+ values[4] = PointerGetDatum(path_array);
+ }
+ else
+ nulls[4] = true;
+
+ values[5] = Int64GetDatum(memcxt_info[i].totalspace);
+ values[6] = Int64GetDatum(memcxt_info[i].nblocks);
+ values[7] = Int64GetDatum(memcxt_info[i].freespace);
+ values[8] = Int64GetDatum(memcxt_info[i].freechunks);
+ values[9] = Int64GetDatum(memcxt_info[i].totalspace -
+ memcxt_info[i].freespace);
+ values[10] = Int32GetDatum(memcxt_info[i].num_agg_stats);
+
+ tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc,
+ values, nulls);
+ }
+ dshash_release_lock(MemoryStatsDsHash, entry);
+ memstats_dsa_cleanup(key);
+
+ ConditionVariableCancelSleep();
+
+ PG_RETURN_NULL();
+}
+
+static void
+memstats_dsa_cleanup(char *key)
+{
+ MemoryStatsDSHashEntry *entry;
+
+ entry = dshash_find(MemoryStatsDsHash, key, true);
+
+ Assert(MemoryStatsDsaArea != NULL);
+ dsa_free(MemoryStatsDsaArea, entry->memstats_dsa_pointer);
+ entry->memstats_dsa_pointer = InvalidDsaPointer;
+ entry->stats_written = false;
+ entry->target_server_id = 0;
+
+ dshash_release_lock(MemoryStatsDsHash, entry);
+}
+
+/*
+ * Remove this process from the publishing process'
+ * client key slot, if the stats publishing process has failed to do so.
+ */
+static void
+memstats_client_key_reset(int procNumber)
+{
+ LWLockAcquire(client_keys_lock, LW_EXCLUSIVE);
+
+ if (client_keys[procNumber] == MyProcNumber)
+ client_keys[procNumber] = -1;
+ LWLockRelease(client_keys_lock);
+}
+
+void
+MemoryContextKeysShmemInit(void)
+{
+ bool found;
+
+ client_keys = (int *)
+ ShmemInitStruct("MemoryContextKeys",
+ MemoryContextKeysShmemSize() + sizeof(LWLockPadded), &found);
+ client_keys_lock = (LWLock *) ((char *) client_keys + MemoryContextKeysShmemSize());
+
+ if (!found)
+ {
+ MemSet(client_keys, -1, MemoryContextKeysShmemSize());
+ LWLockInitialize(client_keys_lock, LWTRANCHE_MEMORY_CONTEXT_KEYS);
+ }
+}
+
+Size
+MemoryContextKeysShmemSize(void)
+{
+ Size sz = 0;
+ Size TotalProcs = 0;
+
+ TotalProcs = add_size(TotalProcs, NUM_AUXILIARY_PROCS);
+ TotalProcs = add_size(TotalProcs, MaxBackends);
+ sz = add_size(sz, mul_size(TotalProcs, sizeof(int)));
+
+ return sz;
+}
+
+/*
+ * HandleGetMemoryContextInterrupt
+ * Handle receipt of an interrupt indicating a request to publish memory
+ * contexts statistics.
+ *
+ * All the actual work is deferred to ProcessGetMemoryContextInterrupt() as
+ * this cannot be performed in a signal handler.
+ */
+void
+HandleGetMemoryContextInterrupt(void)
+{
+ InterruptPending = true;
+ PublishMemoryContextPending = true;
+ /* latch will be set by procsignal_sigusr1_handler */
+}
+
+/*
+ * ProcessGetMemoryContextInterrupt
+ * Generate information about memory contexts used by the process.
+ *
+ * Performs a breadth first search on the memory context tree, so a parent's
+ * statistics are reported before those of its children in the monitoring
+ * function output.
+ *
+ * Statistics for all the processes are shared via the same dynamic shared
+ * area. Individual statistics are tracked independently in per-process DSA
+ * pointers. These pointers are stored in a dshash table keyed by the
+ * requesting client's ProcNumber.
+ *
+ * We calculate the maximum number of contexts whose statistics can be
+ * displayed using a pre-determined limit on the memory available per process
+ * for this utility and the maximum size of the statistics for each context.
+ * The remaining context statistics, if any, are captured as a cumulative
+ * total at the end of the individual contexts' statistics.
+ *
+ * If summary is true, we capture the statistics of level 1 and level 2
+ * contexts. For that we traverse the memory context tree depth first to
+ * cover all the children of a parent context, to be able to display a
+ * cumulative total of memory consumption by a parent at level 2 and all
+ * its children.
+ */
+void
+ProcessGetMemoryContextInterrupt(void)
+{
+ List *contexts;
+ HASHCTL ctl;
+ HTAB *context_id_lookup;
+ int context_id = 0;
+ MemoryStatsEntry *meminfo;
+ MemoryContextCounters stat;
+ int num_individual_stats = 0;
+ bool found;
+ MemoryStatsDSHashEntry *entry;
+ char key[CLIENT_KEY_SIZE];
+ int clientProcNumber;
+ MemoryContext memstats_ctx = NULL;
+ MemoryContext oldcontext = NULL;
+ ResourceOwner currentOwner;
+
+ PublishMemoryContextPending = false;
+
+ INJECTION_POINT("memcontext-server-wait", NULL);
+
+ /*
+ * Retrieve the client key for publishing statistics and reset it to -1,
+ * so other clients can request memory statistics from this process.
+ * Return if the client_key is -1, which means the requesting client has
+ * timed out.
+ */
+ LWLockAcquire(client_keys_lock, LW_EXCLUSIVE);
+ if (client_keys[MyProcNumber] == -1)
+ {
+ LWLockRelease(client_keys_lock);
+ return;
+ }
+ else
+ {
+ clientProcNumber = client_keys[MyProcNumber];
+ client_keys[MyProcNumber] = -1;
+ LWLockRelease(client_keys_lock);
+ }
+
+ /*
+ * Create a new memory context which is not a part of TopMemoryContext
+ * tree. This context is used to allocate all memory in this function,
+ * which keeps the allocations made while reporting memory consumption
+ * statistics separate, so that they do not affect the output of this
+ * function.
+ */
+ memstats_ctx = AllocSetContextCreate((MemoryContext) NULL,
+ "publish_memory_context_statistics",
+ ALLOCSET_SMALL_SIZES);
+ oldcontext = MemoryContextSwitchTo(memstats_ctx);
+
+ /*
+ * The hash table is used for constructing "path" column of the view,
+ * similar to its local backend counterpart.
+ */
+ ctl.keysize = sizeof(MemoryContext);
+ ctl.entrysize = sizeof(MemoryContextId);
+ ctl.hcxt = CurrentMemoryContext;
+
+ context_id_lookup = hash_create("pg_get_process_memory_contexts",
+ 256,
+ &ctl,
+ HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+
+ /* List of contexts to process in the next round - start at the top. */
+ contexts = list_make1(TopMemoryContext);
+
+ currentOwner = CurrentResourceOwner;
+ CurrentResourceOwner = NULL;
+ /*
+ * The client process should have created the required DSA and DSHash
+ * table. Here we just attach to those.
+ */
+ if (MemoryStatsDsaArea == NULL)
+ MemoryStatsDsaArea = GetNamedDSA("memory_context_statistics_dsa",
+ &found);
+
+ if (MemoryStatsDsHash == NULL)
+ MemoryStatsDsHash = GetNamedDSHash("memory_context_statistics_dshash",
+ &memctx_dsh_params, &found);
+ CurrentResourceOwner = currentOwner;
+
+ snprintf(key, CLIENT_KEY_SIZE, "%d", clientProcNumber);
+
+ /*
+ * The entry lock is held by dshash_find_or_insert to protect writes to
+ * process specific memory. Two different processes publishing statistics
+ * do not block each other.
+ */
+ INJECTION_POINT("memcontext-server-injection", NULL);
+ entry = dshash_find_or_insert(MemoryStatsDsHash, key, &found);
+
+ /*
+ * Check if the entry has been deleted due to calling process exiting, or
+ * if the caller has timed out waiting for us and has issued a request to
+ * another backend.
+ *
+ * Make sure that the client always deletes the entry after taking
+ * required lock or this function may end up writing to unallocated
+ * memory.
+ */
+ if (!found || entry->target_server_id != MyProcPid)
+ {
+ entry->stats_written = false;
+
+ dshash_release_lock(MemoryStatsDsHash, entry);
+
+ hash_destroy(context_id_lookup);
+ MemoryContextSwitchTo(oldcontext);
+ MemoryContextReset(memstats_ctx);
+
+ return;
+ }
+
+ /* Should be allocated by a client backend that is requesting statistics */
+ Assert(entry->memstats_dsa_pointer != InvalidDsaPointer);
+ meminfo = (MemoryStatsEntry *)
+ dsa_get_address(MemoryStatsDsaArea, entry->memstats_dsa_pointer);
+
+ if (entry->summary)
+ {
+ int cxt_id = 0;
+ List *path = NIL;
+ MemoryContextId *contextid_entry;
+
+ /* Copy TopMemoryContext statistics to DSA */
+ memset(&stat, 0, sizeof(stat));
+ (*TopMemoryContext->methods->stats) (TopMemoryContext, NULL, NULL,
+ &stat, true);
+ path = lcons_int(1, path);
+ PublishMemoryContext(meminfo, cxt_id, TopMemoryContext, path, stat,
+ 1);
+
+ contextid_entry = (MemoryContextId *) hash_search(context_id_lookup,
+ &TopMemoryContext,
+ HASH_ENTER, &found);
+ Assert(!found);
+
+ /*
+ * context id starts with 1
+ */
+ contextid_entry->context_id = cxt_id + 1;
+
+ /*
+ * Copy statistics for each of TopMemoryContext's children. This
+ * includes statistics of at most 100 children per node, with each
+ * child node limited to a depth of 100 in its subtree.
+ */
+ for (MemoryContext c = TopMemoryContext->firstchild; c != NULL;
+ c = c->nextchild)
+ {
+ MemoryContextCounters grand_totals;
+ int num_contexts = 0;
+
+ path = NIL;
+ memset(&grand_totals, 0, sizeof(grand_totals));
+
+ cxt_id++;
+ contextid_entry = (MemoryContextId *) hash_search(context_id_lookup,
+ &c, HASH_ENTER, &found);
+ Assert(!found);
+ contextid_entry->context_id = cxt_id + 1;
+
+ MemoryContextStatsCounter(c, &grand_totals, &num_contexts);
+
+ path = compute_context_path(c, context_id_lookup);
+
+ PublishMemoryContext(meminfo, cxt_id, c, path,
+ grand_totals, num_contexts);
+ }
+ entry->total_stats = cxt_id + 1;
+ }
+ else
+ {
+ foreach_ptr(MemoryContextData, cur, contexts)
+ {
+ List *path = NIL;
+ MemoryContextId *contextid_entry;
+
+ contextid_entry = (MemoryContextId *) hash_search(context_id_lookup,
+ &cur,
+ HASH_ENTER, &found);
+ Assert(!found);
+
+ /*
+ * context id starts with 1
+ */
+ contextid_entry->context_id = context_id + 1;
+
+ /*
+ * Figure out the transient context_id of this context and each of
+ * its ancestors, to compute a path for this context.
+ */
+ path = compute_context_path(cur, context_id_lookup);
+
+ /* Examine the context stats */
+ memset(&stat, 0, sizeof(stat));
+ (*cur->methods->stats) (cur, NULL, NULL, &stat, true);
+
+ /* Account for saving one statistics slot for cumulative reporting */
+ if (context_id < (MAX_MEMORY_CONTEXT_STATS_NUM - 1))
+ {
+ /* Copy statistics to DSA memory */
+ PublishMemoryContext(meminfo, context_id, cur, path, stat, 1);
+ }
+ else
+ {
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].totalspace += stat.totalspace;
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].nblocks += stat.nblocks;
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].freespace += stat.freespace;
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].freechunks += stat.freechunks;
+ }
+
+ /*
+ * DSA max limit per process is reached, write aggregate of the
+ * remaining statistics.
+ *
+ * We can store contexts in slots 0 to max_stats - 1. When there are
+ * more contexts than max_stats, we stop reporting individual
+ * statistics once context_id equals max_stats - 2, since the
+ * max_stats - 1 array slot is used for reporting cumulative
+ * statistics, or "Remaining Totals".
+ */
+ if (context_id == (MAX_MEMORY_CONTEXT_STATS_NUM - 2))
+ {
+ int namelen = strlen("Remaining Totals");
+
+ num_individual_stats = context_id + 1;
+ strlcpy(meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].name,
+ "Remaining Totals", namelen + 1);
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].ident[0] = '\0';
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].path[0] = 0;
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].type = 0;
+ }
+ context_id++;
+
+ for (MemoryContext c = cur->firstchild; c != NULL; c = c->nextchild)
+ contexts = lappend(contexts, c);
+ }
+
+ /*
+ * Check if there are aggregated statistics or not in the result set.
+ * Statistics are individually reported when context_id <= max_stats,
+ * only if context_id > max_stats will there be aggregates.
+ */
+ if (context_id <= MAX_MEMORY_CONTEXT_STATS_NUM)
+ {
+ entry->total_stats = context_id;
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].num_agg_stats = 1;
+ }
+
+ /*
+ * The number of contexts exceeded the space available, so report the
+ * number of aggregated memory contexts
+ */
+ else
+ {
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].num_agg_stats =
+ context_id - num_individual_stats;
+
+ /*
+ * Total stats equals num_individual_stats + 1 record for
+ * cumulative statistics.
+ */
+ entry->total_stats = num_individual_stats + 1;
+ }
+ }
+
+ entry->stats_written = true;
+ dshash_release_lock(MemoryStatsDsHash, entry);
+ hash_destroy(context_id_lookup);
+
+ MemoryContextSwitchTo(oldcontext);
+ MemoryContextReset(memstats_ctx);
+ /* Notify waiting client backend and return */
+ ConditionVariableSignal(&entry->memcxt_cv);
+}
+
+/*
+ * compute_context_path
+ *
+ * Append the transient context_id of this context and each of its ancestors
+ * to a list, in order to compute a path.
+ */
+static List *
+compute_context_path(MemoryContext c, HTAB *context_id_lookup)
+{
+ bool found;
+ List *path = NIL;
+ MemoryContext cur_context;
+
+ for (cur_context = c; cur_context != NULL; cur_context = cur_context->parent)
+ {
+ MemoryContextId *cur_entry;
+
+ cur_entry = hash_search(context_id_lookup, &cur_context, HASH_FIND, &found);
+
+ if (!found)
+ elog(ERROR, "hash table corrupted, can't construct path value");
+
+ path = lcons_int(cur_entry->context_id, path);
+ }
+
+ return path;
+}
+
+/*
+ * PublishMemoryContext
+ *
+ * Copy the memory context statistics of a single context to DSA memory
+ */
+static void
+PublishMemoryContext(MemoryStatsEntry *memcxt_info, int curr_id,
+ MemoryContext context, List *path,
+ MemoryContextCounters stat, int num_contexts)
+{
+ char *ident = unconstify(char *, context->ident);
+ char *name = unconstify(char *, context->name);
+
+ /*
+ * To be consistent with logging output, we label dynahash contexts with
+ * just the hash table name as with MemoryContextStatsPrint().
+ */
+ if (context->ident && strncmp(context->name, "dynahash", 8) == 0)
+ {
+ name = unconstify(char *, context->ident);
+ ident = NULL;
+ }
+
+ if (name != NULL)
+ {
+ int namelen = strlen(name);
+
+ if (namelen >= MEMORY_CONTEXT_NAME_SHMEM_SIZE)
+ namelen = pg_mbcliplen(name, namelen,
+ MEMORY_CONTEXT_NAME_SHMEM_SIZE - 1);
+
+ strlcpy(memcxt_info[curr_id].name, name, namelen + 1);
+ }
+ else
+ /* Clearing the array */
+ memcxt_info[curr_id].name[0] = '\0';
+
+ /* Trim and copy the identifier if it is not set to NULL */
+ if (ident != NULL)
+ {
+ int idlen = strlen(context->ident);
+
+ /*
+ * Some identifiers such as SQL query string can be very long,
+ * truncate oversize identifiers.
+ */
+ if (idlen >= MEMORY_CONTEXT_IDENT_SHMEM_SIZE)
+ idlen = pg_mbcliplen(ident, idlen,
+ MEMORY_CONTEXT_IDENT_SHMEM_SIZE - 1);
+
+ strlcpy(memcxt_info[curr_id].ident, ident, idlen + 1);
+ }
+ else
+ memcxt_info[curr_id].ident[0] = '\0';
+
+ /* Store the path */
+ if (path == NIL)
+ memcxt_info[curr_id].path[0] = 0;
+ else
+ {
+ int levels = Min(list_length(path), MAX_PATH_DISPLAY_LENGTH);
+
+ memcxt_info[curr_id].path_length = levels;
+ memcxt_info[curr_id].levels = list_length(path);
+
+ foreach_int(i, path)
+ {
+ memcxt_info[curr_id].path[foreach_current_index(i)] = i;
+ if (--levels == 0)
+ break;
+ }
+ }
+ memcxt_info[curr_id].type = context->type;
+ memcxt_info[curr_id].totalspace = stat.totalspace;
+ memcxt_info[curr_id].nblocks = stat.nblocks;
+ memcxt_info[curr_id].freespace = stat.freespace;
+ memcxt_info[curr_id].freechunks = stat.freechunks;
+ memcxt_info[curr_id].num_agg_stats = num_contexts;
+}
+
+void
+AtProcExit_memstats_cleanup(int code, Datum arg)
+{
+ int idx = MyProcNumber;
+ MemoryStatsDSHashEntry *entry;
+ char key[CLIENT_KEY_SIZE];
+ bool found;
+
+ if (MemoryStatsDsHash != NULL)
+ {
+ snprintf(key, CLIENT_KEY_SIZE, "%d", idx);
+ entry = dshash_find_or_insert(MemoryStatsDsHash, key, &found);
+
+ if (found)
+ {
+ if (MemoryStatsDsaArea != NULL &&
+ DsaPointerIsValid(entry->memstats_dsa_pointer))
+ dsa_free(MemoryStatsDsaArea, entry->memstats_dsa_pointer);
+ }
+ dshash_delete_entry(MemoryStatsDsHash, entry);
+ }
+ LWLockAcquire(client_keys_lock, LW_EXCLUSIVE);
+ client_keys[idx] = -1;
+ LWLockRelease(client_keys_lock);
+}
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index d31cb45a058..92b0446b80c 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -39,6 +39,7 @@ volatile sig_atomic_t TransactionTimeoutPending = false;
volatile sig_atomic_t IdleSessionTimeoutPending = false;
volatile sig_atomic_t ProcSignalBarrierPending = false;
volatile sig_atomic_t LogMemoryContextPending = false;
+volatile sig_atomic_t PublishMemoryContextPending = false;
volatile sig_atomic_t IdleStatsUpdateTimeoutPending = false;
volatile uint32 InterruptHoldoffCount = 0;
volatile uint32 QueryCancelHoldoffCount = 0;
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 4ed69ac7ba2..c5a36dcbc95 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -658,6 +658,13 @@ BaseInit(void)
* drop ephemeral slots, which in turn triggers stats reporting.
*/
ReplicationSlotInitialize();
+
+ /*
+ * The before shmem exit callback frees the DSA memory occupied by the
+ * latest memory context statistics that could be published by this proc
+ * if requested.
+ */
+ before_shmem_exit(AtProcExit_memstats_cleanup, 0);
}
diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index 47fd774c7d2..31c4de9f0b4 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -1008,6 +1008,37 @@ MemoryContextStatsInternal(MemoryContext context, int level,
}
}
+
+/*
+ * MemoryContextStatsCounter
+ *
+ * Accumulate statistics counts into *totals. totals should not be NULL.
+ * This involves a non-recursive tree traversal.
+ */
+void
+MemoryContextStatsCounter(MemoryContext context, MemoryContextCounters *totals,
+ int *num_contexts)
+{
+ int ichild = 1;
+
+ *num_contexts = 0;
+ context->methods->stats(context, NULL, NULL, totals, false);
+
+ for (MemoryContext curr = context->firstchild;
+ curr != NULL;
+ curr = MemoryContextTraverseNext(curr, context))
+ {
+ curr->methods->stats(curr, NULL, NULL, totals, false);
+ ichild++;
+ }
+
+ /*
+ * Add the count of all the children contexts which are traversed
+ * including the parent.
+ */
+ *num_contexts = *num_contexts + ichild;
+}
+
/*
* MemoryContextStatsPrint
* Print callback used by MemoryContextStatsInternal
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index fd9448ec7b9..bef24d625d9 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8617,6 +8617,16 @@
prorettype => 'bool', proargtypes => 'int4',
prosrc => 'pg_log_backend_memory_contexts' },
+# publishing memory contexts of the specified postgres process
+{ oid => '2173', descr => 'publish memory contexts of the specified backend',
+ proname => 'pg_get_process_memory_contexts', provolatile => 'v',
+ prorows => '100', proretset => 't', proparallel => 'r',
+ prorettype => 'record', proargtypes => 'int4 bool',
+ proallargtypes => '{int4,bool,text,text,text,int4,_int4,int8,int8,int8,int8,int8,int4}',
+ proargmodes => '{i,i,o,o,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{pid, summary, name, ident, type, level, path, total_bytes, total_nblocks, free_bytes, free_chunks, used_bytes, num_agg_contexts}',
+ prosrc => 'pg_get_process_memory_contexts' },
+
# non-persistent series generator
{ oid => '1066', descr => 'non-persistent series generator',
proname => 'generate_series', prorows => '1000',
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 9a7d733ddef..b76f24baed6 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -96,6 +96,7 @@ extern PGDLLIMPORT volatile sig_atomic_t IdleSessionTimeoutPending;
extern PGDLLIMPORT volatile sig_atomic_t ProcSignalBarrierPending;
extern PGDLLIMPORT volatile sig_atomic_t LogMemoryContextPending;
extern PGDLLIMPORT volatile sig_atomic_t IdleStatsUpdateTimeoutPending;
+extern PGDLLIMPORT volatile sig_atomic_t PublishMemoryContextPending;
extern PGDLLIMPORT volatile sig_atomic_t CheckClientConnectionPending;
extern PGDLLIMPORT volatile sig_atomic_t ClientConnectionLost;
diff --git a/src/include/storage/lwlocklist.h b/src/include/storage/lwlocklist.h
index 5b0ce383408..613e769c84e 100644
--- a/src/include/storage/lwlocklist.h
+++ b/src/include/storage/lwlocklist.h
@@ -136,3 +136,4 @@ PG_LWLOCKTRANCHE(SUBTRANS_SLRU, SubtransSLRU)
PG_LWLOCKTRANCHE(XACT_SLRU, XactSLRU)
PG_LWLOCKTRANCHE(PARALLEL_VACUUM_DSA, ParallelVacuumDSA)
PG_LWLOCKTRANCHE(AIO_URING_COMPLETION, AioUringCompletion)
+PG_LWLOCKTRANCHE(MEMORY_CONTEXT_KEYS, MemoryContextReportingKeys)
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index afeeb1ca019..345d5a0ecb1 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -35,6 +35,7 @@ typedef enum
PROCSIG_WALSND_INIT_STOPPING, /* ask walsenders to prepare for shutdown */
PROCSIG_BARRIER, /* global barrier interrupt */
PROCSIG_LOG_MEMORY_CONTEXT, /* ask backend to log the memory contexts */
+ PROCSIG_GET_MEMORY_CONTEXT, /* ask backend to send the memory contexts */
PROCSIG_PARALLEL_APPLY_MESSAGE, /* Message from parallel apply workers */
/* Recovery conflict reasons */
diff --git a/src/include/utils/memutils.h b/src/include/utils/memutils.h
index 7bbe5a36959..617de0ebf91 100644
--- a/src/include/utils/memutils.h
+++ b/src/include/utils/memutils.h
@@ -19,7 +19,6 @@
#include "nodes/memnodes.h"
-
/*
* MaxAllocSize, MaxAllocHugeSize
* Quasi-arbitrary limits on size of allocations.
@@ -319,4 +318,11 @@ pg_memory_is_all_zeros(const void *ptr, size_t len)
return true;
}
+extern void ProcessGetMemoryContextInterrupt(void);
+extern void HandleGetMemoryContextInterrupt(void);
+extern void MemoryContextKeysShmemInit(void);
+extern Size MemoryContextKeysShmemSize(void);
+extern void MemoryContextStatsCounter(MemoryContext context, MemoryContextCounters *totals,
+ int *num_contexts);
+extern void AtProcExit_memstats_cleanup(int code, Datum arg);
#endif /* MEMUTILS_H */
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 0411db832f1..3799ef7c862 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -233,3 +233,22 @@ select * from pg_timezone_abbrevs where abbrev = 'LMT';
LMT | @ 7 hours 52 mins 58 secs ago | f
(1 row)
+DO $$
+DECLARE
+ bg_writer_pid int;
+ r RECORD;
+BEGIN
+ SELECT pid from pg_stat_activity where backend_type='background writer'
+ INTO bg_writer_pid;
+
+ select type, name, ident
+ from pg_get_process_memory_contexts(bg_writer_pid, false)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+ select type, name, ident
+ from pg_get_process_memory_contexts(pg_backend_pid(), false)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+END $$;
+NOTICE: (AllocSet,TopMemoryContext,)
+NOTICE: (AllocSet,TopMemoryContext,)
diff --git a/src/test/regress/sql/sysviews.sql b/src/test/regress/sql/sysviews.sql
index 66179f026b3..c9da4fc8c90 100644
--- a/src/test/regress/sql/sysviews.sql
+++ b/src/test/regress/sql/sysviews.sql
@@ -101,3 +101,21 @@ select count(distinct utc_offset) >= 24 as ok from pg_timezone_abbrevs;
-- One specific case we can check without much fear of breakage
-- is the historical local-mean-time value used for America/Los_Angeles.
select * from pg_timezone_abbrevs where abbrev = 'LMT';
+
+DO $$
+DECLARE
+ bg_writer_pid int;
+ r RECORD;
+BEGIN
+ SELECT pid from pg_stat_activity where backend_type='background writer'
+ INTO bg_writer_pid;
+
+ select type, name, ident
+ from pg_get_process_memory_contexts(bg_writer_pid, false)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+ select type, name, ident
+ from pg_get_process_memory_contexts(pg_backend_pid(), false)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+END $$;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 04845d5e680..6b964360718 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1700,6 +1700,8 @@ MemoryContextData
MemoryContextId
MemoryContextMethodID
MemoryContextMethods
+MemoryStatsEntry
+MemoryStatsDSHashEntry
MemoryStatsPrintFunc
MergeAction
MergeActionState
--
2.34.1
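
For readers following the truncation rule described in ProcessGetMemoryContextInterrupt() above (individual entries until the per-process limit is reached, then a single "Remaining Totals" row), here is a minimal standalone sketch of the idea. It is not code from the patch: SketchStats, SKETCH_MAX_SLOTS and sketch_publish() are hypothetical stand-ins for MemoryStatsEntry, MAX_MEMORY_CONTEXT_STATS_NUM and the publishing loop, and the boundary handling is simplified and may not match the patch exactly.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for MAX_MEMORY_CONTEXT_STATS_NUM. */
#define SKETCH_MAX_SLOTS 4

/* Hypothetical, heavily simplified stand-in for MemoryStatsEntry. */
typedef struct SketchStats
{
	char		name[32];
	long		totalspace;
	int			num_agg_stats;	/* number of contexts folded into this row */
} SketchStats;

/*
 * Publish n per-context sizes into an array of SKETCH_MAX_SLOTS entries.
 *
 * If everything fits, each context gets its own slot. Otherwise the first
 * SKETCH_MAX_SLOTS - 1 contexts are stored individually and all remaining
 * ones are summed into the last slot. Returns the number of slots used.
 */
static int
sketch_publish(const long *sizes, int n, SketchStats *out)
{
	int			last = SKETCH_MAX_SLOTS - 1;
	int			individual;

	assert(n >= 0);
	individual = (n <= SKETCH_MAX_SLOTS) ? n : last;

	memset(out, 0, sizeof(SketchStats) * SKETCH_MAX_SLOTS);

	for (int i = 0; i < individual; i++)
	{
		snprintf(out[i].name, sizeof(out[i].name), "context %d", i + 1);
		out[i].totalspace = sizes[i];
		out[i].num_agg_stats = 1;
	}

	/* Everything fit, so there is no aggregate row. */
	if (individual == n)
		return n;

	/* Fold the excess contexts into the reserved last slot. */
	strcpy(out[last].name, "Remaining Totals");
	for (int i = individual; i < n; i++)
	{
		out[last].totalspace += sizes[i];
		out[last].num_agg_stats++;
	}

	return individual + 1;
}
```

With four slots and six contexts, a caller would get three individual rows followed by one "Remaining Totals" row covering the last three contexts. That is the shape a client would see in the num_agg_contexts column when the 1MB limit is exceeded.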
Attachment: v45-0002-Test-module-to-test-memory-context-reporting-wit.patch (application/octet-stream)
From e9c37a3f67f00821b998aec8443797c710703474 Mon Sep 17 00:00:00 2001
From: Rahila Syed <rahilasyed.90@gmail.com>
Date: Fri, 28 Nov 2025 14:46:38 +0530
Subject: [PATCH 2/2] Test module to test memory context reporting with
injection points
---
src/test/modules/Makefile | 1 +
.../test_memcontext_reporting/Makefile | 29 ++++
.../t/001_memcontext_inj.pl | 150 ++++++++++++++++++
.../test_memcontext_reporting.c | 12 ++
.../test_memcontext_reporting.control | 4 +
5 files changed, 196 insertions(+)
create mode 100644 src/test/modules/test_memcontext_reporting/Makefile
create mode 100644 src/test/modules/test_memcontext_reporting/t/001_memcontext_inj.pl
create mode 100644 src/test/modules/test_memcontext_reporting/test_memcontext_reporting.c
create mode 100644 src/test/modules/test_memcontext_reporting/test_memcontext_reporting.control
diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile
index 4c6d56d97d8..1156d731014 100644
--- a/src/test/modules/Makefile
+++ b/src/test/modules/Makefile
@@ -34,6 +34,7 @@ SUBDIRS = \
test_json_parser \
test_lfind \
test_lwlock_tranches \
+ test_memcontext_reporting \
test_misc \
test_oat_hooks \
test_parser \
diff --git a/src/test/modules/test_memcontext_reporting/Makefile b/src/test/modules/test_memcontext_reporting/Makefile
new file mode 100644
index 00000000000..0a2dfc44f1c
--- /dev/null
+++ b/src/test/modules/test_memcontext_reporting/Makefile
@@ -0,0 +1,29 @@
+# src/test/modules/test_memcontext_reporting/Makefile
+
+EXTRA_INSTALL = src/test/modules/injection_points
+
+export enable_injection_points
+MODULE_big = test_memcontext_reporting
+OBJS = \
+ $(WIN32RES) \
+ test_memcontext_reporting.o
+PGFILEDESC = "test_memcontext_reporting - test code for memory context reporting"
+
+REGRESS = test_memcontext_reporting
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/test_memcontext_reporting
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
+
+check:
+ $(prove_check)
+
+installcheck:
+ $(prove_installcheck)
diff --git a/src/test/modules/test_memcontext_reporting/t/001_memcontext_inj.pl b/src/test/modules/test_memcontext_reporting/t/001_memcontext_inj.pl
new file mode 100644
index 00000000000..b491d6ebc0a
--- /dev/null
+++ b/src/test/modules/test_memcontext_reporting/t/001_memcontext_inj.pl
@@ -0,0 +1,150 @@
+# Copyright (c) 2025, PostgreSQL Global Development Group
+
+# Test suite for testing memory context statistics reporting
+
+use strict;
+use warnings FATAL => 'all';
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+if ($ENV{enable_injection_points} ne 'yes')
+{
+ plan skip_all => 'Injection points not supported by this build';
+}
+my $psql_err;
+my $psql_out;
+# Create and start a cluster with one node
+my $node = PostgreSQL::Test::Cluster->new('main');
+$node->init;
+$node->append_conf(
+ 'postgresql.conf',
+ qq[
+max_connections = 100
+log_statement = none
+restart_after_crash = false
+]);
+$node->start;
+$node->safe_psql('postgres', 'CREATE EXTENSION injection_points;');
+
+# Attaching to a client process's injection point that throws an error
+$node->safe_psql('postgres',
+ "select injection_points_attach('memcontext-client-injection', 'error');"
+);
+
+my $pid = $node->safe_psql('postgres',
+ "SELECT pid from pg_stat_activity where backend_type='checkpointer'");
+
+#Client should have thrown error
+$node->psql(
+ 'postgres',
+ qq(select pg_get_process_memory_contexts($pid, true);),
+ stderr => \$psql_err);
+like($psql_err,
+ qr/error triggered for injection point memcontext-client-injection/);
+
+#Query the same process after detaching the injection point, using some other client and it should succeed.
+$node->safe_psql('postgres',
+ "select injection_points_detach('memcontext-client-injection');");
+my $topcontext_name = $node->safe_psql('postgres',
+ "select name from pg_get_process_memory_contexts($pid, true) where path = '{1}';"
+);
+ok($topcontext_name eq 'TopMemoryContext');
+
+# Attaching to a target process injection point that throws an error
+$node->safe_psql('postgres',
+ "select injection_points_attach('memcontext-server-injection', 'error');"
+);
+
+#Server should have thrown error
+$node->psql(
+ 'postgres',
+ qq(select pg_get_process_memory_contexts($pid, true);),
+ stderr => \$psql_err);
+
+#Query the same process after detaching the injection point, using some other client and it should succeed.
+$node->safe_psql('postgres',
+ "select injection_points_detach('memcontext-server-injection');");
+$topcontext_name = $node->safe_psql('postgres',
+ "select name from pg_get_process_memory_contexts($pid, true) where path = '{1}';"
+);
+ok($topcontext_name eq 'TopMemoryContext');
+
+# Test that two concurrent requests to the same process results in a warning for
+# one of those
+
+$node->safe_psql('postgres',
+ "SELECT injection_points_attach('memcontext-client-injection', 'wait');");
+$node->safe_psql('postgres',
+ "SELECT injection_points_attach('memcontext-server-wait', 'wait');");
+my $psql_session1 = $node->background_psql('postgres');
+$psql_session1->query_until(
+ qr//,
+ qq(
+ SELECT pg_get_process_memory_contexts($pid, true);
+));
+$node->wait_for_event('client backend', 'memcontext-client-injection');
+$node->psql(
+ 'postgres',
+ qq(select pg_get_process_memory_contexts($pid, true);),
+ stderr => \$psql_err);
+ok($psql_err =~
+ /WARNING: server process $pid is processing previous request/);
+#Wake the client up.
+$node->safe_psql('postgres',
+ "SELECT injection_points_wakeup('memcontext-client-injection');");
+
+$node->safe_psql('postgres',
+ "select injection_points_detach('memcontext-client-injection');");
+$node->safe_psql('postgres',
+ "select injection_points_detach('memcontext-server-wait');");
+
+# Test the client process exiting with timeout does not break the server process
+
+$node->safe_psql('postgres',
+ "SELECT injection_points_attach('memcontext-server-wait', 'wait');");
+# Following client query times out, returning NULL as output
+$node->psql(
+ 'postgres',
+ qq(select name from pg_get_process_memory_contexts($pid, true) where path = '{1}';),
+ stdout => \$psql_out);
+ok($psql_out = 'NULL');
+#Wakeup the server process up and detach the injection point.
+$node->safe_psql('postgres',
+ "SELECT injection_points_wakeup('memcontext-server-wait');");
+$node->safe_psql('postgres',
+ "select injection_points_detach('memcontext-server-wait');");
+#Query the same server process again and it should succeed.
+$topcontext_name = $node->safe_psql('postgres',
+ "select name from pg_get_process_memory_contexts($pid, true) where path = '{1}';"
+);
+ok($topcontext_name eq 'TopMemoryContext');
+
+# Test if the monitoring works fine, when the client backend crashes.
+$node->safe_psql('postgres',
+ "select injection_points_attach('memcontext-client-injection', 'test_memcontext_reporting', 'crash', NULL);"
+);
+
+#Client will crash
+$node->psql(
+ 'postgres',
+ qq(select name from pg_get_process_memory_contexts($pid, true) where path = '{1}';),
+ stderr => \$psql_err);
+like($psql_err,
+ qr/WARNING: terminating connection because of crash of another server process|server closed the connection unexpectedly|connection to server was lost|could not send data to server/
+);
+
+# Wait till server restarts
+$node->restart;
+$node->poll_query_until('postgres', "SELECT 1;", '1');
+
+#Querying memory stats should succeed after server start
+$pid = $node->safe_psql('postgres',
+ "SELECT pid from pg_stat_activity where backend_type='checkpointer'");
+$topcontext_name = $node->safe_psql('postgres',
+ "select name from pg_get_process_memory_contexts($pid, true) where path = '{1}';"
+);
+ok($topcontext_name eq 'TopMemoryContext');
+
+done_testing();
diff --git a/src/test/modules/test_memcontext_reporting/test_memcontext_reporting.c b/src/test/modules/test_memcontext_reporting/test_memcontext_reporting.c
new file mode 100644
index 00000000000..d641f3616dc
--- /dev/null
+++ b/src/test/modules/test_memcontext_reporting/test_memcontext_reporting.c
@@ -0,0 +1,12 @@
+#include "postgres.h"
+#include "funcapi.h"
+
+PG_MODULE_MAGIC;
+
+extern PGDLLEXPORT void crash(const char *name, const void *private_data, void *arg);
+
+void
+crash(const char *name, const void *private_data, void *arg)
+{
+ abort();
+}
diff --git a/src/test/modules/test_memcontext_reporting/test_memcontext_reporting.control b/src/test/modules/test_memcontext_reporting/test_memcontext_reporting.control
new file mode 100644
index 00000000000..48b501682d5
--- /dev/null
+++ b/src/test/modules/test_memcontext_reporting/test_memcontext_reporting.control
@@ -0,0 +1,4 @@
+comment = 'Test code for memcontext reporting'
+default_version = '1.0'
+module_pathname = '$libdir/test_memcontext_reporting'
+relocatable = true
--
2.34.1
Hi,
I've included some additional description inline and attached rebased
patches after CFbot reported a conflict.
*DSA APIs and CFI Handler Safety*: DSA APIs, being high-level, are unsafe
to call from the CFI (CHECK_FOR_INTERRUPTS) handler, which can be invoked
from low-level code. This concern was raised in particular for APIs like
`dsa_allocate()` and `dsa_create()`.

To resolve this, these APIs have been moved out of the CFI handler
function. The dynamic shared memory needed to store the statistics is now
allocated and freed in the client function. The only operation performed
in the CFI handler is `dsm_attach()`, which attaches to the DSA for
copying statistics. Since `dsm_attach()` only maps an existing DSM segment
into the current process's address space and does not create a new one,
I don't see any specific reason why it would be unsafe to call from the
CFI handler.
Following are the details about the use of DSM in the patch:

- DSA Creation: A Dynamic Shared Area (DSA) is used to store memory
  context statistics.
- Client Process: When fetching memory context statistics, the client
  allocates a 1 MB chunk in the DSA, reads the statistics from the chunk,
  copies them into a tuple store, and then frees the chunk.
- Storage: Pointers to these chunks are stored in a DSHASH table indexed
  by the client’s proc number. Each entry in the DSHASH table also stores
  additional metadata related to the client’s request.
- Attachment: Backends attach to the DSM segments for the DSA and DSHASH
  table only when necessary, i.e. when a process queries memory context
  statistics or is queried by another backend. Once attached, they remain
  so until the session ends, at which point they remove their DSHASH
  entry, if any, and detach from the DSA and DSHASH segments.
- Lifecycle: The DSA and DSHASH structures are created upon the first SQL
  function invocation and destroyed on server restart.
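
To make the request lifecycle above easier to follow, here is a minimal,
self-contained C sketch. It is not code from the patch: the array stands in
for the DSHASH table, malloc() stands in for the DSA allocation, and all
names (StatsEntry, client_begin_request, target_publish,
client_finish_request) are hypothetical. It also simply truncates oversized
input, whereas the patch aggregates the remaining contexts into a total.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK_SIZE (1024 * 1024)	/* 1 MB per request, as described above */
#define MAX_PROCS 8					/* toy stand-in for the DSHASH table */

/* Hypothetical stand-in for one DSHASH entry, keyed by client proc number. */
typedef struct StatsEntry
{
	int			in_use;			/* does this client have a pending request? */
	int			target_pid;		/* request metadata kept next to the pointer */
	char	   *chunk;			/* stand-in for the pointer to the 1 MB chunk */
	size_t		used;			/* bytes published by the target so far */
} StatsEntry;

StatsEntry	entries[MAX_PROCS];

/* Client side: allocate the chunk and register the request. 0 on success. */
int
client_begin_request(int client_procno, int target_pid)
{
	StatsEntry *e;

	if (client_procno < 0 || client_procno >= MAX_PROCS)
		return -1;
	e = &entries[client_procno];
	if (e->in_use)
		return -1;				/* a previous request is still pending */
	e->chunk = malloc(CHUNK_SIZE);
	if (e->chunk == NULL)
		return -1;
	e->in_use = 1;
	e->target_pid = target_pid;
	e->used = 0;
	return 0;
}

/* Target side: only copies into the existing chunk, never allocates. */
size_t
target_publish(int client_procno, const char *stats, size_t len)
{
	StatsEntry *e;

	assert(client_procno >= 0 && client_procno < MAX_PROCS);
	e = &entries[client_procno];
	assert(e->in_use && e->chunk != NULL);
	if (len > CHUNK_SIZE)
		len = CHUNK_SIZE;		/* simplification: truncate to the chunk */
	memcpy(e->chunk, stats, len);
	e->used = len;
	return len;
}

/* Client side: copy the result out, then free the chunk and the entry. */
size_t
client_finish_request(int client_procno, char *out, size_t out_size)
{
	StatsEntry *e;
	size_t		n;

	assert(client_procno >= 0 && client_procno < MAX_PROCS);
	e = &entries[client_procno];
	assert(e->in_use);
	n = e->used < out_size ? e->used : out_size;
	memcpy(out, e->chunk, n);
	free(e->chunk);
	e->chunk = NULL;
	e->in_use = 0;
	e->used = 0;
	return n;
}
```

The point of the split is that allocation and freeing happen only on the
client side, while the target process, which runs from the interrupt
handler, only writes into memory that already exists.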
*Error Reported in Thread [2]*: This issue has been fixed by switching to
a NULL resource owner before attaching to DSM in the CFI handler.

The error mentioned in thread [2] is triggered during a CFI() call from
secure_read() when a backend is waiting for commands and has an open
transaction that is about to abort.

Below are some details about this fix.
It is safe to temporarily set the resource owner to NULL before attaching
to the DSA and DSHASH, since these segments are intended to stay attached
for the full session and are detached only when the session ends.
We also restore the original resource owner immediately after the attach
completes.
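
The save/clear/restore pattern described above can be sketched as follows.
This is a self-contained toy model, not the patch itself: ResourceOwnerData
here is a one-field stand-in for the PostgreSQL struct, toy_attach() is a
hypothetical replacement for the real attach call, and the failure is
modelled as a return code rather than an ERROR.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for PostgreSQL's resource owner; only tracks "releasing". */
typedef struct ResourceOwnerData
{
	int			releasing;
} ResourceOwnerData;
typedef ResourceOwnerData *ResourceOwner;

/* Stand-in for the backend-global CurrentResourceOwner. */
ResourceOwner CurrentResourceOwner = NULL;

/* Set once the (toy) session-lifetime attachment has been made. */
int			session_attached = 0;

/*
 * Hypothetical attach call: refuses to register anything with an owner
 * that has already started releasing its resources.
 */
int
toy_attach(void)
{
	if (CurrentResourceOwner != NULL && CurrentResourceOwner->releasing)
		return -1;
	session_attached = 1;
	return 0;
}

/*
 * Attach with no resource owner, so the attachment is tied to the session
 * rather than to the (possibly aborting) transaction, then restore the
 * caller's owner immediately.
 */
int
attach_for_session(void)
{
	ResourceOwner saved = CurrentResourceOwner;
	int			rc;

	CurrentResourceOwner = NULL;
	rc = toy_attach();
	CurrentResourceOwner = saved;

	assert(CurrentResourceOwner == saved);
	return rc;
}
```

With an owner that is already releasing, calling toy_attach() directly
fails, while attach_for_session() succeeds and leaves CurrentResourceOwner
exactly as it found it.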
Other possible fixes include:

1. Adjusting resource-owner behavior
   Either allow resource-owner enlargement during release, or delay
   marking it as releasing until the abort actually begins.
2. Updating DSM registry APIs (e.g., GetNamedDSA)
   Detect when the current resource owner is in a releasing state and
   temporarily set CurrentResourceOwner to NULL before calling dsa_attach.
3. Handling it in the DSA layer
   This was discussed in thread [2], but concerns were raised that DSA
   should not compensate for incorrect caller state; the caller must
   ensure the resource owner is valid.
Kindly let me know your views.
Thank you,
Rahila Syed
[2]: PostgreSQL: Re: Prevent an error on attaching/creating a DSM/DSA from an interrupt handler </messages/by-id/8B873D49-E0E5-4F9F-B8D6-CA4836B825CD@yesql.se>
Attachments:
v46-0002-Test-module-to-test-memory-context-reporting-wit.patch (application/octet-stream)
From 6e1e38079e24d1d2fb96f013ee1ca04424564a18 Mon Sep 17 00:00:00 2001
From: Rahila Syed <rahilasyed.90@gmail.com>
Date: Fri, 28 Nov 2025 14:46:38 +0530
Subject: [PATCH 2/2] Test module to test memory context reporting with
injection points
---
src/test/modules/Makefile | 1 +
.../test_memcontext_reporting/Makefile | 29 ++++
.../t/001_memcontext_inj.pl | 150 ++++++++++++++++++
.../test_memcontext_reporting.c | 12 ++
.../test_memcontext_reporting.control | 4 +
5 files changed, 196 insertions(+)
create mode 100644 src/test/modules/test_memcontext_reporting/Makefile
create mode 100644 src/test/modules/test_memcontext_reporting/t/001_memcontext_inj.pl
create mode 100644 src/test/modules/test_memcontext_reporting/test_memcontext_reporting.c
create mode 100644 src/test/modules/test_memcontext_reporting/test_memcontext_reporting.control
diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile
index 4c6d56d97d8..1156d731014 100644
--- a/src/test/modules/Makefile
+++ b/src/test/modules/Makefile
@@ -34,6 +34,7 @@ SUBDIRS = \
test_json_parser \
test_lfind \
test_lwlock_tranches \
+ test_memcontext_reporting \
test_misc \
test_oat_hooks \
test_parser \
diff --git a/src/test/modules/test_memcontext_reporting/Makefile b/src/test/modules/test_memcontext_reporting/Makefile
new file mode 100644
index 00000000000..0a2dfc44f1c
--- /dev/null
+++ b/src/test/modules/test_memcontext_reporting/Makefile
@@ -0,0 +1,29 @@
+# src/test/modules/test_memcontext_reporting/Makefile
+
+EXTRA_INSTALL = src/test/modules/injection_points
+
+export enable_injection_points
+MODULE_big = test_memcontext_reporting
+OBJS = \
+ $(WIN32RES) \
+ test_memcontext_reporting.o
+PGFILEDESC = "test_memcontext_reporting - test code for memory context reporting"
+
+REGRESS = test_memcontext_reporting
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/test_memcontext_reporting
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
+
+check:
+ $(prove_check)
+
+installcheck:
+ $(prove_installcheck)
diff --git a/src/test/modules/test_memcontext_reporting/t/001_memcontext_inj.pl b/src/test/modules/test_memcontext_reporting/t/001_memcontext_inj.pl
new file mode 100644
index 00000000000..b491d6ebc0a
--- /dev/null
+++ b/src/test/modules/test_memcontext_reporting/t/001_memcontext_inj.pl
@@ -0,0 +1,150 @@
+# Copyright (c) 2025, PostgreSQL Global Development Group
+
+# Test suite for testing memory context statistics reporting
+
+use strict;
+use warnings FATAL => 'all';
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+if ($ENV{enable_injection_points} ne 'yes')
+{
+ plan skip_all => 'Injection points not supported by this build';
+}
+my $psql_err;
+my $psql_out;
+# Create and start a cluster with one node
+my $node = PostgreSQL::Test::Cluster->new('main');
+$node->init;
+$node->append_conf(
+ 'postgresql.conf',
+ qq[
+max_connections = 100
+log_statement = none
+restart_after_crash = false
+]);
+$node->start;
+$node->safe_psql('postgres', 'CREATE EXTENSION injection_points;');
+
+# Attaching to a client process's injection point that throws an error
+$node->safe_psql('postgres',
+ "select injection_points_attach('memcontext-client-injection', 'error');"
+);
+
+my $pid = $node->safe_psql('postgres',
+ "SELECT pid from pg_stat_activity where backend_type='checkpointer'");
+
+#Client should have thrown error
+$node->psql(
+ 'postgres',
+ qq(select pg_get_process_memory_contexts($pid, true);),
+ stderr => \$psql_err);
+like($psql_err,
+ qr/error triggered for injection point memcontext-client-injection/);
+
+#Query the same process after detaching the injection point, using some other client and it should succeed.
+$node->safe_psql('postgres',
+ "select injection_points_detach('memcontext-client-injection');");
+my $topcontext_name = $node->safe_psql('postgres',
+ "select name from pg_get_process_memory_contexts($pid, true) where path = '{1}';"
+);
+ok($topcontext_name eq 'TopMemoryContext');
+
+# Attaching to a target process injection point that throws an error
+$node->safe_psql('postgres',
+ "select injection_points_attach('memcontext-server-injection', 'error');"
+);
+
+#Server should have thrown error
+$node->psql(
+ 'postgres',
+ qq(select pg_get_process_memory_contexts($pid, true);),
+ stderr => \$psql_err);
+
+#Query the same process after detaching the injection point, using some other client and it should succeed.
+$node->safe_psql('postgres',
+ "select injection_points_detach('memcontext-server-injection');");
+$topcontext_name = $node->safe_psql('postgres',
+ "select name from pg_get_process_memory_contexts($pid, true) where path = '{1}';"
+);
+ok($topcontext_name eq 'TopMemoryContext');
+
+# Test that two concurrent requests to the same process results in a warning for
+# one of those
+
+$node->safe_psql('postgres',
+ "SELECT injection_points_attach('memcontext-client-injection', 'wait');");
+$node->safe_psql('postgres',
+ "SELECT injection_points_attach('memcontext-server-wait', 'wait');");
+my $psql_session1 = $node->background_psql('postgres');
+$psql_session1->query_until(
+ qr//,
+ qq(
+ SELECT pg_get_process_memory_contexts($pid, true);
+));
+$node->wait_for_event('client backend', 'memcontext-client-injection');
+$node->psql(
+ 'postgres',
+ qq(select pg_get_process_memory_contexts($pid, true);),
+ stderr => \$psql_err);
+ok($psql_err =~
+ /WARNING: server process $pid is processing previous request/);
+#Wake the client up.
+$node->safe_psql('postgres',
+ "SELECT injection_points_wakeup('memcontext-client-injection');");
+
+$node->safe_psql('postgres',
+ "select injection_points_detach('memcontext-client-injection');");
+$node->safe_psql('postgres',
+ "select injection_points_detach('memcontext-server-wait');");
+
+# Test the client process exiting with timeout does not break the server process
+
+$node->safe_psql('postgres',
+ "SELECT injection_points_attach('memcontext-server-wait', 'wait');");
+# Following client query times out, returning NULL as output
+$node->psql(
+ 'postgres',
+ qq(select name from pg_get_process_memory_contexts($pid, true) where path = '{1}';),
+ stdout => \$psql_out);
+ok($psql_out = 'NULL');
+#Wakeup the server process up and detach the injection point.
+$node->safe_psql('postgres',
+ "SELECT injection_points_wakeup('memcontext-server-wait');");
+$node->safe_psql('postgres',
+ "select injection_points_detach('memcontext-server-wait');");
+#Query the same server process again and it should succeed.
+$topcontext_name = $node->safe_psql('postgres',
+ "select name from pg_get_process_memory_contexts($pid, true) where path = '{1}';"
+);
+ok($topcontext_name eq 'TopMemoryContext');
+
+# Test if the monitoring works fine, when the client backend crashes.
+$node->safe_psql('postgres',
+ "select injection_points_attach('memcontext-client-injection', 'test_memcontext_reporting', 'crash', NULL);"
+);
+
+#Client will crash
+$node->psql(
+ 'postgres',
+ qq(select name from pg_get_process_memory_contexts($pid, true) where path = '{1}';),
+ stderr => \$psql_err);
+like($psql_err,
+ qr/WARNING: terminating connection because of crash of another server process|server closed the connection unexpectedly|connection to server was lost|could not send data to server/
+);
+
+# Wait till server restarts
+$node->restart;
+$node->poll_query_until('postgres', "SELECT 1;", '1');
+
+#Querying memory stats should succeed after server start
+$pid = $node->safe_psql('postgres',
+ "SELECT pid from pg_stat_activity where backend_type='checkpointer'");
+$topcontext_name = $node->safe_psql('postgres',
+ "select name from pg_get_process_memory_contexts($pid, true) where path = '{1}';"
+);
+ok($topcontext_name eq 'TopMemoryContext');
+
+done_testing();
diff --git a/src/test/modules/test_memcontext_reporting/test_memcontext_reporting.c b/src/test/modules/test_memcontext_reporting/test_memcontext_reporting.c
new file mode 100644
index 00000000000..d641f3616dc
--- /dev/null
+++ b/src/test/modules/test_memcontext_reporting/test_memcontext_reporting.c
@@ -0,0 +1,12 @@
+#include "postgres.h"
+#include "funcapi.h"
+
+PG_MODULE_MAGIC;
+
+extern PGDLLEXPORT void crash(const char *name, const void *private_data, void *arg);
+
+void
+crash(const char *name, const void *private_data, void *arg)
+{
+ abort();
+}
diff --git a/src/test/modules/test_memcontext_reporting/test_memcontext_reporting.control b/src/test/modules/test_memcontext_reporting/test_memcontext_reporting.control
new file mode 100644
index 00000000000..48b501682d5
--- /dev/null
+++ b/src/test/modules/test_memcontext_reporting/test_memcontext_reporting.control
@@ -0,0 +1,4 @@
+comment = 'Test code for memcontext reporting'
+default_version = '1.0'
+module_pathname = '$libdir/test_memcontext_reporting'
+relocatable = true
--
2.34.1
v46-0001-Add-function-to-report-memory-context-statistics.patch (application/octet-stream)
From 6b97338f59f073cadbff4d086f2a062b770f9280 Mon Sep 17 00:00:00 2001
From: Rahila Syed <rahilasyed.90@gmail.com>
Date: Thu, 27 Nov 2025 14:39:43 +0530
Subject: [PATCH 1/2] Add function to report memory context statistics
This adds a function for retrieving memory context statistics
and information from backends as well as auxiliary processes.
The intended use case is cluster debugging when under memory
pressure or unanticipated memory usage characteristics.
When called, the function sends a signal to the specified
process to submit statistics regarding its memory contexts
into dynamic shared memory. Each memory context is returned
in detail, followed by a cumulative total in case the number
of contexts exceeds the max allocated amount of shared memory.
Each process is limited to using at most 1 MB of memory for this.
A summary can also be explicitly requested by the user; this
will return the TopMemoryContext and a cumulative total of
all lower contexts.
In order to not block on busy processes, we have hardcoded
the number of seconds during which to retry before timing out.
In the case where no statistics are published within the set
timeout, NULL is returned.
---
doc/src/sgml/func/func-admin.sgml | 159 +++
src/backend/catalog/system_functions.sql | 14 +
src/backend/postmaster/autovacuum.c | 4 +
src/backend/postmaster/checkpointer.c | 4 +
src/backend/postmaster/interrupt.c | 4 +
src/backend/postmaster/pgarch.c | 4 +
src/backend/postmaster/startup.c | 4 +
src/backend/postmaster/walsummarizer.c | 4 +
src/backend/storage/ipc/ipci.c | 3 +
src/backend/storage/ipc/procsignal.c | 3 +
src/backend/storage/lmgr/proc.c | 1 +
src/backend/tcop/postgres.c | 3 +
.../utils/activity/wait_event_names.txt | 2 +
src/backend/utils/adt/mcxtfuncs.c | 1009 ++++++++++++++++-
src/backend/utils/init/globals.c | 1 +
src/backend/utils/init/postinit.c | 7 +
src/backend/utils/mmgr/mcxt.c | 31 +
src/include/catalog/pg_proc.dat | 10 +
src/include/miscadmin.h | 1 +
src/include/storage/lwlocklist.h | 1 +
src/include/storage/procsignal.h | 1 +
src/include/utils/memutils.h | 8 +-
src/test/regress/expected/sysviews.out | 19 +
src/test/regress/sql/sysviews.sql | 18 +
src/tools/pgindent/typedefs.list | 2 +
25 files changed, 1296 insertions(+), 21 deletions(-)
diff --git a/doc/src/sgml/func/func-admin.sgml b/doc/src/sgml/func/func-admin.sgml
index 2896cd9e429..5eac0e5f73c 100644
--- a/doc/src/sgml/func/func-admin.sgml
+++ b/doc/src/sgml/func/func-admin.sgml
@@ -251,6 +251,132 @@
<literal>false</literal> is returned.
</para></entry>
</row>
+
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>pg_get_process_memory_contexts</primary>
+ </indexterm>
+ <function>pg_get_process_memory_contexts</function> ( <parameter>pid</parameter> <type>integer</type> <optional>,<parameter>summary</parameter> <type>boolean</type> <literal>DEFAULT</literal> <literal>false</literal></optional> )
+ <returnvalue>setof record</returnvalue>
+ ( <parameter>name</parameter> <type>text</type>,
+ <parameter>ident</parameter> <type>text</type>,
+ <parameter>type</parameter> <type>text</type>,
+ <parameter>level</parameter> <type>integer</type>,
+ <parameter>path</parameter> <type>integer[]</type>,
+ <parameter>total_bytes</parameter> <type>bigint</type>,
+ <parameter>total_nblocks</parameter> <type>bigint</type>,
+ <parameter>free_bytes</parameter> <type>bigint</type>,
+ <parameter>free_chunks</parameter> <type>bigint</type>,
+ <parameter>used_bytes</parameter> <type>bigint</type>,
+ <parameter>num_agg_contexts</parameter> <type>integer</type> )
+ </para>
+ <para>
+ This function handles requests to display the memory contexts of a
+ <productname>PostgreSQL</productname> process with the specified
+ process ID. The function can be used to send requests to backends as
+ well as <glossterm linkend="glossary-auxiliary-proc">auxiliary processes</glossterm>.
+ If the process does not respond with memory context statistics within 5 seconds,
+ the function returns NULL.
+ </para>
+ <para>
+ The returned record contains extended statistics for each memory
+ context:
+ <itemizedlist spacing="compact">
+ <listitem>
+ <para>
+ <parameter>name</parameter> - The name of the memory context.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>ident</parameter> - Memory context ID (if any).
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>type</parameter> - The type of memory context, possible
+ values are: AllocSet, Generation, Slab and Bump.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>level</parameter> - The level in the tree of the current
+ memory context.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>path</parameter> - Memory contexts are organized in a
+ tree model with TopMemoryContext as the root, and all other memory
+ contexts as nodes in the tree. The <parameter>path</parameter>
+ displays the path from the root to the current memory context. The
+ path is limited to 100 children per node, with each node limited
+ to a max depth of 100, to preserve memory during reporting. The
+ printed path will also be limited to 100 nodes counting from the
+ TopMemoryContext.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>total_bytes</parameter> - The total number of bytes
+ allocated to this memory context.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>total_nblocks</parameter> - The total number of blocks
+ used for the allocated memory.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>free_bytes</parameter> - The amount of free memory in
+ this memory context.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>free_chunks</parameter> - The number of chunks that
+ <parameter>free_bytes</parameter> corresponds to.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>used_bytes</parameter> - The total number of bytes
+ currently occupied.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>num_agg_contexts</parameter> - The number of memory
+ contexts aggregated in the displayed statistics.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+ <para>
+ When <parameter>summary</parameter> is <literal>true</literal>, statistics
+ for memory contexts at levels 1 and 2 are displayed, with level 1
+ representing the root node (i.e., <literal>TopMemoryContext</literal>).
+ Statistics for contexts on level 2 and below are aggregates of all
+ child contexts' statistics, where <literal>num_agg_contexts</literal>
+ indicates the number of aggregated child contexts. When
+ <parameter>summary</parameter> is <literal>false</literal> (the default),
+ the <literal>num_agg_contexts</literal> value is <literal>1</literal>,
+ indicating that individual statistics are being displayed.
+ </para>
+ <para>
+ After receiving memory context statistics from the target process, it
+ returns the results as one row per context. If all the contexts don't
+ fit within the pre-determined size limit, the remaining context
+ statistics are aggregated and a cumulative total is displayed. The
+ <literal>num_agg_contexts</literal> column indicates the number of
+ contexts aggregated in the displayed statistics. When
+ <literal>num_agg_contexts</literal> is <literal>1</literal> it means
+ that the context statistics are displayed separately.
+ </para></entry>
+ </row>
</tbody>
</tgroup>
</table>
@@ -302,6 +428,39 @@ LOG: Grand total: 1651920 bytes in 201 blocks; 622360 free (88 chunks); 1029560
because it may generate a large number of log messages.
</para>
+ <para>
+ <function>pg_get_process_memory_contexts</function> can be used to request
+ memory contexts statistics of any <productname>PostgreSQL</productname>
+ process. For example:
+<programlisting>
+postgres=# SELECT * FROM pg_get_process_memory_contexts(
+ (SELECT pid FROM pg_stat_activity
+ WHERE backend_type = 'checkpointer'),
+ false) LIMIT 1;
+-[ RECORD 1 ]----+------------------------------
+name | TopMemoryContext
+ident |
+type | AllocSet
+level | 1
+path | {1}
+total_bytes | 90304
+total_nblocks | 3
+free_bytes | 2880
+free_chunks | 1
+used_bytes | 87424
+num_agg_contexts | 1
+</programlisting>
+ <note>
+ <para>
+ While <function>pg_get_process_memory_contexts</function> can be used to
+ query memory contexts of the local backend,
+ <structname>pg_backend_memory_contexts</structname>
+ (see <xref linkend="view-pg-backend-memory-contexts"/> for more details)
+ will be less resource intensive when only the local backend is of interest.
+ </para>
+ </note>
+ </para>
+
</sect2>
<sect2 id="functions-admin-backup">
diff --git a/src/backend/catalog/system_functions.sql b/src/backend/catalog/system_functions.sql
index 2d946d6d9e9..7b40bac5f57 100644
--- a/src/backend/catalog/system_functions.sql
+++ b/src/backend/catalog/system_functions.sql
@@ -657,6 +657,17 @@ LANGUAGE INTERNAL
STRICT VOLATILE PARALLEL UNSAFE
AS 'pg_replication_origin_session_setup';
+CREATE OR REPLACE FUNCTION
+ pg_get_process_memory_contexts(IN pid integer, IN summary boolean DEFAULT false,
+ OUT name text, OUT ident text, OUT type text, OUT level integer,
+ OUT path integer[], OUT total_bytes bigint, OUT total_nblocks bigint,
+ OUT free_bytes bigint, OUT free_chunks bigint, OUT used_bytes bigint,
+ OUT num_agg_contexts integer)
+RETURNS SETOF RECORD
+LANGUAGE INTERNAL
+STRICT VOLATILE PARALLEL UNSAFE
+AS 'pg_get_process_memory_contexts';
+
--
-- The default permissions for functions mean that anyone can execute them.
-- A number of functions shouldn't be executable by just anyone, but rather
@@ -782,6 +793,7 @@ REVOKE EXECUTE ON FUNCTION pg_ls_logicalmapdir() FROM PUBLIC;
REVOKE EXECUTE ON FUNCTION pg_ls_replslotdir(text) FROM PUBLIC;
+REVOKE EXECUTE ON FUNCTION pg_get_process_memory_contexts(integer, boolean) FROM PUBLIC;
--
-- We also set up some things as accessible to standard roles.
--
@@ -808,6 +820,8 @@ GRANT EXECUTE ON FUNCTION pg_current_logfile() TO pg_monitor;
GRANT EXECUTE ON FUNCTION pg_current_logfile(text) TO pg_monitor;
+GRANT EXECUTE ON FUNCTION pg_get_process_memory_contexts(integer, boolean) TO pg_read_all_stats;
+
GRANT pg_read_all_settings TO pg_monitor;
GRANT pg_read_all_stats TO pg_monitor;
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 1bd3924e35e..baba657904b 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -791,6 +791,10 @@ ProcessAutoVacLauncherInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
/* Process sinval catchup interrupts that happened while sleeping */
ProcessCatchupInterrupt();
}
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index 2eac8ac30d3..a22f77b3673 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -685,6 +685,10 @@ ProcessCheckpointerInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/postmaster/interrupt.c b/src/backend/postmaster/interrupt.c
index ba63b84dfc5..29454b8bf8a 100644
--- a/src/backend/postmaster/interrupt.c
+++ b/src/backend/postmaster/interrupt.c
@@ -48,6 +48,10 @@ ProcessMainLoopInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 3a65d841725..b89617d78db 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -871,6 +871,10 @@ ProcessPgArchInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
if (ConfigReloadPending)
{
char *archiveLib = pstrdup(XLogArchiveLibrary);
diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c
index 27e86cf393f..7149a67fcbc 100644
--- a/src/backend/postmaster/startup.c
+++ b/src/backend/postmaster/startup.c
@@ -192,6 +192,10 @@ ProcessStartupProcInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
diff --git a/src/backend/postmaster/walsummarizer.c b/src/backend/postmaster/walsummarizer.c
index e7e4d652f97..eb86648f7b7 100644
--- a/src/backend/postmaster/walsummarizer.c
+++ b/src/backend/postmaster/walsummarizer.c
@@ -879,6 +879,10 @@ ProcessWalSummarizerInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index adebba625e6..a68721b6f4d 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -52,6 +52,7 @@
#include "storage/sinvaladt.h"
#include "utils/guc.h"
#include "utils/injection_point.h"
+#include "utils/memutils.h"
/* GUCs */
int shared_memory_type = DEFAULT_SHARED_MEMORY_TYPE;
@@ -141,6 +142,7 @@ CalculateShmemSize(void)
size = add_size(size, AioShmemSize());
size = add_size(size, WaitLSNShmemSize());
size = add_size(size, LogicalDecodingCtlShmemSize());
+ size = add_size(size, MemoryContextKeysShmemSize() + sizeof(LWLockPadded));
/* include additional requested shmem from preload libraries */
size = add_size(size, total_addin_request);
@@ -330,6 +332,7 @@ CreateOrAttachShmemStructs(void)
AioShmemInit();
WaitLSNShmemInit();
LogicalDecodingCtlShmemInit();
+ MemoryContextKeysShmemInit();
}
/*
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index b0b93d96091..ac4d08ce422 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -694,6 +694,9 @@ procsignal_sigusr1_handler(SIGNAL_ARGS)
if (CheckProcSignal(PROCSIG_LOG_MEMORY_CONTEXT))
HandleLogMemoryContextInterrupt();
+ if (CheckProcSignal(PROCSIG_GET_MEMORY_CONTEXT))
+ HandleGetMemoryContextInterrupt();
+
if (CheckProcSignal(PROCSIG_PARALLEL_APPLY_MESSAGE))
HandleParallelApplyMessageInterrupt();
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index ebc3f4ca457..27b3b51cf2d 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -51,6 +51,7 @@
#include "storage/procsignal.h"
#include "storage/spin.h"
#include "storage/standby.h"
+#include "utils/memutils.h"
#include "utils/timeout.h"
#include "utils/timestamp.h"
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 7dd75a490aa..e726f40dfbb 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3539,6 +3539,9 @@ ProcessInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
if (ParallelApplyMessagePending)
ProcessParallelApplyMessages();
}
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index dcfadbd5aae..4df4f8ae121 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -162,6 +162,7 @@ WAL_RECEIVER_EXIT "Waiting for the WAL receiver to exit."
WAL_RECEIVER_WAIT_START "Waiting for startup process to send initial data for streaming replication."
WAL_SUMMARY_READY "Waiting for a new WAL summary to be generated."
XACT_GROUP_UPDATE "Waiting for the group leader to update transaction status at transaction end."
+MEM_CXT_PUBLISH "Waiting for a process to publish memory information."
ABI_compatibility:
@@ -406,6 +407,7 @@ SubtransSLRU "Waiting to access the sub-transaction SLRU cache."
XactSLRU "Waiting to access the transaction status SLRU cache."
ParallelVacuumDSA "Waiting for parallel vacuum dynamic shared memory allocation."
AioUringCompletion "Waiting for another process to complete IO via io_uring."
+MemoryContextReportingKeys "Waiting for another process to complete reading or writing the memory reporting keys."
# No "ABI_compatibility" region here as WaitEventLWLock has its own C code.
diff --git a/src/backend/utils/adt/mcxtfuncs.c b/src/backend/utils/adt/mcxtfuncs.c
index 46dfb3dd133..babf8706513 100644
--- a/src/backend/utils/adt/mcxtfuncs.c
+++ b/src/backend/utils/adt/mcxtfuncs.c
@@ -17,22 +17,130 @@
#include "funcapi.h"
#include "mb/pg_wchar.h"
+#include "miscadmin.h"
+#include "storage/dsm_registry.h"
#include "storage/proc.h"
#include "storage/procarray.h"
#include "utils/array.h"
#include "utils/builtins.h"
#include "utils/hsearch.h"
+#include "utils/injection_point.h"
+#include "utils/memutils.h"
+#include "utils/wait_event_types.h"
+
+/*
+ * Memory Context reporting size limits.
+ */
+
+/* Max length of context name and ident, to keep it consistent
+ * with ProcessLogMemoryContext()
+ */
+#define MEMORY_CONTEXT_IDENT_SHMEM_SIZE 100
+#define MEMORY_CONTEXT_NAME_SHMEM_SIZE 100
+
+/* Maximum size (in bytes) of DSA area per process */
+#define MEMORY_CONTEXT_REPORT_MAX_PER_BACKEND ((size_t) (1 * 1024 * 1024))
+
+/*
+ * Maximum number of memory context statistics is calculated by dividing
+ * max memory allocated per backend with maximum size per context statistics.
+ * The identifier and name are statically allocated arrays of size 100 bytes.
+ * The path depth is limited to 100 like for memory context logging.
+ */
+#define MAX_MEMORY_CONTEXT_STATS_NUM (MEMORY_CONTEXT_REPORT_MAX_PER_BACKEND / sizeof(MemoryStatsEntry))
+
+/*
+ * Size of dshash key. The key is the client's ProcNumber rendered as a
+ * decimal string; 10 digits plus space for a NUL terminator can hold all
+ * valid values.
+ */
+#define CLIENT_KEY_SIZE (10 + 1)
+
+/* Dynamic shared memory state for reporting statistics per context */
+typedef struct MemoryStatsEntry
+{
+ char name[MEMORY_CONTEXT_NAME_SHMEM_SIZE];
+ char ident[MEMORY_CONTEXT_IDENT_SHMEM_SIZE];
+ int path[100];
+ NodeTag type;
+ int path_length;
+ int levels;
+ int64 totalspace;
+ int64 nblocks;
+ int64 freespace;
+ int64 freechunks;
+ int num_agg_stats;
+} MemoryStatsEntry;
+
+/*
+ * Per backend dynamic shared hash entry for memory context statistics
+ * reporting.
+ */
+typedef struct MemoryStatsDSHashEntry
+{
+ char key[64];
+ ConditionVariable memcxt_cv;
+ bool stats_written;
+ int target_server_id;
+ int total_stats;
+ bool summary;
+ dsa_pointer memstats_dsa_pointer;
+} MemoryStatsDSHashEntry;
+
+static const dshash_parameters memctx_dsh_params = {
+ offsetof(MemoryStatsDSHashEntry, memcxt_cv),
+ sizeof(MemoryStatsDSHashEntry),
+ dshash_strcmp,
+ dshash_strhash,
+ dshash_strcpy
+};
+
+/*
+ * These are used for reporting memory context statistics of a process.
+ */
+
+/* Lock to control access to client_keys array */
+static LWLock *client_keys_lock = NULL;
+
+/* Array to store the keys of MemoryStatsDsHash */
+static int *client_keys = NULL;
+
+/*
+ * Table to store pointers to DSA memory containing memory statistics and other
+ * metadata. There is one entry per client backend request, keyed by ProcNumber
+ * of the client obtained from client_keys array above.
+ */
+static dshash_table *MemoryStatsDsHash = NULL;
+
+/*
+ * DSA area which stores the actual memory context
+ * statistics.
+ */
+static dsa_area *MemoryStatsDsaArea = NULL;
+
+static void memstats_dsa_cleanup(char *key);
+static void memstats_client_key_reset(int procNumber);
+static const char *ContextTypeToString(NodeTag type);
+static void PublishMemoryContext(MemoryStatsEntry *memcxt_info,
+ int curr_id, MemoryContext context,
+ List *path,
+ MemoryContextCounters stat,
+ int num_contexts);
+static List *compute_context_path(MemoryContext c, HTAB *context_id_lookup);
/* ----------
* The max bytes for showing identifiers of MemoryContext.
+ * This is used by pg_get_backend_memory_contexts, the view for the local backend.
* ----------
*/
#define MEMORY_CONTEXT_IDENT_DISPLAY_SIZE 1024
+#define MAX_PATH_DISPLAY_LENGTH 100
+/* Timeout in seconds */
+#define MEMORY_STATS_MAX_TIMEOUT 5
+
/*
* MemoryContextId
- * Used for storage of transient identifiers for
- * pg_get_backend_memory_contexts.
+ * Used for storage of transient identifiers for memory context reporting
*/
typedef struct MemoryContextId
{
@@ -143,24 +251,7 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
else
nulls[1] = true;
- switch (context->type)
- {
- case T_AllocSetContext:
- type = "AllocSet";
- break;
- case T_GenerationContext:
- type = "Generation";
- break;
- case T_SlabContext:
- type = "Slab";
- break;
- case T_BumpContext:
- type = "Bump";
- break;
- default:
- type = "???";
- break;
- }
+ type = ContextTypeToString(context->type);
values[2] = CStringGetTextDatum(type);
values[3] = Int32GetDatum(list_length(path)); /* level */
@@ -175,6 +266,38 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
list_free(path);
}
+/*
+ * ContextTypeToString
+ * Returns a textual representation of a context type
+ *
+ * This should cover the same types as MemoryContextIsValid.
+ */
+static const char *
+ContextTypeToString(NodeTag type)
+{
+ const char *context_type;
+
+ switch (type)
+ {
+ case T_AllocSetContext:
+ context_type = "AllocSet";
+ break;
+ case T_GenerationContext:
+ context_type = "Generation";
+ break;
+ case T_SlabContext:
+ context_type = "Slab";
+ break;
+ case T_BumpContext:
+ context_type = "Bump";
+ break;
+ default:
+ context_type = "???";
+ break;
+ }
+ return context_type;
+}
+
/*
* pg_get_backend_memory_contexts
* SQL SRF showing backend memory context.
@@ -305,3 +428,849 @@ pg_log_backend_memory_contexts(PG_FUNCTION_ARGS)
PG_RETURN_BOOL(true);
}
+
+/*
+ * pg_get_process_memory_contexts
+ * Signal a backend or an auxiliary process to send its memory contexts,
+ * wait for the results and display them.
+ *
+ * By default, only superusers or users with ROLE_PG_READ_ALL_STATS are allowed
+ * to signal a process to return the memory contexts. Additional roles can be
+ * permitted with GRANT.
+ *
+ * On receipt of this signal, a backend or an auxiliary process sets the flag
+ * in the signal handler, which causes the next CHECK_FOR_INTERRUPTS()
+ * or process-specific interrupt handler to copy the memory context details
+ * to a dynamic shared memory space.
+ *
+ * We have defined a limit on DSA memory that could be allocated per process -
+ * if the process has more memory contexts than what can fit in the allocated
+ * size, the excess contexts are summarized and represented as cumulative total
+ * at the end of the buffer.
+ *
+ * After sending the signal, wait on a condition variable. The publishing
+ * backend, after copying the data to shared memory, sends a signal on that
+ * condition variable. There is one condition variable per client process.
+ * Once the condition variable is signalled, check if the latest memory context
+ * information is available and display.
+ *
+ * If the publishing backend does not respond before the condition variable
+ * times out, which is set to a predefined value MEMORY_STATS_MAX_TIMEOUT, give
+ * up and return NULL.
+ */
+Datum
+pg_get_process_memory_contexts(PG_FUNCTION_ARGS)
+{
+ int pid = PG_GETARG_INT32(0);
+ bool summary = PG_GETARG_BOOL(1);
+ PGPROC *proc;
+ ProcNumber procNumber;
+ bool proc_is_aux = false;
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ MemoryStatsEntry *memcxt_info;
+ MemoryStatsDSHashEntry *entry;
+ bool found;
+ char key[CLIENT_KEY_SIZE];
+ TimestampTz start_timestamp;
+
+ /*
+ * See if the process with given pid is a backend or an auxiliary process
+ * and remember the type for when we requery the process later.
+ */
+ proc = BackendPidGetProc(pid);
+ if (proc == NULL)
+ {
+ proc = AuxiliaryPidGetProc(pid);
+ proc_is_aux = true;
+ }
+
+ /*
+ * BackendPidGetProc() and AuxiliaryPidGetProc() return NULL if the pid
+ * isn't valid. This is not treated as an error; emit a WARNING and return.
+ * See comment in pg_log_backend_memory_contexts for a discussion on this.
+ */
+ if (proc == NULL)
+ {
+ ereport(WARNING,
+ errmsg("PID %d is not a PostgreSQL server process", pid));
+ PG_RETURN_NULL();
+ }
+
+ procNumber = GetNumberFromPGProc(proc);
+
+ /*
+ * Check if the server process slot is not empty and exit early. A
+ * non-empty slot means some other client backend is requesting the
+ * statistics from the same server process.
+ */
+ LWLockAcquire(client_keys_lock, LW_EXCLUSIVE);
+ if (client_keys[procNumber] != -1)
+ {
+ LWLockRelease(client_keys_lock);
+ ereport(WARNING,
+ errmsg("server process %d is processing previous request",
+ pid));
+ PG_RETURN_NULL();
+ }
+ LWLockRelease(client_keys_lock);
+
+ /*
+ * Create a DSA to allocate memory for copying memory context statistics.
+ * Allocate the memory in the DSA and send the DSA pointer to the server
+ * process for storing the context statistics. If the statistics do not
+ * fit in the predefined limit (1MB), a cumulative total is stored for the
+ * contexts that exceed it.
+ *
+ * The DSA is created once for the lifetime of the server, and only
+ * attached in subsequent calls.
+ */
+ if (MemoryStatsDsaArea == NULL)
+ MemoryStatsDsaArea = GetNamedDSA("memory_context_statistics_dsa",
+ &found);
+
+ /*
+ * The DSA pointers containing statistics for each client are stored in a
+ * dshash table. In addition to DSA pointer, each entry in this table also
+ * contains information about the statistics, condition variable for
+ * signalling between client and the server and miscellaneous data
+ * specific to a request. There is one entry per client request in the
+ * hash table.
+ */
+ if (MemoryStatsDsHash == NULL)
+ MemoryStatsDsHash = GetNamedDSHash("memory_context_statistics_dshash",
+ &memctx_dsh_params, &found);
+
+ snprintf(key, sizeof(key), "%d", MyProcNumber);
+
+ /*
+ * Insert an entry for this client in DSHASH table the first time this
+ * function is called. This entry is deleted when the process exits in
+ * before_shmem_exit call.
+ *
+ * dshash_find_or_insert locks the entry to prevent the publisher from
+ * reading before client has updated the entry.
+ */
+ entry = dshash_find_or_insert(MemoryStatsDsHash, key, &found);
+ if (!found)
+ {
+ entry->stats_written = false;
+ ConditionVariableInit(&entry->memcxt_cv);
+ }
+
+ /*
+ * Allocate 1MB of memory for the backend to publish its statistics on
+ * every call to this function. The memory is freed at the end of the
+ * function.
+ */
+ entry->memstats_dsa_pointer =
+ dsa_allocate0(MemoryStatsDsaArea, MEMORY_CONTEXT_REPORT_MAX_PER_BACKEND);
+
+ /*
+ * Specify whether a summary of statistics is requested, before signalling
+ * the server.
+ */
+ entry->summary = summary;
+
+ /*
+ * Indicate which server process statistics are being requested from. If
+ * this client times out before the last requested process can publish its
+ * statistics, it may send a new request to another server process. Since
+ * the previous server was notified, it might attempt to read the same
+ * client entry and respond incorrectly with its statistics. By storing
+ * the server ID in the client entry, we prevent any previously signalled
+ * server process from writing its statistics in the space meant for the
+ * newly requested process.
+ */
+ entry->target_server_id = pid;
+ dshash_release_lock(MemoryStatsDsHash, entry);
+
+ /*
+ * Check if the publishing process slot is empty and store this client's
+ * key, i.e. its ProcNumber. This informs the publishing process that it is
+ * supposed to write statistics in the hash entry corresponding to this
+ * client.
+ */
+ LWLockAcquire(client_keys_lock, LW_EXCLUSIVE);
+ if (client_keys[procNumber] == -1)
+ client_keys[procNumber] = MyProcNumber;
+ else
+ {
+ LWLockRelease(client_keys_lock);
+ /* Free the DSA memory allocated above before bailing out */
+ memstats_dsa_cleanup(key);
+ ereport(WARNING,
+ errmsg("server process %d is processing previous request",
+ pid));
+ PG_RETURN_NULL();
+ }
+ LWLockRelease(client_keys_lock);
+
+ /*
+ * Send a signal to a PostgreSQL process, informing it we want it to
+ * produce information about its memory contexts.
+ */
+ if (SendProcSignal(pid, PROCSIG_GET_MEMORY_CONTEXT, procNumber) < 0)
+ {
+ memstats_dsa_cleanup(key);
+ memstats_client_key_reset(procNumber);
+ ereport(WARNING,
+ errmsg("could not send signal to process %d: %m",
+ pid));
+ PG_RETURN_NULL();
+ }
+ start_timestamp = GetCurrentTimestamp();
+
+ while (1)
+ {
+ long elapsed_time;
+
+ INJECTION_POINT("memcontext-client-injection", NULL);
+
+ elapsed_time = TimestampDifferenceMilliseconds(start_timestamp,
+ GetCurrentTimestamp());
+ /* Return if we have already exceeded the timeout */
+ if (elapsed_time >= MEMORY_STATS_MAX_TIMEOUT * 1000)
+ {
+ memstats_dsa_cleanup(key);
+ memstats_client_key_reset(procNumber);
+ ConditionVariableCancelSleep();
+ PG_RETURN_NULL();
+ }
+
+ /*
+ * Recheck the state of the backend before sleeping on the condition
+ * variable to ensure the process is still alive. Only check the
+ * relevant process type based on the earlier PID check.
+ */
+ if (proc_is_aux)
+ proc = AuxiliaryPidGetProc(pid);
+ else
+ proc = BackendPidGetProc(pid);
+
+ /*
+ * The target server process ending during memory context processing
+ * is not an error.
+ */
+ if (proc == NULL)
+ {
+ memstats_dsa_cleanup(key);
+ memstats_client_key_reset(procNumber);
+ ConditionVariableCancelSleep();
+ ereport(WARNING,
+ errmsg("PID %d is no longer a PostgreSQL server process",
+ pid));
+ PG_RETURN_NULL();
+ }
+
+ /*
+ * Wait for MEMORY_STATS_MAX_TIMEOUT. If no statistics are available
+ * within the allowed time then return NULL. The timer is defined in
+ * milliseconds since that's what the condition variable sleep uses.
+ */
+ if (ConditionVariableTimedSleep(&entry->memcxt_cv,
+ (MEMORY_STATS_MAX_TIMEOUT * 1000),
+ WAIT_EVENT_MEM_CXT_PUBLISH))
+ {
+ /* Timeout has expired, return NULL */
+ memstats_dsa_cleanup(key);
+ memstats_client_key_reset(procNumber);
+ ConditionVariableCancelSleep();
+ PG_RETURN_NULL();
+ }
+ entry = dshash_find_or_insert(MemoryStatsDsHash, key, &found);
+ Assert(found);
+
+ memcxt_info = (MemoryStatsEntry *)
+ dsa_get_address(MemoryStatsDsaArea, entry->memstats_dsa_pointer);
+
+ /*
+ * We expect to come out of sleep when the requested process has
+ * finished publishing the statistics, verified using a boolean
+ * stats_written.
+ *
+ * Make sure that the statistics are actually written by checking that
+ * the name of the first context is not empty. This is done to ensure that
+ * the subsequent waits for statistics do not return spuriously if the
+ * previous call to the function ended in error and thus could not
+ * clear the stats_written flag.
+ */
+ if (entry->stats_written && memcxt_info[0].name[0] != '\0')
+ break;
+
+ dshash_release_lock(MemoryStatsDsHash, entry);
+
+ }
+
+ InitMaterializedSRF(fcinfo, 0);
+
+ /*
+ * Backend has finished publishing the stats, project them.
+ */
+#define PG_GET_PROCESS_MEMORY_CONTEXTS_COLS 11
+ for (int i = 0; i < entry->total_stats; i++)
+ {
+ ArrayType *path_array;
+ int path_length;
+ Datum values[PG_GET_PROCESS_MEMORY_CONTEXTS_COLS];
+ bool nulls[PG_GET_PROCESS_MEMORY_CONTEXTS_COLS];
+ Datum *path_datum = NULL;
+
+ memset(values, 0, sizeof(values));
+ memset(nulls, 0, sizeof(nulls));
+
+ Assert(memcxt_info[i].name[0] != '\0');
+ values[0] = CStringGetTextDatum(memcxt_info[i].name);
+
+ if (memcxt_info[i].ident[0] != '\0')
+ values[1] = CStringGetTextDatum(memcxt_info[i].ident);
+ else
+ nulls[1] = true;
+
+ values[2] = CStringGetTextDatum(ContextTypeToString(memcxt_info[i].type));
+ values[3] = Int32GetDatum(memcxt_info[i].levels);
+
+ path_length = memcxt_info[i].path_length;
+ path_datum = (Datum *) palloc(path_length * sizeof(Datum));
+ if (memcxt_info[i].path[0] != 0)
+ {
+ for (int j = 0; j < path_length; j++)
+ path_datum[j] = Int32GetDatum(memcxt_info[i].path[j]);
+ path_array = construct_array_builtin(path_datum,
+ path_length,
+ INT4OID);
+ values[4] = PointerGetDatum(path_array);
+ }
+ else
+ nulls[4] = true;
+
+ values[5] = Int64GetDatum(memcxt_info[i].totalspace);
+ values[6] = Int64GetDatum(memcxt_info[i].nblocks);
+ values[7] = Int64GetDatum(memcxt_info[i].freespace);
+ values[8] = Int64GetDatum(memcxt_info[i].freechunks);
+ values[9] = Int64GetDatum(memcxt_info[i].totalspace -
+ memcxt_info[i].freespace);
+ values[10] = Int32GetDatum(memcxt_info[i].num_agg_stats);
+
+ tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc,
+ values, nulls);
+ }
+ dshash_release_lock(MemoryStatsDsHash, entry);
+ memstats_dsa_cleanup(key);
+
+ ConditionVariableCancelSleep();
+
+ PG_RETURN_NULL();
+}
+
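+/*
+ * memstats_dsa_cleanup
+ * Free the DSA memory allocated for this client's request and reset the
+ * request state stored in its dshash entry.
+ */
+static void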
+memstats_dsa_cleanup(char *key)
+{
+ MemoryStatsDSHashEntry *entry;
+
+ entry = dshash_find(MemoryStatsDsHash, key, true);
+
+ Assert(MemoryStatsDsaArea != NULL);
+ dsa_free(MemoryStatsDsaArea, entry->memstats_dsa_pointer);
+ entry->memstats_dsa_pointer = InvalidDsaPointer;
+ entry->stats_written = false;
+ entry->target_server_id = 0;
+
+ dshash_release_lock(MemoryStatsDsHash, entry);
+}
+
+/*
+ * Remove this process from the publishing process'
+ * client key slot, if the stats publishing process has failed to do so.
+ */
+static void
+memstats_client_key_reset(int procNumber)
+{
+ LWLockAcquire(client_keys_lock, LW_EXCLUSIVE);
+
+ if (client_keys[procNumber] == MyProcNumber)
+ client_keys[procNumber] = -1;
+ LWLockRelease(client_keys_lock);
+}
+
+void
+MemoryContextKeysShmemInit(void)
+{
+ bool found;
+
+ client_keys = (int *)
+ ShmemInitStruct("MemoryContextKeys",
+ MemoryContextKeysShmemSize() + sizeof(LWLockPadded), &found);
+ client_keys_lock = (LWLock *) ((char *) client_keys + MemoryContextKeysShmemSize());
+
+ if (!found)
+ {
+ MemSet(client_keys, -1, MemoryContextKeysShmemSize());
+ LWLockInitialize(client_keys_lock, LWTRANCHE_MEMORY_CONTEXT_KEYS);
+ }
+}
+
+Size
+MemoryContextKeysShmemSize(void)
+{
+ Size sz = 0;
+ Size TotalProcs = 0;
+
+ TotalProcs = add_size(TotalProcs, NUM_AUXILIARY_PROCS);
+ TotalProcs = add_size(TotalProcs, MaxBackends);
+ sz = add_size(sz, mul_size(TotalProcs, sizeof(int)));
+
+ return sz;
+}
+
+/*
+ * HandleGetMemoryContextInterrupt
+ * Handle receipt of an interrupt indicating a request to publish memory
+ * contexts statistics.
+ *
+ * All the actual work is deferred to ProcessGetMemoryContextInterrupt() as
+ * this cannot be performed in a signal handler.
+ */
+void
+HandleGetMemoryContextInterrupt(void)
+{
+ InterruptPending = true;
+ PublishMemoryContextPending = true;
+ /* latch will be set by procsignal_sigusr1_handler */
+}
+
+/*
+ * ProcessGetMemoryContextInterrupt
+ * Generate information about memory contexts used by the process.
+ *
+ * Performs a breadth-first search on the memory context tree, so a parent's
+ * statistics are reported before its children's in the monitoring function
+ * output.
+ *
+ * Statistics for all the processes are shared via the same dynamic shared
+ * area. Individual statistics are tracked independently in per-process DSA
+ * pointers. These pointers are stored in a dshash table keyed by the
+ * requesting client's ProcNumber.
+ *
+ * We calculate the maximum number of context statistics that can be displayed
+ * from a pre-determined limit on the memory available per process for this
+ * utility and the maximum size of the statistics for each context. The
+ * remaining context statistics, if any, are captured as a cumulative total at
+ * the end of the individual context statistics.
+ *
+ * If summary is true, we capture the statistics of level 1 and level 2
+ * contexts. For that we traverse the memory context tree recursively in a
+ * depth-first manner to cover all the children of a parent context, to be
+ * able to display a cumulative total of memory consumption by a parent at
+ * level 2 and all its children.
+ */
+void
+ProcessGetMemoryContextInterrupt(void)
+{
+ List *contexts;
+ HASHCTL ctl;
+ HTAB *context_id_lookup;
+ int context_id = 0;
+ MemoryStatsEntry *meminfo;
+ MemoryContextCounters stat;
+ int num_individual_stats = 0;
+ bool found;
+ MemoryStatsDSHashEntry *entry;
+ char key[CLIENT_KEY_SIZE];
+ int clientProcNumber;
+ MemoryContext memstats_ctx = NULL;
+ MemoryContext oldcontext = NULL;
+ ResourceOwner currentOwner;
+
+ PublishMemoryContextPending = false;
+
+ INJECTION_POINT("memcontext-server-wait", NULL);
+
+ /*
+ * Retrieve the client key for publishing statistics and reset it to -1,
+ * so other clients can request memory statistics from this process.
+ * Return if the client_key is -1, which means the requesting client has
+ * timed out.
+ */
+ /* Exclusive mode is needed since the slot is reset to -1 below */
+ LWLockAcquire(client_keys_lock, LW_EXCLUSIVE);
+ if (client_keys[MyProcNumber] == -1)
+ {
+ LWLockRelease(client_keys_lock);
+ return;
+ }
+ else
+ {
+ clientProcNumber = client_keys[MyProcNumber];
+ client_keys[MyProcNumber] = -1;
+ LWLockRelease(client_keys_lock);
+ }
+
+ /*
+ * Create a new memory context which is not a part of the TopMemoryContext
+ * tree. This context is used to allocate all memory in this function.
+ * Keeping these allocations outside the tree being reported ensures that
+ * they do not affect the statistics published by this function.
+ */
+ memstats_ctx = AllocSetContextCreate((MemoryContext) NULL,
+ "publish_memory_context_statistics",
+ ALLOCSET_SMALL_SIZES);
+ oldcontext = MemoryContextSwitchTo(memstats_ctx);
+
+ /*
+ * The hash table is used for constructing "path" column of the view,
+ * similar to its local backend counterpart.
+ */
+ ctl.keysize = sizeof(MemoryContext);
+ ctl.entrysize = sizeof(MemoryContextId);
+ ctl.hcxt = CurrentMemoryContext;
+
+ context_id_lookup = hash_create("pg_get_process_memory_contexts",
+ 256,
+ &ctl,
+ HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+
+ /* List of contexts to process in the next round - start at the top. */
+ contexts = list_make1(TopMemoryContext);
+
+ currentOwner = CurrentResourceOwner;
+ CurrentResourceOwner = NULL;
+ /*
+ * The client process should have created the required DSA and DSHash
+ * table. Here we just attach to those.
+ */
+ if (MemoryStatsDsaArea == NULL)
+ MemoryStatsDsaArea = GetNamedDSA("memory_context_statistics_dsa",
+ &found);
+
+ if (MemoryStatsDsHash == NULL)
+ MemoryStatsDsHash = GetNamedDSHash("memory_context_statistics_dshash",
+ &memctx_dsh_params, &found);
+ CurrentResourceOwner = currentOwner;
+
+ snprintf(key, CLIENT_KEY_SIZE, "%d", clientProcNumber);
+
+ /*
+ * The entry lock is held by dshash_find_or_insert to protect writes to
+ * process specific memory. Two different processes publishing statistics
+ * do not block each other.
+ */
+ INJECTION_POINT("memcontext-server-injection", NULL);
+ entry = dshash_find_or_insert(MemoryStatsDsHash, key, &found);
+
+ /*
+ * Check if the entry has been deleted due to the calling process exiting,
+ * or if the caller has timed out waiting for us and has issued a request
+ * to another backend.
+ *
+ * Make sure that the client always deletes the entry after taking
+ * required lock or this function may end up writing to unallocated
+ * memory.
+ */
+ if (!found || entry->target_server_id != MyProcPid)
+ {
+ entry->stats_written = false;
+
+ dshash_release_lock(MemoryStatsDsHash, entry);
+
+ hash_destroy(context_id_lookup);
+ MemoryContextSwitchTo(oldcontext);
+ /* The context has no parent, so delete it to avoid leaking it */
+ MemoryContextDelete(memstats_ctx);
+
+ return;
+ }
+
+ /* Should be allocated by a client backend that is requesting statistics */
+ Assert(entry->memstats_dsa_pointer != InvalidDsaPointer);
+ meminfo = (MemoryStatsEntry *)
+ dsa_get_address(MemoryStatsDsaArea, entry->memstats_dsa_pointer);
+
+ if (entry->summary)
+ {
+ int cxt_id = 0;
+ List *path = NIL;
+ MemoryContextId *contextid_entry;
+
+ /* Copy TopMemoryContext statistics to DSA */
+ memset(&stat, 0, sizeof(stat));
+ (*TopMemoryContext->methods->stats) (TopMemoryContext, NULL, NULL,
+ &stat, true);
+ path = lcons_int(1, path);
+ PublishMemoryContext(meminfo, cxt_id, TopMemoryContext, path, stat,
+ 1);
+
+ contextid_entry = (MemoryContextId *) hash_search(context_id_lookup,
+ &TopMemoryContext,
+ HASH_ENTER, &found);
+ Assert(!found);
+
+ /*
+ * context id starts with 1
+ */
+ contextid_entry->context_id = cxt_id + 1;
+
+ /*
+ * Copy statistics for each of TopMemoryContext's children. This
+ * includes statistics of at most 100 children per node, with each
+ * child node limited to a depth of 100 in its subtree.
+ */
+ for (MemoryContext c = TopMemoryContext->firstchild; c != NULL;
+ c = c->nextchild)
+ {
+ MemoryContextCounters grand_totals;
+ int num_contexts = 0;
+
+ path = NIL;
+ memset(&grand_totals, 0, sizeof(grand_totals));
+
+ cxt_id++;
+ contextid_entry = (MemoryContextId *) hash_search(context_id_lookup,
+ &c, HASH_ENTER, &found);
+ Assert(!found);
+ contextid_entry->context_id = cxt_id + 1;
+
+ MemoryContextStatsCounter(c, &grand_totals, &num_contexts);
+
+ path = compute_context_path(c, context_id_lookup);
+
+ PublishMemoryContext(meminfo, cxt_id, c, path,
+ grand_totals, num_contexts);
+ }
+ entry->total_stats = cxt_id + 1;
+ }
+ else
+ {
+ foreach_ptr(MemoryContextData, cur, contexts)
+ {
+ List *path = NIL;
+ MemoryContextId *contextid_entry;
+
+ contextid_entry = (MemoryContextId *) hash_search(context_id_lookup,
+ &cur,
+ HASH_ENTER, &found);
+ Assert(!found);
+
+ /*
+ * context id starts with 1
+ */
+ contextid_entry->context_id = context_id + 1;
+
+ /*
+ * Figure out the transient context_id of this context and each of
+ * its ancestors, to compute a path for this context.
+ */
+ path = compute_context_path(cur, context_id_lookup);
+
+ /* Examine the context stats */
+ memset(&stat, 0, sizeof(stat));
+ (*cur->methods->stats) (cur, NULL, NULL, &stat, true);
+
+ /* Account for saving one statistics slot for cumulative reporting */
+ if (context_id < (MAX_MEMORY_CONTEXT_STATS_NUM - 1))
+ {
+ /* Copy statistics to DSA memory */
+ PublishMemoryContext(meminfo, context_id, cur, path, stat, 1);
+ }
+ else
+ {
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].totalspace += stat.totalspace;
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].nblocks += stat.nblocks;
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].freespace += stat.freespace;
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].freechunks += stat.freechunks;
+ }
+
+ /*
+ * The DSA limit per process is about to be reached; the remaining
+ * statistics are written as an aggregate.
+ *
+ * Array slots 0 to max_stats - 2 hold individual statistics, and
+ * slot max_stats - 1 is reserved for the cumulative statistics, or
+ * "Remaining Totals". Once context_id equals max_stats - 2 the last
+ * individual slot has been filled, so initialize the cumulative
+ * slot here.
+ */
+ if (context_id == (MAX_MEMORY_CONTEXT_STATS_NUM - 2))
+ {
+ num_individual_stats = context_id + 1;
+ /* Bound the copy by the destination size, not the source length */
+ strlcpy(meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].name,
+ "Remaining Totals", MEMORY_CONTEXT_NAME_SHMEM_SIZE);
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].ident[0] = '\0';
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].path[0] = 0;
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].type = 0;
+ }
+ context_id++;
+
+ for (MemoryContext c = cur->firstchild; c != NULL; c = c->nextchild)
+ contexts = lappend(contexts, c);
+ }
+
+ /*
+ * Check if there are aggregated statistics or not in the result set.
+ * Statistics are individually reported when context_id <= max_stats,
+ * only if context_id > max_stats will there be aggregates.
+ */
+ if (context_id <= MAX_MEMORY_CONTEXT_STATS_NUM)
+ {
+ entry->total_stats = context_id;
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].num_agg_stats = 1;
+ }
+
+ /*
+ * The number of contexts exceeded the space available, so report the
+ * number of aggregated memory contexts
+ */
+ else
+ {
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].num_agg_stats =
+ context_id - num_individual_stats;
+
+ /*
+ * Total stats equals num_individual_stats + 1 record for
+ * cumulative statistics.
+ */
+ entry->total_stats = num_individual_stats + 1;
+ }
+ }
+
+ entry->stats_written = true;
+ dshash_release_lock(MemoryStatsDsHash, entry);
+ hash_destroy(context_id_lookup);
+
+ MemoryContextSwitchTo(oldcontext);
+ MemoryContextReset(memstats_ctx);
+ /* Notify waiting client backend and return */
+ ConditionVariableSignal(&entry->memcxt_cv);
+}
+
+/*
+ * compute_context_path
+ *
+ * Append the transient context_id of this context and each of its ancestors
+ * to a list, in order to compute a path.
+ */
+static List *
+compute_context_path(MemoryContext c, HTAB *context_id_lookup)
+{
+ bool found;
+ List *path = NIL;
+ MemoryContext cur_context;
+
+ for (cur_context = c; cur_context != NULL; cur_context = cur_context->parent)
+ {
+ MemoryContextId *cur_entry;
+
+ cur_entry = hash_search(context_id_lookup, &cur_context, HASH_FIND, &found);
+
+ if (!found)
+ elog(ERROR, "hash table corrupted, can't construct path value");
+
+ path = lcons_int(cur_entry->context_id, path);
+ }
+
+ return path;
+}
+
+/*
+ * PublishMemoryContext
+ *
+ * Copy the memory context statistics of a single context to DSA memory.
+ */
+static void
+PublishMemoryContext(MemoryStatsEntry *memcxt_info, int curr_id,
+ MemoryContext context, List *path,
+ MemoryContextCounters stat, int num_contexts)
+{
+ char *ident = unconstify(char *, context->ident);
+ char *name = unconstify(char *, context->name);
+
+ /*
+ * To be consistent with logging output, we label dynahash contexts with
+ * just the hash table name as with MemoryContextStatsPrint().
+ */
+ if (context->ident && strncmp(context->name, "dynahash", 8) == 0)
+ {
+ name = unconstify(char *, context->ident);
+ ident = NULL;
+ }
+
+ if (name != NULL)
+ {
+ int namelen = strlen(name);
+
+ if (namelen >= MEMORY_CONTEXT_NAME_SHMEM_SIZE)
+ namelen = pg_mbcliplen(name, namelen,
+ MEMORY_CONTEXT_NAME_SHMEM_SIZE - 1);
+
+ strlcpy(memcxt_info[curr_id].name, name, namelen + 1);
+ }
+ else
+ /* Clearing the array */
+ memcxt_info[curr_id].name[0] = '\0';
+
+ /* Trim and copy the identifier if it is not set to NULL */
+ if (ident != NULL)
+ {
+ int idlen = strlen(context->ident);
+
+ /*
+ * Some identifiers such as SQL query string can be very long,
+ * truncate oversize identifiers.
+ */
+ if (idlen >= MEMORY_CONTEXT_IDENT_SHMEM_SIZE)
+ idlen = pg_mbcliplen(ident, idlen,
+ MEMORY_CONTEXT_IDENT_SHMEM_SIZE - 1);
+
+ strlcpy(memcxt_info[curr_id].ident, ident, idlen + 1);
+ }
+ else
+ memcxt_info[curr_id].ident[0] = '\0';
+
+ /* Store the path */
+ if (path == NIL)
+ memcxt_info[curr_id].path[0] = 0;
+ else
+ {
+ int levels = Min(list_length(path), MAX_PATH_DISPLAY_LENGTH);
+
+ memcxt_info[curr_id].path_length = levels;
+ memcxt_info[curr_id].levels = list_length(path);
+
+ foreach_int(i, path)
+ {
+ memcxt_info[curr_id].path[foreach_current_index(i)] = i;
+ if (--levels == 0)
+ break;
+ }
+ }
+ memcxt_info[curr_id].type = context->type;
+ memcxt_info[curr_id].totalspace = stat.totalspace;
+ memcxt_info[curr_id].nblocks = stat.nblocks;
+ memcxt_info[curr_id].freespace = stat.freespace;
+ memcxt_info[curr_id].freechunks = stat.freechunks;
+ memcxt_info[curr_id].num_agg_stats = num_contexts;
+}
+
+void
+AtProcExit_memstats_cleanup(int code, Datum arg)
+{
+ int idx = MyProcNumber;
+ MemoryStatsDSHashEntry *entry;
+ char key[CLIENT_KEY_SIZE];
+ bool found;
+
+ if (MemoryStatsDsHash != NULL)
+ {
+ snprintf(key, CLIENT_KEY_SIZE, "%d", idx);
+ entry = dshash_find_or_insert(MemoryStatsDsHash, key, &found);
+
+ if (found)
+ {
+ if (MemoryStatsDsaArea != NULL &&
+ DsaPointerIsValid(entry->memstats_dsa_pointer))
+ dsa_free(MemoryStatsDsaArea, entry->memstats_dsa_pointer);
+ }
+ dshash_delete_entry(MemoryStatsDsHash, entry);
+ }
+ LWLockAcquire(client_keys_lock, LW_EXCLUSIVE);
+ client_keys[idx] = -1;
+ LWLockRelease(client_keys_lock);
+}
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index d31cb45a058..92b0446b80c 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -39,6 +39,7 @@ volatile sig_atomic_t TransactionTimeoutPending = false;
volatile sig_atomic_t IdleSessionTimeoutPending = false;
volatile sig_atomic_t ProcSignalBarrierPending = false;
volatile sig_atomic_t LogMemoryContextPending = false;
+volatile sig_atomic_t PublishMemoryContextPending = false;
volatile sig_atomic_t IdleStatsUpdateTimeoutPending = false;
volatile uint32 InterruptHoldoffCount = 0;
volatile uint32 QueryCancelHoldoffCount = 0;
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index b7e94ca45bd..942ee5c34f4 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -661,6 +661,13 @@ BaseInit(void)
* drop ephemeral slots, which in turn triggers stats reporting.
*/
ReplicationSlotInitialize();
+
+ /*
+ * The before shmem exit callback frees the DSA memory occupied by the
+ * latest memory context statistics that could be published by this proc
+ * if requested.
+ */
+ before_shmem_exit(AtProcExit_memstats_cleanup, 0);
}
diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index 5c1a06d86fd..d601069b99f 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -1011,6 +1011,37 @@ MemoryContextStatsInternal(MemoryContext context, int level,
}
}
+
+/*
+ * MemoryContextStatsCounter
+ *
+ * Accumulate statistics counts into *totals. totals should not be NULL.
+ * This involves a non-recursive tree traversal.
+ */
+void
+MemoryContextStatsCounter(MemoryContext context, MemoryContextCounters *totals,
+ int *num_contexts)
+{
+ int ichild = 1;
+
+ *num_contexts = 0;
+ context->methods->stats(context, NULL, NULL, totals, false);
+
+ for (MemoryContext curr = context->firstchild;
+ curr != NULL;
+ curr = MemoryContextTraverseNext(curr, context))
+ {
+ curr->methods->stats(curr, NULL, NULL, totals, false);
+ ichild++;
+ }
+
+ /*
+ * Add the count of all the children contexts which are traversed
+ * including the parent.
+ */
+ *num_contexts = *num_contexts + ichild;
+}
+
/*
* MemoryContextStatsPrint
* Print callback used by MemoryContextStatsInternal
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index fd9448ec7b9..bef24d625d9 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8617,6 +8617,16 @@
prorettype => 'bool', proargtypes => 'int4',
prosrc => 'pg_log_backend_memory_contexts' },
+# publishing memory contexts of the specified postgres process
+{ oid => '2173', descr => 'publish memory contexts of the specified backend',
+ proname => 'pg_get_process_memory_contexts', provolatile => 'v',
+ prorows => '100', proretset => 't', proparallel => 'r',
+ prorettype => 'record', proargtypes => 'int4 bool',
+ proallargtypes => '{int4,bool,text,text,text,int4,_int4,int8,int8,int8,int8,int8,int4}',
+ proargmodes => '{i,i,o,o,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{pid, summary, name, ident, type, level, path, total_bytes, total_nblocks, free_bytes, free_chunks, used_bytes, num_agg_contexts}',
+ prosrc => 'pg_get_process_memory_contexts' },
+
# non-persistent series generator
{ oid => '1066', descr => 'non-persistent series generator',
proname => 'generate_series', prorows => '1000',
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 9a7d733ddef..b76f24baed6 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -96,6 +96,7 @@ extern PGDLLIMPORT volatile sig_atomic_t IdleSessionTimeoutPending;
extern PGDLLIMPORT volatile sig_atomic_t ProcSignalBarrierPending;
extern PGDLLIMPORT volatile sig_atomic_t LogMemoryContextPending;
extern PGDLLIMPORT volatile sig_atomic_t IdleStatsUpdateTimeoutPending;
+extern PGDLLIMPORT volatile sig_atomic_t PublishMemoryContextPending;
extern PGDLLIMPORT volatile sig_atomic_t CheckClientConnectionPending;
extern PGDLLIMPORT volatile sig_atomic_t ClientConnectionLost;
diff --git a/src/include/storage/lwlocklist.h b/src/include/storage/lwlocklist.h
index 533344509e9..8f83c8801a7 100644
--- a/src/include/storage/lwlocklist.h
+++ b/src/include/storage/lwlocklist.h
@@ -137,3 +137,4 @@ PG_LWLOCKTRANCHE(SUBTRANS_SLRU, SubtransSLRU)
PG_LWLOCKTRANCHE(XACT_SLRU, XactSLRU)
PG_LWLOCKTRANCHE(PARALLEL_VACUUM_DSA, ParallelVacuumDSA)
PG_LWLOCKTRANCHE(AIO_URING_COMPLETION, AioUringCompletion)
+PG_LWLOCKTRANCHE(MEMORY_CONTEXT_KEYS, MemoryContextReportingKeys)
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index 8e428f298c6..1e5f5b1f957 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -35,6 +35,7 @@ typedef enum
PROCSIG_WALSND_INIT_STOPPING, /* ask walsenders to prepare for shutdown */
PROCSIG_BARRIER, /* global barrier interrupt */
PROCSIG_LOG_MEMORY_CONTEXT, /* ask backend to log the memory contexts */
+ PROCSIG_GET_MEMORY_CONTEXT, /* ask backend to send the memory contexts */
PROCSIG_PARALLEL_APPLY_MESSAGE, /* Message from parallel apply workers */
/* Recovery conflict reasons */
diff --git a/src/include/utils/memutils.h b/src/include/utils/memutils.h
index 7bbe5a36959..617de0ebf91 100644
--- a/src/include/utils/memutils.h
+++ b/src/include/utils/memutils.h
@@ -19,7 +19,6 @@
#include "nodes/memnodes.h"
-
/*
* MaxAllocSize, MaxAllocHugeSize
* Quasi-arbitrary limits on size of allocations.
@@ -319,4 +318,11 @@ pg_memory_is_all_zeros(const void *ptr, size_t len)
return true;
}
+extern void ProcessGetMemoryContextInterrupt(void);
+extern void HandleGetMemoryContextInterrupt(void);
+extern void MemoryContextKeysShmemInit(void);
+extern Size MemoryContextKeysShmemSize(void);
+extern void MemoryContextStatsCounter(MemoryContext context, MemoryContextCounters *totals,
+ int *num_contexts);
+extern void AtProcExit_memstats_cleanup(int code, Datum arg);
#endif /* MEMUTILS_H */
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 0411db832f1..3799ef7c862 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -233,3 +233,22 @@ select * from pg_timezone_abbrevs where abbrev = 'LMT';
LMT | @ 7 hours 52 mins 58 secs ago | f
(1 row)
+DO $$
+DECLARE
+ bg_writer_pid int;
+ r RECORD;
+BEGIN
+ SELECT pid from pg_stat_activity where backend_type='background writer'
+ INTO bg_writer_pid;
+
+ select type, name, ident
+ from pg_get_process_memory_contexts(bg_writer_pid, false)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+ select type, name, ident
+ from pg_get_process_memory_contexts(pg_backend_pid(), false)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+END $$;
+NOTICE: (AllocSet,TopMemoryContext,)
+NOTICE: (AllocSet,TopMemoryContext,)
diff --git a/src/test/regress/sql/sysviews.sql b/src/test/regress/sql/sysviews.sql
index 66179f026b3..c9da4fc8c90 100644
--- a/src/test/regress/sql/sysviews.sql
+++ b/src/test/regress/sql/sysviews.sql
@@ -101,3 +101,21 @@ select count(distinct utc_offset) >= 24 as ok from pg_timezone_abbrevs;
-- One specific case we can check without much fear of breakage
-- is the historical local-mean-time value used for America/Los_Angeles.
select * from pg_timezone_abbrevs where abbrev = 'LMT';
+
+DO $$
+DECLARE
+ bg_writer_pid int;
+ r RECORD;
+BEGIN
+ SELECT pid from pg_stat_activity where backend_type='background writer'
+ INTO bg_writer_pid;
+
+ select type, name, ident
+ from pg_get_process_memory_contexts(bg_writer_pid, false)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+ select type, name, ident
+ from pg_get_process_memory_contexts(pg_backend_pid(), false)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+END $$;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index ceb3fc5d980..36d76262030 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1701,6 +1701,8 @@ MemoryContextData
MemoryContextId
MemoryContextMethodID
MemoryContextMethods
+MemoryStatsEntry
+MemoryStatsDSHashEntry
MemoryStatsPrintFunc
MergeAction
MergeActionState
--
2.34.1
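For readers following the overflow handling in the hunk above, here is a
minimal standalone sketch of the same slot layout. It assumes a tiny
capacity of 4 slots in place of MAX_MEMORY_CONTEXT_STATS_NUM and tracks
only the total space. StatSlot, MAX_STATS and publish_with_overflow are
hypothetical names used for illustration; they are not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for MAX_MEMORY_CONTEXT_STATS_NUM. */
#define MAX_STATS 4

/* Hypothetical, simplified version of one published statistics slot. */
typedef struct StatSlot
{
	char		name[32];
	size_t		totalspace;
	int			num_agg_stats;
} StatSlot;

/*
 * Store the first MAX_STATS - 1 contexts individually and fold every
 * later context into the last slot. Returns the number of slots used.
 */
static int
publish_with_overflow(const size_t *sizes, int n, StatSlot *slots)
{
	StatSlot   *last = &slots[MAX_STATS - 1];

	assert(sizes != NULL && slots != NULL && n >= 0);
	memset(slots, 0, sizeof(StatSlot) * MAX_STATS);

	for (int i = 0; i < n; i++)
	{
		if (i < MAX_STATS - 1)
		{
			/* Individual entry: exactly one context per slot. */
			snprintf(slots[i].name, sizeof(slots[i].name),
					 "context %d", i + 1);
			slots[i].totalspace = sizes[i];
			slots[i].num_agg_stats = 1;
		}
		else
		{
			/* Out of individual slots: accumulate into the last one. */
			last->totalspace += sizes[i];
			last->num_agg_stats++;
		}
	}

	if (n < MAX_STATS)
		return n;				/* the reserved slot was never needed */

	snprintf(last->name, sizeof(last->name), "Remaining Totals");
	return MAX_STATS;
}
```

With six contexts of 10 to 60 bytes and four slots, the first three are
stored individually and the last slot holds 150 bytes aggregated from
three contexts.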
Hi,
*Error Reported in Thread [2]*: This issue has been fixed by switching
to a NULL resource owner before attaching to DSM in the CFI handler.
The error mentioned in thread [2] is triggered during a CFI() call from
secure_read() when a backend is waiting for commands and has an open
transaction which is going to abort. Below are some details about this fix.
It is safe to temporarily set the resource owner to NULL before attaching
to the DSA and DSHASH, since these segments are intended to be attached
for the full session and are detached only when the session ends.
We also restore the original resource owner immediately after the attach
completes.
After further discussion and reviewing Robert's email [1] on this topic,
a safer solution is to avoid running ProcessGetMemoryContextInterrupt
during an aborted transaction. This should help prevent additional errors
when the transaction is already in an error handling state. Also, reporting
memory context statistics from an aborting transaction won't be very
useful, as some of that memory usage won't be valid after the abort
completes.
Attached is the updated patch that addresses this.
Other possible fixes include:
1. Adjusting resource-owner behavior
Either allow resource-owner enlargement during release, or delay marking
it as releasing until the abort actually begins.
Sorry, this point is invalid, as the resource owner is already being
marked as releasing from AbortTransaction.
Thank you,
Rahila Syed
[1]: /messages/by-id/CA+TgmoapJ6erjT21uPO12wTtoOmj6w-dp6T3qySN+NSc1cdEKw@mail.gmail.com
Attachments:
v47-0002-Test-module-to-test-memory-context-reporting-wit.patch (application/octet-stream)
From 83cf295b1c90fb43c7939c98f66e9aa4159d923f Mon Sep 17 00:00:00 2001
From: Rahila Syed <rahilasyed.90@gmail.com>
Date: Fri, 28 Nov 2025 14:46:38 +0530
Subject: [PATCH 2/2] Test module to test memory context reporting with
injection points
---
src/test/modules/Makefile | 1 +
.../test_memcontext_reporting/Makefile | 29 ++++
.../t/001_memcontext_inj.pl | 150 ++++++++++++++++++
.../test_memcontext_reporting.c | 12 ++
.../test_memcontext_reporting.control | 4 +
5 files changed, 196 insertions(+)
create mode 100644 src/test/modules/test_memcontext_reporting/Makefile
create mode 100644 src/test/modules/test_memcontext_reporting/t/001_memcontext_inj.pl
create mode 100644 src/test/modules/test_memcontext_reporting/test_memcontext_reporting.c
create mode 100644 src/test/modules/test_memcontext_reporting/test_memcontext_reporting.control
diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile
index 4c6d56d97d8..1156d731014 100644
--- a/src/test/modules/Makefile
+++ b/src/test/modules/Makefile
@@ -34,6 +34,7 @@ SUBDIRS = \
test_json_parser \
test_lfind \
test_lwlock_tranches \
+ test_memcontext_reporting \
test_misc \
test_oat_hooks \
test_parser \
diff --git a/src/test/modules/test_memcontext_reporting/Makefile b/src/test/modules/test_memcontext_reporting/Makefile
new file mode 100644
index 00000000000..0a2dfc44f1c
--- /dev/null
+++ b/src/test/modules/test_memcontext_reporting/Makefile
@@ -0,0 +1,29 @@
+# src/test/modules/test_memcontext_reporting/Makefile
+
+EXTRA_INSTALL = src/test/modules/injection_points
+
+export enable_injection_points
+MODULE_big = test_memcontext_reporting
+OBJS = \
+ $(WIN32RES) \
+ test_memcontext_reporting.o
+PGFILEDESC = "test_memcontext_reporting - test code for memory context reporting"
+
+REGRESS = test_memcontext_reporting
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/test_memcontext_reporting
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
+
+check:
+ $(prove_check)
+
+installcheck:
+ $(prove_installcheck)
diff --git a/src/test/modules/test_memcontext_reporting/t/001_memcontext_inj.pl b/src/test/modules/test_memcontext_reporting/t/001_memcontext_inj.pl
new file mode 100644
index 00000000000..b491d6ebc0a
--- /dev/null
+++ b/src/test/modules/test_memcontext_reporting/t/001_memcontext_inj.pl
@@ -0,0 +1,150 @@
+# Copyright (c) 2025, PostgreSQL Global Development Group
+
+# Test suite for testing memory context statistics reporting
+
+use strict;
+use warnings FATAL => 'all';
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+if ($ENV{enable_injection_points} ne 'yes')
+{
+ plan skip_all => 'Injection points not supported by this build';
+}
+my $psql_err;
+my $psql_out;
+# Create and start a cluster with one node
+my $node = PostgreSQL::Test::Cluster->new('main');
+$node->init;
+$node->append_conf(
+ 'postgresql.conf',
+ qq[
+max_connections = 100
+log_statement = none
+restart_after_crash = false
+]);
+$node->start;
+$node->safe_psql('postgres', 'CREATE EXTENSION injection_points;');
+
+# Attaching to a client process's injection point that throws an error
+$node->safe_psql('postgres',
+ "select injection_points_attach('memcontext-client-injection', 'error');"
+);
+
+my $pid = $node->safe_psql('postgres',
+ "SELECT pid from pg_stat_activity where backend_type='checkpointer'");
+
+#Client should have thrown error
+$node->psql(
+ 'postgres',
+ qq(select pg_get_process_memory_contexts($pid, true);),
+ stderr => \$psql_err);
+like($psql_err,
+ qr/error triggered for injection point memcontext-client-injection/);
+
+#Query the same process after detaching the injection point, using some other client and it should succeed.
+$node->safe_psql('postgres',
+ "select injection_points_detach('memcontext-client-injection');");
+my $topcontext_name = $node->safe_psql('postgres',
+ "select name from pg_get_process_memory_contexts($pid, true) where path = '{1}';"
+);
+is($topcontext_name, 'TopMemoryContext');
+
+# Attaching to a target process injection point that throws an error
+$node->safe_psql('postgres',
+ "select injection_points_attach('memcontext-server-injection', 'error');"
+);
+
+#Server should have thrown error
+$node->psql(
+ 'postgres',
+ qq(select pg_get_process_memory_contexts($pid, true);),
+ stderr => \$psql_err);
+
+#Query the same process after detaching the injection point, using some other client and it should succeed.
+$node->safe_psql('postgres',
+ "select injection_points_detach('memcontext-server-injection');");
+$topcontext_name = $node->safe_psql('postgres',
+ "select name from pg_get_process_memory_contexts($pid, true) where path = '{1}';"
+);
+is($topcontext_name, 'TopMemoryContext');
+
+# Test that two concurrent requests to the same process result in a warning for
+# one of them
+
+$node->safe_psql('postgres',
+ "SELECT injection_points_attach('memcontext-client-injection', 'wait');");
+$node->safe_psql('postgres',
+ "SELECT injection_points_attach('memcontext-server-wait', 'wait');");
+my $psql_session1 = $node->background_psql('postgres');
+$psql_session1->query_until(
+ qr//,
+ qq(
+ SELECT pg_get_process_memory_contexts($pid, true);
+));
+$node->wait_for_event('client backend', 'memcontext-client-injection');
+$node->psql(
+ 'postgres',
+ qq(select pg_get_process_memory_contexts($pid, true);),
+ stderr => \$psql_err);
+ok($psql_err =~
+ /WARNING: server process $pid is processing previous request/);
+#Wake the client up.
+$node->safe_psql('postgres',
+ "SELECT injection_points_wakeup('memcontext-client-injection');");
+
+$node->safe_psql('postgres',
+ "select injection_points_detach('memcontext-client-injection');");
+$node->safe_psql('postgres',
+ "select injection_points_detach('memcontext-server-wait');");
+
+# Test the client process exiting with timeout does not break the server process
+
+$node->safe_psql('postgres',
+ "SELECT injection_points_attach('memcontext-server-wait', 'wait');");
+# Following client query times out, returning NULL as output
+$node->psql(
+ 'postgres',
+ qq(select name from pg_get_process_memory_contexts($pid, true) where path = '{1}';),
+ stdout => \$psql_out);
+ok($psql_out = 'NULL');
+# Wake the server process up and detach the injection point.
+$node->safe_psql('postgres',
+ "SELECT injection_points_wakeup('memcontext-server-wait');");
+$node->safe_psql('postgres',
+ "select injection_points_detach('memcontext-server-wait');");
+#Query the same server process again and it should succeed.
+$topcontext_name = $node->safe_psql('postgres',
+ "select name from pg_get_process_memory_contexts($pid, true) where path = '{1}';"
+);
+is($topcontext_name, 'TopMemoryContext');
+
+# Test if the monitoring works fine, when the client backend crashes.
+$node->safe_psql('postgres',
+ "select injection_points_attach('memcontext-client-injection', 'test_memcontext_reporting', 'crash', NULL);"
+);
+
+#Client will crash
+$node->psql(
+ 'postgres',
+ qq(select name from pg_get_process_memory_contexts($pid, true) where path = '{1}';),
+ stderr => \$psql_err);
+like($psql_err,
+ qr/WARNING: terminating connection because of crash of another server process|server closed the connection unexpectedly|connection to server was lost|could not send data to server/
+);
+
+# Wait till server restarts
+$node->restart;
+$node->poll_query_until('postgres', "SELECT 1;", '1');
+
+#Querying memory stats should succeed after server start
+$pid = $node->safe_psql('postgres',
+ "SELECT pid from pg_stat_activity where backend_type='checkpointer'");
+$topcontext_name = $node->safe_psql('postgres',
+ "select name from pg_get_process_memory_contexts($pid, true) where path = '{1}';"
+);
+is($topcontext_name, 'TopMemoryContext');
+
+done_testing();
diff --git a/src/test/modules/test_memcontext_reporting/test_memcontext_reporting.c b/src/test/modules/test_memcontext_reporting/test_memcontext_reporting.c
new file mode 100644
index 00000000000..d641f3616dc
--- /dev/null
+++ b/src/test/modules/test_memcontext_reporting/test_memcontext_reporting.c
@@ -0,0 +1,12 @@
+#include "postgres.h"
+#include "funcapi.h"
+
+PG_MODULE_MAGIC;
+
+extern PGDLLEXPORT void crash(const char *name, const void *private_data, void *arg);
+
+void
+crash(const char *name, const void *private_data, void *arg)
+{
+ abort();
+}
diff --git a/src/test/modules/test_memcontext_reporting/test_memcontext_reporting.control b/src/test/modules/test_memcontext_reporting/test_memcontext_reporting.control
new file mode 100644
index 00000000000..48b501682d5
--- /dev/null
+++ b/src/test/modules/test_memcontext_reporting/test_memcontext_reporting.control
@@ -0,0 +1,4 @@
+comment = 'Test code for memcontext reporting'
+default_version = '1.0'
+module_pathname = '$libdir/test_memcontext_reporting'
+relocatable = true
--
2.34.1
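The timeout test in 001_memcontext_inj.pl above relies on the client
giving up after a bounded wait instead of blocking on a stalled server
process. A minimal sketch of such a bounded retry loop follows. The names
wait_for_stats, stats_ready_fn and ready_after are hypothetical, and the
attempt count stands in for the hardcoded timeout; a real implementation
would sleep on a condition variable with a timeout between attempts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback: returns true once statistics are published. */
typedef bool (*stats_ready_fn) (void *arg);

/*
 * Poll for published statistics at most max_attempts times.
 * Returns false on timeout, in which case the caller reports NULL.
 */
static bool
wait_for_stats(stats_ready_fn ready, void *arg, int max_attempts)
{
	assert(ready != NULL);

	for (int attempt = 0; attempt < max_attempts; attempt++)
	{
		if (ready(arg))
			return true;
		/* A timed sleep would go here in a real implementation. */
	}
	return false;
}

/* Test helper: becomes ready after *arg unsuccessful polls. */
static bool
ready_after(void *arg)
{
	int		   *remaining = arg;

	assert(remaining != NULL);
	if (*remaining > 0)
	{
		(*remaining)--;
		return false;
	}
	return true;
}
```

The caller stays responsive in both cases: it either receives the
statistics within the allowed attempts or returns without them.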
v47-0001-Add-function-to-report-memory-context-statistics.patch (application/octet-stream)
From 781a7649bf8fb5e89fb900664f9a8d05844eb8a9 Mon Sep 17 00:00:00 2001
From: Rahila Syed <rahilasyed.90@gmail.com>
Date: Thu, 27 Nov 2025 14:39:43 +0530
Subject: [PATCH 1/2] Add function to report memory context statistics
This adds a function for retrieving memory context statistics
and information from backends as well as auxiliary processes.
The intended use case is cluster debugging when under memory
pressure or unanticipated memory usage characteristics.
When called, the function sends a signal to the specified
process to submit statistics regarding its memory contexts
into dynamic shared memory. Each memory context is returned
in detail, followed by a cumulative total in case the number
of contexts exceeds the max allocated amount of shared memory.
Each process is limited to use at most 1MB of memory for this.
A summary can also be explicitly requested by the user; this
will return the TopMemoryContext and a cumulative total of
all lower contexts.
In order to not block on busy processes, we have hardcoded
the number of seconds during which to retry before timing out.
In the case where no statistics are published within the set
timeout, NULL is returned.
---
doc/src/sgml/func/func-admin.sgml | 159 +++
src/backend/catalog/system_functions.sql | 14 +
src/backend/postmaster/autovacuum.c | 4 +
src/backend/postmaster/checkpointer.c | 4 +
src/backend/postmaster/interrupt.c | 4 +
src/backend/postmaster/pgarch.c | 4 +
src/backend/postmaster/startup.c | 4 +
src/backend/postmaster/walsummarizer.c | 4 +
src/backend/storage/ipc/ipci.c | 3 +
src/backend/storage/ipc/procsignal.c | 3 +
src/backend/storage/lmgr/proc.c | 1 +
src/backend/tcop/postgres.c | 3 +
.../utils/activity/wait_event_names.txt | 2 +
src/backend/utils/adt/mcxtfuncs.c | 1012 ++++++++++++++++-
src/backend/utils/init/globals.c | 1 +
src/backend/utils/init/postinit.c | 7 +
src/backend/utils/mmgr/mcxt.c | 31 +
src/include/catalog/pg_proc.dat | 10 +
src/include/miscadmin.h | 1 +
src/include/storage/lwlocklist.h | 1 +
src/include/storage/procsignal.h | 1 +
src/include/utils/memutils.h | 8 +-
src/test/regress/expected/sysviews.out | 19 +
src/test/regress/sql/sysviews.sql | 18 +
src/tools/pgindent/typedefs.list | 2 +
25 files changed, 1299 insertions(+), 21 deletions(-)
diff --git a/doc/src/sgml/func/func-admin.sgml b/doc/src/sgml/func/func-admin.sgml
index 2896cd9e429..5eac0e5f73c 100644
--- a/doc/src/sgml/func/func-admin.sgml
+++ b/doc/src/sgml/func/func-admin.sgml
@@ -251,6 +251,132 @@
<literal>false</literal> is returned.
</para></entry>
</row>
+
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>pg_get_process_memory_contexts</primary>
+ </indexterm>
+ <function>pg_get_process_memory_contexts</function> ( <parameter>pid</parameter> <type>integer</type> <optional>,<parameter>summary</parameter> <type>boolean</type> <literal>DEFAULT</literal> <literal>false</literal></optional> )
+ <returnvalue>setof record</returnvalue>
+ ( <parameter>name</parameter> <type>text</type>,
+ <parameter>ident</parameter> <type>text</type>,
+ <parameter>type</parameter> <type>text</type>,
+ <parameter>level</parameter> <type>integer</type>,
+ <parameter>path</parameter> <type>integer[]</type>,
+ <parameter>total_bytes</parameter> <type>bigint</type>,
+ <parameter>total_nblocks</parameter> <type>bigint</type>,
+ <parameter>free_bytes</parameter> <type>bigint</type>,
+ <parameter>free_chunks</parameter> <type>bigint</type>,
+ <parameter>used_bytes</parameter> <type>bigint</type>,
+ <parameter>num_agg_contexts</parameter> <type>integer</type> )
+ </para>
+ <para>
+ This function handles requests to display the memory contexts of a
+ <productname>PostgreSQL</productname> process with the specified
+ process ID. The function can be used to send requests to backends as
+ well as <glossterm linkend="glossary-auxiliary-proc">auxiliary processes</glossterm>.
+ If the process does not respond with memory context statistics within
+ 5 seconds, the function returns <literal>NULL</literal>.
+ </para>
+ <para>
+ Each returned record contains extended statistics for one memory
+ context:
+ <itemizedlist spacing="compact">
+ <listitem>
+ <para>
+ <parameter>name</parameter> - The name of the memory context.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>ident</parameter> - Memory context ID (if any).
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>type</parameter> - The type of memory context, possible
+ values are: AllocSet, Generation, Slab and Bump.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>level</parameter> - The level in the tree of the current
+ memory context.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>path</parameter> - Memory contexts are organized in a
+ tree model with TopMemoryContext as the root, and all other memory
+ contexts as nodes in the tree. The <parameter>path</parameter>
+ displays the path from the root to the current memory context. The
+ path is limited to 100 children per node, with each node limited
+ to a max depth of 100, to preserve memory during reporting. The
+ printed path will also be limited to 100 nodes counting from the
+ TopMemoryContext.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>total_bytes</parameter> - The total number of bytes
+ allocated to this memory context.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>total_nblocks</parameter> - The total number of blocks
+ used for the allocated memory.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>free_bytes</parameter> - The amount of free memory in
+ this memory context.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>free_chunks</parameter> - The number of chunks that
+ <parameter>free_bytes</parameter> corresponds to.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>used_bytes</parameter> - The total number of bytes
+ currently occupied.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <parameter>num_agg_contexts</parameter> - The number of memory
+ contexts aggregated in the displayed statistics.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+ <para>
+ When <parameter>summary</parameter> is <literal>true</literal>, statistics
+ for memory contexts at levels 1 and 2 are displayed, with level 1
+ representing the root node (i.e., <literal>TopMemoryContext</literal>).
+ Statistics for contexts on level 2 and below are aggregates of all
+ child contexts' statistics, where <literal>num_agg_contexts</literal>
+ indicates the number of aggregated child contexts. When
+ <parameter>summary</parameter> is <literal>false</literal> (the default),
+ the <literal>num_agg_contexts</literal> value is <literal>1</literal>,
+ indicating that individual statistics are being displayed.
+ </para>
+ <para>
+ After receiving memory context statistics from the target process, the
+ function returns the results as one row per context. If not all contexts
+ fit within the pre-determined size limit, the remaining context
+ statistics are aggregated and a cumulative total is displayed. The
+ <literal>num_agg_contexts</literal> column indicates the number of
+ contexts aggregated in the displayed statistics. When
+ <literal>num_agg_contexts</literal> is <literal>1</literal> it means
+ that the context statistics are displayed separately.
+ </para></entry>
+ </row>
</tbody>
</tgroup>
</table>
@@ -302,6 +428,39 @@ LOG: Grand total: 1651920 bytes in 201 blocks; 622360 free (88 chunks); 1029560
because it may generate a large number of log messages.
</para>
+ <para>
+ <function>pg_get_process_memory_contexts</function> can be used to request
+ memory context statistics of any <productname>PostgreSQL</productname>
+ process. For example:
+<programlisting>
+postgres=# SELECT * FROM pg_get_process_memory_contexts(
+ (SELECT pid FROM pg_stat_activity
+ WHERE backend_type = 'checkpointer'),
+ false) LIMIT 1;
+-[ RECORD 1 ]----+------------------------------
+name | TopMemoryContext
+ident |
+type | AllocSet
+level | 1
+path | {1}
+total_bytes | 90304
+total_nblocks | 3
+free_bytes | 2880
+free_chunks | 1
+used_bytes | 87424
+num_agg_contexts | 1
+</programlisting>
+ <note>
+ <para>
+ While <function>pg_get_process_memory_contexts</function> can be used to
+ query memory contexts of the local backend,
+ <structname>pg_backend_memory_contexts</structname>
+ (see <xref linkend="view-pg-backend-memory-contexts"/> for more details)
+ will be less resource intensive when only the local backend is of interest.
+ </para>
+ </note>
+ </para>
+
</sect2>
<sect2 id="functions-admin-backup">
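The documentation text above says that contexts which do not fit within the size limit are folded into one cumulative row. A minimal, standalone sketch of that idea follows. It is not the patch's code: StatRow and publish_with_limit are hypothetical names, and placing the aggregate in the last available slot is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified row: only the fields needed for the illustration. */
typedef struct StatRow
{
	long		total_bytes;		/* bytes attributed to this row */
	int			num_agg_contexts;	/* contexts folded into this row */
} StatRow;

/*
 * Copy per-context sizes into out[], which has room for cap rows.
 * If everything fits, each context gets its own row. Otherwise the first
 * cap - 1 contexts get individual rows and the last row holds the
 * cumulative total of all remaining contexts.
 * Returns the number of rows written.
 */
size_t
publish_with_limit(const long *sizes, size_t n, StatRow *out, size_t cap)
{
	size_t		individual;
	size_t		i;

	if (cap == 0)
		return 0;
	assert(out != NULL);

	individual = (n <= cap) ? n : cap - 1;

	for (i = 0; i < individual; i++)
	{
		out[i].total_bytes = sizes[i];
		out[i].num_agg_contexts = 1;
	}

	if (individual == n)
		return n;

	/* Fold everything that did not fit into one trailing aggregate row. */
	out[individual].total_bytes = 0;
	out[individual].num_agg_contexts = 0;
	for (i = individual; i < n; i++)
	{
		out[individual].total_bytes += sizes[i];
		out[individual].num_agg_contexts++;
	}
	return individual + 1;
}
```

With five contexts and room for three rows, the first two rows keep num_agg_contexts = 1 and the third row carries the sum of the remaining three contexts.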
diff --git a/src/backend/catalog/system_functions.sql b/src/backend/catalog/system_functions.sql
index eb9e31ae1bf..d1416e5534d 100644
--- a/src/backend/catalog/system_functions.sql
+++ b/src/backend/catalog/system_functions.sql
@@ -657,6 +657,17 @@ LANGUAGE INTERNAL
STRICT VOLATILE PARALLEL UNSAFE
AS 'pg_replication_origin_session_setup';
+CREATE OR REPLACE FUNCTION
+ pg_get_process_memory_contexts(IN pid integer, IN summary boolean DEFAULT false,
+ OUT name text, OUT ident text, OUT type text, OUT level integer,
+ OUT path integer[], OUT total_bytes bigint, OUT total_nblocks bigint,
+ OUT free_bytes bigint, OUT free_chunks bigint, OUT used_bytes bigint,
+ OUT num_agg_contexts integer)
+RETURNS SETOF RECORD
+LANGUAGE INTERNAL
+STRICT VOLATILE PARALLEL UNSAFE
+AS 'pg_get_process_memory_contexts';
+
--
-- The default permissions for functions mean that anyone can execute them.
-- A number of functions shouldn't be executable by just anyone, but rather
@@ -782,6 +793,7 @@ REVOKE EXECUTE ON FUNCTION pg_ls_logicalmapdir() FROM PUBLIC;
REVOKE EXECUTE ON FUNCTION pg_ls_replslotdir(text) FROM PUBLIC;
+REVOKE EXECUTE ON FUNCTION pg_get_process_memory_contexts(integer, boolean) FROM PUBLIC;
--
-- We also set up some things as accessible to standard roles.
--
@@ -808,6 +820,8 @@ GRANT EXECUTE ON FUNCTION pg_current_logfile() TO pg_monitor;
GRANT EXECUTE ON FUNCTION pg_current_logfile(text) TO pg_monitor;
+GRANT EXECUTE ON FUNCTION pg_get_process_memory_contexts(integer, boolean) TO pg_read_all_stats;
+
GRANT pg_read_all_settings TO pg_monitor;
GRANT pg_read_all_stats TO pg_monitor;
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 3e507d23cc9..fbebe506495 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -791,6 +791,10 @@ ProcessAutoVacLauncherInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
/* Process sinval catchup interrupts that happened while sleeping */
ProcessCatchupInterrupt();
}
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index 6482c21b8f9..1b7d5e7ffdc 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -685,6 +685,10 @@ ProcessCheckpointerInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/postmaster/interrupt.c b/src/backend/postmaster/interrupt.c
index a2c0ff012c5..cb18ce80893 100644
--- a/src/backend/postmaster/interrupt.c
+++ b/src/backend/postmaster/interrupt.c
@@ -48,6 +48,10 @@ ProcessMainLoopInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 1a20387c4bd..3ac0fba225a 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -871,6 +871,10 @@ ProcessPgArchInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
if (ConfigReloadPending)
{
char *archiveLib = pstrdup(XLogArchiveLibrary);
diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c
index a1a4f65f9a9..18a9e7f85e1 100644
--- a/src/backend/postmaster/startup.c
+++ b/src/backend/postmaster/startup.c
@@ -192,6 +192,10 @@ ProcessStartupProcInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
diff --git a/src/backend/postmaster/walsummarizer.c b/src/backend/postmaster/walsummarizer.c
index c3d56c866d3..49dc3e88023 100644
--- a/src/backend/postmaster/walsummarizer.c
+++ b/src/backend/postmaster/walsummarizer.c
@@ -879,6 +879,10 @@ ProcessWalSummarizerInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Publish memory contexts of this process */
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
}
/*
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 85c67b2c183..211199e985a 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -52,6 +52,7 @@
#include "storage/sinvaladt.h"
#include "utils/guc.h"
#include "utils/injection_point.h"
+#include "utils/memutils.h"
/* GUCs */
int shared_memory_type = DEFAULT_SHARED_MEMORY_TYPE;
@@ -141,6 +142,7 @@ CalculateShmemSize(void)
size = add_size(size, AioShmemSize());
size = add_size(size, WaitLSNShmemSize());
size = add_size(size, LogicalDecodingCtlShmemSize());
+ size = add_size(size, MemoryContextKeysShmemSize() + sizeof(LWLockPadded));
/* include additional requested shmem from preload libraries */
size = add_size(size, total_addin_request);
@@ -330,6 +332,7 @@ CreateOrAttachShmemStructs(void)
AioShmemInit();
WaitLSNShmemInit();
LogicalDecodingCtlShmemInit();
+ MemoryContextKeysShmemInit();
}
/*
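The ipci.c hunks above reserve one shared memory region sized as the key array plus one padded lock. The layout can be sketched in standalone C as below. This is only an illustration under stated assumptions: FakeLock stands in for the real lock type, slots_size and slots_init are hypothetical helpers, and the buffer is ordinary memory rather than shared memory.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for a real lock type; purely illustrative. */
typedef struct FakeLock
{
	int			held;
} FakeLock;

/* Bytes needed for nprocs integer slots (the lock is accounted separately). */
size_t
slots_size(size_t nprocs)
{
	return nprocs * sizeof(int);
}

/*
 * Carve a caller-provided buffer of slots_size(nprocs) + sizeof(FakeLock)
 * bytes into the slot array followed by the lock, and mark all slots free.
 */
int *
slots_init(void *base, size_t nprocs, FakeLock **lock)
{
	int		   *slots = (int *) base;

	assert(base != NULL && lock != NULL);

	/* Setting every byte to 0xFF makes each int read as -1. */
	memset(slots, -1, slots_size(nprocs));

	/* The lock lives immediately after the last slot. */
	*lock = (FakeLock *) ((char *) base + slots_size(nprocs));
	(*lock)->held = 0;
	return slots;
}
```

Keeping the lock in the same allocation means a single size calculation covers both, at the cost of having to add the lock size wherever the region is sized or initialized.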
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index 8e56922dcea..601e01d2574 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -694,6 +694,9 @@ procsignal_sigusr1_handler(SIGNAL_ARGS)
if (CheckProcSignal(PROCSIG_LOG_MEMORY_CONTEXT))
HandleLogMemoryContextInterrupt();
+ if (CheckProcSignal(PROCSIG_GET_MEMORY_CONTEXT))
+ HandleGetMemoryContextInterrupt();
+
if (CheckProcSignal(PROCSIG_PARALLEL_APPLY_MESSAGE))
HandleParallelApplyMessageInterrupt();
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 66274029c74..61957fc9b74 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -51,6 +51,7 @@
#include "storage/procsignal.h"
#include "storage/spin.h"
#include "storage/standby.h"
+#include "utils/memutils.h"
#include "utils/timeout.h"
#include "utils/timestamp.h"
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index e54bf1e760f..199245e4c1f 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3539,6 +3539,9 @@ ProcessInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ if (PublishMemoryContextPending)
+ ProcessGetMemoryContextInterrupt();
+
if (ParallelApplyMessagePending)
ProcessParallelApplyMessages();
}
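The hunks above all follow the same pattern: the signal handler only raises a pending flag, and the work is done later from the process's interrupt-processing routine. A minimal standalone sketch of that pattern, assuming a POSIX system, is shown here. The names publish_pending, handle_publish_signal and process_interrupts are hypothetical and are not the patch's symbols.

```c
#include <signal.h>

/* Flag set from the signal handler; sig_atomic_t is safe to write there. */
static volatile sig_atomic_t publish_pending = 0;

/* Counts how many times the deferred work actually ran. */
static int	publish_count = 0;

/* Signal handler: only record that work was requested. */
void
handle_publish_signal(int signo)
{
	(void) signo;
	publish_pending = 1;
}

/* Called from the main loop at a safe point; performs the deferred work. */
void
process_interrupts(void)
{
	if (publish_pending)
	{
		publish_pending = 0;
		publish_count++;		/* the real work would happen here */
	}
}
```

A caller would install the handler with signal(SIGUSR1, handle_publish_signal) and call process_interrupts() from its main loop. Nothing that allocates memory or takes locks runs inside the handler itself.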
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 3299de23bb3..6f78075bdfc 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -163,6 +163,7 @@ WAL_RECEIVER_EXIT "Waiting for the WAL receiver to exit."
WAL_RECEIVER_WAIT_START "Waiting for startup process to send initial data for streaming replication."
WAL_SUMMARY_READY "Waiting for a new WAL summary to be generated."
XACT_GROUP_UPDATE "Waiting for the group leader to update transaction status at transaction end."
+MEM_CXT_PUBLISH "Waiting for a process to publish memory information."
ABI_compatibility:
@@ -407,6 +408,7 @@ SubtransSLRU "Waiting to access the sub-transaction SLRU cache."
XactSLRU "Waiting to access the transaction status SLRU cache."
ParallelVacuumDSA "Waiting for parallel vacuum dynamic shared memory allocation."
AioUringCompletion "Waiting for another process to complete IO via io_uring."
+MemoryContextReportingKeys "Waiting for another process to complete reading or writing the memory reporting keys."
# No "ABI_compatibility" region here as WaitEventLWLock has its own C code.
diff --git a/src/backend/utils/adt/mcxtfuncs.c b/src/backend/utils/adt/mcxtfuncs.c
index 12b8d4cefaf..335fdcdc329 100644
--- a/src/backend/utils/adt/mcxtfuncs.c
+++ b/src/backend/utils/adt/mcxtfuncs.c
@@ -17,22 +17,130 @@
#include "funcapi.h"
#include "mb/pg_wchar.h"
+#include "miscadmin.h"
+#include "storage/dsm_registry.h"
#include "storage/proc.h"
#include "storage/procarray.h"
#include "utils/array.h"
#include "utils/builtins.h"
#include "utils/hsearch.h"
+#include "utils/injection_point.h"
+#include "utils/memutils.h"
+#include "utils/wait_event_types.h"
+
+/*
+ * Memory Context reporting size limits.
+ */
+
+/* Max length of context name and ident, to keep it consistent
+ * with ProcessLogMemoryContext()
+ */
+#define MEMORY_CONTEXT_IDENT_SHMEM_SIZE 100
+#define MEMORY_CONTEXT_NAME_SHMEM_SIZE 100
+
+/* Maximum size (in bytes) of DSA area per process */
+#define MEMORY_CONTEXT_REPORT_MAX_PER_BACKEND ((size_t) (1 * 1024 * 1024))
+
+/*
+ * Maximum number of memory context statistics is calculated by dividing
+ * max memory allocated per backend with maximum size per context statistics.
+ * The identifier and name are statically allocated arrays of size 100 bytes.
+ * The path depth is limited to 100 like for memory context logging.
+ */
+#define MAX_MEMORY_CONTEXT_STATS_NUM (MEMORY_CONTEXT_REPORT_MAX_PER_BACKEND / sizeof(MemoryStatsEntry))
+
+/*
+ * Size of dshash key. The key is a uint32 rendered as a string; 10 chars
+ * plus space for a NUL terminator can hold all the values.
+ */
+#define CLIENT_KEY_SIZE (10 + 1)
+
+/* Dynamic shared memory state for reporting statistics per context */
+typedef struct MemoryStatsEntry
+{
+ char name[MEMORY_CONTEXT_NAME_SHMEM_SIZE];
+ char ident[MEMORY_CONTEXT_IDENT_SHMEM_SIZE];
+ int path[100];
+ NodeTag type;
+ int path_length;
+ int levels;
+ int64 totalspace;
+ int64 nblocks;
+ int64 freespace;
+ int64 freechunks;
+ int num_agg_stats;
+} MemoryStatsEntry;
+
+/*
+ * Per backend dynamic shared hash entry for memory context statistics
+ * reporting.
+ */
+typedef struct MemoryStatsDSHashEntry
+{
+ char key[64];
+ ConditionVariable memcxt_cv;
+ bool stats_written;
+ int target_server_id;
+ int total_stats;
+ bool summary;
+ dsa_pointer memstats_dsa_pointer;
+} MemoryStatsDSHashEntry;
+
+static const dshash_parameters memctx_dsh_params = {
+ offsetof(MemoryStatsDSHashEntry, memcxt_cv),
+ sizeof(MemoryStatsDSHashEntry),
+ dshash_strcmp,
+ dshash_strhash,
+ dshash_strcpy
+};
+
+/*
+ * These are used for reporting memory context statistics of a process.
+ */
+
+/* Lock to control access to client_keys array */
+static LWLock *client_keys_lock = NULL;
+
+/* Array to store the keys of MemoryStatsDsHash */
+static int *client_keys = NULL;
+
+/*
+ * Table to store pointers to DSA memory containing memory statistics and other
+ * metadata. There is one entry per client backend request, keyed by ProcNumber
+ * of the client obtained from client_keys array above.
+ */
+static dshash_table *MemoryStatsDsHash = NULL;
+
+/*
+ * DSA area which stores the actual memory context statistics.
+ */
+static dsa_area *MemoryStatsDsaArea = NULL;
+
+static void memstats_dsa_cleanup(char *key);
+static void memstats_client_key_reset(int procNumber);
+static const char *ContextTypeToString(NodeTag type);
+static void PublishMemoryContext(MemoryStatsEntry *memcxt_info,
+ int curr_id, MemoryContext context,
+ List *path,
+ MemoryContextCounters stat,
+ int num_contexts);
+static List *compute_context_path(MemoryContext c, HTAB *context_id_lookup);
/* ----------
* The max bytes for showing identifiers of MemoryContext.
+ * This is used by pg_get_backend_memory_contexts, which backs the view for
+ * the local backend.
* ----------
*/
#define MEMORY_CONTEXT_IDENT_DISPLAY_SIZE 1024
+#define MAX_PATH_DISPLAY_LENGTH 100
+/* Timeout in seconds */
+#define MEMORY_STATS_MAX_TIMEOUT 5
+
/*
* MemoryContextId
- * Used for storage of transient identifiers for
- * pg_get_backend_memory_contexts.
+ * Used for storage of transient identifiers for memory context reporting
*/
typedef struct MemoryContextId
{
@@ -143,24 +251,7 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
else
nulls[1] = true;
- switch (context->type)
- {
- case T_AllocSetContext:
- type = "AllocSet";
- break;
- case T_GenerationContext:
- type = "Generation";
- break;
- case T_SlabContext:
- type = "Slab";
- break;
- case T_BumpContext:
- type = "Bump";
- break;
- default:
- type = "???";
- break;
- }
+ type = ContextTypeToString(context->type);
values[2] = CStringGetTextDatum(type);
values[3] = Int32GetDatum(list_length(path)); /* level */
@@ -175,6 +266,38 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
list_free(path);
}
+/*
+ * ContextTypeToString
+ * Returns a textual representation of a context type
+ *
+ * This should cover the same types as MemoryContextIsValid.
+ */
+const char *
+ContextTypeToString(NodeTag type)
+{
+ const char *context_type;
+
+ switch (type)
+ {
+ case T_AllocSetContext:
+ context_type = "AllocSet";
+ break;
+ case T_GenerationContext:
+ context_type = "Generation";
+ break;
+ case T_SlabContext:
+ context_type = "Slab";
+ break;
+ case T_BumpContext:
+ context_type = "Bump";
+ break;
+ default:
+ context_type = "???";
+ break;
+ }
+ return context_type;
+}
+
/*
* pg_get_backend_memory_contexts
* SQL SRF showing backend memory context.
@@ -305,3 +428,852 @@ pg_log_backend_memory_contexts(PG_FUNCTION_ARGS)
PG_RETURN_BOOL(true);
}
+
+/*
+ * pg_get_process_memory_contexts
+ * Signal a backend or an auxiliary process to send its memory contexts,
+ * wait for the results and display them.
+ *
+ * By default, only superusers or users with ROLE_PG_READ_ALL_STATS are allowed
+ * to signal a process to return the memory contexts. Additional roles can be
+ * permitted with GRANT.
+ *
+ * On receipt of this signal, a backend or an auxiliary process sets the flag
+ * in the signal handler, which causes the next CHECK_FOR_INTERRUPTS()
+ * or process-specific interrupt handler to copy the memory context details
+ * to a dynamic shared memory space.
+ *
+ * We have defined a limit on DSA memory that could be allocated per process -
+ * if the process has more memory contexts than what can fit in the allocated
+ * size, the excess contexts are summarized and represented as cumulative total
+ * at the end of the buffer.
+ *
+ * After sending the signal, wait on a condition variable. The publishing
+ * backend, after copying the data to shared memory, sends a signal on that
+ * condition variable. There is one condition variable per client process.
+ * Once the condition variable is signalled, check if the latest memory context
+ * information is available and display.
+ *
+ * If the publishing backend does not respond before the condition variable
+ * times out, which is set to a predefined value MEMORY_STATS_MAX_TIMEOUT, give
+ * up and return NULL.
+ */
+Datum
+pg_get_process_memory_contexts(PG_FUNCTION_ARGS)
+{
+ int pid = PG_GETARG_INT32(0);
+ bool summary = PG_GETARG_BOOL(1);
+ PGPROC *proc;
+ ProcNumber procNumber;
+ bool proc_is_aux = false;
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ MemoryStatsEntry *memcxt_info;
+ MemoryStatsDSHashEntry *entry;
+ bool found;
+ char key[CLIENT_KEY_SIZE];
+ TimestampTz start_timestamp;
+
+ /*
+ * See if the process with given pid is a backend or an auxiliary process
+ * and remember the type for when we requery the process later.
+ */
+ proc = BackendPidGetProc(pid);
+ if (proc == NULL)
+ {
+ proc = AuxiliaryPidGetProc(pid);
+ proc_is_aux = true;
+ }
+
+ /*
+ * BackendPidGetProc() and AuxiliaryPidGetProc() return NULL if the pid
+ * isn't valid; this is however not a problem, so we just emit a WARNING
+ * and return. See comment in pg_log_backend_memory_contexts for a
+ * discussion on this.
+ */
+ if (proc == NULL)
+ {
+ ereport(WARNING,
+ errmsg("PID %d is not a PostgreSQL server process", pid));
+ PG_RETURN_NULL();
+ }
+
+ procNumber = GetNumberFromPGProc(proc);
+
+ /*
+ * Check if the server process slot is not empty and exit early. A
+ * non-empty slot means some other client backend is requesting the
+ * statistics from the same server process.
+ */
+ LWLockAcquire(client_keys_lock, LW_EXCLUSIVE);
+ if (client_keys[procNumber] != -1)
+ {
+ LWLockRelease(client_keys_lock);
+ ereport(WARNING,
+ errmsg("server process %d is processing previous request",
+ pid));
+ PG_RETURN_NULL();
+ }
+ LWLockRelease(client_keys_lock);
+
+ /*
+ * Create a DSA to allocate memory for copying memory contexts statistics.
+ * Allocate the memory in the DSA and send DSA pointer to the server
+ * process for storing the context statistics. If the statistics do not
+ * fit within the predefined limit (1MB), a cumulative total is stored for
+ * the excess contexts.
+ *
+ * The DSA is created once for the lifetime of the server, and only
+ * attached in subsequent calls.
+ */
+ if (MemoryStatsDsaArea == NULL)
+ MemoryStatsDsaArea = GetNamedDSA("memory_context_statistics_dsa",
+ &found);
+
+ /*
+ * The DSA pointers containing statistics for each client are stored in a
+ * dshash table. In addition to DSA pointer, each entry in this table also
+ * contains information about the statistics, condition variable for
+ * signalling between client and the server and miscellaneous data
+ * specific to a request. There is one entry per client request in the
+ * hash table.
+ */
+ if (MemoryStatsDsHash == NULL)
+ MemoryStatsDsHash = GetNamedDSHash("memory_context_statistics_dshash",
+ &memctx_dsh_params, &found);
+
+ snprintf(key, sizeof(key), "%d", MyProcNumber);
+
+ /*
+ * Insert an entry for this client in the dshash table the first time this
+ * function is called. This entry is deleted when the process exits, in a
+ * before_shmem_exit callback.
+ *
+ * dshash_find_or_insert locks the entry to prevent the publisher from
+ * reading before client has updated the entry.
+ */
+ entry = dshash_find_or_insert(MemoryStatsDsHash, key, &found);
+ if (!found)
+ {
+ entry->stats_written = false;
+ ConditionVariableInit(&entry->memcxt_cv);
+ }
+
+ /*
+ * Allocate 1MB of memory for the backend to publish its statistics on
+ * every call to this function. The memory is freed at the end of the
+ * function.
+ */
+ entry->memstats_dsa_pointer =
+ dsa_allocate0(MemoryStatsDsaArea, MEMORY_CONTEXT_REPORT_MAX_PER_BACKEND);
+
+ /*
+ * Specify whether a summary of statistics is requested, before signalling
+ * the server.
+ */
+ entry->summary = summary;
+
+ /*
+ * Indicate which server process statistics are being requested from. If
+ * this client times out before the last requested process can publish its
+ * statistics, it may send a new request to another server process. Since
+ * the previous server was notified, it might attempt to read the same
+ * client entry and respond incorrectly with its statistics. By storing
+ * the server ID in the client entry, we prevent any previously signalled
+ * server process from writing its statistics in the space meant for the
+ * newly requested process.
+ */
+ entry->target_server_id = pid;
+ dshash_release_lock(MemoryStatsDsHash, entry);
+
+ /*
+ * Check if the publishing process slot is empty and store this client's
+ * key, i.e. its ProcNumber. This informs the publishing process that it is
+ * supposed to write statistics in the hash entry corresponding to this
+ * client.
+ */
+ LWLockAcquire(client_keys_lock, LW_EXCLUSIVE);
+ if (client_keys[procNumber] == -1)
+ client_keys[procNumber] = MyProcNumber;
+ else
+ {
+ LWLockRelease(client_keys_lock);
+ ereport(WARNING,
+ errmsg("server process %d is processing previous request",
+ pid));
+ PG_RETURN_NULL();
+ }
+ LWLockRelease(client_keys_lock);
+
+ /*
+ * Send a signal to a PostgreSQL process, informing it we want it to
+ * produce information about its memory contexts.
+ */
+ if (SendProcSignal(pid, PROCSIG_GET_MEMORY_CONTEXT, procNumber) < 0)
+ {
+ memstats_dsa_cleanup(key);
+ memstats_client_key_reset(procNumber);
+ ereport(WARNING,
+ errmsg("could not send signal to process %d: %m",
+ pid));
+ PG_RETURN_NULL();
+ }
+ start_timestamp = GetCurrentTimestamp();
+
+ while (1)
+ {
+ long elapsed_time;
+
+ INJECTION_POINT("memcontext-client-injection", NULL);
+
+ elapsed_time = TimestampDifferenceMilliseconds(start_timestamp,
+ GetCurrentTimestamp());
+ /* Return if we have already exceeded the timeout */
+ if (elapsed_time >= MEMORY_STATS_MAX_TIMEOUT * 1000)
+ {
+ memstats_dsa_cleanup(key);
+ memstats_client_key_reset(procNumber);
+ ConditionVariableCancelSleep();
+ PG_RETURN_NULL();
+ }
+
+ /*
+ * Recheck the state of the backend before sleeping on the condition
+ * variable to ensure the process is still alive. Only check the
+ * relevant process type based on the earlier PID check.
+ */
+ if (proc_is_aux)
+ proc = AuxiliaryPidGetProc(pid);
+ else
+ proc = BackendPidGetProc(pid);
+
+ /*
+ * The target server process ending during memory context processing
+ * is not an error.
+ */
+ if (proc == NULL)
+ {
+ memstats_dsa_cleanup(key);
+ memstats_client_key_reset(procNumber);
+ ConditionVariableCancelSleep();
+ ereport(WARNING,
+ errmsg("PID %d is no longer a PostgreSQL server process",
+ pid));
+ PG_RETURN_NULL();
+ }
+
+ /*
+ * Wait for MEMORY_STATS_MAX_TIMEOUT. If no statistics are available
+ * within the allowed time then return NULL. The timer is defined in
+ * milliseconds since that's what the condition variable sleep uses.
+ */
+ if (ConditionVariableTimedSleep(&entry->memcxt_cv,
+ (MEMORY_STATS_MAX_TIMEOUT * 1000),
+ WAIT_EVENT_MEM_CXT_PUBLISH))
+ {
+ /* Timeout has expired, return NULL */
+ memstats_dsa_cleanup(key);
+ memstats_client_key_reset(procNumber);
+ ConditionVariableCancelSleep();
+ PG_RETURN_NULL();
+ }
+ entry = dshash_find_or_insert(MemoryStatsDsHash, key, &found);
+ Assert(found);
+
+ memcxt_info = (MemoryStatsEntry *)
+ dsa_get_address(MemoryStatsDsaArea, entry->memstats_dsa_pointer);
+
+ /*
+ * We expect to come out of sleep when the requested process has
+ * finished publishing the statistics, verified using a boolean
+ * stats_written.
+ *
+ * Make sure that the statistics are actually written by checking that
+ * the name of the context is not NULL. This is done to ensure that
+ * the subsequent waits for statistics do not return spuriously if the
+ * previous call to the function ended in error and thus could not
+ * clear the stats_written flag.
+ */
+ if (entry->stats_written && memcxt_info[0].name[0] != '\0')
+ break;
+
+ dshash_release_lock(MemoryStatsDsHash, entry);
+
+ }
+
+ InitMaterializedSRF(fcinfo, 0);
+
+ /*
+ * Backend has finished publishing the stats, project them.
+ */
+#define PG_GET_PROCESS_MEMORY_CONTEXTS_COLS 11
+ for (int i = 0; i < entry->total_stats; i++)
+ {
+ ArrayType *path_array;
+ int path_length;
+ Datum values[PG_GET_PROCESS_MEMORY_CONTEXTS_COLS];
+ bool nulls[PG_GET_PROCESS_MEMORY_CONTEXTS_COLS];
+ Datum *path_datum = NULL;
+
+ memset(values, 0, sizeof(values));
+ memset(nulls, 0, sizeof(nulls));
+
+ Assert(memcxt_info[i].name[0] != '\0');
+ values[0] = CStringGetTextDatum(memcxt_info[i].name);
+
+ if (memcxt_info[i].ident[0] != '\0')
+ values[1] = CStringGetTextDatum(memcxt_info[i].ident);
+ else
+ nulls[1] = true;
+
+ values[2] = CStringGetTextDatum(ContextTypeToString(memcxt_info[i].type));
+ values[3] = Int32GetDatum(memcxt_info[i].levels);
+
+ path_length = memcxt_info[i].path_length;
+ path_datum = (Datum *) palloc(path_length * sizeof(Datum));
+ if (memcxt_info[i].path[0] != 0)
+ {
+ for (int j = 0; j < path_length; j++)
+ path_datum[j] = Int32GetDatum(memcxt_info[i].path[j]);
+ path_array = construct_array_builtin(path_datum,
+ path_length,
+ INT4OID);
+ values[4] = PointerGetDatum(path_array);
+ }
+ else
+ nulls[4] = true;
+
+ values[5] = Int64GetDatum(memcxt_info[i].totalspace);
+ values[6] = Int64GetDatum(memcxt_info[i].nblocks);
+ values[7] = Int64GetDatum(memcxt_info[i].freespace);
+ values[8] = Int64GetDatum(memcxt_info[i].freechunks);
+ values[9] = Int64GetDatum(memcxt_info[i].totalspace -
+ memcxt_info[i].freespace);
+ values[10] = Int32GetDatum(memcxt_info[i].num_agg_stats);
+
+ tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc,
+ values, nulls);
+ }
+ dshash_release_lock(MemoryStatsDsHash, entry);
+ memstats_dsa_cleanup(key);
+
+ ConditionVariableCancelSleep();
+
+ PG_RETURN_NULL();
+}
+
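+/*
+ * Free the DSA memory allocated for this client's request and reset the
+ * request state in its dshash entry.
+ */
+static void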
+memstats_dsa_cleanup(char *key)
+{
+ MemoryStatsDSHashEntry *entry;
+
+ entry = dshash_find(MemoryStatsDsHash, key, true);
+
+ Assert(MemoryStatsDsaArea != NULL);
+ dsa_free(MemoryStatsDsaArea, entry->memstats_dsa_pointer);
+ entry->memstats_dsa_pointer = InvalidDsaPointer;
+ entry->stats_written = false;
+ entry->target_server_id = 0;
+
+ dshash_release_lock(MemoryStatsDsHash, entry);
+}
+
+/*
+ * Remove this process from the publishing process'
+ * client key slot, if the stats publishing process has failed to do so.
+ */
+static void
+memstats_client_key_reset(int procNumber)
+{
+ LWLockAcquire(client_keys_lock, LW_EXCLUSIVE);
+
+ if (client_keys[procNumber] == MyProcNumber)
+ client_keys[procNumber] = -1;
+ LWLockRelease(client_keys_lock);
+}
+
+void
+MemoryContextKeysShmemInit(void)
+{
+ bool found;
+
+ client_keys = (int *)
+ ShmemInitStruct("MemoryContextKeys",
+ MemoryContextKeysShmemSize() + sizeof(LWLockPadded), &found);
+ client_keys_lock = (LWLock *) ((char *) client_keys + MemoryContextKeysShmemSize());
+
+ if (!found)
+ {
+ MemSet(client_keys, -1, MemoryContextKeysShmemSize());
+ LWLockInitialize(client_keys_lock, LWTRANCHE_MEMORY_CONTEXT_KEYS);
+ }
+}
+
+Size
+MemoryContextKeysShmemSize(void)
+{
+ Size sz = 0;
+ Size TotalProcs = 0;
+
+ TotalProcs = add_size(TotalProcs, NUM_AUXILIARY_PROCS);
+ TotalProcs = add_size(TotalProcs, MaxBackends);
+ sz = add_size(sz, mul_size(TotalProcs, sizeof(int)));
+
+ return sz;
+}
+
+/*
+ * HandleGetMemoryContextInterrupt
+ * Handle receipt of an interrupt indicating a request to publish memory
+ * contexts statistics.
+ *
+ * All the actual work is deferred to ProcessGetMemoryContextInterrupt() as
+ * this cannot be performed in a signal handler.
+ */
+void
+HandleGetMemoryContextInterrupt(void)
+{
+ InterruptPending = true;
+ PublishMemoryContextPending = true;
+ /* latch will be set by procsignal_sigusr1_handler */
+}
+
+/*
+ * ProcessGetMemoryContextInterrupt
+ * Generate information about memory contexts used by the process.
+ *
+ * Performs a breadth first search on the memory context tree, so a parent's
+ * statistics are reported before those of its children in the monitoring
+ * function output.
+ *
+ * Statistics for all the processes are shared via the same dynamic shared
+ * area. Individual statistics are tracked independently in per-process DSA
+ * pointers. These pointers are stored in a dshash table keyed by the
+ * requesting client's ProcNumber.
+ *
+ * We calculate the maximum number of contexts whose statistics can be
+ * displayed using a pre-determined limit for memory available per process for
+ * this utility and the maximum size of statistics for each context. The
+ * remaining context statistics, if any, are captured as a cumulative total at
+ * the end of the individual contexts' statistics.
+ *
+ * If summary is true, we capture the level 1 and level 2 contexts statistics.
+ * For that we traverse the memory context tree recursively in depth first
+ * search manner to cover all the children of a parent context, to be able to
+ * display a cumulative total of memory consumption by a parent at level 2 and
+ * all its children.
+ */
+void
+ProcessGetMemoryContextInterrupt(void)
+{
+ List *contexts;
+ HASHCTL ctl;
+ HTAB *context_id_lookup;
+ int context_id = 0;
+ MemoryStatsEntry *meminfo;
+ MemoryContextCounters stat;
+ int num_individual_stats = 0;
+ bool found;
+ MemoryStatsDSHashEntry *entry;
+ char key[CLIENT_KEY_SIZE];
+ int clientProcNumber;
+ MemoryContext memstats_ctx = NULL;
+ MemoryContext oldcontext = NULL;
+
+ PublishMemoryContextPending = false;
+
+ /*
+ * Avoid performing any shared memory operations in an aborted transaction;
+ * the caller will get the fallback behaviour of the past known stats.
+ */
+ if (IsAbortedTransactionBlockState())
+ return;
+
+ INJECTION_POINT("memcontext-server-wait", NULL);
+
+ /*
+ * Retrieve the client key for publishing statistics and reset it to -1,
+ * so other clients can request memory statistics from this process.
+ * Return if the client_key is -1, which means the requesting client has
+ * timed out.
+ */
+ LWLockAcquire(client_keys_lock, LW_EXCLUSIVE);
+ if (client_keys[MyProcNumber] == -1)
+ {
+ LWLockRelease(client_keys_lock);
+ return;
+ }
+ else
+ {
+ clientProcNumber = client_keys[MyProcNumber];
+ client_keys[MyProcNumber] = -1;
+ LWLockRelease(client_keys_lock);
+ }
+
+ /*
+ * Create a new memory context which is not a part of TopMemoryContext
+ * tree. This context is used to allocate all memory in this function,
+ * which keeps the allocations made while reporting statistics separate
+ * from the contexts being measured, so that they do not affect the
+ * output of this function.
+ */
+ memstats_ctx = AllocSetContextCreate((MemoryContext) NULL,
+ "publish_memory_context_statistics",
+ ALLOCSET_SMALL_SIZES);
+ oldcontext = MemoryContextSwitchTo(memstats_ctx);
+
+ /*
+ * The hash table is used for constructing "path" column of the view,
+ * similar to its local backend counterpart.
+ */
+ ctl.keysize = sizeof(MemoryContext);
+ ctl.entrysize = sizeof(MemoryContextId);
+ ctl.hcxt = CurrentMemoryContext;
+
+ context_id_lookup = hash_create("pg_get_process_memory_contexts",
+ 256,
+ &ctl,
+ HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+
+ /* List of contexts to process in the next round - start at the top. */
+ contexts = list_make1(TopMemoryContext);
+
+ /*
+ * The client process should have created the required DSA and DSHash
+ * table. Here we just attach to those.
+ */
+ if (MemoryStatsDsaArea == NULL)
+ MemoryStatsDsaArea = GetNamedDSA("memory_context_statistics_dsa",
+ &found);
+
+ if (MemoryStatsDsHash == NULL)
+ MemoryStatsDsHash = GetNamedDSHash("memory_context_statistics_dshash",
+ &memctx_dsh_params, &found);
+
+ snprintf(key, CLIENT_KEY_SIZE, "%d", clientProcNumber);
+
+ /*
+ * The entry lock is held by dshash_find_or_insert to protect writes to
+ * process specific memory. Two different processes publishing statistics
+ * do not block each other.
+ */
+ INJECTION_POINT("memcontext-server-injection", NULL);
+ entry = dshash_find_or_insert(MemoryStatsDsHash, key, &found);
+
+ /*
+ * Check if the entry has been deleted due to the calling process exiting,
+ * or if the caller has timed out waiting for us and has issued a request
+ * to another backend.
+ *
+ * Make sure that the client always deletes the entry after taking
+ * required lock or this function may end up writing to unallocated
+ * memory.
+ */
+ if (!found || entry->target_server_id != MyProcPid)
+ {
+ entry->stats_written = false;
+
+ dshash_release_lock(MemoryStatsDsHash, entry);
+
+ hash_destroy(context_id_lookup);
+ MemoryContextSwitchTo(oldcontext);
+ MemoryContextDelete(memstats_ctx);
+
+ return;
+ }
+
+ /* Should be allocated by a client backend that is requesting statistics */
+ Assert(entry->memstats_dsa_pointer != InvalidDsaPointer);
+ meminfo = (MemoryStatsEntry *)
+ dsa_get_address(MemoryStatsDsaArea, entry->memstats_dsa_pointer);
+
+ if (entry->summary)
+ {
+ int cxt_id = 0;
+ List *path = NIL;
+ MemoryContextId *contextid_entry;
+
+ /* Copy TopMemoryContext statistics to DSA */
+ memset(&stat, 0, sizeof(stat));
+ (*TopMemoryContext->methods->stats) (TopMemoryContext, NULL, NULL,
+ &stat, true);
+ path = lcons_int(1, path);
+ PublishMemoryContext(meminfo, cxt_id, TopMemoryContext, path, stat,
+ 1);
+
+ contextid_entry = (MemoryContextId *) hash_search(context_id_lookup,
+ &TopMemoryContext,
+ HASH_ENTER, &found);
+ Assert(!found);
+
+ /*
+ * context id starts with 1
+ */
+ contextid_entry->context_id = cxt_id + 1;
+
+ /*
+ * Copy statistics for each of TopMemoryContext's children. This
+ * includes statistics of at most 100 children per node, with each
+ * child node limited to a depth of 100 in its subtree.
+ */
+ for (MemoryContext c = TopMemoryContext->firstchild; c != NULL;
+ c = c->nextchild)
+ {
+ MemoryContextCounters grand_totals;
+ int num_contexts = 0;
+
+ path = NIL;
+ memset(&grand_totals, 0, sizeof(grand_totals));
+
+ cxt_id++;
+ contextid_entry = (MemoryContextId *) hash_search(context_id_lookup,
+ &c, HASH_ENTER, &found);
+ Assert(!found);
+ contextid_entry->context_id = cxt_id + 1;
+
+ MemoryContextStatsCounter(c, &grand_totals, &num_contexts);
+
+ path = compute_context_path(c, context_id_lookup);
+
+ PublishMemoryContext(meminfo, cxt_id, c, path,
+ grand_totals, num_contexts);
+ }
+ entry->total_stats = cxt_id + 1;
+ }
+ else
+ {
+ foreach_ptr(MemoryContextData, cur, contexts)
+ {
+ List *path = NIL;
+ MemoryContextId *contextid_entry;
+
+ contextid_entry = (MemoryContextId *) hash_search(context_id_lookup,
+ &cur,
+ HASH_ENTER, &found);
+ Assert(!found);
+
+ /*
+ * context id starts with 1
+ */
+ contextid_entry->context_id = context_id + 1;
+
+ /*
+ * Figure out the transient context_id of this context and each of
+ * its ancestors, to compute a path for this context.
+ */
+ path = compute_context_path(cur, context_id_lookup);
+
+ /* Examine the context stats */
+ memset(&stat, 0, sizeof(stat));
+ (*cur->methods->stats) (cur, NULL, NULL, &stat, true);
+
+ /* Account for saving one statistics slot for cumulative reporting */
+ if (context_id < (MAX_MEMORY_CONTEXT_STATS_NUM - 1))
+ {
+ /* Copy statistics to DSA memory */
+ PublishMemoryContext(meminfo, context_id, cur, path, stat, 1);
+ }
+ else
+ {
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].totalspace += stat.totalspace;
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].nblocks += stat.nblocks;
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].freespace += stat.freespace;
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].freechunks += stat.freechunks;
+ }
+
+ /*
+ * DSA max limit per process is reached, write aggregate of the
+ * remaining statistics.
+ *
+ * We can store statistics in array slots 0 to max_stats - 1.
+ * Individual statistics are reported for context_id 0 through
+ * max_stats - 2; array slot max_stats - 1 is reserved for the
+ * cumulative statistics of all remaining contexts, labelled
+ * "Remaining Totals".
+ */
+ if (context_id == (MAX_MEMORY_CONTEXT_STATS_NUM - 2))
+ {
+ int namelen = strlen("Remaining Totals");
+
+ num_individual_stats = context_id + 1;
+ strlcpy(meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].name,
+ "Remaining Totals", namelen + 1);
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].ident[0] = '\0';
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].path[0] = 0;
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].type = 0;
+ }
+ context_id++;
+
+ for (MemoryContext c = cur->firstchild; c != NULL; c = c->nextchild)
+ contexts = lappend(contexts, c);
+ }
+
+ /*
+ * Check if there are aggregated statistics or not in the result set.
+ * Statistics are individually reported when context_id <= max_stats;
+ * only if context_id > max_stats will there be aggregates.
+ */
+ if (context_id <= MAX_MEMORY_CONTEXT_STATS_NUM)
+ {
+ entry->total_stats = context_id;
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].num_agg_stats = 1;
+ }
+
+ /*
+ * The number of contexts exceeded the space available, so report the
+ * number of aggregated memory contexts
+ */
+ else
+ {
+ meminfo[MAX_MEMORY_CONTEXT_STATS_NUM - 1].num_agg_stats =
+ context_id - num_individual_stats;
+
+ /*
+ * Total stats equals num_individual_stats + 1 record for
+ * cumulative statistics.
+ */
+ entry->total_stats = num_individual_stats + 1;
+ }
+ }
+
+ entry->stats_written = true;
+ dshash_release_lock(MemoryStatsDsHash, entry);
+ hash_destroy(context_id_lookup);
+
+ MemoryContextSwitchTo(oldcontext);
+ MemoryContextReset(memstats_ctx);
+ /* Notify waiting client backend and return */
+ ConditionVariableSignal(&entry->memcxt_cv);
+}
+
+/*
+ * compute_context_path
+ *
+ * Append the transient context_id of this context and each of its ancestors
+ * to a list, in order to compute a path.
+ */
+static List *
+compute_context_path(MemoryContext c, HTAB *context_id_lookup)
+{
+ bool found;
+ List *path = NIL;
+ MemoryContext cur_context;
+
+ for (cur_context = c; cur_context != NULL; cur_context = cur_context->parent)
+ {
+ MemoryContextId *cur_entry;
+
+ cur_entry = hash_search(context_id_lookup, &cur_context, HASH_FIND, &found);
+
+ if (!found)
+ elog(ERROR, "hash table corrupted, can't construct path value");
+
+ path = lcons_int(cur_entry->context_id, path);
+ }
+
+ return path;
+}
+
+/*
+ * PublishMemoryContext
+ *
+ * Copy the memory context statistics of a single context to DSA memory.
+ */
+static void
+PublishMemoryContext(MemoryStatsEntry *memcxt_info, int curr_id,
+ MemoryContext context, List *path,
+ MemoryContextCounters stat, int num_contexts)
+{
+ char *ident = unconstify(char *, context->ident);
+ char *name = unconstify(char *, context->name);
+
+ /*
+ * To be consistent with logging output, we label dynahash contexts with
+ * just the hash table name as with MemoryContextStatsPrint().
+ */
+ if (context->ident && strncmp(context->name, "dynahash", 8) == 0)
+ {
+ name = unconstify(char *, context->ident);
+ ident = NULL;
+ }
+
+ if (name != NULL)
+ {
+ int namelen = strlen(name);
+
+ if (namelen >= MEMORY_CONTEXT_NAME_SHMEM_SIZE)
+ namelen = pg_mbcliplen(name, namelen,
+ MEMORY_CONTEXT_NAME_SHMEM_SIZE - 1);
+
+ strlcpy(memcxt_info[curr_id].name, name, namelen + 1);
+ }
+ else
+ /* Clearing the array */
+ memcxt_info[curr_id].name[0] = '\0';
+
+ /* Trim and copy the identifier if it is not set to NULL */
+ if (ident != NULL)
+ {
+ int idlen = strlen(context->ident);
+
+ /*
+ * Some identifiers, such as SQL query strings, can be very long;
+ * truncate oversized identifiers.
+ */
+ if (idlen >= MEMORY_CONTEXT_IDENT_SHMEM_SIZE)
+ idlen = pg_mbcliplen(ident, idlen,
+ MEMORY_CONTEXT_IDENT_SHMEM_SIZE - 1);
+
+ strlcpy(memcxt_info[curr_id].ident, ident, idlen + 1);
+ }
+ else
+ memcxt_info[curr_id].ident[0] = '\0';
+
+ /* Store the path */
+ if (path == NIL)
+ memcxt_info[curr_id].path[0] = 0;
+ else
+ {
+ int levels = Min(list_length(path), MAX_PATH_DISPLAY_LENGTH);
+
+ memcxt_info[curr_id].path_length = levels;
+ memcxt_info[curr_id].levels = list_length(path);
+
+ foreach_int(i, path)
+ {
+ memcxt_info[curr_id].path[foreach_current_index(i)] = i;
+ if (--levels == 0)
+ break;
+ }
+ }
+ memcxt_info[curr_id].type = context->type;
+ memcxt_info[curr_id].totalspace = stat.totalspace;
+ memcxt_info[curr_id].nblocks = stat.nblocks;
+ memcxt_info[curr_id].freespace = stat.freespace;
+ memcxt_info[curr_id].freechunks = stat.freechunks;
+ memcxt_info[curr_id].num_agg_stats = num_contexts;
+}
+
+void
+AtProcExit_memstats_cleanup(int code, Datum arg)
+{
+ int idx = MyProcNumber;
+ MemoryStatsDSHashEntry *entry;
+ char key[CLIENT_KEY_SIZE];
+ bool found;
+
+ if (MemoryStatsDsHash != NULL)
+ {
+ snprintf(key, CLIENT_KEY_SIZE, "%d", idx);
+ entry = dshash_find_or_insert(MemoryStatsDsHash, key, &found);
+
+ if (found)
+ {
+ if (MemoryStatsDsaArea != NULL &&
+ DsaPointerIsValid(entry->memstats_dsa_pointer))
+ dsa_free(MemoryStatsDsaArea, entry->memstats_dsa_pointer);
+ }
+ dshash_delete_entry(MemoryStatsDsHash, entry);
+ }
+ LWLockAcquire(client_keys_lock, LW_EXCLUSIVE);
+ client_keys[idx] = -1;
+ LWLockRelease(client_keys_lock);
+}
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index 36ad708b360..e246fc7ce1f 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -39,6 +39,7 @@ volatile sig_atomic_t TransactionTimeoutPending = false;
volatile sig_atomic_t IdleSessionTimeoutPending = false;
volatile sig_atomic_t ProcSignalBarrierPending = false;
volatile sig_atomic_t LogMemoryContextPending = false;
+volatile sig_atomic_t PublishMemoryContextPending = false;
volatile sig_atomic_t IdleStatsUpdateTimeoutPending = false;
volatile uint32 InterruptHoldoffCount = 0;
volatile uint32 QueryCancelHoldoffCount = 0;
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 3f401faf3de..7e6de81e7d6 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -661,6 +661,13 @@ BaseInit(void)
* drop ephemeral slots, which in turn triggers stats reporting.
*/
ReplicationSlotInitialize();
+
+ /*
+ * The before shmem exit callback frees the DSA memory occupied by the
+ * latest memory context statistics that could be published by this proc
+ * if requested.
+ */
+ before_shmem_exit(AtProcExit_memstats_cleanup, 0);
}
diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index 073bdb35d2a..fd75345ff5f 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -1011,6 +1011,37 @@ MemoryContextStatsInternal(MemoryContext context, int level,
}
}
+
+/*
+ * MemoryContextStatsCounter
+ *
+ * Accumulate statistics counts into *totals. totals should not be NULL.
+ * This involves a non-recursive tree traversal.
+ */
+void
+MemoryContextStatsCounter(MemoryContext context, MemoryContextCounters *totals,
+ int *num_contexts)
+{
+ int ichild = 1;
+
+ *num_contexts = 0;
+ context->methods->stats(context, NULL, NULL, totals, false);
+
+ for (MemoryContext curr = context->firstchild;
+ curr != NULL;
+ curr = MemoryContextTraverseNext(curr, context))
+ {
+ curr->methods->stats(curr, NULL, NULL, totals, false);
+ ichild++;
+ }
+
+ /*
+ * Add the count of all traversed contexts: the parent plus all of its
+ * descendants.
+ */
+ *num_contexts = *num_contexts + ichild;
+}
+
/*
* MemoryContextStatsPrint
* Print callback used by MemoryContextStatsInternal
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 2ac69bf2df5..ef50383ff4f 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8633,6 +8633,16 @@
prorettype => 'bool', proargtypes => 'int4',
prosrc => 'pg_log_backend_memory_contexts' },
+# publishing memory contexts of the specified postgres process
+{ oid => '2173', descr => 'publish memory contexts of the specified backend',
+ proname => 'pg_get_process_memory_contexts', provolatile => 'v',
+ prorows => '100', proretset => 't', proparallel => 'r',
+ prorettype => 'record', proargtypes => 'int4 bool',
+ proallargtypes => '{int4,bool,text,text,text,int4,_int4,int8,int8,int8,int8,int8,int4}',
+ proargmodes => '{i,i,o,o,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{pid, summary, name, ident, type, level, path, total_bytes, total_nblocks, free_bytes, free_chunks, used_bytes, num_agg_contexts}',
+ prosrc => 'pg_get_process_memory_contexts' },
+
# non-persistent series generator
{ oid => '1066', descr => 'non-persistent series generator',
proname => 'generate_series', prorows => '1000',
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index db559b39c4d..7e301affccc 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -96,6 +96,7 @@ extern PGDLLIMPORT volatile sig_atomic_t IdleSessionTimeoutPending;
extern PGDLLIMPORT volatile sig_atomic_t ProcSignalBarrierPending;
extern PGDLLIMPORT volatile sig_atomic_t LogMemoryContextPending;
extern PGDLLIMPORT volatile sig_atomic_t IdleStatsUpdateTimeoutPending;
+extern PGDLLIMPORT volatile sig_atomic_t PublishMemoryContextPending;
extern PGDLLIMPORT volatile sig_atomic_t CheckClientConnectionPending;
extern PGDLLIMPORT volatile sig_atomic_t ClientConnectionLost;
diff --git a/src/include/storage/lwlocklist.h b/src/include/storage/lwlocklist.h
index 94f818b9f10..ed249e72a26 100644
--- a/src/include/storage/lwlocklist.h
+++ b/src/include/storage/lwlocklist.h
@@ -137,3 +137,4 @@ PG_LWLOCKTRANCHE(SUBTRANS_SLRU, SubtransSLRU)
PG_LWLOCKTRANCHE(XACT_SLRU, XactSLRU)
PG_LWLOCKTRANCHE(PARALLEL_VACUUM_DSA, ParallelVacuumDSA)
PG_LWLOCKTRANCHE(AIO_URING_COMPLETION, AioUringCompletion)
+PG_LWLOCKTRANCHE(MEMORY_CONTEXT_KEYS, MemoryContextReportingKeys)
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index e52b8eb7697..4f1bcbf709f 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -35,6 +35,7 @@ typedef enum
PROCSIG_WALSND_INIT_STOPPING, /* ask walsenders to prepare for shutdown */
PROCSIG_BARRIER, /* global barrier interrupt */
PROCSIG_LOG_MEMORY_CONTEXT, /* ask backend to log the memory contexts */
+ PROCSIG_GET_MEMORY_CONTEXT, /* ask backend to send the memory contexts */
PROCSIG_PARALLEL_APPLY_MESSAGE, /* Message from parallel apply workers */
/* Recovery conflict reasons */
diff --git a/src/include/utils/memutils.h b/src/include/utils/memutils.h
index 2bc13c3a054..435a2a27c94 100644
--- a/src/include/utils/memutils.h
+++ b/src/include/utils/memutils.h
@@ -19,7 +19,6 @@
#include "nodes/memnodes.h"
-
/*
* MaxAllocSize, MaxAllocHugeSize
* Quasi-arbitrary limits on size of allocations.
@@ -319,4 +318,11 @@ pg_memory_is_all_zeros(const void *ptr, size_t len)
return true;
}
+extern void ProcessGetMemoryContextInterrupt(void);
+extern void HandleGetMemoryContextInterrupt(void);
+extern void MemoryContextKeysShmemInit(void);
+extern Size MemoryContextKeysShmemSize(void);
+extern void MemoryContextStatsCounter(MemoryContext context, MemoryContextCounters *totals,
+ int *num_contexts);
+extern void AtProcExit_memstats_cleanup(int code, Datum arg);
#endif /* MEMUTILS_H */
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 0411db832f1..3799ef7c862 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -233,3 +233,22 @@ select * from pg_timezone_abbrevs where abbrev = 'LMT';
LMT | @ 7 hours 52 mins 58 secs ago | f
(1 row)
+DO $$
+DECLARE
+ bg_writer_pid int;
+ r RECORD;
+BEGIN
+ SELECT pid from pg_stat_activity where backend_type='background writer'
+ INTO bg_writer_pid;
+
+ select type, name, ident
+ from pg_get_process_memory_contexts(bg_writer_pid, false)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+ select type, name, ident
+ from pg_get_process_memory_contexts(pg_backend_pid(), false)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+END $$;
+NOTICE: (AllocSet,TopMemoryContext,)
+NOTICE: (AllocSet,TopMemoryContext,)
diff --git a/src/test/regress/sql/sysviews.sql b/src/test/regress/sql/sysviews.sql
index 66179f026b3..c9da4fc8c90 100644
--- a/src/test/regress/sql/sysviews.sql
+++ b/src/test/regress/sql/sysviews.sql
@@ -101,3 +101,21 @@ select count(distinct utc_offset) >= 24 as ok from pg_timezone_abbrevs;
-- One specific case we can check without much fear of breakage
-- is the historical local-mean-time value used for America/Los_Angeles.
select * from pg_timezone_abbrevs where abbrev = 'LMT';
+
+DO $$
+DECLARE
+ bg_writer_pid int;
+ r RECORD;
+BEGIN
+ SELECT pid from pg_stat_activity where backend_type='background writer'
+ INTO bg_writer_pid;
+
+ select type, name, ident
+ from pg_get_process_memory_contexts(bg_writer_pid, false)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+ select type, name, ident
+ from pg_get_process_memory_contexts(pg_backend_pid(), false)
+ where path = '{1}' into r;
+ RAISE NOTICE '%', r;
+END $$;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 09e7f1d420e..c32d56c0c99 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1701,6 +1701,8 @@ MemoryContextData
MemoryContextId
MemoryContextMethodID
MemoryContextMethods
+MemoryStatsEntry
+MemoryStatsDSHashEntry
MemoryStatsPrintFunc
MergeAction
MergeActionState
--
2.34.1
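The slot accounting in the non-summary branch of the patch above (one slot per context until the array is nearly full, with the last slot reserved for "Remaining Totals") can be sketched in isolation. This is a minimal standalone illustration, not the patch's code: SketchEntry, SKETCH_MAX_STATS and sketch_publish() are hypothetical stand-ins for MemoryStatsEntry, MAX_MEMORY_CONTEXT_STATS_NUM and the publishing loop, and the limit is kept tiny so the overflow case is easy to follow.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for MAX_MEMORY_CONTEXT_STATS_NUM, kept tiny here. */
#define SKETCH_MAX_STATS 4

/* Hypothetical, trimmed-down stand-in for the patch's MemoryStatsEntry. */
typedef struct SketchEntry
{
	char		name[32];
	long		totalspace;
	int			num_agg_stats;
} SketchEntry;

/*
 * Fill out[0 .. SKETCH_MAX_STATS - 1] from ncontexts sizes and return the
 * number of rows a reader should consume (total_stats in the patch).
 */
static int
sketch_publish(const long *sizes, int ncontexts, SketchEntry *out)
{
	const int	last = SKETCH_MAX_STATS - 1;
	int			num_individual = 0;

	assert(ncontexts >= 0);
	memset(out, 0, sizeof(SketchEntry) * SKETCH_MAX_STATS);

	for (int id = 0; id < ncontexts; id++)
	{
		if (id < last)
		{
			/* Room left: one slot per context. */
			snprintf(out[id].name, sizeof(out[id].name), "context %d", id);
			out[id].totalspace = sizes[id];
			out[id].num_agg_stats = 1;
		}
		else
		{
			/* No room left: fold this context into the reserved last slot. */
			out[last].totalspace += sizes[id];
		}

		/* Label the reserved slot once the last individual slot is used. */
		if (id == last - 1)
		{
			num_individual = id + 1;
			strcpy(out[last].name, "Remaining Totals");
		}
	}

	/* Everything fit: each written slot describes exactly one context. */
	if (ncontexts <= SKETCH_MAX_STATS)
	{
		out[last].num_agg_stats = 1;
		return ncontexts;
	}

	/* Overflow: individual rows plus one cumulative row. */
	out[last].num_agg_stats = ncontexts - num_individual;
	return num_individual + 1;
}
```

With six contexts and a limit of four, the first three contexts get their own rows and the remaining three are summed into the last row, so a reader consumes four rows in total.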
On 9 Jan 2026, at 12:22, Rahila Syed <rahilasyed90@gmail.com> wrote:
After further discussion and reviewing Robert's email[1] on this topic, a safer solution
is to avoid running ProcessGetMemoryContextInterrupt during an aborted transaction.
This should help prevent additional errors when the transaction is already in error handling
state. Also, reporting memory context statistics from an aborting transaction won't
be very useful as some of that memory usage won't be valid after abort completes.
+1, I think that was the right call to make.
Attached is the updated patch that addresses this.
A few small comments on v47:
+static const char *ContextTypeToString(NodeTag type);
I think context_type_to_string() would be a better name for this internal
function, to model it more closely on the existing int_list_to_array(). Personally I
would also place it before its first use to avoid the prototype, but that's
personal preference.
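A minimal standalone sketch of that layout, with the helper defined ahead of its first caller so that no forward declaration is needed. SketchNodeTag and sketch_is_allocset() are hypothetical stand-ins (the real function takes a NodeTag), and apart from "AllocSet", which appears in the regression output in this thread, the labels are an assumption based on what pg_get_backend_memory_contexts() reports.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the memory context NodeTag values. */
typedef enum SketchNodeTag
{
	SKETCH_T_AllocSetContext,
	SKETCH_T_GenerationContext,
	SKETCH_T_SlabContext,
	SKETCH_T_BumpContext,
	SKETCH_T_Other
} SketchNodeTag;

/*
 * context_type_to_string
 *
 * Defined before its first use, so no separate prototype is required.
 * The labels other than "AllocSet" are assumed, not taken from the patch.
 */
static const char *
context_type_to_string(SketchNodeTag type)
{
	switch (type)
	{
		case SKETCH_T_AllocSetContext:
			return "AllocSet";
		case SKETCH_T_GenerationContext:
			return "Generation";
		case SKETCH_T_SlabContext:
			return "Slab";
		case SKETCH_T_BumpContext:
			return "Bump";
		default:
			return "???";
	}
}

/* Hypothetical first caller, placed after the definition above. */
static int
sketch_is_allocset(SketchNodeTag type)
{
	const char *label = context_type_to_string(type);

	assert(label != NULL);
	return strcmp(label, "AllocSet") == 0;
}
```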
+static void
+memstats_dsa_cleanup(char *key)
This function warrants a documentation comment describing when it can be
used safely.
+ memstats_dsa_cleanup(key);
+ memstats_client_key_reset(procNumber);
+ ConditionVariableCancelSleep();
+ PG_RETURN_NULL();
I think we should notify the user in these two timeout cases. Why not add an
ereport(NOTICE, errmsg("request for memory context statistics timed out")); or
something with better wording than that?
+ Size sz = 0;
+ Size TotalProcs = 0;
+
+ TotalProcs = add_size(TotalProcs, NUM_AUXILIARY_PROCS);
+ TotalProcs = add_size(TotalProcs, MaxBackends);
+ sz = add_size(sz, mul_size(TotalProcs, sizeof(int)));
+
+ return sz;
As we discussed off-list, the add_size() call can be omitted as it
won't affect the calculation.
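One reading of this comment, sketched standalone: adding onto an accumulator that was just initialised to zero does not change the result, so those calls can be dropped. sketch_add_size() and sketch_mul_size() are hypothetical stand-ins for add_size() and mul_size() (the PostgreSQL functions raise an error on overflow, while these merely assert), and the two function names below are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for add_size(): overflow-checked addition. */
static size_t
sketch_add_size(size_t s1, size_t s2)
{
	assert(s1 <= SIZE_MAX - s2);
	return s1 + s2;
}

/* Stand-in for mul_size(): overflow-checked multiplication. */
static size_t
sketch_mul_size(size_t s1, size_t s2)
{
	assert(s2 == 0 || s1 <= SIZE_MAX / s2);
	return s1 * s2;
}

/* Same shape as the quoted hunk: every term is added onto a zero. */
static size_t
keys_size_as_quoted(size_t num_aux_procs, size_t max_backends)
{
	size_t		sz = 0;
	size_t		total_procs = 0;

	total_procs = sketch_add_size(total_procs, num_aux_procs);
	total_procs = sketch_add_size(total_procs, max_backends);
	sz = sketch_add_size(sz, sketch_mul_size(total_procs, sizeof(int)));

	return sz;
}

/* Simplified: only the one addition that combines two real operands. */
static size_t
keys_size_simplified(size_t num_aux_procs, size_t max_backends)
{
	size_t		total_procs = sketch_add_size(num_aux_procs, max_backends);

	return sketch_mul_size(total_procs, sizeof(int));
}
```

Both variants return the same value for any inputs that do not overflow; the overflow checks that matter (the sum of the process counts and the multiplication) are kept in the simplified form.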
+# Copyright (c) 2025, PostgreSQL Global Development Group
Here, and possibly elsewhere, it should say 2026 instead, I think.
--
Daniel Gustafsson