Add the ability to limit the amount of memory that can be allocated to backends.
Hi Hackers,
Add the ability to limit the amount of memory that can be allocated to
backends.
This builds on the work that adds backend memory allocated to
pg_stat_activity
/messages/by-id/67bb5c15c0489cb499723b0340f16e10c22485ec.camel@crunchydata.com
Both patches are attached.
Add GUC variable max_total_backend_memory.
Specifies a limit to the amount of memory (MB) that may be allocated to
backends in total (i.e. this is not a per user or per backend limit).
If unset, or set to 0 it is disabled. It is intended as a resource to
help avoid the OOM killer. A backend request that would push the total
over the limit will be denied with an out of memory error causing that
backends current query/transaction to fail. Due to the dynamic nature
of memory allocations, this limit is not exact. If within 1.5MB of the
limit and two backends request 1MB each at the same time both may be
allocated exceeding the limit. Further requests will not be allocated
until dropping below the limit. Keep this in mind when setting this
value to avoid the OOM killer. Currently, this limit does not affect
auxiliary backend processes, this list of non-affected backend
processes is open for discussion as to what should/should not be
included. Backend memory allocations are displayed in the
pg_stat_activity view.
--
Reid Thompson
Senior Software Engineer
Crunchy Data, Inc.
reid.thompson@crunchydata.com
www.crunchydata.com
Attachments:
001-dev-max-memory.patchtext/x-patch; charset=UTF-8; name=001-dev-max-memory.patchDownload
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index a5cd4e44c7..caf958310a 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2079,6 +2079,32 @@ include_dir 'conf.d'
</listitem>
</varlistentry>
+ <varlistentry id="guc-max-total-backend-memory" xreflabel="max_total_backend_memory">
+ <term><varname>max_total_backend_memory</varname> (<type>integer</type>)
+ <indexterm>
+ <primary><varname>max_total_backend_memory</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Specifies a limit to the amount of memory (MB) that may be allocated to
+ backends in total (i.e. this is not a per user or per backend limit).
+ If unset, or set to 0 it is disabled. A backend request that would push
+ the total over the limit will be denied with an out of memory error
+ causing that backends current query/transaction to fail. Due to the dynamic
+ nature of memory allocations, this limit is not exact. If within 1.5MB of
+ the limit and two backends request 1MB each at the same time both may be
+ allocated exceeding the limit. Further requests will not be allocated until
+ dropping below the limit. Keep this in mind when setting this value. This
+ limit does not affect auxiliary backend processes
+ <xref linkend="glossary-auxiliary-proc"/> . Backend memory allocations
+ (<varname>backend_mem_allocated</varname>) are displayed in the
+ <link linkend="monitoring-pg-stat-activity-view"><structname>pg_stat_activity</structname></link>
+ view.
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
</sect2>
diff --git a/src/backend/storage/ipc/dsm_impl.c b/src/backend/storage/ipc/dsm_impl.c
index 269ad2fe53..808ffe75f2 100644
--- a/src/backend/storage/ipc/dsm_impl.c
+++ b/src/backend/storage/ipc/dsm_impl.c
@@ -253,6 +253,10 @@ dsm_impl_posix(dsm_op op, dsm_handle handle, Size request_size,
return true;
}
+ /* Do not exceed maximum allowed memory allocation */
+ if (op == DSM_OP_CREATE && exceeds_max_total_bkend_mem(request_size))
+ return false;
+
/*
* Create new segment or open an existing one for attach.
*
@@ -524,6 +528,10 @@ dsm_impl_sysv(dsm_op op, dsm_handle handle, Size request_size,
int flags = IPCProtection;
size_t segsize;
+ /* Do not exceed maximum allowed memory allocation */
+ if (op == DSM_OP_CREATE && exceeds_max_total_bkend_mem(request_size))
+ return false;
+
/*
* Allocate the memory BEFORE acquiring the resource, so that we don't
* leak the resource if memory allocation fails.
@@ -718,6 +726,10 @@ dsm_impl_windows(dsm_op op, dsm_handle handle, Size request_size,
return true;
}
+ /* Do not exceed maximum allowed memory allocation */
+ if (op == DSM_OP_CREATE && exceeds_max_total_bkend_mem(request_size))
+ return false;
+
/* Create new segment or open an existing one for attach. */
if (op == DSM_OP_CREATE)
{
diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c
index 17a00587f8..9137a000ae 100644
--- a/src/backend/utils/activity/backend_status.c
+++ b/src/backend/utils/activity/backend_status.c
@@ -44,6 +44,8 @@
*/
bool pgstat_track_activities = false;
int pgstat_track_activity_query_size = 1024;
+/* Max backend memory allocation allowed (MB). 0 = disabled */
+int max_total_bkend_mem = 0;
/* exposed so that backend_progress.c can access it */
@@ -1253,3 +1255,107 @@ pgstat_report_backend_mem_allocated_decrease(uint64 deallocation)
beentry->backend_mem_allocated -= deallocation;
PGSTAT_END_WRITE_ACTIVITY(beentry);
}
+
+/* ----------
+ * pgstat_get_all_backend_memory_allocated() -
+ *
+ * Return a uint64 representing the current shared memory allocated to all
+ * backends. This looks directly at the BackendStatusArray, and so will
+ * provide current information regardless of the age of our transaction's
+ * snapshot of the status array.
+ * In the future we will likely utilize additional values - perhaps limit
+ * backend allocation by user/role, etc.
+ * ----------
+ */
+uint64
+pgstat_get_all_backend_memory_allocated(void)
+{
+ PgBackendStatus *beentry;
+ int i;
+ uint64 all_backend_memory_allocated = 0;
+
+ beentry = BackendStatusArray;
+
+ /*
+ * We probably shouldn't get here before shared memory has been set up,
+ * but be safe.
+ */
+ if (beentry == NULL || BackendActivityBuffer == NULL)
+ return 0;
+
+ /*
+ * We include AUX procs in all backend memory calculation
+ */
+ for (i = 1; i <= NumBackendStatSlots; i++)
+ {
+ /*
+ * We use a volatile pointer here to ensure the compiler doesn't try to
+ * get cute.
+ */
+ volatile PgBackendStatus *vbeentry = beentry;
+ bool found;
+ uint64 backend_mem_allocated = 0;
+
+ for (;;)
+ {
+ int before_changecount;
+ int after_changecount;
+
+ pgstat_begin_read_activity(vbeentry, before_changecount);
+
+ /* Ignore invalid entries, which may contain invalid data.
+ * See pgstat_beshutdown_hook()
+ */
+ if (vbeentry->st_procpid > 0)
+ backend_mem_allocated = vbeentry->backend_mem_allocated;
+
+ pgstat_end_read_activity(vbeentry, after_changecount);
+
+ if ((found = pgstat_read_activity_complete(before_changecount,
+ after_changecount)))
+ break;
+
+ /* Make sure we can break out of loop if stuck... */
+ CHECK_FOR_INTERRUPTS();
+ }
+
+ if (found)
+ all_backend_memory_allocated += backend_mem_allocated;
+
+ beentry++;
+ }
+
+ return all_backend_memory_allocated;
+}
+
+/*
+ * Determine if allocation request will exceed max backend memory allowed.
+ * Do not apply to auxiliary processes.
+ */
+bool
+exceeds_max_total_bkend_mem(uint64 allocation_request)
+{
+ bool result = false;
+
+ if (MyAuxProcType != NotAnAuxProcess)
+ return result;
+
+ /* Convert max_total_bkend_mem to bytes for comparison */
+ if (max_total_bkend_mem &&
+ pgstat_get_all_backend_memory_allocated() +
+ allocation_request > (uint64)max_total_bkend_mem * 1024 * 1024)
+ {
+ /*
+ * Explicitely identify the OOM being a result of this
+ * configuration parameter vs a system failure to allocate OOM.
+ */
+ elog(WARNING,
+ "request will exceed postgresql.conf defined max_total_backend_memory limit (%lu > %lu)",
+ pgstat_get_all_backend_memory_allocated() +
+ allocation_request, (uint64)max_total_bkend_mem * 1024 * 1024);
+
+ result = true;
+ }
+
+ return result;
+}
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 9fbbfb1be5..ab8d83c235 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -3664,6 +3664,17 @@ static struct config_int ConfigureNamesInt[] =
NULL, NULL, NULL
},
+ {
+ {"max_total_backend_memory", PGC_SIGHUP, RESOURCES_MEM,
+ gettext_noop("Restrict total backend memory allocations to this max."),
+ gettext_noop("0 turns this feature off."),
+ GUC_UNIT_MB
+ },
+ &max_total_bkend_mem,
+ 0, 0, INT_MAX,
+ NULL, NULL, NULL
+ },
+
/* End-of-list marker */
{
{NULL, 0, 0, NULL, NULL}, NULL, 0, 0, 0, NULL, NULL, NULL
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 90bec0502c..8e944f6511 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -155,6 +155,9 @@
# mmap
# (change requires restart)
#min_dynamic_shared_memory = 0MB # (change requires restart)
+#max_total_backend_memory = 0MB # Restrict total backend memory allocations
+ # to this max (in MB). 0 turns this feature
+ # off.
# - Disk -
diff --git a/src/backend/utils/mmgr/aset.c b/src/backend/utils/mmgr/aset.c
index c91f8efa4d..ac9a1ced3f 100644
--- a/src/backend/utils/mmgr/aset.c
+++ b/src/backend/utils/mmgr/aset.c
@@ -428,6 +428,10 @@ AllocSetContextCreateInternal(MemoryContext parent,
else
firstBlockSize = Max(firstBlockSize, initBlockSize);
+ /* Do not exceed maximum allowed memory allocation */
+ if (exceeds_max_total_bkend_mem(firstBlockSize))
+ return NULL;
+
/*
* Allocate the initial block. Unlike other aset.c blocks, it starts with
* the context header and its block header follows that.
@@ -720,6 +724,11 @@ AllocSetAlloc(MemoryContext context, Size size)
{
chunk_size = MAXALIGN(size);
blksize = chunk_size + ALLOC_BLOCKHDRSZ + ALLOC_CHUNKHDRSZ;
+
+ /* Do not exceed maximum allowed memory allocation */
+ if (exceeds_max_total_bkend_mem(blksize))
+ return NULL;
+
block = (AllocBlock) malloc(blksize);
if (block == NULL)
return NULL;
@@ -911,6 +920,10 @@ AllocSetAlloc(MemoryContext context, Size size)
while (blksize < required_size)
blksize <<= 1;
+ /* Do not exceed maximum allowed memory allocation */
+ if (exceeds_max_total_bkend_mem(blksize))
+ return NULL;
+
/* Try to allocate it */
block = (AllocBlock) malloc(blksize);
@@ -1133,6 +1146,10 @@ AllocSetRealloc(void *pointer, Size size)
blksize = chksize + ALLOC_BLOCKHDRSZ + ALLOC_CHUNKHDRSZ;
oldblksize = block->endptr - ((char *) block);
+ /* Do not exceed maximum allowed memory allocation */
+ if (blksize > oldblksize && exceeds_max_total_bkend_mem(blksize - oldblksize))
+ return NULL;
+
block = (AllocBlock) realloc(block, blksize);
if (block == NULL)
{
diff --git a/src/backend/utils/mmgr/generation.c b/src/backend/utils/mmgr/generation.c
index 36e5b3f94d..1d5720836c 100644
--- a/src/backend/utils/mmgr/generation.c
+++ b/src/backend/utils/mmgr/generation.c
@@ -192,6 +192,9 @@ GenerationContextCreate(MemoryContext parent,
else
allocSize = Max(allocSize, initBlockSize);
+ if (exceeds_max_total_bkend_mem(allocSize))
+ return NULL;
+
/*
* Allocate the initial block. Unlike other generation.c blocks, it
* starts with the context header and its block header follows that.
@@ -361,6 +364,9 @@ GenerationAlloc(MemoryContext context, Size size)
{
Size blksize = required_size + Generation_BLOCKHDRSZ;
+ if (exceeds_max_total_bkend_mem(blksize))
+ return NULL;
+
block = (GenerationBlock *) malloc(blksize);
if (block == NULL)
return NULL;
@@ -464,6 +470,9 @@ GenerationAlloc(MemoryContext context, Size size)
if (blksize < required_size)
blksize = pg_nextpower2_size_t(required_size);
+ if (exceeds_max_total_bkend_mem(blksize))
+ return NULL;
+
block = (GenerationBlock *) malloc(blksize);
if (block == NULL)
diff --git a/src/backend/utils/mmgr/slab.c b/src/backend/utils/mmgr/slab.c
index efc8bcfaa7..5cf0cfff86 100644
--- a/src/backend/utils/mmgr/slab.c
+++ b/src/backend/utils/mmgr/slab.c
@@ -177,6 +177,10 @@ SlabContextCreate(MemoryContext parent,
headerSize += chunksPerBlock * sizeof(bool);
#endif
+ /* Do not exceed maximum allowed memory allocation */
+ if (exceeds_max_total_bkend_mem(headerSize))
+ return NULL;
+
slab = (SlabContext *) malloc(headerSize);
if (slab == NULL)
{
@@ -331,6 +335,10 @@ SlabAlloc(MemoryContext context, Size size)
*/
if (slab->minFreeChunks == 0)
{
+ /* Do not exceed maximum allowed memory allocation */
+ if (exceeds_max_total_bkend_mem(slab->blockSize))
+ return NULL;
+
block = (SlabBlock *) malloc(slab->blockSize);
if (block == NULL)
diff --git a/src/include/utils/backend_status.h b/src/include/utils/backend_status.h
index 9bdc4197bd..3b940ff98e 100644
--- a/src/include/utils/backend_status.h
+++ b/src/include/utils/backend_status.h
@@ -270,6 +270,7 @@ typedef struct LocalPgBackendStatus
*/
extern PGDLLIMPORT bool pgstat_track_activities;
extern PGDLLIMPORT int pgstat_track_activity_query_size;
+extern PGDLLIMPORT int max_total_bkend_mem;
/* ----------
@@ -321,6 +322,7 @@ extern int pgstat_fetch_stat_numbackends(void);
extern PgBackendStatus *pgstat_fetch_stat_beentry(int beid);
extern LocalPgBackendStatus *pgstat_fetch_stat_local_beentry(int beid);
extern char *pgstat_clip_activity(const char *raw_activity);
+extern bool exceeds_max_total_bkend_mem(uint64 allocation_request);
#endif /* BACKEND_STATUS_H */
001-pg-stat-activity-backend-memory-allocated.patchtext/x-patch; charset=UTF-8; name=001-pg-stat-activity-backend-memory-allocated.patchDownload
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 1d9509a2f6..40ae638f25 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -947,6 +947,18 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser
</para></entry>
</row>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>backend_mem_allocated</structfield> <type>bigint</type>
+ </para>
+ <para>
+ The byte count of memory allocated to this backend. Dynamic shared memory
+ allocations are included only in the value displayed for the backend that
+ created them, they are not included in the value for backends that are
+ attached to them to avoid double counting.
+ </para></entry>
+ </row>
+
<row>
<entry role="catalog_table_entry"><para role="column_definition">
<structfield>query</structfield> <type>text</type>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 5a844b63a1..d23f0e9dbb 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -863,6 +863,7 @@ CREATE VIEW pg_stat_activity AS
S.backend_xid,
s.backend_xmin,
S.query_id,
+ S.backend_mem_allocated,
S.query,
S.backend_type
FROM pg_stat_get_activity(NULL) AS S
diff --git a/src/backend/storage/ipc/dsm_impl.c b/src/backend/storage/ipc/dsm_impl.c
index e1b90c5de4..269ad2fe53 100644
--- a/src/backend/storage/ipc/dsm_impl.c
+++ b/src/backend/storage/ipc/dsm_impl.c
@@ -66,6 +66,7 @@
#include "postmaster/postmaster.h"
#include "storage/dsm_impl.h"
#include "storage/fd.h"
+#include "utils/backend_status.h"
#include "utils/guc.h"
#include "utils/memutils.h"
@@ -232,6 +233,13 @@ dsm_impl_posix(dsm_op op, dsm_handle handle, Size request_size,
name)));
return false;
}
+ /*
+ * Detach and destroy pass through here, only decrease the memory
+ * shown allocated in pgstat_activity when the creator destroys the
+ * allocation.
+ */
+ if (op == DSM_OP_DESTROY)
+ pgstat_report_backend_mem_allocated_decrease(*mapped_size);
*mapped_address = NULL;
*mapped_size = 0;
if (op == DSM_OP_DESTROY && shm_unlink(name) != 0)
@@ -332,6 +340,36 @@ dsm_impl_posix(dsm_op op, dsm_handle handle, Size request_size,
name)));
return false;
}
+
+ /*
+ * Attach and create pass through here, only update backend memory
+ * allocated in pgstat_activity for the creator process.
+ */
+ if (op == DSM_OP_CREATE)
+ {
+ /*
+ * Posix creation calls dsm_impl_posix_resize implying that resizing
+ * occurs or may be added in the future. As implemented
+ * dsm_impl_posix_resize utilizes fallocate or truncate, passing the
+ * whole new size as input, growing the allocation as needed * (only
+ * truncate supports shrinking). We update by replacing the * old
+ * allocation with the new.
+ */
+#if defined(HAVE_POSIX_FALLOCATE) && defined(__linux__)
+ /*
+ * posix_fallocate does not shrink allocations, adjust only on
+ * allocation increase.
+ */
+ if (request_size > *mapped_size)
+ {
+ pgstat_report_backend_mem_allocated_decrease(*mapped_size);
+ pgstat_report_backend_mem_allocated_increase(request_size);
+ }
+#else
+ pgstat_report_backend_mem_allocated_decrease(*mapped_size);
+ pgstat_report_backend_mem_allocated_increase(request_size);
+#endif
+ }
*mapped_address = address;
*mapped_size = request_size;
close(fd);
@@ -537,6 +575,14 @@ dsm_impl_sysv(dsm_op op, dsm_handle handle, Size request_size,
name)));
return false;
}
+
+ /*
+ * Detach and destroy pass through here, only decrease the memory
+ * shown allocated in pgstat_activity when the creator destroys the
+ * allocation.
+ */
+ if (op == DSM_OP_DESTROY)
+ pgstat_report_backend_mem_allocated_decrease(*mapped_size);
*mapped_address = NULL;
*mapped_size = 0;
if (op == DSM_OP_DESTROY && shmctl(ident, IPC_RMID, NULL) < 0)
@@ -584,6 +630,13 @@ dsm_impl_sysv(dsm_op op, dsm_handle handle, Size request_size,
name)));
return false;
}
+
+ /*
+ * Attach and create pass through here, only update backend memory
+ * allocated in pgstat_activity for the creator process.
+ */
+ if (op == DSM_OP_CREATE)
+ pgstat_report_backend_mem_allocated_increase(request_size);
*mapped_address = address;
*mapped_size = request_size;
@@ -652,6 +705,13 @@ dsm_impl_windows(dsm_op op, dsm_handle handle, Size request_size,
return false;
}
+ /*
+ * Detach and destroy pass through here, only decrease the memory
+ * shown allocated in pgstat_activity when the creator destroys the
+ * allocation.
+ */
+ if (op == DSM_OP_DESTROY)
+ pgstat_report_backend_mem_allocated_decrease(*mapped_size);
*impl_private = NULL;
*mapped_address = NULL;
*mapped_size = 0;
@@ -768,6 +828,12 @@ dsm_impl_windows(dsm_op op, dsm_handle handle, Size request_size,
return false;
}
+ /*
+ * Attach and create pass through here, only update backend memory
+ * allocated in pgstat_activity for the creator process.
+ */
+ if (op == DSM_OP_CREATE)
+ pgstat_report_backend_mem_allocated_increase(info.RegionSize);
*mapped_address = address;
*mapped_size = info.RegionSize;
*impl_private = hmap;
@@ -812,6 +878,13 @@ dsm_impl_mmap(dsm_op op, dsm_handle handle, Size request_size,
name)));
return false;
}
+
+ /*
+ * Detach and destroy pass through here, only decrease the memory
+ * shown allocated in pgstat_activity when the creator destroys the
+ * allocation.
+ */
+ pgstat_report_backend_mem_allocated_decrease(*mapped_size);
*mapped_address = NULL;
*mapped_size = 0;
if (op == DSM_OP_DESTROY && unlink(name) != 0)
@@ -933,6 +1006,13 @@ dsm_impl_mmap(dsm_op op, dsm_handle handle, Size request_size,
name)));
return false;
}
+
+ /*
+ * Attach and create pass through here, only update backend memory
+ * allocated in pgstat_activity for the creator process.
+ */
+ if (op == DSM_OP_CREATE)
+ pgstat_report_backend_mem_allocated_increase(request_size);
*mapped_address = address;
*mapped_size = request_size;
diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c
index c7ed1e6d7a..17a00587f8 100644
--- a/src/backend/utils/activity/backend_status.c
+++ b/src/backend/utils/activity/backend_status.c
@@ -49,6 +49,8 @@ int pgstat_track_activity_query_size = 1024;
/* exposed so that backend_progress.c can access it */
PgBackendStatus *MyBEEntry = NULL;
+/* Memory allocated to this backend prior to pgstats initialization */
+uint64 backend_mem_allocated = 0;
static PgBackendStatus *BackendStatusArray = NULL;
static char *BackendAppnameBuffer = NULL;
@@ -400,6 +402,13 @@ pgstat_bestart(void)
lbeentry.st_progress_command_target = InvalidOid;
lbeentry.st_query_id = UINT64CONST(0);
+ /*
+ * Move sum of memory allocated prior to pgstats initialization to pgstats
+ * and zero the local variable.
+ */
+ lbeentry.backend_mem_allocated = backend_mem_allocated;
+ backend_mem_allocated = 0;
+
/*
* we don't zero st_progress_param here to save cycles; nobody should
* examine it until st_progress_command has been set to something other
@@ -1148,3 +1157,99 @@ pgstat_clip_activity(const char *raw_activity)
return activity;
}
+
+/* --------
+ * pgstat_report_backend_mem_allocated_increase() -
+ *
+ * Called to report increase in memory allocated for this backend
+ * --------
+ */
+void
+pgstat_report_backend_mem_allocated_increase(uint64 allocation)
+{
+ volatile PgBackendStatus *beentry = MyBEEntry;
+
+ if (!beentry || !pgstat_track_activities)
+ {
+ /*
+ * Account for memory before pgstats is initialized. This will be
+ * migrated to pgstats on initialization.
+ */
+ backend_mem_allocated += allocation;
+
+ return;
+ }
+
+ /*
+ * Update my status entry, following the protocol of bumping
+ * st_changecount before and after. We use a volatile pointer here to
+ * ensure the compiler doesn't try to get cute.
+ */
+ PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
+ beentry->backend_mem_allocated += allocation;
+ PGSTAT_END_WRITE_ACTIVITY(beentry);
+}
+
+/* --------
+ * pgstat_report_backend_mem_allocated_decrease() -
+ *
+ * Called to report decrease in memory allocated for this backend
+ * --------
+ */
+void
+pgstat_report_backend_mem_allocated_decrease(uint64 deallocation)
+{
+ volatile PgBackendStatus *beentry = MyBEEntry;
+
+ /*
+ * Cases may occur where shared memory from a previous postmaster
+ * invocation still exist. These are cleaned up at startup by
+ * dsm_cleanup_using_control_segment. Limit decreasing memory allocated to
+ * zero in case no corresponding prior increase exists or decrease has
+ * already been accounted for.
+ */
+
+ if (!beentry || !pgstat_track_activities)
+ {
+ /*
+ * Account for memory before pgstats is initialized. This will be
+ * migrated to pgstats on initialization. Do not allow
+ * backend_mem_allocated to go below zero. If pgstats has not been
+ * initialized, we are in startup and we set backend_mem_allocated to
+ * zero in cases where it would go negative and skip generating an
+ * ereport.
+ */
+ if (deallocation > backend_mem_allocated)
+ backend_mem_allocated = 0;
+ else
+ backend_mem_allocated -= deallocation;
+
+ return;
+ }
+
+ /*
+ * Do not allow backend_mem_allocated to go below zero. ereport if we
+ * would have. There's no need for a lock around the read here asit's
+ * being referenced from the same backend which means that there shouldn't
+ * be concurrent writes. We want to generate an ereport in these cases.
+ */
+ if (deallocation > beentry->backend_mem_allocated)
+ {
+ ereport(LOG, (errmsg("decrease reduces reported backend memory allocated below zero; setting reported to 0")));
+
+ /*
+ * Overwrite deallocation with current backend_mem_allocated so we end
+ * up at zero.
+ */
+ deallocation = beentry->backend_mem_allocated;
+ }
+
+ /*
+ * Update my status entry, following the protocol of bumping
+ * st_changecount before and after. We use a volatile pointer here to
+ * ensure the compiler doesn't try to get cute.
+ */
+ PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
+ beentry->backend_mem_allocated -= deallocation;
+ PGSTAT_END_WRITE_ACTIVITY(beentry);
+}
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 4cca30aae7..1574aa8049 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -536,7 +536,7 @@ pg_stat_get_progress_info(PG_FUNCTION_ARGS)
Datum
pg_stat_get_activity(PG_FUNCTION_ARGS)
{
-#define PG_STAT_GET_ACTIVITY_COLS 30
+#define PG_STAT_GET_ACTIVITY_COLS 31
int num_backends = pgstat_fetch_stat_numbackends();
int curr_backend;
int pid = PG_ARGISNULL(0) ? -1 : PG_GETARG_INT32(0);
@@ -610,6 +610,8 @@ pg_stat_get_activity(PG_FUNCTION_ARGS)
else
nulls[16] = true;
+ values[30] = UInt64GetDatum(beentry->backend_mem_allocated);
+
/* Values only available to role member or pg_read_all_stats */
if (HAS_PGSTAT_PERMISSIONS(beentry->st_userid))
{
diff --git a/src/backend/utils/mmgr/aset.c b/src/backend/utils/mmgr/aset.c
index b6eeb8abab..c91f8efa4d 100644
--- a/src/backend/utils/mmgr/aset.c
+++ b/src/backend/utils/mmgr/aset.c
@@ -47,6 +47,7 @@
#include "postgres.h"
#include "port/pg_bitutils.h"
+#include "utils/backend_status.h"
#include "utils/memdebug.h"
#include "utils/memutils.h"
#include "utils/memutils_memorychunk.h"
@@ -509,6 +510,7 @@ AllocSetContextCreateInternal(MemoryContext parent,
name);
((MemoryContext) set)->mem_allocated = firstBlockSize;
+ pgstat_report_backend_mem_allocated_increase(firstBlockSize);
return (MemoryContext) set;
}
@@ -532,6 +534,7 @@ AllocSetReset(MemoryContext context)
AllocBlock block;
Size keepersize PG_USED_FOR_ASSERTS_ONLY
= set->keeper->endptr - ((char *) set);
+ uint64 deallocation = 0;
AssertArg(AllocSetIsValid(set));
@@ -571,6 +574,7 @@ AllocSetReset(MemoryContext context)
{
/* Normal case, release the block */
context->mem_allocated -= block->endptr - ((char *) block);
+ deallocation += block->endptr - ((char *) block);
#ifdef CLOBBER_FREED_MEMORY
wipe_mem(block, block->freeptr - ((char *) block));
@@ -581,6 +585,7 @@ AllocSetReset(MemoryContext context)
}
Assert(context->mem_allocated == keepersize);
+ pgstat_report_backend_mem_allocated_decrease(deallocation);
/* Reset block size allocation sequence, too */
set->nextBlockSize = set->initBlockSize;
@@ -600,6 +605,7 @@ AllocSetDelete(MemoryContext context)
AllocBlock block = set->blocks;
Size keepersize PG_USED_FOR_ASSERTS_ONLY
= set->keeper->endptr - ((char *) set);
+ uint64 deallocation = 0;
AssertArg(AllocSetIsValid(set));
@@ -635,11 +641,13 @@ AllocSetDelete(MemoryContext context)
freelist->first_free = (AllocSetContext *) oldset->header.nextchild;
freelist->num_free--;
+ deallocation += oldset->header.mem_allocated;
/* All that remains is to free the header/initial block */
free(oldset);
}
Assert(freelist->num_free == 0);
+ pgstat_report_backend_mem_allocated_decrease(deallocation);
}
/* Now add the just-deleted context to the freelist. */
@@ -656,7 +664,10 @@ AllocSetDelete(MemoryContext context)
AllocBlock next = block->next;
if (block != set->keeper)
+ {
context->mem_allocated -= block->endptr - ((char *) block);
+ deallocation += block->endptr - ((char *) block);
+ }
#ifdef CLOBBER_FREED_MEMORY
wipe_mem(block, block->freeptr - ((char *) block));
@@ -669,6 +680,8 @@ AllocSetDelete(MemoryContext context)
}
Assert(context->mem_allocated == keepersize);
+ pgstat_report_backend_mem_allocated_decrease(deallocation +
+ context->mem_allocated);
/* Finally, free the context header, including the keeper block */
free(set);
@@ -712,6 +725,7 @@ AllocSetAlloc(MemoryContext context, Size size)
return NULL;
context->mem_allocated += blksize;
+ pgstat_report_backend_mem_allocated_increase(blksize);
block->aset = set;
block->freeptr = block->endptr = ((char *) block) + blksize;
@@ -916,6 +930,7 @@ AllocSetAlloc(MemoryContext context, Size size)
return NULL;
context->mem_allocated += blksize;
+ pgstat_report_backend_mem_allocated_increase(blksize);
block->aset = set;
block->freeptr = ((char *) block) + ALLOC_BLOCKHDRSZ;
@@ -1016,6 +1031,7 @@ AllocSetFree(void *pointer)
block->next->prev = block->prev;
set->header.mem_allocated -= block->endptr - ((char *) block);
+ pgstat_report_backend_mem_allocated_decrease(block->endptr - ((char *) block));
#ifdef CLOBBER_FREED_MEMORY
wipe_mem(block, block->freeptr - ((char *) block));
@@ -1127,7 +1143,9 @@ AllocSetRealloc(void *pointer, Size size)
/* updated separately, not to underflow when (oldblksize > blksize) */
set->header.mem_allocated -= oldblksize;
+ pgstat_report_backend_mem_allocated_decrease(oldblksize);
set->header.mem_allocated += blksize;
+ pgstat_report_backend_mem_allocated_increase(blksize);
block->freeptr = block->endptr = ((char *) block) + blksize;
diff --git a/src/backend/utils/mmgr/generation.c b/src/backend/utils/mmgr/generation.c
index b39894ec94..36e5b3f94d 100644
--- a/src/backend/utils/mmgr/generation.c
+++ b/src/backend/utils/mmgr/generation.c
@@ -37,6 +37,7 @@
#include "lib/ilist.h"
#include "port/pg_bitutils.h"
+#include "utils/backend_status.h"
#include "utils/memdebug.h"
#include "utils/memutils.h"
#include "utils/memutils_memorychunk.h"
@@ -258,6 +259,7 @@ GenerationContextCreate(MemoryContext parent,
name);
((MemoryContext) set)->mem_allocated = firstBlockSize;
+ pgstat_report_backend_mem_allocated_increase(firstBlockSize);
return (MemoryContext) set;
}
@@ -274,6 +276,7 @@ GenerationReset(MemoryContext context)
{
GenerationContext *set = (GenerationContext *) context;
dlist_mutable_iter miter;
+ uint64 deallocation = 0;
AssertArg(GenerationIsValid(set));
@@ -296,9 +299,14 @@ GenerationReset(MemoryContext context)
if (block == set->keeper)
GenerationBlockMarkEmpty(block);
else
+ {
+ deallocation += block->blksize;
GenerationBlockFree(set, block);
+ }
}
+ pgstat_report_backend_mem_allocated_decrease(deallocation);
+
/* set it so new allocations to make use of the keeper block */
set->block = set->keeper;
@@ -319,6 +327,9 @@ GenerationDelete(MemoryContext context)
{
/* Reset to release all releasable GenerationBlocks */
GenerationReset(context);
+
+ pgstat_report_backend_mem_allocated_decrease(context->mem_allocated);
+
/* And free the context header and keeper block */
free(context);
}
@@ -355,6 +366,7 @@ GenerationAlloc(MemoryContext context, Size size)
return NULL;
context->mem_allocated += blksize;
+ pgstat_report_backend_mem_allocated_increase(blksize);
/* block with a single (used) chunk */
block->context = set;
@@ -458,6 +470,7 @@ GenerationAlloc(MemoryContext context, Size size)
return NULL;
context->mem_allocated += blksize;
+ pgstat_report_backend_mem_allocated_increase(blksize);
/* initialize the new block */
GenerationBlockInit(set, block, blksize);
@@ -691,6 +704,8 @@ GenerationFree(void *pointer)
dlist_delete(&block->node);
set->header.mem_allocated -= block->blksize;
+ pgstat_report_backend_mem_allocated_decrease(block->blksize);
+
free(block);
}
diff --git a/src/backend/utils/mmgr/slab.c b/src/backend/utils/mmgr/slab.c
index 2d70adef09..efc8bcfaa7 100644
--- a/src/backend/utils/mmgr/slab.c
+++ b/src/backend/utils/mmgr/slab.c
@@ -53,6 +53,7 @@
#include "postgres.h"
#include "lib/ilist.h"
+#include "utils/backend_status.h"
#include "utils/memdebug.h"
#include "utils/memutils.h"
#include "utils/memutils_memorychunk.h"
@@ -218,6 +219,12 @@ SlabContextCreate(MemoryContext parent,
parent,
name);
+ /*
+ * If SlabContextCreate is updated to add headerSize to
+ * context->mem_allocated, then update here and SlabDelete appropriately
+ */
+ pgstat_report_backend_mem_allocated_increase(headerSize);
+
return (MemoryContext) slab;
}
@@ -233,6 +240,7 @@ SlabReset(MemoryContext context)
{
int i;
SlabContext *slab = castNode(SlabContext, context);
+ uint64 deallocation = 0;
Assert(slab);
@@ -258,9 +266,11 @@ SlabReset(MemoryContext context)
free(block);
slab->nblocks--;
context->mem_allocated -= slab->blockSize;
+ deallocation += slab->blockSize;
}
}
+ pgstat_report_backend_mem_allocated_decrease(deallocation);
slab->minFreeChunks = 0;
Assert(slab->nblocks == 0);
@@ -274,8 +284,17 @@ SlabReset(MemoryContext context)
void
SlabDelete(MemoryContext context)
{
+ /*
+ * Until header allocation is included in context->mem_allocated cast to
+ * slab and decrement the headerSize
+ */
+ SlabContext *slab = castNode(SlabContext, context);
+
/* Reset to release all the SlabBlocks */
SlabReset(context);
+
+ pgstat_report_backend_mem_allocated_decrease(slab->headerSize);
+
/* And free the context header */
free(context);
}
@@ -344,6 +363,7 @@ SlabAlloc(MemoryContext context, Size size)
slab->minFreeChunks = slab->chunksPerBlock;
slab->nblocks += 1;
context->mem_allocated += slab->blockSize;
+ pgstat_report_backend_mem_allocated_increase(slab->blockSize);
}
/* grab the block from the freelist (even the new block is there) */
@@ -511,6 +531,7 @@ SlabFree(void *pointer)
free(block);
slab->nblocks--;
slab->header.mem_allocated -= slab->blockSize;
+ pgstat_report_backend_mem_allocated_decrease(slab->blockSize);
}
else
dlist_push_head(&slab->freelist[block->nfree], &block->node);
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index be47583122..e1bfb85b25 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5340,9 +5340,9 @@
proname => 'pg_stat_get_activity', prorows => '100', proisstrict => 'f',
proretset => 't', provolatile => 's', proparallel => 'r',
prorettype => 'record', proargtypes => 'int4',
- proallargtypes => '{int4,oid,int4,oid,text,text,text,text,text,timestamptz,timestamptz,timestamptz,timestamptz,inet,text,int4,xid,xid,text,bool,text,text,int4,text,numeric,text,bool,text,bool,int4,int8}',
- proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
- proargnames => '{pid,datid,pid,usesysid,application_name,state,query,wait_event_type,wait_event,xact_start,query_start,backend_start,state_change,client_addr,client_hostname,client_port,backend_xid,backend_xmin,backend_type,ssl,sslversion,sslcipher,sslbits,ssl_client_dn,ssl_client_serial,ssl_issuer_dn,gss_auth,gss_princ,gss_enc,leader_pid,query_id}',
+ proallargtypes => '{int4,oid,int4,oid,text,text,text,text,text,timestamptz,timestamptz,timestamptz,timestamptz,inet,text,int4,xid,xid,text,bool,text,text,int4,text,numeric,text,bool,text,bool,int4,int8,int8}',
+ proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{pid,datid,pid,usesysid,application_name,state,query,wait_event_type,wait_event,xact_start,query_start,backend_start,state_change,client_addr,client_hostname,client_port,backend_xid,backend_xmin,backend_type,ssl,sslversion,sslcipher,sslbits,ssl_client_dn,ssl_client_serial,ssl_issuer_dn,gss_auth,gss_princ,gss_enc,leader_pid,query_id,backend_mem_allocated}',
prosrc => 'pg_stat_get_activity' },
{ oid => '3318',
descr => 'statistics: information about progress of backends running maintenance command',
diff --git a/src/include/utils/backend_status.h b/src/include/utils/backend_status.h
index 7403bca25e..9bdc4197bd 100644
--- a/src/include/utils/backend_status.h
+++ b/src/include/utils/backend_status.h
@@ -168,6 +168,9 @@ typedef struct PgBackendStatus
/* query identifier, optionally computed using post_parse_analyze_hook */
uint64 st_query_id;
+
+ /* Current memory allocated to this backend */
+ uint64 backend_mem_allocated;
} PgBackendStatus;
@@ -305,7 +308,9 @@ extern const char *pgstat_get_backend_current_activity(int pid, bool checkUser);
extern const char *pgstat_get_crashed_backend_activity(int pid, char *buffer,
int buflen);
extern uint64 pgstat_get_my_query_id(void);
-
+extern void pgstat_report_backend_mem_allocated_increase(uint64 allocation);
+extern void pgstat_report_backend_mem_allocated_decrease(uint64 deallocation);
+extern uint64 pgstat_get_all_backend_memory_allocated(void);
/* ----------
* Support functions for the SQL-callable functions to
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 7ec3d2688f..674e5c6fe7 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1757,9 +1757,10 @@ pg_stat_activity| SELECT s.datid,
s.backend_xid,
s.backend_xmin,
s.query_id,
+ s.backend_mem_allocated,
s.query,
s.backend_type
- FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, query_id)
+ FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, query_id, backend_mem_allocated)
LEFT JOIN pg_database d ON ((s.datid = d.oid)))
LEFT JOIN pg_authid u ON ((s.usesysid = u.oid)));
pg_stat_all_indexes| SELECT c.oid AS relid,
@@ -1871,7 +1872,7 @@ pg_stat_gssapi| SELECT s.pid,
s.gss_auth AS gss_authenticated,
s.gss_princ AS principal,
s.gss_enc AS encrypted
- FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, query_id)
+ FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, query_id, backend_mem_allocated)
WHERE (s.client_port IS NOT NULL);
pg_stat_progress_analyze| SELECT s.pid,
s.datid,
@@ -2052,7 +2053,7 @@ pg_stat_replication| SELECT s.pid,
w.sync_priority,
w.sync_state,
w.reply_time
- FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, query_id)
+ FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, query_id, backend_mem_allocated)
JOIN pg_stat_get_wal_senders() w(pid, state, sent_lsn, write_lsn, flush_lsn, replay_lsn, write_lag, flush_lag, replay_lag, sync_priority, sync_state, reply_time) ON ((s.pid = w.pid)))
LEFT JOIN pg_authid u ON ((s.usesysid = u.oid)));
pg_stat_replication_slots| SELECT s.slot_name,
@@ -2086,7 +2087,7 @@ pg_stat_ssl| SELECT s.pid,
s.ssl_client_dn AS client_dn,
s.ssl_client_serial AS client_serial,
s.ssl_issuer_dn AS issuer_dn
- FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, query_id)
+ FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, query_id, backend_mem_allocated)
WHERE (s.client_port IS NOT NULL);
pg_stat_subscription| SELECT su.oid AS subid,
su.subname,
On Wed, Aug 31, 2022 at 12:50:19PM -0400, Reid Thompson wrote:
Hi Hackers,
Add the ability to limit the amount of memory that can be allocated to
backends.This builds on the work that adds backend memory allocated to
pg_stat_activity
/messages/by-id/67bb5c15c0489cb499723b0340f16e10c22485ec.camel@crunchydata.com
Both patches are attached.
You should name the patches with different prefixes, like
001,002,003 Otherwise, cfbot may try to apply them in the wrong order.
git format-patch is the usual tool for that.
+ Specifies a limit to the amount of memory (MB) that may be allocated to
MB are just the default unit, right ?
The user should be allowed to write max_total_backend_memory='2GB'
+ backends in total (i.e. this is not a per user or per backend limit). + If unset, or set to 0 it is disabled. A backend request that would push + the total over the limit will be denied with an out of memory error + causing that backends current query/transaction to fail. Due to the dynamic
backend's
+ nature of memory allocations, this limit is not exact. If within 1.5MB of + the limit and two backends request 1MB each at the same time both may be + allocated exceeding the limit. Further requests will not be allocated until
allocated, and exceed the limit
+bool +exceeds_max_total_bkend_mem(uint64 allocation_request) +{ + bool result = false; + + if (MyAuxProcType != NotAnAuxProcess) + return result;
The double negative is confusing, so could use a comment.
+ /* Convert max_total_bkend_mem to bytes for comparison */ + if (max_total_bkend_mem && + pgstat_get_all_backend_memory_allocated() + + allocation_request > (uint64)max_total_bkend_mem * 1024 * 1024) + { + /* + * Explicitely identify the OOM being a result of this + * configuration parameter vs a system failure to allocate OOM. + */ + elog(WARNING, + "request will exceed postgresql.conf defined max_total_backend_memory limit (%lu > %lu)", + pgstat_get_all_backend_memory_allocated() + + allocation_request, (uint64)max_total_bkend_mem * 1024 * 1024);
I think it should be ereport() rather than elog(), which is
internal-only, and not-translated.
+ {"max_total_backend_memory", PGC_SIGHUP, RESOURCES_MEM, + gettext_noop("Restrict total backend memory allocations to this max."), + gettext_noop("0 turns this feature off."), + GUC_UNIT_MB + }, + &max_total_bkend_mem, + 0, 0, INT_MAX, + NULL, NULL, NULL
I think this needs a maximum like INT_MAX/1024/1024
+uint64 +pgstat_get_all_backend_memory_allocated(void) +{
...
+ for (i = 1; i <= NumBackendStatSlots; i++) + {
It's looping over every backend for each allocation.
Do you know if there's any performance impact of that ?
I think it may be necessary to track the current allocation size in
shared memory (with atomic increments?). Maybe decrements would need to
be exactly accounted for, or otherwise Assert() that the value is not
negative. I don't know how expensive it'd be to have conditionals for
each decrement, but maybe the value would only be decremented at
strategic times, like at transaction commit or backend shutdown.
--
Justin
At Wed, 31 Aug 2022 12:50:19 -0400, Reid Thompson <reid.thompson@crunchydata.com> wrote in
Hi Hackers,
Add the ability to limit the amount of memory that can be allocated to
backends.
The patch seems to limit both of memory-context allocations and DSM
allocations happen on a specific process by the same budget. In the
fist place I don't think it's sensible to cap the amount of DSM
allocations by per-process budget.
DSM is used by pgstats subsystem. There can be cases where pgstat
complains for denial of DSM allocation after the budget has been
exhausted by memory-context allocations, or every command complains
for denial of memory-context allocation after once the per-process
budget is exhausted by DSM allocations. That doesn't seem reasonable.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
Hi,
On 8/31/22 6:50 PM, Reid Thompson wrote:
Hi Hackers,
Add the ability to limit the amount of memory that can be allocated to
backends.
Thanks for the patch.
+ 1 on the idea.
Specifies a limit to the amount of memory (MB) that may be allocated to
backends in total (i.e. this is not a per user or per backend limit).
If unset, or set to 0 it is disabled. It is intended as a resource to
help avoid the OOM killer. A backend request that would push the total
over the limit will be denied with an out of memory error causing that
backends current query/transaction to fail.
I'm not sure we are choosing the right victims here (aka the ones that
are doing the request that will push the total over the limit).
Imagine an extreme case where a single backend consumes say 99% of the
limit, shouldn't it be the one to be "punished"? (and somehow forced to
give the memory back).
The problem that i see with the current approach is that a "bad" backend
could impact all the others and continue to do so.
what about punishing say the highest consumer , what do you think? (just
speaking about the general idea here, not about the implementation)
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
On Thu, 1 Sept 2022 at 04:52, Reid Thompson
<reid.thompson@crunchydata.com> wrote:
Add the ability to limit the amount of memory that can be allocated to
backends.
Are you aware that relcache entries are stored in backend local memory
and that once we've added a relcache entry for a relation that we have
no current code which attempts to reduce the memory consumption used
by cache entries when there's memory pressure?
It seems to me that if we had this feature as you propose that a
backend could hit the limit and stay there just from the memory
requirements of the relation cache after some number of tables have
been accessed from the given backend. It's not hard to imagine a
situation where the palloc() would start to fail during parse, which
might make it quite infuriating for anyone trying to do something
like:
SET max_total_backend_memory TO 0;
or
ALTER SYSTEM SET max_total_backend_memory TO 0;
I think a better solution to this problem would be to have "memory
grants", where we configure some amount of "pool" memory that backends
are allowed to use for queries. The planner would have to add the
expected number of work_mem that the given query is expected to use
and before that query starts, the executor would have to "checkout"
that amount of memory from the pool and return it when finished. If
there is not enough memory in the pool then the query would have to
wait until enough memory is available. This creates a deadlocking
hazard that the deadlock detector would need to be made aware of.
I know Thomas Munro has mentioned this "memory grant" or "memory pool"
feature to me previously and I think he even has some work in progress
code for it. It's a very tricky problem, however, as aside from the
deadlocking issue, it requires working out how much memory a given
plan will use concurrently. That's not as simple as counting the nodes
that use work_mem and summing those up.
There is some discussion about the feature in [1]/messages/by-id/20220713222342.GE18011@telsasoft.com. I was unable to
find what Thomas mentioned on the list about this. I've included him
here in case he has any extra information to share.
David
On Wed, 2022-08-31 at 12:34 -0500, Justin Pryzby wrote:
You should name the patches with different prefixes, like
001,002,003 Otherwise, cfbot may try to apply them in the wrong
order.
git format-patch is the usual tool for that.
Thanks for the pointer. My experience with git in the past has been
minimal and basic.
+ Specifies a limit to the amount of memory (MB) that may be
allocated toMB are just the default unit, right ?
The user should be allowed to write max_total_backend_memory='2GB'
Correct. Default units are MB. Other unit types are converted to MB.
+ causing that backends current query/transaction to fail.
backend's
+ allocated exceeding the limit. Further requests will not
allocated, and exceed the limit
+ if (MyAuxProcType != NotAnAuxProcess)
The double negative is confusing, so could use a comment.
+ elog(WARNING,
I think it should be ereport() rather than elog(), which is
internal-only, and not-translated.
Corrected/added the the above items. Attached patches with the corrections.
+ 0, 0, INT_MAX,
+ NULL, NULL, NULLI think this needs a maximum like INT_MAX/1024/1024
Is this noting that we'd set a ceiling of 2048MB?
+ for (i = 1; i <= NumBackendStatSlots; i++) + {It's looping over every backend for each allocation.
Do you know if there's any performance impact of that ?
I'm not very familiar with how to test performance impact, I'm open to
suggestions. I have performed the below pgbench tests and noted the basic
tps differences in the table.
Test 1:
branch master
CFLAGS="-I/usr/include/python3.8/ " /home/rthompso/src/git/postgres/configure --silent --prefix=/home/rthompso/src/git/postgres/install/master --with-openssl --with-tcl --with-tclconfig=/usr/lib/tcl8.6 --with-perl --with-libxml --with-libxslt --with-python --with-gssapi --with-systemd --with-ldap --enable-nls
make -s -j12 && make -s install
initdb
default postgresql.conf settings
init pgbench pgbench -U rthompso -p 5433 -h localhost -i -s 50 testpgbench
10 iterations
for ctr in {1..10}; do { time pgbench -p 5433 -h localhost -c 10 -j 10 -t 50000 testpgbench; } 2>&1 | tee -a pgstatsResultsNoLimitSet; done
Test 2:
branch pg-stat-activity-backend-memory-allocated
CFLAGS="-I/usr/include/python3.8/ " /home/rthompso/src/git/postgres/configure --silent --prefix=/home/rthompso/src/git/postgres/install/pg-stats-memory/ --with-openssl --with-tcl --with-tclconfig=/usr/lib/tcl8.6 --with-perl --with-libxml --with-libxslt --with-python --with-gssapi --with-systemd --with-ldap --enable-nls
make -s -j12 && make -s install
initdb
default postgresql.conf settings
init pgbench pgbench -U rthompso -p 5433 -h localhost -i -s 50
testpgbench
10 iterations
for ctr in {1..10}; do { time pgbench -p 5433 -h localhost -c 10 -j 10 -t 50000 testpgbench; } 2>&1 | tee -a pgstatsResultsPg-stats-memory; done
Test 3:
branch dev-max-memory
CFLAGS="-I/usr/include/python3.8/ " /home/rthompso/src/git/postgres/configure --silent --prefix=/home/rthompso/src/git/postgres/install/dev-max-memory/ --with-openssl --with-tcl --with-tclconfig=/usr/lib/tcl8.6 --with-perl --with-libxml --with-libxslt --with-python --with-gssapi --with-systemd --with-ldap --enable-nls
make -s -j12 && make -s install
initdb
default postgresql.conf settings
init pgbench pgbench -U rthompso -p 5433 -h localhost -i -s 50 testpgbench
10 iterations
for ctr in {1..10}; do { time pgbench -p 5433 -h localhost -c 10 -j 10 -t 50000 testpgbench; } 2>&1 | tee -a pgstatsResultsDev-max-memory; done
Test 4:
branch dev-max-memory
CFLAGS="-I/usr/include/python3.8/ " /home/rthompso/src/git/postgres/configure --silent --prefix=/home/rthompso/src/git/postgres/install/dev-max-memory/ --with-openssl --with-tcl --with-tclconfig=/usr/lib/tcl8.6 --with-perl --with-libxml --with-libxslt --with-python --with-gssapi --with-systemd --with-ldap --enable-nls
make -s -j12 && make -s install
initdb
non-default postgresql.conf setting for max_total_backend_memory = 100MB
init pgbench pgbench -U rthompso -p 5433 -h localhost -i -s 50 testpgbench
10 iterations
for ctr in {1..10}; do { time pgbench -p 5433 -h localhost -c 10 -j 10 -t 50000 testpgbench; } 2>&1 | tee -a pgstatsResultsDev-max-memory100MB; done
Laptop
11th Gen Intel(R) Core(TM) i7-11850H @ 2.50GHz 8 Cores 16 threads
32GB RAM
SSD drive
Averages from the 10 runs and tps difference over the 10 runs
|------------------+------------------+------------------------+-------------------+------------------+-------------------+---------------+------------------|
| Test Run | Master | Track Memory Allocated | Diff from Master | Max Mem off | Diff from Master | Max Mem 100MB | Diff from Master |
| Set 1 | Test 1 | Test 2 | | Test 3 | | Test 4 | |
| latency average | 2.43390909090909 | 2.44327272727273 | | 2.44381818181818 | | 2.6843 | |
| tps inc conn est | 3398.99291372727 | 3385.40984336364 | -13.583070363637 | 3385.08184309091 | -13.9110706363631 | 3729.5363413 | 330.54342757273 |
| tps exc conn est | 3399.12185727273 | 3385.52527490909 | -13.5965823636366 | 3385.22100872727 | -13.9008485454547 | 3729.7097607 | 330.58790342727 |
|------------------+------------------+------------------------+-------------------+------------------+-------------------+---------------+------------------|
| Set 2 | | | | | | | |
| latency average | 2.691 | 2.6895 | 2 | 2.69 | 3 | 2.6827 | 4 |
| tps inc conn est | 3719.56 | 3721.7587106 | 2.1987106 | 3720.3 | .74 | 3730.86 | 11.30 |
| tps exc conn est | 3719.71 | 3721.9268465 | 2.2168465 | 3720.47 | .76 | 3731.02 | 11.31 |
|------------------+------------------+------------------------+-------------------+------------------+-------------------+---------------+------------------|
I think it may be necessary to track the current allocation size in
shared memory (with atomic increments?). Maybe decrements would need
to
be exactly accounted for, or otherwise Assert() that the value is not
negative. I don't know how expensive it'd be to have conditionals
for
each decrement, but maybe the value would only be decremented at
strategic times, like at transaction commit or backend shutdown.
--
Reid Thompson
Senior Software Engineer
Crunchy Data, Inc.
reid.thompson@crunchydata.com
www.crunchydata.com
Attachments:
0002-Add-the-ability-to-limit-the-amount-of-memory-that-c.patchtext/x-patch; charset=UTF-8; name=0002-Add-the-ability-to-limit-the-amount-of-memory-that-c.patchDownload
From 57a79b5f72af510f8c4b9ea65f5ffb4fe1fb7798 Mon Sep 17 00:00:00 2001
From: Reid Thompson <jreidthompson@nc.rr.com>
Date: Sat, 4 Jun 2022 22:23:59 -0400
Subject: [PATCH 2/2] Add the ability to limit the amount of memory that can be
allocated to backends.
This builds on the work that adds backend memory allocated to pg_stat_activity.
Add GUC variable max_total_backend_memory.
Specifies a limit to the amount of memory (MB) that may be allocated to backends
in total (i.e. this is not a per user or per backend limit). If unset, or set to
0 it is disabled. It is intended as a resource to help avoid the OOM killer on
LINUX and manage resources in general. A backend request that would push the
total over the limit will be denied with an out of memory error causing that
backend's current query/transaction to fail. Due to the dynamic nature of memory
allocations, this limit is not exact. If within 1.5MB of the limit and two
backends request 1MB each at the same time both may be allocated, and exceed the
limit. Further requests will not be allocated until dropping below the limit.
Keep this in mind when setting this value. This limit does not affect auxiliary
backend processes. Backend memory allocations are displayed in the
pg_stat_activity view.
---
doc/src/sgml/config.sgml | 26 +++++
src/backend/storage/ipc/dsm_impl.c | 12 ++
src/backend/utils/activity/backend_status.c | 107 ++++++++++++++++++
src/backend/utils/misc/guc.c | 11 ++
src/backend/utils/misc/postgresql.conf.sample | 3 +
src/backend/utils/mmgr/aset.c | 17 +++
src/backend/utils/mmgr/generation.c | 9 ++
src/backend/utils/mmgr/slab.c | 8 ++
src/include/utils/backend_status.h | 2 +
9 files changed, 195 insertions(+)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index a5cd4e44c7..e70ea71ba1 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2079,6 +2079,32 @@ include_dir 'conf.d'
</listitem>
</varlistentry>
+ <varlistentry id="guc-max-total-backend-memory" xreflabel="max_total_backend_memory">
+ <term><varname>max_total_backend_memory</varname> (<type>integer</type>)
+ <indexterm>
+ <primary><varname>max_total_backend_memory</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Specifies a limit to the amount of memory (MB) that may be allocated to
+ backends in total (i.e. this is not a per user or per backend limit).
+ If unset, or set to 0 it is disabled. A backend request that would
+ push the total over the limit will be denied with an out of memory
+ error causing that backend's current query/transaction to fail. Due to
+ the dynamic nature of memory allocations, this limit is not exact. If
+ within 1.5MB of the limit and two backends request 1MB each at the same
+ time both may be allocated, and exceed the limit. Further requests will
+ not be allocated until dropping below the limit. Keep this in mind when
+ setting this value. This limit does not affect auxiliary backend
+ processes <xref linkend="glossary-auxiliary-proc"/> . Backend memory
+ allocations (<varname>backend_mem_allocated</varname>) are displayed in
+ the <link linkend="monitoring-pg-stat-activity-view"><structname>pg_stat_activity</structname></link>
+ view.
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
</sect2>
diff --git a/src/backend/storage/ipc/dsm_impl.c b/src/backend/storage/ipc/dsm_impl.c
index 3356bb65b5..cc061056a3 100644
--- a/src/backend/storage/ipc/dsm_impl.c
+++ b/src/backend/storage/ipc/dsm_impl.c
@@ -253,6 +253,10 @@ dsm_impl_posix(dsm_op op, dsm_handle handle, Size request_size,
return true;
}
+ /* Do not exceed maximum allowed memory allocation */
+ if (op == DSM_OP_CREATE && exceeds_max_total_bkend_mem(request_size))
+ return false;
+
/*
* Create new segment or open an existing one for attach.
*
@@ -524,6 +528,10 @@ dsm_impl_sysv(dsm_op op, dsm_handle handle, Size request_size,
int flags = IPCProtection;
size_t segsize;
+ /* Do not exceed maximum allowed memory allocation */
+ if (op == DSM_OP_CREATE && exceeds_max_total_bkend_mem(request_size))
+ return false;
+
/*
* Allocate the memory BEFORE acquiring the resource, so that we don't
* leak the resource if memory allocation fails.
@@ -718,6 +726,10 @@ dsm_impl_windows(dsm_op op, dsm_handle handle, Size request_size,
return true;
}
+ /* Do not exceed maximum allowed memory allocation */
+ if (op == DSM_OP_CREATE && exceeds_max_total_bkend_mem(request_size))
+ return false;
+
/* Create new segment or open an existing one for attach. */
if (op == DSM_OP_CREATE)
{
diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c
index 45da3af213..8ef9b0ffd5 100644
--- a/src/backend/utils/activity/backend_status.c
+++ b/src/backend/utils/activity/backend_status.c
@@ -44,6 +44,8 @@
*/
bool pgstat_track_activities = false;
int pgstat_track_activity_query_size = 1024;
+/* Max backend memory allocation allowed (MB). 0 = disabled */
+int max_total_bkend_mem = 0;
/* exposed so that backend_progress.c can access it */
@@ -1253,3 +1255,108 @@ pgstat_report_backend_mem_allocated_decrease(uint64 deallocation)
beentry->backend_mem_allocated -= deallocation;
PGSTAT_END_WRITE_ACTIVITY(beentry);
}
+
+/* ----------
+ * pgstat_get_all_backend_memory_allocated() -
+ *
+ * Return a uint64 representing the current shared memory allocated to all
+ * backends. This looks directly at the BackendStatusArray, and so will
+ * provide current information regardless of the age of our transaction's
+ * snapshot of the status array.
+ * In the future we will likely utilize additional values - perhaps limit
+ * backend allocation by user/role, etc.
+ * ----------
+ */
+uint64
+pgstat_get_all_backend_memory_allocated(void)
+{
+ PgBackendStatus *beentry;
+ int i;
+ uint64 all_backend_memory_allocated = 0;
+
+ beentry = BackendStatusArray;
+
+ /*
+ * We probably shouldn't get here before shared memory has been set up,
+ * but be safe.
+ */
+ if (beentry == NULL || BackendActivityBuffer == NULL)
+ return 0;
+
+ /*
+ * We include AUX procs in all backend memory calculation
+ */
+ for (i = 1; i <= NumBackendStatSlots; i++)
+ {
+ /*
+ * We use a volatile pointer here to ensure the compiler doesn't try to
+ * get cute.
+ */
+ volatile PgBackendStatus *vbeentry = beentry;
+ bool found;
+ uint64 backend_mem_allocated = 0;
+
+ for (;;)
+ {
+ int before_changecount;
+ int after_changecount;
+
+ pgstat_begin_read_activity(vbeentry, before_changecount);
+
+ /* Ignore invalid entries, which may contain invalid data.
+ * See pgstat_beshutdown_hook()
+ */
+ if (vbeentry->st_procpid > 0)
+ backend_mem_allocated = vbeentry->backend_mem_allocated;
+
+ pgstat_end_read_activity(vbeentry, after_changecount);
+
+ if ((found = pgstat_read_activity_complete(before_changecount,
+ after_changecount)))
+ break;
+
+ /* Make sure we can break out of loop if stuck... */
+ CHECK_FOR_INTERRUPTS();
+ }
+
+ if (found)
+ all_backend_memory_allocated += backend_mem_allocated;
+
+ beentry++;
+ }
+
+ return all_backend_memory_allocated;
+}
+
+/*
+ * Determine if allocation request will exceed max backend memory allowed.
+ * Do not apply to auxiliary processes.
+ */
+bool
+exceeds_max_total_bkend_mem(uint64 allocation_request)
+{
+ bool result = false;
+
+ /* Exclude auxiliary processes from the check */
+ if (MyAuxProcType != NotAnAuxProcess)
+ return result;
+
+ /* Convert max_total_bkend_mem to bytes for comparison */
+ if (max_total_bkend_mem &&
+ pgstat_get_all_backend_memory_allocated() +
+ allocation_request > (uint64)max_total_bkend_mem * 1024 * 1024)
+ {
+ /*
+ * Explicitely identify the OOM being a result of this
+ * configuration parameter vs a system failure to allocate OOM.
+ */
+ ereport(WARNING,
+ errmsg("request will exceed postgresql.conf defined max_total_backend_memory limit (%lu > %lu)",
+ pgstat_get_all_backend_memory_allocated() +
+ allocation_request, (uint64)max_total_bkend_mem * 1024 * 1024));
+
+ result = true;
+ }
+
+ return result;
+}
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 9fbbfb1be5..ab8d83c235 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -3664,6 +3664,17 @@ static struct config_int ConfigureNamesInt[] =
NULL, NULL, NULL
},
+ {
+ {"max_total_backend_memory", PGC_SIGHUP, RESOURCES_MEM,
+ gettext_noop("Restrict total backend memory allocations to this max."),
+ gettext_noop("0 turns this feature off."),
+ GUC_UNIT_MB
+ },
+ &max_total_bkend_mem,
+ 0, 0, INT_MAX,
+ NULL, NULL, NULL
+ },
+
/* End-of-list marker */
{
{NULL, 0, 0, NULL, NULL}, NULL, 0, 0, 0, NULL, NULL, NULL
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 90bec0502c..8e944f6511 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -155,6 +155,9 @@
# mmap
# (change requires restart)
#min_dynamic_shared_memory = 0MB # (change requires restart)
+#max_total_backend_memory = 0MB # Restrict total backend memory allocations
+ # to this max (in MB). 0 turns this feature
+ # off.
# - Disk -
diff --git a/src/backend/utils/mmgr/aset.c b/src/backend/utils/mmgr/aset.c
index c91f8efa4d..ac9a1ced3f 100644
--- a/src/backend/utils/mmgr/aset.c
+++ b/src/backend/utils/mmgr/aset.c
@@ -428,6 +428,10 @@ AllocSetContextCreateInternal(MemoryContext parent,
else
firstBlockSize = Max(firstBlockSize, initBlockSize);
+ /* Do not exceed maximum allowed memory allocation */
+ if (exceeds_max_total_bkend_mem(firstBlockSize))
+ return NULL;
+
/*
* Allocate the initial block. Unlike other aset.c blocks, it starts with
* the context header and its block header follows that.
@@ -720,6 +724,11 @@ AllocSetAlloc(MemoryContext context, Size size)
{
chunk_size = MAXALIGN(size);
blksize = chunk_size + ALLOC_BLOCKHDRSZ + ALLOC_CHUNKHDRSZ;
+
+ /* Do not exceed maximum allowed memory allocation */
+ if (exceeds_max_total_bkend_mem(blksize))
+ return NULL;
+
block = (AllocBlock) malloc(blksize);
if (block == NULL)
return NULL;
@@ -911,6 +920,10 @@ AllocSetAlloc(MemoryContext context, Size size)
while (blksize < required_size)
blksize <<= 1;
+ /* Do not exceed maximum allowed memory allocation */
+ if (exceeds_max_total_bkend_mem(blksize))
+ return NULL;
+
/* Try to allocate it */
block = (AllocBlock) malloc(blksize);
@@ -1133,6 +1146,10 @@ AllocSetRealloc(void *pointer, Size size)
blksize = chksize + ALLOC_BLOCKHDRSZ + ALLOC_CHUNKHDRSZ;
oldblksize = block->endptr - ((char *) block);
+ /* Do not exceed maximum allowed memory allocation */
+ if (blksize > oldblksize && exceeds_max_total_bkend_mem(blksize - oldblksize))
+ return NULL;
+
block = (AllocBlock) realloc(block, blksize);
if (block == NULL)
{
diff --git a/src/backend/utils/mmgr/generation.c b/src/backend/utils/mmgr/generation.c
index 36e5b3f94d..1d5720836c 100644
--- a/src/backend/utils/mmgr/generation.c
+++ b/src/backend/utils/mmgr/generation.c
@@ -192,6 +192,9 @@ GenerationContextCreate(MemoryContext parent,
else
allocSize = Max(allocSize, initBlockSize);
+ if (exceeds_max_total_bkend_mem(allocSize))
+ return NULL;
+
/*
* Allocate the initial block. Unlike other generation.c blocks, it
* starts with the context header and its block header follows that.
@@ -361,6 +364,9 @@ GenerationAlloc(MemoryContext context, Size size)
{
Size blksize = required_size + Generation_BLOCKHDRSZ;
+ if (exceeds_max_total_bkend_mem(blksize))
+ return NULL;
+
block = (GenerationBlock *) malloc(blksize);
if (block == NULL)
return NULL;
@@ -464,6 +470,9 @@ GenerationAlloc(MemoryContext context, Size size)
if (blksize < required_size)
blksize = pg_nextpower2_size_t(required_size);
+ if (exceeds_max_total_bkend_mem(blksize))
+ return NULL;
+
block = (GenerationBlock *) malloc(blksize);
if (block == NULL)
diff --git a/src/backend/utils/mmgr/slab.c b/src/backend/utils/mmgr/slab.c
index e0e69b394e..63c07120dd 100644
--- a/src/backend/utils/mmgr/slab.c
+++ b/src/backend/utils/mmgr/slab.c
@@ -177,6 +177,10 @@ SlabContextCreate(MemoryContext parent,
headerSize += chunksPerBlock * sizeof(bool);
#endif
+ /* Do not exceed maximum allowed memory allocation */
+ if (exceeds_max_total_bkend_mem(headerSize))
+ return NULL;
+
slab = (SlabContext *) malloc(headerSize);
if (slab == NULL)
{
@@ -331,6 +335,10 @@ SlabAlloc(MemoryContext context, Size size)
*/
if (slab->minFreeChunks == 0)
{
+ /* Do not exceed maximum allowed memory allocation */
+ if (exceeds_max_total_bkend_mem(slab->blockSize))
+ return NULL;
+
block = (SlabBlock *) malloc(slab->blockSize);
if (block == NULL)
diff --git a/src/include/utils/backend_status.h b/src/include/utils/backend_status.h
index 9bdc4197bd..3b940ff98e 100644
--- a/src/include/utils/backend_status.h
+++ b/src/include/utils/backend_status.h
@@ -270,6 +270,7 @@ typedef struct LocalPgBackendStatus
*/
extern PGDLLIMPORT bool pgstat_track_activities;
extern PGDLLIMPORT int pgstat_track_activity_query_size;
+extern PGDLLIMPORT int max_total_bkend_mem;
/* ----------
@@ -321,6 +322,7 @@ extern int pgstat_fetch_stat_numbackends(void);
extern PgBackendStatus *pgstat_fetch_stat_beentry(int beid);
extern LocalPgBackendStatus *pgstat_fetch_stat_local_beentry(int beid);
extern char *pgstat_clip_activity(const char *raw_activity);
+extern bool exceeds_max_total_bkend_mem(uint64 allocation_request);
#endif /* BACKEND_STATUS_H */
--
2.25.1
0001-Add-tracking-of-backend-memory-allocated-to-pg_stat_.patchtext/x-patch; charset=UTF-8; name=0001-Add-tracking-of-backend-memory-allocated-to-pg_stat_.patchDownload
From 584a04f1b53948049e73165a4ffdd544c950ab0d Mon Sep 17 00:00:00 2001
From: Reid Thompson <jreidthompson@nc.rr.com>
Date: Thu, 11 Aug 2022 12:01:25 -0400
Subject: [PATCH 1/2] Add tracking of backend memory allocated to
pg_stat_activity
This new field displays the current bytes of memory allocated to the
backend process. It is updated as memory for the process is
malloc'd/free'd. Memory allocated to items on the freelist is included in
the displayed value. Dynamic shared memory allocations are included
only in the value displayed for the backend that created them, they are
not included in the value for backends that are attached to them to
avoid double counting. On occasion, orphaned memory segments may be
cleaned up on postmaster startup. This may result in decreasing the sum
without a prior increment. We limit the floor of backend_mem_allocated
to zero. Updated pg_stat_activity documentation for the new column.
---
doc/src/sgml/monitoring.sgml | 12 +++
src/backend/catalog/system_views.sql | 1 +
src/backend/storage/ipc/dsm_impl.c | 80 +++++++++++++++
src/backend/utils/activity/backend_status.c | 105 ++++++++++++++++++++
src/backend/utils/adt/pgstatfuncs.c | 4 +-
src/backend/utils/mmgr/aset.c | 18 ++++
src/backend/utils/mmgr/generation.c | 15 +++
src/backend/utils/mmgr/slab.c | 21 ++++
src/include/catalog/pg_proc.dat | 6 +-
src/include/utils/backend_status.h | 7 +-
src/test/regress/expected/rules.out | 9 +-
11 files changed, 269 insertions(+), 9 deletions(-)
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 1d9509a2f6..40ae638f25 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -947,6 +947,18 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser
</para></entry>
</row>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>backend_mem_allocated</structfield> <type>bigint</type>
+ </para>
+ <para>
+ The byte count of memory allocated to this backend. Dynamic shared memory
+ allocations are included only in the value displayed for the backend that
+ created them, they are not included in the value for backends that are
+ attached to them to avoid double counting.
+ </para></entry>
+ </row>
+
<row>
<entry role="catalog_table_entry"><para role="column_definition">
<structfield>query</structfield> <type>text</type>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 5a844b63a1..d23f0e9dbb 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -863,6 +863,7 @@ CREATE VIEW pg_stat_activity AS
S.backend_xid,
s.backend_xmin,
S.query_id,
+ S.backend_mem_allocated,
S.query,
S.backend_type
FROM pg_stat_get_activity(NULL) AS S
diff --git a/src/backend/storage/ipc/dsm_impl.c b/src/backend/storage/ipc/dsm_impl.c
index e1b90c5de4..3356bb65b5 100644
--- a/src/backend/storage/ipc/dsm_impl.c
+++ b/src/backend/storage/ipc/dsm_impl.c
@@ -66,6 +66,7 @@
#include "postmaster/postmaster.h"
#include "storage/dsm_impl.h"
#include "storage/fd.h"
+#include "utils/backend_status.h"
#include "utils/guc.h"
#include "utils/memutils.h"
@@ -232,6 +233,13 @@ dsm_impl_posix(dsm_op op, dsm_handle handle, Size request_size,
name)));
return false;
}
+ /*
+ * Detach and destroy pass through here, only decrease the memory
+ * shown allocated in pg_stat_activity when the creator destroys the
+ * allocation.
+ */
+ if (op == DSM_OP_DESTROY)
+ pgstat_report_backend_mem_allocated_decrease(*mapped_size);
*mapped_address = NULL;
*mapped_size = 0;
if (op == DSM_OP_DESTROY && shm_unlink(name) != 0)
@@ -332,6 +340,36 @@ dsm_impl_posix(dsm_op op, dsm_handle handle, Size request_size,
name)));
return false;
}
+
+ /*
+ * Attach and create pass through here, only update backend memory
+ * allocated in pg_stat_activity for the creator process.
+ */
+ if (op == DSM_OP_CREATE)
+ {
+ /*
+ * Posix creation calls dsm_impl_posix_resize implying that resizing
+ * occurs or may be added in the future. As implemented
+ * dsm_impl_posix_resize utilizes fallocate or truncate, passing the
+ * whole new size as input, growing the allocation as needed (only
+ * truncate supports shrinking). We update by replacing the old
+ * allocation with the new.
+ */
+#if defined(HAVE_POSIX_FALLOCATE) && defined(__linux__)
+ /*
+ * posix_fallocate does not shrink allocations, adjust only on
+ * allocation increase.
+ */
+ if (request_size > *mapped_size)
+ {
+ pgstat_report_backend_mem_allocated_decrease(*mapped_size);
+ pgstat_report_backend_mem_allocated_increase(request_size);
+ }
+#else
+ pgstat_report_backend_mem_allocated_decrease(*mapped_size);
+ pgstat_report_backend_mem_allocated_increase(request_size);
+#endif
+ }
*mapped_address = address;
*mapped_size = request_size;
close(fd);
@@ -537,6 +575,14 @@ dsm_impl_sysv(dsm_op op, dsm_handle handle, Size request_size,
name)));
return false;
}
+
+ /*
+ * Detach and destroy pass through here, only decrease the memory
+ * shown allocated in pg_stat_activity when the creator destroys the
+ * allocation.
+ */
+ if (op == DSM_OP_DESTROY)
+ pgstat_report_backend_mem_allocated_decrease(*mapped_size);
*mapped_address = NULL;
*mapped_size = 0;
if (op == DSM_OP_DESTROY && shmctl(ident, IPC_RMID, NULL) < 0)
@@ -584,6 +630,13 @@ dsm_impl_sysv(dsm_op op, dsm_handle handle, Size request_size,
name)));
return false;
}
+
+ /*
+ * Attach and create pass through here, only update backend memory
+ * allocated in pg_stat_activity for the creator process.
+ */
+ if (op == DSM_OP_CREATE)
+ pgstat_report_backend_mem_allocated_increase(request_size);
*mapped_address = address;
*mapped_size = request_size;
@@ -652,6 +705,13 @@ dsm_impl_windows(dsm_op op, dsm_handle handle, Size request_size,
return false;
}
+ /*
+ * Detach and destroy pass through here, only decrease the memory
+ * shown allocated in pg_stat_activity when the creator destroys the
+ * allocation.
+ */
+ if (op == DSM_OP_DESTROY)
+ pgstat_report_backend_mem_allocated_decrease(*mapped_size);
*impl_private = NULL;
*mapped_address = NULL;
*mapped_size = 0;
@@ -768,6 +828,12 @@ dsm_impl_windows(dsm_op op, dsm_handle handle, Size request_size,
return false;
}
+ /*
+ * Attach and create pass through here, only update backend memory
+ * allocated in pg_stat_activity for the creator process.
+ */
+ if (op == DSM_OP_CREATE)
+ pgstat_report_backend_mem_allocated_increase(info.RegionSize);
*mapped_address = address;
*mapped_size = info.RegionSize;
*impl_private = hmap;
@@ -812,6 +878,13 @@ dsm_impl_mmap(dsm_op op, dsm_handle handle, Size request_size,
name)));
return false;
}
+
+ /*
+ * Detach and destroy pass through here, only decrease the memory
+ * shown allocated in pg_stat_activity when the creator destroys the
+ * allocation.
+ */
+ pgstat_report_backend_mem_allocated_decrease(*mapped_size);
*mapped_address = NULL;
*mapped_size = 0;
if (op == DSM_OP_DESTROY && unlink(name) != 0)
@@ -933,6 +1006,13 @@ dsm_impl_mmap(dsm_op op, dsm_handle handle, Size request_size,
name)));
return false;
}
+
+ /*
+ * Attach and create pass through here, only update backend memory
+ * allocated in pg_stat_activity for the creator process.
+ */
+ if (op == DSM_OP_CREATE)
+ pgstat_report_backend_mem_allocated_increase(request_size);
*mapped_address = address;
*mapped_size = request_size;
diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c
index c7ed1e6d7a..45da3af213 100644
--- a/src/backend/utils/activity/backend_status.c
+++ b/src/backend/utils/activity/backend_status.c
@@ -49,6 +49,8 @@ int pgstat_track_activity_query_size = 1024;
/* exposed so that backend_progress.c can access it */
PgBackendStatus *MyBEEntry = NULL;
+/* Memory allocated to this backend prior to pgstats initialization */
+uint64 backend_mem_allocated = 0;
static PgBackendStatus *BackendStatusArray = NULL;
static char *BackendAppnameBuffer = NULL;
@@ -400,6 +402,13 @@ pgstat_bestart(void)
lbeentry.st_progress_command_target = InvalidOid;
lbeentry.st_query_id = UINT64CONST(0);
+ /*
+ * Move sum of memory allocated prior to pgstats initialization to pgstats
+ * and zero the local variable.
+ */
+ lbeentry.backend_mem_allocated = backend_mem_allocated;
+ backend_mem_allocated = 0;
+
/*
* we don't zero st_progress_param here to save cycles; nobody should
* examine it until st_progress_command has been set to something other
@@ -1148,3 +1157,99 @@ pgstat_clip_activity(const char *raw_activity)
return activity;
}
+
+/* --------
+ * pgstat_report_backend_mem_allocated_increase() -
+ *
+ * Called to report increase in memory allocated for this backend
+ * --------
+ */
+void
+pgstat_report_backend_mem_allocated_increase(uint64 allocation)
+{
+ volatile PgBackendStatus *beentry = MyBEEntry;
+
+ if (!beentry || !pgstat_track_activities)
+ {
+ /*
+ * Account for memory before pgstats is initialized. This will be
+ * migrated to pgstats on initialization.
+ */
+ backend_mem_allocated += allocation;
+
+ return;
+ }
+
+ /*
+ * Update my status entry, following the protocol of bumping
+ * st_changecount before and after. We use a volatile pointer here to
+ * ensure the compiler doesn't try to get cute.
+ */
+ PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
+ beentry->backend_mem_allocated += allocation;
+ PGSTAT_END_WRITE_ACTIVITY(beentry);
+}
+
+/* --------
+ * pgstat_report_backend_mem_allocated_decrease() -
+ *
+ * Called to report decrease in memory allocated for this backend
+ * --------
+ */
+void
+pgstat_report_backend_mem_allocated_decrease(uint64 deallocation)
+{
+ volatile PgBackendStatus *beentry = MyBEEntry;
+
+ /*
+ * Cases may occur where shared memory from a previous postmaster
+ * invocation still exist. These are cleaned up at startup by
+ * dsm_cleanup_using_control_segment. Limit decreasing memory allocated to
+ * zero in case no corresponding prior increase exists or decrease has
+ * already been accounted for.
+ */
+
+ if (!beentry || !pgstat_track_activities)
+ {
+ /*
+ * Account for memory before pgstats is initialized. This will be
+ * migrated to pgstats on initialization. Do not allow
+ * backend_mem_allocated to go below zero. If pgstats has not been
+ * initialized, we are in startup and we set backend_mem_allocated to
+ * zero in cases where it would go negative and skip generating an
+ * ereport.
+ */
+ if (deallocation > backend_mem_allocated)
+ backend_mem_allocated = 0;
+ else
+ backend_mem_allocated -= deallocation;
+
+ return;
+ }
+
+ /*
+ * Do not allow backend_mem_allocated to go below zero. ereport if we
+ * would have. There's no need for a lock around the read here as it's
+ * being referenced from the same backend which means that there shouldn't
+ * be concurrent writes. We want to generate an ereport in these cases.
+ */
+ if (deallocation > beentry->backend_mem_allocated)
+ {
+ ereport(LOG, errmsg("decrease reduces reported backend memory allocated below zero; setting reported to 0"));
+
+ /*
+ * Overwrite deallocation with current backend_mem_allocated so we end
+ * up at zero.
+ */
+ deallocation = beentry->backend_mem_allocated;
+ }
+
+ /*
+ * Update my status entry, following the protocol of bumping
+ * st_changecount before and after. We use a volatile pointer here to
+ * ensure the compiler doesn't try to get cute.
+ */
+ PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
+ beentry->backend_mem_allocated -= deallocation;
+ PGSTAT_END_WRITE_ACTIVITY(beentry);
+}
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 4cca30aae7..1574aa8049 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -536,7 +536,7 @@ pg_stat_get_progress_info(PG_FUNCTION_ARGS)
Datum
pg_stat_get_activity(PG_FUNCTION_ARGS)
{
-#define PG_STAT_GET_ACTIVITY_COLS 30
+#define PG_STAT_GET_ACTIVITY_COLS 31
int num_backends = pgstat_fetch_stat_numbackends();
int curr_backend;
int pid = PG_ARGISNULL(0) ? -1 : PG_GETARG_INT32(0);
@@ -610,6 +610,8 @@ pg_stat_get_activity(PG_FUNCTION_ARGS)
else
nulls[16] = true;
+ values[30] = UInt64GetDatum(beentry->backend_mem_allocated);
+
/* Values only available to role member or pg_read_all_stats */
if (HAS_PGSTAT_PERMISSIONS(beentry->st_userid))
{
diff --git a/src/backend/utils/mmgr/aset.c b/src/backend/utils/mmgr/aset.c
index b6eeb8abab..c91f8efa4d 100644
--- a/src/backend/utils/mmgr/aset.c
+++ b/src/backend/utils/mmgr/aset.c
@@ -47,6 +47,7 @@
#include "postgres.h"
#include "port/pg_bitutils.h"
+#include "utils/backend_status.h"
#include "utils/memdebug.h"
#include "utils/memutils.h"
#include "utils/memutils_memorychunk.h"
@@ -509,6 +510,7 @@ AllocSetContextCreateInternal(MemoryContext parent,
name);
((MemoryContext) set)->mem_allocated = firstBlockSize;
+ pgstat_report_backend_mem_allocated_increase(firstBlockSize);
return (MemoryContext) set;
}
@@ -532,6 +534,7 @@ AllocSetReset(MemoryContext context)
AllocBlock block;
Size keepersize PG_USED_FOR_ASSERTS_ONLY
= set->keeper->endptr - ((char *) set);
+ uint64 deallocation = 0;
AssertArg(AllocSetIsValid(set));
@@ -571,6 +574,7 @@ AllocSetReset(MemoryContext context)
{
/* Normal case, release the block */
context->mem_allocated -= block->endptr - ((char *) block);
+ deallocation += block->endptr - ((char *) block);
#ifdef CLOBBER_FREED_MEMORY
wipe_mem(block, block->freeptr - ((char *) block));
@@ -581,6 +585,7 @@ AllocSetReset(MemoryContext context)
}
Assert(context->mem_allocated == keepersize);
+ pgstat_report_backend_mem_allocated_decrease(deallocation);
/* Reset block size allocation sequence, too */
set->nextBlockSize = set->initBlockSize;
@@ -600,6 +605,7 @@ AllocSetDelete(MemoryContext context)
AllocBlock block = set->blocks;
Size keepersize PG_USED_FOR_ASSERTS_ONLY
= set->keeper->endptr - ((char *) set);
+ uint64 deallocation = 0;
AssertArg(AllocSetIsValid(set));
@@ -635,11 +641,13 @@ AllocSetDelete(MemoryContext context)
freelist->first_free = (AllocSetContext *) oldset->header.nextchild;
freelist->num_free--;
+ deallocation += oldset->header.mem_allocated;
/* All that remains is to free the header/initial block */
free(oldset);
}
Assert(freelist->num_free == 0);
+ pgstat_report_backend_mem_allocated_decrease(deallocation);
}
/* Now add the just-deleted context to the freelist. */
@@ -656,7 +664,10 @@ AllocSetDelete(MemoryContext context)
AllocBlock next = block->next;
if (block != set->keeper)
+ {
context->mem_allocated -= block->endptr - ((char *) block);
+ deallocation += block->endptr - ((char *) block);
+ }
#ifdef CLOBBER_FREED_MEMORY
wipe_mem(block, block->freeptr - ((char *) block));
@@ -669,6 +680,8 @@ AllocSetDelete(MemoryContext context)
}
Assert(context->mem_allocated == keepersize);
+ pgstat_report_backend_mem_allocated_decrease(deallocation +
+ context->mem_allocated);
/* Finally, free the context header, including the keeper block */
free(set);
@@ -712,6 +725,7 @@ AllocSetAlloc(MemoryContext context, Size size)
return NULL;
context->mem_allocated += blksize;
+ pgstat_report_backend_mem_allocated_increase(blksize);
block->aset = set;
block->freeptr = block->endptr = ((char *) block) + blksize;
@@ -916,6 +930,7 @@ AllocSetAlloc(MemoryContext context, Size size)
return NULL;
context->mem_allocated += blksize;
+ pgstat_report_backend_mem_allocated_increase(blksize);
block->aset = set;
block->freeptr = ((char *) block) + ALLOC_BLOCKHDRSZ;
@@ -1016,6 +1031,7 @@ AllocSetFree(void *pointer)
block->next->prev = block->prev;
set->header.mem_allocated -= block->endptr - ((char *) block);
+ pgstat_report_backend_mem_allocated_decrease(block->endptr - ((char *) block));
#ifdef CLOBBER_FREED_MEMORY
wipe_mem(block, block->freeptr - ((char *) block));
@@ -1127,7 +1143,9 @@ AllocSetRealloc(void *pointer, Size size)
/* updated separately, not to underflow when (oldblksize > blksize) */
set->header.mem_allocated -= oldblksize;
+ pgstat_report_backend_mem_allocated_decrease(oldblksize);
set->header.mem_allocated += blksize;
+ pgstat_report_backend_mem_allocated_increase(blksize);
block->freeptr = block->endptr = ((char *) block) + blksize;
diff --git a/src/backend/utils/mmgr/generation.c b/src/backend/utils/mmgr/generation.c
index b39894ec94..36e5b3f94d 100644
--- a/src/backend/utils/mmgr/generation.c
+++ b/src/backend/utils/mmgr/generation.c
@@ -37,6 +37,7 @@
#include "lib/ilist.h"
#include "port/pg_bitutils.h"
+#include "utils/backend_status.h"
#include "utils/memdebug.h"
#include "utils/memutils.h"
#include "utils/memutils_memorychunk.h"
@@ -258,6 +259,7 @@ GenerationContextCreate(MemoryContext parent,
name);
((MemoryContext) set)->mem_allocated = firstBlockSize;
+ pgstat_report_backend_mem_allocated_increase(firstBlockSize);
return (MemoryContext) set;
}
@@ -274,6 +276,7 @@ GenerationReset(MemoryContext context)
{
GenerationContext *set = (GenerationContext *) context;
dlist_mutable_iter miter;
+ uint64 deallocation = 0;
AssertArg(GenerationIsValid(set));
@@ -296,9 +299,14 @@ GenerationReset(MemoryContext context)
if (block == set->keeper)
GenerationBlockMarkEmpty(block);
else
+ {
+ deallocation += block->blksize;
GenerationBlockFree(set, block);
+ }
}
+ pgstat_report_backend_mem_allocated_decrease(deallocation);
+
/* set it so new allocations to make use of the keeper block */
set->block = set->keeper;
@@ -319,6 +327,9 @@ GenerationDelete(MemoryContext context)
{
/* Reset to release all releasable GenerationBlocks */
GenerationReset(context);
+
+ pgstat_report_backend_mem_allocated_decrease(context->mem_allocated);
+
/* And free the context header and keeper block */
free(context);
}
@@ -355,6 +366,7 @@ GenerationAlloc(MemoryContext context, Size size)
return NULL;
context->mem_allocated += blksize;
+ pgstat_report_backend_mem_allocated_increase(blksize);
/* block with a single (used) chunk */
block->context = set;
@@ -458,6 +470,7 @@ GenerationAlloc(MemoryContext context, Size size)
return NULL;
context->mem_allocated += blksize;
+ pgstat_report_backend_mem_allocated_increase(blksize);
/* initialize the new block */
GenerationBlockInit(set, block, blksize);
@@ -691,6 +704,8 @@ GenerationFree(void *pointer)
dlist_delete(&block->node);
set->header.mem_allocated -= block->blksize;
+ pgstat_report_backend_mem_allocated_decrease(block->blksize);
+
free(block);
}
diff --git a/src/backend/utils/mmgr/slab.c b/src/backend/utils/mmgr/slab.c
index 2d70adef09..e0e69b394e 100644
--- a/src/backend/utils/mmgr/slab.c
+++ b/src/backend/utils/mmgr/slab.c
@@ -53,6 +53,7 @@
#include "postgres.h"
#include "lib/ilist.h"
+#include "utils/backend_status.h"
#include "utils/memdebug.h"
#include "utils/memutils.h"
#include "utils/memutils_memorychunk.h"
@@ -218,6 +219,12 @@ SlabContextCreate(MemoryContext parent,
parent,
name);
+ /*
+ * If SlabContextCreate is updated to add headerSize to
+ * context->mem_allocated, then update here and SlabDelete appropriately
+ */
+ pgstat_report_backend_mem_allocated_increase(headerSize);
+
return (MemoryContext) slab;
}
@@ -233,6 +240,7 @@ SlabReset(MemoryContext context)
{
int i;
SlabContext *slab = castNode(SlabContext, context);
+ uint64 deallocation = 0;
Assert(slab);
@@ -258,9 +266,11 @@ SlabReset(MemoryContext context)
free(block);
slab->nblocks--;
context->mem_allocated -= slab->blockSize;
+ deallocation += slab->blockSize;
}
}
+ pgstat_report_backend_mem_allocated_decrease(deallocation);
slab->minFreeChunks = 0;
Assert(slab->nblocks == 0);
@@ -274,8 +284,17 @@ SlabReset(MemoryContext context)
void
SlabDelete(MemoryContext context)
{
+ /*
+ * Until header allocation is included in context->mem_allocated, cast to
+ * slab and decrement the headerSize
+ */
+ SlabContext *slab = castNode(SlabContext, context);
+
/* Reset to release all the SlabBlocks */
SlabReset(context);
+
+ pgstat_report_backend_mem_allocated_decrease(slab->headerSize);
+
/* And free the context header */
free(context);
}
@@ -344,6 +363,7 @@ SlabAlloc(MemoryContext context, Size size)
slab->minFreeChunks = slab->chunksPerBlock;
slab->nblocks += 1;
context->mem_allocated += slab->blockSize;
+ pgstat_report_backend_mem_allocated_increase(slab->blockSize);
}
/* grab the block from the freelist (even the new block is there) */
@@ -511,6 +531,7 @@ SlabFree(void *pointer)
free(block);
slab->nblocks--;
slab->header.mem_allocated -= slab->blockSize;
+ pgstat_report_backend_mem_allocated_decrease(slab->blockSize);
}
else
dlist_push_head(&slab->freelist[block->nfree], &block->node);
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index be47583122..e1bfb85b25 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5340,9 +5340,9 @@
proname => 'pg_stat_get_activity', prorows => '100', proisstrict => 'f',
proretset => 't', provolatile => 's', proparallel => 'r',
prorettype => 'record', proargtypes => 'int4',
- proallargtypes => '{int4,oid,int4,oid,text,text,text,text,text,timestamptz,timestamptz,timestamptz,timestamptz,inet,text,int4,xid,xid,text,bool,text,text,int4,text,numeric,text,bool,text,bool,int4,int8}',
- proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
- proargnames => '{pid,datid,pid,usesysid,application_name,state,query,wait_event_type,wait_event,xact_start,query_start,backend_start,state_change,client_addr,client_hostname,client_port,backend_xid,backend_xmin,backend_type,ssl,sslversion,sslcipher,sslbits,ssl_client_dn,ssl_client_serial,ssl_issuer_dn,gss_auth,gss_princ,gss_enc,leader_pid,query_id}',
+ proallargtypes => '{int4,oid,int4,oid,text,text,text,text,text,timestamptz,timestamptz,timestamptz,timestamptz,inet,text,int4,xid,xid,text,bool,text,text,int4,text,numeric,text,bool,text,bool,int4,int8,int8}',
+ proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{pid,datid,pid,usesysid,application_name,state,query,wait_event_type,wait_event,xact_start,query_start,backend_start,state_change,client_addr,client_hostname,client_port,backend_xid,backend_xmin,backend_type,ssl,sslversion,sslcipher,sslbits,ssl_client_dn,ssl_client_serial,ssl_issuer_dn,gss_auth,gss_princ,gss_enc,leader_pid,query_id,backend_mem_allocated}',
prosrc => 'pg_stat_get_activity' },
{ oid => '3318',
descr => 'statistics: information about progress of backends running maintenance command',
diff --git a/src/include/utils/backend_status.h b/src/include/utils/backend_status.h
index 7403bca25e..9bdc4197bd 100644
--- a/src/include/utils/backend_status.h
+++ b/src/include/utils/backend_status.h
@@ -168,6 +168,9 @@ typedef struct PgBackendStatus
/* query identifier, optionally computed using post_parse_analyze_hook */
uint64 st_query_id;
+
+ /* Current memory allocated to this backend */
+ uint64 backend_mem_allocated;
} PgBackendStatus;
@@ -305,7 +308,9 @@ extern const char *pgstat_get_backend_current_activity(int pid, bool checkUser);
extern const char *pgstat_get_crashed_backend_activity(int pid, char *buffer,
int buflen);
extern uint64 pgstat_get_my_query_id(void);
-
+extern void pgstat_report_backend_mem_allocated_increase(uint64 allocation);
+extern void pgstat_report_backend_mem_allocated_decrease(uint64 deallocation);
+extern uint64 pgstat_get_all_backend_memory_allocated(void);
/* ----------
* Support functions for the SQL-callable functions to
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 7ec3d2688f..674e5c6fe7 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1757,9 +1757,10 @@ pg_stat_activity| SELECT s.datid,
s.backend_xid,
s.backend_xmin,
s.query_id,
+ s.backend_mem_allocated,
s.query,
s.backend_type
- FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, query_id)
+ FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, query_id, backend_mem_allocated)
LEFT JOIN pg_database d ON ((s.datid = d.oid)))
LEFT JOIN pg_authid u ON ((s.usesysid = u.oid)));
pg_stat_all_indexes| SELECT c.oid AS relid,
@@ -1871,7 +1872,7 @@ pg_stat_gssapi| SELECT s.pid,
s.gss_auth AS gss_authenticated,
s.gss_princ AS principal,
s.gss_enc AS encrypted
- FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, query_id)
+ FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, query_id, backend_mem_allocated)
WHERE (s.client_port IS NOT NULL);
pg_stat_progress_analyze| SELECT s.pid,
s.datid,
@@ -2052,7 +2053,7 @@ pg_stat_replication| SELECT s.pid,
w.sync_priority,
w.sync_state,
w.reply_time
- FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, query_id)
+ FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, query_id, backend_mem_allocated)
JOIN pg_stat_get_wal_senders() w(pid, state, sent_lsn, write_lsn, flush_lsn, replay_lsn, write_lag, flush_lag, replay_lag, sync_priority, sync_state, reply_time) ON ((s.pid = w.pid)))
LEFT JOIN pg_authid u ON ((s.usesysid = u.oid)));
pg_stat_replication_slots| SELECT s.slot_name,
@@ -2086,7 +2087,7 @@ pg_stat_ssl| SELECT s.pid,
s.ssl_client_dn AS client_dn,
s.ssl_client_serial AS client_serial,
s.ssl_issuer_dn AS issuer_dn
- FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, query_id)
+ FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, query_id, backend_mem_allocated)
WHERE (s.client_port IS NOT NULL);
pg_stat_subscription| SELECT su.oid AS subid,
su.subname,
--
2.25.1
On Thu, 2022-09-01 at 11:48 +0900, Kyotaro Horiguchi wrote:
The patch seems to limit both of memory-context allocations and DSM
allocations happen on a specific process by the same budget. In the
fist place I don't think it's sensible to cap the amount of DSM
allocations by per-process budget.DSM is used by pgstats subsystem. There can be cases where pgstat
complains for denial of DSM allocation after the budget has been
exhausted by memory-context allocations, or every command complains
for denial of memory-context allocation after once the per-process
budget is exhausted by DSM allocations. That doesn't seem
reasonable.
regards.
It's intended as a mechanism for administrators to limit total
postgresql memory consumption to avoid the OOM killer causing a crash
and restart, or to ensure that resources are available for other
processes on shared hosts, etc. It limits all types of allocations in
order to accomplish this. Our documentation will note this, so that
administrators that have the need to set it are aware that it can
affect all non-auxiliary processes and what the effect is.
On Fri, 2022-09-02 at 09:30 +0200, Drouvot, Bertrand wrote:
Hi,
I'm not sure we are choosing the right victims here (aka the ones
that are doing the request that will push the total over the limit).Imagine an extreme case where a single backend consumes say 99% of
the limit, shouldn't it be the one to be "punished"? (and somehow forced
to give the memory back).The problem that i see with the current approach is that a "bad"
backend could impact all the others and continue to do so.what about punishing say the highest consumer , what do you think?
(just speaking about the general idea here, not about the implementation)
Initially, we believe that punishing the detector is reasonable if we
can help administrators avoid the OOM killer/resource starvation. But
we can and should expand on this idea.
Another thought is, rather than just failing the query/transaction we
have the affected backend do a clean exit, freeing all it's resources.
--
Reid Thompson
Senior Software Engineer
Crunchy Data, Inc.
reid.thompson@crunchydata.com
www.crunchydata.com
Greetings,
* David Rowley (dgrowleyml@gmail.com) wrote:
On Thu, 1 Sept 2022 at 04:52, Reid Thompson
<reid.thompson@crunchydata.com> wrote:Add the ability to limit the amount of memory that can be allocated to
backends.Are you aware that relcache entries are stored in backend local memory
and that once we've added a relcache entry for a relation that we have
no current code which attempts to reduce the memory consumption used
by cache entries when there's memory pressure?
Short answer to this is yes, and that's an issue, but it isn't this
patch's problem to deal with- that's an issue that the relcache system
needs to be changed to address.
It seems to me that if we had this feature as you propose that a
backend could hit the limit and stay there just from the memory
requirements of the relation cache after some number of tables have
been accessed from the given backend. It's not hard to imagine a
situation where the palloc() would start to fail during parse, which
might make it quite infuriating for anyone trying to do something
like:
Agreed that this could happen but I don't imagine it to be super likely-
and even if it does, this is probably a better position to be in as the
backend could then be disconnected from and would then go away and its
memory free'd, unlike the current OOM-killer situation where we crash
and go through recovery. We should note this in the documentation
though, sure, so that administrators understand how this can occur and
can take action to address it.
I think a better solution to this problem would be to have "memory
grants", where we configure some amount of "pool" memory that backends
are allowed to use for queries. The planner would have to add the
expected number of work_mem that the given query is expected to use
and before that query starts, the executor would have to "checkout"
that amount of memory from the pool and return it when finished. If
there is not enough memory in the pool then the query would have to
wait until enough memory is available. This creates a deadlocking
hazard that the deadlock detector would need to be made aware of.
Sure, that also sounds great and a query acceptance system would be
wonderful. If someone is working on that with an expectation of it
landing before v16, great. Otherwise, I don't see it as relevant to
the question about if we should include this feature or not, and I'm not
even sure that we'd refuse this feature even if we already had an
acceptance system as a stop-gap should we guess wrong and not realize it
until it's too late.
Thanks,
Stephen
On Sat, Sep 03, 2022 at 11:40:03PM -0400, Reid Thompson wrote:
+�������������� 0, 0, INT_MAX, +�������������� NULL, NULL, NULLI think this needs a maximum like INT_MAX/1024/1024
Is this noting that we'd set a ceiling of 2048MB?
The reason is that you're later multiplying it by 1024*1024, so you need
to limit it to avoid overflowing. Compare with
min_dynamic_shared_memory, Log_RotationSize, maintenance_work_mem,
autovacuum_work_mem.
typo: Explicitely
+ errmsg("request will exceed postgresql.conf defined max_total_backend_memory limit (%lu > %lu)",
I wouldn't mention postgresql.conf - it could be in
postgresql.auto.conf, or an include file, or a -c parameter.
Suggest: allocation would exceed max_total_backend_memory limit...
+ ereport(LOG, errmsg("decrease reduces reported backend memory allocated below zero; setting reported to 0"));
Suggest: deallocation would decrease backend memory below zero;
+ {"max_total_backend_memory", PGC_SIGHUP, RESOURCES_MEM,
Should this be PGC_SU_BACKEND to allow a superuser to set a higher
limit (or no limit)?
There's compilation warning under mingw cross compile due to
sizeof(long). See d914eb347 and other recent commits which I guess is
the current way to handle this.
http://cfbot.cputube.org/reid-thompson.html
For performance test, you'd want to check what happens with a large
number of max_connections (and maybe a large number of clients). TPS
isn't the only thing that matters. For example, a utility command might
sometimes do a lot of allocations (or deallocations), or a
"parameterized nested loop" may loop over over many outer tuples and
reset for each. There's also a lot of places that reset to a
"per-tuple" context. I started looking at its performance, but nothing
to show yet.
Would you keep people copied on your replies ("reply all") ? Otherwise
I (at least) may miss them. I think that's what's typical on these
lists (and the list tool is smart enough not to send duplicates to
people who are direct recipients).
--
Justin
On Fri, 2022-09-09 at 12:14 -0500, Justin Pryzby wrote:
On Sat, Sep 03, 2022 at 11:40:03PM -0400, Reid Thompson wrote:
+ 0, 0, INT_MAX,
+ NULL, NULL, NULLI think this needs a maximum like INT_MAX/1024/1024
Is this noting that we'd set a ceiling of 2048MB?
The reason is that you're later multiplying it by 1024*1024, so you
need
to limit it to avoid overflowing. Compare with
min_dynamic_shared_memory, Log_RotationSize, maintenance_work_mem,
autovacuum_work_mem.
What I originally attempted to implement is:
GUC "max_total_backend_memory" max value as INT_MAX = 2147483647 MB
(2251799812636672 bytes). And the other variables and comparisons as
bytes represented as uint64 to avoid overflow.
Is this invalid?
typo: Explicitely
corrected
+ errmsg("request will exceed postgresql.conf
defined max_total_backend_memory limit (%lu > %lu)",I wouldn't mention postgresql.conf - it could be in
postgresql.auto.conf, or an include file, or a -c parameter.
Suggest: allocation would exceed max_total_backend_memory limit...
updated
+ ereport(LOG, errmsg("decrease reduces reported
backend memory allocated below zero; setting reported to 0"));Suggest: deallocation would decrease backend memory below zero;
updated
+ {"max_total_backend_memory", PGC_SIGHUP,
RESOURCES_MEM,
Should this be PGC_SU_BACKEND to allow a superuser to set a higher
limit (or no limit)?
Sounds good to me. I'll update to that.
Would PGC_SUSET be too open?
There's compilation warning under mingw cross compile due to
sizeof(long). See d914eb347 and other recent commits which I guess
is
the current way to handle this.
http://cfbot.cputube.org/reid-thompson.html
updated %lu to %llu and changed cast from uint64 to
unsigned long long in the ereport call
For performance test, you'd want to check what happens with a large
number of max_connections (and maybe a large number of clients). TPS
isn't the only thing that matters. For example, a utility command
might
sometimes do a lot of allocations (or deallocations), or a
"parameterized nested loop" may loop over over many outer tuples and
reset for each. There's also a lot of places that reset to a
"per-tuple" context. I started looking at its performance, but
nothing
to show yet.
Thanks
Would you keep people copied on your replies ("reply all") ?
Otherwise
I (at least) may miss them. I think that's what's typical on these
lists (and the list tool is smart enough not to send duplicates to
people who are direct recipients).
Ok - will do, thanks.
--
Reid Thompson
Senior Software Engineer
Crunchy Data, Inc.
reid.thompson@crunchydata.com
www.crunchydata.com
Attachments:
0001-Add-tracking-of-backend-memory-allocated-to-pg_stat_.patchtext/x-patch; charset=UTF-8; name=0001-Add-tracking-of-backend-memory-allocated-to-pg_stat_.patchDownload
From 584a04f1b53948049e73165a4ffdd544c950ab0d Mon Sep 17 00:00:00 2001
From: Reid Thompson <jreidthompson@nc.rr.com>
Date: Thu, 11 Aug 2022 12:01:25 -0400
Subject: [PATCH 1/2] Add tracking of backend memory allocated to
pg_stat_activity
This new field displays the current bytes of memory allocated to the
backend process. It is updated as memory for the process is
malloc'd/free'd. Memory allocated to items on the freelist is included in
the displayed value. Dynamic shared memory allocations are included
only in the value displayed for the backend that created them, they are
not included in the value for backends that are attached to them to
avoid double counting. On occasion, orphaned memory segments may be
cleaned up on postmaster startup. This may result in decreasing the sum
without a prior increment. We limit the floor of backend_mem_allocated
to zero. Updated pg_stat_activity documentation for the new column.
---
doc/src/sgml/monitoring.sgml | 12 +++
src/backend/catalog/system_views.sql | 1 +
src/backend/storage/ipc/dsm_impl.c | 80 +++++++++++++++
src/backend/utils/activity/backend_status.c | 105 ++++++++++++++++++++
src/backend/utils/adt/pgstatfuncs.c | 4 +-
src/backend/utils/mmgr/aset.c | 18 ++++
src/backend/utils/mmgr/generation.c | 15 +++
src/backend/utils/mmgr/slab.c | 21 ++++
src/include/catalog/pg_proc.dat | 6 +-
src/include/utils/backend_status.h | 7 +-
src/test/regress/expected/rules.out | 9 +-
11 files changed, 269 insertions(+), 9 deletions(-)
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 1d9509a2f6..40ae638f25 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -947,6 +947,18 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser
</para></entry>
</row>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>backend_mem_allocated</structfield> <type>bigint</type>
+ </para>
+ <para>
+ The byte count of memory allocated to this backend. Dynamic shared memory
+ allocations are included only in the value displayed for the backend that
+ created them, they are not included in the value for backends that are
+ attached to them to avoid double counting.
+ </para></entry>
+ </row>
+
<row>
<entry role="catalog_table_entry"><para role="column_definition">
<structfield>query</structfield> <type>text</type>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 5a844b63a1..d23f0e9dbb 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -863,6 +863,7 @@ CREATE VIEW pg_stat_activity AS
S.backend_xid,
s.backend_xmin,
S.query_id,
+ S.backend_mem_allocated,
S.query,
S.backend_type
FROM pg_stat_get_activity(NULL) AS S
diff --git a/src/backend/storage/ipc/dsm_impl.c b/src/backend/storage/ipc/dsm_impl.c
index e1b90c5de4..3356bb65b5 100644
--- a/src/backend/storage/ipc/dsm_impl.c
+++ b/src/backend/storage/ipc/dsm_impl.c
@@ -66,6 +66,7 @@
#include "postmaster/postmaster.h"
#include "storage/dsm_impl.h"
#include "storage/fd.h"
+#include "utils/backend_status.h"
#include "utils/guc.h"
#include "utils/memutils.h"
@@ -232,6 +233,13 @@ dsm_impl_posix(dsm_op op, dsm_handle handle, Size request_size,
name)));
return false;
}
+ /*
+ * Detach and destroy pass through here, only decrease the memory
+ * shown allocated in pg_stat_activity when the creator destroys the
+ * allocation.
+ */
+ if (op == DSM_OP_DESTROY)
+ pgstat_report_backend_mem_allocated_decrease(*mapped_size);
*mapped_address = NULL;
*mapped_size = 0;
if (op == DSM_OP_DESTROY && shm_unlink(name) != 0)
@@ -332,6 +340,36 @@ dsm_impl_posix(dsm_op op, dsm_handle handle, Size request_size,
name)));
return false;
}
+
+ /*
+ * Attach and create pass through here, only update backend memory
+ * allocated in pg_stat_activity for the creator process.
+ */
+ if (op == DSM_OP_CREATE)
+ {
+ /*
+ * Posix creation calls dsm_impl_posix_resize implying that resizing
+ * occurs or may be added in the future. As implemented
+ * dsm_impl_posix_resize utilizes fallocate or truncate, passing the
+ * whole new size as input, growing the allocation as needed (only
+ * truncate supports shrinking). We update by replacing the old
+ * allocation with the new.
+ */
+#if defined(HAVE_POSIX_FALLOCATE) && defined(__linux__)
+ /*
+ * posix_fallocate does not shrink allocations, adjust only on
+ * allocation increase.
+ */
+ if (request_size > *mapped_size)
+ {
+ pgstat_report_backend_mem_allocated_decrease(*mapped_size);
+ pgstat_report_backend_mem_allocated_increase(request_size);
+ }
+#else
+ pgstat_report_backend_mem_allocated_decrease(*mapped_size);
+ pgstat_report_backend_mem_allocated_increase(request_size);
+#endif
+ }
*mapped_address = address;
*mapped_size = request_size;
close(fd);
@@ -537,6 +575,14 @@ dsm_impl_sysv(dsm_op op, dsm_handle handle, Size request_size,
name)));
return false;
}
+
+ /*
+ * Detach and destroy pass through here, only decrease the memory
+ * shown allocated in pg_stat_activity when the creator destroys the
+ * allocation.
+ */
+ if (op == DSM_OP_DESTROY)
+ pgstat_report_backend_mem_allocated_decrease(*mapped_size);
*mapped_address = NULL;
*mapped_size = 0;
if (op == DSM_OP_DESTROY && shmctl(ident, IPC_RMID, NULL) < 0)
@@ -584,6 +630,13 @@ dsm_impl_sysv(dsm_op op, dsm_handle handle, Size request_size,
name)));
return false;
}
+
+ /*
+ * Attach and create pass through here, only update backend memory
+ * allocated in pg_stat_activity for the creator process.
+ */
+ if (op == DSM_OP_CREATE)
+ pgstat_report_backend_mem_allocated_increase(request_size);
*mapped_address = address;
*mapped_size = request_size;
@@ -652,6 +705,13 @@ dsm_impl_windows(dsm_op op, dsm_handle handle, Size request_size,
return false;
}
+ /*
+ * Detach and destroy pass through here, only decrease the memory
+ * shown allocated in pg_stat_activity when the creator destroys the
+ * allocation.
+ */
+ if (op == DSM_OP_DESTROY)
+ pgstat_report_backend_mem_allocated_decrease(*mapped_size);
*impl_private = NULL;
*mapped_address = NULL;
*mapped_size = 0;
@@ -768,6 +828,12 @@ dsm_impl_windows(dsm_op op, dsm_handle handle, Size request_size,
return false;
}
+ /*
+ * Attach and create pass through here, only update backend memory
+ * allocated in pg_stat_activity for the creator process.
+ */
+ if (op == DSM_OP_CREATE)
+ pgstat_report_backend_mem_allocated_increase(info.RegionSize);
*mapped_address = address;
*mapped_size = info.RegionSize;
*impl_private = hmap;
@@ -812,6 +878,13 @@ dsm_impl_mmap(dsm_op op, dsm_handle handle, Size request_size,
name)));
return false;
}
+
+ /*
+ * Detach and destroy pass through here, only decrease the memory
+ * shown allocated in pg_stat_activity when the creator destroys the
+ * allocation.
+ */
+ pgstat_report_backend_mem_allocated_decrease(*mapped_size);
*mapped_address = NULL;
*mapped_size = 0;
if (op == DSM_OP_DESTROY && unlink(name) != 0)
@@ -933,6 +1006,13 @@ dsm_impl_mmap(dsm_op op, dsm_handle handle, Size request_size,
name)));
return false;
}
+
+ /*
+ * Attach and create pass through here, only update backend memory
+ * allocated in pg_stat_activity for the creator process.
+ */
+ if (op == DSM_OP_CREATE)
+ pgstat_report_backend_mem_allocated_increase(request_size);
*mapped_address = address;
*mapped_size = request_size;
diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c
index c7ed1e6d7a..45da3af213 100644
--- a/src/backend/utils/activity/backend_status.c
+++ b/src/backend/utils/activity/backend_status.c
@@ -49,6 +49,8 @@ int pgstat_track_activity_query_size = 1024;
/* exposed so that backend_progress.c can access it */
PgBackendStatus *MyBEEntry = NULL;
+/* Memory allocated to this backend prior to pgstats initialization */
+uint64 backend_mem_allocated = 0;
static PgBackendStatus *BackendStatusArray = NULL;
static char *BackendAppnameBuffer = NULL;
@@ -400,6 +402,13 @@ pgstat_bestart(void)
lbeentry.st_progress_command_target = InvalidOid;
lbeentry.st_query_id = UINT64CONST(0);
+ /*
+ * Move sum of memory allocated prior to pgstats initialization to pgstats
+ * and zero the local variable.
+ */
+ lbeentry.backend_mem_allocated = backend_mem_allocated;
+ backend_mem_allocated = 0;
+
/*
* we don't zero st_progress_param here to save cycles; nobody should
* examine it until st_progress_command has been set to something other
@@ -1148,3 +1157,99 @@ pgstat_clip_activity(const char *raw_activity)
return activity;
}
+
+/* --------
+ * pgstat_report_backend_mem_allocated_increase() -
+ *
+ * Called to report increase in memory allocated for this backend
+ * --------
+ */
+void
+pgstat_report_backend_mem_allocated_increase(uint64 allocation)
+{
+ volatile PgBackendStatus *beentry = MyBEEntry;
+
+ if (!beentry || !pgstat_track_activities)
+ {
+ /*
+ * Account for memory before pgstats is initialized. This will be
+ * migrated to pgstats on initialization.
+ */
+ backend_mem_allocated += allocation;
+
+ return;
+ }
+
+ /*
+ * Update my status entry, following the protocol of bumping
+ * st_changecount before and after. We use a volatile pointer here to
+ * ensure the compiler doesn't try to get cute.
+ */
+ PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
+ beentry->backend_mem_allocated += allocation;
+ PGSTAT_END_WRITE_ACTIVITY(beentry);
+}
+
+/* --------
+ * pgstat_report_backend_mem_allocated_decrease() -
+ *
+ * Called to report decrease in memory allocated for this backend
+ * --------
+ */
+void
+pgstat_report_backend_mem_allocated_decrease(uint64 deallocation)
+{
+ volatile PgBackendStatus *beentry = MyBEEntry;
+
+ /*
+ * Cases may occur where shared memory from a previous postmaster
+ * invocation still exist. These are cleaned up at startup by
+ * dsm_cleanup_using_control_segment. Limit decreasing memory allocated to
+ * zero in case no corresponding prior increase exists or decrease has
+ * already been accounted for.
+ */
+
+ if (!beentry || !pgstat_track_activities)
+ {
+ /*
+ * Account for memory before pgstats is initialized. This will be
+ * migrated to pgstats on initialization. Do not allow
+ * backend_mem_allocated to go below zero. If pgstats has not been
+ * initialized, we are in startup and we set backend_mem_allocated to
+ * zero in cases where it would go negative and skip generating an
+ * ereport.
+ */
+ if (deallocation > backend_mem_allocated)
+ backend_mem_allocated = 0;
+ else
+ backend_mem_allocated -= deallocation;
+
+ return;
+ }
+
+ /*
+ * Do not allow backend_mem_allocated to go below zero. ereport if we
+ * would have. There's no need for a lock around the read here as it's
+ * being referenced from the same backend which means that there shouldn't
+ * be concurrent writes. We want to generate an ereport in these cases.
+ */
+ if (deallocation > beentry->backend_mem_allocated)
+ {
+ ereport(LOG, errmsg("decrease reduces reported backend memory allocated below zero; setting reported to 0"));
+
+ /*
+ * Overwrite deallocation with current backend_mem_allocated so we end
+ * up at zero.
+ */
+ deallocation = beentry->backend_mem_allocated;
+ }
+
+ /*
+ * Update my status entry, following the protocol of bumping
+ * st_changecount before and after. We use a volatile pointer here to
+ * ensure the compiler doesn't try to get cute.
+ */
+ PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
+ beentry->backend_mem_allocated -= deallocation;
+ PGSTAT_END_WRITE_ACTIVITY(beentry);
+}
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 4cca30aae7..1574aa8049 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -536,7 +536,7 @@ pg_stat_get_progress_info(PG_FUNCTION_ARGS)
Datum
pg_stat_get_activity(PG_FUNCTION_ARGS)
{
-#define PG_STAT_GET_ACTIVITY_COLS 30
+#define PG_STAT_GET_ACTIVITY_COLS 31
int num_backends = pgstat_fetch_stat_numbackends();
int curr_backend;
int pid = PG_ARGISNULL(0) ? -1 : PG_GETARG_INT32(0);
@@ -610,6 +610,8 @@ pg_stat_get_activity(PG_FUNCTION_ARGS)
else
nulls[16] = true;
+ values[30] = UInt64GetDatum(beentry->backend_mem_allocated);
+
/* Values only available to role member or pg_read_all_stats */
if (HAS_PGSTAT_PERMISSIONS(beentry->st_userid))
{
diff --git a/src/backend/utils/mmgr/aset.c b/src/backend/utils/mmgr/aset.c
index b6eeb8abab..c91f8efa4d 100644
--- a/src/backend/utils/mmgr/aset.c
+++ b/src/backend/utils/mmgr/aset.c
@@ -47,6 +47,7 @@
#include "postgres.h"
#include "port/pg_bitutils.h"
+#include "utils/backend_status.h"
#include "utils/memdebug.h"
#include "utils/memutils.h"
#include "utils/memutils_memorychunk.h"
@@ -509,6 +510,7 @@ AllocSetContextCreateInternal(MemoryContext parent,
name);
((MemoryContext) set)->mem_allocated = firstBlockSize;
+ pgstat_report_backend_mem_allocated_increase(firstBlockSize);
return (MemoryContext) set;
}
@@ -532,6 +534,7 @@ AllocSetReset(MemoryContext context)
AllocBlock block;
Size keepersize PG_USED_FOR_ASSERTS_ONLY
= set->keeper->endptr - ((char *) set);
+ uint64 deallocation = 0;
AssertArg(AllocSetIsValid(set));
@@ -571,6 +574,7 @@ AllocSetReset(MemoryContext context)
{
/* Normal case, release the block */
context->mem_allocated -= block->endptr - ((char *) block);
+ deallocation += block->endptr - ((char *) block);
#ifdef CLOBBER_FREED_MEMORY
wipe_mem(block, block->freeptr - ((char *) block));
@@ -581,6 +585,7 @@ AllocSetReset(MemoryContext context)
}
Assert(context->mem_allocated == keepersize);
+ pgstat_report_backend_mem_allocated_decrease(deallocation);
/* Reset block size allocation sequence, too */
set->nextBlockSize = set->initBlockSize;
@@ -600,6 +605,7 @@ AllocSetDelete(MemoryContext context)
AllocBlock block = set->blocks;
Size keepersize PG_USED_FOR_ASSERTS_ONLY
= set->keeper->endptr - ((char *) set);
+ uint64 deallocation = 0;
AssertArg(AllocSetIsValid(set));
@@ -635,11 +641,13 @@ AllocSetDelete(MemoryContext context)
freelist->first_free = (AllocSetContext *) oldset->header.nextchild;
freelist->num_free--;
+ deallocation += oldset->header.mem_allocated;
/* All that remains is to free the header/initial block */
free(oldset);
}
Assert(freelist->num_free == 0);
+ pgstat_report_backend_mem_allocated_decrease(deallocation);
}
/* Now add the just-deleted context to the freelist. */
@@ -656,7 +664,10 @@ AllocSetDelete(MemoryContext context)
AllocBlock next = block->next;
if (block != set->keeper)
+ {
context->mem_allocated -= block->endptr - ((char *) block);
+ deallocation += block->endptr - ((char *) block);
+ }
#ifdef CLOBBER_FREED_MEMORY
wipe_mem(block, block->freeptr - ((char *) block));
@@ -669,6 +680,8 @@ AllocSetDelete(MemoryContext context)
}
Assert(context->mem_allocated == keepersize);
+ pgstat_report_backend_mem_allocated_decrease(deallocation +
+ context->mem_allocated);
/* Finally, free the context header, including the keeper block */
free(set);
@@ -712,6 +725,7 @@ AllocSetAlloc(MemoryContext context, Size size)
return NULL;
context->mem_allocated += blksize;
+ pgstat_report_backend_mem_allocated_increase(blksize);
block->aset = set;
block->freeptr = block->endptr = ((char *) block) + blksize;
@@ -916,6 +930,7 @@ AllocSetAlloc(MemoryContext context, Size size)
return NULL;
context->mem_allocated += blksize;
+ pgstat_report_backend_mem_allocated_increase(blksize);
block->aset = set;
block->freeptr = ((char *) block) + ALLOC_BLOCKHDRSZ;
@@ -1016,6 +1031,7 @@ AllocSetFree(void *pointer)
block->next->prev = block->prev;
set->header.mem_allocated -= block->endptr - ((char *) block);
+ pgstat_report_backend_mem_allocated_decrease(block->endptr - ((char *) block));
#ifdef CLOBBER_FREED_MEMORY
wipe_mem(block, block->freeptr - ((char *) block));
@@ -1127,7 +1143,9 @@ AllocSetRealloc(void *pointer, Size size)
/* updated separately, not to underflow when (oldblksize > blksize) */
set->header.mem_allocated -= oldblksize;
+ pgstat_report_backend_mem_allocated_decrease(oldblksize);
set->header.mem_allocated += blksize;
+ pgstat_report_backend_mem_allocated_increase(blksize);
block->freeptr = block->endptr = ((char *) block) + blksize;
diff --git a/src/backend/utils/mmgr/generation.c b/src/backend/utils/mmgr/generation.c
index b39894ec94..36e5b3f94d 100644
--- a/src/backend/utils/mmgr/generation.c
+++ b/src/backend/utils/mmgr/generation.c
@@ -37,6 +37,7 @@
#include "lib/ilist.h"
#include "port/pg_bitutils.h"
+#include "utils/backend_status.h"
#include "utils/memdebug.h"
#include "utils/memutils.h"
#include "utils/memutils_memorychunk.h"
@@ -258,6 +259,7 @@ GenerationContextCreate(MemoryContext parent,
name);
((MemoryContext) set)->mem_allocated = firstBlockSize;
+ pgstat_report_backend_mem_allocated_increase(firstBlockSize);
return (MemoryContext) set;
}
@@ -274,6 +276,7 @@ GenerationReset(MemoryContext context)
{
GenerationContext *set = (GenerationContext *) context;
dlist_mutable_iter miter;
+ uint64 deallocation = 0;
AssertArg(GenerationIsValid(set));
@@ -296,9 +299,14 @@ GenerationReset(MemoryContext context)
if (block == set->keeper)
GenerationBlockMarkEmpty(block);
else
+ {
+ deallocation += block->blksize;
GenerationBlockFree(set, block);
+ }
}
+ pgstat_report_backend_mem_allocated_decrease(deallocation);
+
/* set it so new allocations to make use of the keeper block */
set->block = set->keeper;
@@ -319,6 +327,9 @@ GenerationDelete(MemoryContext context)
{
/* Reset to release all releasable GenerationBlocks */
GenerationReset(context);
+
+ pgstat_report_backend_mem_allocated_decrease(context->mem_allocated);
+
/* And free the context header and keeper block */
free(context);
}
@@ -355,6 +366,7 @@ GenerationAlloc(MemoryContext context, Size size)
return NULL;
context->mem_allocated += blksize;
+ pgstat_report_backend_mem_allocated_increase(blksize);
/* block with a single (used) chunk */
block->context = set;
@@ -458,6 +470,7 @@ GenerationAlloc(MemoryContext context, Size size)
return NULL;
context->mem_allocated += blksize;
+ pgstat_report_backend_mem_allocated_increase(blksize);
/* initialize the new block */
GenerationBlockInit(set, block, blksize);
@@ -691,6 +704,8 @@ GenerationFree(void *pointer)
dlist_delete(&block->node);
set->header.mem_allocated -= block->blksize;
+ pgstat_report_backend_mem_allocated_decrease(block->blksize);
+
free(block);
}
diff --git a/src/backend/utils/mmgr/slab.c b/src/backend/utils/mmgr/slab.c
index 2d70adef09..e0e69b394e 100644
--- a/src/backend/utils/mmgr/slab.c
+++ b/src/backend/utils/mmgr/slab.c
@@ -53,6 +53,7 @@
#include "postgres.h"
#include "lib/ilist.h"
+#include "utils/backend_status.h"
#include "utils/memdebug.h"
#include "utils/memutils.h"
#include "utils/memutils_memorychunk.h"
@@ -218,6 +219,12 @@ SlabContextCreate(MemoryContext parent,
parent,
name);
+ /*
+ * If SlabContextCreate is updated to add headerSize to
+ * context->mem_allocated, then update here and SlabDelete appropriately
+ */
+ pgstat_report_backend_mem_allocated_increase(headerSize);
+
return (MemoryContext) slab;
}
@@ -233,6 +240,7 @@ SlabReset(MemoryContext context)
{
int i;
SlabContext *slab = castNode(SlabContext, context);
+ uint64 deallocation = 0;
Assert(slab);
@@ -258,9 +266,11 @@ SlabReset(MemoryContext context)
free(block);
slab->nblocks--;
context->mem_allocated -= slab->blockSize;
+ deallocation += slab->blockSize;
}
}
+ pgstat_report_backend_mem_allocated_decrease(deallocation);
slab->minFreeChunks = 0;
Assert(slab->nblocks == 0);
@@ -274,8 +284,17 @@ SlabReset(MemoryContext context)
void
SlabDelete(MemoryContext context)
{
+ /*
+ * Until header allocation is included in context->mem_allocated, cast to
+ * slab and decrement the headerSize
+ */
+ SlabContext *slab = castNode(SlabContext, context);
+
/* Reset to release all the SlabBlocks */
SlabReset(context);
+
+ pgstat_report_backend_mem_allocated_decrease(slab->headerSize);
+
/* And free the context header */
free(context);
}
@@ -344,6 +363,7 @@ SlabAlloc(MemoryContext context, Size size)
slab->minFreeChunks = slab->chunksPerBlock;
slab->nblocks += 1;
context->mem_allocated += slab->blockSize;
+ pgstat_report_backend_mem_allocated_increase(slab->blockSize);
}
/* grab the block from the freelist (even the new block is there) */
@@ -511,6 +531,7 @@ SlabFree(void *pointer)
free(block);
slab->nblocks--;
slab->header.mem_allocated -= slab->blockSize;
+ pgstat_report_backend_mem_allocated_decrease(slab->blockSize);
}
else
dlist_push_head(&slab->freelist[block->nfree], &block->node);
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index be47583122..e1bfb85b25 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5340,9 +5340,9 @@
proname => 'pg_stat_get_activity', prorows => '100', proisstrict => 'f',
proretset => 't', provolatile => 's', proparallel => 'r',
prorettype => 'record', proargtypes => 'int4',
- proallargtypes => '{int4,oid,int4,oid,text,text,text,text,text,timestamptz,timestamptz,timestamptz,timestamptz,inet,text,int4,xid,xid,text,bool,text,text,int4,text,numeric,text,bool,text,bool,int4,int8}',
- proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
- proargnames => '{pid,datid,pid,usesysid,application_name,state,query,wait_event_type,wait_event,xact_start,query_start,backend_start,state_change,client_addr,client_hostname,client_port,backend_xid,backend_xmin,backend_type,ssl,sslversion,sslcipher,sslbits,ssl_client_dn,ssl_client_serial,ssl_issuer_dn,gss_auth,gss_princ,gss_enc,leader_pid,query_id}',
+ proallargtypes => '{int4,oid,int4,oid,text,text,text,text,text,timestamptz,timestamptz,timestamptz,timestamptz,inet,text,int4,xid,xid,text,bool,text,text,int4,text,numeric,text,bool,text,bool,int4,int8,int8}',
+ proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{pid,datid,pid,usesysid,application_name,state,query,wait_event_type,wait_event,xact_start,query_start,backend_start,state_change,client_addr,client_hostname,client_port,backend_xid,backend_xmin,backend_type,ssl,sslversion,sslcipher,sslbits,ssl_client_dn,ssl_client_serial,ssl_issuer_dn,gss_auth,gss_princ,gss_enc,leader_pid,query_id,backend_mem_allocated}',
prosrc => 'pg_stat_get_activity' },
{ oid => '3318',
descr => 'statistics: information about progress of backends running maintenance command',
diff --git a/src/include/utils/backend_status.h b/src/include/utils/backend_status.h
index 7403bca25e..9bdc4197bd 100644
--- a/src/include/utils/backend_status.h
+++ b/src/include/utils/backend_status.h
@@ -168,6 +168,9 @@ typedef struct PgBackendStatus
/* query identifier, optionally computed using post_parse_analyze_hook */
uint64 st_query_id;
+
+ /* Current memory allocated to this backend */
+ uint64 backend_mem_allocated;
} PgBackendStatus;
@@ -305,7 +308,9 @@ extern const char *pgstat_get_backend_current_activity(int pid, bool checkUser);
extern const char *pgstat_get_crashed_backend_activity(int pid, char *buffer,
int buflen);
extern uint64 pgstat_get_my_query_id(void);
-
+extern void pgstat_report_backend_mem_allocated_increase(uint64 allocation);
+extern void pgstat_report_backend_mem_allocated_decrease(uint64 deallocation);
+extern uint64 pgstat_get_all_backend_memory_allocated(void);
/* ----------
* Support functions for the SQL-callable functions to
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 7ec3d2688f..674e5c6fe7 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1757,9 +1757,10 @@ pg_stat_activity| SELECT s.datid,
s.backend_xid,
s.backend_xmin,
s.query_id,
+ s.backend_mem_allocated,
s.query,
s.backend_type
- FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, query_id)
+ FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, query_id, backend_mem_allocated)
LEFT JOIN pg_database d ON ((s.datid = d.oid)))
LEFT JOIN pg_authid u ON ((s.usesysid = u.oid)));
pg_stat_all_indexes| SELECT c.oid AS relid,
@@ -1871,7 +1872,7 @@ pg_stat_gssapi| SELECT s.pid,
s.gss_auth AS gss_authenticated,
s.gss_princ AS principal,
s.gss_enc AS encrypted
- FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, query_id)
+ FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, query_id, backend_mem_allocated)
WHERE (s.client_port IS NOT NULL);
pg_stat_progress_analyze| SELECT s.pid,
s.datid,
@@ -2052,7 +2053,7 @@ pg_stat_replication| SELECT s.pid,
w.sync_priority,
w.sync_state,
w.reply_time
- FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, query_id)
+ FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, query_id, backend_mem_allocated)
JOIN pg_stat_get_wal_senders() w(pid, state, sent_lsn, write_lsn, flush_lsn, replay_lsn, write_lag, flush_lag, replay_lag, sync_priority, sync_state, reply_time) ON ((s.pid = w.pid)))
LEFT JOIN pg_authid u ON ((s.usesysid = u.oid)));
pg_stat_replication_slots| SELECT s.slot_name,
@@ -2086,7 +2087,7 @@ pg_stat_ssl| SELECT s.pid,
s.ssl_client_dn AS client_dn,
s.ssl_client_serial AS client_serial,
s.ssl_issuer_dn AS issuer_dn
- FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, query_id)
+ FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, query_id, backend_mem_allocated)
WHERE (s.client_port IS NOT NULL);
pg_stat_subscription| SELECT su.oid AS subid,
su.subname,
--
2.25.1
0002-Add-the-ability-to-limit-the-amount-of-memory-that-c.patchtext/x-patch; charset=UTF-8; name=0002-Add-the-ability-to-limit-the-amount-of-memory-that-c.patchDownload
From 23d33838e3780ef02daad2bc737290d9905745a7 Mon Sep 17 00:00:00 2001
From: Reid Thompson <jreidthompson@nc.rr.com>
Date: Sat, 4 Jun 2022 22:23:59 -0400
Subject: [PATCH 2/2] Add the ability to limit the amount of memory that can be
allocated to backends.
This builds on the work that adds backend memory allocated to pg_stat_activity.
Add GUC variable max_total_backend_memory.
Specifies a limit to the amount of memory (in MB) that may be allocated to
backends in total (i.e. this is not a per user or per backend limit). If unset,
or set to 0 it is disabled. It is intended as a resource to help avoid the OOM
killer on LINUX and manage resources in general. A backend request that would
push the total over the limit will be denied with an out of memory error causing
that backend's current query/transaction to fail. Due to the dynamic nature of
memory allocations, this limit is not exact. If within 1.5MB of the limit and
two backends request 1MB each at the same time both may be allocated, and exceed
the limit. Further requests will not be allocated until dropping below the
limit. Keep this in mind when setting this value. This limit does not affect
auxiliary backend processes. Backend memory allocations are displayed in the
pg_stat_activity view.
---
doc/src/sgml/config.sgml | 26 +++++
src/backend/storage/ipc/dsm_impl.c | 12 ++
src/backend/utils/activity/backend_status.c | 109 +++++++++++++++++-
src/backend/utils/misc/guc.c | 11 ++
src/backend/utils/misc/postgresql.conf.sample | 3 +
src/backend/utils/mmgr/aset.c | 17 +++
src/backend/utils/mmgr/generation.c | 9 ++
src/backend/utils/mmgr/slab.c | 8 ++
src/include/utils/backend_status.h | 2 +
9 files changed, 196 insertions(+), 1 deletion(-)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index a5cd4e44c7..e70ea71ba1 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2079,6 +2079,32 @@ include_dir 'conf.d'
</listitem>
</varlistentry>
+ <varlistentry id="guc-max-total-backend-memory" xreflabel="max_total_backend_memory">
+ <term><varname>max_total_backend_memory</varname> (<type>integer</type>)
+ <indexterm>
+ <primary><varname>max_total_backend_memory</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Specifies a limit to the amount of memory (MB) that may be allocated to
+ backends in total (i.e. this is not a per user or per backend limit).
+ If unset, or set to 0 it is disabled. A backend request that would
+ push the total over the limit will be denied with an out of memory
+ error causing that backend's current query/transaction to fail. Due to
+ the dynamic nature of memory allocations, this limit is not exact. If
+ within 1.5MB of the limit and two backends request 1MB each at the same
+ time both may be allocated, and exceed the limit. Further requests will
+ not be allocated until dropping below the limit. Keep this in mind when
+ setting this value. This limit does not affect auxiliary backend
+ processes <xref linkend="glossary-auxiliary-proc"/> . Backend memory
+ allocations (<varname>backend_mem_allocated</varname>) are displayed in
+ the <link linkend="monitoring-pg-stat-activity-view"><structname>pg_stat_activity</structname></link>
+ view.
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
</sect2>
diff --git a/src/backend/storage/ipc/dsm_impl.c b/src/backend/storage/ipc/dsm_impl.c
index 3356bb65b5..cc061056a3 100644
--- a/src/backend/storage/ipc/dsm_impl.c
+++ b/src/backend/storage/ipc/dsm_impl.c
@@ -253,6 +253,10 @@ dsm_impl_posix(dsm_op op, dsm_handle handle, Size request_size,
return true;
}
+ /* Do not exceed maximum allowed memory allocation */
+ if (op == DSM_OP_CREATE && exceeds_max_total_bkend_mem(request_size))
+ return false;
+
/*
* Create new segment or open an existing one for attach.
*
@@ -524,6 +528,10 @@ dsm_impl_sysv(dsm_op op, dsm_handle handle, Size request_size,
int flags = IPCProtection;
size_t segsize;
+ /* Do not exceed maximum allowed memory allocation */
+ if (op == DSM_OP_CREATE && exceeds_max_total_bkend_mem(request_size))
+ return false;
+
/*
* Allocate the memory BEFORE acquiring the resource, so that we don't
* leak the resource if memory allocation fails.
@@ -718,6 +726,10 @@ dsm_impl_windows(dsm_op op, dsm_handle handle, Size request_size,
return true;
}
+ /* Do not exceed maximum allowed memory allocation */
+ if (op == DSM_OP_CREATE && exceeds_max_total_bkend_mem(request_size))
+ return false;
+
/* Create new segment or open an existing one for attach. */
if (op == DSM_OP_CREATE)
{
diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c
index 45da3af213..9c2fc2f07c 100644
--- a/src/backend/utils/activity/backend_status.c
+++ b/src/backend/utils/activity/backend_status.c
@@ -44,6 +44,8 @@
*/
bool pgstat_track_activities = false;
int pgstat_track_activity_query_size = 1024;
+/* Max backend memory allocation allowed (MB). 0 = disabled */
+int max_total_bkend_mem = 0;
/* exposed so that backend_progress.c can access it */
@@ -1235,7 +1237,7 @@ pgstat_report_backend_mem_allocated_decrease(uint64 deallocation)
*/
if (deallocation > beentry->backend_mem_allocated)
{
- ereport(LOG, errmsg("decrease reduces reported backend memory allocated below zero; setting reported to 0"));
+ ereport(LOG, errmsg("deallocation would decrease backend memory below zero; setting reported to 0"));
/*
* Overwrite deallocation with current backend_mem_allocated so we end
@@ -1253,3 +1255,108 @@ pgstat_report_backend_mem_allocated_decrease(uint64 deallocation)
beentry->backend_mem_allocated -= deallocation;
PGSTAT_END_WRITE_ACTIVITY(beentry);
}
+
+/* ----------
+ * pgstat_get_all_backend_memory_allocated() -
+ *
+ * Return a uint64 representing the current shared memory allocated to all
+ * backends. This looks directly at the BackendStatusArray, and so will
+ * provide current information regardless of the age of our transaction's
+ * snapshot of the status array.
+ * In the future we will likely utilize additional values - perhaps limit
+ * backend allocation by user/role, etc.
+ * ----------
+ */
+uint64
+pgstat_get_all_backend_memory_allocated(void)
+{
+ PgBackendStatus *beentry;
+ int i;
+ uint64 all_backend_memory_allocated = 0;
+
+ beentry = BackendStatusArray;
+
+ /*
+ * We probably shouldn't get here before shared memory has been set up,
+ * but be safe.
+ */
+ if (beentry == NULL || BackendActivityBuffer == NULL)
+ return 0;
+
+ /*
+ * We include AUX procs in all backend memory calculation
+ */
+ for (i = 1; i <= NumBackendStatSlots; i++)
+ {
+ /*
+ * We use a volatile pointer here to ensure the compiler doesn't try to
+ * get cute.
+ */
+ volatile PgBackendStatus *vbeentry = beentry;
+ bool found;
+ uint64 backend_mem_allocated = 0;
+
+ for (;;)
+ {
+ int before_changecount;
+ int after_changecount;
+
+ pgstat_begin_read_activity(vbeentry, before_changecount);
+
+ /* Ignore invalid entries, which may contain invalid data.
+ * See pgstat_beshutdown_hook()
+ */
+ if (vbeentry->st_procpid > 0)
+ backend_mem_allocated = vbeentry->backend_mem_allocated;
+
+ pgstat_end_read_activity(vbeentry, after_changecount);
+
+ if ((found = pgstat_read_activity_complete(before_changecount,
+ after_changecount)))
+ break;
+
+ /* Make sure we can break out of loop if stuck... */
+ CHECK_FOR_INTERRUPTS();
+ }
+
+ if (found)
+ all_backend_memory_allocated += backend_mem_allocated;
+
+ beentry++;
+ }
+
+ return all_backend_memory_allocated;
+}
+
+/*
+ * Determine if allocation request will exceed max backend memory allowed.
+ * Do not apply to auxiliary processes.
+ */
+bool
+exceeds_max_total_bkend_mem(uint64 allocation_request)
+{
+ bool result = false;
+
+ /* Exclude auxiliary processes from the check */
+ if (MyAuxProcType != NotAnAuxProcess)
+ return result;
+
+ /* Convert max_total_bkend_mem to bytes for comparison */
+ if (max_total_bkend_mem &&
+ pgstat_get_all_backend_memory_allocated() +
+ allocation_request > (uint64)max_total_bkend_mem * 1024 * 1024)
+ {
+ /*
+ * Explicitly identify the OOM being a result of this configuration
+ * parameter vs a system failure to allocate OOM.
+ */
+ ereport(WARNING,
+ errmsg("allocation would exceed max_total_backend_memory limit (%llu > %llu)",
+ (unsigned long long)pgstat_get_all_backend_memory_allocated() +
+ allocation_request, (unsigned long long)max_total_bkend_mem * 1024 * 1024));
+
+ result = true;
+ }
+
+ return result;
+}
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 9fbbfb1be5..4e256a77d5 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -3664,6 +3664,17 @@ static struct config_int ConfigureNamesInt[] =
NULL, NULL, NULL
},
+ {
+ {"max_total_backend_memory", PGC_SU_BACKEND, RESOURCES_MEM,
+ gettext_noop("Restrict total backend memory allocations to this max."),
+ gettext_noop("0 turns this feature off."),
+ GUC_UNIT_MB
+ },
+ &max_total_bkend_mem,
+ 0, 0, INT_MAX,
+ NULL, NULL, NULL
+ },
+
/* End-of-list marker */
{
{NULL, 0, 0, NULL, NULL}, NULL, 0, 0, 0, NULL, NULL, NULL
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 90bec0502c..8e944f6511 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -155,6 +155,9 @@
# mmap
# (change requires restart)
#min_dynamic_shared_memory = 0MB # (change requires restart)
+#max_total_backend_memory = 0MB # Restrict total backend memory allocations
+ # to this max (in MB). 0 turns this feature
+ # off.
# - Disk -
diff --git a/src/backend/utils/mmgr/aset.c b/src/backend/utils/mmgr/aset.c
index c91f8efa4d..ac9a1ced3f 100644
--- a/src/backend/utils/mmgr/aset.c
+++ b/src/backend/utils/mmgr/aset.c
@@ -428,6 +428,10 @@ AllocSetContextCreateInternal(MemoryContext parent,
else
firstBlockSize = Max(firstBlockSize, initBlockSize);
+ /* Do not exceed maximum allowed memory allocation */
+ if (exceeds_max_total_bkend_mem(firstBlockSize))
+ return NULL;
+
/*
* Allocate the initial block. Unlike other aset.c blocks, it starts with
* the context header and its block header follows that.
@@ -720,6 +724,11 @@ AllocSetAlloc(MemoryContext context, Size size)
{
chunk_size = MAXALIGN(size);
blksize = chunk_size + ALLOC_BLOCKHDRSZ + ALLOC_CHUNKHDRSZ;
+
+ /* Do not exceed maximum allowed memory allocation */
+ if (exceeds_max_total_bkend_mem(blksize))
+ return NULL;
+
block = (AllocBlock) malloc(blksize);
if (block == NULL)
return NULL;
@@ -911,6 +920,10 @@ AllocSetAlloc(MemoryContext context, Size size)
while (blksize < required_size)
blksize <<= 1;
+ /* Do not exceed maximum allowed memory allocation */
+ if (exceeds_max_total_bkend_mem(blksize))
+ return NULL;
+
/* Try to allocate it */
block = (AllocBlock) malloc(blksize);
@@ -1133,6 +1146,10 @@ AllocSetRealloc(void *pointer, Size size)
blksize = chksize + ALLOC_BLOCKHDRSZ + ALLOC_CHUNKHDRSZ;
oldblksize = block->endptr - ((char *) block);
+ /* Do not exceed maximum allowed memory allocation */
+ if (blksize > oldblksize && exceeds_max_total_bkend_mem(blksize - oldblksize))
+ return NULL;
+
block = (AllocBlock) realloc(block, blksize);
if (block == NULL)
{
diff --git a/src/backend/utils/mmgr/generation.c b/src/backend/utils/mmgr/generation.c
index 36e5b3f94d..1d5720836c 100644
--- a/src/backend/utils/mmgr/generation.c
+++ b/src/backend/utils/mmgr/generation.c
@@ -192,6 +192,9 @@ GenerationContextCreate(MemoryContext parent,
else
allocSize = Max(allocSize, initBlockSize);
+ if (exceeds_max_total_bkend_mem(allocSize))
+ return NULL;
+
/*
* Allocate the initial block. Unlike other generation.c blocks, it
* starts with the context header and its block header follows that.
@@ -361,6 +364,9 @@ GenerationAlloc(MemoryContext context, Size size)
{
Size blksize = required_size + Generation_BLOCKHDRSZ;
+ if (exceeds_max_total_bkend_mem(blksize))
+ return NULL;
+
block = (GenerationBlock *) malloc(blksize);
if (block == NULL)
return NULL;
@@ -464,6 +470,9 @@ GenerationAlloc(MemoryContext context, Size size)
if (blksize < required_size)
blksize = pg_nextpower2_size_t(required_size);
+ if (exceeds_max_total_bkend_mem(blksize))
+ return NULL;
+
block = (GenerationBlock *) malloc(blksize);
if (block == NULL)
diff --git a/src/backend/utils/mmgr/slab.c b/src/backend/utils/mmgr/slab.c
index e0e69b394e..63c07120dd 100644
--- a/src/backend/utils/mmgr/slab.c
+++ b/src/backend/utils/mmgr/slab.c
@@ -177,6 +177,10 @@ SlabContextCreate(MemoryContext parent,
headerSize += chunksPerBlock * sizeof(bool);
#endif
+ /* Do not exceed maximum allowed memory allocation */
+ if (exceeds_max_total_bkend_mem(headerSize))
+ return NULL;
+
slab = (SlabContext *) malloc(headerSize);
if (slab == NULL)
{
@@ -331,6 +335,10 @@ SlabAlloc(MemoryContext context, Size size)
*/
if (slab->minFreeChunks == 0)
{
+ /* Do not exceed maximum allowed memory allocation */
+ if (exceeds_max_total_bkend_mem(slab->blockSize))
+ return NULL;
+
block = (SlabBlock *) malloc(slab->blockSize);
if (block == NULL)
diff --git a/src/include/utils/backend_status.h b/src/include/utils/backend_status.h
index 9bdc4197bd..3b940ff98e 100644
--- a/src/include/utils/backend_status.h
+++ b/src/include/utils/backend_status.h
@@ -270,6 +270,7 @@ typedef struct LocalPgBackendStatus
*/
extern PGDLLIMPORT bool pgstat_track_activities;
extern PGDLLIMPORT int pgstat_track_activity_query_size;
+extern PGDLLIMPORT int max_total_bkend_mem;
/* ----------
@@ -321,6 +322,7 @@ extern int pgstat_fetch_stat_numbackends(void);
extern PgBackendStatus *pgstat_fetch_stat_beentry(int beid);
extern LocalPgBackendStatus *pgstat_fetch_stat_local_beentry(int beid);
extern char *pgstat_clip_activity(const char *raw_activity);
+extern bool exceeds_max_total_bkend_mem(uint64 allocation_request);
#endif /* BACKEND_STATUS_H */
--
2.25.1
On Mon, Sep 12, 2022 at 8:30 PM Reid Thompson <reid.thompson@crunchydata.com>
wrote:
On Fri, 2022-09-09 at 12:14 -0500, Justin Pryzby wrote:
On Sat, Sep 03, 2022 at 11:40:03PM -0400, Reid Thompson wrote:
+ 0, 0, INT_MAX,
+ NULL, NULL, NULLI think this needs a maximum like INT_MAX/1024/1024
Is this noting that we'd set a ceiling of 2048MB?
The reason is that you're later multiplying it by 1024*1024, so you
need
to limit it to avoid overflowing. Compare with
min_dynamic_shared_memory, Log_RotationSize, maintenance_work_mem,
autovacuum_work_mem.What I originally attempted to implement is:
GUC "max_total_backend_memory" max value as INT_MAX = 2147483647 MB
(2251799812636672 bytes). And the other variables and comparisons as
bytes represented as uint64 to avoid overflow.Is this invalid?
typo: Explicitely
corrected
+ errmsg("request will exceed postgresql.conf
defined max_total_backend_memory limit (%lu > %lu)",I wouldn't mention postgresql.conf - it could be in
postgresql.auto.conf, or an include file, or a -c parameter.
Suggest: allocation would exceed max_total_backend_memory limit...updated
+ ereport(LOG, errmsg("decrease reduces reported
backend memory allocated below zero; setting reported to 0"));Suggest: deallocation would decrease backend memory below zero;
updated
+ {"max_total_backend_memory", PGC_SIGHUP,
RESOURCES_MEM,Should this be PGC_SU_BACKEND to allow a superuser to set a higher
limit (or no limit)?Sounds good to me. I'll update to that.
Would PGC_SUSET be too open?There's compilation warning under mingw cross compile due to
sizeof(long). See d914eb347 and other recent commits which I guess
is
the current way to handle this.
http://cfbot.cputube.org/reid-thompson.htmlupdated %lu to %llu and changed cast from uint64 to
unsigned long long in the ereport callFor performance test, you'd want to check what happens with a large
number of max_connections (and maybe a large number of clients). TPS
isn't the only thing that matters. For example, a utility command
might
sometimes do a lot of allocations (or deallocations), or a
"parameterized nested loop" may loop over over many outer tuples and
reset for each. There's also a lot of places that reset to a
"per-tuple" context. I started looking at its performance, but
nothing
to show yet.Thanks
Would you keep people copied on your replies ("reply all") ?
Otherwise
I (at least) may miss them. I think that's what's typical on these
lists (and the list tool is smart enough not to send duplicates to
people who are direct recipients).Ok - will do, thanks.
--
Reid Thompson
Senior Software Engineer
Crunchy Data, Inc.reid.thompson@crunchydata.com
www.crunchydata.comThe patch does not apply; please rebase the patch.
patching file src/backend/utils/misc/guc.c
Hunk #1 FAILED at 3664.
1 out of 1 hunk FAILED -- saving rejects to file
src/backend/utils/misc/guc.c.rej
patching file src/backend/utils/misc/postgresql.conf.sample
--
Ibrar Ahmed
On Thu, 2022-09-15 at 12:07 +0400, Ibrar Ahmed wrote:
The patch does not apply; please rebase the patch.
patching file src/backend/utils/misc/guc.c
Hunk #1 FAILED at 3664.
1 out of 1 hunk FAILED -- saving rejects to file
src/backend/utils/misc/guc.c.rejpatching file src/backend/utils/misc/postgresql.conf.sample
rebased patches attached.
Thanks,
Reid
Attachments:
0001-Add-tracking-of-backend-memory-allocated-to-pg_stat_.patchtext/x-patch; charset=UTF-8; name=0001-Add-tracking-of-backend-memory-allocated-to-pg_stat_.patchDownload
From 0e7010c53508d5a396edd16fd9166abe431f5dbe Mon Sep 17 00:00:00 2001
From: Reid Thompson <jreidthompson@nc.rr.com>
Date: Thu, 11 Aug 2022 12:01:25 -0400
Subject: [PATCH 1/2] Add tracking of backend memory allocated to
pg_stat_activity
This new field displays the current bytes of memory allocated to the
backend process. It is updated as memory for the process is
malloc'd/free'd. Memory allocated to items on the freelist is included in
the displayed value. Dynamic shared memory allocations are included
only in the value displayed for the backend that created them, they are
not included in the value for backends that are attached to them to
avoid double counting. On occasion, orphaned memory segments may be
cleaned up on postmaster startup. This may result in decreasing the sum
without a prior increment. We limit the floor of backend_mem_allocated
to zero. Updated pg_stat_activity documentation for the new column.
---
doc/src/sgml/monitoring.sgml | 12 +++
src/backend/catalog/system_views.sql | 1 +
src/backend/storage/ipc/dsm_impl.c | 80 +++++++++++++++
src/backend/utils/activity/backend_status.c | 105 ++++++++++++++++++++
src/backend/utils/adt/pgstatfuncs.c | 4 +-
src/backend/utils/mmgr/aset.c | 18 ++++
src/backend/utils/mmgr/generation.c | 15 +++
src/backend/utils/mmgr/slab.c | 21 ++++
src/include/catalog/pg_proc.dat | 6 +-
src/include/utils/backend_status.h | 7 +-
src/test/regress/expected/rules.out | 9 +-
11 files changed, 269 insertions(+), 9 deletions(-)
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 1d9509a2f6..40ae638f25 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -947,6 +947,18 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser
</para></entry>
</row>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>backend_mem_allocated</structfield> <type>bigint</type>
+ </para>
+ <para>
+ The byte count of memory allocated to this backend. Dynamic shared memory
+ allocations are included only in the value displayed for the backend that
+ created them, they are not included in the value for backends that are
+ attached to them to avoid double counting.
+ </para></entry>
+ </row>
+
<row>
<entry role="catalog_table_entry"><para role="column_definition">
<structfield>query</structfield> <type>text</type>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 55f7ec79e0..a78750ab12 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -862,6 +862,7 @@ CREATE VIEW pg_stat_activity AS
S.backend_xid,
s.backend_xmin,
S.query_id,
+ S.backend_mem_allocated,
S.query,
S.backend_type
FROM pg_stat_get_activity(NULL) AS S
diff --git a/src/backend/storage/ipc/dsm_impl.c b/src/backend/storage/ipc/dsm_impl.c
index e1b90c5de4..3356bb65b5 100644
--- a/src/backend/storage/ipc/dsm_impl.c
+++ b/src/backend/storage/ipc/dsm_impl.c
@@ -66,6 +66,7 @@
#include "postmaster/postmaster.h"
#include "storage/dsm_impl.h"
#include "storage/fd.h"
+#include "utils/backend_status.h"
#include "utils/guc.h"
#include "utils/memutils.h"
@@ -232,6 +233,13 @@ dsm_impl_posix(dsm_op op, dsm_handle handle, Size request_size,
name)));
return false;
}
+ /*
+ * Detach and destroy pass through here, only decrease the memory
+ * shown allocated in pg_stat_activity when the creator destroys the
+ * allocation.
+ */
+ if (op == DSM_OP_DESTROY)
+ pgstat_report_backend_mem_allocated_decrease(*mapped_size);
*mapped_address = NULL;
*mapped_size = 0;
if (op == DSM_OP_DESTROY && shm_unlink(name) != 0)
@@ -332,6 +340,36 @@ dsm_impl_posix(dsm_op op, dsm_handle handle, Size request_size,
name)));
return false;
}
+
+ /*
+ * Attach and create pass through here, only update backend memory
+ * allocated in pg_stat_activity for the creator process.
+ */
+ if (op == DSM_OP_CREATE)
+ {
+ /*
+ * Posix creation calls dsm_impl_posix_resize implying that resizing
+ * occurs or may be added in the future. As implemented
+ * dsm_impl_posix_resize utilizes fallocate or truncate, passing the
+ * whole new size as input, growing the allocation as needed (only
+ * truncate supports shrinking). We update by replacing the old
+ * allocation with the new.
+ */
+#if defined(HAVE_POSIX_FALLOCATE) && defined(__linux__)
+ /*
+ * posix_fallocate does not shrink allocations, adjust only on
+ * allocation increase.
+ */
+ if (request_size > *mapped_size)
+ {
+ pgstat_report_backend_mem_allocated_decrease(*mapped_size);
+ pgstat_report_backend_mem_allocated_increase(request_size);
+ }
+#else
+ pgstat_report_backend_mem_allocated_decrease(*mapped_size);
+ pgstat_report_backend_mem_allocated_increase(request_size);
+#endif
+ }
*mapped_address = address;
*mapped_size = request_size;
close(fd);
@@ -537,6 +575,14 @@ dsm_impl_sysv(dsm_op op, dsm_handle handle, Size request_size,
name)));
return false;
}
+
+ /*
+ * Detach and destroy pass through here, only decrease the memory
+ * shown allocated in pg_stat_activity when the creator destroys the
+ * allocation.
+ */
+ if (op == DSM_OP_DESTROY)
+ pgstat_report_backend_mem_allocated_decrease(*mapped_size);
*mapped_address = NULL;
*mapped_size = 0;
if (op == DSM_OP_DESTROY && shmctl(ident, IPC_RMID, NULL) < 0)
@@ -584,6 +630,13 @@ dsm_impl_sysv(dsm_op op, dsm_handle handle, Size request_size,
name)));
return false;
}
+
+ /*
+ * Attach and create pass through here, only update backend memory
+ * allocated in pg_stat_activity for the creator process.
+ */
+ if (op == DSM_OP_CREATE)
+ pgstat_report_backend_mem_allocated_increase(request_size);
*mapped_address = address;
*mapped_size = request_size;
@@ -652,6 +705,13 @@ dsm_impl_windows(dsm_op op, dsm_handle handle, Size request_size,
return false;
}
+ /*
+ * Detach and destroy pass through here, only decrease the memory
+ * shown allocated in pg_stat_activity when the creator destroys the
+ * allocation.
+ */
+ if (op == DSM_OP_DESTROY)
+ pgstat_report_backend_mem_allocated_decrease(*mapped_size);
*impl_private = NULL;
*mapped_address = NULL;
*mapped_size = 0;
@@ -768,6 +828,12 @@ dsm_impl_windows(dsm_op op, dsm_handle handle, Size request_size,
return false;
}
+ /*
+ * Attach and create pass through here, only update backend memory
+ * allocated in pg_stat_activity for the creator process.
+ */
+ if (op == DSM_OP_CREATE)
+ pgstat_report_backend_mem_allocated_increase(info.RegionSize);
*mapped_address = address;
*mapped_size = info.RegionSize;
*impl_private = hmap;
@@ -812,6 +878,13 @@ dsm_impl_mmap(dsm_op op, dsm_handle handle, Size request_size,
name)));
return false;
}
+
+ /*
+ * Detach and destroy pass through here, only decrease the memory
+ * shown allocated in pg_stat_activity when the creator destroys the
+ * allocation.
+ */
+ pgstat_report_backend_mem_allocated_decrease(*mapped_size);
*mapped_address = NULL;
*mapped_size = 0;
if (op == DSM_OP_DESTROY && unlink(name) != 0)
@@ -933,6 +1006,13 @@ dsm_impl_mmap(dsm_op op, dsm_handle handle, Size request_size,
name)));
return false;
}
+
+ /*
+ * Attach and create pass through here, only update backend memory
+ * allocated in pg_stat_activity for the creator process.
+ */
+ if (op == DSM_OP_CREATE)
+ pgstat_report_backend_mem_allocated_increase(request_size);
*mapped_address = address;
*mapped_size = request_size;
diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c
index c7ed1e6d7a..45da3af213 100644
--- a/src/backend/utils/activity/backend_status.c
+++ b/src/backend/utils/activity/backend_status.c
@@ -49,6 +49,8 @@ int pgstat_track_activity_query_size = 1024;
/* exposed so that backend_progress.c can access it */
PgBackendStatus *MyBEEntry = NULL;
+/* Memory allocated to this backend prior to pgstats initialization */
+uint64 backend_mem_allocated = 0;
static PgBackendStatus *BackendStatusArray = NULL;
static char *BackendAppnameBuffer = NULL;
@@ -400,6 +402,13 @@ pgstat_bestart(void)
lbeentry.st_progress_command_target = InvalidOid;
lbeentry.st_query_id = UINT64CONST(0);
+ /*
+ * Move sum of memory allocated prior to pgstats initialization to pgstats
+ * and zero the local variable.
+ */
+ lbeentry.backend_mem_allocated = backend_mem_allocated;
+ backend_mem_allocated = 0;
+
/*
* we don't zero st_progress_param here to save cycles; nobody should
* examine it until st_progress_command has been set to something other
@@ -1148,3 +1157,99 @@ pgstat_clip_activity(const char *raw_activity)
return activity;
}
+
+/* --------
+ * pgstat_report_backend_mem_allocated_increase() -
+ *
+ * Called to report increase in memory allocated for this backend
+ * --------
+ */
+void
+pgstat_report_backend_mem_allocated_increase(uint64 allocation)
+{
+ volatile PgBackendStatus *beentry = MyBEEntry;
+
+ if (!beentry || !pgstat_track_activities)
+ {
+ /*
+ * Account for memory before pgstats is initialized. This will be
+ * migrated to pgstats on initialization.
+ */
+ backend_mem_allocated += allocation;
+
+ return;
+ }
+
+ /*
+ * Update my status entry, following the protocol of bumping
+ * st_changecount before and after. We use a volatile pointer here to
+ * ensure the compiler doesn't try to get cute.
+ */
+ PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
+ beentry->backend_mem_allocated += allocation;
+ PGSTAT_END_WRITE_ACTIVITY(beentry);
+}
+
+/* --------
+ * pgstat_report_backend_mem_allocated_decrease() -
+ *
+ * Called to report decrease in memory allocated for this backend
+ * --------
+ */
+void
+pgstat_report_backend_mem_allocated_decrease(uint64 deallocation)
+{
+ volatile PgBackendStatus *beentry = MyBEEntry;
+
+ /*
+ * Cases may occur where shared memory from a previous postmaster
+ * invocation still exist. These are cleaned up at startup by
+ * dsm_cleanup_using_control_segment. Limit decreasing memory allocated to
+ * zero in case no corresponding prior increase exists or decrease has
+ * already been accounted for.
+ */
+
+ if (!beentry || !pgstat_track_activities)
+ {
+ /*
+ * Account for memory before pgstats is initialized. This will be
+ * migrated to pgstats on initialization. Do not allow
+ * backend_mem_allocated to go below zero. If pgstats has not been
+ * initialized, we are in startup and we set backend_mem_allocated to
+ * zero in cases where it would go negative and skip generating an
+ * ereport.
+ */
+ if (deallocation > backend_mem_allocated)
+ backend_mem_allocated = 0;
+ else
+ backend_mem_allocated -= deallocation;
+
+ return;
+ }
+
+ /*
+ * Do not allow backend_mem_allocated to go below zero. ereport if we
+ * would have. There's no need for a lock around the read here as it's
+ * being referenced from the same backend which means that there shouldn't
+ * be concurrent writes. We want to generate an ereport in these cases.
+ */
+ if (deallocation > beentry->backend_mem_allocated)
+ {
+ ereport(LOG, errmsg("decrease reduces reported backend memory allocated below zero; setting reported to 0"));
+
+ /*
+ * Overwrite deallocation with current backend_mem_allocated so we end
+ * up at zero.
+ */
+ deallocation = beentry->backend_mem_allocated;
+ }
+
+ /*
+ * Update my status entry, following the protocol of bumping
+ * st_changecount before and after. We use a volatile pointer here to
+ * ensure the compiler doesn't try to get cute.
+ */
+ PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
+ beentry->backend_mem_allocated -= deallocation;
+ PGSTAT_END_WRITE_ACTIVITY(beentry);
+}
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index be15b4b2e5..35d497a12e 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -536,7 +536,7 @@ pg_stat_get_progress_info(PG_FUNCTION_ARGS)
Datum
pg_stat_get_activity(PG_FUNCTION_ARGS)
{
-#define PG_STAT_GET_ACTIVITY_COLS 30
+#define PG_STAT_GET_ACTIVITY_COLS 31
int num_backends = pgstat_fetch_stat_numbackends();
int curr_backend;
int pid = PG_ARGISNULL(0) ? -1 : PG_GETARG_INT32(0);
@@ -610,6 +610,8 @@ pg_stat_get_activity(PG_FUNCTION_ARGS)
else
nulls[16] = true;
+ values[30] = UInt64GetDatum(beentry->backend_mem_allocated);
+
/* Values only available to role member or pg_read_all_stats */
if (HAS_PGSTAT_PERMISSIONS(beentry->st_userid))
{
diff --git a/src/backend/utils/mmgr/aset.c b/src/backend/utils/mmgr/aset.c
index ec423375ae..2ac6f10e12 100644
--- a/src/backend/utils/mmgr/aset.c
+++ b/src/backend/utils/mmgr/aset.c
@@ -47,6 +47,7 @@
#include "postgres.h"
#include "port/pg_bitutils.h"
+#include "utils/backend_status.h"
#include "utils/memdebug.h"
#include "utils/memutils.h"
#include "utils/memutils_memorychunk.h"
@@ -509,6 +510,7 @@ AllocSetContextCreateInternal(MemoryContext parent,
name);
((MemoryContext) set)->mem_allocated = firstBlockSize;
+ pgstat_report_backend_mem_allocated_increase(firstBlockSize);
return (MemoryContext) set;
}
@@ -532,6 +534,7 @@ AllocSetReset(MemoryContext context)
AllocBlock block;
Size keepersize PG_USED_FOR_ASSERTS_ONLY
= set->keeper->endptr - ((char *) set);
+ uint64 deallocation = 0;
AssertArg(AllocSetIsValid(set));
@@ -571,6 +574,7 @@ AllocSetReset(MemoryContext context)
{
/* Normal case, release the block */
context->mem_allocated -= block->endptr - ((char *) block);
+ deallocation += block->endptr - ((char *) block);
#ifdef CLOBBER_FREED_MEMORY
wipe_mem(block, block->freeptr - ((char *) block));
@@ -581,6 +585,7 @@ AllocSetReset(MemoryContext context)
}
Assert(context->mem_allocated == keepersize);
+ pgstat_report_backend_mem_allocated_decrease(deallocation);
/* Reset block size allocation sequence, too */
set->nextBlockSize = set->initBlockSize;
@@ -600,6 +605,7 @@ AllocSetDelete(MemoryContext context)
AllocBlock block = set->blocks;
Size keepersize PG_USED_FOR_ASSERTS_ONLY
= set->keeper->endptr - ((char *) set);
+ uint64 deallocation = 0;
AssertArg(AllocSetIsValid(set));
@@ -635,11 +641,13 @@ AllocSetDelete(MemoryContext context)
freelist->first_free = (AllocSetContext *) oldset->header.nextchild;
freelist->num_free--;
+ deallocation += oldset->header.mem_allocated;
/* All that remains is to free the header/initial block */
free(oldset);
}
Assert(freelist->num_free == 0);
+ pgstat_report_backend_mem_allocated_decrease(deallocation);
}
/* Now add the just-deleted context to the freelist. */
@@ -656,7 +664,10 @@ AllocSetDelete(MemoryContext context)
AllocBlock next = block->next;
if (block != set->keeper)
+ {
context->mem_allocated -= block->endptr - ((char *) block);
+ deallocation += block->endptr - ((char *) block);
+ }
#ifdef CLOBBER_FREED_MEMORY
wipe_mem(block, block->freeptr - ((char *) block));
@@ -669,6 +680,8 @@ AllocSetDelete(MemoryContext context)
}
Assert(context->mem_allocated == keepersize);
+ pgstat_report_backend_mem_allocated_decrease(deallocation +
+ context->mem_allocated);
/* Finally, free the context header, including the keeper block */
free(set);
@@ -718,6 +731,7 @@ AllocSetAlloc(MemoryContext context, Size size)
return NULL;
context->mem_allocated += blksize;
+ pgstat_report_backend_mem_allocated_increase(blksize);
block->aset = set;
block->freeptr = block->endptr = ((char *) block) + blksize;
@@ -928,6 +942,7 @@ AllocSetAlloc(MemoryContext context, Size size)
return NULL;
context->mem_allocated += blksize;
+ pgstat_report_backend_mem_allocated_increase(blksize);
block->aset = set;
block->freeptr = ((char *) block) + ALLOC_BLOCKHDRSZ;
@@ -1028,6 +1043,7 @@ AllocSetFree(void *pointer)
block->next->prev = block->prev;
set->header.mem_allocated -= block->endptr - ((char *) block);
+ pgstat_report_backend_mem_allocated_decrease(block->endptr - ((char *) block));
#ifdef CLOBBER_FREED_MEMORY
wipe_mem(block, block->freeptr - ((char *) block));
@@ -1144,7 +1160,9 @@ AllocSetRealloc(void *pointer, Size size)
/* updated separately, not to underflow when (oldblksize > blksize) */
set->header.mem_allocated -= oldblksize;
+ pgstat_report_backend_mem_allocated_decrease(oldblksize);
set->header.mem_allocated += blksize;
+ pgstat_report_backend_mem_allocated_increase(blksize);
block->freeptr = block->endptr = ((char *) block) + blksize;
diff --git a/src/backend/utils/mmgr/generation.c b/src/backend/utils/mmgr/generation.c
index c743b24fa7..34b11392ff 100644
--- a/src/backend/utils/mmgr/generation.c
+++ b/src/backend/utils/mmgr/generation.c
@@ -37,6 +37,7 @@
#include "lib/ilist.h"
#include "port/pg_bitutils.h"
+#include "utils/backend_status.h"
#include "utils/memdebug.h"
#include "utils/memutils.h"
#include "utils/memutils_memorychunk.h"
@@ -258,6 +259,7 @@ GenerationContextCreate(MemoryContext parent,
name);
((MemoryContext) set)->mem_allocated = firstBlockSize;
+ pgstat_report_backend_mem_allocated_increase(firstBlockSize);
return (MemoryContext) set;
}
@@ -274,6 +276,7 @@ GenerationReset(MemoryContext context)
{
GenerationContext *set = (GenerationContext *) context;
dlist_mutable_iter miter;
+ uint64 deallocation = 0;
AssertArg(GenerationIsValid(set));
@@ -296,9 +299,14 @@ GenerationReset(MemoryContext context)
if (block == set->keeper)
GenerationBlockMarkEmpty(block);
else
+ {
+ deallocation += block->blksize;
GenerationBlockFree(set, block);
+ }
}
+ pgstat_report_backend_mem_allocated_decrease(deallocation);
+
/* set it so new allocations to make use of the keeper block */
set->block = set->keeper;
@@ -319,6 +327,9 @@ GenerationDelete(MemoryContext context)
{
/* Reset to release all releasable GenerationBlocks */
GenerationReset(context);
+
+ pgstat_report_backend_mem_allocated_decrease(context->mem_allocated);
+
/* And free the context header and keeper block */
free(context);
}
@@ -363,6 +374,7 @@ GenerationAlloc(MemoryContext context, Size size)
return NULL;
context->mem_allocated += blksize;
+ pgstat_report_backend_mem_allocated_increase(blksize);
/* block with a single (used) chunk */
block->context = set;
@@ -466,6 +478,7 @@ GenerationAlloc(MemoryContext context, Size size)
return NULL;
context->mem_allocated += blksize;
+ pgstat_report_backend_mem_allocated_increase(blksize);
/* initialize the new block */
GenerationBlockInit(set, block, blksize);
@@ -699,6 +712,8 @@ GenerationFree(void *pointer)
dlist_delete(&block->node);
set->header.mem_allocated -= block->blksize;
+ pgstat_report_backend_mem_allocated_decrease(block->blksize);
+
free(block);
}
diff --git a/src/backend/utils/mmgr/slab.c b/src/backend/utils/mmgr/slab.c
index 9149aaafcb..72376da82e 100644
--- a/src/backend/utils/mmgr/slab.c
+++ b/src/backend/utils/mmgr/slab.c
@@ -53,6 +53,7 @@
#include "postgres.h"
#include "lib/ilist.h"
+#include "utils/backend_status.h"
#include "utils/memdebug.h"
#include "utils/memutils.h"
#include "utils/memutils_memorychunk.h"
@@ -223,6 +224,12 @@ SlabContextCreate(MemoryContext parent,
parent,
name);
+ /*
+ * If SlabContextCreate is updated to add headerSize to
+ * context->mem_allocated, then update here and SlabDelete appropriately
+ */
+ pgstat_report_backend_mem_allocated_increase(headerSize);
+
return (MemoryContext) slab;
}
@@ -238,6 +245,7 @@ SlabReset(MemoryContext context)
{
int i;
SlabContext *slab = castNode(SlabContext, context);
+ uint64 deallocation = 0;
Assert(slab);
@@ -263,9 +271,11 @@ SlabReset(MemoryContext context)
free(block);
slab->nblocks--;
context->mem_allocated -= slab->blockSize;
+ deallocation += slab->blockSize;
}
}
+ pgstat_report_backend_mem_allocated_decrease(deallocation);
slab->minFreeChunks = 0;
Assert(slab->nblocks == 0);
@@ -279,8 +289,17 @@ SlabReset(MemoryContext context)
void
SlabDelete(MemoryContext context)
{
+ /*
+ * Until header allocation is included in context->mem_allocated, cast to
+ * slab and decrement the headerSize
+ */
+ SlabContext *slab = castNode(SlabContext, context);
+
/* Reset to release all the SlabBlocks */
SlabReset(context);
+
+ pgstat_report_backend_mem_allocated_decrease(slab->headerSize);
+
/* And free the context header */
free(context);
}
@@ -349,6 +368,7 @@ SlabAlloc(MemoryContext context, Size size)
slab->minFreeChunks = slab->chunksPerBlock;
slab->nblocks += 1;
context->mem_allocated += slab->blockSize;
+ pgstat_report_backend_mem_allocated_increase(slab->blockSize);
}
/* grab the block from the freelist (even the new block is there) */
@@ -514,6 +534,7 @@ SlabFree(void *pointer)
free(block);
slab->nblocks--;
slab->header.mem_allocated -= slab->blockSize;
+ pgstat_report_backend_mem_allocated_decrease(slab->blockSize);
}
else
dlist_push_head(&slab->freelist[block->nfree], &block->node);
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index a07e737a33..363d92e9f2 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5340,9 +5340,9 @@
proname => 'pg_stat_get_activity', prorows => '100', proisstrict => 'f',
proretset => 't', provolatile => 's', proparallel => 'r',
prorettype => 'record', proargtypes => 'int4',
- proallargtypes => '{int4,oid,int4,oid,text,text,text,text,text,timestamptz,timestamptz,timestamptz,timestamptz,inet,text,int4,xid,xid,text,bool,text,text,int4,text,numeric,text,bool,text,bool,int4,int8}',
- proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
- proargnames => '{pid,datid,pid,usesysid,application_name,state,query,wait_event_type,wait_event,xact_start,query_start,backend_start,state_change,client_addr,client_hostname,client_port,backend_xid,backend_xmin,backend_type,ssl,sslversion,sslcipher,sslbits,ssl_client_dn,ssl_client_serial,ssl_issuer_dn,gss_auth,gss_princ,gss_enc,leader_pid,query_id}',
+ proallargtypes => '{int4,oid,int4,oid,text,text,text,text,text,timestamptz,timestamptz,timestamptz,timestamptz,inet,text,int4,xid,xid,text,bool,text,text,int4,text,numeric,text,bool,text,bool,int4,int8,int8}',
+ proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{pid,datid,pid,usesysid,application_name,state,query,wait_event_type,wait_event,xact_start,query_start,backend_start,state_change,client_addr,client_hostname,client_port,backend_xid,backend_xmin,backend_type,ssl,sslversion,sslcipher,sslbits,ssl_client_dn,ssl_client_serial,ssl_issuer_dn,gss_auth,gss_princ,gss_enc,leader_pid,query_id,backend_mem_allocated}',
prosrc => 'pg_stat_get_activity' },
{ oid => '3318',
descr => 'statistics: information about progress of backends running maintenance command',
diff --git a/src/include/utils/backend_status.h b/src/include/utils/backend_status.h
index 7403bca25e..9bdc4197bd 100644
--- a/src/include/utils/backend_status.h
+++ b/src/include/utils/backend_status.h
@@ -168,6 +168,9 @@ typedef struct PgBackendStatus
/* query identifier, optionally computed using post_parse_analyze_hook */
uint64 st_query_id;
+
+ /* Current memory allocated to this backend */
+ uint64 backend_mem_allocated;
} PgBackendStatus;
@@ -305,7 +308,9 @@ extern const char *pgstat_get_backend_current_activity(int pid, bool checkUser);
extern const char *pgstat_get_crashed_backend_activity(int pid, char *buffer,
int buflen);
extern uint64 pgstat_get_my_query_id(void);
-
+extern void pgstat_report_backend_mem_allocated_increase(uint64 allocation);
+extern void pgstat_report_backend_mem_allocated_decrease(uint64 deallocation);
+extern uint64 pgstat_get_all_backend_memory_allocated(void);
/* ----------
* Support functions for the SQL-callable functions to
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 9dd137415e..4588d71c8a 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1752,9 +1752,10 @@ pg_stat_activity| SELECT s.datid,
s.backend_xid,
s.backend_xmin,
s.query_id,
+ s.backend_mem_allocated,
s.query,
s.backend_type
- FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, query_id)
+ FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, query_id, backend_mem_allocated)
LEFT JOIN pg_database d ON ((s.datid = d.oid)))
LEFT JOIN pg_authid u ON ((s.usesysid = u.oid)));
pg_stat_all_indexes| SELECT c.oid AS relid,
@@ -1866,7 +1867,7 @@ pg_stat_gssapi| SELECT s.pid,
s.gss_auth AS gss_authenticated,
s.gss_princ AS principal,
s.gss_enc AS encrypted
- FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, query_id)
+ FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, query_id, backend_mem_allocated)
WHERE (s.client_port IS NOT NULL);
pg_stat_progress_analyze| SELECT s.pid,
s.datid,
@@ -2047,7 +2048,7 @@ pg_stat_replication| SELECT s.pid,
w.sync_priority,
w.sync_state,
w.reply_time
- FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, query_id)
+ FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, query_id, backend_mem_allocated)
JOIN pg_stat_get_wal_senders() w(pid, state, sent_lsn, write_lsn, flush_lsn, replay_lsn, write_lag, flush_lag, replay_lag, sync_priority, sync_state, reply_time) ON ((s.pid = w.pid)))
LEFT JOIN pg_authid u ON ((s.usesysid = u.oid)));
pg_stat_replication_slots| SELECT s.slot_name,
@@ -2081,7 +2082,7 @@ pg_stat_ssl| SELECT s.pid,
s.ssl_client_dn AS client_dn,
s.ssl_client_serial AS client_serial,
s.ssl_issuer_dn AS issuer_dn
- FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, query_id)
+ FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, query_id, backend_mem_allocated)
WHERE (s.client_port IS NOT NULL);
pg_stat_subscription| SELECT su.oid AS subid,
su.subname,
--
2.25.1
0002-Add-the-ability-to-limit-the-amount-of-memory-that-c.patchtext/x-patch; charset=UTF-8; name=0002-Add-the-ability-to-limit-the-amount-of-memory-that-c.patchDownload
From bb8e1a2bbb9425eb7667a97dc49beebe8f9bf327 Mon Sep 17 00:00:00 2001
From: Reid Thompson <jreidthompson@nc.rr.com>
Date: Sat, 4 Jun 2022 22:23:59 -0400
Subject: [PATCH 2/2] Add the ability to limit the amount of memory that can be
allocated to backends.
This builds on the work that adds backend memory allocated to pg_stat_activity.
Add GUC variable max_total_backend_memory.
Specifies a limit to the amount of memory (in MB) that may be allocated to
backends in total (i.e. this is not a per user or per backend limit). If unset,
or set to 0 it is disabled. It is intended as a resource to help avoid the OOM
killer on LINUX and manage resources in general. A backend request that would
push the total over the limit will be denied with an out of memory error causing
that backend's current query/transaction to fail. Due to the dynamic nature of
memory allocations, this limit is not exact. If within 1.5MB of the limit and
two backends request 1MB each at the same time both may be allocated, and exceed
the limit. Further requests will not be allocated until dropping below the
limit. Keep this in mind when setting this value. This limit does not affect
auxiliary backend processes. Backend memory allocations are displayed in the
pg_stat_activity view.
---
doc/src/sgml/config.sgml | 26 ++++
src/backend/storage/ipc/dsm_impl.c | 12 ++
src/backend/utils/activity/backend_status.c | 111 +++++++++++++++++-
src/backend/utils/misc/guc_tables.c | 11 ++
src/backend/utils/misc/postgresql.conf.sample | 3 +
src/backend/utils/mmgr/aset.c | 17 +++
src/backend/utils/mmgr/generation.c | 11 +-
src/backend/utils/mmgr/slab.c | 8 ++
src/include/utils/backend_status.h | 2 +
9 files changed, 199 insertions(+), 2 deletions(-)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 700914684d..ce8e35daee 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2079,6 +2079,32 @@ include_dir 'conf.d'
</listitem>
</varlistentry>
+ <varlistentry id="guc-max-total-backend-memory" xreflabel="max_total_backend_memory">
+ <term><varname>max_total_backend_memory</varname> (<type>integer</type>)
+ <indexterm>
+ <primary><varname>max_total_backend_memory</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Specifies a limit to the amount of memory (MB) that may be allocated to
+ backends in total (i.e. this is not a per user or per backend limit).
+ If unset, or set to 0 it is disabled. A backend request that would
+ push the total over the limit will be denied with an out of memory
+ error causing that backend's current query/transaction to fail. Due to
+ the dynamic nature of memory allocations, this limit is not exact. If
+ within 1.5MB of the limit and two backends request 1MB each at the same
+ time both may be allocated, and exceed the limit. Further requests will
+ not be allocated until dropping below the limit. Keep this in mind when
+ setting this value. This limit does not affect auxiliary backend
+ processes <xref linkend="glossary-auxiliary-proc"/> . Backend memory
+ allocations (<varname>backend_mem_allocated</varname>) are displayed in
+ the <link linkend="monitoring-pg-stat-activity-view"><structname>pg_stat_activity</structname></link>
+ view.
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
</sect2>
diff --git a/src/backend/storage/ipc/dsm_impl.c b/src/backend/storage/ipc/dsm_impl.c
index 3356bb65b5..cc061056a3 100644
--- a/src/backend/storage/ipc/dsm_impl.c
+++ b/src/backend/storage/ipc/dsm_impl.c
@@ -253,6 +253,10 @@ dsm_impl_posix(dsm_op op, dsm_handle handle, Size request_size,
return true;
}
+ /* Do not exceed maximum allowed memory allocation */
+ if (op == DSM_OP_CREATE && exceeds_max_total_bkend_mem(request_size))
+ return false;
+
/*
* Create new segment or open an existing one for attach.
*
@@ -524,6 +528,10 @@ dsm_impl_sysv(dsm_op op, dsm_handle handle, Size request_size,
int flags = IPCProtection;
size_t segsize;
+ /* Do not exceed maximum allowed memory allocation */
+ if (op == DSM_OP_CREATE && exceeds_max_total_bkend_mem(request_size))
+ return false;
+
/*
* Allocate the memory BEFORE acquiring the resource, so that we don't
* leak the resource if memory allocation fails.
@@ -718,6 +726,10 @@ dsm_impl_windows(dsm_op op, dsm_handle handle, Size request_size,
return true;
}
+ /* Do not exceed maximum allowed memory allocation */
+ if (op == DSM_OP_CREATE && exceeds_max_total_bkend_mem(request_size))
+ return false;
+
/* Create new segment or open an existing one for attach. */
if (op == DSM_OP_CREATE)
{
diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c
index 45da3af213..7820c55489 100644
--- a/src/backend/utils/activity/backend_status.c
+++ b/src/backend/utils/activity/backend_status.c
@@ -45,6 +45,9 @@
bool pgstat_track_activities = false;
int pgstat_track_activity_query_size = 1024;
+/* Max backend memory allocation allowed (MB). 0 = disabled */
+int max_total_bkend_mem = 0;
+
/* exposed so that backend_progress.c can access it */
PgBackendStatus *MyBEEntry = NULL;
@@ -1235,7 +1238,7 @@ pgstat_report_backend_mem_allocated_decrease(uint64 deallocation)
*/
if (deallocation > beentry->backend_mem_allocated)
{
- ereport(LOG, errmsg("decrease reduces reported backend memory allocated below zero; setting reported to 0"));
+ ereport(LOG, errmsg("deallocation would decrease backend memory below zero; setting reported to 0"));
/*
* Overwrite deallocation with current backend_mem_allocated so we end
@@ -1253,3 +1256,109 @@ pgstat_report_backend_mem_allocated_decrease(uint64 deallocation)
beentry->backend_mem_allocated -= deallocation;
PGSTAT_END_WRITE_ACTIVITY(beentry);
}
+
+/* ----------
+ * pgstat_get_all_backend_memory_allocated() -
+ *
+ * Return a uint64 representing the current shared memory allocated to all
+ * backends. This looks directly at the BackendStatusArray, and so will
+ * provide current information regardless of the age of our transaction's
+ * snapshot of the status array.
+ * In the future we will likely utilize additional values - perhaps limit
+ * backend allocation by user/role, etc.
+ * ----------
+ */
+uint64
+pgstat_get_all_backend_memory_allocated(void)
+{
+ PgBackendStatus *beentry;
+ int i;
+ uint64 all_backend_memory_allocated = 0;
+
+ beentry = BackendStatusArray;
+
+ /*
+ * We probably shouldn't get here before shared memory has been set up,
+ * but be safe.
+ */
+ if (beentry == NULL || BackendActivityBuffer == NULL)
+ return 0;
+
+ /*
+ * We include AUX procs in all backend memory calculation
+ */
+ for (i = 1; i <= NumBackendStatSlots; i++)
+ {
+ /*
+ * We use a volatile pointer here to ensure the compiler doesn't try
+ * to get cute.
+ */
+ volatile PgBackendStatus *vbeentry = beentry;
+ bool found;
+ uint64 backend_mem_allocated = 0;
+
+ for (;;)
+ {
+ int before_changecount;
+ int after_changecount;
+
+ pgstat_begin_read_activity(vbeentry, before_changecount);
+
+ /*
+ * Ignore invalid entries, which may contain invalid data.
+ * See pgstat_beshutdown_hook()
+ */
+ if (vbeentry->st_procpid > 0)
+ backend_mem_allocated = vbeentry->backend_mem_allocated;
+
+ pgstat_end_read_activity(vbeentry, after_changecount);
+
+ if ((found = pgstat_read_activity_complete(before_changecount,
+ after_changecount)))
+ break;
+
+ /* Make sure we can break out of loop if stuck... */
+ CHECK_FOR_INTERRUPTS();
+ }
+
+ if (found)
+ all_backend_memory_allocated += backend_mem_allocated;
+
+ beentry++;
+ }
+
+ return all_backend_memory_allocated;
+}
+
+/*
+ * Determine if allocation request will exceed max backend memory allowed.
+ * Do not apply to auxiliary processes.
+ */
+bool
+exceeds_max_total_bkend_mem(uint64 allocation_request)
+{
+ bool result = false;
+
+ /* Exclude auxiliary processes from the check */
+ if (MyAuxProcType != NotAnAuxProcess)
+ return result;
+
+ /* Convert max_total_bkend_mem to bytes for comparison */
+ if (max_total_bkend_mem &&
+ pgstat_get_all_backend_memory_allocated() +
+ allocation_request > (uint64) max_total_bkend_mem * 1024 * 1024)
+ {
+ /*
+ * Explicitly identify the OOM being a result of this configuration
+ * parameter vs a system failure to allocate OOM.
+ */
+ ereport(WARNING,
+ errmsg("allocation would exceed max_total_backend_memory limit (%llu > %llu)",
+ (unsigned long long) pgstat_get_all_backend_memory_allocated() +
+ allocation_request, (unsigned long long) max_total_bkend_mem * 1024 * 1024));
+
+ result = true;
+ }
+
+ return result;
+}
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 550e95056c..7d4dc8677a 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -3415,6 +3415,17 @@ struct config_int ConfigureNamesInt[] =
NULL, NULL, NULL
},
+ {
+ {"max_total_backend_memory", PGC_SU_BACKEND, RESOURCES_MEM,
+ gettext_noop("Restrict total backend memory allocations to this max."),
+ gettext_noop("0 turns this feature off."),
+ GUC_UNIT_MB
+ },
+ &max_total_bkend_mem,
+ 0, 0, INT_MAX,
+ NULL, NULL, NULL
+ },
+
/* End-of-list marker */
{
{NULL, 0, 0, NULL, NULL}, NULL, 0, 0, 0, NULL, NULL, NULL
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 2ae76e5cfb..8a0b383eb7 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -155,6 +155,9 @@
# mmap
# (change requires restart)
#min_dynamic_shared_memory = 0MB # (change requires restart)
+#max_total_backend_memory = 0MB # Restrict total backend memory allocations
+ # to this max (in MB). 0 turns this feature
+ # off.
# - Disk -
diff --git a/src/backend/utils/mmgr/aset.c b/src/backend/utils/mmgr/aset.c
index 2ac6f10e12..509b93faa1 100644
--- a/src/backend/utils/mmgr/aset.c
+++ b/src/backend/utils/mmgr/aset.c
@@ -428,6 +428,10 @@ AllocSetContextCreateInternal(MemoryContext parent,
else
firstBlockSize = Max(firstBlockSize, initBlockSize);
+ /* Do not exceed maximum allowed memory allocation */
+ if (exceeds_max_total_bkend_mem(firstBlockSize))
+ return NULL;
+
/*
* Allocate the initial block. Unlike other aset.c blocks, it starts with
* the context header and its block header follows that.
@@ -726,6 +730,11 @@ AllocSetAlloc(MemoryContext context, Size size)
#endif
blksize = chunk_size + ALLOC_BLOCKHDRSZ + ALLOC_CHUNKHDRSZ;
+
+ /* Do not exceed maximum allowed memory allocation */
+ if (exceeds_max_total_bkend_mem(blksize))
+ return NULL;
+
block = (AllocBlock) malloc(blksize);
if (block == NULL)
return NULL;
@@ -923,6 +932,10 @@ AllocSetAlloc(MemoryContext context, Size size)
while (blksize < required_size)
blksize <<= 1;
+ /* Do not exceed maximum allowed memory allocation */
+ if (exceeds_max_total_bkend_mem(blksize))
+ return NULL;
+
/* Try to allocate it */
block = (AllocBlock) malloc(blksize);
@@ -1150,6 +1163,10 @@ AllocSetRealloc(void *pointer, Size size)
blksize = chksize + ALLOC_BLOCKHDRSZ + ALLOC_CHUNKHDRSZ;
oldblksize = block->endptr - ((char *) block);
+ /* Do not exceed maximum allowed memory allocation */
+ if (blksize > oldblksize && exceeds_max_total_bkend_mem(blksize - oldblksize))
+ return NULL;
+
block = (AllocBlock) realloc(block, blksize);
if (block == NULL)
{
diff --git a/src/backend/utils/mmgr/generation.c b/src/backend/utils/mmgr/generation.c
index 34b11392ff..6256e84d48 100644
--- a/src/backend/utils/mmgr/generation.c
+++ b/src/backend/utils/mmgr/generation.c
@@ -192,6 +192,9 @@ GenerationContextCreate(MemoryContext parent,
else
allocSize = Max(allocSize, initBlockSize);
+ if (exceeds_max_total_bkend_mem(allocSize))
+ return NULL;
+
/*
* Allocate the initial block. Unlike other generation.c blocks, it
* starts with the context header and its block header follows that.
@@ -276,7 +279,7 @@ GenerationReset(MemoryContext context)
{
GenerationContext *set = (GenerationContext *) context;
dlist_mutable_iter miter;
- uint64 deallocation = 0;
+ uint64 deallocation = 0;
AssertArg(GenerationIsValid(set));
@@ -369,6 +372,9 @@ GenerationAlloc(MemoryContext context, Size size)
{
Size blksize = required_size + Generation_BLOCKHDRSZ;
+ if (exceeds_max_total_bkend_mem(blksize))
+ return NULL;
+
block = (GenerationBlock *) malloc(blksize);
if (block == NULL)
return NULL;
@@ -472,6 +478,9 @@ GenerationAlloc(MemoryContext context, Size size)
if (blksize < required_size)
blksize = pg_nextpower2_size_t(required_size);
+ if (exceeds_max_total_bkend_mem(blksize))
+ return NULL;
+
block = (GenerationBlock *) malloc(blksize);
if (block == NULL)
diff --git a/src/backend/utils/mmgr/slab.c b/src/backend/utils/mmgr/slab.c
index 72376da82e..364c9eb795 100644
--- a/src/backend/utils/mmgr/slab.c
+++ b/src/backend/utils/mmgr/slab.c
@@ -182,6 +182,10 @@ SlabContextCreate(MemoryContext parent,
headerSize += chunksPerBlock * sizeof(bool);
#endif
+ /* Do not exceed maximum allowed memory allocation */
+ if (exceeds_max_total_bkend_mem(headerSize))
+ return NULL;
+
slab = (SlabContext *) malloc(headerSize);
if (slab == NULL)
{
@@ -336,6 +340,10 @@ SlabAlloc(MemoryContext context, Size size)
*/
if (slab->minFreeChunks == 0)
{
+ /* Do not exceed maximum allowed memory allocation */
+ if (exceeds_max_total_bkend_mem(slab->blockSize))
+ return NULL;
+
block = (SlabBlock *) malloc(slab->blockSize);
if (block == NULL)
diff --git a/src/include/utils/backend_status.h b/src/include/utils/backend_status.h
index 9bdc4197bd..3b940ff98e 100644
--- a/src/include/utils/backend_status.h
+++ b/src/include/utils/backend_status.h
@@ -270,6 +270,7 @@ typedef struct LocalPgBackendStatus
*/
extern PGDLLIMPORT bool pgstat_track_activities;
extern PGDLLIMPORT int pgstat_track_activity_query_size;
+extern PGDLLIMPORT int max_total_bkend_mem;
/* ----------
@@ -321,6 +322,7 @@ extern int pgstat_fetch_stat_numbackends(void);
extern PgBackendStatus *pgstat_fetch_stat_beentry(int beid);
extern LocalPgBackendStatus *pgstat_fetch_stat_local_beentry(int beid);
extern char *pgstat_clip_activity(const char *raw_activity);
+extern bool exceeds_max_total_bkend_mem(uint64 allocation_request);
#endif /* BACKEND_STATUS_H */
--
2.25.1
Hello Reid,
could you rebase the patch again? It doesn't apply currently (http://cfbot.cputube.org/patch_40_3867.log). Thanks!
You mention, that you want to prevent the compiler from getting cute.
I don't think this comments are exactly helpful in the current state. I think probably fine to just omit them.
I don't understand the purpose of the result variable in exceeds_max_total_bkend_mem. What purpose does it serve?
I really like the simplicity of the suggestion here to prevent oom.
I intent to play around with a lot of backends, once I get a rebased patch.
Regards
Arne
________________________________
From: Reid Thompson <reid.thompson@crunchydata.com>
Sent: Thursday, September 15, 2022 4:58:19 PM
To: Ibrar Ahmed; pgsql-hackers@lists.postgresql.org
Cc: reid.thompson@crunchydata.com; Justin Pryzby
Subject: Re: Add the ability to limit the amount of memory that can be allocated to backends.
On Thu, 2022-09-15 at 12:07 +0400, Ibrar Ahmed wrote:
The patch does not apply; please rebase the patch.
patching file src/backend/utils/misc/guc.c
Hunk #1 FAILED at 3664.
1 out of 1 hunk FAILED -- saving rejects to file
src/backend/utils/misc/guc.c.rejpatching file src/backend/utils/misc/postgresql.conf.sample
rebased patches attached.
Thanks,
Reid
Hi Arne,
On Mon, 2022-10-24 at 15:27 +0000, Arne Roland wrote:
Hello Reid,
could you rebase the patch again? It doesn't apply currently
(http://cfbot.cputube.org/patch_40_3867.log). Thanks!
rebased patches attached.
You mention, that you want to prevent the compiler from getting
cute.I don't think this comments are exactly helpful in the current
state. I think probably fine to just omit them.
I attempted to follow previous convention when adding code and these
comments have been consistently applied throughout backend_status.c
where a volatile pointer is being used.
I don't understand the purpose of the result variable in
exceeds_max_total_bkend_mem. What purpose does it serve?I really like the simplicity of the suggestion here to prevent oom.
If max_total_backend_memory is configured, exceeds_max_total_bkend_mem()
will return true if an allocation request will push total backend memory
allocated over the configured value.
exceeds_max_total_bkend_mem() is implemented in the various allocators
along the lines of
...snip...
/* Do not exceed maximum allowed memory allocation */
if (exceeds_max_total_bkend_mem('new request size'))
return NULL;
...snip...
Do not allocate the memory requested, return NULL instead. PG already
had code in place to handle NULL returns from allocation requests.
The allocation code in aset.c, slab.c, generation.c, dsm_impl.c utilizes
exceeds_max_total_bkend_mem()
max_total_backend_memory (integer)
Specifies a limit to the amount of memory (MB) that may be allocated
to backends in total (i.e. this is not a per user or per backend limit).
If unset, or set to 0 it is disabled. A backend request that would push
the total over the limit will be denied with an out of memory error
causing that backend's current query/transaction to fail. Due to the
dynamic nature of memory allocations, this limit is not exact. If within
1.5MB of the limit and two backends request 1MB each at the same time
both may be allocated, and exceed the limit. Further requests will not
be allocated until dropping below the limit. Keep this in mind when
setting this value. This limit does not affect auxiliary backend
processes Auxiliary process . Backend memory allocations
(backend_mem_allocated) are displayed in the pg_stat_activity view.
Show quoted text
I intent to play around with a lot of backends, once I get a rebased
patch.Regards
Arne
Attachments:
0001-Add-tracking-of-backend-memory-allocated-to-pg_stat_.patchtext/x-patch; charset=UTF-8; name=0001-Add-tracking-of-backend-memory-allocated-to-pg_stat_.patchDownload
From ab654a48ec7bfbc3bc377c5757a04f1756e72e79 Mon Sep 17 00:00:00 2001
From: Reid Thompson <jreidthompson@nc.rr.com>
Date: Thu, 11 Aug 2022 12:01:25 -0400
Subject: [PATCH 1/2] Add tracking of backend memory allocated to
pg_stat_activity
This new field displays the current bytes of memory allocated to the
backend process. It is updated as memory for the process is
malloc'd/free'd. Memory allocated to items on the freelist is included in
the displayed value. Dynamic shared memory allocations are included
only in the value displayed for the backend that created them, they are
not included in the value for backends that are attached to them to
avoid double counting. On occasion, orphaned memory segments may be
cleaned up on postmaster startup. This may result in decreasing the sum
without a prior increment. We limit the floor of backend_mem_allocated
to zero. Updated pg_stat_activity documentation for the new column.
---
doc/src/sgml/monitoring.sgml | 12 +++
src/backend/catalog/system_views.sql | 1 +
src/backend/storage/ipc/dsm_impl.c | 80 +++++++++++++++
src/backend/utils/activity/backend_status.c | 105 ++++++++++++++++++++
src/backend/utils/adt/pgstatfuncs.c | 4 +-
src/backend/utils/mmgr/aset.c | 18 ++++
src/backend/utils/mmgr/generation.c | 15 +++
src/backend/utils/mmgr/slab.c | 21 ++++
src/include/catalog/pg_proc.dat | 6 +-
src/include/utils/backend_status.h | 7 +-
src/test/regress/expected/rules.out | 9 +-
11 files changed, 269 insertions(+), 9 deletions(-)
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index e5d622d514..4983bbc814 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -947,6 +947,18 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser
</para></entry>
</row>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>backend_mem_allocated</structfield> <type>bigint</type>
+ </para>
+ <para>
+ The byte count of memory allocated to this backend. Dynamic shared memory
+ allocations are included only in the value displayed for the backend that
+ created them, they are not included in the value for backends that are
+ attached to them to avoid double counting.
+ </para></entry>
+ </row>
+
<row>
<entry role="catalog_table_entry"><para role="column_definition">
<structfield>query</structfield> <type>text</type>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 2d8104b090..cbf804625c 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -865,6 +865,7 @@ CREATE VIEW pg_stat_activity AS
S.backend_xid,
s.backend_xmin,
S.query_id,
+ S.backend_mem_allocated,
S.query,
S.backend_type
FROM pg_stat_get_activity(NULL) AS S
diff --git a/src/backend/storage/ipc/dsm_impl.c b/src/backend/storage/ipc/dsm_impl.c
index e1b90c5de4..3356bb65b5 100644
--- a/src/backend/storage/ipc/dsm_impl.c
+++ b/src/backend/storage/ipc/dsm_impl.c
@@ -66,6 +66,7 @@
#include "postmaster/postmaster.h"
#include "storage/dsm_impl.h"
#include "storage/fd.h"
+#include "utils/backend_status.h"
#include "utils/guc.h"
#include "utils/memutils.h"
@@ -232,6 +233,13 @@ dsm_impl_posix(dsm_op op, dsm_handle handle, Size request_size,
name)));
return false;
}
+ /*
+ * Detach and destroy pass through here, only decrease the memory
+ * shown allocated in pg_stat_activity when the creator destroys the
+ * allocation.
+ */
+ if (op == DSM_OP_DESTROY)
+ pgstat_report_backend_mem_allocated_decrease(*mapped_size);
*mapped_address = NULL;
*mapped_size = 0;
if (op == DSM_OP_DESTROY && shm_unlink(name) != 0)
@@ -332,6 +340,36 @@ dsm_impl_posix(dsm_op op, dsm_handle handle, Size request_size,
name)));
return false;
}
+
+ /*
+ * Attach and create pass through here, only update backend memory
+ * allocated in pg_stat_activity for the creator process.
+ */
+ if (op == DSM_OP_CREATE)
+ {
+ /*
+ * Posix creation calls dsm_impl_posix_resize implying that resizing
+ * occurs or may be added in the future. As implemented
+ * dsm_impl_posix_resize utilizes fallocate or truncate, passing the
+ * whole new size as input, growing the allocation as needed (only
+ * truncate supports shrinking). We update by replacing the old
+ * allocation with the new.
+ */
+#if defined(HAVE_POSIX_FALLOCATE) && defined(__linux__)
+ /*
+ * posix_fallocate does not shrink allocations, adjust only on
+ * allocation increase.
+ */
+ if (request_size > *mapped_size)
+ {
+ pgstat_report_backend_mem_allocated_decrease(*mapped_size);
+ pgstat_report_backend_mem_allocated_increase(request_size);
+ }
+#else
+ pgstat_report_backend_mem_allocated_decrease(*mapped_size);
+ pgstat_report_backend_mem_allocated_increase(request_size);
+#endif
+ }
*mapped_address = address;
*mapped_size = request_size;
close(fd);
@@ -537,6 +575,14 @@ dsm_impl_sysv(dsm_op op, dsm_handle handle, Size request_size,
name)));
return false;
}
+
+ /*
+ * Detach and destroy pass through here, only decrease the memory
+ * shown allocated in pg_stat_activity when the creator destroys the
+ * allocation.
+ */
+ if (op == DSM_OP_DESTROY)
+ pgstat_report_backend_mem_allocated_decrease(*mapped_size);
*mapped_address = NULL;
*mapped_size = 0;
if (op == DSM_OP_DESTROY && shmctl(ident, IPC_RMID, NULL) < 0)
@@ -584,6 +630,13 @@ dsm_impl_sysv(dsm_op op, dsm_handle handle, Size request_size,
name)));
return false;
}
+
+ /*
+ * Attach and create pass through here, only update backend memory
+ * allocated in pg_stat_activity for the creator process.
+ */
+ if (op == DSM_OP_CREATE)
+ pgstat_report_backend_mem_allocated_increase(request_size);
*mapped_address = address;
*mapped_size = request_size;
@@ -652,6 +705,13 @@ dsm_impl_windows(dsm_op op, dsm_handle handle, Size request_size,
return false;
}
+ /*
+ * Detach and destroy pass through here, only decrease the memory
+ * shown allocated in pg_stat_activity when the creator destroys the
+ * allocation.
+ */
+ if (op == DSM_OP_DESTROY)
+ pgstat_report_backend_mem_allocated_decrease(*mapped_size);
*impl_private = NULL;
*mapped_address = NULL;
*mapped_size = 0;
@@ -768,6 +828,12 @@ dsm_impl_windows(dsm_op op, dsm_handle handle, Size request_size,
return false;
}
+ /*
+ * Attach and create pass through here, only update backend memory
+ * allocated in pg_stat_activity for the creator process.
+ */
+ if (op == DSM_OP_CREATE)
+ pgstat_report_backend_mem_allocated_increase(info.RegionSize);
*mapped_address = address;
*mapped_size = info.RegionSize;
*impl_private = hmap;
@@ -812,6 +878,13 @@ dsm_impl_mmap(dsm_op op, dsm_handle handle, Size request_size,
name)));
return false;
}
+
+ /*
+ * Detach and destroy pass through here, only decrease the memory
+ * shown allocated in pg_stat_activity when the creator destroys the
+ * allocation.
+ */
+ pgstat_report_backend_mem_allocated_decrease(*mapped_size);
*mapped_address = NULL;
*mapped_size = 0;
if (op == DSM_OP_DESTROY && unlink(name) != 0)
@@ -933,6 +1006,13 @@ dsm_impl_mmap(dsm_op op, dsm_handle handle, Size request_size,
name)));
return false;
}
+
+ /*
+ * Attach and create pass through here, only update backend memory
+ * allocated in pg_stat_activity for the creator process.
+ */
+ if (op == DSM_OP_CREATE)
+ pgstat_report_backend_mem_allocated_increase(request_size);
*mapped_address = address;
*mapped_size = request_size;
diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c
index 1146a6c33c..5c33824eb8 100644
--- a/src/backend/utils/activity/backend_status.c
+++ b/src/backend/utils/activity/backend_status.c
@@ -49,6 +49,8 @@ int pgstat_track_activity_query_size = 1024;
/* exposed so that backend_progress.c can access it */
PgBackendStatus *MyBEEntry = NULL;
+/* Memory allocated to this backend prior to pgstats initialization */
+uint64 backend_mem_allocated = 0;
static PgBackendStatus *BackendStatusArray = NULL;
static char *BackendAppnameBuffer = NULL;
@@ -400,6 +402,13 @@ pgstat_bestart(void)
lbeentry.st_progress_command_target = InvalidOid;
lbeentry.st_query_id = UINT64CONST(0);
+ /*
+ * Move sum of memory allocated prior to pgstats initialization to pgstats
+ * and zero the local variable.
+ */
+ lbeentry.backend_mem_allocated = backend_mem_allocated;
+ backend_mem_allocated = 0;
+
/*
* we don't zero st_progress_param here to save cycles; nobody should
* examine it until st_progress_command has been set to something other
@@ -1191,3 +1200,99 @@ pgstat_clip_activity(const char *raw_activity)
return activity;
}
+
+/* --------
+ * pgstat_report_backend_mem_allocated_increase() -
+ *
+ * Called to report increase in memory allocated for this backend
+ * --------
+ */
+void
+pgstat_report_backend_mem_allocated_increase(uint64 allocation)
+{
+ volatile PgBackendStatus *beentry = MyBEEntry;
+
+ if (!beentry || !pgstat_track_activities)
+ {
+ /*
+ * Account for memory before pgstats is initialized. This will be
+ * migrated to pgstats on initialization.
+ */
+ backend_mem_allocated += allocation;
+
+ return;
+ }
+
+ /*
+ * Update my status entry, following the protocol of bumping
+ * st_changecount before and after. We use a volatile pointer here to
+ * ensure the compiler doesn't try to get cute.
+ */
+ PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
+ beentry->backend_mem_allocated += allocation;
+ PGSTAT_END_WRITE_ACTIVITY(beentry);
+}
+
+/* --------
+ * pgstat_report_backend_mem_allocated_decrease() -
+ *
+ * Called to report decrease in memory allocated for this backend
+ * --------
+ */
+void
+pgstat_report_backend_mem_allocated_decrease(uint64 deallocation)
+{
+ volatile PgBackendStatus *beentry = MyBEEntry;
+
+ /*
+ * Cases may occur where shared memory from a previous postmaster
+ * invocation still exist. These are cleaned up at startup by
+ * dsm_cleanup_using_control_segment. Limit decreasing memory allocated to
+ * zero in case no corresponding prior increase exists or decrease has
+ * already been accounted for.
+ */
+
+ if (!beentry || !pgstat_track_activities)
+ {
+ /*
+ * Account for memory before pgstats is initialized. This will be
+ * migrated to pgstats on initialization. Do not allow
+ * backend_mem_allocated to go below zero. If pgstats has not been
+ * initialized, we are in startup and we set backend_mem_allocated to
+ * zero in cases where it would go negative and skip generating an
+ * ereport.
+ */
+ if (deallocation > backend_mem_allocated)
+ backend_mem_allocated = 0;
+ else
+ backend_mem_allocated -= deallocation;
+
+ return;
+ }
+
+ /*
+ * Do not allow backend_mem_allocated to go below zero. ereport if we
+ * would have. There's no need for a lock around the read here as it's
+ * being referenced from the same backend which means that there shouldn't
+ * be concurrent writes. We want to generate an ereport in these cases.
+ */
+ if (deallocation > beentry->backend_mem_allocated)
+ {
+ ereport(LOG, errmsg("decrease reduces reported backend memory allocated below zero; setting reported to 0"));
+
+ /*
+ * Overwrite deallocation with current backend_mem_allocated so we end
+ * up at zero.
+ */
+ deallocation = beentry->backend_mem_allocated;
+ }
+
+ /*
+ * Update my status entry, following the protocol of bumping
+ * st_changecount before and after. We use a volatile pointer here to
+ * ensure the compiler doesn't try to get cute.
+ */
+ PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
+ beentry->backend_mem_allocated -= deallocation;
+ PGSTAT_END_WRITE_ACTIVITY(beentry);
+}
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 96bffc0f2a..692ed1df18 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -553,7 +553,7 @@ pg_stat_get_progress_info(PG_FUNCTION_ARGS)
Datum
pg_stat_get_activity(PG_FUNCTION_ARGS)
{
-#define PG_STAT_GET_ACTIVITY_COLS 30
+#define PG_STAT_GET_ACTIVITY_COLS 31
int num_backends = pgstat_fetch_stat_numbackends();
int curr_backend;
int pid = PG_ARGISNULL(0) ? -1 : PG_GETARG_INT32(0);
@@ -609,6 +609,8 @@ pg_stat_get_activity(PG_FUNCTION_ARGS)
else
nulls[16] = true;
+ values[30] = UInt64GetDatum(beentry->backend_mem_allocated);
+
/* Values only available to role member or pg_read_all_stats */
if (HAS_PGSTAT_PERMISSIONS(beentry->st_userid))
{
diff --git a/src/backend/utils/mmgr/aset.c b/src/backend/utils/mmgr/aset.c
index db402e3a41..f6d1333f3d 100644
--- a/src/backend/utils/mmgr/aset.c
+++ b/src/backend/utils/mmgr/aset.c
@@ -47,6 +47,7 @@
#include "postgres.h"
#include "port/pg_bitutils.h"
+#include "utils/backend_status.h"
#include "utils/memdebug.h"
#include "utils/memutils.h"
#include "utils/memutils_memorychunk.h"
@@ -521,6 +522,7 @@ AllocSetContextCreateInternal(MemoryContext parent,
name);
((MemoryContext) set)->mem_allocated = firstBlockSize;
+ pgstat_report_backend_mem_allocated_increase(firstBlockSize);
return (MemoryContext) set;
}
@@ -543,6 +545,7 @@ AllocSetReset(MemoryContext context)
AllocSet set = (AllocSet) context;
AllocBlock block;
Size keepersize PG_USED_FOR_ASSERTS_ONLY;
+ uint64 deallocation = 0;
AssertArg(AllocSetIsValid(set));
@@ -585,6 +588,7 @@ AllocSetReset(MemoryContext context)
{
/* Normal case, release the block */
context->mem_allocated -= block->endptr - ((char *) block);
+ deallocation += block->endptr - ((char *) block);
#ifdef CLOBBER_FREED_MEMORY
wipe_mem(block, block->freeptr - ((char *) block));
@@ -595,6 +599,7 @@ AllocSetReset(MemoryContext context)
}
Assert(context->mem_allocated == keepersize);
+ pgstat_report_backend_mem_allocated_decrease(deallocation);
/* Reset block size allocation sequence, too */
set->nextBlockSize = set->initBlockSize;
@@ -613,6 +618,7 @@ AllocSetDelete(MemoryContext context)
AllocSet set = (AllocSet) context;
AllocBlock block = set->blocks;
Size keepersize PG_USED_FOR_ASSERTS_ONLY;
+ uint64 deallocation = 0;
AssertArg(AllocSetIsValid(set));
@@ -651,11 +657,13 @@ AllocSetDelete(MemoryContext context)
freelist->first_free = (AllocSetContext *) oldset->header.nextchild;
freelist->num_free--;
+ deallocation += oldset->header.mem_allocated;
/* All that remains is to free the header/initial block */
free(oldset);
}
Assert(freelist->num_free == 0);
+ pgstat_report_backend_mem_allocated_decrease(deallocation);
}
/* Now add the just-deleted context to the freelist. */
@@ -672,7 +680,10 @@ AllocSetDelete(MemoryContext context)
AllocBlock next = block->next;
if (block != set->keeper)
+ {
context->mem_allocated -= block->endptr - ((char *) block);
+ deallocation += block->endptr - ((char *) block);
+ }
#ifdef CLOBBER_FREED_MEMORY
wipe_mem(block, block->freeptr - ((char *) block));
@@ -685,6 +696,8 @@ AllocSetDelete(MemoryContext context)
}
Assert(context->mem_allocated == keepersize);
+ pgstat_report_backend_mem_allocated_decrease(deallocation +
+ context->mem_allocated);
/* Finally, free the context header, including the keeper block */
free(set);
@@ -734,6 +747,7 @@ AllocSetAlloc(MemoryContext context, Size size)
return NULL;
context->mem_allocated += blksize;
+ pgstat_report_backend_mem_allocated_increase(blksize);
block->aset = set;
block->freeptr = block->endptr = ((char *) block) + blksize;
@@ -944,6 +958,7 @@ AllocSetAlloc(MemoryContext context, Size size)
return NULL;
context->mem_allocated += blksize;
+ pgstat_report_backend_mem_allocated_increase(blksize);
block->aset = set;
block->freeptr = ((char *) block) + ALLOC_BLOCKHDRSZ;
@@ -1043,6 +1058,7 @@ AllocSetFree(void *pointer)
block->next->prev = block->prev;
set->header.mem_allocated -= block->endptr - ((char *) block);
+ pgstat_report_backend_mem_allocated_decrease(block->endptr - ((char *) block));
#ifdef CLOBBER_FREED_MEMORY
wipe_mem(block, block->freeptr - ((char *) block));
@@ -1173,7 +1189,9 @@ AllocSetRealloc(void *pointer, Size size)
/* updated separately, not to underflow when (oldblksize > blksize) */
set->header.mem_allocated -= oldblksize;
+ pgstat_report_backend_mem_allocated_decrease(oldblksize);
set->header.mem_allocated += blksize;
+ pgstat_report_backend_mem_allocated_increase(blksize);
block->freeptr = block->endptr = ((char *) block) + blksize;
diff --git a/src/backend/utils/mmgr/generation.c b/src/backend/utils/mmgr/generation.c
index 4cb75f493f..7bb9175f6d 100644
--- a/src/backend/utils/mmgr/generation.c
+++ b/src/backend/utils/mmgr/generation.c
@@ -37,6 +37,7 @@
#include "lib/ilist.h"
#include "port/pg_bitutils.h"
+#include "utils/backend_status.h"
#include "utils/memdebug.h"
#include "utils/memutils.h"
#include "utils/memutils_memorychunk.h"
@@ -267,6 +268,7 @@ GenerationContextCreate(MemoryContext parent,
name);
((MemoryContext) set)->mem_allocated = firstBlockSize;
+ pgstat_report_backend_mem_allocated_increase(firstBlockSize);
return (MemoryContext) set;
}
@@ -283,6 +285,7 @@ GenerationReset(MemoryContext context)
{
GenerationContext *set = (GenerationContext *) context;
dlist_mutable_iter miter;
+ uint64 deallocation = 0;
AssertArg(GenerationIsValid(set));
@@ -305,9 +308,14 @@ GenerationReset(MemoryContext context)
if (block == set->keeper)
GenerationBlockMarkEmpty(block);
else
+ {
+ deallocation += block->blksize;
GenerationBlockFree(set, block);
+ }
}
+ pgstat_report_backend_mem_allocated_decrease(deallocation);
+
/* set it so new allocations to make use of the keeper block */
set->block = set->keeper;
@@ -328,6 +336,9 @@ GenerationDelete(MemoryContext context)
{
/* Reset to release all releasable GenerationBlocks */
GenerationReset(context);
+
+ pgstat_report_backend_mem_allocated_decrease(context->mem_allocated);
+
/* And free the context header and keeper block */
free(context);
}
@@ -374,6 +385,7 @@ GenerationAlloc(MemoryContext context, Size size)
return NULL;
context->mem_allocated += blksize;
+ pgstat_report_backend_mem_allocated_increase(blksize);
/* block with a single (used) chunk */
block->context = set;
@@ -477,6 +489,7 @@ GenerationAlloc(MemoryContext context, Size size)
return NULL;
context->mem_allocated += blksize;
+ pgstat_report_backend_mem_allocated_increase(blksize);
/* initialize the new block */
GenerationBlockInit(set, block, blksize);
@@ -726,6 +739,8 @@ GenerationFree(void *pointer)
dlist_delete(&block->node);
set->header.mem_allocated -= block->blksize;
+ pgstat_report_backend_mem_allocated_decrease(block->blksize);
+
free(block);
}
diff --git a/src/backend/utils/mmgr/slab.c b/src/backend/utils/mmgr/slab.c
index 1a0b28f9ea..efdc2736c1 100644
--- a/src/backend/utils/mmgr/slab.c
+++ b/src/backend/utils/mmgr/slab.c
@@ -53,6 +53,7 @@
#include "postgres.h"
#include "lib/ilist.h"
+#include "utils/backend_status.h"
#include "utils/memdebug.h"
#include "utils/memutils.h"
#include "utils/memutils_memorychunk.h"
@@ -238,6 +239,12 @@ SlabContextCreate(MemoryContext parent,
parent,
name);
+ /*
+ * If SlabContextCreate is updated to add headerSize to
+ * context->mem_allocated, then update here and SlabDelete appropriately
+ */
+ pgstat_report_backend_mem_allocated_increase(headerSize);
+
return (MemoryContext) slab;
}
@@ -253,6 +260,7 @@ SlabReset(MemoryContext context)
{
SlabContext *slab = (SlabContext *) context;
int i;
+ uint64 deallocation = 0;
AssertArg(SlabIsValid(slab));
@@ -278,9 +286,11 @@ SlabReset(MemoryContext context)
free(block);
slab->nblocks--;
context->mem_allocated -= slab->blockSize;
+ deallocation += slab->blockSize;
}
}
+ pgstat_report_backend_mem_allocated_decrease(deallocation);
slab->minFreeChunks = 0;
Assert(slab->nblocks == 0);
@@ -294,8 +304,17 @@ SlabReset(MemoryContext context)
void
SlabDelete(MemoryContext context)
{
+ /*
+ * Until header allocation is included in context->mem_allocated, cast to
+ * slab and decrement the headerSize
+ */
+ SlabContext *slab = castNode(SlabContext, context);
+
/* Reset to release all the SlabBlocks */
SlabReset(context);
+
+ pgstat_report_backend_mem_allocated_decrease(slab->headerSize);
+
/* And free the context header */
free(context);
}
@@ -364,6 +383,7 @@ SlabAlloc(MemoryContext context, Size size)
slab->minFreeChunks = slab->chunksPerBlock;
slab->nblocks += 1;
context->mem_allocated += slab->blockSize;
+ pgstat_report_backend_mem_allocated_increase(slab->blockSize);
}
/* grab the block from the freelist (even the new block is there) */
@@ -537,6 +557,7 @@ SlabFree(void *pointer)
free(block);
slab->nblocks--;
slab->header.mem_allocated -= slab->blockSize;
+ pgstat_report_backend_mem_allocated_decrease(slab->blockSize);
}
else
dlist_push_head(&slab->freelist[block->nfree], &block->node);
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 62a5b8e655..737a7c5034 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5347,9 +5347,9 @@
proname => 'pg_stat_get_activity', prorows => '100', proisstrict => 'f',
proretset => 't', provolatile => 's', proparallel => 'r',
prorettype => 'record', proargtypes => 'int4',
- proallargtypes => '{int4,oid,int4,oid,text,text,text,text,text,timestamptz,timestamptz,timestamptz,timestamptz,inet,text,int4,xid,xid,text,bool,text,text,int4,text,numeric,text,bool,text,bool,int4,int8}',
- proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
- proargnames => '{pid,datid,pid,usesysid,application_name,state,query,wait_event_type,wait_event,xact_start,query_start,backend_start,state_change,client_addr,client_hostname,client_port,backend_xid,backend_xmin,backend_type,ssl,sslversion,sslcipher,sslbits,ssl_client_dn,ssl_client_serial,ssl_issuer_dn,gss_auth,gss_princ,gss_enc,leader_pid,query_id}',
+ proallargtypes => '{int4,oid,int4,oid,text,text,text,text,text,timestamptz,timestamptz,timestamptz,timestamptz,inet,text,int4,xid,xid,text,bool,text,text,int4,text,numeric,text,bool,text,bool,int4,int8,int8}',
+ proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{pid,datid,pid,usesysid,application_name,state,query,wait_event_type,wait_event,xact_start,query_start,backend_start,state_change,client_addr,client_hostname,client_port,backend_xid,backend_xmin,backend_type,ssl,sslversion,sslcipher,sslbits,ssl_client_dn,ssl_client_serial,ssl_issuer_dn,gss_auth,gss_princ,gss_enc,leader_pid,query_id,backend_mem_allocated}',
prosrc => 'pg_stat_get_activity' },
{ oid => '3318',
descr => 'statistics: information about progress of backends running maintenance command',
diff --git a/src/include/utils/backend_status.h b/src/include/utils/backend_status.h
index b582b46e9f..d59a1d50f6 100644
--- a/src/include/utils/backend_status.h
+++ b/src/include/utils/backend_status.h
@@ -169,6 +169,9 @@ typedef struct PgBackendStatus
/* query identifier, optionally computed using post_parse_analyze_hook */
uint64 st_query_id;
+
+ /* Current memory allocated to this backend */
+ uint64 backend_mem_allocated;
} PgBackendStatus;
@@ -313,7 +316,9 @@ extern const char *pgstat_get_backend_current_activity(int pid, bool checkUser);
extern const char *pgstat_get_crashed_backend_activity(int pid, char *buffer,
int buflen);
extern uint64 pgstat_get_my_query_id(void);
-
+extern void pgstat_report_backend_mem_allocated_increase(uint64 allocation);
+extern void pgstat_report_backend_mem_allocated_decrease(uint64 deallocation);
+extern uint64 pgstat_get_all_backend_memory_allocated(void);
/* ----------
* Support functions for the SQL-callable functions to
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index bfcd8ac9a0..36947c636d 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1752,9 +1752,10 @@ pg_stat_activity| SELECT s.datid,
s.backend_xid,
s.backend_xmin,
s.query_id,
+ s.backend_mem_allocated,
s.query,
s.backend_type
- FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, query_id)
+ FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, query_id, backend_mem_allocated)
LEFT JOIN pg_database d ON ((s.datid = d.oid)))
LEFT JOIN pg_authid u ON ((s.usesysid = u.oid)));
pg_stat_all_indexes| SELECT c.oid AS relid,
@@ -1869,7 +1870,7 @@ pg_stat_gssapi| SELECT s.pid,
s.gss_auth AS gss_authenticated,
s.gss_princ AS principal,
s.gss_enc AS encrypted
- FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, query_id)
+ FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, query_id, backend_mem_allocated)
WHERE (s.client_port IS NOT NULL);
pg_stat_progress_analyze| SELECT s.pid,
s.datid,
@@ -2050,7 +2051,7 @@ pg_stat_replication| SELECT s.pid,
w.sync_priority,
w.sync_state,
w.reply_time
- FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, query_id)
+ FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, query_id, backend_mem_allocated)
JOIN pg_stat_get_wal_senders() w(pid, state, sent_lsn, write_lsn, flush_lsn, replay_lsn, write_lag, flush_lag, replay_lag, sync_priority, sync_state, reply_time) ON ((s.pid = w.pid)))
LEFT JOIN pg_authid u ON ((s.usesysid = u.oid)));
pg_stat_replication_slots| SELECT s.slot_name,
@@ -2084,7 +2085,7 @@ pg_stat_ssl| SELECT s.pid,
s.ssl_client_dn AS client_dn,
s.ssl_client_serial AS client_serial,
s.ssl_issuer_dn AS issuer_dn
- FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, query_id)
+ FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, query_id, backend_mem_allocated)
WHERE (s.client_port IS NOT NULL);
pg_stat_subscription| SELECT su.oid AS subid,
su.subname,
--
2.25.1
0002-Add-the-ability-to-limit-the-amount-of-memory-that-c.patchtext/x-patch; charset=UTF-8; name=0002-Add-the-ability-to-limit-the-amount-of-memory-that-c.patchDownload
From 7c6735a34d22fde1a5103bd962d0a45f1322e87d Mon Sep 17 00:00:00 2001
From: Reid Thompson <jreidthompson@nc.rr.com>
Date: Sat, 4 Jun 2022 22:23:59 -0400
Subject: [PATCH 2/2] Add the ability to limit the amount of memory that can be
allocated to backends.
This builds on the work that adds backend memory allocated to pg_stat_activity.
Add GUC variable max_total_backend_memory.
Specifies a limit to the amount of memory (in MB) that may be allocated to
backends in total (i.e. this is not a per user or per backend limit). If unset,
or set to 0 it is disabled. It is intended as a resource to help avoid the OOM
killer on LINUX and manage resources in general. A backend request that would
push the total over the limit will be denied with an out of memory error causing
that backend's current query/transaction to fail. Due to the dynamic nature of
memory allocations, this limit is not exact. If within 1.5MB of the limit and
two backends request 1MB each at the same time both may be allocated, and exceed
the limit. Further requests will not be allocated until dropping below the
limit. Keep this in mind when setting this value. This limit does not affect
auxiliary backend processes. Backend memory allocations are displayed in the
pg_stat_activity view.
---
doc/src/sgml/config.sgml | 26 ++++
src/backend/storage/ipc/dsm_impl.c | 12 ++
src/backend/utils/activity/backend_status.c | 111 +++++++++++++++++-
src/backend/utils/misc/guc_tables.c | 11 ++
src/backend/utils/misc/postgresql.conf.sample | 3 +
src/backend/utils/mmgr/aset.c | 17 +++
src/backend/utils/mmgr/generation.c | 11 +-
src/backend/utils/mmgr/slab.c | 8 ++
src/include/utils/backend_status.h | 2 +
9 files changed, 199 insertions(+), 2 deletions(-)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 559eb898a9..4d22491a7e 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2079,6 +2079,32 @@ include_dir 'conf.d'
</listitem>
</varlistentry>
+ <varlistentry id="guc-max-total-backend-memory" xreflabel="max_total_backend_memory">
+ <term><varname>max_total_backend_memory</varname> (<type>integer</type>)
+ <indexterm>
+ <primary><varname>max_total_backend_memory</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Specifies a limit to the amount of memory (MB) that may be allocated to
+ backends in total (i.e. this is not a per user or per backend limit).
+ If unset, or set to 0 it is disabled. A backend request that would
+ push the total over the limit will be denied with an out of memory
+ error causing that backend's current query/transaction to fail. Due to
+ the dynamic nature of memory allocations, this limit is not exact. If
+ within 1.5MB of the limit and two backends request 1MB each at the same
+ time both may be allocated, and exceed the limit. Further requests will
+ not be allocated until dropping below the limit. Keep this in mind when
+ setting this value. This limit does not affect auxiliary backend
+ processes <xref linkend="glossary-auxiliary-proc"/> . Backend memory
+ allocations (<varname>backend_mem_allocated</varname>) are displayed in
+ the <link linkend="monitoring-pg-stat-activity-view"><structname>pg_stat_activity</structname></link>
+ view.
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
</sect2>
diff --git a/src/backend/storage/ipc/dsm_impl.c b/src/backend/storage/ipc/dsm_impl.c
index 3356bb65b5..cc061056a3 100644
--- a/src/backend/storage/ipc/dsm_impl.c
+++ b/src/backend/storage/ipc/dsm_impl.c
@@ -253,6 +253,10 @@ dsm_impl_posix(dsm_op op, dsm_handle handle, Size request_size,
return true;
}
+ /* Do not exceed maximum allowed memory allocation */
+ if (op == DSM_OP_CREATE && exceeds_max_total_bkend_mem(request_size))
+ return false;
+
/*
* Create new segment or open an existing one for attach.
*
@@ -524,6 +528,10 @@ dsm_impl_sysv(dsm_op op, dsm_handle handle, Size request_size,
int flags = IPCProtection;
size_t segsize;
+ /* Do not exceed maximum allowed memory allocation */
+ if (op == DSM_OP_CREATE && exceeds_max_total_bkend_mem(request_size))
+ return false;
+
/*
* Allocate the memory BEFORE acquiring the resource, so that we don't
* leak the resource if memory allocation fails.
@@ -718,6 +726,10 @@ dsm_impl_windows(dsm_op op, dsm_handle handle, Size request_size,
return true;
}
+ /* Do not exceed maximum allowed memory allocation */
+ if (op == DSM_OP_CREATE && exceeds_max_total_bkend_mem(request_size))
+ return false;
+
/* Create new segment or open an existing one for attach. */
if (op == DSM_OP_CREATE)
{
diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c
index 5c33824eb8..872fe66188 100644
--- a/src/backend/utils/activity/backend_status.c
+++ b/src/backend/utils/activity/backend_status.c
@@ -45,6 +45,9 @@
bool pgstat_track_activities = false;
int pgstat_track_activity_query_size = 1024;
+/* Max backend memory allocation allowed (MB). 0 = disabled */
+int max_total_bkend_mem = 0;
+
/* exposed so that backend_progress.c can access it */
PgBackendStatus *MyBEEntry = NULL;
@@ -1278,7 +1281,7 @@ pgstat_report_backend_mem_allocated_decrease(uint64 deallocation)
*/
if (deallocation > beentry->backend_mem_allocated)
{
- ereport(LOG, errmsg("decrease reduces reported backend memory allocated below zero; setting reported to 0"));
+ ereport(LOG, errmsg("deallocation would decrease backend memory below zero; setting reported to 0"));
/*
* Overwrite deallocation with current backend_mem_allocated so we end
@@ -1296,3 +1299,109 @@ pgstat_report_backend_mem_allocated_decrease(uint64 deallocation)
beentry->backend_mem_allocated -= deallocation;
PGSTAT_END_WRITE_ACTIVITY(beentry);
}
+
+/* ----------
+ * pgstat_get_all_backend_memory_allocated() -
+ *
+ * Return a uint64 representing the current shared memory allocated to all
+ * backends. This looks directly at the BackendStatusArray, and so will
+ * provide current information regardless of the age of our transaction's
+ * snapshot of the status array.
+ * In the future we will likely utilize additional values - perhaps limit
+ * backend allocation by user/role, etc.
+ * ----------
+ */
+uint64
+pgstat_get_all_backend_memory_allocated(void)
+{
+ PgBackendStatus *beentry;
+ int i;
+ uint64 all_backend_memory_allocated = 0;
+
+ beentry = BackendStatusArray;
+
+ /*
+ * We probably shouldn't get here before shared memory has been set up,
+ * but be safe.
+ */
+ if (beentry == NULL || BackendActivityBuffer == NULL)
+ return 0;
+
+ /*
+ * We include AUX procs in all backend memory calculation
+ */
+ for (i = 1; i <= NumBackendStatSlots; i++)
+ {
+ /*
+ * We use a volatile pointer here to ensure the compiler doesn't try
+ * to get cute.
+ */
+ volatile PgBackendStatus *vbeentry = beentry;
+ bool found;
+ uint64 backend_mem_allocated = 0;
+
+ for (;;)
+ {
+ int before_changecount;
+ int after_changecount;
+
+ pgstat_begin_read_activity(vbeentry, before_changecount);
+
+ /*
+ * Ignore invalid entries, which may contain invalid data.
+ * See pgstat_beshutdown_hook()
+ */
+ if (vbeentry->st_procpid > 0)
+ backend_mem_allocated = vbeentry->backend_mem_allocated;
+
+ pgstat_end_read_activity(vbeentry, after_changecount);
+
+ if ((found = pgstat_read_activity_complete(before_changecount,
+ after_changecount)))
+ break;
+
+ /* Make sure we can break out of loop if stuck... */
+ CHECK_FOR_INTERRUPTS();
+ }
+
+ if (found)
+ all_backend_memory_allocated += backend_mem_allocated;
+
+ beentry++;
+ }
+
+ return all_backend_memory_allocated;
+}
+
+/*
+ * Determine if allocation request will exceed max backend memory allowed.
+ * Do not apply to auxiliary processes.
+ */
+bool
+exceeds_max_total_bkend_mem(uint64 allocation_request)
+{
+ bool result = false;
+
+ /* Exclude auxiliary processes from the check */
+ if (MyAuxProcType != NotAnAuxProcess)
+ return result;
+
+ /* Convert max_total_bkend_mem to bytes for comparison */
+ if (max_total_bkend_mem &&
+ pgstat_get_all_backend_memory_allocated() +
+ allocation_request > (uint64) max_total_bkend_mem * 1024 * 1024)
+ {
+ /*
+ * Explicitly identify the OOM being a result of this configuration
+ * parameter vs a system failure to allocate OOM.
+ */
+ ereport(WARNING,
+ errmsg("allocation would exceed max_total_backend_memory limit (%llu > %llu)",
+ (unsigned long long) pgstat_get_all_backend_memory_allocated() +
+ allocation_request, (unsigned long long) max_total_bkend_mem * 1024 * 1024));
+
+ result = true;
+ }
+
+ return result;
+}
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 05ab087934..745cc2ca9c 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -3405,6 +3405,17 @@ struct config_int ConfigureNamesInt[] =
NULL, NULL, NULL
},
+ {
+ {"max_total_backend_memory", PGC_SU_BACKEND, RESOURCES_MEM,
+ gettext_noop("Restrict total backend memory allocations to this max."),
+ gettext_noop("0 turns this feature off."),
+ GUC_UNIT_MB
+ },
+ &max_total_bkend_mem,
+ 0, 0, INT_MAX,
+ NULL, NULL, NULL
+ },
+
/* End-of-list marker */
{
{NULL, 0, 0, NULL, NULL}, NULL, 0, 0, 0, NULL, NULL, NULL
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 868d21c351..1ce0dee6d0 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -155,6 +155,9 @@
# mmap
# (change requires restart)
#min_dynamic_shared_memory = 0MB # (change requires restart)
+#max_total_backend_memory = 0MB # Restrict total backend memory allocations
+ # to this max (in MB). 0 turns this feature
+ # off.
# - Disk -
diff --git a/src/backend/utils/mmgr/aset.c b/src/backend/utils/mmgr/aset.c
index f6d1333f3d..5c659dc0fc 100644
--- a/src/backend/utils/mmgr/aset.c
+++ b/src/backend/utils/mmgr/aset.c
@@ -440,6 +440,10 @@ AllocSetContextCreateInternal(MemoryContext parent,
else
firstBlockSize = Max(firstBlockSize, initBlockSize);
+ /* Do not exceed maximum allowed memory allocation */
+ if (exceeds_max_total_bkend_mem(firstBlockSize))
+ return NULL;
+
/*
* Allocate the initial block. Unlike other aset.c blocks, it starts with
* the context header and its block header follows that.
@@ -742,6 +746,11 @@ AllocSetAlloc(MemoryContext context, Size size)
#endif
blksize = chunk_size + ALLOC_BLOCKHDRSZ + ALLOC_CHUNKHDRSZ;
+
+ /* Do not exceed maximum allowed memory allocation */
+ if (exceeds_max_total_bkend_mem(blksize))
+ return NULL;
+
block = (AllocBlock) malloc(blksize);
if (block == NULL)
return NULL;
@@ -939,6 +948,10 @@ AllocSetAlloc(MemoryContext context, Size size)
while (blksize < required_size)
blksize <<= 1;
+ /* Do not exceed maximum allowed memory allocation */
+ if (exceeds_max_total_bkend_mem(blksize))
+ return NULL;
+
/* Try to allocate it */
block = (AllocBlock) malloc(blksize);
@@ -1179,6 +1192,10 @@ AllocSetRealloc(void *pointer, Size size)
blksize = chksize + ALLOC_BLOCKHDRSZ + ALLOC_CHUNKHDRSZ;
oldblksize = block->endptr - ((char *) block);
+ /* Do not exceed maximum allowed memory allocation */
+ if (blksize > oldblksize && exceeds_max_total_bkend_mem(blksize - oldblksize))
+ return NULL;
+
block = (AllocBlock) realloc(block, blksize);
if (block == NULL)
{
diff --git a/src/backend/utils/mmgr/generation.c b/src/backend/utils/mmgr/generation.c
index 7bb9175f6d..ef2e11aefc 100644
--- a/src/backend/utils/mmgr/generation.c
+++ b/src/backend/utils/mmgr/generation.c
@@ -201,6 +201,9 @@ GenerationContextCreate(MemoryContext parent,
else
allocSize = Max(allocSize, initBlockSize);
+ if (exceeds_max_total_bkend_mem(allocSize))
+ return NULL;
+
/*
* Allocate the initial block. Unlike other generation.c blocks, it
* starts with the context header and its block header follows that.
@@ -285,7 +288,7 @@ GenerationReset(MemoryContext context)
{
GenerationContext *set = (GenerationContext *) context;
dlist_mutable_iter miter;
- uint64 deallocation = 0;
+ uint64 deallocation = 0;
AssertArg(GenerationIsValid(set));
@@ -380,6 +383,9 @@ GenerationAlloc(MemoryContext context, Size size)
{
Size blksize = required_size + Generation_BLOCKHDRSZ;
+ if (exceeds_max_total_bkend_mem(blksize))
+ return NULL;
+
block = (GenerationBlock *) malloc(blksize);
if (block == NULL)
return NULL;
@@ -483,6 +489,9 @@ GenerationAlloc(MemoryContext context, Size size)
if (blksize < required_size)
blksize = pg_nextpower2_size_t(required_size);
+ if (exceeds_max_total_bkend_mem(blksize))
+ return NULL;
+
block = (GenerationBlock *) malloc(blksize);
if (block == NULL)
diff --git a/src/backend/utils/mmgr/slab.c b/src/backend/utils/mmgr/slab.c
index efdc2736c1..6ec512defd 100644
--- a/src/backend/utils/mmgr/slab.c
+++ b/src/backend/utils/mmgr/slab.c
@@ -197,6 +197,10 @@ SlabContextCreate(MemoryContext parent,
headerSize += chunksPerBlock * sizeof(bool);
#endif
+ /* Do not exceed maximum allowed memory allocation */
+ if (exceeds_max_total_bkend_mem(headerSize))
+ return NULL;
+
slab = (SlabContext *) malloc(headerSize);
if (slab == NULL)
{
@@ -351,6 +355,10 @@ SlabAlloc(MemoryContext context, Size size)
*/
if (slab->minFreeChunks == 0)
{
+ /* Do not exceed maximum allowed memory allocation */
+ if (exceeds_max_total_bkend_mem(slab->blockSize))
+ return NULL;
+
block = (SlabBlock *) malloc(slab->blockSize);
if (block == NULL)
diff --git a/src/include/utils/backend_status.h b/src/include/utils/backend_status.h
index d59a1d50f6..0403a24eef 100644
--- a/src/include/utils/backend_status.h
+++ b/src/include/utils/backend_status.h
@@ -278,6 +278,7 @@ typedef struct LocalPgBackendStatus
*/
extern PGDLLIMPORT bool pgstat_track_activities;
extern PGDLLIMPORT int pgstat_track_activity_query_size;
+extern PGDLLIMPORT int max_total_bkend_mem;
/* ----------
@@ -329,6 +330,7 @@ extern int pgstat_fetch_stat_numbackends(void);
extern PgBackendStatus *pgstat_fetch_stat_beentry(BackendId beid);
extern LocalPgBackendStatus *pgstat_fetch_stat_local_beentry(int beid);
extern char *pgstat_clip_activity(const char *raw_activity);
+extern bool exceeds_max_total_bkend_mem(uint64 allocation_request);
#endif /* BACKEND_STATUS_H */
--
2.25.1
On Tue, 2022-10-25 at 11:49 -0400, Reid Thompson wrote:
Hi Arne,
On Mon, 2022-10-24 at 15:27 +0000, Arne Roland wrote:
Hello Reid,
could you rebase the patch again? It doesn't apply currently
(http://cfbot.cputube.org/patch_40_3867.log). Thanks!rebased patches attached.
Rebased to current. Add a couple changes per conversation with D
Christensen (include units in field name, group field with backend_xid
and backend_xmin fields in pg_stat_activity view, rather than between
query_id and query)
--
Reid Thompson
Senior Software Engineer
Crunchy Data, Inc.
reid.thompson@crunchydata.com
www.crunchydata.com
Attachments:
0002-Add-the-ability-to-limit-the-amount-of-memory-that-c.patchtext/x-patch; charset=UTF-8; name=0002-Add-the-ability-to-limit-the-amount-of-memory-that-c.patchDownload
From 9cf35c79be107feedb63f6f674ac9d2347d1875e Mon Sep 17 00:00:00 2001
From: Reid Thompson <jreidthompson@nc.rr.com>
Date: Sat, 4 Jun 2022 22:23:59 -0400
Subject: [PATCH 2/2] Add the ability to limit the amount of memory that can be
allocated to backends.
This builds on the work that adds backend memory allocated to pg_stat_activity.
Add GUC variable max_total_backend_memory.
Specifies a limit to the amount of memory (in MB) that may be allocated to
backends in total (i.e. this is not a per user or per backend limit). If unset,
or set to 0 it is disabled. It is intended as a resource to help avoid the OOM
killer on LINUX and manage resources in general. A backend request that would
push the total over the limit will be denied with an out of memory error causing
that backend's current query/transaction to fail. Due to the dynamic nature of
memory allocations, this limit is not exact. If within 1.5MB of the limit and
two backends request 1MB each at the same time both may be allocated, and exceed
the limit. Further requests will not be allocated until dropping below the
limit. Keep this in mind when setting this value. This limit does not affect
auxiliary backend processes. Backend memory allocations are displayed in the
pg_stat_activity view.
---
doc/src/sgml/config.sgml | 26 ++++
src/backend/storage/ipc/dsm_impl.c | 12 ++
src/backend/utils/activity/backend_status.c | 111 +++++++++++++++++-
src/backend/utils/misc/guc_tables.c | 11 ++
src/backend/utils/misc/postgresql.conf.sample | 3 +
src/backend/utils/mmgr/aset.c | 17 +++
src/backend/utils/mmgr/generation.c | 9 ++
src/backend/utils/mmgr/slab.c | 8 ++
src/include/utils/backend_status.h | 2 +
9 files changed, 198 insertions(+), 1 deletion(-)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 559eb898a9..5762999fa5 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2079,6 +2079,32 @@ include_dir 'conf.d'
</listitem>
</varlistentry>
+ <varlistentry id="guc-max-total-backend-memory" xreflabel="max_total_backend_memory">
+ <term><varname>max_total_backend_memory</varname> (<type>integer</type>)
+ <indexterm>
+ <primary><varname>max_total_backend_memory</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Specifies a limit to the amount of memory (MB) that may be allocated to
+ backends in total (i.e. this is not a per user or per backend limit).
+ If unset, or set to 0 it is disabled. A backend request that would
+ push the total over the limit will be denied with an out of memory
+ error causing that backend's current query/transaction to fail. Due to
+ the dynamic nature of memory allocations, this limit is not exact. If
+ within 1.5MB of the limit and two backends request 1MB each at the same
+ time both may be allocated, and exceed the limit. Further requests will
+ not be allocated until dropping below the limit. Keep this in mind when
+ setting this value. This limit does not affect auxiliary backend
+ processes <xref linkend="glossary-auxiliary-proc"/> . Backend memory
+ allocations (<varname>backend_allocated_bytes</varname>) are displayed in
+ the <link linkend="monitoring-pg-stat-activity-view"><structname>pg_stat_activity</structname></link>
+ view.
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
</sect2>
diff --git a/src/backend/storage/ipc/dsm_impl.c b/src/backend/storage/ipc/dsm_impl.c
index eae03159c3..aaf74e9486 100644
--- a/src/backend/storage/ipc/dsm_impl.c
+++ b/src/backend/storage/ipc/dsm_impl.c
@@ -254,6 +254,10 @@ dsm_impl_posix(dsm_op op, dsm_handle handle, Size request_size,
return true;
}
+ /* Do not exceed maximum allowed memory allocation */
+ if (op == DSM_OP_CREATE && exceeds_max_total_bkend_mem(request_size))
+ return false;
+
/*
* Create new segment or open an existing one for attach.
*
@@ -525,6 +529,10 @@ dsm_impl_sysv(dsm_op op, dsm_handle handle, Size request_size,
int flags = IPCProtection;
size_t segsize;
+ /* Do not exceed maximum allowed memory allocation */
+ if (op == DSM_OP_CREATE && exceeds_max_total_bkend_mem(request_size))
+ return false;
+
/*
* Allocate the memory BEFORE acquiring the resource, so that we don't
* leak the resource if memory allocation fails.
@@ -719,6 +727,10 @@ dsm_impl_windows(dsm_op op, dsm_handle handle, Size request_size,
return true;
}
+ /* Do not exceed maximum allowed memory allocation */
+ if (op == DSM_OP_CREATE && exceeds_max_total_bkend_mem(request_size))
+ return false;
+
/* Create new segment or open an existing one for attach. */
if (op == DSM_OP_CREATE)
{
diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c
index 30a89e899a..5500ed4f37 100644
--- a/src/backend/utils/activity/backend_status.c
+++ b/src/backend/utils/activity/backend_status.c
@@ -45,6 +45,9 @@
bool pgstat_track_activities = false;
int pgstat_track_activity_query_size = 1024;
+/* Max backend memory allocation allowed (MB). 0 = disabled */
+int max_total_bkend_mem = 0;
+
/* exposed so that backend_progress.c can access it */
PgBackendStatus *MyBEEntry = NULL;
@@ -1278,7 +1281,7 @@ pgstat_report_backend_allocated_bytes_decrease(uint64 deallocation)
*/
if (deallocation > beentry->backend_allocated_bytes)
{
- ereport(LOG, errmsg("decrease reduces reported backend memory allocated below zero; setting reported to 0"));
+ ereport(LOG, errmsg("deallocation would decrease backend memory below zero; setting reported to 0"));
/*
* Overwrite deallocation with current backend_allocated_bytes so we
@@ -1296,3 +1299,109 @@ pgstat_report_backend_allocated_bytes_decrease(uint64 deallocation)
beentry->backend_allocated_bytes -= deallocation;
PGSTAT_END_WRITE_ACTIVITY(beentry);
}
+
+/* ----------
+ * pgstat_get_all_backend_memory_allocated() -
+ *
+ * Return a uint64 representing the current shared memory allocated to all
+ * backends. This looks directly at the BackendStatusArray, and so will
+ * provide current information regardless of the age of our transaction's
+ * snapshot of the status array.
+ * In the future we will likely utilize additional values - perhaps limit
+ * backend allocation by user/role, etc.
+ * ----------
+ */
+uint64
+pgstat_get_all_backend_memory_allocated(void)
+{
+ PgBackendStatus *beentry;
+ int i;
+ uint64 all_backend_memory_allocated = 0;
+
+ beentry = BackendStatusArray;
+
+ /*
+ * We probably shouldn't get here before shared memory has been set up,
+ * but be safe.
+ */
+ if (beentry == NULL || BackendActivityBuffer == NULL)
+ return 0;
+
+ /*
+ * We include AUX procs in all backend memory calculation
+ */
+ for (i = 1; i <= NumBackendStatSlots; i++)
+ {
+ /*
+ * We use a volatile pointer here to ensure the compiler doesn't try
+ * to get cute.
+ */
+ volatile PgBackendStatus *vbeentry = beentry;
+ bool found;
+ uint64 backend_allocated_bytes = 0;
+
+ for (;;)
+ {
+ int before_changecount;
+ int after_changecount;
+
+ pgstat_begin_read_activity(vbeentry, before_changecount);
+
+ /*
+ * Ignore invalid entries, which may contain invalid data.
+ * See pgstat_beshutdown_hook()
+ */
+ if (vbeentry->st_procpid > 0)
+ backend_allocated_bytes = vbeentry->backend_allocated_bytes;
+
+ pgstat_end_read_activity(vbeentry, after_changecount);
+
+ if ((found = pgstat_read_activity_complete(before_changecount,
+ after_changecount)))
+ break;
+
+ /* Make sure we can break out of loop if stuck... */
+ CHECK_FOR_INTERRUPTS();
+ }
+
+ if (found)
+ all_backend_memory_allocated += backend_allocated_bytes;
+
+ beentry++;
+ }
+
+ return all_backend_memory_allocated;
+}
+
+/*
+ * Determine if allocation request will exceed max backend memory allowed.
+ * Do not apply to auxiliary processes.
+ */
+bool
+exceeds_max_total_bkend_mem(uint64 allocation_request)
+{
+ bool result = false;
+
+ /* Exclude auxiliary processes from the check */
+ if (MyAuxProcType != NotAnAuxProcess)
+ return result;
+
+ /* Convert max_total_bkend_mem to bytes for comparison */
+ if (max_total_bkend_mem &&
+ pgstat_get_all_backend_memory_allocated() +
+ allocation_request > (uint64) max_total_bkend_mem * 1024 * 1024)
+ {
+ /*
+ * Explicitly identify the OOM being a result of this configuration
+ * parameter vs a system failure to allocate OOM.
+ */
+ ereport(WARNING,
+ errmsg("allocation would exceed max_total_backend_memory limit (%llu > %llu)",
+ (unsigned long long) pgstat_get_all_backend_memory_allocated() +
+ allocation_request, (unsigned long long) max_total_bkend_mem * 1024 * 1024));
+
+ result = true;
+ }
+
+ return result;
+}
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 836b49484a..0e09766949 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -3403,6 +3403,17 @@ struct config_int ConfigureNamesInt[] =
NULL, NULL, NULL
},
+ {
+ {"max_total_backend_memory", PGC_SU_BACKEND, RESOURCES_MEM,
+ gettext_noop("Restrict total backend memory allocations to this max."),
+ gettext_noop("0 turns this feature off."),
+ GUC_UNIT_MB
+ },
+ &max_total_bkend_mem,
+ 0, 0, INT_MAX,
+ NULL, NULL, NULL
+ },
+
/* End-of-list marker */
{
{NULL, 0, 0, NULL, NULL}, NULL, 0, 0, 0, NULL, NULL, NULL
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 868d21c351..1ce0dee6d0 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -155,6 +155,9 @@
# mmap
# (change requires restart)
#min_dynamic_shared_memory = 0MB # (change requires restart)
+#max_total_backend_memory = 0MB # Restrict total backend memory allocations
+ # to this max (in MB). 0 turns this feature
+ # off.
# - Disk -
diff --git a/src/backend/utils/mmgr/aset.c b/src/backend/utils/mmgr/aset.c
index 81e62e4981..cc865a89e0 100644
--- a/src/backend/utils/mmgr/aset.c
+++ b/src/backend/utils/mmgr/aset.c
@@ -440,6 +440,10 @@ AllocSetContextCreateInternal(MemoryContext parent,
else
firstBlockSize = Max(firstBlockSize, initBlockSize);
+ /* Do not exceed maximum allowed memory allocation */
+ if (exceeds_max_total_bkend_mem(firstBlockSize))
+ return NULL;
+
/*
* Allocate the initial block. Unlike other aset.c blocks, it starts with
* the context header and its block header follows that.
@@ -742,6 +746,11 @@ AllocSetAlloc(MemoryContext context, Size size)
#endif
blksize = chunk_size + ALLOC_BLOCKHDRSZ + ALLOC_CHUNKHDRSZ;
+
+ /* Do not exceed maximum allowed memory allocation */
+ if (exceeds_max_total_bkend_mem(blksize))
+ return NULL;
+
block = (AllocBlock) malloc(blksize);
if (block == NULL)
return NULL;
@@ -939,6 +948,10 @@ AllocSetAlloc(MemoryContext context, Size size)
while (blksize < required_size)
blksize <<= 1;
+ /* Do not exceed maximum allowed memory allocation */
+ if (exceeds_max_total_bkend_mem(blksize))
+ return NULL;
+
/* Try to allocate it */
block = (AllocBlock) malloc(blksize);
@@ -1179,6 +1192,10 @@ AllocSetRealloc(void *pointer, Size size)
blksize = chksize + ALLOC_BLOCKHDRSZ + ALLOC_CHUNKHDRSZ;
oldblksize = block->endptr - ((char *) block);
+ /* Do not exceed maximum allowed memory allocation */
+ if (blksize > oldblksize && exceeds_max_total_bkend_mem(blksize - oldblksize))
+ return NULL;
+
block = (AllocBlock) realloc(block, blksize);
if (block == NULL)
{
diff --git a/src/backend/utils/mmgr/generation.c b/src/backend/utils/mmgr/generation.c
index df3007edfb..495665d3b0 100644
--- a/src/backend/utils/mmgr/generation.c
+++ b/src/backend/utils/mmgr/generation.c
@@ -201,6 +201,9 @@ GenerationContextCreate(MemoryContext parent,
else
allocSize = Max(allocSize, initBlockSize);
+ if (exceeds_max_total_bkend_mem(allocSize))
+ return NULL;
+
/*
* Allocate the initial block. Unlike other generation.c blocks, it
* starts with the context header and its block header follows that.
@@ -380,6 +383,9 @@ GenerationAlloc(MemoryContext context, Size size)
{
Size blksize = required_size + Generation_BLOCKHDRSZ;
+ if (exceeds_max_total_bkend_mem(blksize))
+ return NULL;
+
block = (GenerationBlock *) malloc(blksize);
if (block == NULL)
return NULL;
@@ -483,6 +489,9 @@ GenerationAlloc(MemoryContext context, Size size)
if (blksize < required_size)
blksize = pg_nextpower2_size_t(required_size);
+ if (exceeds_max_total_bkend_mem(blksize))
+ return NULL;
+
block = (GenerationBlock *) malloc(blksize);
if (block == NULL)
diff --git a/src/backend/utils/mmgr/slab.c b/src/backend/utils/mmgr/slab.c
index 532c038973..5b98176654 100644
--- a/src/backend/utils/mmgr/slab.c
+++ b/src/backend/utils/mmgr/slab.c
@@ -197,6 +197,10 @@ SlabContextCreate(MemoryContext parent,
headerSize += chunksPerBlock * sizeof(bool);
#endif
+ /* Do not exceed maximum allowed memory allocation */
+ if (exceeds_max_total_bkend_mem(headerSize))
+ return NULL;
+
slab = (SlabContext *) malloc(headerSize);
if (slab == NULL)
{
@@ -351,6 +355,10 @@ SlabAlloc(MemoryContext context, Size size)
*/
if (slab->minFreeChunks == 0)
{
+ /* Do not exceed maximum allowed memory allocation */
+ if (exceeds_max_total_bkend_mem(slab->blockSize))
+ return NULL;
+
block = (SlabBlock *) malloc(slab->blockSize);
if (block == NULL)
diff --git a/src/include/utils/backend_status.h b/src/include/utils/backend_status.h
index 75d87e8308..c8beb116b8 100644
--- a/src/include/utils/backend_status.h
+++ b/src/include/utils/backend_status.h
@@ -278,6 +278,7 @@ typedef struct LocalPgBackendStatus
*/
extern PGDLLIMPORT bool pgstat_track_activities;
extern PGDLLIMPORT int pgstat_track_activity_query_size;
+extern PGDLLIMPORT int max_total_bkend_mem;
/* ----------
@@ -329,6 +330,7 @@ extern int pgstat_fetch_stat_numbackends(void);
extern PgBackendStatus *pgstat_fetch_stat_beentry(BackendId beid);
extern LocalPgBackendStatus *pgstat_fetch_stat_local_beentry(int beid);
extern char *pgstat_clip_activity(const char *raw_activity);
+extern bool exceeds_max_total_bkend_mem(uint64 allocation_request);
#endif /* BACKEND_STATUS_H */
--
2.25.1
0001-Add-tracking-of-backend-memory-allocated-to-pg_stat_.patchtext/x-patch; charset=UTF-8; name=0001-Add-tracking-of-backend-memory-allocated-to-pg_stat_.patchDownload
From 8f729c59c3aa1a02d008795159a748e1592a9916 Mon Sep 17 00:00:00 2001
From: Reid Thompson <jreidthompson@nc.rr.com>
Date: Thu, 11 Aug 2022 12:01:25 -0400
Subject: [PATCH 1/2] Add tracking of backend memory allocated to
pg_stat_activity
This new field displays the current bytes of memory allocated to the
backend process. It is updated as memory for the process is
malloc'd/free'd. Memory allocated to items on the freelist is included in
the displayed value. Dynamic shared memory allocations are included
only in the value displayed for the backend that created them, they are
not included in the value for backends that are attached to them to
avoid double counting. On occasion, orphaned memory segments may be
cleaned up on postmaster startup. This may result in decreasing the sum
without a prior increment. We limit the floor of backend_mem_allocated
to zero. Updated pg_stat_activity documentation for the new column.
---
doc/src/sgml/monitoring.sgml | 12 +++
src/backend/catalog/system_views.sql | 1 +
src/backend/storage/ipc/dsm_impl.c | 81 +++++++++++++++
src/backend/utils/activity/backend_status.c | 105 ++++++++++++++++++++
src/backend/utils/adt/pgstatfuncs.c | 4 +-
src/backend/utils/mmgr/aset.c | 18 ++++
src/backend/utils/mmgr/generation.c | 15 +++
src/backend/utils/mmgr/slab.c | 21 ++++
src/include/catalog/pg_proc.dat | 6 +-
src/include/utils/backend_status.h | 7 +-
src/test/regress/expected/rules.out | 9 +-
11 files changed, 270 insertions(+), 9 deletions(-)
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index e5d622d514..972805b85a 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -947,6 +947,18 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser
</para></entry>
</row>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>backend_allocated_bytes</structfield> <type>bigint</type>
+ </para>
+ <para>
+ The byte count of memory allocated to this backend. Dynamic shared memory
+ allocations are included only in the value displayed for the backend that
+ created them, they are not included in the value for backends that are
+ attached to them to avoid double counting.
+ </para></entry>
+ </row>
+
<row>
<entry role="catalog_table_entry"><para role="column_definition">
<structfield>query</structfield> <type>text</type>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 2d8104b090..84d462aa97 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -864,6 +864,7 @@ CREATE VIEW pg_stat_activity AS
S.state,
S.backend_xid,
s.backend_xmin,
+ S.backend_allocated_bytes,
S.query_id,
S.query,
S.backend_type
diff --git a/src/backend/storage/ipc/dsm_impl.c b/src/backend/storage/ipc/dsm_impl.c
index 6ddd46a4e7..eae03159c3 100644
--- a/src/backend/storage/ipc/dsm_impl.c
+++ b/src/backend/storage/ipc/dsm_impl.c
@@ -66,6 +66,7 @@
#include "postmaster/postmaster.h"
#include "storage/dsm_impl.h"
#include "storage/fd.h"
+#include "utils/backend_status.h"
#include "utils/guc.h"
#include "utils/memutils.h"
@@ -232,6 +233,14 @@ dsm_impl_posix(dsm_op op, dsm_handle handle, Size request_size,
name)));
return false;
}
+
+ /*
+ * Detach and destroy pass through here, only decrease the memory
+ * shown allocated in pg_stat_activity when the creator destroys the
+ * allocation.
+ */
+ if (op == DSM_OP_DESTROY)
+ pgstat_report_backend_allocated_bytes_decrease(*mapped_size);
*mapped_address = NULL;
*mapped_size = 0;
if (op == DSM_OP_DESTROY && shm_unlink(name) != 0)
@@ -332,6 +341,36 @@ dsm_impl_posix(dsm_op op, dsm_handle handle, Size request_size,
name)));
return false;
}
+
+ /*
+ * Attach and create pass through here, only update backend memory
+ * allocated in pg_stat_activity for the creator process.
+ */
+ if (op == DSM_OP_CREATE)
+ {
+ /*
+ * Posix creation calls dsm_impl_posix_resize implying that resizing
+ * occurs or may be added in the future. As implemented
+ * dsm_impl_posix_resize utilizes fallocate or truncate, passing the
+ * whole new size as input, growing the allocation as needed (only
+ * truncate supports shrinking). We update by replacing the old
+ * allocation with the new.
+ */
+#if defined(HAVE_POSIX_FALLOCATE) && defined(__linux__)
+ /*
+ * posix_fallocate does not shrink allocations, adjust only on
+ * allocation increase.
+ */
+ if (request_size > *mapped_size)
+ {
+ pgstat_report_backend_allocated_bytes_decrease(*mapped_size);
+ pgstat_report_backend_allocated_bytes_increase(request_size);
+ }
+#else
+ pgstat_report_backend_allocated_bytes_decrease(*mapped_size);
+ pgstat_report_backend_allocated_bytes_increase(request_size);
+#endif
+ }
*mapped_address = address;
*mapped_size = request_size;
close(fd);
@@ -537,6 +576,14 @@ dsm_impl_sysv(dsm_op op, dsm_handle handle, Size request_size,
name)));
return false;
}
+
+ /*
+ * Detach and destroy pass through here, only decrease the memory
+ * shown allocated in pg_stat_activity when the creator destroys the
+ * allocation.
+ */
+ if (op == DSM_OP_DESTROY)
+ pgstat_report_backend_allocated_bytes_decrease(*mapped_size);
*mapped_address = NULL;
*mapped_size = 0;
if (op == DSM_OP_DESTROY && shmctl(ident, IPC_RMID, NULL) < 0)
@@ -584,6 +631,13 @@ dsm_impl_sysv(dsm_op op, dsm_handle handle, Size request_size,
name)));
return false;
}
+
+ /*
+ * Attach and create pass through here, only update backend memory
+ * allocated in pg_stat_activity for the creator process.
+ */
+ if (op == DSM_OP_CREATE)
+ pgstat_report_backend_allocated_bytes_increase(request_size);
*mapped_address = address;
*mapped_size = request_size;
@@ -652,6 +706,13 @@ dsm_impl_windows(dsm_op op, dsm_handle handle, Size request_size,
return false;
}
+ /*
+ * Detach and destroy pass through here, only decrease the memory
+ * shown allocated in pg_stat_activity when the creator destroys the
+ * allocation.
+ */
+ if (op == DSM_OP_DESTROY)
+ pgstat_report_backend_allocated_bytes_decrease(*mapped_size);
*impl_private = NULL;
*mapped_address = NULL;
*mapped_size = 0;
@@ -768,6 +829,12 @@ dsm_impl_windows(dsm_op op, dsm_handle handle, Size request_size,
return false;
}
+ /*
+ * Attach and create pass through here, only update backend memory
+ * allocated in pg_stat_activity for the creator process.
+ */
+ if (op == DSM_OP_CREATE)
+ pgstat_report_backend_allocated_bytes_increase(info.RegionSize);
*mapped_address = address;
*mapped_size = info.RegionSize;
*impl_private = hmap;
@@ -812,6 +879,13 @@ dsm_impl_mmap(dsm_op op, dsm_handle handle, Size request_size,
name)));
return false;
}
+
+ /*
+ * Detach and destroy pass through here, only decrease the memory
+ * shown allocated in pg_stat_activity when the creator destroys the
+ * allocation.
+ */
+ pgstat_report_backend_allocated_bytes_decrease(*mapped_size);
*mapped_address = NULL;
*mapped_size = 0;
if (op == DSM_OP_DESTROY && unlink(name) != 0)
@@ -933,6 +1007,13 @@ dsm_impl_mmap(dsm_op op, dsm_handle handle, Size request_size,
name)));
return false;
}
+
+ /*
+ * Attach and create pass through here, only update backend memory
+ * allocated in pg_stat_activity for the creator process.
+ */
+ if (op == DSM_OP_CREATE)
+ pgstat_report_backend_allocated_bytes_increase(request_size);
*mapped_address = address;
*mapped_size = request_size;
diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c
index 1146a6c33c..30a89e899a 100644
--- a/src/backend/utils/activity/backend_status.c
+++ b/src/backend/utils/activity/backend_status.c
@@ -49,6 +49,8 @@ int pgstat_track_activity_query_size = 1024;
/* exposed so that backend_progress.c can access it */
PgBackendStatus *MyBEEntry = NULL;
+/* Memory allocated to this backend prior to pgstats initialization */
+uint64 backend_allocated_bytes = 0;
static PgBackendStatus *BackendStatusArray = NULL;
static char *BackendAppnameBuffer = NULL;
@@ -400,6 +402,13 @@ pgstat_bestart(void)
lbeentry.st_progress_command_target = InvalidOid;
lbeentry.st_query_id = UINT64CONST(0);
+ /*
+ * Move sum of memory allocated prior to pgstats initialization to pgstats
+ * and zero the local variable.
+ */
+ lbeentry.backend_allocated_bytes = backend_allocated_bytes;
+ backend_allocated_bytes = 0;
+
/*
* we don't zero st_progress_param here to save cycles; nobody should
* examine it until st_progress_command has been set to something other
@@ -1191,3 +1200,99 @@ pgstat_clip_activity(const char *raw_activity)
return activity;
}
+
+/* --------
+ * pgstat_report_backend_allocated_bytes_increase() -
+ *
+ * Called to report increase in memory allocated for this backend
+ * --------
+ */
+void
+pgstat_report_backend_allocated_bytes_increase(uint64 allocation)
+{
+ volatile PgBackendStatus *beentry = MyBEEntry;
+
+ if (!beentry || !pgstat_track_activities)
+ {
+ /*
+ * Account for memory before pgstats is initialized. This will be
+ * migrated to pgstats on initialization.
+ */
+ backend_allocated_bytes += allocation;
+
+ return;
+ }
+
+ /*
+ * Update my status entry, following the protocol of bumping
+ * st_changecount before and after. We use a volatile pointer here to
+ * ensure the compiler doesn't try to get cute.
+ */
+ PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
+ beentry->backend_allocated_bytes += allocation;
+ PGSTAT_END_WRITE_ACTIVITY(beentry);
+}
+
+/* --------
+ * pgstat_report_backend_allocated_bytes_decrease() -
+ *
+ * Called to report decrease in memory allocated for this backend
+ * --------
+ */
+void
+pgstat_report_backend_allocated_bytes_decrease(uint64 deallocation)
+{
+ volatile PgBackendStatus *beentry = MyBEEntry;
+
+ /*
+ * Cases may occur where shared memory from a previous postmaster
+ * invocation still exist. These are cleaned up at startup by
+ * dsm_cleanup_using_control_segment. Limit decreasing memory allocated to
+ * zero in case no corresponding prior increase exists or decrease has
+ * already been accounted for.
+ */
+
+ if (!beentry || !pgstat_track_activities)
+ {
+ /*
+ * Account for memory before pgstats is initialized. This will be
+ * migrated to pgstats on initialization. Do not allow
+ * backend_allocated_bytes to go below zero. If pgstats has not been
+ * initialized, we are in startup and we set backend_allocated_bytes
+ * to zero in cases where it would go negative and skip generating an
+ * ereport.
+ */
+ if (deallocation > backend_allocated_bytes)
+ backend_allocated_bytes = 0;
+ else
+ backend_allocated_bytes -= deallocation;
+
+ return;
+ }
+
+ /*
+ * Do not allow backend_allocated_bytes to go below zero. ereport if we
+ * would have. There's no need for a lock around the read here as it's
+ * being referenced from the same backend which means that there shouldn't
+ * be concurrent writes. We want to generate an ereport in these cases.
+ */
+ if (deallocation > beentry->backend_allocated_bytes)
+ {
+ ereport(LOG, errmsg("decrease reduces reported backend memory allocated below zero; setting reported to 0"));
+
+ /*
+ * Overwrite deallocation with current backend_allocated_bytes so we
+ * end up at zero.
+ */
+ deallocation = beentry->backend_allocated_bytes;
+ }
+
+ /*
+ * Update my status entry, following the protocol of bumping
+ * st_changecount before and after. We use a volatile pointer here to
+ * ensure the compiler doesn't try to get cute.
+ */
+ PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
+ beentry->backend_allocated_bytes -= deallocation;
+ PGSTAT_END_WRITE_ACTIVITY(beentry);
+}
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 96bffc0f2a..b6d135ad2f 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -553,7 +553,7 @@ pg_stat_get_progress_info(PG_FUNCTION_ARGS)
Datum
pg_stat_get_activity(PG_FUNCTION_ARGS)
{
-#define PG_STAT_GET_ACTIVITY_COLS 30
+#define PG_STAT_GET_ACTIVITY_COLS 31
int num_backends = pgstat_fetch_stat_numbackends();
int curr_backend;
int pid = PG_ARGISNULL(0) ? -1 : PG_GETARG_INT32(0);
@@ -609,6 +609,8 @@ pg_stat_get_activity(PG_FUNCTION_ARGS)
else
nulls[16] = true;
+ values[30] = UInt64GetDatum(beentry->backend_allocated_bytes);
+
/* Values only available to role member or pg_read_all_stats */
if (HAS_PGSTAT_PERMISSIONS(beentry->st_userid))
{
diff --git a/src/backend/utils/mmgr/aset.c b/src/backend/utils/mmgr/aset.c
index b6a8bbcd59..81e62e4981 100644
--- a/src/backend/utils/mmgr/aset.c
+++ b/src/backend/utils/mmgr/aset.c
@@ -47,6 +47,7 @@
#include "postgres.h"
#include "port/pg_bitutils.h"
+#include "utils/backend_status.h"
#include "utils/memdebug.h"
#include "utils/memutils.h"
#include "utils/memutils_memorychunk.h"
@@ -521,6 +522,7 @@ AllocSetContextCreateInternal(MemoryContext parent,
name);
((MemoryContext) set)->mem_allocated = firstBlockSize;
+ pgstat_report_backend_allocated_bytes_increase(firstBlockSize);
return (MemoryContext) set;
}
@@ -543,6 +545,7 @@ AllocSetReset(MemoryContext context)
AllocSet set = (AllocSet) context;
AllocBlock block;
Size keepersize PG_USED_FOR_ASSERTS_ONLY;
+ uint64 deallocation = 0;
Assert(AllocSetIsValid(set));
@@ -585,6 +588,7 @@ AllocSetReset(MemoryContext context)
{
/* Normal case, release the block */
context->mem_allocated -= block->endptr - ((char *) block);
+ deallocation += block->endptr - ((char *) block);
#ifdef CLOBBER_FREED_MEMORY
wipe_mem(block, block->freeptr - ((char *) block));
@@ -595,6 +599,7 @@ AllocSetReset(MemoryContext context)
}
Assert(context->mem_allocated == keepersize);
+ pgstat_report_backend_allocated_bytes_decrease(deallocation);
/* Reset block size allocation sequence, too */
set->nextBlockSize = set->initBlockSize;
@@ -613,6 +618,7 @@ AllocSetDelete(MemoryContext context)
AllocSet set = (AllocSet) context;
AllocBlock block = set->blocks;
Size keepersize PG_USED_FOR_ASSERTS_ONLY;
+ uint64 deallocation = 0;
Assert(AllocSetIsValid(set));
@@ -651,11 +657,13 @@ AllocSetDelete(MemoryContext context)
freelist->first_free = (AllocSetContext *) oldset->header.nextchild;
freelist->num_free--;
+ deallocation += oldset->header.mem_allocated;
/* All that remains is to free the header/initial block */
free(oldset);
}
Assert(freelist->num_free == 0);
+ pgstat_report_backend_allocated_bytes_decrease(deallocation);
}
/* Now add the just-deleted context to the freelist. */
@@ -672,7 +680,10 @@ AllocSetDelete(MemoryContext context)
AllocBlock next = block->next;
if (block != set->keeper)
+ {
context->mem_allocated -= block->endptr - ((char *) block);
+ deallocation += block->endptr - ((char *) block);
+ }
#ifdef CLOBBER_FREED_MEMORY
wipe_mem(block, block->freeptr - ((char *) block));
@@ -685,6 +696,8 @@ AllocSetDelete(MemoryContext context)
}
Assert(context->mem_allocated == keepersize);
+ pgstat_report_backend_allocated_bytes_decrease(deallocation +
+ context->mem_allocated);
/* Finally, free the context header, including the keeper block */
free(set);
@@ -734,6 +747,7 @@ AllocSetAlloc(MemoryContext context, Size size)
return NULL;
context->mem_allocated += blksize;
+ pgstat_report_backend_allocated_bytes_increase(blksize);
block->aset = set;
block->freeptr = block->endptr = ((char *) block) + blksize;
@@ -944,6 +958,7 @@ AllocSetAlloc(MemoryContext context, Size size)
return NULL;
context->mem_allocated += blksize;
+ pgstat_report_backend_allocated_bytes_increase(blksize);
block->aset = set;
block->freeptr = ((char *) block) + ALLOC_BLOCKHDRSZ;
@@ -1043,6 +1058,7 @@ AllocSetFree(void *pointer)
block->next->prev = block->prev;
set->header.mem_allocated -= block->endptr - ((char *) block);
+ pgstat_report_backend_allocated_bytes_decrease(block->endptr - ((char *) block));
#ifdef CLOBBER_FREED_MEMORY
wipe_mem(block, block->freeptr - ((char *) block));
@@ -1173,7 +1189,9 @@ AllocSetRealloc(void *pointer, Size size)
/* updated separately, not to underflow when (oldblksize > blksize) */
set->header.mem_allocated -= oldblksize;
+ pgstat_report_backend_allocated_bytes_decrease(oldblksize);
set->header.mem_allocated += blksize;
+ pgstat_report_backend_allocated_bytes_increase(blksize);
block->freeptr = block->endptr = ((char *) block) + blksize;
diff --git a/src/backend/utils/mmgr/generation.c b/src/backend/utils/mmgr/generation.c
index b432a92be3..df3007edfb 100644
--- a/src/backend/utils/mmgr/generation.c
+++ b/src/backend/utils/mmgr/generation.c
@@ -37,6 +37,7 @@
#include "lib/ilist.h"
#include "port/pg_bitutils.h"
+#include "utils/backend_status.h"
#include "utils/memdebug.h"
#include "utils/memutils.h"
#include "utils/memutils_memorychunk.h"
@@ -267,6 +268,7 @@ GenerationContextCreate(MemoryContext parent,
name);
((MemoryContext) set)->mem_allocated = firstBlockSize;
+ pgstat_report_backend_allocated_bytes_increase(firstBlockSize);
return (MemoryContext) set;
}
@@ -283,6 +285,7 @@ GenerationReset(MemoryContext context)
{
GenerationContext *set = (GenerationContext *) context;
dlist_mutable_iter miter;
+ uint64 deallocation = 0;
Assert(GenerationIsValid(set));
@@ -305,9 +308,14 @@ GenerationReset(MemoryContext context)
if (block == set->keeper)
GenerationBlockMarkEmpty(block);
else
+ {
+ deallocation += block->blksize;
GenerationBlockFree(set, block);
+ }
}
+ pgstat_report_backend_allocated_bytes_decrease(deallocation);
+
/* set it so new allocations to make use of the keeper block */
set->block = set->keeper;
@@ -328,6 +336,9 @@ GenerationDelete(MemoryContext context)
{
/* Reset to release all releasable GenerationBlocks */
GenerationReset(context);
+
+ pgstat_report_backend_allocated_bytes_decrease(context->mem_allocated);
+
/* And free the context header and keeper block */
free(context);
}
@@ -374,6 +385,7 @@ GenerationAlloc(MemoryContext context, Size size)
return NULL;
context->mem_allocated += blksize;
+ pgstat_report_backend_allocated_bytes_increase(blksize);
/* block with a single (used) chunk */
block->context = set;
@@ -477,6 +489,7 @@ GenerationAlloc(MemoryContext context, Size size)
return NULL;
context->mem_allocated += blksize;
+ pgstat_report_backend_allocated_bytes_increase(blksize);
/* initialize the new block */
GenerationBlockInit(set, block, blksize);
@@ -726,6 +739,8 @@ GenerationFree(void *pointer)
dlist_delete(&block->node);
set->header.mem_allocated -= block->blksize;
+ pgstat_report_backend_allocated_bytes_decrease(block->blksize);
+
free(block);
}
diff --git a/src/backend/utils/mmgr/slab.c b/src/backend/utils/mmgr/slab.c
index 6df0839b6a..532c038973 100644
--- a/src/backend/utils/mmgr/slab.c
+++ b/src/backend/utils/mmgr/slab.c
@@ -53,6 +53,7 @@
#include "postgres.h"
#include "lib/ilist.h"
+#include "utils/backend_status.h"
#include "utils/memdebug.h"
#include "utils/memutils.h"
#include "utils/memutils_memorychunk.h"
@@ -238,6 +239,12 @@ SlabContextCreate(MemoryContext parent,
parent,
name);
+ /*
+ * If SlabContextCreate is updated to add headerSize to
+ * context->mem_allocated, then update here and SlabDelete appropriately
+ */
+ pgstat_report_backend_allocated_bytes_increase(headerSize);
+
return (MemoryContext) slab;
}
@@ -253,6 +260,7 @@ SlabReset(MemoryContext context)
{
SlabContext *slab = (SlabContext *) context;
int i;
+ uint64 deallocation = 0;
Assert(SlabIsValid(slab));
@@ -278,9 +286,11 @@ SlabReset(MemoryContext context)
free(block);
slab->nblocks--;
context->mem_allocated -= slab->blockSize;
+ deallocation += slab->blockSize;
}
}
+ pgstat_report_backend_allocated_bytes_decrease(deallocation);
slab->minFreeChunks = 0;
Assert(slab->nblocks == 0);
@@ -294,8 +304,17 @@ SlabReset(MemoryContext context)
void
SlabDelete(MemoryContext context)
{
+ /*
+ * Until header allocation is included in context->mem_allocated, cast to
+ * slab and decrement the headerSize
+ */
+ SlabContext *slab = castNode(SlabContext, context);
+
/* Reset to release all the SlabBlocks */
SlabReset(context);
+
+ pgstat_report_backend_allocated_bytes_decrease(slab->headerSize);
+
/* And free the context header */
free(context);
}
@@ -364,6 +383,7 @@ SlabAlloc(MemoryContext context, Size size)
slab->minFreeChunks = slab->chunksPerBlock;
slab->nblocks += 1;
context->mem_allocated += slab->blockSize;
+ pgstat_report_backend_allocated_bytes_increase(slab->blockSize);
}
/* grab the block from the freelist (even the new block is there) */
@@ -537,6 +557,7 @@ SlabFree(void *pointer)
free(block);
slab->nblocks--;
slab->header.mem_allocated -= slab->blockSize;
+ pgstat_report_backend_allocated_bytes_decrease(slab->blockSize);
}
else
dlist_push_head(&slab->freelist[block->nfree], &block->node);
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 20f5aa56ea..1c37f7db5d 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5347,9 +5347,9 @@
proname => 'pg_stat_get_activity', prorows => '100', proisstrict => 'f',
proretset => 't', provolatile => 's', proparallel => 'r',
prorettype => 'record', proargtypes => 'int4',
- proallargtypes => '{int4,oid,int4,oid,text,text,text,text,text,timestamptz,timestamptz,timestamptz,timestamptz,inet,text,int4,xid,xid,text,bool,text,text,int4,text,numeric,text,bool,text,bool,int4,int8}',
- proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
- proargnames => '{pid,datid,pid,usesysid,application_name,state,query,wait_event_type,wait_event,xact_start,query_start,backend_start,state_change,client_addr,client_hostname,client_port,backend_xid,backend_xmin,backend_type,ssl,sslversion,sslcipher,sslbits,ssl_client_dn,ssl_client_serial,ssl_issuer_dn,gss_auth,gss_princ,gss_enc,leader_pid,query_id}',
+ proallargtypes => '{int4,oid,int4,oid,text,text,text,text,text,timestamptz,timestamptz,timestamptz,timestamptz,inet,text,int4,xid,xid,text,bool,text,text,int4,text,numeric,text,bool,text,bool,int4,int8,int8}',
+ proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{pid,datid,pid,usesysid,application_name,state,query,wait_event_type,wait_event,xact_start,query_start,backend_start,state_change,client_addr,client_hostname,client_port,backend_xid,backend_xmin,backend_type,ssl,sslversion,sslcipher,sslbits,ssl_client_dn,ssl_client_serial,ssl_issuer_dn,gss_auth,gss_princ,gss_enc,leader_pid,query_id,backend_allocated_bytes}',
prosrc => 'pg_stat_get_activity' },
{ oid => '3318',
descr => 'statistics: information about progress of backends running maintenance command',
diff --git a/src/include/utils/backend_status.h b/src/include/utils/backend_status.h
index b582b46e9f..75d87e8308 100644
--- a/src/include/utils/backend_status.h
+++ b/src/include/utils/backend_status.h
@@ -169,6 +169,9 @@ typedef struct PgBackendStatus
/* query identifier, optionally computed using post_parse_analyze_hook */
uint64 st_query_id;
+
+ /* Current memory allocated to this backend */
+ uint64 backend_allocated_bytes;
} PgBackendStatus;
@@ -313,7 +316,9 @@ extern const char *pgstat_get_backend_current_activity(int pid, bool checkUser);
extern const char *pgstat_get_crashed_backend_activity(int pid, char *buffer,
int buflen);
extern uint64 pgstat_get_my_query_id(void);
-
+extern void pgstat_report_backend_allocated_bytes_increase(uint64 allocation);
+extern void pgstat_report_backend_allocated_bytes_decrease(uint64 deallocation);
+extern uint64 pgstat_get_all_backend_memory_allocated(void);
/* ----------
* Support functions for the SQL-callable functions to
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 624d0e5aae..ba9f494806 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1753,10 +1753,11 @@ pg_stat_activity| SELECT s.datid,
s.state,
s.backend_xid,
s.backend_xmin,
+ s.backend_allocated_bytes,
s.query_id,
s.query,
s.backend_type
- FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, query_id)
+ FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, query_id, backend_allocated_bytes)
LEFT JOIN pg_database d ON ((s.datid = d.oid)))
LEFT JOIN pg_authid u ON ((s.usesysid = u.oid)));
pg_stat_all_indexes| SELECT c.oid AS relid,
@@ -1871,7 +1872,7 @@ pg_stat_gssapi| SELECT s.pid,
s.gss_auth AS gss_authenticated,
s.gss_princ AS principal,
s.gss_enc AS encrypted
- FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, query_id)
+ FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, query_id, backend_allocated_bytes)
WHERE (s.client_port IS NOT NULL);
pg_stat_progress_analyze| SELECT s.pid,
s.datid,
@@ -2052,7 +2053,7 @@ pg_stat_replication| SELECT s.pid,
w.sync_priority,
w.sync_state,
w.reply_time
- FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, query_id)
+ FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, query_id, backend_allocated_bytes)
JOIN pg_stat_get_wal_senders() w(pid, state, sent_lsn, write_lsn, flush_lsn, replay_lsn, write_lag, flush_lag, replay_lag, sync_priority, sync_state, reply_time) ON ((s.pid = w.pid)))
LEFT JOIN pg_authid u ON ((s.usesysid = u.oid)));
pg_stat_replication_slots| SELECT s.slot_name,
@@ -2086,7 +2087,7 @@ pg_stat_ssl| SELECT s.pid,
s.ssl_client_dn AS client_dn,
s.ssl_client_serial AS client_serial,
s.ssl_issuer_dn AS issuer_dn
- FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, query_id)
+ FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, query_id, backend_allocated_bytes)
WHERE (s.client_port IS NOT NULL);
pg_stat_subscription| SELECT su.oid AS subid,
su.subname,
--
2.25.1
On Thu, 2022-11-03 at 11:48 -0400, Reid Thompson wrote:
On Tue, 2022-10-25 at 11:49 -0400, Reid Thompson wrote:
Rebased to current. Add a couple changes per conversation with D
Christensen (include units in field name, group field with
backend_xid
and backend_xmin fields in pg_stat_activity view, rather than between
query_id and query)
rebased/patched to current master && current pg-stat-activity-backend-memory-allocated
--
Reid Thompson
Senior Software Engineer
Crunchy Data, Inc.
reid.thompson@crunchydata.com
www.crunchydata.com
Attachments:
0002-Add-the-ability-to-limit-the-amount-of-memory-that-c.patchtext/x-patch; charset=UTF-8; name=0002-Add-the-ability-to-limit-the-amount-of-memory-that-c.patchDownload
From 1470f45e086bef0757cc262d10e08904e46b9a88 Mon Sep 17 00:00:00 2001
From: Reid Thompson <jreidthompson@nc.rr.com>
Date: Sat, 4 Jun 2022 22:23:59 -0400
Subject: [PATCH 2/2] Add the ability to limit the amount of memory that can be
allocated to backends.
This builds on the work that adds backend memory allocated to pg_stat_activity.
Add GUC variable max_total_backend_memory.
Specifies a limit to the amount of memory (in MB) that may be allocated to
backends in total (i.e. this is not a per user or per backend limit). If unset,
or set to 0 it is disabled. It is intended as a resource to help avoid the OOM
killer on LINUX and manage resources in general. A backend request that would
push the total over the limit will be denied with an out of memory error causing
that backend's current query/transaction to fail. Due to the dynamic nature of
memory allocations, this limit is not exact. If within 1.5MB of the limit and
two backends request 1MB each at the same time both may be allocated, and exceed
the limit. Further requests will not be allocated until dropping below the
limit. Keep this in mind when setting this value. This limit does not affect
auxiliary backend processes. Backend memory allocations are displayed in the
pg_stat_activity view.
---
doc/src/sgml/config.sgml | 26 +++++
src/backend/storage/ipc/dsm_impl.c | 12 ++
src/backend/utils/activity/backend_status.c | 108 ++++++++++++++++++
src/backend/utils/misc/guc_tables.c | 11 ++
src/backend/utils/misc/postgresql.conf.sample | 3 +
src/backend/utils/mmgr/aset.c | 17 +++
src/backend/utils/mmgr/generation.c | 9 ++
src/backend/utils/mmgr/slab.c | 8 ++
src/include/utils/backend_status.h | 3 +
9 files changed, 197 insertions(+)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 24b1624bad..c2db3ace7a 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2079,6 +2079,32 @@ include_dir 'conf.d'
</listitem>
</varlistentry>
+ <varlistentry id="guc-max-total-backend-memory" xreflabel="max_total_backend_memory">
+ <term><varname>max_total_backend_memory</varname> (<type>integer</type>)
+ <indexterm>
+ <primary><varname>max_total_backend_memory</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Specifies a limit to the amount of memory (MB) that may be allocated to
+ backends in total (i.e. this is not a per user or per backend limit).
+ If unset, or set to 0 it is disabled. A backend request that would
+ push the total over the limit will be denied with an out of memory
+ error causing that backend's current query/transaction to fail. Due to
+ the dynamic nature of memory allocations, this limit is not exact. If
+ within 1.5MB of the limit and two backends request 1MB each at the same
+ time both may be allocated, and exceed the limit. Further requests will
+ not be allocated until dropping below the limit. Keep this in mind when
+ setting this value. This limit does not affect auxiliary backend
+ processes <xref linkend="glossary-auxiliary-proc"/> . Backend memory
+ allocations (<varname>allocated_bytes</varname>) are displayed in the
+ <link linkend="monitoring-pg-stat-activity-view"><structname>pg_stat_activity</structname></link>
+ view.
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
</sect2>
diff --git a/src/backend/storage/ipc/dsm_impl.c b/src/backend/storage/ipc/dsm_impl.c
index 65d59fc43e..8d9df676af 100644
--- a/src/backend/storage/ipc/dsm_impl.c
+++ b/src/backend/storage/ipc/dsm_impl.c
@@ -254,6 +254,10 @@ dsm_impl_posix(dsm_op op, dsm_handle handle, Size request_size,
return true;
}
+ /* Do not exceed maximum allowed memory allocation */
+ if (op == DSM_OP_CREATE && exceeds_max_total_bkend_mem(request_size))
+ return false;
+
/*
* Create new segment or open an existing one for attach.
*
@@ -525,6 +529,10 @@ dsm_impl_sysv(dsm_op op, dsm_handle handle, Size request_size,
int flags = IPCProtection;
size_t segsize;
+ /* Do not exceed maximum allowed memory allocation */
+ if (op == DSM_OP_CREATE && exceeds_max_total_bkend_mem(request_size))
+ return false;
+
/*
* Allocate the memory BEFORE acquiring the resource, so that we don't
* leak the resource if memory allocation fails.
@@ -719,6 +727,10 @@ dsm_impl_windows(dsm_op op, dsm_handle handle, Size request_size,
return true;
}
+ /* Do not exceed maximum allowed memory allocation */
+ if (op == DSM_OP_CREATE && exceeds_max_total_bkend_mem(request_size))
+ return false;
+
/* Create new segment or open an existing one for attach. */
if (op == DSM_OP_CREATE)
{
diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c
index 3785e8af53..07dfd8f490 100644
--- a/src/backend/utils/activity/backend_status.c
+++ b/src/backend/utils/activity/backend_status.c
@@ -45,6 +45,9 @@
bool pgstat_track_activities = false;
int pgstat_track_activity_query_size = 1024;
+/* Max backend memory allocation allowed (MB). 0 = disabled */
+int max_total_bkend_mem = 0;
+
/* exposed so that backend_progress.c can access it */
PgBackendStatus *MyBEEntry = NULL;
@@ -1236,3 +1239,108 @@ pgstat_reset_allocated_bytes_storage(void)
my_allocated_bytes = &local_my_allocated_bytes;
}
+/* ----------
+ * pgstat_get_all_memory_allocated() -
+ *
+ * Return a uint64 representing the current shared memory allocated to all
+ * backends. This looks directly at the BackendStatusArray, and so will
+ * provide current information regardless of the age of our transaction's
+ * snapshot of the status array.
+ * In the future we will likely utilize additional values - perhaps limit
+ * backend allocation by user/role, etc.
+ * ----------
+ */
+uint64
+pgstat_get_all_backend_memory_allocated(void)
+{
+ PgBackendStatus *beentry;
+ int i;
+ uint64 all_memory_allocated = 0;
+
+ beentry = BackendStatusArray;
+
+ /*
+ * We probably shouldn't get here before shared memory has been set up,
+ * but be safe.
+ */
+ if (beentry == NULL || BackendActivityBuffer == NULL)
+ return 0;
+
+ /*
+ * We include AUX procs in all backend memory calculation
+ */
+ for (i = 1; i <= NumBackendStatSlots; i++)
+ {
+ /*
+ * We use a volatile pointer here to ensure the compiler doesn't try
+ * to get cute.
+ */
+ volatile PgBackendStatus *vbeentry = beentry;
+ bool found;
+ uint64 allocated_bytes = 0;
+
+ for (;;)
+ {
+ int before_changecount;
+ int after_changecount;
+
+ pgstat_begin_read_activity(vbeentry, before_changecount);
+
+ /*
+ * Ignore invalid entries, which may contain invalid data.
+ * See pgstat_beshutdown_hook()
+ */
+ if (vbeentry->st_procpid > 0)
+ allocated_bytes = vbeentry->allocated_bytes;
+
+ pgstat_end_read_activity(vbeentry, after_changecount);
+
+ if ((found = pgstat_read_activity_complete(before_changecount,
+ after_changecount)))
+ break;
+
+ /* Make sure we can break out of loop if stuck... */
+ CHECK_FOR_INTERRUPTS();
+ }
+
+ if (found)
+ all_memory_allocated += allocated_bytes;
+
+ beentry++;
+ }
+
+ return all_memory_allocated;
+}
+
+/*
+ * Determine if allocation request will exceed max backend memory allowed.
+ * Do not apply to auxiliary processes.
+ */
+bool
+exceeds_max_total_bkend_mem(uint64 allocation_request)
+{
+ bool result = false;
+
+ /* Exclude auxiliary processes from the check */
+ if (MyAuxProcType != NotAnAuxProcess)
+ return result;
+
+ /* Convert max_total_bkend_mem to bytes for comparison */
+ if (max_total_bkend_mem &&
+ pgstat_get_all_backend_memory_allocated() +
+ allocation_request > (uint64) max_total_bkend_mem * 1024 * 1024)
+ {
+ /*
+ * Explicitly identify the OOM being a result of this configuration
+ * parameter vs a system failure to allocate OOM.
+ */
+ ereport(WARNING,
+ errmsg("allocation would exceed max_total_memory limit (%llu > %llu)",
+ (unsigned long long) pgstat_get_all_backend_memory_allocated() +
+ allocation_request, (unsigned long long) max_total_bkend_mem * 1024 * 1024));
+
+ result = true;
+ }
+
+ return result;
+}
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 349dd6a537..c20a656310 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -3423,6 +3423,17 @@ struct config_int ConfigureNamesInt[] =
NULL, NULL, NULL
},
+ {
+ {"max_total_backend_memory", PGC_SU_BACKEND, RESOURCES_MEM,
+ gettext_noop("Restrict total backend memory allocations to this max."),
+ gettext_noop("0 turns this feature off."),
+ GUC_UNIT_MB
+ },
+ &max_total_bkend_mem,
+ 0, 0, INT_MAX,
+ NULL, NULL, NULL
+ },
+
/* End-of-list marker */
{
{NULL, 0, 0, NULL, NULL}, NULL, 0, 0, 0, NULL, NULL, NULL
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 868d21c351..1ce0dee6d0 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -155,6 +155,9 @@
# mmap
# (change requires restart)
#min_dynamic_shared_memory = 0MB # (change requires restart)
+#max_total_backend_memory = 0MB # Restrict total backend memory allocations
+ # to this max (in MB). 0 turns this feature
+ # off.
# - Disk -
diff --git a/src/backend/utils/mmgr/aset.c b/src/backend/utils/mmgr/aset.c
index b202e115b6..596f1db408 100644
--- a/src/backend/utils/mmgr/aset.c
+++ b/src/backend/utils/mmgr/aset.c
@@ -440,6 +440,10 @@ AllocSetContextCreateInternal(MemoryContext parent,
else
firstBlockSize = Max(firstBlockSize, initBlockSize);
+ /* Do not exceed maximum allowed memory allocation */
+ if (exceeds_max_total_bkend_mem(firstBlockSize))
+ return NULL;
+
/*
* Allocate the initial block. Unlike other aset.c blocks, it starts with
* the context header and its block header follows that.
@@ -741,6 +745,11 @@ AllocSetAlloc(MemoryContext context, Size size)
#endif
blksize = chunk_size + ALLOC_BLOCKHDRSZ + ALLOC_CHUNKHDRSZ;
+
+ /* Do not exceed maximum allowed memory allocation */
+ if (exceeds_max_total_bkend_mem(blksize))
+ return NULL;
+
block = (AllocBlock) malloc(blksize);
if (block == NULL)
return NULL;
@@ -938,6 +947,10 @@ AllocSetAlloc(MemoryContext context, Size size)
while (blksize < required_size)
blksize <<= 1;
+ /* Do not exceed maximum allowed memory allocation */
+ if (exceeds_max_total_bkend_mem(blksize))
+ return NULL;
+
/* Try to allocate it */
block = (AllocBlock) malloc(blksize);
@@ -1178,6 +1191,10 @@ AllocSetRealloc(void *pointer, Size size)
blksize = chksize + ALLOC_BLOCKHDRSZ + ALLOC_CHUNKHDRSZ;
oldblksize = block->endptr - ((char *) block);
+ /* Do not exceed maximum allowed memory allocation */
+ if (blksize > oldblksize && exceeds_max_total_bkend_mem(blksize - oldblksize))
+ return NULL;
+
block = (AllocBlock) realloc(block, blksize);
if (block == NULL)
{
diff --git a/src/backend/utils/mmgr/generation.c b/src/backend/utils/mmgr/generation.c
index 459eb985d6..145409cf21 100644
--- a/src/backend/utils/mmgr/generation.c
+++ b/src/backend/utils/mmgr/generation.c
@@ -201,6 +201,9 @@ GenerationContextCreate(MemoryContext parent,
else
allocSize = Max(allocSize, initBlockSize);
+ if (exceeds_max_total_bkend_mem(allocSize))
+ return NULL;
+
/*
* Allocate the initial block. Unlike other generation.c blocks, it
* starts with the context header and its block header follows that.
@@ -380,6 +383,9 @@ GenerationAlloc(MemoryContext context, Size size)
{
Size blksize = required_size + Generation_BLOCKHDRSZ;
+ if (exceeds_max_total_bkend_mem(blksize))
+ return NULL;
+
block = (GenerationBlock *) malloc(blksize);
if (block == NULL)
return NULL;
@@ -483,6 +489,9 @@ GenerationAlloc(MemoryContext context, Size size)
if (blksize < required_size)
blksize = pg_nextpower2_size_t(required_size);
+ if (exceeds_max_total_bkend_mem(blksize))
+ return NULL;
+
block = (GenerationBlock *) malloc(blksize);
if (block == NULL)
diff --git a/src/backend/utils/mmgr/slab.c b/src/backend/utils/mmgr/slab.c
index f38256f6f3..9304e13638 100644
--- a/src/backend/utils/mmgr/slab.c
+++ b/src/backend/utils/mmgr/slab.c
@@ -197,6 +197,10 @@ SlabContextCreate(MemoryContext parent,
headerSize += chunksPerBlock * sizeof(bool);
#endif
+ /* Do not exceed maximum allowed memory allocation */
+ if (exceeds_max_total_bkend_mem(headerSize))
+ return NULL;
+
slab = (SlabContext *) malloc(headerSize);
if (slab == NULL)
{
@@ -351,6 +355,10 @@ SlabAlloc(MemoryContext context, Size size)
*/
if (slab->minFreeChunks == 0)
{
+ /* Do not exceed maximum allowed memory allocation */
+ if (exceeds_max_total_bkend_mem(slab->blockSize))
+ return NULL;
+
block = (SlabBlock *) malloc(slab->blockSize);
if (block == NULL)
diff --git a/src/include/utils/backend_status.h b/src/include/utils/backend_status.h
index e5aa90b101..94528aa650 100644
--- a/src/include/utils/backend_status.h
+++ b/src/include/utils/backend_status.h
@@ -286,6 +286,7 @@ typedef struct LocalPgBackendStatus
*/
extern PGDLLIMPORT bool pgstat_track_activities;
extern PGDLLIMPORT int pgstat_track_activity_query_size;
+extern PGDLLIMPORT int max_total_bkend_mem;
/* ----------
@@ -325,6 +326,7 @@ extern const char *pgstat_get_backend_current_activity(int pid, bool checkUser);
extern const char *pgstat_get_crashed_backend_activity(int pid, char *buffer,
int buflen);
extern uint64 pgstat_get_my_query_id(void);
+extern uint64 pgstat_get_all_backend_memory_allocated(void);
extern void pgstat_set_allocated_bytes_storage(uint64 *allocated_bytes);
extern void pgstat_reset_allocated_bytes_storage(void);
@@ -337,6 +339,7 @@ extern int pgstat_fetch_stat_numbackends(void);
extern PgBackendStatus *pgstat_fetch_stat_beentry(BackendId beid);
extern LocalPgBackendStatus *pgstat_fetch_stat_local_beentry(int beid);
extern char *pgstat_clip_activity(const char *raw_activity);
+extern bool exceeds_max_total_bkend_mem(uint64 allocation_request);
/* ----------
* pgstat_report_allocated_bytes() -
--
2.25.1
0001-Add-tracking-of-backend-memory-allocated-to-pg_stat_.patchtext/x-patch; charset=UTF-8; name=0001-Add-tracking-of-backend-memory-allocated-to-pg_stat_.patchDownload
From 3d772d8620faba4bd4e091d6618c63557fbf6749 Mon Sep 17 00:00:00 2001
From: Reid Thompson <jreidthompson@nc.rr.com>
Date: Thu, 11 Aug 2022 12:01:25 -0400
Subject: [PATCH 1/2] Add tracking of backend memory allocated to
pg_stat_activity
This new field displays the current bytes of memory allocated to the
backend process. It is updated as memory for the process is
malloc'd/free'd. Memory allocated to items on the freelist is included in
the displayed value. Dynamic shared memory allocations are included
only in the value displayed for the backend that created them, they are
not included in the value for backends that are attached to them to
avoid double counting. On occasion, orphaned memory segments may be
cleaned up on postmaster startup. This may result in decreasing the sum
without a prior increment. We limit the floor of backend_mem_allocated
to zero. Updated pg_stat_activity documentation for the new column.
---
doc/src/sgml/monitoring.sgml | 15 ++++
src/backend/catalog/system_views.sql | 1 +
src/backend/postmaster/autovacuum.c | 6 ++
src/backend/postmaster/postmaster.c | 13 ++++
src/backend/postmaster/syslogger.c | 3 +
src/backend/storage/ipc/dsm_impl.c | 81 +++++++++++++++++++++
src/backend/utils/activity/backend_status.c | 45 ++++++++++++
src/backend/utils/adt/pgstatfuncs.c | 4 +-
src/backend/utils/mmgr/aset.c | 17 +++++
src/backend/utils/mmgr/generation.c | 15 ++++
src/backend/utils/mmgr/slab.c | 21 ++++++
src/include/catalog/pg_proc.dat | 6 +-
src/include/utils/backend_status.h | 59 ++++++++++++++-
src/test/regress/expected/rules.out | 9 ++-
src/test/regress/expected/stats.out | 11 +++
src/test/regress/sql/stats.sql | 3 +
16 files changed, 300 insertions(+), 9 deletions(-)
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 5579b8b9e0..ffe7d2566c 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -947,6 +947,21 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser
</para></entry>
</row>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>allocated_bytes</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Memory currently allocated to this backend in bytes. This is the balance
+ of bytes allocated and freed by this backend. Dynamic shared memory
+ allocations are included only in the value displayed for the backend that
+ created them, they are not included in the value for backends that are
+ attached to them to avoid double counting. Use <function>pg_size_pretty</function>
+ described in <xref linkend="functions-admin-dbsize"/> to make this value
+ more easily readable.
+ </para></entry>
+ </row>
+
<row>
<entry role="catalog_table_entry"><para role="column_definition">
<structfield>query</structfield> <type>text</type>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 2d8104b090..9ea8f78c95 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -864,6 +864,7 @@ CREATE VIEW pg_stat_activity AS
S.state,
S.backend_xid,
s.backend_xmin,
+ S.allocated_bytes,
S.query_id,
S.query,
S.backend_type
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 601834d4b4..f54606104d 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -407,6 +407,9 @@ StartAutoVacLauncher(void)
#ifndef EXEC_BACKEND
case 0:
+ /* Zero allocated bytes to avoid double counting parent allocation */
+ pgstat_zero_my_allocated_bytes();
+
/* in postmaster child ... */
InitPostmasterChild();
@@ -1485,6 +1488,9 @@ StartAutoVacWorker(void)
#ifndef EXEC_BACKEND
case 0:
+ /* Zero allocated bytes to avoid double counting parent allocation */
+ pgstat_zero_my_allocated_bytes();
+
/* in postmaster child ... */
InitPostmasterChild();
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index a8a246921f..89a6caec78 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -4102,6 +4102,9 @@ BackendStartup(Port *port)
{
free(bn);
+ /* Zero allocated bytes to avoid double counting parent allocation */
+ pgstat_zero_my_allocated_bytes();
+
/* Detangle from postmaster */
InitPostmasterChild();
@@ -5307,6 +5310,11 @@ StartChildProcess(AuxProcType type)
MemoryContextDelete(PostmasterContext);
PostmasterContext = NULL;
+ /* Zero allocated bytes to avoid double counting parent allocation.
+ * Needs to be after the MemoryContextDelete(PostmasterContext) above.
+ */
+ pgstat_zero_my_allocated_bytes();
+
AuxiliaryProcessMain(type); /* does not return */
}
#endif /* EXEC_BACKEND */
@@ -5700,6 +5708,11 @@ do_start_bgworker(RegisteredBgWorker *rw)
MemoryContextDelete(PostmasterContext);
PostmasterContext = NULL;
+ /* Zero allocated bytes to avoid double counting parent allocation.
+ * Needs to be after the MemoryContextDelete(PostmasterContext) above.
+ */
+ pgstat_zero_my_allocated_bytes();
+
StartBackgroundWorker();
exit(1); /* should not get here */
diff --git a/src/backend/postmaster/syslogger.c b/src/backend/postmaster/syslogger.c
index d6d02e3c63..4cbc59cda5 100644
--- a/src/backend/postmaster/syslogger.c
+++ b/src/backend/postmaster/syslogger.c
@@ -679,6 +679,9 @@ SysLogger_Start(void)
#ifndef EXEC_BACKEND
case 0:
+ /* Zero allocated bytes to avoid double counting parent allocation */
+ pgstat_zero_my_allocated_bytes();
+
/* in postmaster child ... */
InitPostmasterChild();
diff --git a/src/backend/storage/ipc/dsm_impl.c b/src/backend/storage/ipc/dsm_impl.c
index 6ddd46a4e7..65d59fc43e 100644
--- a/src/backend/storage/ipc/dsm_impl.c
+++ b/src/backend/storage/ipc/dsm_impl.c
@@ -66,6 +66,7 @@
#include "postmaster/postmaster.h"
#include "storage/dsm_impl.h"
#include "storage/fd.h"
+#include "utils/backend_status.h"
#include "utils/guc.h"
#include "utils/memutils.h"
@@ -232,6 +233,14 @@ dsm_impl_posix(dsm_op op, dsm_handle handle, Size request_size,
name)));
return false;
}
+
+ /*
+ * Detach and destroy pass through here, only decrease the memory
+ * shown allocated in pg_stat_activity when the creator destroys the
+ * allocation.
+ */
+ if (op == DSM_OP_DESTROY)
+ pgstat_report_allocated_bytes(*mapped_size, DECREASE);
*mapped_address = NULL;
*mapped_size = 0;
if (op == DSM_OP_DESTROY && shm_unlink(name) != 0)
@@ -332,6 +341,36 @@ dsm_impl_posix(dsm_op op, dsm_handle handle, Size request_size,
name)));
return false;
}
+
+ /*
+ * Attach and create pass through here, only update backend memory
+ * allocated in pg_stat_activity for the creator process.
+ */
+ if (op == DSM_OP_CREATE)
+ {
+ /*
+ * Posix creation calls dsm_impl_posix_resize implying that resizing
+ * occurs or may be added in the future. As implemented
+ * dsm_impl_posix_resize utilizes fallocate or truncate, passing the
+ * whole new size as input, growing the allocation as needed (only
+ * truncate supports shrinking). We update by replacing the old
+ * allocation with the new.
+ */
+#if defined(HAVE_POSIX_FALLOCATE) && defined(__linux__)
+ /*
+ * posix_fallocate does not shrink allocations, adjust only on
+ * allocation increase.
+ */
+ if (request_size > *mapped_size)
+ {
+ pgstat_report_allocated_bytes(*mapped_size, DECREASE);
+ pgstat_report_allocated_bytes(request_size, INCREASE);
+ }
+#else
+ pgstat_report_allocated_bytes(*mapped_size, DECREASE);
+ pgstat_report_allocated_bytes(request_size, INCREASE);
+#endif
+ }
*mapped_address = address;
*mapped_size = request_size;
close(fd);
@@ -537,6 +576,14 @@ dsm_impl_sysv(dsm_op op, dsm_handle handle, Size request_size,
name)));
return false;
}
+
+ /*
+ * Detach and destroy pass through here, only decrease the memory
+ * shown allocated in pg_stat_activity when the creator destroys the
+ * allocation.
+ */
+ if (op == DSM_OP_DESTROY)
+ pgstat_report_allocated_bytes(*mapped_size, DECREASE);
*mapped_address = NULL;
*mapped_size = 0;
if (op == DSM_OP_DESTROY && shmctl(ident, IPC_RMID, NULL) < 0)
@@ -584,6 +631,13 @@ dsm_impl_sysv(dsm_op op, dsm_handle handle, Size request_size,
name)));
return false;
}
+
+ /*
+ * Attach and create pass through here, only update backend memory
+ * allocated in pg_stat_activity for the creator process.
+ */
+ if (op == DSM_OP_CREATE)
+ pgstat_report_allocated_bytes(request_size, INCREASE);
*mapped_address = address;
*mapped_size = request_size;
@@ -652,6 +706,13 @@ dsm_impl_windows(dsm_op op, dsm_handle handle, Size request_size,
return false;
}
+ /*
+ * Detach and destroy pass through here, only decrease the memory
+ * shown allocated in pg_stat_activity when the creator destroys the
+ * allocation.
+ */
+ if (op == DSM_OP_DESTROY)
+ pgstat_report_allocated_bytes(*mapped_size, DECREASE);
*impl_private = NULL;
*mapped_address = NULL;
*mapped_size = 0;
@@ -768,6 +829,12 @@ dsm_impl_windows(dsm_op op, dsm_handle handle, Size request_size,
return false;
}
+ /*
+ * Attach and create pass through here, only update backend memory
+ * allocated in pg_stat_activity for the creator process.
+ */
+ if (op == DSM_OP_CREATE)
+ pgstat_report_allocated_bytes(info.RegionSize, INCREASE);
*mapped_address = address;
*mapped_size = info.RegionSize;
*impl_private = hmap;
@@ -812,6 +879,13 @@ dsm_impl_mmap(dsm_op op, dsm_handle handle, Size request_size,
name)));
return false;
}
+
+ /*
+ * Detach and destroy pass through here, only decrease the memory
+ * shown allocated in pg_stat_activity when the creator destroys the
+ * allocation.
+ */
+ pgstat_report_allocated_bytes(*mapped_size, DECREASE);
*mapped_address = NULL;
*mapped_size = 0;
if (op == DSM_OP_DESTROY && unlink(name) != 0)
@@ -933,6 +1007,13 @@ dsm_impl_mmap(dsm_op op, dsm_handle handle, Size request_size,
name)));
return false;
}
+
+ /*
+ * Attach and create pass through here, only update backend memory
+ * allocated in pg_stat_activity for the creator process.
+ */
+ if (op == DSM_OP_CREATE)
+ pgstat_report_allocated_bytes(request_size, INCREASE);
*mapped_address = address;
*mapped_size = request_size;
diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c
index 1146a6c33c..3785e8af53 100644
--- a/src/backend/utils/activity/backend_status.c
+++ b/src/backend/utils/activity/backend_status.c
@@ -49,6 +49,9 @@ int pgstat_track_activity_query_size = 1024;
/* exposed so that backend_progress.c can access it */
PgBackendStatus *MyBEEntry = NULL;
+/* Memory allocated to this backend prior to pgstats initialization */
+uint64 local_my_allocated_bytes = 0;
+uint64 *my_allocated_bytes = &local_my_allocated_bytes;
static PgBackendStatus *BackendStatusArray = NULL;
static char *BackendAppnameBuffer = NULL;
@@ -400,6 +403,15 @@ pgstat_bestart(void)
lbeentry.st_progress_command_target = InvalidOid;
lbeentry.st_query_id = UINT64CONST(0);
+ /* Alter allocation reporting from local_my_allocated_bytes to shared memory */
+ pgstat_set_allocated_bytes_storage(&MyBEEntry->allocated_bytes);
+
+ /* Populate sum of memory allocated prior to pgstats initialization to pgstats
+ * and zero the local variable.
+ */
+ lbeentry.allocated_bytes += local_my_allocated_bytes;
+ local_my_allocated_bytes = 0;
+
/*
* we don't zero st_progress_param here to save cycles; nobody should
* examine it until st_progress_command has been set to something other
@@ -459,6 +471,11 @@ pgstat_beshutdown_hook(int code, Datum arg)
{
volatile PgBackendStatus *beentry = MyBEEntry;
+ /*
+ * Stop reporting memory allocation changes to &MyBEEntry->allocated_bytes
+ */
+ pgstat_reset_allocated_bytes_storage();
+
/*
* Clear my status entry, following the protocol of bumping st_changecount
* before and after. We use a volatile pointer here to ensure the
@@ -1191,3 +1208,31 @@ pgstat_clip_activity(const char *raw_activity)
return activity;
}
+
+/*
+ * Configure bytes allocated reporting to report allocated bytes to
+ * *allocated_bytes. *allocated_bytes needs to be valid until
+ * pgstat_set_allocated_bytes_storage() is called.
+ *
+ * Expected to be called during backend startup (in pgstat_bestart), to point
+ * my_allocated_bytes into shared memory.
+ */
+void
+pgstat_set_allocated_bytes_storage(uint64 *new_allocated_bytes)
+{
+ my_allocated_bytes = new_allocated_bytes;
+ *new_allocated_bytes = local_my_allocated_bytes;
+}
+
+/*
+ * Reset allocated bytes storage location.
+ *
+ * Expected to be called during backend shutdown, before the location set up
+ * by pgstat_set_allocated_bytes_storage() becomes invalid.
+ */
+void
+pgstat_reset_allocated_bytes_storage(void)
+{
+ my_allocated_bytes = &local_my_allocated_bytes;
+}
+
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index ae3365d917..170d60df8b 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -559,7 +559,7 @@ pg_stat_get_progress_info(PG_FUNCTION_ARGS)
Datum
pg_stat_get_activity(PG_FUNCTION_ARGS)
{
-#define PG_STAT_GET_ACTIVITY_COLS 30
+#define PG_STAT_GET_ACTIVITY_COLS 31
int num_backends = pgstat_fetch_stat_numbackends();
int curr_backend;
int pid = PG_ARGISNULL(0) ? -1 : PG_GETARG_INT32(0);
@@ -615,6 +615,8 @@ pg_stat_get_activity(PG_FUNCTION_ARGS)
else
nulls[16] = true;
+ values[30] = UInt64GetDatum(beentry->allocated_bytes);
+
/* Values only available to role member or pg_read_all_stats */
if (HAS_PGSTAT_PERMISSIONS(beentry->st_userid))
{
diff --git a/src/backend/utils/mmgr/aset.c b/src/backend/utils/mmgr/aset.c
index b6a8bbcd59..b202e115b6 100644
--- a/src/backend/utils/mmgr/aset.c
+++ b/src/backend/utils/mmgr/aset.c
@@ -47,6 +47,7 @@
#include "postgres.h"
#include "port/pg_bitutils.h"
+#include "utils/backend_status.h"
#include "utils/memdebug.h"
#include "utils/memutils.h"
#include "utils/memutils_memorychunk.h"
@@ -521,6 +522,7 @@ AllocSetContextCreateInternal(MemoryContext parent,
name);
((MemoryContext) set)->mem_allocated = firstBlockSize;
+ pgstat_report_allocated_bytes(firstBlockSize, INCREASE);
return (MemoryContext) set;
}
@@ -543,6 +545,7 @@ AllocSetReset(MemoryContext context)
AllocSet set = (AllocSet) context;
AllocBlock block;
Size keepersize PG_USED_FOR_ASSERTS_ONLY;
+ uint64 deallocation = 0;
Assert(AllocSetIsValid(set));
@@ -585,6 +588,7 @@ AllocSetReset(MemoryContext context)
{
/* Normal case, release the block */
context->mem_allocated -= block->endptr - ((char *) block);
+ deallocation += block->endptr - ((char *) block);
#ifdef CLOBBER_FREED_MEMORY
wipe_mem(block, block->freeptr - ((char *) block));
@@ -595,6 +599,7 @@ AllocSetReset(MemoryContext context)
}
Assert(context->mem_allocated == keepersize);
+ pgstat_report_allocated_bytes(deallocation, DECREASE);
/* Reset block size allocation sequence, too */
set->nextBlockSize = set->initBlockSize;
@@ -613,6 +618,7 @@ AllocSetDelete(MemoryContext context)
AllocSet set = (AllocSet) context;
AllocBlock block = set->blocks;
Size keepersize PG_USED_FOR_ASSERTS_ONLY;
+ uint64 deallocation = 0;
Assert(AllocSetIsValid(set));
@@ -651,11 +657,13 @@ AllocSetDelete(MemoryContext context)
freelist->first_free = (AllocSetContext *) oldset->header.nextchild;
freelist->num_free--;
+ deallocation += oldset->header.mem_allocated;
/* All that remains is to free the header/initial block */
free(oldset);
}
Assert(freelist->num_free == 0);
+ pgstat_report_allocated_bytes(deallocation, DECREASE);
}
/* Now add the just-deleted context to the freelist. */
@@ -672,7 +680,10 @@ AllocSetDelete(MemoryContext context)
AllocBlock next = block->next;
if (block != set->keeper)
+ {
context->mem_allocated -= block->endptr - ((char *) block);
+ deallocation += block->endptr - ((char *) block);
+ }
#ifdef CLOBBER_FREED_MEMORY
wipe_mem(block, block->freeptr - ((char *) block));
@@ -685,6 +696,7 @@ AllocSetDelete(MemoryContext context)
}
Assert(context->mem_allocated == keepersize);
+ pgstat_report_allocated_bytes(deallocation + context->mem_allocated, DECREASE);
/* Finally, free the context header, including the keeper block */
free(set);
@@ -734,6 +746,7 @@ AllocSetAlloc(MemoryContext context, Size size)
return NULL;
context->mem_allocated += blksize;
+ pgstat_report_allocated_bytes(blksize, INCREASE);
block->aset = set;
block->freeptr = block->endptr = ((char *) block) + blksize;
@@ -944,6 +957,7 @@ AllocSetAlloc(MemoryContext context, Size size)
return NULL;
context->mem_allocated += blksize;
+ pgstat_report_allocated_bytes(blksize, INCREASE);
block->aset = set;
block->freeptr = ((char *) block) + ALLOC_BLOCKHDRSZ;
@@ -1043,6 +1057,7 @@ AllocSetFree(void *pointer)
block->next->prev = block->prev;
set->header.mem_allocated -= block->endptr - ((char *) block);
+ pgstat_report_allocated_bytes(block->endptr - ((char *) block), DECREASE);
#ifdef CLOBBER_FREED_MEMORY
wipe_mem(block, block->freeptr - ((char *) block));
@@ -1173,7 +1188,9 @@ AllocSetRealloc(void *pointer, Size size)
/* updated separately, not to underflow when (oldblksize > blksize) */
set->header.mem_allocated -= oldblksize;
+ pgstat_report_allocated_bytes(oldblksize, DECREASE);
set->header.mem_allocated += blksize;
+ pgstat_report_allocated_bytes(blksize, INCREASE);
block->freeptr = block->endptr = ((char *) block) + blksize;
diff --git a/src/backend/utils/mmgr/generation.c b/src/backend/utils/mmgr/generation.c
index b432a92be3..459eb985d6 100644
--- a/src/backend/utils/mmgr/generation.c
+++ b/src/backend/utils/mmgr/generation.c
@@ -37,6 +37,7 @@
#include "lib/ilist.h"
#include "port/pg_bitutils.h"
+#include "utils/backend_status.h"
#include "utils/memdebug.h"
#include "utils/memutils.h"
#include "utils/memutils_memorychunk.h"
@@ -267,6 +268,7 @@ GenerationContextCreate(MemoryContext parent,
name);
((MemoryContext) set)->mem_allocated = firstBlockSize;
+ pgstat_report_allocated_bytes(firstBlockSize, INCREASE);
return (MemoryContext) set;
}
@@ -283,6 +285,7 @@ GenerationReset(MemoryContext context)
{
GenerationContext *set = (GenerationContext *) context;
dlist_mutable_iter miter;
+ uint64 deallocation = 0;
Assert(GenerationIsValid(set));
@@ -305,9 +308,14 @@ GenerationReset(MemoryContext context)
if (block == set->keeper)
GenerationBlockMarkEmpty(block);
else
+ {
+ deallocation += block->blksize;
GenerationBlockFree(set, block);
+ }
}
+ pgstat_report_allocated_bytes(deallocation, DECREASE);
+
/* set it so new allocations to make use of the keeper block */
set->block = set->keeper;
@@ -328,6 +336,9 @@ GenerationDelete(MemoryContext context)
{
/* Reset to release all releasable GenerationBlocks */
GenerationReset(context);
+
+ pgstat_report_allocated_bytes(context->mem_allocated, DECREASE);
+
/* And free the context header and keeper block */
free(context);
}
@@ -374,6 +385,7 @@ GenerationAlloc(MemoryContext context, Size size)
return NULL;
context->mem_allocated += blksize;
+ pgstat_report_allocated_bytes(blksize, INCREASE);
/* block with a single (used) chunk */
block->context = set;
@@ -477,6 +489,7 @@ GenerationAlloc(MemoryContext context, Size size)
return NULL;
context->mem_allocated += blksize;
+ pgstat_report_allocated_bytes(blksize, INCREASE);
/* initialize the new block */
GenerationBlockInit(set, block, blksize);
@@ -726,6 +739,8 @@ GenerationFree(void *pointer)
dlist_delete(&block->node);
set->header.mem_allocated -= block->blksize;
+ pgstat_report_allocated_bytes(block->blksize, DECREASE);
+
free(block);
}
diff --git a/src/backend/utils/mmgr/slab.c b/src/backend/utils/mmgr/slab.c
index 6df0839b6a..f38256f6f3 100644
--- a/src/backend/utils/mmgr/slab.c
+++ b/src/backend/utils/mmgr/slab.c
@@ -53,6 +53,7 @@
#include "postgres.h"
#include "lib/ilist.h"
+#include "utils/backend_status.h"
#include "utils/memdebug.h"
#include "utils/memutils.h"
#include "utils/memutils_memorychunk.h"
@@ -238,6 +239,12 @@ SlabContextCreate(MemoryContext parent,
parent,
name);
+ /*
+ * If SlabContextCreate is updated to add headerSize to
+ * context->mem_allocated, then update here and SlabDelete appropriately
+ */
+ pgstat_report_allocated_bytes(headerSize, INCREASE);
+
return (MemoryContext) slab;
}
@@ -253,6 +260,7 @@ SlabReset(MemoryContext context)
{
SlabContext *slab = (SlabContext *) context;
int i;
+ uint64 deallocation = 0;
Assert(SlabIsValid(slab));
@@ -278,9 +286,11 @@ SlabReset(MemoryContext context)
free(block);
slab->nblocks--;
context->mem_allocated -= slab->blockSize;
+ deallocation += slab->blockSize;
}
}
+ pgstat_report_allocated_bytes(deallocation, DECREASE);
slab->minFreeChunks = 0;
Assert(slab->nblocks == 0);
@@ -294,8 +304,17 @@ SlabReset(MemoryContext context)
void
SlabDelete(MemoryContext context)
{
+ /*
+ * Until header allocation is included in context->mem_allocated, cast to
+ * slab and decrement the headerSize
+ */
+ SlabContext *slab = castNode(SlabContext, context);
+
/* Reset to release all the SlabBlocks */
SlabReset(context);
+
+ pgstat_report_allocated_bytes(slab->headerSize, DECREASE);
+
/* And free the context header */
free(context);
}
@@ -364,6 +383,7 @@ SlabAlloc(MemoryContext context, Size size)
slab->minFreeChunks = slab->chunksPerBlock;
slab->nblocks += 1;
context->mem_allocated += slab->blockSize;
+ pgstat_report_allocated_bytes(slab->blockSize, INCREASE);
}
/* grab the block from the freelist (even the new block is there) */
@@ -537,6 +557,7 @@ SlabFree(void *pointer)
free(block);
slab->nblocks--;
slab->header.mem_allocated -= slab->blockSize;
+ pgstat_report_allocated_bytes(slab->blockSize, DECREASE);
}
else
dlist_push_head(&slab->freelist[block->nfree], &block->node);
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index f9301b2627..1bf02758d4 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5373,9 +5373,9 @@
proname => 'pg_stat_get_activity', prorows => '100', proisstrict => 'f',
proretset => 't', provolatile => 's', proparallel => 'r',
prorettype => 'record', proargtypes => 'int4',
- proallargtypes => '{int4,oid,int4,oid,text,text,text,text,text,timestamptz,timestamptz,timestamptz,timestamptz,inet,text,int4,xid,xid,text,bool,text,text,int4,text,numeric,text,bool,text,bool,int4,int8}',
- proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
- proargnames => '{pid,datid,pid,usesysid,application_name,state,query,wait_event_type,wait_event,xact_start,query_start,backend_start,state_change,client_addr,client_hostname,client_port,backend_xid,backend_xmin,backend_type,ssl,sslversion,sslcipher,sslbits,ssl_client_dn,ssl_client_serial,ssl_issuer_dn,gss_auth,gss_princ,gss_enc,leader_pid,query_id}',
+ proallargtypes => '{int4,oid,int4,oid,text,text,text,text,text,timestamptz,timestamptz,timestamptz,timestamptz,inet,text,int4,xid,xid,text,bool,text,text,int4,text,numeric,text,bool,text,bool,int4,int8,int8}',
+ proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{pid,datid,pid,usesysid,application_name,state,query,wait_event_type,wait_event,xact_start,query_start,backend_start,state_change,client_addr,client_hostname,client_port,backend_xid,backend_xmin,backend_type,ssl,sslversion,sslcipher,sslbits,ssl_client_dn,ssl_client_serial,ssl_issuer_dn,gss_auth,gss_princ,gss_enc,leader_pid,query_id,allocated_bytes}',
prosrc => 'pg_stat_get_activity' },
{ oid => '3318',
descr => 'statistics: information about progress of backends running maintenance command',
diff --git a/src/include/utils/backend_status.h b/src/include/utils/backend_status.h
index b582b46e9f..e5aa90b101 100644
--- a/src/include/utils/backend_status.h
+++ b/src/include/utils/backend_status.h
@@ -15,6 +15,7 @@
#include "miscadmin.h" /* for BackendType */
#include "storage/backendid.h"
#include "utils/backend_progress.h"
+#include "common/int.h"
/* ----------
@@ -32,6 +33,13 @@ typedef enum BackendState
STATE_DISABLED
} BackendState;
+/* Enum helper for reporting memory allocated bytes */
+enum allocation_direction
+{
+ DECREASE = -1,
+ IGNORE,
+ INCREASE,
+};
/* ----------
* Shared-memory data structures
@@ -169,6 +177,9 @@ typedef struct PgBackendStatus
/* query identifier, optionally computed using post_parse_analyze_hook */
uint64 st_query_id;
+
+ /* Current memory allocated to this backend */
+ uint64 allocated_bytes;
} PgBackendStatus;
@@ -282,6 +293,7 @@ extern PGDLLIMPORT int pgstat_track_activity_query_size;
* ----------
*/
extern PGDLLIMPORT PgBackendStatus *MyBEEntry;
+extern PGDLLIMPORT uint64 *my_allocated_bytes;
/* ----------
@@ -313,7 +325,8 @@ extern const char *pgstat_get_backend_current_activity(int pid, bool checkUser);
extern const char *pgstat_get_crashed_backend_activity(int pid, char *buffer,
int buflen);
extern uint64 pgstat_get_my_query_id(void);
-
+extern void pgstat_set_allocated_bytes_storage(uint64 *allocated_bytes);
+extern void pgstat_reset_allocated_bytes_storage(void);
/* ----------
* Support functions for the SQL-callable functions to
@@ -325,5 +338,49 @@ extern PgBackendStatus *pgstat_fetch_stat_beentry(BackendId beid);
extern LocalPgBackendStatus *pgstat_fetch_stat_local_beentry(int beid);
extern char *pgstat_clip_activity(const char *raw_activity);
+/* ----------
+ * pgstat_report_allocated_bytes() -
+ *
+ * Called to report change in memory allocated for this backend.
+ *
+ * my_allocated_bytes initially points to local memory, making it safe to call
+ * this before pgstats has been initialized. allocation_direction is a
+ * positive/negative multiplier enum defined above.
+ * ----------
+ */
+static inline void
+pgstat_report_allocated_bytes(uint64 allocated_bytes, int allocation_direction)
+{
+ uint64 temp;
+
+ /* Avoid *my_allocated_bytes unsigned integer overflow on DECREASE */
+ if (allocation_direction == DECREASE &&
+ pg_sub_u64_overflow(*my_allocated_bytes, allocated_bytes, &temp))
+ {
+ *my_allocated_bytes = 0;
+ ereport(LOG,
+ errmsg("Backend %d deallocated %ld bytes, exceeding the %ld bytes it is currently reporting allocated. Setting reported to 0.",
+ MyProcPid, allocated_bytes, *my_allocated_bytes));
+ }
+ else
+ *my_allocated_bytes += (allocated_bytes) * allocation_direction;
+
+ return;
+}
+
+/* ---------
+ * pgstat_zero_my_allocated_bytes() -
+ *
+ * Called to zero out local allocated bytes variable after fork to avoid double
+ * counting allocations.
+ * ---------
+ */
+static inline void
+pgstat_zero_my_allocated_bytes(void)
+{
+ *my_allocated_bytes = 0;
+
+ return;
+}
#endif /* BACKEND_STATUS_H */
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 37c1c86473..5263294ad4 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1756,10 +1756,11 @@ pg_stat_activity| SELECT s.datid,
s.state,
s.backend_xid,
s.backend_xmin,
+ s.allocated_bytes,
s.query_id,
s.query,
s.backend_type
- FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, query_id)
+ FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, query_id, allocated_bytes)
LEFT JOIN pg_database d ON ((s.datid = d.oid)))
LEFT JOIN pg_authid u ON ((s.usesysid = u.oid)));
pg_stat_all_indexes| SELECT c.oid AS relid,
@@ -1874,7 +1875,7 @@ pg_stat_gssapi| SELECT s.pid,
s.gss_auth AS gss_authenticated,
s.gss_princ AS principal,
s.gss_enc AS encrypted
- FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, query_id)
+ FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, query_id, allocated_bytes)
WHERE (s.client_port IS NOT NULL);
pg_stat_progress_analyze| SELECT s.pid,
s.datid,
@@ -2055,7 +2056,7 @@ pg_stat_replication| SELECT s.pid,
w.sync_priority,
w.sync_state,
w.reply_time
- FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, query_id)
+ FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, query_id, allocated_bytes)
JOIN pg_stat_get_wal_senders() w(pid, state, sent_lsn, write_lsn, flush_lsn, replay_lsn, write_lag, flush_lag, replay_lag, sync_priority, sync_state, reply_time) ON ((s.pid = w.pid)))
LEFT JOIN pg_authid u ON ((s.usesysid = u.oid)));
pg_stat_replication_slots| SELECT s.slot_name,
@@ -2089,7 +2090,7 @@ pg_stat_ssl| SELECT s.pid,
s.ssl_client_dn AS client_dn,
s.ssl_client_serial AS client_serial,
s.ssl_issuer_dn AS issuer_dn
- FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, query_id)
+ FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, query_id, allocated_bytes)
WHERE (s.client_port IS NOT NULL);
pg_stat_subscription| SELECT su.oid AS subid,
su.subname,
diff --git a/src/test/regress/expected/stats.out b/src/test/regress/expected/stats.out
index 1d84407a03..ab7e95c367 100644
--- a/src/test/regress/expected/stats.out
+++ b/src/test/regress/expected/stats.out
@@ -1126,4 +1126,15 @@ SELECT pg_stat_get_subscription_stats(NULL);
(1 row)
+-- ensure that allocated_bytes exist for backends
+SELECT allocated_bytes > 0 AS result FROM pg_stat_activity WHERE backend_type
+IN ('checkpointer', 'background writer', 'walwriter', 'autovacuum launcher');
+ result
+--------
+ t
+ t
+ t
+ t
+(4 rows)
+
-- End of Stats Test
diff --git a/src/test/regress/sql/stats.sql b/src/test/regress/sql/stats.sql
index b4d6753c71..2f0b1cc9d8 100644
--- a/src/test/regress/sql/stats.sql
+++ b/src/test/regress/sql/stats.sql
@@ -535,5 +535,8 @@ SET enable_seqscan TO on;
SELECT pg_stat_get_replication_slot(NULL);
SELECT pg_stat_get_subscription_stats(NULL);
+-- ensure that allocated_bytes exist for backends
+SELECT allocated_bytes > 0 AS result FROM pg_stat_activity WHERE backend_type
+IN ('checkpointer', 'background writer', 'walwriter', 'autovacuum launcher');
-- End of Stats Test
--
2.25.1
Hi,
On 2022-11-26 22:22:15 -0500, Reid Thompson wrote:
rebased/patched to current master && current pg-stat-activity-backend-memory-allocated
This version fails to build with msvc, and builds with warnings on other
platforms.
https://cirrus-ci.com/build/5410696721072128
msvc:
[20:26:51.286] c:\cirrus\src\include\utils/backend_status.h(40): error C2059: syntax error: 'constant'
mingw cross:
[20:26:26.358] from /usr/share/mingw-w64/include/winsock2.h:23,
[20:26:26.358] from ../../src/include/port/win32_port.h:60,
[20:26:26.358] from ../../src/include/port.h:24,
[20:26:26.358] from ../../src/include/c.h:1306,
[20:26:26.358] from ../../src/include/postgres.h:47,
[20:26:26.358] from controldata_utils.c:18:
[20:26:26.358] ../../src/include/utils/backend_status.h:40:2: error: expected identifier before numeric constant
[20:26:26.358] 40 | IGNORE,
[20:26:26.358] | ^~~~~~
[20:26:26.358] In file included from ../../src/include/postgres.h:48,
[20:26:26.358] from controldata_utils.c:18:
[20:26:26.358] ../../src/include/utils/backend_status.h: In function ‘pgstat_report_allocated_bytes’:
[20:26:26.358] ../../src/include/utils/backend_status.h:365:12: error: format ‘%ld’ expects argument of type ‘long int’, but argument 3 has type ‘uint64’ {aka ‘long long unsigned int’} [-Werror=format=]
[20:26:26.358] 365 | errmsg("Backend %d deallocated %ld bytes, exceeding the %ld bytes it is currently reporting allocated. Setting reported to 0.",
[20:26:26.358] | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[20:26:26.358] 366 | MyProcPid, allocated_bytes, *my_allocated_bytes));
[20:26:26.358] | ~~~~~~~~~~~~~~~
[20:26:26.358] | |
[20:26:26.358] | uint64 {aka long long unsigned int}
Due to windows having long be 32bit, you need to use %lld. Our custom to deal
with that is to cast the argument to errmsg as long long unsigned and use
%llu.
Btw, given that the argument is uint64, it doesn't seem correct to use %ld,
that's signed. Not that it's going to matter, but ...
Greetings,
Andres Freund
On Tue, 2022-12-06 at 10:32 -0800, Andres Freund wrote:
Hi,
On 2022-11-26 22:22:15 -0500, Reid Thompson wrote:
rebased/patched to current master && current pg-stat-activity-
backend-memory-allocatedThis version fails to build with msvc, and builds with warnings on
other
platforms.
https://cirrus-ci.com/build/5410696721072128
msvc:Andres Freund
updated patches
--
Reid Thompson
Senior Software Engineer
Crunchy Data, Inc.
reid.thompson@crunchydata.com
www.crunchydata.com
Attachments:
0002-Add-the-ability-to-limit-the-amount-of-memory-that-c.patchtext/x-patch; charset=UTF-8; name=0002-Add-the-ability-to-limit-the-amount-of-memory-that-c.patchDownload
From e48292eaf402bfe397f1c2fdc1b3efd8cd0a9137 Mon Sep 17 00:00:00 2001
From: Reid Thompson <jreidthompson@nc.rr.com>
Date: Sat, 4 Jun 2022 22:23:59 -0400
Subject: [PATCH 2/2] Add the ability to limit the amount of memory that can be
allocated to backends.
This builds on the work that adds backend memory allocated to pg_stat_activity.
Add GUC variable max_total_backend_memory.
Specifies a limit to the amount of memory (in MB) that may be allocated to
backends in total (i.e. this is not a per user or per backend limit). If unset,
or set to 0 it is disabled. It is intended as a resource to help avoid the OOM
killer on LINUX and manage resources in general. A backend request that would
push the total over the limit will be denied with an out of memory error causing
that backend's current query/transaction to fail. Due to the dynamic nature of
memory allocations, this limit is not exact. If within 1.5MB of the limit and
two backends request 1MB each at the same time both may be allocated, and exceed
the limit. Further requests will not be allocated until dropping below the
limit. Keep this in mind when setting this value. This limit does not affect
auxiliary backend processes. Backend memory allocations are displayed in the
pg_stat_activity view.
---
doc/src/sgml/config.sgml | 26 +++++
src/backend/storage/ipc/dsm_impl.c | 12 ++
src/backend/utils/activity/backend_status.c | 108 ++++++++++++++++++
src/backend/utils/misc/guc_tables.c | 11 ++
src/backend/utils/misc/postgresql.conf.sample | 3 +
src/backend/utils/mmgr/aset.c | 17 +++
src/backend/utils/mmgr/generation.c | 9 ++
src/backend/utils/mmgr/slab.c | 8 ++
src/include/utils/backend_status.h | 3 +
9 files changed, 197 insertions(+)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index ff6fcd902a..2899f109ac 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2079,6 +2079,32 @@ include_dir 'conf.d'
</listitem>
</varlistentry>
+ <varlistentry id="guc-max-total-backend-memory" xreflabel="max_total_backend_memory">
+ <term><varname>max_total_backend_memory</varname> (<type>integer</type>)
+ <indexterm>
+ <primary><varname>max_total_backend_memory</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Specifies a limit to the amount of memory (MB) that may be allocated to
+ backends in total (i.e. this is not a per user or per backend limit).
+ If unset, or set to 0 it is disabled. A backend request that would
+ push the total over the limit will be denied with an out of memory
+ error causing that backend's current query/transaction to fail. Due to
+ the dynamic nature of memory allocations, this limit is not exact. If
+ within 1.5MB of the limit and two backends request 1MB each at the same
+ time both may be allocated, and exceed the limit. Further requests will
+ not be allocated until dropping below the limit. Keep this in mind when
+ setting this value. This limit does not affect auxiliary backend
+ processes <xref linkend="glossary-auxiliary-proc"/> . Backend memory
+ allocations (<varname>allocated_bytes</varname>) are displayed in the
+ <link linkend="monitoring-pg-stat-activity-view"><structname>pg_stat_activity</structname></link>
+ view.
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
</sect2>
diff --git a/src/backend/storage/ipc/dsm_impl.c b/src/backend/storage/ipc/dsm_impl.c
index 36ef3e425e..58fb690c69 100644
--- a/src/backend/storage/ipc/dsm_impl.c
+++ b/src/backend/storage/ipc/dsm_impl.c
@@ -254,6 +254,10 @@ dsm_impl_posix(dsm_op op, dsm_handle handle, Size request_size,
return true;
}
+ /* Do not exceed maximum allowed memory allocation */
+ if (op == DSM_OP_CREATE && exceeds_max_total_bkend_mem(request_size))
+ return false;
+
/*
* Create new segment or open an existing one for attach.
*
@@ -525,6 +529,10 @@ dsm_impl_sysv(dsm_op op, dsm_handle handle, Size request_size,
int flags = IPCProtection;
size_t segsize;
+ /* Do not exceed maximum allowed memory allocation */
+ if (op == DSM_OP_CREATE && exceeds_max_total_bkend_mem(request_size))
+ return false;
+
/*
* Allocate the memory BEFORE acquiring the resource, so that we don't
* leak the resource if memory allocation fails.
@@ -719,6 +727,10 @@ dsm_impl_windows(dsm_op op, dsm_handle handle, Size request_size,
return true;
}
+ /* Do not exceed maximum allowed memory allocation */
+ if (op == DSM_OP_CREATE && exceeds_max_total_bkend_mem(request_size))
+ return false;
+
/* Create new segment or open an existing one for attach. */
if (op == DSM_OP_CREATE)
{
diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c
index 3785e8af53..07dfd8f490 100644
--- a/src/backend/utils/activity/backend_status.c
+++ b/src/backend/utils/activity/backend_status.c
@@ -45,6 +45,9 @@
bool pgstat_track_activities = false;
int pgstat_track_activity_query_size = 1024;
+/* Max backend memory allocation allowed (MB). 0 = disabled */
+int max_total_bkend_mem = 0;
+
/* exposed so that backend_progress.c can access it */
PgBackendStatus *MyBEEntry = NULL;
@@ -1236,3 +1239,108 @@ pgstat_reset_allocated_bytes_storage(void)
my_allocated_bytes = &local_my_allocated_bytes;
}
+/* ----------
+ * pgstat_get_all_memory_allocated() -
+ *
+ * Return a uint64 representing the current shared memory allocated to all
+ * backends. This looks directly at the BackendStatusArray, and so will
+ * provide current information regardless of the age of our transaction's
+ * snapshot of the status array.
+ * In the future we will likely utilize additional values - perhaps limit
+ * backend allocation by user/role, etc.
+ * ----------
+ */
+uint64
+pgstat_get_all_backend_memory_allocated(void)
+{
+ PgBackendStatus *beentry;
+ int i;
+ uint64 all_memory_allocated = 0;
+
+ beentry = BackendStatusArray;
+
+ /*
+ * We probably shouldn't get here before shared memory has been set up,
+ * but be safe.
+ */
+ if (beentry == NULL || BackendActivityBuffer == NULL)
+ return 0;
+
+ /*
+ * We include AUX procs in all backend memory calculation
+ */
+ for (i = 1; i <= NumBackendStatSlots; i++)
+ {
+ /*
+ * We use a volatile pointer here to ensure the compiler doesn't try
+ * to get cute.
+ */
+ volatile PgBackendStatus *vbeentry = beentry;
+ bool found;
+ uint64 allocated_bytes = 0;
+
+ for (;;)
+ {
+ int before_changecount;
+ int after_changecount;
+
+ pgstat_begin_read_activity(vbeentry, before_changecount);
+
+ /*
+ * Ignore invalid entries, which may contain invalid data.
+ * See pgstat_beshutdown_hook()
+ */
+ if (vbeentry->st_procpid > 0)
+ allocated_bytes = vbeentry->allocated_bytes;
+
+ pgstat_end_read_activity(vbeentry, after_changecount);
+
+ if ((found = pgstat_read_activity_complete(before_changecount,
+ after_changecount)))
+ break;
+
+ /* Make sure we can break out of loop if stuck... */
+ CHECK_FOR_INTERRUPTS();
+ }
+
+ if (found)
+ all_memory_allocated += allocated_bytes;
+
+ beentry++;
+ }
+
+ return all_memory_allocated;
+}
+
+/*
+ * Determine if allocation request will exceed max backend memory allowed.
+ * Do not apply to auxiliary processes.
+ */
+bool
+exceeds_max_total_bkend_mem(uint64 allocation_request)
+{
+ bool result = false;
+
+ /* Exclude auxiliary processes from the check */
+ if (MyAuxProcType != NotAnAuxProcess)
+ return result;
+
+ /* Convert max_total_bkend_mem to bytes for comparison */
+ if (max_total_bkend_mem &&
+ pgstat_get_all_backend_memory_allocated() +
+ allocation_request > (uint64) max_total_bkend_mem * 1024 * 1024)
+ {
+ /*
+ * Explicitly identify the OOM being a result of this configuration
+ * parameter vs a system failure to allocate OOM.
+ */
+ ereport(WARNING,
+ errmsg("allocation would exceed max_total_memory limit (%llu > %llu)",
+ (unsigned long long) pgstat_get_all_backend_memory_allocated() +
+ allocation_request, (unsigned long long) max_total_bkend_mem * 1024 * 1024));
+
+ result = true;
+ }
+
+ return result;
+}
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 1bf14eec66..bfde338a8d 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -3423,6 +3423,17 @@ struct config_int ConfigureNamesInt[] =
NULL, NULL, NULL
},
+ {
+ {"max_total_backend_memory", PGC_SU_BACKEND, RESOURCES_MEM,
+ gettext_noop("Restrict total backend memory allocations to this max."),
+ gettext_noop("0 turns this feature off."),
+ GUC_UNIT_MB
+ },
+ &max_total_bkend_mem,
+ 0, 0, INT_MAX,
+ NULL, NULL, NULL
+ },
+
/* End-of-list marker */
{
{NULL, 0, 0, NULL, NULL}, NULL, 0, 0, 0, NULL, NULL, NULL
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 043864597f..75ea0c9af1 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -155,6 +155,9 @@
# mmap
# (change requires restart)
#min_dynamic_shared_memory = 0MB # (change requires restart)
+#max_total_backend_memory = 0MB # Restrict total backend memory allocations
+ # to this max (in MB). 0 turns this feature
+ # off.
# - Disk -
diff --git a/src/backend/utils/mmgr/aset.c b/src/backend/utils/mmgr/aset.c
index cc10dfd609..3ce2191555 100644
--- a/src/backend/utils/mmgr/aset.c
+++ b/src/backend/utils/mmgr/aset.c
@@ -440,6 +440,10 @@ AllocSetContextCreateInternal(MemoryContext parent,
else
firstBlockSize = Max(firstBlockSize, initBlockSize);
+ /* Do not exceed maximum allowed memory allocation */
+ if (exceeds_max_total_bkend_mem(firstBlockSize))
+ return NULL;
+
/*
* Allocate the initial block. Unlike other aset.c blocks, it starts with
* the context header and its block header follows that.
@@ -741,6 +745,11 @@ AllocSetAlloc(MemoryContext context, Size size)
#endif
blksize = chunk_size + ALLOC_BLOCKHDRSZ + ALLOC_CHUNKHDRSZ;
+
+ /* Do not exceed maximum allowed memory allocation */
+ if (exceeds_max_total_bkend_mem(blksize))
+ return NULL;
+
block = (AllocBlock) malloc(blksize);
if (block == NULL)
return NULL;
@@ -938,6 +947,10 @@ AllocSetAlloc(MemoryContext context, Size size)
while (blksize < required_size)
blksize <<= 1;
+ /* Do not exceed maximum allowed memory allocation */
+ if (exceeds_max_total_bkend_mem(blksize))
+ return NULL;
+
/* Try to allocate it */
block = (AllocBlock) malloc(blksize);
@@ -1178,6 +1191,10 @@ AllocSetRealloc(void *pointer, Size size)
blksize = chksize + ALLOC_BLOCKHDRSZ + ALLOC_CHUNKHDRSZ;
oldblksize = block->endptr - ((char *) block);
+ /* Do not exceed maximum allowed memory allocation */
+ if (blksize > oldblksize && exceeds_max_total_bkend_mem(blksize - oldblksize))
+ return NULL;
+
block = (AllocBlock) realloc(block, blksize);
if (block == NULL)
{
diff --git a/src/backend/utils/mmgr/generation.c b/src/backend/utils/mmgr/generation.c
index b0460d97c2..3183c9a8dc 100644
--- a/src/backend/utils/mmgr/generation.c
+++ b/src/backend/utils/mmgr/generation.c
@@ -201,6 +201,9 @@ GenerationContextCreate(MemoryContext parent,
else
allocSize = Max(allocSize, initBlockSize);
+ if (exceeds_max_total_bkend_mem(allocSize))
+ return NULL;
+
/*
* Allocate the initial block. Unlike other generation.c blocks, it
* starts with the context header and its block header follows that.
@@ -380,6 +383,9 @@ GenerationAlloc(MemoryContext context, Size size)
{
Size blksize = required_size + Generation_BLOCKHDRSZ;
+ if (exceeds_max_total_bkend_mem(blksize))
+ return NULL;
+
block = (GenerationBlock *) malloc(blksize);
if (block == NULL)
return NULL;
@@ -483,6 +489,9 @@ GenerationAlloc(MemoryContext context, Size size)
if (blksize < required_size)
blksize = pg_nextpower2_size_t(required_size);
+ if (exceeds_max_total_bkend_mem(blksize))
+ return NULL;
+
block = (GenerationBlock *) malloc(blksize);
if (block == NULL)
diff --git a/src/backend/utils/mmgr/slab.c b/src/backend/utils/mmgr/slab.c
index 08bb013f7c..2074f3853a 100644
--- a/src/backend/utils/mmgr/slab.c
+++ b/src/backend/utils/mmgr/slab.c
@@ -197,6 +197,10 @@ SlabContextCreate(MemoryContext parent,
headerSize += chunksPerBlock * sizeof(bool);
#endif
+ /* Do not exceed maximum allowed memory allocation */
+ if (exceeds_max_total_bkend_mem(headerSize))
+ return NULL;
+
slab = (SlabContext *) malloc(headerSize);
if (slab == NULL)
{
@@ -351,6 +355,10 @@ SlabAlloc(MemoryContext context, Size size)
*/
if (slab->minFreeChunks == 0)
{
+ /* Do not exceed maximum allowed memory allocation */
+ if (exceeds_max_total_bkend_mem(slab->blockSize))
+ return NULL;
+
block = (SlabBlock *) malloc(slab->blockSize);
if (block == NULL)
diff --git a/src/include/utils/backend_status.h b/src/include/utils/backend_status.h
index 3ba479cb0d..1787d75ace 100644
--- a/src/include/utils/backend_status.h
+++ b/src/include/utils/backend_status.h
@@ -286,6 +286,7 @@ typedef struct LocalPgBackendStatus
*/
extern PGDLLIMPORT bool pgstat_track_activities;
extern PGDLLIMPORT int pgstat_track_activity_query_size;
+extern PGDLLIMPORT int max_total_bkend_mem;
/* ----------
@@ -325,6 +326,7 @@ extern const char *pgstat_get_backend_current_activity(int pid, bool checkUser);
extern const char *pgstat_get_crashed_backend_activity(int pid, char *buffer,
int buflen);
extern uint64 pgstat_get_my_query_id(void);
+extern uint64 pgstat_get_all_backend_memory_allocated(void);
extern void pgstat_set_allocated_bytes_storage(uint64 *allocated_bytes);
extern void pgstat_reset_allocated_bytes_storage(void);
@@ -337,6 +339,7 @@ extern int pgstat_fetch_stat_numbackends(void);
extern PgBackendStatus *pgstat_fetch_stat_beentry(BackendId beid);
extern LocalPgBackendStatus *pgstat_fetch_stat_local_beentry(int beid);
extern char *pgstat_clip_activity(const char *raw_activity);
+extern bool exceeds_max_total_bkend_mem(uint64 allocation_request);
/* ----------
* pgstat_report_allocated_bytes() -
--
2.25.1
0001-Add-tracking-of-backend-memory-allocated-to-pg_stat_.patchtext/x-patch; charset=UTF-8; name=0001-Add-tracking-of-backend-memory-allocated-to-pg_stat_.patchDownload
From fdb7e6d5bb653e9c5031fd058bf168bdf80a20eb Mon Sep 17 00:00:00 2001
From: Reid Thompson <jreidthompson@nc.rr.com>
Date: Thu, 11 Aug 2022 12:01:25 -0400
Subject: [PATCH 1/2] Add tracking of backend memory allocated to
pg_stat_activity
This new field displays the current bytes of memory allocated to the
backend process. It is updated as memory for the process is
malloc'd/free'd. Memory allocated to items on the freelist is included in
the displayed value. Dynamic shared memory allocations are included
only in the value displayed for the backend that created them, they are
not included in the value for backends that are attached to them to
avoid double counting. On occasion, orphaned memory segments may be
cleaned up on postmaster startup. This may result in decreasing the sum
without a prior increment. We limit the floor of backend_mem_allocated
to zero. Updated pg_stat_activity documentation for the new column.
---
doc/src/sgml/monitoring.sgml | 15 ++++
src/backend/catalog/system_views.sql | 1 +
src/backend/postmaster/autovacuum.c | 6 ++
src/backend/postmaster/postmaster.c | 13 ++++
src/backend/postmaster/syslogger.c | 3 +
src/backend/storage/ipc/dsm_impl.c | 81 +++++++++++++++++++++
src/backend/utils/activity/backend_status.c | 45 ++++++++++++
src/backend/utils/adt/pgstatfuncs.c | 4 +-
src/backend/utils/mmgr/aset.c | 17 +++++
src/backend/utils/mmgr/generation.c | 15 ++++
src/backend/utils/mmgr/slab.c | 21 ++++++
src/include/catalog/pg_proc.dat | 6 +-
src/include/utils/backend_status.h | 59 ++++++++++++++-
src/test/regress/expected/rules.out | 9 ++-
src/test/regress/expected/stats.out | 11 +++
src/test/regress/sql/stats.sql | 3 +
16 files changed, 300 insertions(+), 9 deletions(-)
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 11a8ebe5ec..13ecbe5877 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -948,6 +948,21 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser
</para></entry>
</row>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>allocated_bytes</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Memory currently allocated to this backend in bytes. This is the balance
+ of bytes allocated and freed by this backend. Dynamic shared memory
+ allocations are included only in the value displayed for the backend that
+ created them, they are not included in the value for backends that are
+ attached to them to avoid double counting. Use <function>pg_size_pretty</function>
+ described in <xref linkend="functions-admin-dbsize"/> to make this value
+ more easily readable.
+ </para></entry>
+ </row>
+
<row>
<entry role="catalog_table_entry"><para role="column_definition">
<structfield>query</structfield> <type>text</type>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 2d8104b090..9ea8f78c95 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -864,6 +864,7 @@ CREATE VIEW pg_stat_activity AS
S.state,
S.backend_xid,
s.backend_xmin,
+ S.allocated_bytes,
S.query_id,
S.query,
S.backend_type
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 0746d80224..f6b6f71cdb 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -407,6 +407,9 @@ StartAutoVacLauncher(void)
#ifndef EXEC_BACKEND
case 0:
+ /* Zero allocated bytes to avoid double counting parent allocation */
+ pgstat_zero_my_allocated_bytes();
+
/* in postmaster child ... */
InitPostmasterChild();
@@ -1485,6 +1488,9 @@ StartAutoVacWorker(void)
#ifndef EXEC_BACKEND
case 0:
+ /* Zero allocated bytes to avoid double counting parent allocation */
+ pgstat_zero_my_allocated_bytes();
+
/* in postmaster child ... */
InitPostmasterChild();
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index a8a246921f..89a6caec78 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -4102,6 +4102,9 @@ BackendStartup(Port *port)
{
free(bn);
+ /* Zero allocated bytes to avoid double counting parent allocation */
+ pgstat_zero_my_allocated_bytes();
+
/* Detangle from postmaster */
InitPostmasterChild();
@@ -5307,6 +5310,11 @@ StartChildProcess(AuxProcType type)
MemoryContextDelete(PostmasterContext);
PostmasterContext = NULL;
+ /* Zero allocated bytes to avoid double counting parent allocation.
+ * Needs to be after the MemoryContextDelete(PostmasterContext) above.
+ */
+ pgstat_zero_my_allocated_bytes();
+
AuxiliaryProcessMain(type); /* does not return */
}
#endif /* EXEC_BACKEND */
@@ -5700,6 +5708,11 @@ do_start_bgworker(RegisteredBgWorker *rw)
MemoryContextDelete(PostmasterContext);
PostmasterContext = NULL;
+ /* Zero allocated bytes to avoid double counting parent allocation.
+ * Needs to be after the MemoryContextDelete(PostmasterContext) above.
+ */
+ pgstat_zero_my_allocated_bytes();
+
StartBackgroundWorker();
exit(1); /* should not get here */
diff --git a/src/backend/postmaster/syslogger.c b/src/backend/postmaster/syslogger.c
index d6d02e3c63..4cbc59cda5 100644
--- a/src/backend/postmaster/syslogger.c
+++ b/src/backend/postmaster/syslogger.c
@@ -679,6 +679,9 @@ SysLogger_Start(void)
#ifndef EXEC_BACKEND
case 0:
+ /* Zero allocated bytes to avoid double counting parent allocation */
+ pgstat_zero_my_allocated_bytes();
+
/* in postmaster child ... */
InitPostmasterChild();
diff --git a/src/backend/storage/ipc/dsm_impl.c b/src/backend/storage/ipc/dsm_impl.c
index 6ddd46a4e7..36ef3e425e 100644
--- a/src/backend/storage/ipc/dsm_impl.c
+++ b/src/backend/storage/ipc/dsm_impl.c
@@ -66,6 +66,7 @@
#include "postmaster/postmaster.h"
#include "storage/dsm_impl.h"
#include "storage/fd.h"
+#include "utils/backend_status.h"
#include "utils/guc.h"
#include "utils/memutils.h"
@@ -232,6 +233,14 @@ dsm_impl_posix(dsm_op op, dsm_handle handle, Size request_size,
name)));
return false;
}
+
+ /*
+ * Detach and destroy pass through here, only decrease the memory
+ * shown allocated in pg_stat_activity when the creator destroys the
+ * allocation.
+ */
+ if (op == DSM_OP_DESTROY)
+ pgstat_report_allocated_bytes(*mapped_size, PG_ALLOC_DECREASE);
*mapped_address = NULL;
*mapped_size = 0;
if (op == DSM_OP_DESTROY && shm_unlink(name) != 0)
@@ -332,6 +341,36 @@ dsm_impl_posix(dsm_op op, dsm_handle handle, Size request_size,
name)));
return false;
}
+
+ /*
+ * Attach and create pass through here, only update backend memory
+ * allocated in pg_stat_activity for the creator process.
+ */
+ if (op == DSM_OP_CREATE)
+ {
+ /*
+ * Posix creation calls dsm_impl_posix_resize implying that resizing
+ * occurs or may be added in the future. As implemented
+ * dsm_impl_posix_resize utilizes fallocate or truncate, passing the
+ * whole new size as input, growing the allocation as needed (only
+ * truncate supports shrinking). We update by replacing the old
+ * allocation with the new.
+ */
+#if defined(HAVE_POSIX_FALLOCATE) && defined(__linux__)
+ /*
+ * posix_fallocate does not shrink allocations, adjust only on
+ * allocation increase.
+ */
+ if (request_size > *mapped_size)
+ {
+ pgstat_report_allocated_bytes(request_size - *mapped_size,
+ PG_ALLOC_INCREASE);
+ }
+#else
+ pgstat_report_allocated_bytes(*mapped_size, PG_ALLOC_DECREASE);
+ pgstat_report_allocated_bytes(request_size, PG_ALLOC_INCREASE);
+#endif
+ }
*mapped_address = address;
*mapped_size = request_size;
close(fd);
@@ -537,6 +576,14 @@ dsm_impl_sysv(dsm_op op, dsm_handle handle, Size request_size,
name)));
return false;
}
+
+ /*
+ * Detach and destroy pass through here, only decrease the memory
+ * shown allocated in pg_stat_activity when the creator destroys the
+ * allocation.
+ */
+ if (op == DSM_OP_DESTROY)
+ pgstat_report_allocated_bytes(*mapped_size, PG_ALLOC_DECREASE);
*mapped_address = NULL;
*mapped_size = 0;
if (op == DSM_OP_DESTROY && shmctl(ident, IPC_RMID, NULL) < 0)
@@ -584,6 +631,13 @@ dsm_impl_sysv(dsm_op op, dsm_handle handle, Size request_size,
name)));
return false;
}
+
+ /*
+ * Attach and create pass through here, only update backend memory
+ * allocated in pg_stat_activity for the creator process.
+ */
+ if (op == DSM_OP_CREATE)
+ pgstat_report_allocated_bytes(request_size, PG_ALLOC_INCREASE);
*mapped_address = address;
*mapped_size = request_size;
@@ -652,6 +706,13 @@ dsm_impl_windows(dsm_op op, dsm_handle handle, Size request_size,
return false;
}
+ /*
+ * Detach and destroy pass through here, only decrease the memory
+ * shown allocated in pg_stat_activity when the creator destroys the
+ * allocation.
+ */
+ if (op == DSM_OP_DESTROY)
+ pgstat_report_allocated_bytes(*mapped_size, PG_ALLOC_DECREASE);
*impl_private = NULL;
*mapped_address = NULL;
*mapped_size = 0;
@@ -768,6 +829,12 @@ dsm_impl_windows(dsm_op op, dsm_handle handle, Size request_size,
return false;
}
+ /*
+ * Attach and create pass through here, only update backend memory
+ * allocated in pg_stat_activity for the creator process.
+ */
+ if (op == DSM_OP_CREATE)
+ pgstat_report_allocated_bytes(info.RegionSize, PG_ALLOC_INCREASE);
*mapped_address = address;
*mapped_size = info.RegionSize;
*impl_private = hmap;
@@ -812,6 +879,13 @@ dsm_impl_mmap(dsm_op op, dsm_handle handle, Size request_size,
name)));
return false;
}
+
+ /*
+ * Detach and destroy pass through here, only decrease the memory
+ * shown allocated in pg_stat_activity when the creator destroys the
+ * allocation.
+ */
+ pgstat_report_allocated_bytes(*mapped_size, PG_ALLOC_DECREASE);
*mapped_address = NULL;
*mapped_size = 0;
if (op == DSM_OP_DESTROY && unlink(name) != 0)
@@ -933,6 +1007,13 @@ dsm_impl_mmap(dsm_op op, dsm_handle handle, Size request_size,
name)));
return false;
}
+
+ /*
+ * Attach and create pass through here, only update backend memory
+ * allocated in pg_stat_activity for the creator process.
+ */
+ if (op == DSM_OP_CREATE)
+ pgstat_report_allocated_bytes(request_size, PG_ALLOC_INCREASE);
*mapped_address = address;
*mapped_size = request_size;
diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c
index 1146a6c33c..3785e8af53 100644
--- a/src/backend/utils/activity/backend_status.c
+++ b/src/backend/utils/activity/backend_status.c
@@ -49,6 +49,9 @@ int pgstat_track_activity_query_size = 1024;
/* exposed so that backend_progress.c can access it */
PgBackendStatus *MyBEEntry = NULL;
+/* Memory allocated to this backend prior to pgstats initialization */
+uint64 local_my_allocated_bytes = 0;
+uint64 *my_allocated_bytes = &local_my_allocated_bytes;
static PgBackendStatus *BackendStatusArray = NULL;
static char *BackendAppnameBuffer = NULL;
@@ -400,6 +403,15 @@ pgstat_bestart(void)
lbeentry.st_progress_command_target = InvalidOid;
lbeentry.st_query_id = UINT64CONST(0);
+ /* Alter allocation reporting from local_my_allocated_bytes to shared memory */
+ pgstat_set_allocated_bytes_storage(&MyBEEntry->allocated_bytes);
+
+ /* Populate sum of memory allocated prior to pgstats initialization to pgstats
+ * and zero the local variable.
+ */
+ lbeentry.allocated_bytes += local_my_allocated_bytes;
+ local_my_allocated_bytes = 0;
+
/*
* we don't zero st_progress_param here to save cycles; nobody should
* examine it until st_progress_command has been set to something other
@@ -459,6 +471,11 @@ pgstat_beshutdown_hook(int code, Datum arg)
{
volatile PgBackendStatus *beentry = MyBEEntry;
+ /*
+ * Stop reporting memory allocation changes to &MyBEEntry->allocated_bytes
+ */
+ pgstat_reset_allocated_bytes_storage();
+
/*
* Clear my status entry, following the protocol of bumping st_changecount
* before and after. We use a volatile pointer here to ensure the
@@ -1191,3 +1208,31 @@ pgstat_clip_activity(const char *raw_activity)
return activity;
}
+
+/*
+ * Configure bytes allocated reporting to report allocated bytes to
+ * *allocated_bytes. *allocated_bytes needs to be valid until
+ * pgstat_set_allocated_bytes_storage() is called.
+ *
+ * Expected to be called during backend startup (in pgstat_bestart), to point
+ * my_allocated_bytes into shared memory.
+ */
+void
+pgstat_set_allocated_bytes_storage(uint64 *new_allocated_bytes)
+{
+ my_allocated_bytes = new_allocated_bytes;
+ *new_allocated_bytes = local_my_allocated_bytes;
+}
+
+/*
+ * Reset allocated bytes storage location.
+ *
+ * Expected to be called during backend shutdown, before the location set up
+ * by pgstat_set_allocated_bytes_storage() becomes invalid.
+ */
+void
+pgstat_reset_allocated_bytes_storage(void)
+{
+ my_allocated_bytes = &local_my_allocated_bytes;
+}
+
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 25a159b5e5..c4350cfa50 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -302,7 +302,7 @@ pg_stat_get_progress_info(PG_FUNCTION_ARGS)
Datum
pg_stat_get_activity(PG_FUNCTION_ARGS)
{
-#define PG_STAT_GET_ACTIVITY_COLS 30
+#define PG_STAT_GET_ACTIVITY_COLS 31
int num_backends = pgstat_fetch_stat_numbackends();
int curr_backend;
int pid = PG_ARGISNULL(0) ? -1 : PG_GETARG_INT32(0);
@@ -358,6 +358,8 @@ pg_stat_get_activity(PG_FUNCTION_ARGS)
else
nulls[16] = true;
+ values[30] = UInt64GetDatum(beentry->allocated_bytes);
+
/* Values only available to role member or pg_read_all_stats */
if (HAS_PGSTAT_PERMISSIONS(beentry->st_userid))
{
diff --git a/src/backend/utils/mmgr/aset.c b/src/backend/utils/mmgr/aset.c
index b6a8bbcd59..cc10dfd609 100644
--- a/src/backend/utils/mmgr/aset.c
+++ b/src/backend/utils/mmgr/aset.c
@@ -47,6 +47,7 @@
#include "postgres.h"
#include "port/pg_bitutils.h"
+#include "utils/backend_status.h"
#include "utils/memdebug.h"
#include "utils/memutils.h"
#include "utils/memutils_memorychunk.h"
@@ -521,6 +522,7 @@ AllocSetContextCreateInternal(MemoryContext parent,
name);
((MemoryContext) set)->mem_allocated = firstBlockSize;
+ pgstat_report_allocated_bytes(firstBlockSize, PG_ALLOC_INCREASE);
return (MemoryContext) set;
}
@@ -543,6 +545,7 @@ AllocSetReset(MemoryContext context)
AllocSet set = (AllocSet) context;
AllocBlock block;
Size keepersize PG_USED_FOR_ASSERTS_ONLY;
+ uint64 deallocation = 0;
Assert(AllocSetIsValid(set));
@@ -585,6 +588,7 @@ AllocSetReset(MemoryContext context)
{
/* Normal case, release the block */
context->mem_allocated -= block->endptr - ((char *) block);
+ deallocation += block->endptr - ((char *) block);
#ifdef CLOBBER_FREED_MEMORY
wipe_mem(block, block->freeptr - ((char *) block));
@@ -595,6 +599,7 @@ AllocSetReset(MemoryContext context)
}
Assert(context->mem_allocated == keepersize);
+ pgstat_report_allocated_bytes(deallocation, PG_ALLOC_DECREASE);
/* Reset block size allocation sequence, too */
set->nextBlockSize = set->initBlockSize;
@@ -613,6 +618,7 @@ AllocSetDelete(MemoryContext context)
AllocSet set = (AllocSet) context;
AllocBlock block = set->blocks;
Size keepersize PG_USED_FOR_ASSERTS_ONLY;
+ uint64 deallocation = 0;
Assert(AllocSetIsValid(set));
@@ -651,11 +657,13 @@ AllocSetDelete(MemoryContext context)
freelist->first_free = (AllocSetContext *) oldset->header.nextchild;
freelist->num_free--;
+ deallocation += oldset->header.mem_allocated;
/* All that remains is to free the header/initial block */
free(oldset);
}
Assert(freelist->num_free == 0);
+ pgstat_report_allocated_bytes(deallocation, PG_ALLOC_DECREASE);
}
/* Now add the just-deleted context to the freelist. */
@@ -672,7 +680,10 @@ AllocSetDelete(MemoryContext context)
AllocBlock next = block->next;
if (block != set->keeper)
+ {
context->mem_allocated -= block->endptr - ((char *) block);
+ deallocation += block->endptr - ((char *) block);
+ }
#ifdef CLOBBER_FREED_MEMORY
wipe_mem(block, block->freeptr - ((char *) block));
@@ -685,6 +696,7 @@ AllocSetDelete(MemoryContext context)
}
Assert(context->mem_allocated == keepersize);
+ pgstat_report_allocated_bytes(deallocation + context->mem_allocated, PG_ALLOC_DECREASE);
/* Finally, free the context header, including the keeper block */
free(set);
@@ -734,6 +746,7 @@ AllocSetAlloc(MemoryContext context, Size size)
return NULL;
context->mem_allocated += blksize;
+ pgstat_report_allocated_bytes(blksize, PG_ALLOC_INCREASE);
block->aset = set;
block->freeptr = block->endptr = ((char *) block) + blksize;
@@ -944,6 +957,7 @@ AllocSetAlloc(MemoryContext context, Size size)
return NULL;
context->mem_allocated += blksize;
+ pgstat_report_allocated_bytes(blksize, PG_ALLOC_INCREASE);
block->aset = set;
block->freeptr = ((char *) block) + ALLOC_BLOCKHDRSZ;
@@ -1043,6 +1057,7 @@ AllocSetFree(void *pointer)
block->next->prev = block->prev;
set->header.mem_allocated -= block->endptr - ((char *) block);
+ pgstat_report_allocated_bytes(block->endptr - ((char *) block), PG_ALLOC_DECREASE);
#ifdef CLOBBER_FREED_MEMORY
wipe_mem(block, block->freeptr - ((char *) block));
@@ -1173,7 +1188,9 @@ AllocSetRealloc(void *pointer, Size size)
/* updated separately, not to underflow when (oldblksize > blksize) */
set->header.mem_allocated -= oldblksize;
+ pgstat_report_allocated_bytes(oldblksize, PG_ALLOC_DECREASE);
set->header.mem_allocated += blksize;
+ pgstat_report_allocated_bytes(blksize, PG_ALLOC_INCREASE);
block->freeptr = block->endptr = ((char *) block) + blksize;
diff --git a/src/backend/utils/mmgr/generation.c b/src/backend/utils/mmgr/generation.c
index b432a92be3..b0460d97c2 100644
--- a/src/backend/utils/mmgr/generation.c
+++ b/src/backend/utils/mmgr/generation.c
@@ -37,6 +37,7 @@
#include "lib/ilist.h"
#include "port/pg_bitutils.h"
+#include "utils/backend_status.h"
#include "utils/memdebug.h"
#include "utils/memutils.h"
#include "utils/memutils_memorychunk.h"
@@ -267,6 +268,7 @@ GenerationContextCreate(MemoryContext parent,
name);
((MemoryContext) set)->mem_allocated = firstBlockSize;
+ pgstat_report_allocated_bytes(firstBlockSize, PG_ALLOC_INCREASE);
return (MemoryContext) set;
}
@@ -283,6 +285,7 @@ GenerationReset(MemoryContext context)
{
GenerationContext *set = (GenerationContext *) context;
dlist_mutable_iter miter;
+ uint64 deallocation = 0;
Assert(GenerationIsValid(set));
@@ -305,9 +308,14 @@ GenerationReset(MemoryContext context)
if (block == set->keeper)
GenerationBlockMarkEmpty(block);
else
+ {
+ deallocation += block->blksize;
GenerationBlockFree(set, block);
+ }
}
+ pgstat_report_allocated_bytes(deallocation, PG_ALLOC_DECREASE);
+
/* set it so new allocations to make use of the keeper block */
set->block = set->keeper;
@@ -328,6 +336,9 @@ GenerationDelete(MemoryContext context)
{
/* Reset to release all releasable GenerationBlocks */
GenerationReset(context);
+
+ pgstat_report_allocated_bytes(context->mem_allocated, PG_ALLOC_DECREASE);
+
/* And free the context header and keeper block */
free(context);
}
@@ -374,6 +385,7 @@ GenerationAlloc(MemoryContext context, Size size)
return NULL;
context->mem_allocated += blksize;
+ pgstat_report_allocated_bytes(blksize, PG_ALLOC_INCREASE);
/* block with a single (used) chunk */
block->context = set;
@@ -477,6 +489,7 @@ GenerationAlloc(MemoryContext context, Size size)
return NULL;
context->mem_allocated += blksize;
+ pgstat_report_allocated_bytes(blksize, PG_ALLOC_INCREASE);
/* initialize the new block */
GenerationBlockInit(set, block, blksize);
@@ -726,6 +739,8 @@ GenerationFree(void *pointer)
dlist_delete(&block->node);
set->header.mem_allocated -= block->blksize;
+ pgstat_report_allocated_bytes(block->blksize, PG_ALLOC_DECREASE);
+
free(block);
}
diff --git a/src/backend/utils/mmgr/slab.c b/src/backend/utils/mmgr/slab.c
index 6df0839b6a..08bb013f7c 100644
--- a/src/backend/utils/mmgr/slab.c
+++ b/src/backend/utils/mmgr/slab.c
@@ -53,6 +53,7 @@
#include "postgres.h"
#include "lib/ilist.h"
+#include "utils/backend_status.h"
#include "utils/memdebug.h"
#include "utils/memutils.h"
#include "utils/memutils_memorychunk.h"
@@ -238,6 +239,12 @@ SlabContextCreate(MemoryContext parent,
parent,
name);
+ /*
+ * If SlabContextCreate is updated to add headerSize to
+ * context->mem_allocated, then update here and SlabDelete appropriately
+ */
+ pgstat_report_allocated_bytes(headerSize, PG_ALLOC_INCREASE);
+
return (MemoryContext) slab;
}
@@ -253,6 +260,7 @@ SlabReset(MemoryContext context)
{
SlabContext *slab = (SlabContext *) context;
int i;
+ uint64 deallocation = 0;
Assert(SlabIsValid(slab));
@@ -278,9 +286,11 @@ SlabReset(MemoryContext context)
free(block);
slab->nblocks--;
context->mem_allocated -= slab->blockSize;
+ deallocation += slab->blockSize;
}
}
+ pgstat_report_allocated_bytes(deallocation, PG_ALLOC_DECREASE);
slab->minFreeChunks = 0;
Assert(slab->nblocks == 0);
@@ -294,8 +304,17 @@ SlabReset(MemoryContext context)
void
SlabDelete(MemoryContext context)
{
+ /*
+ * Until header allocation is included in context->mem_allocated, cast to
+ * slab and decrement the headerSize
+ */
+ SlabContext *slab = castNode(SlabContext, context);
+
/* Reset to release all the SlabBlocks */
SlabReset(context);
+
+ pgstat_report_allocated_bytes(slab->headerSize, PG_ALLOC_DECREASE);
+
/* And free the context header */
free(context);
}
@@ -364,6 +383,7 @@ SlabAlloc(MemoryContext context, Size size)
slab->minFreeChunks = slab->chunksPerBlock;
slab->nblocks += 1;
context->mem_allocated += slab->blockSize;
+ pgstat_report_allocated_bytes(slab->blockSize, PG_ALLOC_INCREASE);
}
/* grab the block from the freelist (even the new block is there) */
@@ -537,6 +557,7 @@ SlabFree(void *pointer)
free(block);
slab->nblocks--;
slab->header.mem_allocated -= slab->blockSize;
+ pgstat_report_allocated_bytes(slab->blockSize, PG_ALLOC_DECREASE);
}
else
dlist_push_head(&slab->freelist[block->nfree], &block->node);
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index f9301b2627..1bf02758d4 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5373,9 +5373,9 @@
proname => 'pg_stat_get_activity', prorows => '100', proisstrict => 'f',
proretset => 't', provolatile => 's', proparallel => 'r',
prorettype => 'record', proargtypes => 'int4',
- proallargtypes => '{int4,oid,int4,oid,text,text,text,text,text,timestamptz,timestamptz,timestamptz,timestamptz,inet,text,int4,xid,xid,text,bool,text,text,int4,text,numeric,text,bool,text,bool,int4,int8}',
- proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
- proargnames => '{pid,datid,pid,usesysid,application_name,state,query,wait_event_type,wait_event,xact_start,query_start,backend_start,state_change,client_addr,client_hostname,client_port,backend_xid,backend_xmin,backend_type,ssl,sslversion,sslcipher,sslbits,ssl_client_dn,ssl_client_serial,ssl_issuer_dn,gss_auth,gss_princ,gss_enc,leader_pid,query_id}',
+ proallargtypes => '{int4,oid,int4,oid,text,text,text,text,text,timestamptz,timestamptz,timestamptz,timestamptz,inet,text,int4,xid,xid,text,bool,text,text,int4,text,numeric,text,bool,text,bool,int4,int8,int8}',
+ proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{pid,datid,pid,usesysid,application_name,state,query,wait_event_type,wait_event,xact_start,query_start,backend_start,state_change,client_addr,client_hostname,client_port,backend_xid,backend_xmin,backend_type,ssl,sslversion,sslcipher,sslbits,ssl_client_dn,ssl_client_serial,ssl_issuer_dn,gss_auth,gss_princ,gss_enc,leader_pid,query_id,allocated_bytes}',
prosrc => 'pg_stat_get_activity' },
{ oid => '3318',
descr => 'statistics: information about progress of backends running maintenance command',
diff --git a/src/include/utils/backend_status.h b/src/include/utils/backend_status.h
index b582b46e9f..3ba479cb0d 100644
--- a/src/include/utils/backend_status.h
+++ b/src/include/utils/backend_status.h
@@ -15,6 +15,7 @@
#include "miscadmin.h" /* for BackendType */
#include "storage/backendid.h"
#include "utils/backend_progress.h"
+#include "common/int.h"
/* ----------
@@ -32,6 +33,13 @@ typedef enum BackendState
STATE_DISABLED
} BackendState;
+/* Enum helper for reporting memory allocated bytes */
+enum allocation_direction
+{
+ PG_ALLOC_DECREASE = -1,
+ PG_ALLOC_IGNORE,
+ PG_ALLOC_INCREASE,
+};
/* ----------
* Shared-memory data structures
@@ -169,6 +177,9 @@ typedef struct PgBackendStatus
/* query identifier, optionally computed using post_parse_analyze_hook */
uint64 st_query_id;
+
+ /* Current memory allocated to this backend */
+ uint64 allocated_bytes;
} PgBackendStatus;
@@ -282,6 +293,7 @@ extern PGDLLIMPORT int pgstat_track_activity_query_size;
* ----------
*/
extern PGDLLIMPORT PgBackendStatus *MyBEEntry;
+extern PGDLLIMPORT uint64 *my_allocated_bytes;
/* ----------
@@ -313,7 +325,8 @@ extern const char *pgstat_get_backend_current_activity(int pid, bool checkUser);
extern const char *pgstat_get_crashed_backend_activity(int pid, char *buffer,
int buflen);
extern uint64 pgstat_get_my_query_id(void);
-
+extern void pgstat_set_allocated_bytes_storage(uint64 *allocated_bytes);
+extern void pgstat_reset_allocated_bytes_storage(void);
/* ----------
* Support functions for the SQL-callable functions to
@@ -325,5 +338,49 @@ extern PgBackendStatus *pgstat_fetch_stat_beentry(BackendId beid);
extern LocalPgBackendStatus *pgstat_fetch_stat_local_beentry(int beid);
extern char *pgstat_clip_activity(const char *raw_activity);
+/* ----------
+ * pgstat_report_allocated_bytes() -
+ *
+ * Called to report change in memory allocated for this backend.
+ *
+ * my_allocated_bytes initially points to local memory, making it safe to call
+ * this before pgstats has been initialized. allocation_direction is a
+ * positive/negative multiplier enum defined above.
+ * ----------
+ */
+static inline void
+pgstat_report_allocated_bytes(int64 allocated_bytes, int allocation_direction)
+{
+ uint64 temp;
+
+ /* Avoid *my_allocated_bytes unsigned integer overflow on PG_ALLOC_DECREASE */
+ if (allocation_direction == PG_ALLOC_DECREASE &&
+ pg_sub_u64_overflow(*my_allocated_bytes, allocated_bytes, &temp))
+ {
+ *my_allocated_bytes = 0;
+ ereport(LOG,
+ errmsg("Backend %d deallocated %lld bytes, exceeding the %llu bytes it is currently reporting allocated. Setting reported to 0.",
+ MyProcPid, (long long)allocated_bytes, (unsigned long long)*my_allocated_bytes));
+ }
+ else
+ *my_allocated_bytes += (allocated_bytes) * allocation_direction;
+
+ return;
+}
+
+/* ---------
+ * pgstat_zero_my_allocated_bytes() -
+ *
+ * Called to zero out local allocated bytes variable after fork to avoid double
+ * counting allocations.
+ * ---------
+ */
+static inline void
+pgstat_zero_my_allocated_bytes(void)
+{
+ *my_allocated_bytes = 0;
+
+ return;
+}
#endif /* BACKEND_STATUS_H */
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index fb9f936d43..5f854aab18 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1756,10 +1756,11 @@ pg_stat_activity| SELECT s.datid,
s.state,
s.backend_xid,
s.backend_xmin,
+ s.allocated_bytes,
s.query_id,
s.query,
s.backend_type
- FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, query_id)
+ FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, query_id, allocated_bytes)
LEFT JOIN pg_database d ON ((s.datid = d.oid)))
LEFT JOIN pg_authid u ON ((s.usesysid = u.oid)));
pg_stat_all_indexes| SELECT c.oid AS relid,
@@ -1874,7 +1875,7 @@ pg_stat_gssapi| SELECT s.pid,
s.gss_auth AS gss_authenticated,
s.gss_princ AS principal,
s.gss_enc AS encrypted
- FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, query_id)
+ FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, query_id, allocated_bytes)
WHERE (s.client_port IS NOT NULL);
pg_stat_progress_analyze| SELECT s.pid,
s.datid,
@@ -2055,7 +2056,7 @@ pg_stat_replication| SELECT s.pid,
w.sync_priority,
w.sync_state,
w.reply_time
- FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, query_id)
+ FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, query_id, allocated_bytes)
JOIN pg_stat_get_wal_senders() w(pid, state, sent_lsn, write_lsn, flush_lsn, replay_lsn, write_lag, flush_lag, replay_lag, sync_priority, sync_state, reply_time) ON ((s.pid = w.pid)))
LEFT JOIN pg_authid u ON ((s.usesysid = u.oid)));
pg_stat_replication_slots| SELECT s.slot_name,
@@ -2089,7 +2090,7 @@ pg_stat_ssl| SELECT s.pid,
s.ssl_client_dn AS client_dn,
s.ssl_client_serial AS client_serial,
s.ssl_issuer_dn AS issuer_dn
- FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, query_id)
+ FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, query_id, allocated_bytes)
WHERE (s.client_port IS NOT NULL);
pg_stat_subscription| SELECT su.oid AS subid,
su.subname,
diff --git a/src/test/regress/expected/stats.out b/src/test/regress/expected/stats.out
index 1d84407a03..ab7e95c367 100644
--- a/src/test/regress/expected/stats.out
+++ b/src/test/regress/expected/stats.out
@@ -1126,4 +1126,15 @@ SELECT pg_stat_get_subscription_stats(NULL);
(1 row)
+-- ensure that allocated_bytes exist for backends
+SELECT allocated_bytes > 0 AS result FROM pg_stat_activity WHERE backend_type
+IN ('checkpointer', 'background writer', 'walwriter', 'autovacuum launcher');
+ result
+--------
+ t
+ t
+ t
+ t
+(4 rows)
+
-- End of Stats Test
diff --git a/src/test/regress/sql/stats.sql b/src/test/regress/sql/stats.sql
index b4d6753c71..2f0b1cc9d8 100644
--- a/src/test/regress/sql/stats.sql
+++ b/src/test/regress/sql/stats.sql
@@ -535,5 +535,8 @@ SET enable_seqscan TO on;
SELECT pg_stat_get_replication_slot(NULL);
SELECT pg_stat_get_subscription_stats(NULL);
+-- ensure that allocated_bytes exist for backends
+SELECT allocated_bytes > 0 AS result FROM pg_stat_activity WHERE backend_type
+IN ('checkpointer', 'background writer', 'walwriter', 'autovacuum launcher');
-- End of Stats Test
--
2.25.1
On Fri, 9 Dec 2022 at 20:41, Reid Thompson
<reid.thompson@crunchydata.com> wrote:
On Tue, 2022-12-06 at 10:32 -0800, Andres Freund wrote:
Hi,
On 2022-11-26 22:22:15 -0500, Reid Thompson wrote:
rebased/patched to current master && current pg-stat-activity-
backend-memory-allocatedThis version fails to build with msvc, and builds with warnings on
other
platforms.
https://cirrus-ci.com/build/5410696721072128
msvc:Andres Freund
updated patches
The patch does not apply on top of HEAD as in [1]http://cfbot.cputube.org/patch_41_3867.log, please post a rebased patch:
=== Applying patches on top of PostgreSQL commit ID
92957ed98c5c565362ce665266132a7f08f6b0c0 ===
=== applying patch
./0001-Add-tracking-of-backend-memory-allocated-to-pg_stat_.patch
...
patching file src/backend/utils/mmgr/slab.c
Hunk #1 succeeded at 69 (offset 16 lines).
Hunk #2 succeeded at 414 (offset 175 lines).
Hunk #3 succeeded at 436 with fuzz 2 (offset 176 lines).
Hunk #4 FAILED at 286.
Hunk #5 succeeded at 488 (offset 186 lines).
Hunk #6 FAILED at 381.
Hunk #7 FAILED at 554.
3 out of 7 hunks FAILED -- saving rejects to file
src/backend/utils/mmgr/slab.c.rej
[1]: http://cfbot.cputube.org/patch_41_3867.log
Regards,
Vignesh
On Tue, 2023-01-03 at 16:22 +0530, vignesh C wrote:
....
The patch does not apply on top of HEAD as in [1], please post a
rebased patch:
...
Regards,
Vignesh
Attached is rebased patch, with some updates related to committed changes.
Thanks,
Reid
--
Reid Thompson
Senior Software Engineer
Crunchy Data, Inc.
reid.thompson@crunchydata.com
www.crunchydata.com
Attachments:
0002-Add-the-ability-to-limit-the-amount-of-memory-that-c.patchtext/x-patch; charset=UTF-8; name=0002-Add-the-ability-to-limit-the-amount-of-memory-that-c.patchDownload
From 69516942b71d5d41850fbc00b971db7476c7a01a Mon Sep 17 00:00:00 2001
From: Reid Thompson <jreidthompson@nc.rr.com>
Date: Sat, 4 Jun 2022 22:23:59 -0400
Subject: [PATCH 2/2] Add the ability to limit the amount of memory that can be
allocated to backends.
This builds on the work that adds backend memory allocated to pg_stat_activity.
Add GUC variable max_total_backend_memory.
Specifies a limit to the amount of memory (in MB) that may be allocated to
backends in total (i.e. this is not a per user or per backend limit). If unset,
or set to 0 it is disabled. It is intended as a resource to help avoid the OOM
killer on LINUX and manage resources in general. A backend request that would
push the total over the limit will be denied with an out of memory error causing
that backend's current query/transaction to fail. Due to the dynamic nature of
memory allocations, this limit is not exact. If within 1.5MB of the limit and
two backends request 1MB each at the same time both may be allocated, and exceed
the limit. Further requests will not be allocated until dropping below the
limit. Keep this in mind when setting this value. This limit does not affect
auxiliary backend processes. Backend memory allocations are displayed in the
pg_stat_activity view.
---
doc/src/sgml/config.sgml | 26 +++++
src/backend/storage/ipc/dsm_impl.c | 12 ++
src/backend/utils/activity/backend_status.c | 108 ++++++++++++++++++
src/backend/utils/misc/guc_tables.c | 11 ++
src/backend/utils/misc/postgresql.conf.sample | 3 +
src/backend/utils/mmgr/aset.c | 17 +++
src/backend/utils/mmgr/generation.c | 9 ++
src/backend/utils/mmgr/slab.c | 9 +-
src/include/utils/backend_status.h | 3 +
9 files changed, 197 insertions(+), 1 deletion(-)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 05b3862d09..0362f26451 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2079,6 +2079,32 @@ include_dir 'conf.d'
</listitem>
</varlistentry>
+ <varlistentry id="guc-max-total-backend-memory" xreflabel="max_total_backend_memory">
+ <term><varname>max_total_backend_memory</varname> (<type>integer</type>)
+ <indexterm>
+ <primary><varname>max_total_backend_memory</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Specifies a limit to the amount of memory (MB) that may be allocated to
+ backends in total (i.e. this is not a per user or per backend limit).
+ If unset, or set to 0 it is disabled. A backend request that would
+ push the total over the limit will be denied with an out of memory
+ error causing that backend's current query/transaction to fail. Due to
+ the dynamic nature of memory allocations, this limit is not exact. If
+ within 1.5MB of the limit and two backends request 1MB each at the same
+ time both may be allocated, and exceed the limit. Further requests will
+ not be allocated until dropping below the limit. Keep this in mind when
+ setting this value. This limit does not affect auxiliary backend
+ processes <xref linkend="glossary-auxiliary-proc"/> . Backend memory
+ allocations (<varname>allocated_bytes</varname>) are displayed in the
+ <link linkend="monitoring-pg-stat-activity-view"><structname>pg_stat_activity</structname></link>
+ view.
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
</sect2>
diff --git a/src/backend/storage/ipc/dsm_impl.c b/src/backend/storage/ipc/dsm_impl.c
index 22885c7bd2..f7047107d5 100644
--- a/src/backend/storage/ipc/dsm_impl.c
+++ b/src/backend/storage/ipc/dsm_impl.c
@@ -254,6 +254,10 @@ dsm_impl_posix(dsm_op op, dsm_handle handle, Size request_size,
return true;
}
+ /* Do not exceed maximum allowed memory allocation */
+ if (op == DSM_OP_CREATE && exceeds_max_total_bkend_mem(request_size))
+ return false;
+
/*
* Create new segment or open an existing one for attach.
*
@@ -525,6 +529,10 @@ dsm_impl_sysv(dsm_op op, dsm_handle handle, Size request_size,
int flags = IPCProtection;
size_t segsize;
+ /* Do not exceed maximum allowed memory allocation */
+ if (op == DSM_OP_CREATE && exceeds_max_total_bkend_mem(request_size))
+ return false;
+
/*
* Allocate the memory BEFORE acquiring the resource, so that we don't
* leak the resource if memory allocation fails.
@@ -719,6 +727,10 @@ dsm_impl_windows(dsm_op op, dsm_handle handle, Size request_size,
return true;
}
+ /* Do not exceed maximum allowed memory allocation */
+ if (op == DSM_OP_CREATE && exceeds_max_total_bkend_mem(request_size))
+ return false;
+
/* Create new segment or open an existing one for attach. */
if (op == DSM_OP_CREATE)
{
diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c
index 7baf2db57d..da2b5fb042 100644
--- a/src/backend/utils/activity/backend_status.c
+++ b/src/backend/utils/activity/backend_status.c
@@ -45,6 +45,9 @@
bool pgstat_track_activities = false;
int pgstat_track_activity_query_size = 1024;
+/* Max backend memory allocation allowed (MB). 0 = disabled */
+int max_total_bkend_mem = 0;
+
/* exposed so that backend_progress.c can access it */
PgBackendStatus *MyBEEntry = NULL;
@@ -1239,3 +1242,108 @@ pgstat_reset_allocated_bytes_storage(void)
my_allocated_bytes = &local_my_allocated_bytes;
}
+/* ----------
+ * pgstat_get_all_memory_allocated() -
+ *
+ * Return a uint64 representing the current shared memory allocated to all
+ * backends. This looks directly at the BackendStatusArray, and so will
+ * provide current information regardless of the age of our transaction's
+ * snapshot of the status array.
+ * In the future we will likely utilize additional values - perhaps limit
+ * backend allocation by user/role, etc.
+ * ----------
+ */
+uint64
+pgstat_get_all_backend_memory_allocated(void)
+{
+ PgBackendStatus *beentry;
+ int i;
+ uint64 all_memory_allocated = 0;
+
+ beentry = BackendStatusArray;
+
+ /*
+ * We probably shouldn't get here before shared memory has been set up,
+ * but be safe.
+ */
+ if (beentry == NULL || BackendActivityBuffer == NULL)
+ return 0;
+
+ /*
+ * We include AUX procs in all backend memory calculation
+ */
+ for (i = 1; i <= NumBackendStatSlots; i++)
+ {
+ /*
+ * We use a volatile pointer here to ensure the compiler doesn't try
+ * to get cute.
+ */
+ volatile PgBackendStatus *vbeentry = beentry;
+ bool found;
+ uint64 allocated_bytes = 0;
+
+ for (;;)
+ {
+ int before_changecount;
+ int after_changecount;
+
+ pgstat_begin_read_activity(vbeentry, before_changecount);
+
+ /*
+ * Ignore invalid entries, which may contain invalid data.
+ * See pgstat_beshutdown_hook()
+ */
+ if (vbeentry->st_procpid > 0)
+ allocated_bytes = vbeentry->allocated_bytes;
+
+ pgstat_end_read_activity(vbeentry, after_changecount);
+
+ if ((found = pgstat_read_activity_complete(before_changecount,
+ after_changecount)))
+ break;
+
+ /* Make sure we can break out of loop if stuck... */
+ CHECK_FOR_INTERRUPTS();
+ }
+
+ if (found)
+ all_memory_allocated += allocated_bytes;
+
+ beentry++;
+ }
+
+ return all_memory_allocated;
+}
+
+/*
+ * Determine if allocation request will exceed max backend memory allowed.
+ * Do not apply to auxiliary processes.
+ */
+bool
+exceeds_max_total_bkend_mem(uint64 allocation_request)
+{
+ bool result = false;
+
+ /* Exclude auxiliary processes from the check */
+ if (MyAuxProcType != NotAnAuxProcess)
+ return result;
+
+ /* Convert max_total_bkend_mem to bytes for comparison */
+ if (max_total_bkend_mem &&
+ pgstat_get_all_backend_memory_allocated() +
+ allocation_request > (uint64) max_total_bkend_mem * 1024 * 1024)
+ {
+ /*
+ * Explicitly identify the OOM being a result of this configuration
+ * parameter vs a system failure to allocate OOM.
+ */
+ ereport(WARNING,
+ errmsg("allocation would exceed max_total_memory limit (%llu > %llu)",
+ (unsigned long long) pgstat_get_all_backend_memory_allocated() +
+ allocation_request, (unsigned long long) max_total_bkend_mem * 1024 * 1024));
+
+ result = true;
+ }
+
+ return result;
+}
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 68328b1402..5e58d6c1bc 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -3444,6 +3444,17 @@ struct config_int ConfigureNamesInt[] =
NULL, NULL, NULL
},
+ {
+ {"max_total_backend_memory", PGC_SU_BACKEND, RESOURCES_MEM,
+ gettext_noop("Restrict total backend memory allocations to this max."),
+ gettext_noop("0 turns this feature off."),
+ GUC_UNIT_MB
+ },
+ &max_total_bkend_mem,
+ 0, 0, INT_MAX,
+ NULL, NULL, NULL
+ },
+
/* End-of-list marker */
{
{NULL, 0, 0, NULL, NULL}, NULL, 0, 0, 0, NULL, NULL, NULL
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 5afdeb04de..1f1eaac5f4 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -155,6 +155,9 @@
# mmap
# (change requires restart)
#min_dynamic_shared_memory = 0MB # (change requires restart)
+#max_total_backend_memory = 0MB # Restrict total backend memory allocations
+ # to this max (in MB). 0 turns this feature
+ # off.
# - Disk -
diff --git a/src/backend/utils/mmgr/aset.c b/src/backend/utils/mmgr/aset.c
index 94e5f4b5be..6ae78acc41 100644
--- a/src/backend/utils/mmgr/aset.c
+++ b/src/backend/utils/mmgr/aset.c
@@ -440,6 +440,10 @@ AllocSetContextCreateInternal(MemoryContext parent,
else
firstBlockSize = Max(firstBlockSize, initBlockSize);
+ /* Do not exceed maximum allowed memory allocation */
+ if (exceeds_max_total_bkend_mem(firstBlockSize))
+ return NULL;
+
/*
* Allocate the initial block. Unlike other aset.c blocks, it starts with
* the context header and its block header follows that.
@@ -741,6 +745,11 @@ AllocSetAlloc(MemoryContext context, Size size)
#endif
blksize = chunk_size + ALLOC_BLOCKHDRSZ + ALLOC_CHUNKHDRSZ;
+
+ /* Do not exceed maximum allowed memory allocation */
+ if (exceeds_max_total_bkend_mem(blksize))
+ return NULL;
+
block = (AllocBlock) malloc(blksize);
if (block == NULL)
return NULL;
@@ -938,6 +947,10 @@ AllocSetAlloc(MemoryContext context, Size size)
while (blksize < required_size)
blksize <<= 1;
+ /* Do not exceed maximum allowed memory allocation */
+ if (exceeds_max_total_bkend_mem(blksize))
+ return NULL;
+
/* Try to allocate it */
block = (AllocBlock) malloc(blksize);
@@ -1178,6 +1191,10 @@ AllocSetRealloc(void *pointer, Size size)
blksize = chksize + ALLOC_BLOCKHDRSZ + ALLOC_CHUNKHDRSZ;
oldblksize = block->endptr - ((char *) block);
+ /* Do not exceed maximum allowed memory allocation */
+ if (blksize > oldblksize && exceeds_max_total_bkend_mem(blksize - oldblksize))
+ return NULL;
+
block = (AllocBlock) realloc(block, blksize);
if (block == NULL)
{
diff --git a/src/backend/utils/mmgr/generation.c b/src/backend/utils/mmgr/generation.c
index 97007ca0ee..4d8b250cb4 100644
--- a/src/backend/utils/mmgr/generation.c
+++ b/src/backend/utils/mmgr/generation.c
@@ -201,6 +201,9 @@ GenerationContextCreate(MemoryContext parent,
else
allocSize = Max(allocSize, initBlockSize);
+ if (exceeds_max_total_bkend_mem(allocSize))
+ return NULL;
+
/*
* Allocate the initial block. Unlike other generation.c blocks, it
* starts with the context header and its block header follows that.
@@ -380,6 +383,9 @@ GenerationAlloc(MemoryContext context, Size size)
{
Size blksize = required_size + Generation_BLOCKHDRSZ;
+ if (exceeds_max_total_bkend_mem(blksize))
+ return NULL;
+
block = (GenerationBlock *) malloc(blksize);
if (block == NULL)
return NULL;
@@ -483,6 +489,9 @@ GenerationAlloc(MemoryContext context, Size size)
if (blksize < required_size)
blksize = pg_nextpower2_size_t(required_size);
+ if (exceeds_max_total_bkend_mem(blksize))
+ return NULL;
+
block = (GenerationBlock *) malloc(blksize);
if (block == NULL)
diff --git a/src/backend/utils/mmgr/slab.c b/src/backend/utils/mmgr/slab.c
index e314f8f343..adc88e0047 100644
--- a/src/backend/utils/mmgr/slab.c
+++ b/src/backend/utils/mmgr/slab.c
@@ -356,9 +356,12 @@ SlabContextCreate(MemoryContext parent,
elog(ERROR, "block size %zu for slab is too small for %zu-byte chunks",
blockSize, chunkSize);
-
+ /* Do not exceed maximum allowed memory allocation */
+ if (exceeds_max_total_bkend_mem(Slab_CONTEXT_HDRSZ(chunksPerBlock)))
+ return NULL;
slab = (SlabContext *) malloc(Slab_CONTEXT_HDRSZ(chunksPerBlock));
+
if (slab == NULL)
{
MemoryContextStats(TopMemoryContext);
@@ -559,6 +562,10 @@ SlabAlloc(MemoryContext context, Size size)
}
else
{
+ /* Do not exceed maximum allowed memory allocation */
+ if (exceeds_max_total_bkend_mem(slab->blockSize))
+ return NULL;
+
block = (SlabBlock *) malloc(slab->blockSize);
if (unlikely(block == NULL))
diff --git a/src/include/utils/backend_status.h b/src/include/utils/backend_status.h
index 754ff0dc62..33269eb11b 100644
--- a/src/include/utils/backend_status.h
+++ b/src/include/utils/backend_status.h
@@ -297,6 +297,7 @@ typedef struct LocalPgBackendStatus
*/
extern PGDLLIMPORT bool pgstat_track_activities;
extern PGDLLIMPORT int pgstat_track_activity_query_size;
+extern PGDLLIMPORT int max_total_bkend_mem;
/* ----------
@@ -336,6 +337,7 @@ extern const char *pgstat_get_backend_current_activity(int pid, bool checkUser);
extern const char *pgstat_get_crashed_backend_activity(int pid, char *buffer,
int buflen);
extern uint64 pgstat_get_my_query_id(void);
+extern uint64 pgstat_get_all_backend_memory_allocated(void);
extern void pgstat_set_allocated_bytes_storage(uint64 *allocated_bytes);
extern void pgstat_reset_allocated_bytes_storage(void);
@@ -348,6 +350,7 @@ extern int pgstat_fetch_stat_numbackends(void);
extern PgBackendStatus *pgstat_fetch_stat_beentry(BackendId beid);
extern LocalPgBackendStatus *pgstat_fetch_stat_local_beentry(int beid);
extern char *pgstat_clip_activity(const char *raw_activity);
+extern bool exceeds_max_total_bkend_mem(uint64 allocation_request);
/* ----------
* pgstat_report_allocated_bytes() -
--
2.25.1
0001-Add-tracking-of-backend-memory-allocated-to-pg_stat_.patchtext/x-patch; charset=UTF-8; name=0001-Add-tracking-of-backend-memory-allocated-to-pg_stat_.patchDownload
From 0a6b152e0559a250dddd33bd7d43eb0959432e0d Mon Sep 17 00:00:00 2001
From: Reid Thompson <jreidthompson@nc.rr.com>
Date: Thu, 11 Aug 2022 12:01:25 -0400
Subject: [PATCH 1/2] Add tracking of backend memory allocated to
pg_stat_activity
This new field displays the current bytes of memory allocated to the
backend process. It is updated as memory for the process is
malloc'd/free'd. Memory allocated to items on the freelist is included in
the displayed value. Dynamic shared memory allocations are included
only in the value displayed for the backend that created them, they are
not included in the value for backends that are attached to them to
avoid double counting. On occasion, orphaned memory segments may be
cleaned up on postmaster startup. This may result in decreasing the sum
without a prior increment. We limit the floor of backend_mem_allocated
to zero. Updated pg_stat_activity documentation for the new column.
---
doc/src/sgml/monitoring.sgml | 15 ++++
src/backend/catalog/system_views.sql | 1 +
src/backend/postmaster/autovacuum.c | 6 ++
src/backend/postmaster/postmaster.c | 13 ++++
src/backend/postmaster/syslogger.c | 3 +
src/backend/storage/ipc/dsm_impl.c | 81 +++++++++++++++++++++
src/backend/utils/activity/backend_status.c | 45 ++++++++++++
src/backend/utils/adt/pgstatfuncs.c | 4 +-
src/backend/utils/mmgr/aset.c | 17 +++++
src/backend/utils/mmgr/generation.c | 15 ++++
src/backend/utils/mmgr/slab.c | 22 ++++++
src/include/catalog/pg_proc.dat | 6 +-
src/include/utils/backend_status.h | 63 +++++++++++++++-
src/test/regress/expected/rules.out | 9 ++-
src/test/regress/expected/stats.out | 11 +++
src/test/regress/sql/stats.sql | 3 +
16 files changed, 305 insertions(+), 9 deletions(-)
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 5bcba0fdec..63d2357d71 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -955,6 +955,21 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser
</para></entry>
</row>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>allocated_bytes</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Memory currently allocated to this backend in bytes. This is the balance
+ of bytes allocated and freed by this backend. Dynamic shared memory
+ allocations are included only in the value displayed for the backend that
+ created them, they are not included in the value for backends that are
+ attached to them to avoid double counting. Use <function>pg_size_pretty</function>
+ described in <xref linkend="functions-admin-dbsize"/> to make this value
+ more easily readable.
+ </para></entry>
+ </row>
+
<row>
<entry role="catalog_table_entry"><para role="column_definition">
<structfield>query</structfield> <type>text</type>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 447c9b970f..9ba7073fa1 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -864,6 +864,7 @@ CREATE VIEW pg_stat_activity AS
S.state,
S.backend_xid,
s.backend_xmin,
+ S.allocated_bytes,
S.query_id,
S.query,
S.backend_type
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index e40bd39b3f..a7953c548d 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -407,6 +407,9 @@ StartAutoVacLauncher(void)
#ifndef EXEC_BACKEND
case 0:
+ /* Zero allocated bytes to avoid double counting parent allocation */
+ pgstat_zero_my_allocated_bytes();
+
/* in postmaster child ... */
InitPostmasterChild();
@@ -1485,6 +1488,9 @@ StartAutoVacWorker(void)
#ifndef EXEC_BACKEND
case 0:
+ /* Zero allocated bytes to avoid double counting parent allocation */
+ pgstat_zero_my_allocated_bytes();
+
/* in postmaster child ... */
InitPostmasterChild();
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index eac3450774..24278e5c18 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -4102,6 +4102,9 @@ BackendStartup(Port *port)
{
free(bn);
+ /* Zero allocated bytes to avoid double counting parent allocation */
+ pgstat_zero_my_allocated_bytes();
+
/* Detangle from postmaster */
InitPostmasterChild();
@@ -5307,6 +5310,11 @@ StartChildProcess(AuxProcType type)
MemoryContextDelete(PostmasterContext);
PostmasterContext = NULL;
+ /* Zero allocated bytes to avoid double counting parent allocation.
+ * Needs to be after the MemoryContextDelete(PostmasterContext) above.
+ */
+ pgstat_zero_my_allocated_bytes();
+
AuxiliaryProcessMain(type); /* does not return */
}
#endif /* EXEC_BACKEND */
@@ -5700,6 +5708,11 @@ do_start_bgworker(RegisteredBgWorker *rw)
MemoryContextDelete(PostmasterContext);
PostmasterContext = NULL;
+ /* Zero allocated bytes to avoid double counting parent allocation.
+ * Needs to be after the MemoryContextDelete(PostmasterContext) above.
+ */
+ pgstat_zero_my_allocated_bytes();
+
StartBackgroundWorker();
exit(1); /* should not get here */
diff --git a/src/backend/postmaster/syslogger.c b/src/backend/postmaster/syslogger.c
index a876d02c6f..0d51af6fd8 100644
--- a/src/backend/postmaster/syslogger.c
+++ b/src/backend/postmaster/syslogger.c
@@ -679,6 +679,9 @@ SysLogger_Start(void)
#ifndef EXEC_BACKEND
case 0:
+ /* Zero allocated bytes to avoid double counting parent allocation */
+ pgstat_zero_my_allocated_bytes();
+
/* in postmaster child ... */
InitPostmasterChild();
diff --git a/src/backend/storage/ipc/dsm_impl.c b/src/backend/storage/ipc/dsm_impl.c
index f0965c3481..22885c7bd2 100644
--- a/src/backend/storage/ipc/dsm_impl.c
+++ b/src/backend/storage/ipc/dsm_impl.c
@@ -66,6 +66,7 @@
#include "postmaster/postmaster.h"
#include "storage/dsm_impl.h"
#include "storage/fd.h"
+#include "utils/backend_status.h"
#include "utils/guc.h"
#include "utils/memutils.h"
@@ -232,6 +233,14 @@ dsm_impl_posix(dsm_op op, dsm_handle handle, Size request_size,
name)));
return false;
}
+
+ /*
+ * Detach and destroy pass through here, only decrease the memory
+ * shown allocated in pg_stat_activity when the creator destroys the
+ * allocation.
+ */
+ if (op == DSM_OP_DESTROY)
+ pgstat_report_allocated_bytes(*mapped_size, PG_ALLOC_DECREASE);
*mapped_address = NULL;
*mapped_size = 0;
if (op == DSM_OP_DESTROY && shm_unlink(name) != 0)
@@ -332,6 +341,36 @@ dsm_impl_posix(dsm_op op, dsm_handle handle, Size request_size,
name)));
return false;
}
+
+ /*
+ * Attach and create pass through here, only update backend memory
+ * allocated in pg_stat_activity for the creator process.
+ */
+ if (op == DSM_OP_CREATE)
+ {
+ /*
+ * Posix creation calls dsm_impl_posix_resize implying that resizing
+ * occurs or may be added in the future. As implemented
+ * dsm_impl_posix_resize utilizes fallocate or truncate, passing the
+ * whole new size as input, growing the allocation as needed (only
+ * truncate supports shrinking). We update by replacing the old
+ * allocation with the new.
+ */
+#if defined(HAVE_POSIX_FALLOCATE) && defined(__linux__)
+ /*
+ * posix_fallocate does not shrink allocations, adjust only on
+ * allocation increase.
+ */
+ if (request_size > *mapped_size)
+ {
+ pgstat_report_allocated_bytes(request_size - *mapped_size,
+ PG_ALLOC_INCREASE);
+ }
+#else
+ pgstat_report_allocated_bytes(*mapped_size, PG_ALLOC_DECREASE);
+ pgstat_report_allocated_bytes(request_size, PG_ALLOC_INCREASE);
+#endif
+ }
*mapped_address = address;
*mapped_size = request_size;
close(fd);
@@ -537,6 +576,14 @@ dsm_impl_sysv(dsm_op op, dsm_handle handle, Size request_size,
name)));
return false;
}
+
+ /*
+ * Detach and destroy pass through here, only decrease the memory
+ * shown allocated in pg_stat_activity when the creator destroys the
+ * allocation.
+ */
+ if (op == DSM_OP_DESTROY)
+ pgstat_report_allocated_bytes(*mapped_size, PG_ALLOC_DECREASE);
*mapped_address = NULL;
*mapped_size = 0;
if (op == DSM_OP_DESTROY && shmctl(ident, IPC_RMID, NULL) < 0)
@@ -584,6 +631,13 @@ dsm_impl_sysv(dsm_op op, dsm_handle handle, Size request_size,
name)));
return false;
}
+
+ /*
+ * Attach and create pass through here, only update backend memory
+ * allocated in pg_stat_activity for the creator process.
+ */
+ if (op == DSM_OP_CREATE)
+ pgstat_report_allocated_bytes(request_size, PG_ALLOC_INCREASE);
*mapped_address = address;
*mapped_size = request_size;
@@ -652,6 +706,13 @@ dsm_impl_windows(dsm_op op, dsm_handle handle, Size request_size,
return false;
}
+ /*
+ * Detach and destroy pass through here, only decrease the memory
+ * shown allocated in pg_stat_activity when the creator destroys the
+ * allocation.
+ */
+ if (op == DSM_OP_DESTROY)
+ pgstat_report_allocated_bytes(*mapped_size, PG_ALLOC_DECREASE);
*impl_private = NULL;
*mapped_address = NULL;
*mapped_size = 0;
@@ -768,6 +829,12 @@ dsm_impl_windows(dsm_op op, dsm_handle handle, Size request_size,
return false;
}
+ /*
+ * Attach and create pass through here, only update backend memory
+ * allocated in pg_stat_activity for the creator process.
+ */
+ if (op == DSM_OP_CREATE)
+ pgstat_report_allocated_bytes(info.RegionSize, PG_ALLOC_INCREASE);
*mapped_address = address;
*mapped_size = info.RegionSize;
*impl_private = hmap;
@@ -812,6 +879,13 @@ dsm_impl_mmap(dsm_op op, dsm_handle handle, Size request_size,
name)));
return false;
}
+
+ /*
+ * Detach and destroy pass through here, only decrease the memory
+ * shown allocated in pg_stat_activity when the creator destroys the
+ * allocation.
+ */
+ pgstat_report_allocated_bytes(*mapped_size, PG_ALLOC_DECREASE);
*mapped_address = NULL;
*mapped_size = 0;
if (op == DSM_OP_DESTROY && unlink(name) != 0)
@@ -933,6 +1007,13 @@ dsm_impl_mmap(dsm_op op, dsm_handle handle, Size request_size,
name)));
return false;
}
+
+ /*
+ * Attach and create pass through here, only update backend memory
+ * allocated in pg_stat_activity for the creator process.
+ */
+ if (op == DSM_OP_CREATE)
+ pgstat_report_allocated_bytes(request_size, PG_ALLOC_INCREASE);
*mapped_address = address;
*mapped_size = request_size;
diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c
index 608d01ea0d..7baf2db57d 100644
--- a/src/backend/utils/activity/backend_status.c
+++ b/src/backend/utils/activity/backend_status.c
@@ -49,6 +49,9 @@ int pgstat_track_activity_query_size = 1024;
/* exposed so that backend_progress.c can access it */
PgBackendStatus *MyBEEntry = NULL;
+/* Memory allocated to this backend prior to pgstats initialization */
+uint64 local_my_allocated_bytes = 0;
+uint64 *my_allocated_bytes = &local_my_allocated_bytes;
static PgBackendStatus *BackendStatusArray = NULL;
static char *BackendAppnameBuffer = NULL;
@@ -400,6 +403,15 @@ pgstat_bestart(void)
lbeentry.st_progress_command_target = InvalidOid;
lbeentry.st_query_id = UINT64CONST(0);
+ /* Alter allocation reporting from local_my_allocated_bytes to shared memory */
+ pgstat_set_allocated_bytes_storage(&MyBEEntry->allocated_bytes);
+
+ /* Populate sum of memory allocated prior to pgstats initialization to pgstats
+ * and zero the local variable.
+ */
+ lbeentry.allocated_bytes += local_my_allocated_bytes;
+ local_my_allocated_bytes = 0;
+
/*
* we don't zero st_progress_param here to save cycles; nobody should
* examine it until st_progress_command has been set to something other
@@ -459,6 +471,11 @@ pgstat_beshutdown_hook(int code, Datum arg)
{
volatile PgBackendStatus *beentry = MyBEEntry;
+ /*
+ * Stop reporting memory allocation changes to &MyBEEntry->allocated_bytes
+ */
+ pgstat_reset_allocated_bytes_storage();
+
/*
* Clear my status entry, following the protocol of bumping st_changecount
* before and after. We use a volatile pointer here to ensure the
@@ -1194,3 +1211,31 @@ pgstat_clip_activity(const char *raw_activity)
return activity;
}
+
+/*
+ * Configure bytes allocated reporting to report allocated bytes to
+ * *allocated_bytes. *allocated_bytes needs to be valid until
+ * pgstat_set_allocated_bytes_storage() is called.
+ *
+ * Expected to be called during backend startup (in pgstat_bestart), to point
+ * my_allocated_bytes into shared memory.
+ */
+void
+pgstat_set_allocated_bytes_storage(uint64 *new_allocated_bytes)
+{
+ my_allocated_bytes = new_allocated_bytes;
+ *new_allocated_bytes = local_my_allocated_bytes;
+}
+
+/*
+ * Reset allocated bytes storage location.
+ *
+ * Expected to be called during backend shutdown, before the location set up
+ * by pgstat_set_allocated_bytes_storage() becomes invalid.
+ */
+void
+pgstat_reset_allocated_bytes_storage(void)
+{
+ my_allocated_bytes = &local_my_allocated_bytes;
+}
+
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 6cddd74aa7..2d672101f0 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -302,7 +302,7 @@ pg_stat_get_progress_info(PG_FUNCTION_ARGS)
Datum
pg_stat_get_activity(PG_FUNCTION_ARGS)
{
-#define PG_STAT_GET_ACTIVITY_COLS 30
+#define PG_STAT_GET_ACTIVITY_COLS 31
int num_backends = pgstat_fetch_stat_numbackends();
int curr_backend;
int pid = PG_ARGISNULL(0) ? -1 : PG_GETARG_INT32(0);
@@ -358,6 +358,8 @@ pg_stat_get_activity(PG_FUNCTION_ARGS)
else
nulls[16] = true;
+ values[30] = UInt64GetDatum(beentry->allocated_bytes);
+
/* Values only available to role member or pg_read_all_stats */
if (HAS_PGSTAT_PERMISSIONS(beentry->st_userid))
{
diff --git a/src/backend/utils/mmgr/aset.c b/src/backend/utils/mmgr/aset.c
index ef10bb1690..94e5f4b5be 100644
--- a/src/backend/utils/mmgr/aset.c
+++ b/src/backend/utils/mmgr/aset.c
@@ -47,6 +47,7 @@
#include "postgres.h"
#include "port/pg_bitutils.h"
+#include "utils/backend_status.h"
#include "utils/memdebug.h"
#include "utils/memutils.h"
#include "utils/memutils_memorychunk.h"
@@ -521,6 +522,7 @@ AllocSetContextCreateInternal(MemoryContext parent,
name);
((MemoryContext) set)->mem_allocated = firstBlockSize;
+ pgstat_report_allocated_bytes(firstBlockSize, PG_ALLOC_INCREASE);
return (MemoryContext) set;
}
@@ -543,6 +545,7 @@ AllocSetReset(MemoryContext context)
AllocSet set = (AllocSet) context;
AllocBlock block;
Size keepersize PG_USED_FOR_ASSERTS_ONLY;
+ uint64 deallocation = 0;
Assert(AllocSetIsValid(set));
@@ -585,6 +588,7 @@ AllocSetReset(MemoryContext context)
{
/* Normal case, release the block */
context->mem_allocated -= block->endptr - ((char *) block);
+ deallocation += block->endptr - ((char *) block);
#ifdef CLOBBER_FREED_MEMORY
wipe_mem(block, block->freeptr - ((char *) block));
@@ -595,6 +599,7 @@ AllocSetReset(MemoryContext context)
}
Assert(context->mem_allocated == keepersize);
+ pgstat_report_allocated_bytes(deallocation, PG_ALLOC_DECREASE);
/* Reset block size allocation sequence, too */
set->nextBlockSize = set->initBlockSize;
@@ -613,6 +618,7 @@ AllocSetDelete(MemoryContext context)
AllocSet set = (AllocSet) context;
AllocBlock block = set->blocks;
Size keepersize PG_USED_FOR_ASSERTS_ONLY;
+ uint64 deallocation = 0;
Assert(AllocSetIsValid(set));
@@ -651,11 +657,13 @@ AllocSetDelete(MemoryContext context)
freelist->first_free = (AllocSetContext *) oldset->header.nextchild;
freelist->num_free--;
+ deallocation += oldset->header.mem_allocated;
/* All that remains is to free the header/initial block */
free(oldset);
}
Assert(freelist->num_free == 0);
+ pgstat_report_allocated_bytes(deallocation, PG_ALLOC_DECREASE);
}
/* Now add the just-deleted context to the freelist. */
@@ -672,7 +680,10 @@ AllocSetDelete(MemoryContext context)
AllocBlock next = block->next;
if (block != set->keeper)
+ {
context->mem_allocated -= block->endptr - ((char *) block);
+ deallocation += block->endptr - ((char *) block);
+ }
#ifdef CLOBBER_FREED_MEMORY
wipe_mem(block, block->freeptr - ((char *) block));
@@ -685,6 +696,7 @@ AllocSetDelete(MemoryContext context)
}
Assert(context->mem_allocated == keepersize);
+ pgstat_report_allocated_bytes(deallocation + context->mem_allocated, PG_ALLOC_DECREASE);
/* Finally, free the context header, including the keeper block */
free(set);
@@ -734,6 +746,7 @@ AllocSetAlloc(MemoryContext context, Size size)
return NULL;
context->mem_allocated += blksize;
+ pgstat_report_allocated_bytes(blksize, PG_ALLOC_INCREASE);
block->aset = set;
block->freeptr = block->endptr = ((char *) block) + blksize;
@@ -944,6 +957,7 @@ AllocSetAlloc(MemoryContext context, Size size)
return NULL;
context->mem_allocated += blksize;
+ pgstat_report_allocated_bytes(blksize, PG_ALLOC_INCREASE);
block->aset = set;
block->freeptr = ((char *) block) + ALLOC_BLOCKHDRSZ;
@@ -1043,6 +1057,7 @@ AllocSetFree(void *pointer)
block->next->prev = block->prev;
set->header.mem_allocated -= block->endptr - ((char *) block);
+ pgstat_report_allocated_bytes(block->endptr - ((char *) block), PG_ALLOC_DECREASE);
#ifdef CLOBBER_FREED_MEMORY
wipe_mem(block, block->freeptr - ((char *) block));
@@ -1173,7 +1188,9 @@ AllocSetRealloc(void *pointer, Size size)
/* updated separately, not to underflow when (oldblksize > blksize) */
set->header.mem_allocated -= oldblksize;
+ pgstat_report_allocated_bytes(oldblksize, PG_ALLOC_DECREASE);
set->header.mem_allocated += blksize;
+ pgstat_report_allocated_bytes(blksize, PG_ALLOC_INCREASE);
block->freeptr = block->endptr = ((char *) block) + blksize;
diff --git a/src/backend/utils/mmgr/generation.c b/src/backend/utils/mmgr/generation.c
index 93825265a1..97007ca0ee 100644
--- a/src/backend/utils/mmgr/generation.c
+++ b/src/backend/utils/mmgr/generation.c
@@ -37,6 +37,7 @@
#include "lib/ilist.h"
#include "port/pg_bitutils.h"
+#include "utils/backend_status.h"
#include "utils/memdebug.h"
#include "utils/memutils.h"
#include "utils/memutils_memorychunk.h"
@@ -267,6 +268,7 @@ GenerationContextCreate(MemoryContext parent,
name);
((MemoryContext) set)->mem_allocated = firstBlockSize;
+ pgstat_report_allocated_bytes(firstBlockSize, PG_ALLOC_INCREASE);
return (MemoryContext) set;
}
@@ -283,6 +285,7 @@ GenerationReset(MemoryContext context)
{
GenerationContext *set = (GenerationContext *) context;
dlist_mutable_iter miter;
+ uint64 deallocation = 0;
Assert(GenerationIsValid(set));
@@ -305,9 +308,14 @@ GenerationReset(MemoryContext context)
if (block == set->keeper)
GenerationBlockMarkEmpty(block);
else
+ {
+ deallocation += block->blksize;
GenerationBlockFree(set, block);
+ }
}
+ pgstat_report_allocated_bytes(deallocation, PG_ALLOC_DECREASE);
+
/* set it so new allocations to make use of the keeper block */
set->block = set->keeper;
@@ -328,6 +336,9 @@ GenerationDelete(MemoryContext context)
{
/* Reset to release all releasable GenerationBlocks */
GenerationReset(context);
+
+ pgstat_report_allocated_bytes(context->mem_allocated, PG_ALLOC_DECREASE);
+
/* And free the context header and keeper block */
free(context);
}
@@ -374,6 +385,7 @@ GenerationAlloc(MemoryContext context, Size size)
return NULL;
context->mem_allocated += blksize;
+ pgstat_report_allocated_bytes(blksize, PG_ALLOC_INCREASE);
/* block with a single (used) chunk */
block->context = set;
@@ -477,6 +489,7 @@ GenerationAlloc(MemoryContext context, Size size)
return NULL;
context->mem_allocated += blksize;
+ pgstat_report_allocated_bytes(blksize, PG_ALLOC_INCREASE);
/* initialize the new block */
GenerationBlockInit(set, block, blksize);
@@ -726,6 +739,8 @@ GenerationFree(void *pointer)
dlist_delete(&block->node);
set->header.mem_allocated -= block->blksize;
+ pgstat_report_allocated_bytes(block->blksize, PG_ALLOC_DECREASE);
+
free(block);
}
diff --git a/src/backend/utils/mmgr/slab.c b/src/backend/utils/mmgr/slab.c
index 33dca0f37c..e314f8f343 100644
--- a/src/backend/utils/mmgr/slab.c
+++ b/src/backend/utils/mmgr/slab.c
@@ -69,6 +69,7 @@
#include "postgres.h"
#include "lib/ilist.h"
+#include "utils/backend_status.h"
#include "utils/memdebug.h"
#include "utils/memutils.h"
#include "utils/memutils_memorychunk.h"
@@ -413,6 +414,13 @@ SlabContextCreate(MemoryContext parent,
parent,
name);
+ /*
+ * If SlabContextCreate is updated to add context header size to
+ * context->mem_allocated, then update here and SlabDelete appropriately
+ */
+ pgstat_report_allocated_bytes(Slab_CONTEXT_HDRSZ(slab->chunksPerBlock),
+ PG_ALLOC_INCREASE);
+
return (MemoryContext) slab;
}
@@ -429,6 +437,7 @@ SlabReset(MemoryContext context)
SlabContext *slab = (SlabContext *) context;
dlist_mutable_iter miter;
int i;
+ uint64 deallocation = 0;
Assert(SlabIsValid(slab));
@@ -465,9 +474,11 @@ SlabReset(MemoryContext context)
#endif
free(block);
context->mem_allocated -= slab->blockSize;
+ deallocation += slab->blockSize;
}
}
+ pgstat_report_allocated_bytes(deallocation, PG_ALLOC_DECREASE);
slab->curBlocklistIndex = 0;
Assert(context->mem_allocated == 0);
@@ -480,8 +491,17 @@ SlabReset(MemoryContext context)
void
SlabDelete(MemoryContext context)
{
+
/* Reset to release all the SlabBlocks */
SlabReset(context);
+
+ /*
+ * Until context header allocation is included in context->mem_allocated,
+ * cast to slab and decrement the header allocation
+ */
+ pgstat_report_allocated_bytes(Slab_CONTEXT_HDRSZ(((SlabContext *)context)->chunksPerBlock),
+ PG_ALLOC_DECREASE);
+
/* And free the context header */
free(context);
}
@@ -546,6 +566,7 @@ SlabAlloc(MemoryContext context, Size size)
block->slab = slab;
context->mem_allocated += slab->blockSize;
+ pgstat_report_allocated_bytes(slab->blockSize, PG_ALLOC_INCREASE);
/* use the first chunk in the new block */
chunk = SlabBlockGetChunk(slab, block, 0);
@@ -732,6 +753,7 @@ SlabFree(void *pointer)
#endif
free(block);
slab->header.mem_allocated -= slab->blockSize;
+ pgstat_report_allocated_bytes(slab->blockSize, PG_ALLOC_DECREASE);
}
/*
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 7be9a50147..f8a3f2ffa2 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5373,9 +5373,9 @@
proname => 'pg_stat_get_activity', prorows => '100', proisstrict => 'f',
proretset => 't', provolatile => 's', proparallel => 'r',
prorettype => 'record', proargtypes => 'int4',
- proallargtypes => '{int4,oid,int4,oid,text,text,text,text,text,timestamptz,timestamptz,timestamptz,timestamptz,inet,text,int4,xid,xid,text,bool,text,text,int4,text,numeric,text,bool,text,bool,int4,int8}',
- proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
- proargnames => '{pid,datid,pid,usesysid,application_name,state,query,wait_event_type,wait_event,xact_start,query_start,backend_start,state_change,client_addr,client_hostname,client_port,backend_xid,backend_xmin,backend_type,ssl,sslversion,sslcipher,sslbits,ssl_client_dn,ssl_client_serial,ssl_issuer_dn,gss_auth,gss_princ,gss_enc,leader_pid,query_id}',
+ proallargtypes => '{int4,oid,int4,oid,text,text,text,text,text,timestamptz,timestamptz,timestamptz,timestamptz,inet,text,int4,xid,xid,text,bool,text,text,int4,text,numeric,text,bool,text,bool,int4,int8,int8}',
+ proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{pid,datid,pid,usesysid,application_name,state,query,wait_event_type,wait_event,xact_start,query_start,backend_start,state_change,client_addr,client_hostname,client_port,backend_xid,backend_xmin,backend_type,ssl,sslversion,sslcipher,sslbits,ssl_client_dn,ssl_client_serial,ssl_issuer_dn,gss_auth,gss_princ,gss_enc,leader_pid,query_id,allocated_bytes}',
prosrc => 'pg_stat_get_activity' },
{ oid => '3318',
descr => 'statistics: information about progress of backends running maintenance command',
diff --git a/src/include/utils/backend_status.h b/src/include/utils/backend_status.h
index f7bd83113a..754ff0dc62 100644
--- a/src/include/utils/backend_status.h
+++ b/src/include/utils/backend_status.h
@@ -15,6 +15,7 @@
#include "miscadmin.h" /* for BackendType */
#include "storage/backendid.h"
#include "utils/backend_progress.h"
+#include "common/int.h"
/* ----------
@@ -32,6 +33,13 @@ typedef enum BackendState
STATE_DISABLED
} BackendState;
+/* Enum helper for reporting memory allocated bytes */
+enum allocation_direction
+{
+ PG_ALLOC_DECREASE = -1,
+ PG_ALLOC_IGNORE,
+ PG_ALLOC_INCREASE,
+};
/* ----------
* Shared-memory data structures
@@ -169,6 +177,9 @@ typedef struct PgBackendStatus
/* query identifier, optionally computed using post_parse_analyze_hook */
uint64 st_query_id;
+
+ /* Current memory allocated to this backend */
+ uint64 allocated_bytes;
} PgBackendStatus;
@@ -293,6 +304,7 @@ extern PGDLLIMPORT int pgstat_track_activity_query_size;
* ----------
*/
extern PGDLLIMPORT PgBackendStatus *MyBEEntry;
+extern PGDLLIMPORT uint64 *my_allocated_bytes;
/* ----------
@@ -324,7 +336,8 @@ extern const char *pgstat_get_backend_current_activity(int pid, bool checkUser);
extern const char *pgstat_get_crashed_backend_activity(int pid, char *buffer,
int buflen);
extern uint64 pgstat_get_my_query_id(void);
-
+extern void pgstat_set_allocated_bytes_storage(uint64 *allocated_bytes);
+extern void pgstat_reset_allocated_bytes_storage(void);
/* ----------
* Support functions for the SQL-callable functions to
@@ -336,5 +349,53 @@ extern PgBackendStatus *pgstat_fetch_stat_beentry(BackendId beid);
extern LocalPgBackendStatus *pgstat_fetch_stat_local_beentry(int beid);
extern char *pgstat_clip_activity(const char *raw_activity);
+/* ----------
+ * pgstat_report_allocated_bytes() -
+ *
+ * Called to report change in memory allocated for this backend.
+ *
+ * my_allocated_bytes initially points to local memory, making it safe to call
+ * this before pgstats has been initialized. allocation_direction is a
+ * positive/negative multiplier enum defined above.
+ * ----------
+ */
+static inline void
+pgstat_report_allocated_bytes(int64 allocated_bytes, int allocation_direction)
+{
+ uint64 temp;
+
+ /*
+ * Avoid *my_allocated_bytes unsigned integer overflow on
+ * PG_ALLOC_DECREASE
+ */
+ if (allocation_direction == PG_ALLOC_DECREASE &&
+ pg_sub_u64_overflow(*my_allocated_bytes, allocated_bytes, &temp))
+ {
+ *my_allocated_bytes = 0;
+ ereport(LOG,
+ errmsg("Backend %d deallocated %lld bytes, exceeding the %llu bytes it is currently reporting allocated. Setting reported to 0.",
+ MyProcPid, (long long) allocated_bytes,
+ (unsigned long long) *my_allocated_bytes));
+ }
+ else
+ *my_allocated_bytes += (allocated_bytes) * allocation_direction;
+
+ return;
+}
+
+/* ---------
+ * pgstat_zero_my_allocated_bytes() -
+ *
+ * Called to zero out local allocated bytes variable after fork to avoid double
+ * counting allocations.
+ * ---------
+ */
+static inline void
+pgstat_zero_my_allocated_bytes(void)
+{
+ *my_allocated_bytes = 0;
+
+ return;
+}
#endif /* BACKEND_STATUS_H */
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index fb9f936d43..5f854aab18 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1756,10 +1756,11 @@ pg_stat_activity| SELECT s.datid,
s.state,
s.backend_xid,
s.backend_xmin,
+ s.allocated_bytes,
s.query_id,
s.query,
s.backend_type
- FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, query_id)
+ FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, query_id, allocated_bytes)
LEFT JOIN pg_database d ON ((s.datid = d.oid)))
LEFT JOIN pg_authid u ON ((s.usesysid = u.oid)));
pg_stat_all_indexes| SELECT c.oid AS relid,
@@ -1874,7 +1875,7 @@ pg_stat_gssapi| SELECT s.pid,
s.gss_auth AS gss_authenticated,
s.gss_princ AS principal,
s.gss_enc AS encrypted
- FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, query_id)
+ FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, query_id, allocated_bytes)
WHERE (s.client_port IS NOT NULL);
pg_stat_progress_analyze| SELECT s.pid,
s.datid,
@@ -2055,7 +2056,7 @@ pg_stat_replication| SELECT s.pid,
w.sync_priority,
w.sync_state,
w.reply_time
- FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, query_id)
+ FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, query_id, allocated_bytes)
JOIN pg_stat_get_wal_senders() w(pid, state, sent_lsn, write_lsn, flush_lsn, replay_lsn, write_lag, flush_lag, replay_lag, sync_priority, sync_state, reply_time) ON ((s.pid = w.pid)))
LEFT JOIN pg_authid u ON ((s.usesysid = u.oid)));
pg_stat_replication_slots| SELECT s.slot_name,
@@ -2089,7 +2090,7 @@ pg_stat_ssl| SELECT s.pid,
s.ssl_client_dn AS client_dn,
s.ssl_client_serial AS client_serial,
s.ssl_issuer_dn AS issuer_dn
- FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, query_id)
+ FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, query_id, allocated_bytes)
WHERE (s.client_port IS NOT NULL);
pg_stat_subscription| SELECT su.oid AS subid,
su.subname,
diff --git a/src/test/regress/expected/stats.out b/src/test/regress/expected/stats.out
index 1d84407a03..ab7e95c367 100644
--- a/src/test/regress/expected/stats.out
+++ b/src/test/regress/expected/stats.out
@@ -1126,4 +1126,15 @@ SELECT pg_stat_get_subscription_stats(NULL);
(1 row)
+-- ensure that allocated_bytes exist for backends
+SELECT allocated_bytes > 0 AS result FROM pg_stat_activity WHERE backend_type
+IN ('checkpointer', 'background writer', 'walwriter', 'autovacuum launcher');
+ result
+--------
+ t
+ t
+ t
+ t
+(4 rows)
+
-- End of Stats Test
diff --git a/src/test/regress/sql/stats.sql b/src/test/regress/sql/stats.sql
index b4d6753c71..2f0b1cc9d8 100644
--- a/src/test/regress/sql/stats.sql
+++ b/src/test/regress/sql/stats.sql
@@ -535,5 +535,8 @@ SET enable_seqscan TO on;
SELECT pg_stat_get_replication_slot(NULL);
SELECT pg_stat_get_subscription_stats(NULL);
+-- ensure that allocated_bytes exist for backends
+SELECT allocated_bytes > 0 AS result FROM pg_stat_activity WHERE backend_type
+IN ('checkpointer', 'background writer', 'walwriter', 'autovacuum launcher');
-- End of Stats Test
--
2.25.1
Hi,
On 2023-01-05 13:44:20 -0500, Reid Thompson wrote:
From 0a6b152e0559a250dddd33bd7d43eb0959432e0d Mon Sep 17 00:00:00 2001
From: Reid Thompson <jreidthompson@nc.rr.com>
Date: Thu, 11 Aug 2022 12:01:25 -0400
Subject: [PATCH 1/2] Add tracking of backend memory allocated to
pg_stat_activityThis new field displays the current bytes of memory allocated to the
backend process. It is updated as memory for the process is
malloc'd/free'd. Memory allocated to items on the freelist is included in
the displayed value.
It doesn't actually malloc/free. It tracks palloc/pfree.
Dynamic shared memory allocations are included only in the value displayed
for the backend that created them, they are not included in the value for
backends that are attached to them to avoid double counting.
As mentioned before, I don't think accounting DSM this way makes sense.
--- a/src/backend/postmaster/autovacuum.c +++ b/src/backend/postmaster/autovacuum.c @@ -407,6 +407,9 @@ StartAutoVacLauncher(void)#ifndef EXEC_BACKEND case 0: + /* Zero allocated bytes to avoid double counting parent allocation */ + pgstat_zero_my_allocated_bytes(); + /* in postmaster child ... */ InitPostmasterChild();
@@ -1485,6 +1488,9 @@ StartAutoVacWorker(void)
#ifndef EXEC_BACKEND case 0: + /* Zero allocated bytes to avoid double counting parent allocation */ + pgstat_zero_my_allocated_bytes(); + /* in postmaster child ... */ InitPostmasterChild();diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c index eac3450774..24278e5c18 100644 --- a/src/backend/postmaster/postmaster.c +++ b/src/backend/postmaster/postmaster.c @@ -4102,6 +4102,9 @@ BackendStartup(Port *port) { free(bn);+ /* Zero allocated bytes to avoid double counting parent allocation */ + pgstat_zero_my_allocated_bytes(); + /* Detangle from postmaster */ InitPostmasterChild();
It doesn't at all seem right to call pgstat_zero_my_allocated_bytes() here,
before even InitPostmasterChild() is called. Nor does it seem right to add the
call to so many places.
Note that this is before we even delete postmaster's memory, see e.g.:
/*
* If the PostmasterContext is still around, recycle the space; we don't
* need it anymore after InitPostgres completes. Note this does not trash
* *MyProcPort, because ConnCreate() allocated that space with malloc()
* ... else we'd need to copy the Port data first. Also, subsidiary data
* such as the username isn't lost either; see ProcessStartupPacket().
*/
if (PostmasterContext)
{
MemoryContextDelete(PostmasterContext);
PostmasterContext = NULL;
}
calling pgstat_zero_my_allocated_bytes() before we do this will lead to
undercounting memory usage, afaict.
+/* Enum helper for reporting memory allocated bytes */ +enum allocation_direction +{ + PG_ALLOC_DECREASE = -1, + PG_ALLOC_IGNORE, + PG_ALLOC_INCREASE, +};
What's the point of this?
+/* ---------- + * pgstat_report_allocated_bytes() - + * + * Called to report change in memory allocated for this backend. + * + * my_allocated_bytes initially points to local memory, making it safe to call + * this before pgstats has been initialized. allocation_direction is a + * positive/negative multiplier enum defined above. + * ---------- + */ +static inline void +pgstat_report_allocated_bytes(int64 allocated_bytes, int allocation_direction)
I don't think this should take allocation_direction as a parameter, I'd make
it two different functions.
+{ + uint64 temp; + + /* + * Avoid *my_allocated_bytes unsigned integer overflow on + * PG_ALLOC_DECREASE + */ + if (allocation_direction == PG_ALLOC_DECREASE && + pg_sub_u64_overflow(*my_allocated_bytes, allocated_bytes, &temp)) + { + *my_allocated_bytes = 0; + ereport(LOG, + errmsg("Backend %d deallocated %lld bytes, exceeding the %llu bytes it is currently reporting allocated. Setting reported to 0.", + MyProcPid, (long long) allocated_bytes, + (unsigned long long) *my_allocated_bytes));
We certainly shouldn't have an ereport in here. This stuff really needs to be
cheap.
+ } + else + *my_allocated_bytes += (allocated_bytes) * allocation_direction;
Superfluous parens?
+/* ---------- + * pgstat_get_all_memory_allocated() - + * + * Return a uint64 representing the current shared memory allocated to all + * backends. This looks directly at the BackendStatusArray, and so will + * provide current information regardless of the age of our transaction's + * snapshot of the status array. + * In the future we will likely utilize additional values - perhaps limit + * backend allocation by user/role, etc. + * ---------- + */ +uint64 +pgstat_get_all_backend_memory_allocated(void) +{ + PgBackendStatus *beentry; + int i; + uint64 all_memory_allocated = 0; + + beentry = BackendStatusArray; + + /* + * We probably shouldn't get here before shared memory has been set up, + * but be safe. + */ + if (beentry == NULL || BackendActivityBuffer == NULL) + return 0; + + /* + * We include AUX procs in all backend memory calculation + */ + for (i = 1; i <= NumBackendStatSlots; i++) + { + /* + * We use a volatile pointer here to ensure the compiler doesn't try + * to get cute. + */ + volatile PgBackendStatus *vbeentry = beentry; + bool found; + uint64 allocated_bytes = 0; + + for (;;) + { + int before_changecount; + int after_changecount; + + pgstat_begin_read_activity(vbeentry, before_changecount); + + /* + * Ignore invalid entries, which may contain invalid data. + * See pgstat_beshutdown_hook() + */ + if (vbeentry->st_procpid > 0) + allocated_bytes = vbeentry->allocated_bytes; + + pgstat_end_read_activity(vbeentry, after_changecount); + + if ((found = pgstat_read_activity_complete(before_changecount, + after_changecount))) + break; + + /* Make sure we can break out of loop if stuck... */ + CHECK_FOR_INTERRUPTS(); + } + + if (found) + all_memory_allocated += allocated_bytes; + + beentry++; + } + + return all_memory_allocated; +} + +/* + * Determine if allocation request will exceed max backend memory allowed. + * Do not apply to auxiliary processes. + */ +bool +exceeds_max_total_bkend_mem(uint64 allocation_request) +{ + bool result = false; + + /* Exclude auxiliary processes from the check */ + if (MyAuxProcType != NotAnAuxProcess) + return result; + + /* Convert max_total_bkend_mem to bytes for comparison */ + if (max_total_bkend_mem && + pgstat_get_all_backend_memory_allocated() + + allocation_request > (uint64) max_total_bkend_mem * 1024 * 1024) + { + /* + * Explicitly identify the OOM being a result of this configuration + * parameter vs a system failure to allocate OOM. + */ + ereport(WARNING, + errmsg("allocation would exceed max_total_memory limit (%llu > %llu)", + (unsigned long long) pgstat_get_all_backend_memory_allocated() + + allocation_request, (unsigned long long) max_total_bkend_mem * 1024 * 1024)); + + result = true; + }
I think it's completely unfeasible to execute something as expensive as
pgstat_get_all_backend_memory_allocated() on every allocation. Like,
seriously, no.
And we absolutely definitely shouldn't just add CHECK_FOR_INTERRUPT() calls
into the middle of allocator code.
Greetings,
Andres Freund
On Mon, 2023-01-09 at 18:31 -0800, Andres Freund wrote:
Hi,
On 2023-01-05 13:44:20 -0500, Reid Thompson wrote:
This new field displays the current bytes of memory allocated to the
backend process. It is updated as memory for the process is
malloc'd/free'd. Memory allocated to items on the freelist is included in
the displayed value.It doesn't actually malloc/free. It tracks palloc/pfree.
I will update the message
Dynamic shared memory allocations are included only in the value displayed
for the backend that created them, they are not included in the value for
backends that are attached to them to avoid double counting.As mentioned before, I don't think accounting DSM this way makes sense.
Understood, previously you noted 'There are a few uses of DSMs that track
shared resources, with the biggest likely being the stats for relations
etc'. I'd like to come up with a solution to address this; identifying the
long term allocations to shared state and accounting for them such that they
don't get 'lost' when the allocating backend exits. Any guidance or
direction would be appreciated.
--- a/src/backend/postmaster/autovacuum.c +++ b/src/backend/postmaster/autovacuum.c @@ -407,6 +407,9 @@ StartAutoVacLauncher(void)#ifndef EXEC_BACKEND case 0: + /* Zero allocated bytes to avoid double counting parent allocation */ + pgstat_zero_my_allocated_bytes(); + /* in postmaster child ... */ InitPostmasterChild();@@ -1485,6 +1488,9 @@ StartAutoVacWorker(void)
#ifndef EXEC_BACKEND case 0: + /* Zero allocated bytes to avoid double counting parent allocation */ + pgstat_zero_my_allocated_bytes(); + /* in postmaster child ... */ InitPostmasterChild();diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c index eac3450774..24278e5c18 100644 --- a/src/backend/postmaster/postmaster.c +++ b/src/backend/postmaster/postmaster.c @@ -4102,6 +4102,9 @@ BackendStartup(Port *port) { free(bn);+ /* Zero allocated bytes to avoid double counting parent allocation */ + pgstat_zero_my_allocated_bytes(); + /* Detangle from postmaster */ InitPostmasterChild();It doesn't at all seem right to call pgstat_zero_my_allocated_bytes() here,
before even InitPostmasterChild() is called. Nor does it seem right to add the
call to so many places.Note that this is before we even delete postmaster's memory, see e.g.:
/*
* If the PostmasterContext is still around, recycle the space; we don't
* need it anymore after InitPostgres completes. Note this does not trash
* *MyProcPort, because ConnCreate() allocated that space with malloc()
* ... else we'd need to copy the Port data first. Also, subsidiary data
* such as the username isn't lost either; see ProcessStartupPacket().
*/
if (PostmasterContext)
{
MemoryContextDelete(PostmasterContext);
PostmasterContext = NULL;
}calling pgstat_zero_my_allocated_bytes() before we do this will lead to
undercounting memory usage, afaict.
OK - I'll trace back through these and see if I can better locate and reduce the
number of invocations.
+/* Enum helper for reporting memory allocated bytes */ +enum allocation_direction +{ + PG_ALLOC_DECREASE = -1, + PG_ALLOC_IGNORE, + PG_ALLOC_INCREASE, +};What's the point of this?
+/* ---------- + * pgstat_report_allocated_bytes() - + * + * Called to report change in memory allocated for this backend. + * + * my_allocated_bytes initially points to local memory, making it safe to call + * this before pgstats has been initialized. allocation_direction is a + * positive/negative multiplier enum defined above. + * ---------- + */ +static inline void +pgstat_report_allocated_bytes(int64 allocated_bytes, int allocation_direction)I don't think this should take allocation_direction as a parameter, I'd make
it two different functions.
Originally it was two functions, a suggestion was made in the thread to
maybe consolidate them to a single function with a direction indicator,
hence the above. I'm fine with converting it back to separate functions.
+ if (allocation_direction == PG_ALLOC_DECREASE && + pg_sub_u64_overflow(*my_allocated_bytes, allocated_bytes, &temp)) + { + *my_allocated_bytes = 0; + ereport(LOG, + errmsg("Backend %d deallocated %lld bytes, exceeding the %llu bytes it is currently reporting allocated. Setting reported to 0.", + MyProcPid, (long long) allocated_bytes, + (unsigned long long) *my_allocated_bytes));We certainly shouldn't have an ereport in here. This stuff really needs to be
cheap.
I will remove the ereport.
+ *my_allocated_bytes += (allocated_bytes) * allocation_direction;
Superfluous parens?
I will remove these.
+/* ---------- + * pgstat_get_all_memory_allocated() - + * + * Return a uint64 representing the current shared memory allocated to all + * backends. This looks directly at the BackendStatusArray, and so will + * provide current information regardless of the age of our transaction's + * snapshot of the status array. + * In the future we will likely utilize additional values - perhaps limit + * backend allocation by user/role, etc. + * ----------I think it's completely unfeasible to execute something as expensive as
pgstat_get_all_backend_memory_allocated() on every allocation. Like,
seriously, no.
Ok. Do we check every nth allocation/try to implement a scheme of checking
more often as we we get closer to the declared max_total_bkend_mem?
And we absolutely definitely shouldn't just add CHECK_FOR_INTERRUPT() calls
into the middle of allocator code.
I'm open to guidance/suggestions/pointers to remedying these.
Greetings,
Andres Freund
Thanks,
Reid
Hi,
On 2023-01-13 09:15:10 -0500, Reid Thompson wrote:
On Mon, 2023-01-09 at 18:31 -0800, Andres Freund wrote:
Dynamic shared memory allocations are included only in the value displayed
for the backend that created them, they are not included in the value for
backends that are attached to them to avoid double counting.As mentioned before, I don't think accounting DSM this way makes sense.
Understood, previously you noted 'There are a few uses of DSMs that track
shared resources, with the biggest likely being the stats for relations
etc'. I'd like to come up with a solution to address this; identifying the
long term allocations to shared state and accounting for them such that they
don't get 'lost' when the allocating backend exits. Any guidance or
direction would be appreciated.
Tracking it as backend memory usage doesn't seem helpful to me, particularly
because some of it is for server wide data tracking (pgstats, some
caches). But that doesn't mean you couldn't track and report it
separately.
+/* ---------- + * pgstat_get_all_memory_allocated() - + * + * Return a uint64 representing the current shared memory allocated to all + * backends. This looks directly at the BackendStatusArray, and so will + * provide current information regardless of the age of our transaction's + * snapshot of the status array. + * In the future we will likely utilize additional values - perhaps limit + * backend allocation by user/role, etc. + * ----------I think it's completely unfeasible to execute something as expensive as
pgstat_get_all_backend_memory_allocated() on every allocation. Like,
seriously, no.Ok. Do we check every nth allocation/try to implement a scheme of checking
more often as we we get closer to the declared max_total_bkend_mem?
I think it's just not acceptable to do O(connections) work as part of
something critical as memory allocation. Even if amortized imo.
What you could do is to have a single, imprecise, shared counter for the total
memory allocation, and have a backend-local "allowance". When the allowance is
used up, refill it from the shared counter (a single atomic op).
But honestly, I think we first need to have the accounting for a while before
it makes sense to go for the memory limiting patch. And I doubt a single GUC
will suffice to make this usable.
And we absolutely definitely shouldn't just add CHECK_FOR_INTERRUPT() calls
into the middle of allocator code.I'm open to guidance/suggestions/pointers to remedying these.
Well, just don't have the CHECK_FOR_INTERRUPT(). Nor the O(N) operation.
You also can't do the ereport(WARNING) there, that itself allocates memory,
and could lead to recursion in some edge cases.
Greetings,
Andres Freund
On Fri, 6 Jan 2023 at 00:19, Reid Thompson
<reid.thompson@crunchydata.com> wrote:
On Tue, 2023-01-03 at 16:22 +0530, vignesh C wrote:
....
The patch does not apply on top of HEAD as in [1], please post a
rebased patch:
...
Regards,
VigneshAttached is rebased patch, with some updates related to committed changes.
The patch does not apply on top of HEAD as in [1]http://cfbot.cputube.org/patch_41_3867.log, please post a rebased patch:
=== Applying patches on top of PostgreSQL commit ID
48880840f18cb75fcaecc77b5e7816b92c27157b ===
=== applying patch
./0001-Add-tracking-of-backend-memory-allocated-to-pg_stat_.patch
....
patching file src/test/regress/expected/rules.out
Hunk #2 FAILED at 1875.
Hunk #4 FAILED at 2090.
2 out of 4 hunks FAILED -- saving rejects to file
src/test/regress/expected/rules.out.rej
[1]: http://cfbot.cputube.org/patch_41_3867.log
Regards,
Vignesh
On Thu, 2023-01-19 at 16:50 +0530, vignesh C wrote:
The patch does not apply on top of HEAD as in [1], please post a rebased patch:
Regards,
Vignesh
rebased patch attached
Thanks,
Reid
Attachments:
0002-Add-the-ability-to-limit-the-amount-of-memory-that-c.patchtext/x-patch; charset=UTF-8; name=0002-Add-the-ability-to-limit-the-amount-of-memory-that-c.patchDownload
From b32a346d6e0e00c568e9a285ad15fc2703998c26 Mon Sep 17 00:00:00 2001
From: Reid Thompson <jreidthompson@nc.rr.com>
Date: Sat, 4 Jun 2022 22:23:59 -0400
Subject: [PATCH 2/2] Add the ability to limit the amount of memory that can be
allocated to backends.
This builds on the work that adds backend memory allocated to pg_stat_activity.
Add GUC variable max_total_backend_memory.
Specifies a limit to the amount of memory (in MB) that may be allocated to
backends in total (i.e. this is not a per user or per backend limit). If unset,
or set to 0 it is disabled. It is intended as a resource to help avoid the OOM
killer on LINUX and manage resources in general. A backend request that would
push the total over the limit will be denied with an out of memory error causing
that backend's current query/transaction to fail. Due to the dynamic nature of
memory allocations, this limit is not exact. If within 1.5MB of the limit and
two backends request 1MB each at the same time both may be allocated, and exceed
the limit. Further requests will not be allocated until dropping below the
limit. Keep this in mind when setting this value. This limit does not affect
auxiliary backend processes. Backend memory allocations are displayed in the
pg_stat_activity view.
---
doc/src/sgml/config.sgml | 26 +++++
src/backend/storage/ipc/dsm_impl.c | 12 ++
src/backend/utils/activity/backend_status.c | 108 ++++++++++++++++++
src/backend/utils/misc/guc_tables.c | 11 ++
src/backend/utils/misc/postgresql.conf.sample | 3 +
src/backend/utils/mmgr/aset.c | 17 +++
src/backend/utils/mmgr/generation.c | 9 ++
src/backend/utils/mmgr/slab.c | 9 +-
src/include/utils/backend_status.h | 3 +
9 files changed, 197 insertions(+), 1 deletion(-)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index f985afc009..51ed4623be 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2113,6 +2113,32 @@ include_dir 'conf.d'
</listitem>
</varlistentry>
+ <varlistentry id="guc-max-total-backend-memory" xreflabel="max_total_backend_memory">
+ <term><varname>max_total_backend_memory</varname> (<type>integer</type>)
+ <indexterm>
+ <primary><varname>max_total_backend_memory</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Specifies a limit to the amount of memory (MB) that may be allocated to
+ backends in total (i.e. this is not a per user or per backend limit).
+ If unset, or set to 0 it is disabled. A backend request that would
+ push the total over the limit will be denied with an out of memory
+ error causing that backend's current query/transaction to fail. Due to
+ the dynamic nature of memory allocations, this limit is not exact. If
+ within 1.5MB of the limit and two backends request 1MB each at the same
+ time both may be allocated, and exceed the limit. Further requests will
+ not be allocated until dropping below the limit. Keep this in mind when
+ setting this value. This limit does not affect auxiliary backend
+ processes <xref linkend="glossary-auxiliary-proc"/> . Backend memory
+ allocations (<varname>allocated_bytes</varname>) are displayed in the
+ <link linkend="monitoring-pg-stat-activity-view"><structname>pg_stat_activity</structname></link>
+ view.
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
</sect2>
diff --git a/src/backend/storage/ipc/dsm_impl.c b/src/backend/storage/ipc/dsm_impl.c
index 22885c7bd2..f7047107d5 100644
--- a/src/backend/storage/ipc/dsm_impl.c
+++ b/src/backend/storage/ipc/dsm_impl.c
@@ -254,6 +254,10 @@ dsm_impl_posix(dsm_op op, dsm_handle handle, Size request_size,
return true;
}
+ /* Do not exceed maximum allowed memory allocation */
+ if (op == DSM_OP_CREATE && exceeds_max_total_bkend_mem(request_size))
+ return false;
+
/*
* Create new segment or open an existing one for attach.
*
@@ -525,6 +529,10 @@ dsm_impl_sysv(dsm_op op, dsm_handle handle, Size request_size,
int flags = IPCProtection;
size_t segsize;
+ /* Do not exceed maximum allowed memory allocation */
+ if (op == DSM_OP_CREATE && exceeds_max_total_bkend_mem(request_size))
+ return false;
+
/*
* Allocate the memory BEFORE acquiring the resource, so that we don't
* leak the resource if memory allocation fails.
@@ -719,6 +727,10 @@ dsm_impl_windows(dsm_op op, dsm_handle handle, Size request_size,
return true;
}
+ /* Do not exceed maximum allowed memory allocation */
+ if (op == DSM_OP_CREATE && exceeds_max_total_bkend_mem(request_size))
+ return false;
+
/* Create new segment or open an existing one for attach. */
if (op == DSM_OP_CREATE)
{
diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c
index 7baf2db57d..da2b5fb042 100644
--- a/src/backend/utils/activity/backend_status.c
+++ b/src/backend/utils/activity/backend_status.c
@@ -45,6 +45,9 @@
bool pgstat_track_activities = false;
int pgstat_track_activity_query_size = 1024;
+/* Max backend memory allocation allowed (MB). 0 = disabled */
+int max_total_bkend_mem = 0;
+
/* exposed so that backend_progress.c can access it */
PgBackendStatus *MyBEEntry = NULL;
@@ -1239,3 +1242,108 @@ pgstat_reset_allocated_bytes_storage(void)
my_allocated_bytes = &local_my_allocated_bytes;
}
+/* ----------
+ * pgstat_get_all_memory_allocated() -
+ *
+ * Return a uint64 representing the current shared memory allocated to all
+ * backends. This looks directly at the BackendStatusArray, and so will
+ * provide current information regardless of the age of our transaction's
+ * snapshot of the status array.
+ * In the future we will likely utilize additional values - perhaps limit
+ * backend allocation by user/role, etc.
+ * ----------
+ */
+uint64
+pgstat_get_all_backend_memory_allocated(void)
+{
+ PgBackendStatus *beentry;
+ int i;
+ uint64 all_memory_allocated = 0;
+
+ beentry = BackendStatusArray;
+
+ /*
+ * We probably shouldn't get here before shared memory has been set up,
+ * but be safe.
+ */
+ if (beentry == NULL || BackendActivityBuffer == NULL)
+ return 0;
+
+ /*
+ * We include AUX procs in all backend memory calculation
+ */
+ for (i = 1; i <= NumBackendStatSlots; i++)
+ {
+ /*
+ * We use a volatile pointer here to ensure the compiler doesn't try
+ * to get cute.
+ */
+ volatile PgBackendStatus *vbeentry = beentry;
+ bool found;
+ uint64 allocated_bytes = 0;
+
+ for (;;)
+ {
+ int before_changecount;
+ int after_changecount;
+
+ pgstat_begin_read_activity(vbeentry, before_changecount);
+
+ /*
+ * Ignore invalid entries, which may contain invalid data.
+ * See pgstat_beshutdown_hook()
+ */
+ if (vbeentry->st_procpid > 0)
+ allocated_bytes = vbeentry->allocated_bytes;
+
+ pgstat_end_read_activity(vbeentry, after_changecount);
+
+ if ((found = pgstat_read_activity_complete(before_changecount,
+ after_changecount)))
+ break;
+
+ /* Make sure we can break out of loop if stuck... */
+ CHECK_FOR_INTERRUPTS();
+ }
+
+ if (found)
+ all_memory_allocated += allocated_bytes;
+
+ beentry++;
+ }
+
+ return all_memory_allocated;
+}
+
+/*
+ * Determine if allocation request will exceed max backend memory allowed.
+ * Do not apply to auxiliary processes.
+ */
+bool
+exceeds_max_total_bkend_mem(uint64 allocation_request)
+{
+ bool result = false;
+
+ /* Exclude auxiliary processes from the check */
+ if (MyAuxProcType != NotAnAuxProcess)
+ return result;
+
+ /* Convert max_total_bkend_mem to bytes for comparison */
+ if (max_total_bkend_mem &&
+ pgstat_get_all_backend_memory_allocated() +
+ allocation_request > (uint64) max_total_bkend_mem * 1024 * 1024)
+ {
+ /*
+ * Explicitly identify the OOM being a result of this configuration
+ * parameter vs a system failure to allocate OOM.
+ */
+ ereport(WARNING,
+ errmsg("allocation would exceed max_total_memory limit (%llu > %llu)",
+ (unsigned long long) pgstat_get_all_backend_memory_allocated() +
+ allocation_request, (unsigned long long) max_total_bkend_mem * 1024 * 1024));
+
+ result = true;
+ }
+
+ return result;
+}
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 4ac808ed22..d6f3b4e262 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -3467,6 +3467,17 @@ struct config_int ConfigureNamesInt[] =
NULL, NULL, NULL
},
+ {
+ {"max_total_backend_memory", PGC_SU_BACKEND, RESOURCES_MEM,
+ gettext_noop("Restrict total backend memory allocations to this max."),
+ gettext_noop("0 turns this feature off."),
+ GUC_UNIT_MB
+ },
+ &max_total_bkend_mem,
+ 0, 0, INT_MAX,
+ NULL, NULL, NULL
+ },
+
/* End-of-list marker */
{
{NULL, 0, 0, NULL, NULL}, NULL, 0, 0, 0, NULL, NULL, NULL
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index d06074b86f..bc2d449c87 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -156,6 +156,9 @@
# mmap
# (change requires restart)
#min_dynamic_shared_memory = 0MB # (change requires restart)
+#max_total_backend_memory = 0MB # Restrict total backend memory allocations
+ # to this max (in MB). 0 turns this feature
+ # off.
# - Disk -
diff --git a/src/backend/utils/mmgr/aset.c b/src/backend/utils/mmgr/aset.c
index 37e82bcd70..7e50971f58 100644
--- a/src/backend/utils/mmgr/aset.c
+++ b/src/backend/utils/mmgr/aset.c
@@ -440,6 +440,10 @@ AllocSetContextCreateInternal(MemoryContext parent,
else
firstBlockSize = Max(firstBlockSize, initBlockSize);
+ /* Do not exceed maximum allowed memory allocation */
+ if (exceeds_max_total_bkend_mem(firstBlockSize))
+ return NULL;
+
/*
* Allocate the initial block. Unlike other aset.c blocks, it starts with
* the context header and its block header follows that.
@@ -741,6 +745,11 @@ AllocSetAlloc(MemoryContext context, Size size)
#endif
blksize = chunk_size + ALLOC_BLOCKHDRSZ + ALLOC_CHUNKHDRSZ;
+
+ /* Do not exceed maximum allowed memory allocation */
+ if (exceeds_max_total_bkend_mem(blksize))
+ return NULL;
+
block = (AllocBlock) malloc(blksize);
if (block == NULL)
return NULL;
@@ -938,6 +947,10 @@ AllocSetAlloc(MemoryContext context, Size size)
while (blksize < required_size)
blksize <<= 1;
+ /* Do not exceed maximum allowed memory allocation */
+ if (exceeds_max_total_bkend_mem(blksize))
+ return NULL;
+
/* Try to allocate it */
block = (AllocBlock) malloc(blksize);
@@ -1176,6 +1189,10 @@ AllocSetRealloc(void *pointer, Size size)
blksize = chksize + ALLOC_BLOCKHDRSZ + ALLOC_CHUNKHDRSZ;
oldblksize = block->endptr - ((char *) block);
+ /* Do not exceed maximum allowed memory allocation */
+ if (blksize > oldblksize && exceeds_max_total_bkend_mem(blksize - oldblksize))
+ return NULL;
+
block = (AllocBlock) realloc(block, blksize);
if (block == NULL)
{
diff --git a/src/backend/utils/mmgr/generation.c b/src/backend/utils/mmgr/generation.c
index b06fb0c6a4..18d43d52bd 100644
--- a/src/backend/utils/mmgr/generation.c
+++ b/src/backend/utils/mmgr/generation.c
@@ -201,6 +201,9 @@ GenerationContextCreate(MemoryContext parent,
else
allocSize = Max(allocSize, initBlockSize);
+ if (exceeds_max_total_bkend_mem(allocSize))
+ return NULL;
+
/*
* Allocate the initial block. Unlike other generation.c blocks, it
* starts with the context header and its block header follows that.
@@ -380,6 +383,9 @@ GenerationAlloc(MemoryContext context, Size size)
{
Size blksize = required_size + Generation_BLOCKHDRSZ;
+ if (exceeds_max_total_bkend_mem(blksize))
+ return NULL;
+
block = (GenerationBlock *) malloc(blksize);
if (block == NULL)
return NULL;
@@ -483,6 +489,9 @@ GenerationAlloc(MemoryContext context, Size size)
if (blksize < required_size)
blksize = pg_nextpower2_size_t(required_size);
+ if (exceeds_max_total_bkend_mem(blksize))
+ return NULL;
+
block = (GenerationBlock *) malloc(blksize);
if (block == NULL)
diff --git a/src/backend/utils/mmgr/slab.c b/src/backend/utils/mmgr/slab.c
index e314f8f343..adc88e0047 100644
--- a/src/backend/utils/mmgr/slab.c
+++ b/src/backend/utils/mmgr/slab.c
@@ -356,9 +356,12 @@ SlabContextCreate(MemoryContext parent,
elog(ERROR, "block size %zu for slab is too small for %zu-byte chunks",
blockSize, chunkSize);
-
+ /* Do not exceed maximum allowed memory allocation */
+ if (exceeds_max_total_bkend_mem(Slab_CONTEXT_HDRSZ(chunksPerBlock)))
+ return NULL;
slab = (SlabContext *) malloc(Slab_CONTEXT_HDRSZ(chunksPerBlock));
+
if (slab == NULL)
{
MemoryContextStats(TopMemoryContext);
@@ -559,6 +562,10 @@ SlabAlloc(MemoryContext context, Size size)
}
else
{
+ /* Do not exceed maximum allowed memory allocation */
+ if (exceeds_max_total_bkend_mem(slab->blockSize))
+ return NULL;
+
block = (SlabBlock *) malloc(slab->blockSize);
if (unlikely(block == NULL))
diff --git a/src/include/utils/backend_status.h b/src/include/utils/backend_status.h
index 754ff0dc62..33269eb11b 100644
--- a/src/include/utils/backend_status.h
+++ b/src/include/utils/backend_status.h
@@ -297,6 +297,7 @@ typedef struct LocalPgBackendStatus
*/
extern PGDLLIMPORT bool pgstat_track_activities;
extern PGDLLIMPORT int pgstat_track_activity_query_size;
+extern PGDLLIMPORT int max_total_bkend_mem;
/* ----------
@@ -336,6 +337,7 @@ extern const char *pgstat_get_backend_current_activity(int pid, bool checkUser);
extern const char *pgstat_get_crashed_backend_activity(int pid, char *buffer,
int buflen);
extern uint64 pgstat_get_my_query_id(void);
+extern uint64 pgstat_get_all_backend_memory_allocated(void);
extern void pgstat_set_allocated_bytes_storage(uint64 *allocated_bytes);
extern void pgstat_reset_allocated_bytes_storage(void);
@@ -348,6 +350,7 @@ extern int pgstat_fetch_stat_numbackends(void);
extern PgBackendStatus *pgstat_fetch_stat_beentry(BackendId beid);
extern LocalPgBackendStatus *pgstat_fetch_stat_local_beentry(int beid);
extern char *pgstat_clip_activity(const char *raw_activity);
+extern bool exceeds_max_total_bkend_mem(uint64 allocation_request);
/* ----------
* pgstat_report_allocated_bytes() -
--
2.25.1
0001-Add-tracking-of-backend-memory-allocated-to-pg_stat_.patchtext/x-patch; charset=UTF-8; name=0001-Add-tracking-of-backend-memory-allocated-to-pg_stat_.patchDownload
From fe8ea9bc008df25f439048d6c653de8439ce5baa Mon Sep 17 00:00:00 2001
From: Reid Thompson <jreidthompson@nc.rr.com>
Date: Thu, 11 Aug 2022 12:01:25 -0400
Subject: [PATCH 1/2] Add tracking of backend memory allocated to
pg_stat_activity
This new field displays the current bytes of memory allocated to the
backend process. It is updated as memory for the process is
palloc'd/pfree'd. Memory allocated to items on the freelist is included in
the displayed value. Dynamic shared memory allocations are included
only in the value displayed for the backend that created them, they are
not included in the value for backends that are attached to them to
avoid double counting. On occasion, orphaned memory segments may be
cleaned up on postmaster startup. This may result in decreasing the sum
without a prior increment. We limit the floor of backend_mem_allocated
to zero. Updated pg_stat_activity documentation for the new column.
---
doc/src/sgml/monitoring.sgml | 15 ++++
src/backend/catalog/system_views.sql | 1 +
src/backend/postmaster/autovacuum.c | 6 ++
src/backend/postmaster/postmaster.c | 13 ++++
src/backend/postmaster/syslogger.c | 3 +
src/backend/storage/ipc/dsm_impl.c | 81 +++++++++++++++++++++
src/backend/utils/activity/backend_status.c | 45 ++++++++++++
src/backend/utils/adt/pgstatfuncs.c | 4 +-
src/backend/utils/mmgr/aset.c | 17 +++++
src/backend/utils/mmgr/generation.c | 15 ++++
src/backend/utils/mmgr/slab.c | 22 ++++++
src/include/catalog/pg_proc.dat | 6 +-
src/include/utils/backend_status.h | 63 +++++++++++++++-
src/test/regress/expected/rules.out | 9 ++-
src/test/regress/expected/stats.out | 11 +++
src/test/regress/sql/stats.sql | 3 +
16 files changed, 305 insertions(+), 9 deletions(-)
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 1756f1a4b6..844d9019dd 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -957,6 +957,21 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser
</para></entry>
</row>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>allocated_bytes</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Memory currently allocated to this backend in bytes. This is the balance
+ of bytes allocated and freed by this backend. Dynamic shared memory
+ allocations are included only in the value displayed for the backend that
+ created them, they are not included in the value for backends that are
+ attached to them to avoid double counting. Use <function>pg_size_pretty</function>
+ described in <xref linkend="functions-admin-dbsize"/> to make this value
+ more easily readable.
+ </para></entry>
+ </row>
+
<row>
<entry role="catalog_table_entry"><para role="column_definition">
<structfield>query</structfield> <type>text</type>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 8608e3fa5b..aacd269b01 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -863,6 +863,7 @@ CREATE VIEW pg_stat_activity AS
S.state,
S.backend_xid,
s.backend_xmin,
+ S.allocated_bytes,
S.query_id,
S.query,
S.backend_type
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index f5ea381c53..09f5624ade 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -407,6 +407,9 @@ StartAutoVacLauncher(void)
#ifndef EXEC_BACKEND
case 0:
+ /* Zero allocated bytes to avoid double counting parent allocation */
+ pgstat_zero_my_allocated_bytes();
+
/* in postmaster child ... */
InitPostmasterChild();
@@ -1485,6 +1488,9 @@ StartAutoVacWorker(void)
#ifndef EXEC_BACKEND
case 0:
+ /* Zero allocated bytes to avoid double counting parent allocation */
+ pgstat_zero_my_allocated_bytes();
+
/* in postmaster child ... */
InitPostmasterChild();
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 711efc35e3..226d6c0a6f 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -4161,6 +4161,9 @@ BackendStartup(Port *port)
{
free(bn);
+ /* Zero allocated bytes to avoid double counting parent allocation */
+ pgstat_zero_my_allocated_bytes();
+
/* Detangle from postmaster */
InitPostmasterChild();
@@ -5368,6 +5371,11 @@ StartChildProcess(AuxProcType type)
MemoryContextDelete(PostmasterContext);
PostmasterContext = NULL;
+ /* Zero allocated bytes to avoid double counting parent allocation.
+ * Needs to be after the MemoryContextDelete(PostmasterContext) above.
+ */
+ pgstat_zero_my_allocated_bytes();
+
AuxiliaryProcessMain(type); /* does not return */
}
#endif /* EXEC_BACKEND */
@@ -5761,6 +5769,11 @@ do_start_bgworker(RegisteredBgWorker *rw)
MemoryContextDelete(PostmasterContext);
PostmasterContext = NULL;
+ /* Zero allocated bytes to avoid double counting parent allocation.
+ * Needs to be after the MemoryContextDelete(PostmasterContext) above.
+ */
+ pgstat_zero_my_allocated_bytes();
+
StartBackgroundWorker();
exit(1); /* should not get here */
diff --git a/src/backend/postmaster/syslogger.c b/src/backend/postmaster/syslogger.c
index a876d02c6f..0d51af6fd8 100644
--- a/src/backend/postmaster/syslogger.c
+++ b/src/backend/postmaster/syslogger.c
@@ -679,6 +679,9 @@ SysLogger_Start(void)
#ifndef EXEC_BACKEND
case 0:
+ /* Zero allocated bytes to avoid double counting parent allocation */
+ pgstat_zero_my_allocated_bytes();
+
/* in postmaster child ... */
InitPostmasterChild();
diff --git a/src/backend/storage/ipc/dsm_impl.c b/src/backend/storage/ipc/dsm_impl.c
index f0965c3481..22885c7bd2 100644
--- a/src/backend/storage/ipc/dsm_impl.c
+++ b/src/backend/storage/ipc/dsm_impl.c
@@ -66,6 +66,7 @@
#include "postmaster/postmaster.h"
#include "storage/dsm_impl.h"
#include "storage/fd.h"
+#include "utils/backend_status.h"
#include "utils/guc.h"
#include "utils/memutils.h"
@@ -232,6 +233,14 @@ dsm_impl_posix(dsm_op op, dsm_handle handle, Size request_size,
name)));
return false;
}
+
+ /*
+ * Detach and destroy pass through here, only decrease the memory
+ * shown allocated in pg_stat_activity when the creator destroys the
+ * allocation.
+ */
+ if (op == DSM_OP_DESTROY)
+ pgstat_report_allocated_bytes(*mapped_size, PG_ALLOC_DECREASE);
*mapped_address = NULL;
*mapped_size = 0;
if (op == DSM_OP_DESTROY && shm_unlink(name) != 0)
@@ -332,6 +341,36 @@ dsm_impl_posix(dsm_op op, dsm_handle handle, Size request_size,
name)));
return false;
}
+
+ /*
+ * Attach and create pass through here, only update backend memory
+ * allocated in pg_stat_activity for the creator process.
+ */
+ if (op == DSM_OP_CREATE)
+ {
+ /*
+ * Posix creation calls dsm_impl_posix_resize implying that resizing
+ * occurs or may be added in the future. As implemented
+ * dsm_impl_posix_resize utilizes fallocate or truncate, passing the
+ * whole new size as input, growing the allocation as needed (only
+ * truncate supports shrinking). We update by replacing the old
+ * allocation with the new.
+ */
+#if defined(HAVE_POSIX_FALLOCATE) && defined(__linux__)
+ /*
+ * posix_fallocate does not shrink allocations, adjust only on
+ * allocation increase.
+ */
+ if (request_size > *mapped_size)
+ {
+ pgstat_report_allocated_bytes(request_size - *mapped_size,
+ PG_ALLOC_INCREASE);
+ }
+#else
+ pgstat_report_allocated_bytes(*mapped_size, PG_ALLOC_DECREASE);
+ pgstat_report_allocated_bytes(request_size, PG_ALLOC_INCREASE);
+#endif
+ }
*mapped_address = address;
*mapped_size = request_size;
close(fd);
@@ -537,6 +576,14 @@ dsm_impl_sysv(dsm_op op, dsm_handle handle, Size request_size,
name)));
return false;
}
+
+ /*
+ * Detach and destroy pass through here, only decrease the memory
+ * shown allocated in pg_stat_activity when the creator destroys the
+ * allocation.
+ */
+ if (op == DSM_OP_DESTROY)
+ pgstat_report_allocated_bytes(*mapped_size, PG_ALLOC_DECREASE);
*mapped_address = NULL;
*mapped_size = 0;
if (op == DSM_OP_DESTROY && shmctl(ident, IPC_RMID, NULL) < 0)
@@ -584,6 +631,13 @@ dsm_impl_sysv(dsm_op op, dsm_handle handle, Size request_size,
name)));
return false;
}
+
+ /*
+ * Attach and create pass through here, only update backend memory
+ * allocated in pg_stat_activity for the creator process.
+ */
+ if (op == DSM_OP_CREATE)
+ pgstat_report_allocated_bytes(request_size, PG_ALLOC_INCREASE);
*mapped_address = address;
*mapped_size = request_size;
@@ -652,6 +706,13 @@ dsm_impl_windows(dsm_op op, dsm_handle handle, Size request_size,
return false;
}
+ /*
+ * Detach and destroy pass through here, only decrease the memory
+ * shown allocated in pg_stat_activity when the creator destroys the
+ * allocation.
+ */
+ if (op == DSM_OP_DESTROY)
+ pgstat_report_allocated_bytes(*mapped_size, PG_ALLOC_DECREASE);
*impl_private = NULL;
*mapped_address = NULL;
*mapped_size = 0;
@@ -768,6 +829,12 @@ dsm_impl_windows(dsm_op op, dsm_handle handle, Size request_size,
return false;
}
+ /*
+ * Attach and create pass through here, only update backend memory
+ * allocated in pg_stat_activity for the creator process.
+ */
+ if (op == DSM_OP_CREATE)
+ pgstat_report_allocated_bytes(info.RegionSize, PG_ALLOC_INCREASE);
*mapped_address = address;
*mapped_size = info.RegionSize;
*impl_private = hmap;
@@ -812,6 +879,13 @@ dsm_impl_mmap(dsm_op op, dsm_handle handle, Size request_size,
name)));
return false;
}
+
+ /*
+ * Detach and destroy pass through here, only decrease the memory
+ * shown allocated in pg_stat_activity when the creator destroys the
+ * allocation.
+ */
+ pgstat_report_allocated_bytes(*mapped_size, PG_ALLOC_DECREASE);
*mapped_address = NULL;
*mapped_size = 0;
if (op == DSM_OP_DESTROY && unlink(name) != 0)
@@ -933,6 +1007,13 @@ dsm_impl_mmap(dsm_op op, dsm_handle handle, Size request_size,
name)));
return false;
}
+
+ /*
+ * Attach and create pass through here, only update backend memory
+ * allocated in pg_stat_activity for the creator process.
+ */
+ if (op == DSM_OP_CREATE)
+ pgstat_report_allocated_bytes(request_size, PG_ALLOC_INCREASE);
*mapped_address = address;
*mapped_size = request_size;
diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c
index 608d01ea0d..7baf2db57d 100644
--- a/src/backend/utils/activity/backend_status.c
+++ b/src/backend/utils/activity/backend_status.c
@@ -49,6 +49,9 @@ int pgstat_track_activity_query_size = 1024;
/* exposed so that backend_progress.c can access it */
PgBackendStatus *MyBEEntry = NULL;
+/* Memory allocated to this backend prior to pgstats initialization */
+uint64 local_my_allocated_bytes = 0;
+uint64 *my_allocated_bytes = &local_my_allocated_bytes;
static PgBackendStatus *BackendStatusArray = NULL;
static char *BackendAppnameBuffer = NULL;
@@ -400,6 +403,15 @@ pgstat_bestart(void)
lbeentry.st_progress_command_target = InvalidOid;
lbeentry.st_query_id = UINT64CONST(0);
+ /* Alter allocation reporting from local_my_allocated_bytes to shared memory */
+ pgstat_set_allocated_bytes_storage(&MyBEEntry->allocated_bytes);
+
+ /* Populate sum of memory allocated prior to pgstats initialization to pgstats
+ * and zero the local variable.
+ */
+ lbeentry.allocated_bytes += local_my_allocated_bytes;
+ local_my_allocated_bytes = 0;
+
/*
* we don't zero st_progress_param here to save cycles; nobody should
* examine it until st_progress_command has been set to something other
@@ -459,6 +471,11 @@ pgstat_beshutdown_hook(int code, Datum arg)
{
volatile PgBackendStatus *beentry = MyBEEntry;
+ /*
+ * Stop reporting memory allocation changes to &MyBEEntry->allocated_bytes
+ */
+ pgstat_reset_allocated_bytes_storage();
+
/*
* Clear my status entry, following the protocol of bumping st_changecount
* before and after. We use a volatile pointer here to ensure the
@@ -1194,3 +1211,31 @@ pgstat_clip_activity(const char *raw_activity)
return activity;
}
+
+/*
+ * Configure bytes allocated reporting to report allocated bytes to
+ * *allocated_bytes. *allocated_bytes needs to be valid until
+ * pgstat_set_allocated_bytes_storage() is called.
+ *
+ * Expected to be called during backend startup (in pgstat_bestart), to point
+ * my_allocated_bytes into shared memory.
+ */
+void
+pgstat_set_allocated_bytes_storage(uint64 *new_allocated_bytes)
+{
+ my_allocated_bytes = new_allocated_bytes;
+ *new_allocated_bytes = local_my_allocated_bytes;
+}
+
+/*
+ * Reset allocated bytes storage location.
+ *
+ * Expected to be called during backend shutdown, before the location set up
+ * by pgstat_set_allocated_bytes_storage() becomes invalid.
+ */
+void
+pgstat_reset_allocated_bytes_storage(void)
+{
+ my_allocated_bytes = &local_my_allocated_bytes;
+}
+
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 6737493402..f4f2c8dce1 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -303,7 +303,7 @@ pg_stat_get_progress_info(PG_FUNCTION_ARGS)
Datum
pg_stat_get_activity(PG_FUNCTION_ARGS)
{
-#define PG_STAT_GET_ACTIVITY_COLS 30
+#define PG_STAT_GET_ACTIVITY_COLS 31
int num_backends = pgstat_fetch_stat_numbackends();
int curr_backend;
int pid = PG_ARGISNULL(0) ? -1 : PG_GETARG_INT32(0);
@@ -359,6 +359,8 @@ pg_stat_get_activity(PG_FUNCTION_ARGS)
else
nulls[16] = true;
+ values[30] = UInt64GetDatum(beentry->allocated_bytes);
+
/* Values only available to role member or pg_read_all_stats */
if (HAS_PGSTAT_PERMISSIONS(beentry->st_userid))
{
diff --git a/src/backend/utils/mmgr/aset.c b/src/backend/utils/mmgr/aset.c
index 740729b5d0..37e82bcd70 100644
--- a/src/backend/utils/mmgr/aset.c
+++ b/src/backend/utils/mmgr/aset.c
@@ -47,6 +47,7 @@
#include "postgres.h"
#include "port/pg_bitutils.h"
+#include "utils/backend_status.h"
#include "utils/memdebug.h"
#include "utils/memutils.h"
#include "utils/memutils_memorychunk.h"
@@ -521,6 +522,7 @@ AllocSetContextCreateInternal(MemoryContext parent,
name);
((MemoryContext) set)->mem_allocated = firstBlockSize;
+ pgstat_report_allocated_bytes(firstBlockSize, PG_ALLOC_INCREASE);
return (MemoryContext) set;
}
@@ -543,6 +545,7 @@ AllocSetReset(MemoryContext context)
AllocSet set = (AllocSet) context;
AllocBlock block;
Size keepersize PG_USED_FOR_ASSERTS_ONLY;
+ uint64 deallocation = 0;
Assert(AllocSetIsValid(set));
@@ -585,6 +588,7 @@ AllocSetReset(MemoryContext context)
{
/* Normal case, release the block */
context->mem_allocated -= block->endptr - ((char *) block);
+ deallocation += block->endptr - ((char *) block);
#ifdef CLOBBER_FREED_MEMORY
wipe_mem(block, block->freeptr - ((char *) block));
@@ -595,6 +599,7 @@ AllocSetReset(MemoryContext context)
}
Assert(context->mem_allocated == keepersize);
+ pgstat_report_allocated_bytes(deallocation, PG_ALLOC_DECREASE);
/* Reset block size allocation sequence, too */
set->nextBlockSize = set->initBlockSize;
@@ -613,6 +618,7 @@ AllocSetDelete(MemoryContext context)
AllocSet set = (AllocSet) context;
AllocBlock block = set->blocks;
Size keepersize PG_USED_FOR_ASSERTS_ONLY;
+ uint64 deallocation = 0;
Assert(AllocSetIsValid(set));
@@ -651,11 +657,13 @@ AllocSetDelete(MemoryContext context)
freelist->first_free = (AllocSetContext *) oldset->header.nextchild;
freelist->num_free--;
+ deallocation += oldset->header.mem_allocated;
/* All that remains is to free the header/initial block */
free(oldset);
}
Assert(freelist->num_free == 0);
+ pgstat_report_allocated_bytes(deallocation, PG_ALLOC_DECREASE);
}
/* Now add the just-deleted context to the freelist. */
@@ -672,7 +680,10 @@ AllocSetDelete(MemoryContext context)
AllocBlock next = block->next;
if (block != set->keeper)
+ {
context->mem_allocated -= block->endptr - ((char *) block);
+ deallocation += block->endptr - ((char *) block);
+ }
#ifdef CLOBBER_FREED_MEMORY
wipe_mem(block, block->freeptr - ((char *) block));
@@ -685,6 +696,7 @@ AllocSetDelete(MemoryContext context)
}
Assert(context->mem_allocated == keepersize);
+ pgstat_report_allocated_bytes(deallocation + context->mem_allocated, PG_ALLOC_DECREASE);
/* Finally, free the context header, including the keeper block */
free(set);
@@ -734,6 +746,7 @@ AllocSetAlloc(MemoryContext context, Size size)
return NULL;
context->mem_allocated += blksize;
+ pgstat_report_allocated_bytes(blksize, PG_ALLOC_INCREASE);
block->aset = set;
block->freeptr = block->endptr = ((char *) block) + blksize;
@@ -944,6 +957,7 @@ AllocSetAlloc(MemoryContext context, Size size)
return NULL;
context->mem_allocated += blksize;
+ pgstat_report_allocated_bytes(blksize, PG_ALLOC_INCREASE);
block->aset = set;
block->freeptr = ((char *) block) + ALLOC_BLOCKHDRSZ;
@@ -1041,6 +1055,7 @@ AllocSetFree(void *pointer)
block->next->prev = block->prev;
set->header.mem_allocated -= block->endptr - ((char *) block);
+ pgstat_report_allocated_bytes(block->endptr - ((char *) block), PG_ALLOC_DECREASE);
#ifdef CLOBBER_FREED_MEMORY
wipe_mem(block, block->freeptr - ((char *) block));
@@ -1171,7 +1186,9 @@ AllocSetRealloc(void *pointer, Size size)
/* updated separately, not to underflow when (oldblksize > blksize) */
set->header.mem_allocated -= oldblksize;
+ pgstat_report_allocated_bytes(oldblksize, PG_ALLOC_DECREASE);
set->header.mem_allocated += blksize;
+ pgstat_report_allocated_bytes(blksize, PG_ALLOC_INCREASE);
block->freeptr = block->endptr = ((char *) block) + blksize;
diff --git a/src/backend/utils/mmgr/generation.c b/src/backend/utils/mmgr/generation.c
index ebcb61e9b6..b06fb0c6a4 100644
--- a/src/backend/utils/mmgr/generation.c
+++ b/src/backend/utils/mmgr/generation.c
@@ -37,6 +37,7 @@
#include "lib/ilist.h"
#include "port/pg_bitutils.h"
+#include "utils/backend_status.h"
#include "utils/memdebug.h"
#include "utils/memutils.h"
#include "utils/memutils_memorychunk.h"
@@ -267,6 +268,7 @@ GenerationContextCreate(MemoryContext parent,
name);
((MemoryContext) set)->mem_allocated = firstBlockSize;
+ pgstat_report_allocated_bytes(firstBlockSize, PG_ALLOC_INCREASE);
return (MemoryContext) set;
}
@@ -283,6 +285,7 @@ GenerationReset(MemoryContext context)
{
GenerationContext *set = (GenerationContext *) context;
dlist_mutable_iter miter;
+ uint64 deallocation = 0;
Assert(GenerationIsValid(set));
@@ -305,9 +308,14 @@ GenerationReset(MemoryContext context)
if (block == set->keeper)
GenerationBlockMarkEmpty(block);
else
+ {
+ deallocation += block->blksize;
GenerationBlockFree(set, block);
+ }
}
+ pgstat_report_allocated_bytes(deallocation, PG_ALLOC_DECREASE);
+
/* set it so new allocations to make use of the keeper block */
set->block = set->keeper;
@@ -328,6 +336,9 @@ GenerationDelete(MemoryContext context)
{
/* Reset to release all releasable GenerationBlocks */
GenerationReset(context);
+
+ pgstat_report_allocated_bytes(context->mem_allocated, PG_ALLOC_DECREASE);
+
/* And free the context header and keeper block */
free(context);
}
@@ -374,6 +385,7 @@ GenerationAlloc(MemoryContext context, Size size)
return NULL;
context->mem_allocated += blksize;
+ pgstat_report_allocated_bytes(blksize, PG_ALLOC_INCREASE);
/* block with a single (used) chunk */
block->context = set;
@@ -477,6 +489,7 @@ GenerationAlloc(MemoryContext context, Size size)
return NULL;
context->mem_allocated += blksize;
+ pgstat_report_allocated_bytes(blksize, PG_ALLOC_INCREASE);
/* initialize the new block */
GenerationBlockInit(set, block, blksize);
@@ -729,6 +742,8 @@ GenerationFree(void *pointer)
dlist_delete(&block->node);
set->header.mem_allocated -= block->blksize;
+ pgstat_report_allocated_bytes(block->blksize, PG_ALLOC_DECREASE);
+
free(block);
}
diff --git a/src/backend/utils/mmgr/slab.c b/src/backend/utils/mmgr/slab.c
index 33dca0f37c..e314f8f343 100644
--- a/src/backend/utils/mmgr/slab.c
+++ b/src/backend/utils/mmgr/slab.c
@@ -69,6 +69,7 @@
#include "postgres.h"
#include "lib/ilist.h"
+#include "utils/backend_status.h"
#include "utils/memdebug.h"
#include "utils/memutils.h"
#include "utils/memutils_memorychunk.h"
@@ -413,6 +414,13 @@ SlabContextCreate(MemoryContext parent,
parent,
name);
+ /*
+ * If SlabContextCreate is updated to add context header size to
+ * context->mem_allocated, then update here and SlabDelete appropriately
+ */
+ pgstat_report_allocated_bytes(Slab_CONTEXT_HDRSZ(slab->chunksPerBlock),
+ PG_ALLOC_INCREASE);
+
return (MemoryContext) slab;
}
@@ -429,6 +437,7 @@ SlabReset(MemoryContext context)
SlabContext *slab = (SlabContext *) context;
dlist_mutable_iter miter;
int i;
+ uint64 deallocation = 0;
Assert(SlabIsValid(slab));
@@ -465,9 +474,11 @@ SlabReset(MemoryContext context)
#endif
free(block);
context->mem_allocated -= slab->blockSize;
+ deallocation += slab->blockSize;
}
}
+ pgstat_report_allocated_bytes(deallocation, PG_ALLOC_DECREASE);
slab->curBlocklistIndex = 0;
Assert(context->mem_allocated == 0);
@@ -480,8 +491,17 @@ SlabReset(MemoryContext context)
void
SlabDelete(MemoryContext context)
{
+
/* Reset to release all the SlabBlocks */
SlabReset(context);
+
+ /*
+ * Until context header allocation is included in context->mem_allocated,
+ * cast to slab and decrement the header allocation
+ */
+ pgstat_report_allocated_bytes(Slab_CONTEXT_HDRSZ(((SlabContext *)context)->chunksPerBlock),
+ PG_ALLOC_DECREASE);
+
/* And free the context header */
free(context);
}
@@ -546,6 +566,7 @@ SlabAlloc(MemoryContext context, Size size)
block->slab = slab;
context->mem_allocated += slab->blockSize;
+ pgstat_report_allocated_bytes(slab->blockSize, PG_ALLOC_INCREASE);
/* use the first chunk in the new block */
chunk = SlabBlockGetChunk(slab, block, 0);
@@ -732,6 +753,7 @@ SlabFree(void *pointer)
#endif
free(block);
slab->header.mem_allocated -= slab->blockSize;
+ pgstat_report_allocated_bytes(slab->blockSize, PG_ALLOC_DECREASE);
}
/*
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index c0f2a8a77c..3fc9a30c60 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5404,9 +5404,9 @@
proname => 'pg_stat_get_activity', prorows => '100', proisstrict => 'f',
proretset => 't', provolatile => 's', proparallel => 'r',
prorettype => 'record', proargtypes => 'int4',
- proallargtypes => '{int4,oid,int4,oid,text,text,text,text,text,timestamptz,timestamptz,timestamptz,timestamptz,inet,text,int4,xid,xid,text,bool,text,text,int4,text,numeric,text,bool,text,bool,int4,int8}',
- proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
- proargnames => '{pid,datid,pid,usesysid,application_name,state,query,wait_event_type,wait_event,xact_start,query_start,backend_start,state_change,client_addr,client_hostname,client_port,backend_xid,backend_xmin,backend_type,ssl,sslversion,sslcipher,sslbits,ssl_client_dn,ssl_client_serial,ssl_issuer_dn,gss_auth,gss_princ,gss_enc,leader_pid,query_id}',
+ proallargtypes => '{int4,oid,int4,oid,text,text,text,text,text,timestamptz,timestamptz,timestamptz,timestamptz,inet,text,int4,xid,xid,text,bool,text,text,int4,text,numeric,text,bool,text,bool,int4,int8,int8}',
+ proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{pid,datid,pid,usesysid,application_name,state,query,wait_event_type,wait_event,xact_start,query_start,backend_start,state_change,client_addr,client_hostname,client_port,backend_xid,backend_xmin,backend_type,ssl,sslversion,sslcipher,sslbits,ssl_client_dn,ssl_client_serial,ssl_issuer_dn,gss_auth,gss_princ,gss_enc,leader_pid,query_id,allocated_bytes}',
prosrc => 'pg_stat_get_activity' },
{ oid => '3318',
descr => 'statistics: information about progress of backends running maintenance command',
diff --git a/src/include/utils/backend_status.h b/src/include/utils/backend_status.h
index f7bd83113a..754ff0dc62 100644
--- a/src/include/utils/backend_status.h
+++ b/src/include/utils/backend_status.h
@@ -15,6 +15,7 @@
#include "miscadmin.h" /* for BackendType */
#include "storage/backendid.h"
#include "utils/backend_progress.h"
+#include "common/int.h"
/* ----------
@@ -32,6 +33,13 @@ typedef enum BackendState
STATE_DISABLED
} BackendState;
+/* Enum helper for reporting memory allocated bytes */
+enum allocation_direction
+{
+ PG_ALLOC_DECREASE = -1,
+ PG_ALLOC_IGNORE,
+ PG_ALLOC_INCREASE,
+};
/* ----------
* Shared-memory data structures
@@ -169,6 +177,9 @@ typedef struct PgBackendStatus
/* query identifier, optionally computed using post_parse_analyze_hook */
uint64 st_query_id;
+
+ /* Current memory allocated to this backend */
+ uint64 allocated_bytes;
} PgBackendStatus;
@@ -293,6 +304,7 @@ extern PGDLLIMPORT int pgstat_track_activity_query_size;
* ----------
*/
extern PGDLLIMPORT PgBackendStatus *MyBEEntry;
+extern PGDLLIMPORT uint64 *my_allocated_bytes;
/* ----------
@@ -324,7 +336,8 @@ extern const char *pgstat_get_backend_current_activity(int pid, bool checkUser);
extern const char *pgstat_get_crashed_backend_activity(int pid, char *buffer,
int buflen);
extern uint64 pgstat_get_my_query_id(void);
-
+extern void pgstat_set_allocated_bytes_storage(uint64 *allocated_bytes);
+extern void pgstat_reset_allocated_bytes_storage(void);
/* ----------
* Support functions for the SQL-callable functions to
@@ -336,5 +349,53 @@ extern PgBackendStatus *pgstat_fetch_stat_beentry(BackendId beid);
extern LocalPgBackendStatus *pgstat_fetch_stat_local_beentry(int beid);
extern char *pgstat_clip_activity(const char *raw_activity);
+/* ----------
+ * pgstat_report_allocated_bytes() -
+ *
+ * Called to report change in memory allocated for this backend.
+ *
+ * my_allocated_bytes initially points to local memory, making it safe to call
+ * this before pgstats has been initialized. allocation_direction is a
+ * positive/negative multiplier enum defined above.
+ * ----------
+ */
+static inline void
+pgstat_report_allocated_bytes(int64 allocated_bytes, int allocation_direction)
+{
+ uint64 temp;
+
+ /*
+ * Avoid *my_allocated_bytes unsigned integer overflow on
+ * PG_ALLOC_DECREASE
+ */
+ if (allocation_direction == PG_ALLOC_DECREASE &&
+ pg_sub_u64_overflow(*my_allocated_bytes, allocated_bytes, &temp))
+ {
+ *my_allocated_bytes = 0;
+ ereport(LOG,
+ errmsg("Backend %d deallocated %lld bytes, exceeding the %llu bytes it is currently reporting allocated. Setting reported to 0.",
+ MyProcPid, (long long) allocated_bytes,
+ (unsigned long long) *my_allocated_bytes));
+ }
+ else
+ *my_allocated_bytes += (allocated_bytes) * allocation_direction;
+
+ return;
+}
+
+/* ---------
+ * pgstat_zero_my_allocated_bytes() -
+ *
+ * Called to zero out local allocated bytes variable after fork to avoid double
+ * counting allocations.
+ * ---------
+ */
+static inline void
+pgstat_zero_my_allocated_bytes(void)
+{
+ *my_allocated_bytes = 0;
+
+ return;
+}
#endif /* BACKEND_STATUS_H */
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index e7a2f5856a..ca8eeec7de 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1756,10 +1756,11 @@ pg_stat_activity| SELECT s.datid,
s.state,
s.backend_xid,
s.backend_xmin,
+ s.allocated_bytes,
s.query_id,
s.query,
s.backend_type
- FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, query_id)
+ FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, query_id, allocated_bytes)
LEFT JOIN pg_database d ON ((s.datid = d.oid)))
LEFT JOIN pg_authid u ON ((s.usesysid = u.oid)));
pg_stat_all_indexes| SELECT c.oid AS relid,
@@ -1874,7 +1875,7 @@ pg_stat_gssapi| SELECT pid,
gss_auth AS gss_authenticated,
gss_princ AS principal,
gss_enc AS encrypted
- FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, query_id)
+ FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, query_id, allocated_bytes)
WHERE (client_port IS NOT NULL);
pg_stat_progress_analyze| SELECT s.pid,
s.datid,
@@ -2055,7 +2056,7 @@ pg_stat_replication| SELECT s.pid,
w.sync_priority,
w.sync_state,
w.reply_time
- FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, query_id)
+ FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, query_id, allocated_bytes)
JOIN pg_stat_get_wal_senders() w(pid, state, sent_lsn, write_lsn, flush_lsn, replay_lsn, write_lag, flush_lag, replay_lag, sync_priority, sync_state, reply_time) ON ((s.pid = w.pid)))
LEFT JOIN pg_authid u ON ((s.usesysid = u.oid)));
pg_stat_replication_slots| SELECT s.slot_name,
@@ -2089,7 +2090,7 @@ pg_stat_ssl| SELECT pid,
ssl_client_dn AS client_dn,
ssl_client_serial AS client_serial,
ssl_issuer_dn AS issuer_dn
- FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, query_id)
+ FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, query_id, allocated_bytes)
WHERE (client_port IS NOT NULL);
pg_stat_subscription| SELECT su.oid AS subid,
su.subname,
diff --git a/src/test/regress/expected/stats.out b/src/test/regress/expected/stats.out
index 1d84407a03..ab7e95c367 100644
--- a/src/test/regress/expected/stats.out
+++ b/src/test/regress/expected/stats.out
@@ -1126,4 +1126,15 @@ SELECT pg_stat_get_subscription_stats(NULL);
(1 row)
+-- ensure that allocated_bytes exist for backends
+SELECT allocated_bytes > 0 AS result FROM pg_stat_activity WHERE backend_type
+IN ('checkpointer', 'background writer', 'walwriter', 'autovacuum launcher');
+ result
+--------
+ t
+ t
+ t
+ t
+(4 rows)
+
-- End of Stats Test
diff --git a/src/test/regress/sql/stats.sql b/src/test/regress/sql/stats.sql
index b4d6753c71..2f0b1cc9d8 100644
--- a/src/test/regress/sql/stats.sql
+++ b/src/test/regress/sql/stats.sql
@@ -535,5 +535,8 @@ SET enable_seqscan TO on;
SELECT pg_stat_get_replication_slot(NULL);
SELECT pg_stat_get_subscription_stats(NULL);
+-- ensure that allocated_bytes exist for backends
+SELECT allocated_bytes > 0 AS result FROM pg_stat_activity WHERE backend_type
+IN ('checkpointer', 'background writer', 'walwriter', 'autovacuum launcher');
-- End of Stats Test
--
2.25.1
Hi,
On 2023-01-23 10:48:38 -0500, Reid Thompson wrote:
On Thu, 2023-01-19 at 16:50 +0530, vignesh C wrote:
The patch does not apply on top of HEAD as in [1], please post a rebased patch:
Regards,
Vigneshrebased patch attached
I think it's basically still waiting on author, until the O(N) cost is gone
from the overflow limit check.
Greetings,
Andres Freund
On Mon, 2023-01-23 at 12:31 -0800, Andres Freund wrote:
Hi,
I think it's basically still waiting on author, until the O(N) cost is gone
from the overflow limit check.Greetings,
Andres Freund
Yes, just a rebase. There is still work to be done per earlier in the
thread.
I do want to follow up and note re palloc/pfree vs malloc/free that the
tracking code (0001-Add-tracking-...) is not tracking palloc/pfree but is
explicitely tracking malloc/free. Not every palloc/pfree call executes the
tracking code, only those where the path followed includes malloc() or
free(). Routine palloc() calls fulfilled from the context's
freelist/emptyblocks/freeblock/etc and pfree() calls not invoking free()
avoid the tracking code.
Thanks,
Reid
Regarding the shared counter noted here,
What you could do is to have a single, imprecise, shared counter for the total
memory allocation, and have a backend-local "allowance". When the allowance is
used up, refill it from the shared counter (a single atomic op).
Is there a preferred or suggested location to put variables like this?
Perhaps a current variable to use as a reference?
Thanks,
Reid
Hi,
On 2023-01-26 15:27:20 -0500, Reid Thompson wrote:
Yes, just a rebase. There is still work to be done per earlier in the
thread.
The tests recently started to fail:
https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest%2F42%2F3867
I do want to follow up and note re palloc/pfree vs malloc/free that the
tracking code (0001-Add-tracking-...) is not tracking palloc/pfree but is
explicitely tracking malloc/free. Not every palloc/pfree call executes the
tracking code, only those where the path followed includes malloc() or
free(). Routine palloc() calls fulfilled from the context's
freelist/emptyblocks/freeblock/etc and pfree() calls not invoking free()
avoid the tracking code.
Sure, but we create a lot of memory contexts, so that's not a whole lot of
comfort.
I marked this as waiting on author.
Greetings,
Andres Freund
On Mon, 2023-02-13 at 16:26 -0800, Andres Freund wrote:
Hi,
The tests recently started to fail:
https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest%2F42%2F3867
I marked this as waiting on author.
Greetings,
Andres Freund
Patch has been rebased to master.
The memory limiting portion (patch 0002-*) has been refactored to utilize a
shared counter for total memory allocation along with backend-local
allowances that are initialized at process startup and refilled from the
central counter upon being used up. Free'd memory is accumulated and
returned to the shared counter upon meeting a threshold and/or upon process
exit. At this point arbitrarily picked 1MB as the initial allowance and
return threshold.
Thanks,
Reid
Attachments:
0002-Add-the-ability-to-limit-the-amount-of-memory-that-c.patchtext/x-patch; charset=UTF-8; name=0002-Add-the-ability-to-limit-the-amount-of-memory-that-c.patchDownload
From e044bacedab503d1cd732146e1b9947406191bb6 Mon Sep 17 00:00:00 2001
From: Reid Thompson <jreidthompson@nc.rr.com>
Date: Sat, 4 Jun 2022 22:23:59 -0400
Subject: [PATCH 2/2] Add the ability to limit the amount of memory that can be
allocated to backends.
This builds on the work that adds backend memory allocated to pg_stat_activity.
Add GUC variable max_total_backend_memory.
Specifies a limit to the amount of memory (in MB) that may be allocated to
backends in total (i.e. this is not a per user or per backend limit). If unset,
or set to 0 it is disabled. It is intended as a resource to help avoid the OOM
killer on LINUX and manage resources in general. A backend request that would
push the total over the limit will be denied with an out of memory error causing
that backend's current query/transaction to fail. Due to the dynamic nature of
memory allocations, this limit is not exact. If within 1.5MB of the limit and
two backends request 1MB each at the same time both may be allocated, and exceed
the limit. Further requests will not be allocated until dropping below the
limit. Keep this in mind when setting this value. This limit does not affect
auxiliary backend processes. Backend memory allocations are displayed in the
pg_stat_activity view.
---
doc/src/sgml/config.sgml | 26 ++
src/backend/postmaster/autovacuum.c | 8 +-
src/backend/postmaster/postmaster.c | 17 +-
src/backend/postmaster/syslogger.c | 4 +-
src/backend/storage/ipc/dsm_impl.c | 35 ++-
src/backend/storage/lmgr/proc.c | 3 +
src/backend/utils/activity/backend_status.c | 223 +++++++++++++++++-
src/backend/utils/misc/guc_tables.c | 11 +
src/backend/utils/misc/postgresql.conf.sample | 3 +
src/backend/utils/mmgr/aset.c | 43 +++-
src/backend/utils/mmgr/generation.c | 21 +-
src/backend/utils/mmgr/slab.c | 21 +-
src/include/storage/proc.h | 7 +
src/include/utils/backend_status.h | 222 +++++++++++++++--
14 files changed, 560 insertions(+), 84 deletions(-)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index e5c41cc6c6..1bff68b1ec 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2113,6 +2113,32 @@ include_dir 'conf.d'
</listitem>
</varlistentry>
+ <varlistentry id="guc-max-total-backend-memory" xreflabel="max_total_backend_memory">
+ <term><varname>max_total_backend_memory</varname> (<type>integer</type>)
+ <indexterm>
+ <primary><varname>max_total_backend_memory</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Specifies a limit to the amount of memory (MB) that may be allocated to
+ backends in total (i.e. this is not a per user or per backend limit).
+ If unset, or set to 0 it is disabled. A backend request that would
+ push the total over the limit will be denied with an out of memory
+ error causing that backend's current query/transaction to fail. Due to
+ the dynamic nature of memory allocations, this limit is not exact. If
+ within 1.5MB of the limit and two backends request 1MB each at the same
+ time both may be allocated, and exceed the limit. Further requests will
+ not be allocated until dropping below the limit. Keep this in mind when
+ setting this value. This limit does not affect auxiliary backend
+ processes <xref linkend="glossary-auxiliary-proc"/> . Backend memory
+ allocations (<varname>allocated_bytes</varname>) are displayed in the
+ <link linkend="monitoring-pg-stat-activity-view"><structname>pg_stat_activity</structname></link>
+ view.
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
</sect2>
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 59c9bcf8c4..ee03d08dd9 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -407,8 +407,8 @@ StartAutoVacLauncher(void)
#ifndef EXEC_BACKEND
case 0:
- /* Zero allocated bytes to avoid double counting parent allocation */
- pgstat_zero_my_allocated_bytes();
+ /* Init allocated bytes to avoid double counting parent allocation */
+ pgstat_init_allocated_bytes();
/* in postmaster child ... */
InitPostmasterChild();
@@ -1488,8 +1488,8 @@ StartAutoVacWorker(void)
#ifndef EXEC_BACKEND
case 0:
- /* Zero allocated bytes to avoid double counting parent allocation */
- pgstat_zero_my_allocated_bytes();
+ /* Init allocated bytes to avoid double counting parent allocation */
+ pgstat_init_allocated_bytes();
/* in postmaster child ... */
InitPostmasterChild();
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 1f09781be8..358a7fa980 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -4167,8 +4167,8 @@ BackendStartup(Port *port)
{
free(bn);
- /* Zero allocated bytes to avoid double counting parent allocation */
- pgstat_zero_my_allocated_bytes();
+ /* Init allocated bytes to avoid double counting parent allocation */
+ pgstat_init_allocated_bytes();
/* Detangle from postmaster */
InitPostmasterChild();
@@ -5377,10 +5377,11 @@ StartChildProcess(AuxProcType type)
MemoryContextDelete(PostmasterContext);
PostmasterContext = NULL;
- /* Zero allocated bytes to avoid double counting parent allocation.
+ /*
+ * Init allocated bytes to avoid double counting parent allocation.
* Needs to be after the MemoryContextDelete(PostmasterContext) above.
*/
- pgstat_zero_my_allocated_bytes();
+ pgstat_init_allocated_bytes();
AuxiliaryProcessMain(type); /* does not return */
}
@@ -5775,10 +5776,12 @@ do_start_bgworker(RegisteredBgWorker *rw)
MemoryContextDelete(PostmasterContext);
PostmasterContext = NULL;
- /* Zero allocated bytes to avoid double counting parent allocation.
- * Needs to be after the MemoryContextDelete(PostmasterContext) above.
+ /*
+ * Init allocated bytes to avoid double counting parent
+ * allocation. Needs to be after the
+ * MemoryContextDelete(PostmasterContext) above.
*/
- pgstat_zero_my_allocated_bytes();
+ pgstat_init_allocated_bytes();
StartBackgroundWorker();
diff --git a/src/backend/postmaster/syslogger.c b/src/backend/postmaster/syslogger.c
index 9081ae140f..e8e31ce403 100644
--- a/src/backend/postmaster/syslogger.c
+++ b/src/backend/postmaster/syslogger.c
@@ -679,8 +679,8 @@ SysLogger_Start(void)
#ifndef EXEC_BACKEND
case 0:
- /* Zero allocated bytes to avoid double counting parent allocation */
- pgstat_zero_my_allocated_bytes();
+ /* Init allocated bytes to avoid double counting parent allocation */
+ pgstat_init_allocated_bytes();
/* in postmaster child ... */
InitPostmasterChild();
diff --git a/src/backend/storage/ipc/dsm_impl.c b/src/backend/storage/ipc/dsm_impl.c
index 22885c7bd2..1131de06c0 100644
--- a/src/backend/storage/ipc/dsm_impl.c
+++ b/src/backend/storage/ipc/dsm_impl.c
@@ -240,7 +240,7 @@ dsm_impl_posix(dsm_op op, dsm_handle handle, Size request_size,
* allocation.
*/
if (op == DSM_OP_DESTROY)
- pgstat_report_allocated_bytes(*mapped_size, PG_ALLOC_DECREASE);
+ pgstat_report_allocated_bytes_decrease(*mapped_size, PG_ALLOC_DSM);
*mapped_address = NULL;
*mapped_size = 0;
if (op == DSM_OP_DESTROY && shm_unlink(name) != 0)
@@ -254,6 +254,10 @@ dsm_impl_posix(dsm_op op, dsm_handle handle, Size request_size,
return true;
}
+ /* Do not exceed maximum allowed memory allocation */
+ if (op == DSM_OP_CREATE && exceeds_max_total_bkend_mem(request_size))
+ return false;
+
/*
* Create new segment or open an existing one for attach.
*
@@ -362,13 +366,10 @@ dsm_impl_posix(dsm_op op, dsm_handle handle, Size request_size,
* allocation increase.
*/
if (request_size > *mapped_size)
- {
- pgstat_report_allocated_bytes(request_size - *mapped_size,
- PG_ALLOC_INCREASE);
- }
+ pgstat_report_allocated_bytes_increase(request_size - *mapped_size, PG_ALLOC_DSM);
#else
- pgstat_report_allocated_bytes(*mapped_size, PG_ALLOC_DECREASE);
- pgstat_report_allocated_bytes(request_size, PG_ALLOC_INCREASE);
+ pgstat_report_allocated_bytes_decrease(*mapped_size, PG_ALLOC_DSM);
+ pgstat_report_allocated_bytes_increase(request_size, PG_ALLOC_DSM);
#endif
}
*mapped_address = address;
@@ -525,6 +526,10 @@ dsm_impl_sysv(dsm_op op, dsm_handle handle, Size request_size,
int flags = IPCProtection;
size_t segsize;
+ /* Do not exceed maximum allowed memory allocation */
+ if (op == DSM_OP_CREATE && exceeds_max_total_bkend_mem(request_size))
+ return false;
+
/*
* Allocate the memory BEFORE acquiring the resource, so that we don't
* leak the resource if memory allocation fails.
@@ -583,7 +588,7 @@ dsm_impl_sysv(dsm_op op, dsm_handle handle, Size request_size,
* allocation.
*/
if (op == DSM_OP_DESTROY)
- pgstat_report_allocated_bytes(*mapped_size, PG_ALLOC_DECREASE);
+ pgstat_report_allocated_bytes_decrease(*mapped_size, PG_ALLOC_DSM);
*mapped_address = NULL;
*mapped_size = 0;
if (op == DSM_OP_DESTROY && shmctl(ident, IPC_RMID, NULL) < 0)
@@ -637,7 +642,7 @@ dsm_impl_sysv(dsm_op op, dsm_handle handle, Size request_size,
* allocated in pg_stat_activity for the creator process.
*/
if (op == DSM_OP_CREATE)
- pgstat_report_allocated_bytes(request_size, PG_ALLOC_INCREASE);
+ pgstat_report_allocated_bytes_increase(request_size, PG_ALLOC_DSM);
*mapped_address = address;
*mapped_size = request_size;
@@ -712,13 +717,17 @@ dsm_impl_windows(dsm_op op, dsm_handle handle, Size request_size,
* allocation.
*/
if (op == DSM_OP_DESTROY)
- pgstat_report_allocated_bytes(*mapped_size, PG_ALLOC_DECREASE);
+ pgstat_report_allocated_bytes_decrease(*mapped_size, PG_ALLOC_DSM);
*impl_private = NULL;
*mapped_address = NULL;
*mapped_size = 0;
return true;
}
+ /* Do not exceed maximum allowed memory allocation */
+ if (op == DSM_OP_CREATE && exceeds_max_total_bkend_mem(request_size))
+ return false;
+
/* Create new segment or open an existing one for attach. */
if (op == DSM_OP_CREATE)
{
@@ -834,7 +843,7 @@ dsm_impl_windows(dsm_op op, dsm_handle handle, Size request_size,
* allocated in pg_stat_activity for the creator process.
*/
if (op == DSM_OP_CREATE)
- pgstat_report_allocated_bytes(info.RegionSize, PG_ALLOC_INCREASE);
+ pgstat_report_allocated_bytes_increase(info.RegionSize, PG_ALLOC_DSM);
*mapped_address = address;
*mapped_size = info.RegionSize;
*impl_private = hmap;
@@ -885,7 +894,7 @@ dsm_impl_mmap(dsm_op op, dsm_handle handle, Size request_size,
* shown allocated in pg_stat_activity when the creator destroys the
* allocation.
*/
- pgstat_report_allocated_bytes(*mapped_size, PG_ALLOC_DECREASE);
+ pgstat_report_allocated_bytes_decrease(*mapped_size, PG_ALLOC_DSM);
*mapped_address = NULL;
*mapped_size = 0;
if (op == DSM_OP_DESTROY && unlink(name) != 0)
@@ -1013,7 +1022,7 @@ dsm_impl_mmap(dsm_op op, dsm_handle handle, Size request_size,
* allocated in pg_stat_activity for the creator process.
*/
if (op == DSM_OP_CREATE)
- pgstat_report_allocated_bytes(request_size, PG_ALLOC_INCREASE);
+ pgstat_report_allocated_bytes_increase(request_size, PG_ALLOC_DSM);
*mapped_address = address;
*mapped_size = request_size;
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 22b4278610..2f43bbb4c4 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -180,6 +180,9 @@ InitProcGlobal(void)
ProcGlobal->checkpointerLatch = NULL;
pg_atomic_init_u32(&ProcGlobal->procArrayGroupFirst, INVALID_PGPROCNO);
pg_atomic_init_u32(&ProcGlobal->clogGroupFirst, INVALID_PGPROCNO);
+ /* Convert max_total_bkend_mem to bytes and store */
+ if (max_total_bkend_mem > 0)
+ pg_atomic_init_u64(&ProcGlobal->max_total_bkend_mem_bytes, max_total_bkend_mem * 1024 * 1024);
/*
* Create and initialize all the PGPROC structures we'll need. There are
diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c
index 7baf2db57d..ba5c92b573 100644
--- a/src/backend/utils/activity/backend_status.c
+++ b/src/backend/utils/activity/backend_status.c
@@ -45,13 +45,57 @@
bool pgstat_track_activities = false;
int pgstat_track_activity_query_size = 1024;
+/*
+ * Max backend memory allocation allowed (MB). 0 = disabled.
+ * Centralized bucket ProcGlobal->max_total_bkend_mem is initialized
+ * as a byte representation of this value in InitProcGlobal().
+ */
+int max_total_bkend_mem = 0;
/* exposed so that backend_progress.c can access it */
PgBackendStatus *MyBEEntry = NULL;
-/* Memory allocated to this backend prior to pgstats initialization */
-uint64 local_my_allocated_bytes = 0;
-uint64 *my_allocated_bytes = &local_my_allocated_bytes;
+/*
+ * Define initial allocation allowance for a backend.
+ *
+ * NOTE: initial_allocation_allowance && allocation_allowance_refill_qty
+ * may be candidates for future GUC variables. Arbitrary 1MB selected initially.
+ */
+uint64 initial_allocation_allowance = 1024 * 1024;
+uint64 allocation_allowance_refill_qty = 1024 * 1024;
+
+/*
+ * Local counter to manage shared memory allocations. At backend startup, set to
+ * initial_allocation_allowance via pgstat_init_allocated_bytes(). Decrease as
+ * memory is malloc'd. When exhausted, atomically refill if available from
+ * ProcGlobal->max_total_bkend_mem via exceeds_max_total_bkend_mem().
+ */
+uint64 allocation_allowance = 0;
+
+/*
+ * Local counter of free'd shared memory. Return to global
+ * max_total_bkend_mem when return threshold is met. Arbitrary 1MB bytes
+ * selected initially.
+ */
+uint64 allocation_return = 0;
+uint64 allocation_return_threshold = 1024 * 1024;
+
+/*
+ * Memory allocated to this backend prior to pgstats initialization. Migrated to
+ * shared memory on pgstats initialization.
+ */
+uint64 local_my_allocated_bytes = 0;
+uint64 *my_allocated_bytes = &local_my_allocated_bytes;
+
+/* Memory allocated to this backend by type */
+/*
+ * TODO: add code to present these along with the global shared counter via a
+ * new system view
+ */
+uint64 aset_allocated_bytes = 0;
+uint64 dsm_allocated_bytes = 0;
+uint64 generation_allocated_bytes = 0;
+uint64 slab_allocated_bytes = 0;
static PgBackendStatus *BackendStatusArray = NULL;
static char *BackendAppnameBuffer = NULL;
@@ -403,11 +447,18 @@ pgstat_bestart(void)
lbeentry.st_progress_command_target = InvalidOid;
lbeentry.st_query_id = UINT64CONST(0);
- /* Alter allocation reporting from local_my_allocated_bytes to shared memory */
+ /*
+ * Alter allocation reporting from local_my_allocated_bytes to shared
+ * memory
+ */
pgstat_set_allocated_bytes_storage(&MyBEEntry->allocated_bytes);
- /* Populate sum of memory allocated prior to pgstats initialization to pgstats
- * and zero the local variable.
+ /*
+ * Populate sum of memory allocated prior to pgstats initialization to
+ * pgstats and zero the local variable. This is a += assignment because
+ * InitPostgres allocates memory after pgstat_beinit but prior to
+ * pgstat_bestart so we have allocations to both local and shared memory
+ * to combine.
*/
lbeentry.allocated_bytes += local_my_allocated_bytes;
local_my_allocated_bytes = 0;
@@ -472,7 +523,8 @@ pgstat_beshutdown_hook(int code, Datum arg)
volatile PgBackendStatus *beentry = MyBEEntry;
/*
- * Stop reporting memory allocation changes to &MyBEEntry->allocated_bytes
+ * Stop reporting memory allocation changes to shared memory
+ * &MyBEEntry->allocated_bytes
*/
pgstat_reset_allocated_bytes_storage();
@@ -1221,21 +1273,170 @@ pgstat_clip_activity(const char *raw_activity)
* my_allocated_bytes into shared memory.
*/
void
-pgstat_set_allocated_bytes_storage(uint64 *new_allocated_bytes)
+pgstat_set_allocated_bytes_storage(uint64 *allocated_bytes)
{
- my_allocated_bytes = new_allocated_bytes;
- *new_allocated_bytes = local_my_allocated_bytes;
+ my_allocated_bytes = allocated_bytes;
+ *allocated_bytes = local_my_allocated_bytes;
+
+ return;
}
/*
* Reset allocated bytes storage location.
*
* Expected to be called during backend shutdown, before the location set up
- * by pgstat_set_allocated_bytes_storage() becomes invalid.
+ * by pgstat_set_allocated_bytes_storage becomes invalid.
*/
void
pgstat_reset_allocated_bytes_storage(void)
{
+ /*
+ * When limiting maximum backend memory, return this backend's memory
+ * allocations to global.
+ */
+ if (max_total_bkend_mem)
+ {
+ volatile PROC_HDR *procglobal = ProcGlobal;
+
+ pg_atomic_add_fetch_u64(&procglobal->max_total_bkend_mem_bytes,
+ *my_allocated_bytes + allocation_allowance +
+ allocation_return);
+
+ /* Reset memory allocation variables */
+ allocation_allowance = 0;
+ allocation_return = 0;
+ aset_allocated_bytes = 0;
+ dsm_allocated_bytes = 0;
+ generation_allocated_bytes = 0;
+ slab_allocated_bytes = 0;
+ }
+
+ /* Reset memory allocation variables */
+ *my_allocated_bytes = local_my_allocated_bytes = 0;
+
+ /* Point my_allocated_bytes from shared memory back to local */
my_allocated_bytes = &local_my_allocated_bytes;
+
+ return;
}
+/*
+ * Determine if allocation request will exceed max backend memory allowed.
+ * Do not apply to auxiliary processes.
+ * Refill allocation request bucket when needed/possible.
+ */
+bool
+exceeds_max_total_bkend_mem(uint64 allocation_request)
+{
+ bool result = false;
+
+ /*
+ * When limiting maximum backend memory, attempt to refill allocation
+ * request bucket if needed.
+ */
+ if (max_total_bkend_mem && allocation_request > allocation_allowance)
+ {
+ volatile PROC_HDR *procglobal = ProcGlobal;
+ uint64 available_max_total_bkend_mem = 0;
+ bool sts = false;
+
+ /*
+ * If allocation request is larger than memory refill quantity then
+ * attempt to increase allocation allowance with requested amount,
+ * otherwise fall through. If this refill fails we do not have enough
+ * memory to meet the request.
+ */
+ if (allocation_request >= allocation_allowance_refill_qty)
+ {
+ while ((available_max_total_bkend_mem = pg_atomic_read_u64(&procglobal->max_total_bkend_mem_bytes)) >= allocation_request)
+ {
+ if ((result = pg_atomic_compare_exchange_u64(&procglobal->max_total_bkend_mem_bytes,
+ &available_max_total_bkend_mem,
+ available_max_total_bkend_mem - allocation_request)))
+ {
+ allocation_allowance = allocation_allowance + allocation_request;
+ break;
+ }
+ }
+
+ /*
+ * If the atomic exchange fails, we do not have enough reserve
+ * memory to meet the request. Negate result to return the proper
+ * value.
+ */
+ return !result;
+ }
+
+ /*
+ * Attempt to increase allocation allowance by memory refill quantity.
+ * If available memory is/becomes less than memory refill quantity,
+ * fall through to attempt to allocate remaining available memory.
+ */
+ while ((available_max_total_bkend_mem = pg_atomic_read_u64(&procglobal->max_total_bkend_mem_bytes)) >= allocation_allowance_refill_qty)
+ {
+ if ((sts = pg_atomic_compare_exchange_u64(&procglobal->max_total_bkend_mem_bytes,
+ &available_max_total_bkend_mem,
+ available_max_total_bkend_mem - allocation_allowance_refill_qty)))
+ {
+ allocation_allowance = allocation_allowance + allocation_allowance_refill_qty;
+ break;
+ }
+ }
+
+ if (!sts)
+ {
+ /*
+ * If available_max_total_bkend_mem is 0, no memory is currently
+ * available to refill with, otherwise attempt to allocate
+ * remaining memory available if it exceeds the requested amount
+ * or the requested amount if more than requested amount gets
+ * returned while looping.
+ */
+ while ((available_max_total_bkend_mem = (int64) pg_atomic_read_u64(&procglobal->max_total_bkend_mem_bytes)) > 0)
+ {
+ uint64 newval = 0;
+
+ /*
+ * If available memory is less than requested allocation we
+ * cannot fulfil request.
+ */
+ if (available_max_total_bkend_mem < allocation_request)
+ break;
+
+ /*
+ * If we happen to loop and a large chunk of memory has been
+ * returned to global, allocate request amount only.
+ */
+ if (available_max_total_bkend_mem > allocation_request)
+ newval = available_max_total_bkend_mem - allocation_request;
+
+ /* Allocate memory */
+ if ((sts = pg_atomic_compare_exchange_u64(&procglobal->max_total_bkend_mem_bytes,
+ &available_max_total_bkend_mem,
+ newval)))
+ {
+ allocation_allowance = allocation_allowance +
+ newval == 0 ? available_max_total_bkend_mem : allocation_request;
+
+ break;
+ }
+ }
+ }
+
+ /*
+ * If refill is not successful, we return true, memory limit exceeded
+ */
+ if (!sts)
+ result = true;
+ }
+
+ /*
+ * Exclude auxiliary processes from the check. Return false. While we want
+ * to exclude them from the check, we do not want to exclude them from the
+ * above allocation handling.
+ */
+ if (MyAuxProcType != NotAnAuxProcess)
+ result = false;
+
+ return result;
+}
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 1c0583fe26..639b63138b 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -3468,6 +3468,17 @@ struct config_int ConfigureNamesInt[] =
NULL, NULL, NULL
},
+ {
+ {"max_total_backend_memory", PGC_SU_BACKEND, RESOURCES_MEM,
+ gettext_noop("Restrict total backend memory allocations to this max."),
+ gettext_noop("0 turns this feature off."),
+ GUC_UNIT_MB
+ },
+ &max_total_bkend_mem,
+ 0, 0, INT_MAX,
+ NULL, NULL, NULL
+ },
+
/* End-of-list marker */
{
{NULL, 0, 0, NULL, NULL}, NULL, 0, 0, 0, NULL, NULL, NULL
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index d06074b86f..bc2d449c87 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -156,6 +156,9 @@
# mmap
# (change requires restart)
#min_dynamic_shared_memory = 0MB # (change requires restart)
+#max_total_backend_memory = 0MB # Restrict total backend memory allocations
+ # to this max (in MB). 0 turns this feature
+ # off.
# - Disk -
diff --git a/src/backend/utils/mmgr/aset.c b/src/backend/utils/mmgr/aset.c
index 1a2d86239c..4d2dead51f 100644
--- a/src/backend/utils/mmgr/aset.c
+++ b/src/backend/utils/mmgr/aset.c
@@ -440,6 +440,10 @@ AllocSetContextCreateInternal(MemoryContext parent,
else
firstBlockSize = Max(firstBlockSize, initBlockSize);
+ /* Do not exceed maximum allowed memory allocation */
+ if (exceeds_max_total_bkend_mem(firstBlockSize))
+ return NULL;
+
/*
* Allocate the initial block. Unlike other aset.c blocks, it starts with
* the context header and its block header follows that.
@@ -522,7 +526,7 @@ AllocSetContextCreateInternal(MemoryContext parent,
name);
((MemoryContext) set)->mem_allocated = firstBlockSize;
- pgstat_report_allocated_bytes(firstBlockSize, PG_ALLOC_INCREASE);
+ pgstat_report_allocated_bytes_increase(firstBlockSize, PG_ALLOC_ASET);
return (MemoryContext) set;
}
@@ -599,7 +603,7 @@ AllocSetReset(MemoryContext context)
}
Assert(context->mem_allocated == keepersize);
- pgstat_report_allocated_bytes(deallocation, PG_ALLOC_DECREASE);
+ pgstat_report_allocated_bytes_decrease(deallocation, PG_ALLOC_ASET);
/* Reset block size allocation sequence, too */
set->nextBlockSize = set->initBlockSize;
@@ -663,7 +667,7 @@ AllocSetDelete(MemoryContext context)
free(oldset);
}
Assert(freelist->num_free == 0);
- pgstat_report_allocated_bytes(deallocation, PG_ALLOC_DECREASE);
+ pgstat_report_allocated_bytes_decrease(deallocation, PG_ALLOC_ASET);
}
/* Now add the just-deleted context to the freelist. */
@@ -696,7 +700,7 @@ AllocSetDelete(MemoryContext context)
}
Assert(context->mem_allocated == keepersize);
- pgstat_report_allocated_bytes(deallocation + context->mem_allocated, PG_ALLOC_DECREASE);
+ pgstat_report_allocated_bytes_decrease(deallocation + context->mem_allocated, PG_ALLOC_ASET);
/* Finally, free the context header, including the keeper block */
free(set);
@@ -741,12 +745,17 @@ AllocSetAlloc(MemoryContext context, Size size)
#endif
blksize = chunk_size + ALLOC_BLOCKHDRSZ + ALLOC_CHUNKHDRSZ;
+
+ /* Do not exceed maximum allowed memory allocation */
+ if (exceeds_max_total_bkend_mem(blksize))
+ return NULL;
+
block = (AllocBlock) malloc(blksize);
if (block == NULL)
return NULL;
context->mem_allocated += blksize;
- pgstat_report_allocated_bytes(blksize, PG_ALLOC_INCREASE);
+ pgstat_report_allocated_bytes_increase(blksize, PG_ALLOC_ASET);
block->aset = set;
block->freeptr = block->endptr = ((char *) block) + blksize;
@@ -938,6 +947,10 @@ AllocSetAlloc(MemoryContext context, Size size)
while (blksize < required_size)
blksize <<= 1;
+ /* Do not exceed maximum allowed memory allocation */
+ if (exceeds_max_total_bkend_mem(blksize))
+ return NULL;
+
/* Try to allocate it */
block = (AllocBlock) malloc(blksize);
@@ -957,7 +970,7 @@ AllocSetAlloc(MemoryContext context, Size size)
return NULL;
context->mem_allocated += blksize;
- pgstat_report_allocated_bytes(blksize, PG_ALLOC_INCREASE);
+ pgstat_report_allocated_bytes_increase(blksize, PG_ALLOC_ASET);
block->aset = set;
block->freeptr = ((char *) block) + ALLOC_BLOCKHDRSZ;
@@ -1055,7 +1068,7 @@ AllocSetFree(void *pointer)
block->next->prev = block->prev;
set->header.mem_allocated -= block->endptr - ((char *) block);
- pgstat_report_allocated_bytes(block->endptr - ((char *) block), PG_ALLOC_DECREASE);
+ pgstat_report_allocated_bytes_decrease(block->endptr - ((char *) block), PG_ALLOC_ASET);
#ifdef CLOBBER_FREED_MEMORY
wipe_mem(block, block->freeptr - ((char *) block));
@@ -1176,6 +1189,18 @@ AllocSetRealloc(void *pointer, Size size)
blksize = chksize + ALLOC_BLOCKHDRSZ + ALLOC_CHUNKHDRSZ;
oldblksize = block->endptr - ((char *) block);
+ /*
+ * Do not exceed maximum allowed memory allocation. NOTE: checking for
+ * the full size here rather than just the amount of increased
+ * allocation to prevent a potential underflow of *my_allocation
+ * allowance in cases where blksize - oldblksize does not trigger a
+ * refill but blksize is greater than *my_allocation_allowance.
+ * Underflow would occur with the call below to
+ * pgstat_report_allocated_bytes_increase()
+ */
+ if (blksize > oldblksize && exceeds_max_total_bkend_mem(blksize))
+ return NULL;
+
block = (AllocBlock) realloc(block, blksize);
if (block == NULL)
{
@@ -1186,9 +1211,9 @@ AllocSetRealloc(void *pointer, Size size)
/* updated separately, not to underflow when (oldblksize > blksize) */
set->header.mem_allocated -= oldblksize;
- pgstat_report_allocated_bytes(oldblksize, PG_ALLOC_DECREASE);
+ pgstat_report_allocated_bytes_decrease(oldblksize, PG_ALLOC_ASET);
set->header.mem_allocated += blksize;
- pgstat_report_allocated_bytes(blksize, PG_ALLOC_INCREASE);
+ pgstat_report_allocated_bytes_increase(blksize, PG_ALLOC_ASET);
block->freeptr = block->endptr = ((char *) block) + blksize;
diff --git a/src/backend/utils/mmgr/generation.c b/src/backend/utils/mmgr/generation.c
index b06fb0c6a4..8f9d56eb0f 100644
--- a/src/backend/utils/mmgr/generation.c
+++ b/src/backend/utils/mmgr/generation.c
@@ -201,6 +201,9 @@ GenerationContextCreate(MemoryContext parent,
else
allocSize = Max(allocSize, initBlockSize);
+ if (exceeds_max_total_bkend_mem(allocSize))
+ return NULL;
+
/*
* Allocate the initial block. Unlike other generation.c blocks, it
* starts with the context header and its block header follows that.
@@ -268,7 +271,7 @@ GenerationContextCreate(MemoryContext parent,
name);
((MemoryContext) set)->mem_allocated = firstBlockSize;
- pgstat_report_allocated_bytes(firstBlockSize, PG_ALLOC_INCREASE);
+ pgstat_report_allocated_bytes_increase(firstBlockSize, PG_ALLOC_GENERATION);
return (MemoryContext) set;
}
@@ -314,7 +317,7 @@ GenerationReset(MemoryContext context)
}
}
- pgstat_report_allocated_bytes(deallocation, PG_ALLOC_DECREASE);
+ pgstat_report_allocated_bytes_decrease(deallocation, PG_ALLOC_GENERATION);
/* set it so new allocations to make use of the keeper block */
set->block = set->keeper;
@@ -337,7 +340,7 @@ GenerationDelete(MemoryContext context)
/* Reset to release all releasable GenerationBlocks */
GenerationReset(context);
- pgstat_report_allocated_bytes(context->mem_allocated, PG_ALLOC_DECREASE);
+ pgstat_report_allocated_bytes_decrease(context->mem_allocated, PG_ALLOC_GENERATION);
/* And free the context header and keeper block */
free(context);
@@ -380,12 +383,15 @@ GenerationAlloc(MemoryContext context, Size size)
{
Size blksize = required_size + Generation_BLOCKHDRSZ;
+ if (exceeds_max_total_bkend_mem(blksize))
+ return NULL;
+
block = (GenerationBlock *) malloc(blksize);
if (block == NULL)
return NULL;
context->mem_allocated += blksize;
- pgstat_report_allocated_bytes(blksize, PG_ALLOC_INCREASE);
+ pgstat_report_allocated_bytes_increase(blksize, PG_ALLOC_GENERATION);
/* block with a single (used) chunk */
block->context = set;
@@ -483,13 +489,16 @@ GenerationAlloc(MemoryContext context, Size size)
if (blksize < required_size)
blksize = pg_nextpower2_size_t(required_size);
+ if (exceeds_max_total_bkend_mem(blksize))
+ return NULL;
+
block = (GenerationBlock *) malloc(blksize);
if (block == NULL)
return NULL;
context->mem_allocated += blksize;
- pgstat_report_allocated_bytes(blksize, PG_ALLOC_INCREASE);
+ pgstat_report_allocated_bytes_increase(blksize, PG_ALLOC_GENERATION);
/* initialize the new block */
GenerationBlockInit(set, block, blksize);
@@ -742,7 +751,7 @@ GenerationFree(void *pointer)
dlist_delete(&block->node);
set->header.mem_allocated -= block->blksize;
- pgstat_report_allocated_bytes(block->blksize, PG_ALLOC_DECREASE);
+ pgstat_report_allocated_bytes_decrease(block->blksize, PG_ALLOC_GENERATION);
free(block);
}
diff --git a/src/backend/utils/mmgr/slab.c b/src/backend/utils/mmgr/slab.c
index 15d3380640..de85781479 100644
--- a/src/backend/utils/mmgr/slab.c
+++ b/src/backend/utils/mmgr/slab.c
@@ -356,9 +356,12 @@ SlabContextCreate(MemoryContext parent,
elog(ERROR, "block size %zu for slab is too small for %zu-byte chunks",
blockSize, chunkSize);
-
+ /* Do not exceed maximum allowed memory allocation */
+ if (exceeds_max_total_bkend_mem(Slab_CONTEXT_HDRSZ(chunksPerBlock)))
+ return NULL;
slab = (SlabContext *) malloc(Slab_CONTEXT_HDRSZ(chunksPerBlock));
+
if (slab == NULL)
{
MemoryContextStats(TopMemoryContext);
@@ -418,8 +421,7 @@ SlabContextCreate(MemoryContext parent,
* If SlabContextCreate is updated to add context header size to
* context->mem_allocated, then update here and SlabDelete appropriately
*/
- pgstat_report_allocated_bytes(Slab_CONTEXT_HDRSZ(slab->chunksPerBlock),
- PG_ALLOC_INCREASE);
+ pgstat_report_allocated_bytes_increase(Slab_CONTEXT_HDRSZ(slab->chunksPerBlock), PG_ALLOC_SLAB);
return (MemoryContext) slab;
}
@@ -479,7 +481,7 @@ SlabReset(MemoryContext context)
}
}
- pgstat_report_allocated_bytes(deallocation, PG_ALLOC_DECREASE);
+ pgstat_report_allocated_bytes_decrease(deallocation, PG_ALLOC_SLAB);
slab->curBlocklistIndex = 0;
Assert(context->mem_allocated == 0);
@@ -500,8 +502,7 @@ SlabDelete(MemoryContext context)
* Until context header allocation is included in context->mem_allocated,
* cast to slab and decrement the header allocation
*/
- pgstat_report_allocated_bytes(Slab_CONTEXT_HDRSZ(((SlabContext *)context)->chunksPerBlock),
- PG_ALLOC_DECREASE);
+ pgstat_report_allocated_bytes_decrease(Slab_CONTEXT_HDRSZ(((SlabContext *) context)->chunksPerBlock), PG_ALLOC_SLAB);
/* And free the context header */
free(context);
@@ -560,6 +561,10 @@ SlabAlloc(MemoryContext context, Size size)
}
else
{
+ /* Do not exceed maximum allowed memory allocation */
+ if (exceeds_max_total_bkend_mem(slab->blockSize))
+ return NULL;
+
block = (SlabBlock *) malloc(slab->blockSize);
if (unlikely(block == NULL))
@@ -567,7 +572,7 @@ SlabAlloc(MemoryContext context, Size size)
block->slab = slab;
context->mem_allocated += slab->blockSize;
- pgstat_report_allocated_bytes(slab->blockSize, PG_ALLOC_INCREASE);
+ pgstat_report_allocated_bytes_increase(slab->blockSize, PG_ALLOC_SLAB);
/* use the first chunk in the new block */
chunk = SlabBlockGetChunk(slab, block, 0);
@@ -754,7 +759,7 @@ SlabFree(void *pointer)
#endif
free(block);
slab->header.mem_allocated -= slab->blockSize;
- pgstat_report_allocated_bytes(slab->blockSize, PG_ALLOC_DECREASE);
+ pgstat_report_allocated_bytes_decrease(slab->blockSize, PG_ALLOC_SLAB);
}
/*
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index 4258cd92c9..bacf879294 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -404,6 +404,13 @@ typedef struct PROC_HDR
int spins_per_delay;
/* Buffer id of the buffer that Startup process waits for pin on, or -1 */
int startupBufferPinWaitBufId;
+
+ /*
+ * Max backend memory allocation tracker. Used/Initialized when
+ * max_total_bkend_mem > 0 as max_total_bkend_mem (MB) converted to bytes.
+ * Decreases/increases with free/malloc of backend memory.
+ */
+ pg_atomic_uint64 max_total_bkend_mem_bytes;
} PROC_HDR;
extern PGDLLIMPORT PROC_HDR *ProcGlobal;
diff --git a/src/include/utils/backend_status.h b/src/include/utils/backend_status.h
index 754ff0dc62..32a1149007 100644
--- a/src/include/utils/backend_status.h
+++ b/src/include/utils/backend_status.h
@@ -14,6 +14,7 @@
#include "libpq/pqcomm.h"
#include "miscadmin.h" /* for BackendType */
#include "storage/backendid.h"
+#include "storage/proc.h"
#include "utils/backend_progress.h"
#include "common/int.h"
@@ -33,12 +34,14 @@ typedef enum BackendState
STATE_DISABLED
} BackendState;
-/* Enum helper for reporting memory allocated bytes */
-enum allocation_direction
+/* Enum helper for reporting memory allocator type */
+enum pg_allocator_type
{
- PG_ALLOC_DECREASE = -1,
- PG_ALLOC_IGNORE,
- PG_ALLOC_INCREASE,
+ PG_ALLOC_ASET = 1,
+ PG_ALLOC_DSM,
+ PG_ALLOC_GENERATION,
+ PG_ALLOC_SLAB,
+ PG_ALLOC_ONSHUTDOWN,
};
/* ----------
@@ -297,6 +300,7 @@ typedef struct LocalPgBackendStatus
*/
extern PGDLLIMPORT bool pgstat_track_activities;
extern PGDLLIMPORT int pgstat_track_activity_query_size;
+extern PGDLLIMPORT int max_total_bkend_mem;
/* ----------
@@ -305,7 +309,15 @@ extern PGDLLIMPORT int pgstat_track_activity_query_size;
*/
extern PGDLLIMPORT PgBackendStatus *MyBEEntry;
extern PGDLLIMPORT uint64 *my_allocated_bytes;
+extern PGDLLIMPORT uint64 allocation_allowance;
+extern PGDLLIMPORT uint64 initial_allocation_allowance;
+extern PGDLLIMPORT uint64 allocation_return;
+extern PGDLLIMPORT uint64 allocation_return_threshold;
+extern PGDLLIMPORT uint64 aset_allocated_bytes;
+extern PGDLLIMPORT uint64 dsm_allocated_bytes;
+extern PGDLLIMPORT uint64 generation_allocated_bytes;
+extern PGDLLIMPORT uint64 slab_allocated_bytes;
/* ----------
* Functions called from postmaster
@@ -338,6 +350,7 @@ extern const char *pgstat_get_crashed_backend_activity(int pid, char *buffer,
extern uint64 pgstat_get_my_query_id(void);
extern void pgstat_set_allocated_bytes_storage(uint64 *allocated_bytes);
extern void pgstat_reset_allocated_bytes_storage(void);
+extern void decrease_max_total_bkend_mem(int64 decrease);
/* ----------
* Support functions for the SQL-callable functions to
@@ -348,53 +361,214 @@ extern int pgstat_fetch_stat_numbackends(void);
extern PgBackendStatus *pgstat_fetch_stat_beentry(BackendId beid);
extern LocalPgBackendStatus *pgstat_fetch_stat_local_beentry(int beid);
extern char *pgstat_clip_activity(const char *raw_activity);
+extern bool exceeds_max_total_bkend_mem(uint64 allocation_request);
/* ----------
- * pgstat_report_allocated_bytes() -
- *
- * Called to report change in memory allocated for this backend.
+ * pgstat_report_allocated_bytes_decrease() -
+ * Called to report decrease in memory allocated for this backend.
*
* my_allocated_bytes initially points to local memory, making it safe to call
- * this before pgstats has been initialized. allocation_direction is a
- * positive/negative multiplier enum defined above.
+ * this before pgstats has been initialized.
* ----------
*/
static inline void
-pgstat_report_allocated_bytes(int64 allocated_bytes, int allocation_direction)
+pgstat_report_allocated_bytes_decrease(int64 proc_allocated_bytes, int pg_allocator_type)
{
uint64 temp;
/*
- * Avoid *my_allocated_bytes unsigned integer overflow on
- * PG_ALLOC_DECREASE
+ * Avoid allocated_bytes unsigned integer overflow on decrease.
*/
- if (allocation_direction == PG_ALLOC_DECREASE &&
- pg_sub_u64_overflow(*my_allocated_bytes, allocated_bytes, &temp))
+ if (pg_sub_u64_overflow(*my_allocated_bytes, proc_allocated_bytes, &temp))
{
+ /* Add free'd memory to allocation return counter. */
+ allocation_return += proc_allocated_bytes;
+
+ /* On overflow, reset pgstat count of allocated bytes to zero */
*my_allocated_bytes = 0;
- ereport(LOG,
- errmsg("Backend %d deallocated %lld bytes, exceeding the %llu bytes it is currently reporting allocated. Setting reported to 0.",
- MyProcPid, (long long) allocated_bytes,
- (unsigned long long) *my_allocated_bytes));
+
+ /* Reset allocator type allocated bytes */
+ switch (pg_allocator_type)
+ {
+ case PG_ALLOC_ASET:
+ aset_allocated_bytes = 0;
+ break;
+ case PG_ALLOC_DSM:
+ dsm_allocated_bytes = 0;
+ break;
+ case PG_ALLOC_GENERATION:
+ generation_allocated_bytes = 0;
+ break;
+ case PG_ALLOC_SLAB:
+ slab_allocated_bytes = 0;
+ break;
+ case PG_ALLOC_ONSHUTDOWN:
+ break;
+ }
+
+ /*
+ * Return free'd memory to the global counter when return threshold is
+ * met or process end.
+ */
+ if (max_total_bkend_mem && allocation_return >= allocation_return_threshold)
+ {
+ if (ProcGlobal)
+ {
+ volatile PROC_HDR *procglobal = ProcGlobal;
+
+ /* Add to global */
+ pg_atomic_add_fetch_u64(&procglobal->max_total_bkend_mem_bytes,
+ allocation_return);
+
+ /* Restart count */
+ allocation_return = 0;
+ }
+ }
}
else
- *my_allocated_bytes += (allocated_bytes) * allocation_direction;
+ {
+ /* Add free'd memory to allocation return counter. */
+ allocation_return += proc_allocated_bytes;
+
+ /* Decrease pgstat count of allocated bytes */
+ *my_allocated_bytes -= (proc_allocated_bytes);
+
+ /*
+ * Decrease allocator type allocated bytes. NOTE: per hackers dsm
+ * memory allocations lifespan may exceed process lifespan, so we may
+ * implement a long lived tracker for it ala max_total_bkend_mem_bytes
+ */
+ switch (pg_allocator_type)
+ {
+ case PG_ALLOC_ASET:
+ aset_allocated_bytes -= (proc_allocated_bytes);
+ break;
+ case PG_ALLOC_DSM:
+ dsm_allocated_bytes -= (proc_allocated_bytes);
+ break;
+ case PG_ALLOC_GENERATION:
+ generation_allocated_bytes -= (proc_allocated_bytes);
+ break;
+ case PG_ALLOC_SLAB:
+ slab_allocated_bytes -= (proc_allocated_bytes);
+ break;
+ case PG_ALLOC_ONSHUTDOWN:
+ break;
+ }
+
+ /*
+ * Return free'd memory to the global counter when return threshold is
+ * met or process end.
+ */
+ if (max_total_bkend_mem && allocation_return >= allocation_return_threshold)
+ {
+ if (ProcGlobal)
+ {
+ volatile PROC_HDR *procglobal = ProcGlobal;
+
+ /* Add to global */
+ pg_atomic_add_fetch_u64(&procglobal->max_total_bkend_mem_bytes,
+ allocation_return);
+
+ /* Restart count */
+ allocation_return = 0;
+ }
+ }
+ }
+
+ return;
+}
+
+/* ----------
+ * pgstat_report_allocated_bytes_increase() -
+ * Called to report increase in memory allocated for this backend.
+ *
+ * my_allocated_bytes initially points to local memory, making it safe to call
+ * this before pgstats has been initialized.
+ * ----------
+ */
+static inline void
+pgstat_report_allocated_bytes_increase(int64 proc_allocated_bytes,
+ int pg_allocator_type)
+{
+ /* Remove allocated memory from local allocation allowance */
+ allocation_allowance -= proc_allocated_bytes;
+
+ /* Increase pgstat count of allocated bytes */
+ *my_allocated_bytes += (proc_allocated_bytes);
+
+ /*
+ * Increase allocator type allocated bytes. NOTE: per hackers dsm memory
+ * allocations lifespan may exceed process lifespan, so we may implement a
+ * long lived tracker for it ala max_total_bkend_mem_bytes
+ */
+ switch (pg_allocator_type)
+ {
+ case PG_ALLOC_ASET:
+ aset_allocated_bytes += (proc_allocated_bytes);
+ break;
+ case PG_ALLOC_DSM:
+ dsm_allocated_bytes += (proc_allocated_bytes);
+ break;
+ case PG_ALLOC_GENERATION:
+ generation_allocated_bytes += (proc_allocated_bytes);
+ break;
+ case PG_ALLOC_SLAB:
+ slab_allocated_bytes += (proc_allocated_bytes);
+ break;
+ case PG_ALLOC_ONSHUTDOWN:
+ break;
+ }
return;
}
/* ---------
- * pgstat_zero_my_allocated_bytes() -
+ * pgstat_init_allocated_bytes() -
*
- * Called to zero out local allocated bytes variable after fork to avoid double
- * counting allocations.
+ * Called to initialize allocated bytes variables after fork and to
+ * avoid double counting allocations.
* ---------
*/
static inline void
-pgstat_zero_my_allocated_bytes(void)
+pgstat_init_allocated_bytes(void)
{
*my_allocated_bytes = 0;
+ /* If we're limiting backend memory */
+ if (max_total_bkend_mem)
+ {
+ volatile PROC_HDR *procglobal = ProcGlobal;
+ uint64 available_max_total_bkend_mem = 0;
+
+ allocation_return = 0;
+ aset_allocated_bytes = 0;
+ dsm_allocated_bytes = 0;
+ generation_allocated_bytes = 0;
+ slab_allocated_bytes = 0;
+ allocation_allowance = 0;
+
+ /* Account for the initial allocation allowance */
+ while ((available_max_total_bkend_mem = pg_atomic_read_u64(&procglobal->max_total_bkend_mem_bytes)) >= allocation_allowance)
+ {
+ if (pg_atomic_compare_exchange_u64(&procglobal->max_total_bkend_mem_bytes,
+ &available_max_total_bkend_mem,
+ available_max_total_bkend_mem -
+ initial_allocation_allowance))
+ {
+ /*
+ * On success populate allocation_allowance. Failure here will
+ * result in the backend's first invocation of
+ * exceeds_max_total_bkend_mem allocating requested, default,
+ * or available memory or result in an out of memory error.
+ */
+ allocation_allowance = initial_allocation_allowance;
+
+ break;
+ }
+ }
+ }
+
return;
}
--
2.25.1
0001-Add-tracking-of-backend-memory-allocated-to-pg_stat_.patchtext/x-patch; charset=UTF-8; name=0001-Add-tracking-of-backend-memory-allocated-to-pg_stat_.patchDownload
From 7a1d9cd82e72d3cccb356b8930e32b5154e14e00 Mon Sep 17 00:00:00 2001
From: Reid Thompson <jreidthompson@nc.rr.com>
Date: Thu, 11 Aug 2022 12:01:25 -0400
Subject: [PATCH 1/2] Add tracking of backend memory allocated to
pg_stat_activity
This new field displays the current bytes of memory allocated to the
backend process. It is updated as memory for the process is
palloc'd/pfree'd. Memory allocated to items on the freelist is included in
the displayed value. Dynamic shared memory allocations are included
only in the value displayed for the backend that created them, they are
not included in the value for backends that are attached to them to
avoid double counting. On occasion, orphaned memory segments may be
cleaned up on postmaster startup. This may result in decreasing the sum
without a prior increment. We limit the floor of backend_mem_allocated
to zero. Updated pg_stat_activity documentation for the new column.
---
doc/src/sgml/monitoring.sgml | 15 ++++
src/backend/catalog/system_views.sql | 1 +
src/backend/postmaster/autovacuum.c | 6 ++
src/backend/postmaster/postmaster.c | 13 ++++
src/backend/postmaster/syslogger.c | 3 +
src/backend/storage/ipc/dsm_impl.c | 81 +++++++++++++++++++++
src/backend/utils/activity/backend_status.c | 45 ++++++++++++
src/backend/utils/adt/pgstatfuncs.c | 4 +-
src/backend/utils/mmgr/aset.c | 17 +++++
src/backend/utils/mmgr/generation.c | 15 ++++
src/backend/utils/mmgr/slab.c | 23 ++++++
src/include/catalog/pg_proc.dat | 6 +-
src/include/utils/backend_status.h | 63 +++++++++++++++-
src/test/regress/expected/rules.out | 9 ++-
src/test/regress/expected/stats.out | 11 +++
src/test/regress/sql/stats.sql | 5 +-
16 files changed, 307 insertions(+), 10 deletions(-)
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 6249bb50d0..c1c2eb3531 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -963,6 +963,21 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser
</para></entry>
</row>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>allocated_bytes</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Memory currently allocated to this backend in bytes. This is the balance
+ of bytes allocated and freed by this backend. Dynamic shared memory
+ allocations are included only in the value displayed for the backend that
+ created them, they are not included in the value for backends that are
+ attached to them to avoid double counting. Use <function>pg_size_pretty</function>
+ described in <xref linkend="functions-admin-dbsize"/> to make this value
+ more easily readable.
+ </para></entry>
+ </row>
+
<row>
<entry role="catalog_table_entry"><para role="column_definition">
<structfield>query</structfield> <type>text</type>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 34ca0e739f..9544e50483 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -863,6 +863,7 @@ CREATE VIEW pg_stat_activity AS
S.state,
S.backend_xid,
s.backend_xmin,
+ S.allocated_bytes,
S.query_id,
S.query,
S.backend_type
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index ff6149a179..59c9bcf8c4 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -407,6 +407,9 @@ StartAutoVacLauncher(void)
#ifndef EXEC_BACKEND
case 0:
+ /* Zero allocated bytes to avoid double counting parent allocation */
+ pgstat_zero_my_allocated_bytes();
+
/* in postmaster child ... */
InitPostmasterChild();
@@ -1485,6 +1488,9 @@ StartAutoVacWorker(void)
#ifndef EXEC_BACKEND
case 0:
+ /* Zero allocated bytes to avoid double counting parent allocation */
+ pgstat_zero_my_allocated_bytes();
+
/* in postmaster child ... */
InitPostmasterChild();
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 2552327d90..1f09781be8 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -4167,6 +4167,9 @@ BackendStartup(Port *port)
{
free(bn);
+ /* Zero allocated bytes to avoid double counting parent allocation */
+ pgstat_zero_my_allocated_bytes();
+
/* Detangle from postmaster */
InitPostmasterChild();
@@ -5374,6 +5377,11 @@ StartChildProcess(AuxProcType type)
MemoryContextDelete(PostmasterContext);
PostmasterContext = NULL;
+ /* Zero allocated bytes to avoid double counting parent allocation.
+ * Needs to be after the MemoryContextDelete(PostmasterContext) above.
+ */
+ pgstat_zero_my_allocated_bytes();
+
AuxiliaryProcessMain(type); /* does not return */
}
#endif /* EXEC_BACKEND */
@@ -5767,6 +5775,11 @@ do_start_bgworker(RegisteredBgWorker *rw)
MemoryContextDelete(PostmasterContext);
PostmasterContext = NULL;
+ /* Zero allocated bytes to avoid double counting parent allocation.
+ * Needs to be after the MemoryContextDelete(PostmasterContext) above.
+ */
+ pgstat_zero_my_allocated_bytes();
+
StartBackgroundWorker();
exit(1); /* should not get here */
diff --git a/src/backend/postmaster/syslogger.c b/src/backend/postmaster/syslogger.c
index 858a2f6b2b..9081ae140f 100644
--- a/src/backend/postmaster/syslogger.c
+++ b/src/backend/postmaster/syslogger.c
@@ -679,6 +679,9 @@ SysLogger_Start(void)
#ifndef EXEC_BACKEND
case 0:
+ /* Zero allocated bytes to avoid double counting parent allocation */
+ pgstat_zero_my_allocated_bytes();
+
/* in postmaster child ... */
InitPostmasterChild();
diff --git a/src/backend/storage/ipc/dsm_impl.c b/src/backend/storage/ipc/dsm_impl.c
index f0965c3481..22885c7bd2 100644
--- a/src/backend/storage/ipc/dsm_impl.c
+++ b/src/backend/storage/ipc/dsm_impl.c
@@ -66,6 +66,7 @@
#include "postmaster/postmaster.h"
#include "storage/dsm_impl.h"
#include "storage/fd.h"
+#include "utils/backend_status.h"
#include "utils/guc.h"
#include "utils/memutils.h"
@@ -232,6 +233,14 @@ dsm_impl_posix(dsm_op op, dsm_handle handle, Size request_size,
name)));
return false;
}
+
+ /*
+ * Detach and destroy pass through here, only decrease the memory
+ * shown allocated in pg_stat_activity when the creator destroys the
+ * allocation.
+ */
+ if (op == DSM_OP_DESTROY)
+ pgstat_report_allocated_bytes(*mapped_size, PG_ALLOC_DECREASE);
*mapped_address = NULL;
*mapped_size = 0;
if (op == DSM_OP_DESTROY && shm_unlink(name) != 0)
@@ -332,6 +341,36 @@ dsm_impl_posix(dsm_op op, dsm_handle handle, Size request_size,
name)));
return false;
}
+
+ /*
+ * Attach and create pass through here, only update backend memory
+ * allocated in pg_stat_activity for the creator process.
+ */
+ if (op == DSM_OP_CREATE)
+ {
+ /*
+ * Posix creation calls dsm_impl_posix_resize implying that resizing
+ * occurs or may be added in the future. As implemented
+ * dsm_impl_posix_resize utilizes fallocate or truncate, passing the
+ * whole new size as input, growing the allocation as needed (only
+ * truncate supports shrinking). We update by replacing the old
+ * allocation with the new.
+ */
+#if defined(HAVE_POSIX_FALLOCATE) && defined(__linux__)
+ /*
+ * posix_fallocate does not shrink allocations, adjust only on
+ * allocation increase.
+ */
+ if (request_size > *mapped_size)
+ {
+ pgstat_report_allocated_bytes(request_size - *mapped_size,
+ PG_ALLOC_INCREASE);
+ }
+#else
+ pgstat_report_allocated_bytes(*mapped_size, PG_ALLOC_DECREASE);
+ pgstat_report_allocated_bytes(request_size, PG_ALLOC_INCREASE);
+#endif
+ }
*mapped_address = address;
*mapped_size = request_size;
close(fd);
@@ -537,6 +576,14 @@ dsm_impl_sysv(dsm_op op, dsm_handle handle, Size request_size,
name)));
return false;
}
+
+ /*
+ * Detach and destroy pass through here, only decrease the memory
+ * shown allocated in pg_stat_activity when the creator destroys the
+ * allocation.
+ */
+ if (op == DSM_OP_DESTROY)
+ pgstat_report_allocated_bytes(*mapped_size, PG_ALLOC_DECREASE);
*mapped_address = NULL;
*mapped_size = 0;
if (op == DSM_OP_DESTROY && shmctl(ident, IPC_RMID, NULL) < 0)
@@ -584,6 +631,13 @@ dsm_impl_sysv(dsm_op op, dsm_handle handle, Size request_size,
name)));
return false;
}
+
+ /*
+ * Attach and create pass through here, only update backend memory
+ * allocated in pg_stat_activity for the creator process.
+ */
+ if (op == DSM_OP_CREATE)
+ pgstat_report_allocated_bytes(request_size, PG_ALLOC_INCREASE);
*mapped_address = address;
*mapped_size = request_size;
@@ -652,6 +706,13 @@ dsm_impl_windows(dsm_op op, dsm_handle handle, Size request_size,
return false;
}
+ /*
+ * Detach and destroy pass through here, only decrease the memory
+ * shown allocated in pg_stat_activity when the creator destroys the
+ * allocation.
+ */
+ if (op == DSM_OP_DESTROY)
+ pgstat_report_allocated_bytes(*mapped_size, PG_ALLOC_DECREASE);
*impl_private = NULL;
*mapped_address = NULL;
*mapped_size = 0;
@@ -768,6 +829,12 @@ dsm_impl_windows(dsm_op op, dsm_handle handle, Size request_size,
return false;
}
+ /*
+ * Attach and create pass through here, only update backend memory
+ * allocated in pg_stat_activity for the creator process.
+ */
+ if (op == DSM_OP_CREATE)
+ pgstat_report_allocated_bytes(info.RegionSize, PG_ALLOC_INCREASE);
*mapped_address = address;
*mapped_size = info.RegionSize;
*impl_private = hmap;
@@ -812,6 +879,13 @@ dsm_impl_mmap(dsm_op op, dsm_handle handle, Size request_size,
name)));
return false;
}
+
+ /*
+ * Detach and destroy pass through here, only decrease the memory
+ * shown allocated in pg_stat_activity when the creator destroys the
+ * allocation.
+ */
+ pgstat_report_allocated_bytes(*mapped_size, PG_ALLOC_DECREASE);
*mapped_address = NULL;
*mapped_size = 0;
if (op == DSM_OP_DESTROY && unlink(name) != 0)
@@ -933,6 +1007,13 @@ dsm_impl_mmap(dsm_op op, dsm_handle handle, Size request_size,
name)));
return false;
}
+
+ /*
+ * Attach and create pass through here, only update backend memory
+ * allocated in pg_stat_activity for the creator process.
+ */
+ if (op == DSM_OP_CREATE)
+ pgstat_report_allocated_bytes(request_size, PG_ALLOC_INCREASE);
*mapped_address = address;
*mapped_size = request_size;
diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c
index 608d01ea0d..7baf2db57d 100644
--- a/src/backend/utils/activity/backend_status.c
+++ b/src/backend/utils/activity/backend_status.c
@@ -49,6 +49,9 @@ int pgstat_track_activity_query_size = 1024;
/* exposed so that backend_progress.c can access it */
PgBackendStatus *MyBEEntry = NULL;
+/* Memory allocated to this backend prior to pgstats initialization */
+uint64 local_my_allocated_bytes = 0;
+uint64 *my_allocated_bytes = &local_my_allocated_bytes;
static PgBackendStatus *BackendStatusArray = NULL;
static char *BackendAppnameBuffer = NULL;
@@ -400,6 +403,15 @@ pgstat_bestart(void)
lbeentry.st_progress_command_target = InvalidOid;
lbeentry.st_query_id = UINT64CONST(0);
+ /* Alter allocation reporting from local_my_allocated_bytes to shared memory */
+ pgstat_set_allocated_bytes_storage(&MyBEEntry->allocated_bytes);
+
+ /* Populate sum of memory allocated prior to pgstats initialization to pgstats
+ * and zero the local variable.
+ */
+ lbeentry.allocated_bytes += local_my_allocated_bytes;
+ local_my_allocated_bytes = 0;
+
/*
* we don't zero st_progress_param here to save cycles; nobody should
* examine it until st_progress_command has been set to something other
@@ -459,6 +471,11 @@ pgstat_beshutdown_hook(int code, Datum arg)
{
volatile PgBackendStatus *beentry = MyBEEntry;
+ /*
+ * Stop reporting memory allocation changes to &MyBEEntry->allocated_bytes
+ */
+ pgstat_reset_allocated_bytes_storage();
+
/*
* Clear my status entry, following the protocol of bumping st_changecount
* before and after. We use a volatile pointer here to ensure the
@@ -1194,3 +1211,31 @@ pgstat_clip_activity(const char *raw_activity)
return activity;
}
+
+/*
+ * Configure bytes allocated reporting to report allocated bytes to
+ * *allocated_bytes. *allocated_bytes needs to be valid until
+ * pgstat_set_allocated_bytes_storage() is called.
+ *
+ * Expected to be called during backend startup (in pgstat_bestart), to point
+ * my_allocated_bytes into shared memory.
+ */
+void
+pgstat_set_allocated_bytes_storage(uint64 *new_allocated_bytes)
+{
+ my_allocated_bytes = new_allocated_bytes;
+ *new_allocated_bytes = local_my_allocated_bytes;
+}
+
+/*
+ * Reset allocated bytes storage location.
+ *
+ * Expected to be called during backend shutdown, before the location set up
+ * by pgstat_set_allocated_bytes_storage() becomes invalid.
+ */
+void
+pgstat_reset_allocated_bytes_storage(void)
+{
+ my_allocated_bytes = &local_my_allocated_bytes;
+}
+
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index b61a12382b..35fab203d4 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -303,7 +303,7 @@ pg_stat_get_progress_info(PG_FUNCTION_ARGS)
Datum
pg_stat_get_activity(PG_FUNCTION_ARGS)
{
-#define PG_STAT_GET_ACTIVITY_COLS 30
+#define PG_STAT_GET_ACTIVITY_COLS 31
int num_backends = pgstat_fetch_stat_numbackends();
int curr_backend;
int pid = PG_ARGISNULL(0) ? -1 : PG_GETARG_INT32(0);
@@ -359,6 +359,8 @@ pg_stat_get_activity(PG_FUNCTION_ARGS)
else
nulls[16] = true;
+ values[30] = UInt64GetDatum(beentry->allocated_bytes);
+
/* Values only available to role member or pg_read_all_stats */
if (HAS_PGSTAT_PERMISSIONS(beentry->st_userid))
{
diff --git a/src/backend/utils/mmgr/aset.c b/src/backend/utils/mmgr/aset.c
index 2589941ec4..1a2d86239c 100644
--- a/src/backend/utils/mmgr/aset.c
+++ b/src/backend/utils/mmgr/aset.c
@@ -47,6 +47,7 @@
#include "postgres.h"
#include "port/pg_bitutils.h"
+#include "utils/backend_status.h"
#include "utils/memdebug.h"
#include "utils/memutils.h"
#include "utils/memutils_memorychunk.h"
@@ -521,6 +522,7 @@ AllocSetContextCreateInternal(MemoryContext parent,
name);
((MemoryContext) set)->mem_allocated = firstBlockSize;
+ pgstat_report_allocated_bytes(firstBlockSize, PG_ALLOC_INCREASE);
return (MemoryContext) set;
}
@@ -543,6 +545,7 @@ AllocSetReset(MemoryContext context)
AllocSet set = (AllocSet) context;
AllocBlock block;
Size keepersize PG_USED_FOR_ASSERTS_ONLY;
+ uint64 deallocation = 0;
Assert(AllocSetIsValid(set));
@@ -585,6 +588,7 @@ AllocSetReset(MemoryContext context)
{
/* Normal case, release the block */
context->mem_allocated -= block->endptr - ((char *) block);
+ deallocation += block->endptr - ((char *) block);
#ifdef CLOBBER_FREED_MEMORY
wipe_mem(block, block->freeptr - ((char *) block));
@@ -595,6 +599,7 @@ AllocSetReset(MemoryContext context)
}
Assert(context->mem_allocated == keepersize);
+ pgstat_report_allocated_bytes(deallocation, PG_ALLOC_DECREASE);
/* Reset block size allocation sequence, too */
set->nextBlockSize = set->initBlockSize;
@@ -613,6 +618,7 @@ AllocSetDelete(MemoryContext context)
AllocSet set = (AllocSet) context;
AllocBlock block = set->blocks;
Size keepersize PG_USED_FOR_ASSERTS_ONLY;
+ uint64 deallocation = 0;
Assert(AllocSetIsValid(set));
@@ -651,11 +657,13 @@ AllocSetDelete(MemoryContext context)
freelist->first_free = (AllocSetContext *) oldset->header.nextchild;
freelist->num_free--;
+ deallocation += oldset->header.mem_allocated;
/* All that remains is to free the header/initial block */
free(oldset);
}
Assert(freelist->num_free == 0);
+ pgstat_report_allocated_bytes(deallocation, PG_ALLOC_DECREASE);
}
/* Now add the just-deleted context to the freelist. */
@@ -672,7 +680,10 @@ AllocSetDelete(MemoryContext context)
AllocBlock next = block->next;
if (block != set->keeper)
+ {
context->mem_allocated -= block->endptr - ((char *) block);
+ deallocation += block->endptr - ((char *) block);
+ }
#ifdef CLOBBER_FREED_MEMORY
wipe_mem(block, block->freeptr - ((char *) block));
@@ -685,6 +696,7 @@ AllocSetDelete(MemoryContext context)
}
Assert(context->mem_allocated == keepersize);
+ pgstat_report_allocated_bytes(deallocation + context->mem_allocated, PG_ALLOC_DECREASE);
/* Finally, free the context header, including the keeper block */
free(set);
@@ -734,6 +746,7 @@ AllocSetAlloc(MemoryContext context, Size size)
return NULL;
context->mem_allocated += blksize;
+ pgstat_report_allocated_bytes(blksize, PG_ALLOC_INCREASE);
block->aset = set;
block->freeptr = block->endptr = ((char *) block) + blksize;
@@ -944,6 +957,7 @@ AllocSetAlloc(MemoryContext context, Size size)
return NULL;
context->mem_allocated += blksize;
+ pgstat_report_allocated_bytes(blksize, PG_ALLOC_INCREASE);
block->aset = set;
block->freeptr = ((char *) block) + ALLOC_BLOCKHDRSZ;
@@ -1041,6 +1055,7 @@ AllocSetFree(void *pointer)
block->next->prev = block->prev;
set->header.mem_allocated -= block->endptr - ((char *) block);
+ pgstat_report_allocated_bytes(block->endptr - ((char *) block), PG_ALLOC_DECREASE);
#ifdef CLOBBER_FREED_MEMORY
wipe_mem(block, block->freeptr - ((char *) block));
@@ -1171,7 +1186,9 @@ AllocSetRealloc(void *pointer, Size size)
/* updated separately, not to underflow when (oldblksize > blksize) */
set->header.mem_allocated -= oldblksize;
+ pgstat_report_allocated_bytes(oldblksize, PG_ALLOC_DECREASE);
set->header.mem_allocated += blksize;
+ pgstat_report_allocated_bytes(blksize, PG_ALLOC_INCREASE);
block->freeptr = block->endptr = ((char *) block) + blksize;
diff --git a/src/backend/utils/mmgr/generation.c b/src/backend/utils/mmgr/generation.c
index ebcb61e9b6..b06fb0c6a4 100644
--- a/src/backend/utils/mmgr/generation.c
+++ b/src/backend/utils/mmgr/generation.c
@@ -37,6 +37,7 @@
#include "lib/ilist.h"
#include "port/pg_bitutils.h"
+#include "utils/backend_status.h"
#include "utils/memdebug.h"
#include "utils/memutils.h"
#include "utils/memutils_memorychunk.h"
@@ -267,6 +268,7 @@ GenerationContextCreate(MemoryContext parent,
name);
((MemoryContext) set)->mem_allocated = firstBlockSize;
+ pgstat_report_allocated_bytes(firstBlockSize, PG_ALLOC_INCREASE);
return (MemoryContext) set;
}
@@ -283,6 +285,7 @@ GenerationReset(MemoryContext context)
{
GenerationContext *set = (GenerationContext *) context;
dlist_mutable_iter miter;
+ uint64 deallocation = 0;
Assert(GenerationIsValid(set));
@@ -305,9 +308,14 @@ GenerationReset(MemoryContext context)
if (block == set->keeper)
GenerationBlockMarkEmpty(block);
else
+ {
+ deallocation += block->blksize;
GenerationBlockFree(set, block);
+ }
}
+ pgstat_report_allocated_bytes(deallocation, PG_ALLOC_DECREASE);
+
/* set it so new allocations to make use of the keeper block */
set->block = set->keeper;
@@ -328,6 +336,9 @@ GenerationDelete(MemoryContext context)
{
/* Reset to release all releasable GenerationBlocks */
GenerationReset(context);
+
+ pgstat_report_allocated_bytes(context->mem_allocated, PG_ALLOC_DECREASE);
+
/* And free the context header and keeper block */
free(context);
}
@@ -374,6 +385,7 @@ GenerationAlloc(MemoryContext context, Size size)
return NULL;
context->mem_allocated += blksize;
+ pgstat_report_allocated_bytes(blksize, PG_ALLOC_INCREASE);
/* block with a single (used) chunk */
block->context = set;
@@ -477,6 +489,7 @@ GenerationAlloc(MemoryContext context, Size size)
return NULL;
context->mem_allocated += blksize;
+ pgstat_report_allocated_bytes(blksize, PG_ALLOC_INCREASE);
/* initialize the new block */
GenerationBlockInit(set, block, blksize);
@@ -729,6 +742,8 @@ GenerationFree(void *pointer)
dlist_delete(&block->node);
set->header.mem_allocated -= block->blksize;
+ pgstat_report_allocated_bytes(block->blksize, PG_ALLOC_DECREASE);
+
free(block);
}
diff --git a/src/backend/utils/mmgr/slab.c b/src/backend/utils/mmgr/slab.c
index 33dca0f37c..15d3380640 100644
--- a/src/backend/utils/mmgr/slab.c
+++ b/src/backend/utils/mmgr/slab.c
@@ -69,6 +69,7 @@
#include "postgres.h"
#include "lib/ilist.h"
+#include "utils/backend_status.h"
#include "utils/memdebug.h"
#include "utils/memutils.h"
#include "utils/memutils_memorychunk.h"
@@ -413,6 +414,13 @@ SlabContextCreate(MemoryContext parent,
parent,
name);
+ /*
+ * If SlabContextCreate is updated to add context header size to
+ * context->mem_allocated, then update here and SlabDelete appropriately
+ */
+ pgstat_report_allocated_bytes(Slab_CONTEXT_HDRSZ(slab->chunksPerBlock),
+ PG_ALLOC_INCREASE);
+
return (MemoryContext) slab;
}
@@ -429,6 +437,7 @@ SlabReset(MemoryContext context)
SlabContext *slab = (SlabContext *) context;
dlist_mutable_iter miter;
int i;
+ uint64 deallocation = 0;
Assert(SlabIsValid(slab));
@@ -449,6 +458,7 @@ SlabReset(MemoryContext context)
#endif
free(block);
context->mem_allocated -= slab->blockSize;
+ deallocation += slab->blockSize;
}
/* walk over blocklist and free the blocks */
@@ -465,9 +475,11 @@ SlabReset(MemoryContext context)
#endif
free(block);
context->mem_allocated -= slab->blockSize;
+ deallocation += slab->blockSize;
}
}
+ pgstat_report_allocated_bytes(deallocation, PG_ALLOC_DECREASE);
slab->curBlocklistIndex = 0;
Assert(context->mem_allocated == 0);
@@ -480,8 +492,17 @@ SlabReset(MemoryContext context)
void
SlabDelete(MemoryContext context)
{
+
/* Reset to release all the SlabBlocks */
SlabReset(context);
+
+ /*
+ * Until context header allocation is included in context->mem_allocated,
+ * cast to slab and decrement the header allocation
+ */
+ pgstat_report_allocated_bytes(Slab_CONTEXT_HDRSZ(((SlabContext *)context)->chunksPerBlock),
+ PG_ALLOC_DECREASE);
+
/* And free the context header */
free(context);
}
@@ -546,6 +567,7 @@ SlabAlloc(MemoryContext context, Size size)
block->slab = slab;
context->mem_allocated += slab->blockSize;
+ pgstat_report_allocated_bytes(slab->blockSize, PG_ALLOC_INCREASE);
/* use the first chunk in the new block */
chunk = SlabBlockGetChunk(slab, block, 0);
@@ -732,6 +754,7 @@ SlabFree(void *pointer)
#endif
free(block);
slab->header.mem_allocated -= slab->blockSize;
+ pgstat_report_allocated_bytes(slab->blockSize, PG_ALLOC_DECREASE);
}
/*
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 505595620e..cd3896869e 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5404,9 +5404,9 @@
proname => 'pg_stat_get_activity', prorows => '100', proisstrict => 'f',
proretset => 't', provolatile => 's', proparallel => 'r',
prorettype => 'record', proargtypes => 'int4',
- proallargtypes => '{int4,oid,int4,oid,text,text,text,text,text,timestamptz,timestamptz,timestamptz,timestamptz,inet,text,int4,xid,xid,text,bool,text,text,int4,text,numeric,text,bool,text,bool,int4,int8}',
- proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
- proargnames => '{pid,datid,pid,usesysid,application_name,state,query,wait_event_type,wait_event,xact_start,query_start,backend_start,state_change,client_addr,client_hostname,client_port,backend_xid,backend_xmin,backend_type,ssl,sslversion,sslcipher,sslbits,ssl_client_dn,ssl_client_serial,ssl_issuer_dn,gss_auth,gss_princ,gss_enc,leader_pid,query_id}',
+ proallargtypes => '{int4,oid,int4,oid,text,text,text,text,text,timestamptz,timestamptz,timestamptz,timestamptz,inet,text,int4,xid,xid,text,bool,text,text,int4,text,numeric,text,bool,text,bool,int4,int8,int8}',
+ proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{pid,datid,pid,usesysid,application_name,state,query,wait_event_type,wait_event,xact_start,query_start,backend_start,state_change,client_addr,client_hostname,client_port,backend_xid,backend_xmin,backend_type,ssl,sslversion,sslcipher,sslbits,ssl_client_dn,ssl_client_serial,ssl_issuer_dn,gss_auth,gss_princ,gss_enc,leader_pid,query_id,allocated_bytes}',
prosrc => 'pg_stat_get_activity' },
{ oid => '3318',
descr => 'statistics: information about progress of backends running maintenance command',
diff --git a/src/include/utils/backend_status.h b/src/include/utils/backend_status.h
index f7bd83113a..754ff0dc62 100644
--- a/src/include/utils/backend_status.h
+++ b/src/include/utils/backend_status.h
@@ -15,6 +15,7 @@
#include "miscadmin.h" /* for BackendType */
#include "storage/backendid.h"
#include "utils/backend_progress.h"
+#include "common/int.h"
/* ----------
@@ -32,6 +33,13 @@ typedef enum BackendState
STATE_DISABLED
} BackendState;
+/* Enum helper for reporting memory allocated bytes */
+enum allocation_direction
+{
+ PG_ALLOC_DECREASE = -1,
+ PG_ALLOC_IGNORE,
+ PG_ALLOC_INCREASE,
+};
/* ----------
* Shared-memory data structures
@@ -169,6 +177,9 @@ typedef struct PgBackendStatus
/* query identifier, optionally computed using post_parse_analyze_hook */
uint64 st_query_id;
+
+ /* Current memory allocated to this backend */
+ uint64 allocated_bytes;
} PgBackendStatus;
@@ -293,6 +304,7 @@ extern PGDLLIMPORT int pgstat_track_activity_query_size;
* ----------
*/
extern PGDLLIMPORT PgBackendStatus *MyBEEntry;
+extern PGDLLIMPORT uint64 *my_allocated_bytes;
/* ----------
@@ -324,7 +336,8 @@ extern const char *pgstat_get_backend_current_activity(int pid, bool checkUser);
extern const char *pgstat_get_crashed_backend_activity(int pid, char *buffer,
int buflen);
extern uint64 pgstat_get_my_query_id(void);
-
+extern void pgstat_set_allocated_bytes_storage(uint64 *allocated_bytes);
+extern void pgstat_reset_allocated_bytes_storage(void);
/* ----------
* Support functions for the SQL-callable functions to
@@ -336,5 +349,53 @@ extern PgBackendStatus *pgstat_fetch_stat_beentry(BackendId beid);
extern LocalPgBackendStatus *pgstat_fetch_stat_local_beentry(int beid);
extern char *pgstat_clip_activity(const char *raw_activity);
+/* ----------
+ * pgstat_report_allocated_bytes() -
+ *
+ * Called to report change in memory allocated for this backend.
+ *
+ * my_allocated_bytes initially points to local memory, making it safe to call
+ * this before pgstats has been initialized. allocation_direction is a
+ * positive/negative multiplier enum defined above.
+ * ----------
+ */
+static inline void
+pgstat_report_allocated_bytes(int64 allocated_bytes, int allocation_direction)
+{
+ uint64 temp;
+
+ /*
+ * Avoid *my_allocated_bytes unsigned integer overflow on
+ * PG_ALLOC_DECREASE
+ */
+ if (allocation_direction == PG_ALLOC_DECREASE &&
+ pg_sub_u64_overflow(*my_allocated_bytes, allocated_bytes, &temp))
+ {
+ *my_allocated_bytes = 0;
+ ereport(LOG,
+ errmsg("Backend %d deallocated %lld bytes, exceeding the %llu bytes it is currently reporting allocated. Setting reported to 0.",
+ MyProcPid, (long long) allocated_bytes,
+ (unsigned long long) *my_allocated_bytes));
+ }
+ else
+ *my_allocated_bytes += (allocated_bytes) * allocation_direction;
+
+ return;
+}
+
+/* ---------
+ * pgstat_zero_my_allocated_bytes() -
+ *
+ * Called to zero out local allocated bytes variable after fork to avoid double
+ * counting allocations.
+ * ---------
+ */
+static inline void
+pgstat_zero_my_allocated_bytes(void)
+{
+ *my_allocated_bytes = 0;
+
+ return;
+}
#endif /* BACKEND_STATUS_H */
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index e953d1f515..271648619a 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1756,10 +1756,11 @@ pg_stat_activity| SELECT s.datid,
s.state,
s.backend_xid,
s.backend_xmin,
+ s.allocated_bytes,
s.query_id,
s.query,
s.backend_type
- FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, query_id)
+ FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, query_id, allocated_bytes)
LEFT JOIN pg_database d ON ((s.datid = d.oid)))
LEFT JOIN pg_authid u ON ((s.usesysid = u.oid)));
pg_stat_all_indexes| SELECT c.oid AS relid,
@@ -1874,7 +1875,7 @@ pg_stat_gssapi| SELECT pid,
gss_auth AS gss_authenticated,
gss_princ AS principal,
gss_enc AS encrypted
- FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, query_id)
+ FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, query_id, allocated_bytes)
WHERE (client_port IS NOT NULL);
pg_stat_io| SELECT backend_type,
io_object,
@@ -2067,7 +2068,7 @@ pg_stat_replication| SELECT s.pid,
w.sync_priority,
w.sync_state,
w.reply_time
- FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, query_id)
+ FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, query_id, allocated_bytes)
JOIN pg_stat_get_wal_senders() w(pid, state, sent_lsn, write_lsn, flush_lsn, replay_lsn, write_lag, flush_lag, replay_lag, sync_priority, sync_state, reply_time) ON ((s.pid = w.pid)))
LEFT JOIN pg_authid u ON ((s.usesysid = u.oid)));
pg_stat_replication_slots| SELECT s.slot_name,
@@ -2101,7 +2102,7 @@ pg_stat_ssl| SELECT pid,
ssl_client_dn AS client_dn,
ssl_client_serial AS client_serial,
ssl_issuer_dn AS issuer_dn
- FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, query_id)
+ FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, query_id, allocated_bytes)
WHERE (client_port IS NOT NULL);
pg_stat_subscription| SELECT su.oid AS subid,
su.subname,
diff --git a/src/test/regress/expected/stats.out b/src/test/regress/expected/stats.out
index 937b2101b3..2a848c02a1 100644
--- a/src/test/regress/expected/stats.out
+++ b/src/test/regress/expected/stats.out
@@ -1354,4 +1354,15 @@ SELECT :io_stats_post_reset < :io_stats_pre_reset;
t
(1 row)
+-- ensure that allocated_bytes exist for backends
+SELECT allocated_bytes > 0 AS result FROM pg_stat_activity WHERE backend_type
+IN ('checkpointer', 'background writer', 'walwriter', 'autovacuum launcher');
+ result
+--------
+ t
+ t
+ t
+ t
+(4 rows)
+
-- End of Stats Test
diff --git a/src/test/regress/sql/stats.sql b/src/test/regress/sql/stats.sql
index 74e592aa8a..568560c361 100644
--- a/src/test/regress/sql/stats.sql
+++ b/src/test/regress/sql/stats.sql
@@ -535,7 +535,6 @@ SET enable_seqscan TO on;
SELECT pg_stat_get_replication_slot(NULL);
SELECT pg_stat_get_subscription_stats(NULL);
-
-- Test that the following operations are tracked in pg_stat_io:
-- - reads of target blocks into shared buffers
-- - writes of shared buffers to permanent storage
@@ -678,4 +677,8 @@ SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) +
FROM pg_stat_io \gset
SELECT :io_stats_post_reset < :io_stats_pre_reset;
+-- ensure that allocated_bytes exist for backends
+SELECT allocated_bytes > 0 AS result FROM pg_stat_activity WHERE backend_type
+IN ('checkpointer', 'background writer', 'walwriter', 'autovacuum launcher');
+
-- End of Stats Test
--
2.25.1
On 2023-03-02 14:41:26 -0500, reid.thompson@crunchydata.com wrote:
Patch has been rebased to master.
Quite a few prior review comments seem to not have been
addressed. There's not much point in posting new versions without that.
I think there's zero chance 0002 can make it into 16. If 0001 is cleaned
up, I can see a path.
Updated patches attached.
====================================================================
pg-stat-activity-backend-memory-allocated
====================================================================
DSM allocations created by a process and not destroyed prior to it's exit are
considered long lived and are tracked in global_dsm_allocated_bytes.
created 2 new system views (see below):
pg_stat_global_memory_allocation view displays datid, shared_memory_size,
shared_memory_size_in_huge_pages, global_dsm_allocated_bytes. shared_memory_size
and shared_memory_size_in_huge_pages display the calculated read only values for
these GUCs.
pg_stat_memory_allocation view
Migrated allocated_bytes out of pg_stat_activity view into this view.
pg_stat_memory_allocation also contains a breakdown of allocation by allocator
type (aset, dsm, generation, slab). View displays datid, pid, allocated_bytes,
aset_allocated_bytes, dsm_allocated_bytes, generation_allocated_bytes,
slab_allocated_bytes by process.
Reduced calls to initialize allocation counters by moving
intialization call into InitPostmasterChild.
postgres=# select * from pg_stat_global_memory_allocation;
datid | shared_memory_size | shared_memory_size_in_huge_pages | global_dsm_allocated_bytes
-------+--------------------+----------------------------------+----------------------------
5 | 192MB | 96 | 1048576
(1 row)
postgres=# select * from pg_stat_memory_allocation;
datid | pid | allocated_bytes | aset_allocated_bytes | dsm_allocated_bytes | generation_allocated_bytes | slab_allocated_bytes
-------+--------+-----------------+----------------------+---------------------+----------------------------+----------------------
| 981842 | 771512 | 771512 | 0 | 0 | 0
| 981843 | 736696 | 736696 | 0 | 0 | 0
5 | 981913 | 4274792 | 4274792 | 0 | 0 | 0
| 981838 | 107216 | 107216 | 0 | 0 | 0
| 981837 | 123600 | 123600 | 0 | 0 | 0
| 981841 | 107216 | 107216 | 0 | 0 | 0
(6 rows)
postgres=# select ps.datid, ps.pid, state,application_name,backend_type, pa.* from pg_stat_activity ps join pg_stat_memory_allocation pa on (pa.pid = ps.pid) order by dsm_allocated_bytes, pa.pid;
datid | pid | state | application_name | backend_type | datid | pid | allocated_bytes | aset_allocated_bytes | dsm_allocated_bytes | generation_allocated_bytes | slab_allocated_bytes
-------+--------+--------+------------------+------------------------------+-------+--------+-----------------+----------------------+---------------------+----------------------------+----------------------
| 981837 | | | checkpointer | | 981837 | 123600 | 123600 | 0 | 0 | 0
| 981838 | | | background writer | | 981838 | 107216 | 107216 | 0 | 0 | 0
| 981841 | | | walwriter | | 981841 | 107216 | 107216 | 0 | 0 | 0
| 981842 | | | autovacuum launcher | | 981842 | 771512 | 771512 | 0 | 0 | 0
| 981843 | | | logical replication launcher | | 981843 | 736696 | 736696 | 0 | 0 | 0
5 | 981913 | active | psql | client backend | 5 | 981913 | 5390864 | 5382824 | 0 | 8040 | 0
(6 rows)
====================================================================
dev-max-memory
====================================================================
Include shared_memory_size in max_total_backend_memory calculations.
max_total_backend_memory is reduced by shared_memory_size at startup.
Local allowance is refilled when consumed from global
max_total_bkend_mem_bytes_available.
pg_stat_global_memory_allocation view
add columns max_total_backend_memory_bytes, max_total_bkend_mem_bytes_available.
max_total_backend_memory_bytes displays a byte representation of
max_total_backend_memory. max_total_bkend_mem_bytes_available tracks the balance
of max_total_backend_memory_bytes available to backend processes.
postgres=# select * from pg_stat_global_memory_allocation;
datid | shared_memory_size | shared_memory_size_in_huge_pages | max_total_backend_memory_bytes | max_total_bkend_mem_bytes_available | global_dsm_allocated_bytes
-------+--------------------+----------------------------------+--------------------------------+-------------------------------------+----------------------------
5 | 192MB | 96 | 2147483648 | 1874633712 | 5242880
(1 row)
postgres=# select * from pg_stat_memory_allocation ;
datid | pid | allocated_bytes | aset_allocated_bytes | dsm_allocated_bytes | generation_allocated_bytes | slab_allocated_bytes
-------+--------+-----------------+----------------------+---------------------+----------------------------+----------------------
| 534528 | 812472 | 812472 | 0 | 0 | 0
| 534529 | 736696 | 736696 | 0 | 0 | 0
5 | 556271 | 4458088 | 4458088 | 0 | 0 | 0
5 | 534942 | 1298680 | 1298680 | 0 | 0 | 0
5 | 709283 | 7985464 | 7985464 | 0 | 0 | 0
5 | 718693 | 8809240 | 8612504 | 196736 | 0 | 0
5 | 752113 | 25803192 | 25803192 | 0 | 0 | 0
5 | 659886 | 9042232 | 9042232 | 0 | 0 | 0
| 534525 | 2491088 | 2491088 | 0 | 0 | 0
| 534524 | 4465360 | 4465360 | 0 | 0 | 0
| 534527 | 107216 | 107216 | 0 | 0 | 0
(11 rows)
postgres=# select ps.datid, ps.pid, state,application_name,backend_type, pa.* from pg_stat_activity ps join pg_stat_memory_allocation pa on (pa.pid = ps.pid) order by dsm_allocated_bytes, pa.pid;
datid | pid | state | application_name | backend_type | datid | pid | allocated_bytes | aset_allocated_bytes | dsm_allocated_bytes | generation_allocated_bytes | slab_allocated_bytes
-------+--------+--------+------------------+------------------------------+-------+--------+-----------------+----------------------+---------------------+----------------------------+----------------------
| 534524 | | | checkpointer | | 534524 | 4465360 | 4465360 | 0 | 0 | 0
| 534525 | | | background writer | | 534525 | 2491088 | 2491088 | 0 | 0 | 0
| 534527 | | | walwriter | | 534527 | 107216 | 107216 | 0 | 0 | 0
| 534528 | | | autovacuum launcher | | 534528 | 812472 | 812472 | 0 | 0 | 0
| 534529 | | | logical replication launcher | | 534529 | 736696 | 736696 | 0 | 0 | 0
5 | 534942 | idle | psql | client backend | 5 | 534942 | 1298680 | 1298680 | 0 | 0 | 0
5 | 556271 | active | psql | client backend | 5 | 556271 | 4866576 | 4858536 | 0 | 8040 | 0
5 | 659886 | active | | autovacuum worker | 5 | 659886 | 8993080 | 8993080 | 0 | 0 | 0
5 | 709283 | active | | autovacuum worker | 5 | 709283 | 7928120 | 7928120 | 0 | 0 | 0
5 | 752113 | active | | autovacuum worker | 5 | 752113 | 27935608 | 27935608 | 0 | 0 | 0
5 | 718693 | active | psql | client backend | 5 | 718693 | 8669976 | 8473240 | 196736 | 0 | 0
(11 rows)
Attachments:
0002-Add-the-ability-to-limit-the-amount-of-memory-that-c.patchtext/x-patch; charset=UTF-8; name=0002-Add-the-ability-to-limit-the-amount-of-memory-that-c.patchDownload
From 4dd47f04764b5df9c3962d9fdb4096398bf85dfd Mon Sep 17 00:00:00 2001
From: Reid Thompson <jreidthompson@nc.rr.com>
Date: Sat, 4 Jun 2022 22:23:59 -0400
Subject: [PATCH 2/2] Add the ability to limit the amount of memory that can be
allocated to backends.
This builds on the work that adds backend memory allocated tracking.
Add GUC variable max_total_backend_memory.
Specifies a limit to the amount of memory (in MB) that may be allocated to
backends in total (i.e. this is not a per user or per backend limit). If unset,
or set to 0 it is disabled. It is intended as a resource to help avoid the OOM
killer on LINUX and manage resources in general. A backend request that would
exhaust max_total_backend_memory memory will be denied with an out of memory
error causing that backend's current query/transaction to fail. Further
requests will not be allocated until dropping below the limit. Keep this in
mind when setting this value. Due to the dynamic nature of memory allocations,
this limit is not exact. This limit does not affect auxiliary backend
processes. Backend memory allocations are displayed in the
pg_stat_memory_allocation and pg_stat_global_memory_allocation views.
---
doc/src/sgml/config.sgml | 28 +++
doc/src/sgml/monitoring.sgml | 48 ++++-
src/backend/catalog/system_views.sql | 6 +-
src/backend/storage/ipc/dsm_impl.c | 18 ++
src/backend/storage/lmgr/proc.c | 45 +++++
src/backend/utils/activity/backend_status.c | 173 ++++++++++++++++++
src/backend/utils/adt/pgstatfuncs.c | 16 +-
src/backend/utils/hash/dynahash.c | 3 +-
src/backend/utils/misc/guc_tables.c | 11 ++
src/backend/utils/misc/postgresql.conf.sample | 3 +
src/backend/utils/mmgr/aset.c | 33 ++++
src/backend/utils/mmgr/generation.c | 16 ++
src/backend/utils/mmgr/slab.c | 16 +-
src/include/catalog/pg_proc.dat | 6 +-
src/include/storage/proc.h | 7 +
src/include/utils/backend_status.h | 87 ++++++++-
src/test/regress/expected/rules.out | 4 +-
17 files changed, 499 insertions(+), 21 deletions(-)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 481f93cea1..9f37f6f070 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2113,6 +2113,34 @@ include_dir 'conf.d'
</listitem>
</varlistentry>
+ <varlistentry id="guc-max-total-backend-memory" xreflabel="max_total_backend_memory">
+ <term><varname>max_total_backend_memory</varname> (<type>integer</type>)
+ <indexterm>
+ <primary><varname>max_total_backend_memory</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Specifies a limit to the amount of memory (MB) that may be allocated to
+ backends in total (i.e. this is not a per user or per backend limit).
+ If unset, or set to 0 it is disabled. At databse startup
+ max_total_backend_memory is reduced by shared_memory_size_mb
+ (shared buffers). Each backend process is intialized with a 1MB local
+ allowance which also reduces max_total_bkend_mem_bytes_available. Keep
+ this in mind when setting this value. A backend request that would
+ exhaust the limit will be denied with an out of memory error causing
+ that backend's current query/transaction to fail. Further requests will
+ not be allocated until dropping below the limit. This limit does not
+ affect auxiliary backend processes
+ <xref linkend="glossary-auxiliary-proc"/>. Backend memory allocations
+ (<varname>allocated_bytes</varname>) are displayed in the
+ <link linkend="monitoring-pg-stat-memory-allocation-view"><structname>pg_stat_memory_allocation</structname></link>
+ view. Due to the dynamic nature of memory allocations, this limit is
+ not exact.
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
</sect2>
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index d943821071..a67bd484f2 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -5643,9 +5643,13 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
<para>
The <structname>pg_stat_memory_allocation</structname> view will have one
row per server process, showing information related to the current memory
- allocation of that process. Use <function>pg_size_pretty</function>
- described in <xref linkend="functions-admin-dbsize"/> to make these values
- more easily readable.
+ allocation of that process in total and by allocator type. Dynamic shared
+ memory allocations are included only in the value displayed for the backend
+ that created them, they are not included in the value for backends that are
+ attached to them to avoid double counting. Use
+ <function>pg_size_pretty</function> described in
+ <xref linkend="functions-admin-dbsize"/> to make these values more easily
+ readable.
</para>
<table id="pg-stat-memory-allocation-view" xreflabel="pg_stat_memory_allocation">
@@ -5687,10 +5691,7 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
</para>
<para>
Memory currently allocated to this backend in bytes. This is the balance
- of bytes allocated and freed by this backend. Dynamic shared memory
- allocations are included only in the value displayed for the backend that
- created them, they are not included in the value for backends that are
- attached to them to avoid double counting.
+ of bytes allocated and freed by this backend.
</para></entry>
</row>
@@ -5803,6 +5804,39 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
</para></entry>
</row>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>max_total_backend_memory_bytes</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Reports the user defined backend maximum allowed shared memory in bytes.
+ 0 if disabled or not set. See
+ <xref linkend="guc-max-total-backend-memory"/>.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>max_total_bkend_mem_bytes_available</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Tracks max_total_backend_memory (in bytes) available for allocation. At
+ database startup, max_total_bkend_mem_bytes_available is reduced by the
+ byte equivalent of shared_memory_size_mb. Each backend process is
+ intialized with a 1MB local allowance which also reduces
+ max_total_bkend_mem_bytes_available. A process's allocation requests
+ reduce it's local allowance. If a process's allocation request exceeds
+ it's remaining allowance, an attempt is made to refill the local
+ allowance from max_total_bkend_mem_bytes_available. If the refill request
+ fails, then the requesting process will fail with an out of memory error
+ resulting in the cancellation of that process's active query/transaction.
+ The default refill allocation quantity is 1MB. If a request is greater
+ than 1MB, an attempt will be made to allocate the full amount. If
+ max_total_backend_memory is disabled, this will be -1.
+ <xref linkend="guc-max-total-backend-memory"/>.
+ </para></entry>
+ </row>
+
<row>
<entry role="catalog_table_entry"><para role="column_definition">
<structfield>global_dsm_allocated_bytes</structfield> <type>bigint</type>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 4bbd992311..86bde2a44c 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1346,8 +1346,10 @@ CREATE VIEW pg_stat_memory_allocation AS
CREATE VIEW pg_stat_global_memory_allocation AS
SELECT
S.datid AS datid,
- current_setting('shared_memory_size'::text, true) AS shared_memory_size,
- (current_setting('shared_memory_size_in_huge_pages'::text, true))::integer AS shared_memory_size_in_huge_pages,
+ current_setting('shared_memory_size', true) as shared_memory_size,
+ (current_setting('shared_memory_size_in_huge_pages', true))::integer as shared_memory_size_in_huge_pages,
+ pg_size_bytes(current_setting('max_total_backend_memory', true)) as max_total_backend_memory_bytes,
+ S.max_total_bkend_mem_bytes_available,
S.global_dsm_allocated_bytes
FROM pg_stat_get_global_memory_allocation() AS S
LEFT JOIN pg_database AS D ON (S.datid = D.oid);
diff --git a/src/backend/storage/ipc/dsm_impl.c b/src/backend/storage/ipc/dsm_impl.c
index 16e2bded59..68780de717 100644
--- a/src/backend/storage/ipc/dsm_impl.c
+++ b/src/backend/storage/ipc/dsm_impl.c
@@ -254,6 +254,16 @@ dsm_impl_posix(dsm_op op, dsm_handle handle, Size request_size,
return true;
}
+ /* Do not exceed maximum allowed memory allocation */
+ if (op == DSM_OP_CREATE && exceeds_max_total_bkend_mem(request_size))
+ {
+ ereport(elevel,
+ (errcode_for_dynamic_shared_memory(),
+ errmsg("out of memory for segment \"%s\" - exceeds max_total_backend_memory: %m",
+ name)));
+ return false;
+ }
+
/*
* Create new segment or open an existing one for attach.
*
@@ -522,6 +532,10 @@ dsm_impl_sysv(dsm_op op, dsm_handle handle, Size request_size,
int flags = IPCProtection;
size_t segsize;
+ /* Do not exceed maximum allowed memory allocation */
+ if (op == DSM_OP_CREATE && exceeds_max_total_bkend_mem(request_size))
+ return false;
+
/*
* Allocate the memory BEFORE acquiring the resource, so that we don't
* leak the resource if memory allocation fails.
@@ -716,6 +730,10 @@ dsm_impl_windows(dsm_op op, dsm_handle handle, Size request_size,
return true;
}
+ /* Do not exceed maximum allowed memory allocation */
+ if (op == DSM_OP_CREATE && exceeds_max_total_bkend_mem(request_size))
+ return false;
+
/* Create new segment or open an existing one for attach. */
if (op == DSM_OP_CREATE)
{
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index d86fbdfd9b..80db49d775 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -51,6 +51,7 @@
#include "storage/procsignal.h"
#include "storage/spin.h"
#include "storage/standby.h"
+#include "utils/guc.h"
#include "utils/timeout.h"
#include "utils/timestamp.h"
@@ -182,6 +183,50 @@ InitProcGlobal(void)
pg_atomic_init_u32(&ProcGlobal->clogGroupFirst, INVALID_PGPROCNO);
pg_atomic_init_u64(&ProcGlobal->global_dsm_allocation, 0);
+ /* Setup backend memory limiting if configured */
+ if (max_total_bkend_mem > 0)
+ {
+ /*
+ * Convert max_total_bkend_mem to bytes, account for shared_memory_size,
+ * and initialize max_total_bkend_mem_bytes.
+ */
+ int result = 0;
+
+ /* Get integer value of shared_memory_size */
+ if (parse_int(GetConfigOption("shared_memory_size", true, false), &result, 0, NULL))
+ {
+ /*
+ * Error on startup if backend memory limit is less than shared
+ * memory size. Warn on startup if backend memory available is less
+ * than arbitrarily picked value of 100MB.
+ */
+
+ if (max_total_bkend_mem - result <= 0)
+ {
+ ereport(ERROR,
+ errmsg("configured max_total_backend_memory %dMB is <= shared_memory_size %dMB",
+ max_total_bkend_mem, result),
+ errhint("Disable or increase the configuration parameter \"max_total_backend_memory\"."));
+ }
+ else if (max_total_bkend_mem - result <= 100)
+ {
+ ereport(WARNING,
+ errmsg("max_total_backend_memory %dMB - shared_memory_size %dMB is <= 100MB",
+ max_total_bkend_mem, result),
+ errhint("Consider increasing the configuration parameter \"max_total_backend_memory\"."));
+ }
+
+ /*
+ * Account for shared memory size and initialize
+ * max_total_bkend_mem_bytes.
+ */
+ pg_atomic_init_u64(&ProcGlobal->max_total_bkend_mem_bytes,
+ max_total_bkend_mem * 1024 * 1024 - result * 1024 * 1024);
+ }
+ else
+ ereport(ERROR, errmsg("max_total_backend_memory initialization is unable to parse shared_memory_size"));
+ }
+
/*
* Create and initialize all the PGPROC structures we'll need. There are
* five separate consumers: (1) normal backends, (2) autovacuum workers
diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c
index f921c4bbde..a4f9c6eb35 100644
--- a/src/backend/utils/activity/backend_status.c
+++ b/src/backend/utils/activity/backend_status.c
@@ -45,6 +45,12 @@
bool pgstat_track_activities = false;
int pgstat_track_activity_query_size = 1024;
+/*
+ * Max backend memory allocation allowed (MB). 0 = disabled.
+ * Centralized bucket ProcGlobal->max_total_bkend_mem is initialized
+ * as a byte representation of this value in InitProcGlobal().
+ */
+int max_total_bkend_mem = 0;
/* exposed so that backend_progress.c can access it */
PgBackendStatus *MyBEEntry = NULL;
@@ -68,6 +74,31 @@ uint64 *my_generation_allocated_bytes = &local_my_generation_allocated_bytes;
uint64 local_my_slab_allocated_bytes = 0;
uint64 *my_slab_allocated_bytes = &local_my_slab_allocated_bytes;
+/*
+ * Define initial allocation allowance for a backend.
+ *
+ * NOTE: initial_allocation_allowance && allocation_allowance_refill_qty
+ * may be candidates for future GUC variables. Arbitrary 1MB selected initially.
+ */
+uint64 initial_allocation_allowance = 1024 * 1024;
+uint64 allocation_allowance_refill_qty = 1024 * 1024;
+
+/*
+ * Local counter to manage shared memory allocations. At backend startup, set to
+ * initial_allocation_allowance via pgstat_init_allocated_bytes(). Decrease as
+ * memory is malloc'd. When exhausted, atomically refill if available from
+ * ProcGlobal->max_total_bkend_mem via exceeds_max_total_bkend_mem().
+ */
+uint64 allocation_allowance = 0;
+
+/*
+ * Local counter of free'd shared memory. Return to global
+ * max_total_bkend_mem when return threshold is met. Arbitrary 1MB bytes
+ * selected initially.
+ */
+uint64 allocation_return = 0;
+uint64 allocation_return_threshold = 1024 * 1024;
+
static PgBackendStatus *BackendStatusArray = NULL;
static char *BackendAppnameBuffer = NULL;
static char *BackendClientHostnameBuffer = NULL;
@@ -1271,6 +1302,8 @@ pgstat_set_allocated_bytes_storage(uint64 *allocated_bytes,
my_slab_allocated_bytes = slab_allocated_bytes;
*slab_allocated_bytes = local_my_slab_allocated_bytes;
+
+ return;
}
/*
@@ -1294,6 +1327,23 @@ pgstat_reset_allocated_bytes_storage(void)
*my_dsm_allocated_bytes);
}
+ /*
+ * When limiting maximum backend memory, return this backend's memory
+ * allocations to global.
+ */
+ if (max_total_bkend_mem)
+ {
+ volatile PROC_HDR *procglobal = ProcGlobal;
+
+ pg_atomic_add_fetch_u64(&procglobal->max_total_bkend_mem_bytes,
+ *my_allocated_bytes + allocation_allowance +
+ allocation_return);
+
+ /* Reset memory allocation variables */
+ allocation_allowance = 0;
+ allocation_return = 0;
+ }
+
/* Reset memory allocation variables */
*my_allocated_bytes = local_my_allocated_bytes = 0;
*my_aset_allocated_bytes = local_my_aset_allocated_bytes = 0;
@@ -1307,4 +1357,127 @@ pgstat_reset_allocated_bytes_storage(void)
my_dsm_allocated_bytes = &local_my_dsm_allocated_bytes;
my_generation_allocated_bytes = &local_my_generation_allocated_bytes;
my_slab_allocated_bytes = &local_my_slab_allocated_bytes;
+
+ return;
+}
+
+/*
+ * Determine if allocation request will exceed max backend memory allowed.
+ * Do not apply to auxiliary processes.
+ * Refill allocation request bucket when needed/possible.
+ */
+bool
+exceeds_max_total_bkend_mem(uint64 allocation_request)
+{
+ bool result = false;
+
+ /*
+ * When limiting maximum backend memory, attempt to refill allocation
+ * request bucket if needed.
+ */
+ if (max_total_bkend_mem && allocation_request > allocation_allowance)
+ {
+ volatile PROC_HDR *procglobal = ProcGlobal;
+ uint64 available_max_total_bkend_mem = 0;
+ bool sts = false;
+
+ /*
+ * If allocation request is larger than memory refill quantity then
+ * attempt to increase allocation allowance with requested amount,
+ * otherwise fall through. If this refill fails we do not have enough
+ * memory to meet the request.
+ */
+ if (allocation_request >= allocation_allowance_refill_qty)
+ {
+ while ((available_max_total_bkend_mem = pg_atomic_read_u64(&procglobal->max_total_bkend_mem_bytes)) >= allocation_request)
+ {
+ if ((result = pg_atomic_compare_exchange_u64(&procglobal->max_total_bkend_mem_bytes,
+ &available_max_total_bkend_mem,
+ available_max_total_bkend_mem - allocation_request)))
+ {
+ allocation_allowance = allocation_allowance + allocation_request;
+ break;
+ }
+ }
+
+ /*
+ * If the atomic exchange fails, we do not have enough reserve
+ * memory to meet the request. Negate result to return the proper
+ * value.
+ */
+ return !result;
+ }
+
+ /*
+ * Attempt to increase allocation allowance by memory refill quantity.
+ * If available memory is/becomes less than memory refill quantity,
+ * fall through to attempt to allocate remaining available memory.
+ */
+ while ((available_max_total_bkend_mem = pg_atomic_read_u64(&procglobal->max_total_bkend_mem_bytes)) >= allocation_allowance_refill_qty)
+ {
+ if ((sts = pg_atomic_compare_exchange_u64(&procglobal->max_total_bkend_mem_bytes,
+ &available_max_total_bkend_mem,
+ available_max_total_bkend_mem - allocation_allowance_refill_qty)))
+ {
+ allocation_allowance = allocation_allowance + allocation_allowance_refill_qty;
+ break;
+ }
+ }
+
+ if (!sts)
+ {
+ /*
+ * If available_max_total_bkend_mem is 0, no memory is currently
+ * available to refill with, otherwise attempt to allocate
+ * remaining memory available if it exceeds the requested amount
+ * or the requested amount if more than requested amount gets
+ * returned while looping.
+ */
+ while ((available_max_total_bkend_mem = (int64) pg_atomic_read_u64(&procglobal->max_total_bkend_mem_bytes)) > 0)
+ {
+ uint64 newval = 0;
+
+ /*
+ * If available memory is less than requested allocation we
+ * cannot fulfil request.
+ */
+ if (available_max_total_bkend_mem < allocation_request)
+ break;
+
+ /*
+ * If we happen to loop and a large chunk of memory has been
+ * returned to global, allocate request amount only.
+ */
+ if (available_max_total_bkend_mem > allocation_request)
+ newval = available_max_total_bkend_mem - allocation_request;
+
+ /* Allocate memory */
+ if ((sts = pg_atomic_compare_exchange_u64(&procglobal->max_total_bkend_mem_bytes,
+ &available_max_total_bkend_mem,
+ newval)))
+ {
+ allocation_allowance = allocation_allowance +
+ newval == 0 ? available_max_total_bkend_mem : allocation_request;
+
+ break;
+ }
+ }
+ }
+
+ /*
+ * If refill is not successful, we return true, memory limit exceeded
+ */
+ if (!sts)
+ result = true;
+ }
+
+ /*
+ * Exclude auxiliary processes from the check. Return false. While we want
+ * to exclude them from the check, we do not want to exclude them from the
+ * above allocation handling.
+ */
+ if (MyAuxProcType != NotAnAuxProcess)
+ result = false;
+
+ return result;
}
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index be973b1bdb..73cf3be4e3 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -2128,7 +2128,7 @@ pg_stat_get_memory_allocation(PG_FUNCTION_ARGS)
Datum
pg_stat_get_global_memory_allocation(PG_FUNCTION_ARGS)
{
-#define PG_STAT_GET_GLOBAL_MEMORY_ALLOCATION_COLS 2
+#define PG_STAT_GET_GLOBAL_MEMORY_ALLOCATION_COLS 3
TupleDesc tupdesc;
Datum values[PG_STAT_GET_GLOBAL_MEMORY_ALLOCATION_COLS] = {0};
bool nulls[PG_STAT_GET_GLOBAL_MEMORY_ALLOCATION_COLS] = {0};
@@ -2138,15 +2138,23 @@ pg_stat_get_global_memory_allocation(PG_FUNCTION_ARGS)
tupdesc = CreateTemplateTupleDesc(PG_STAT_GET_GLOBAL_MEMORY_ALLOCATION_COLS);
TupleDescInitEntry(tupdesc, (AttrNumber) 1, "datid",
OIDOID, -1, 0);
- TupleDescInitEntry(tupdesc, (AttrNumber) 2, "global_dsm_allocated_bytes",
+ TupleDescInitEntry(tupdesc, (AttrNumber) 2, "max_total_bkend_mem_bytes_available",
+ INT8OID, -1, 0);
+ TupleDescInitEntry(tupdesc, (AttrNumber) 3, "global_dsm_allocated_bytes",
INT8OID, -1, 0);
BlessTupleDesc(tupdesc);
/* datid */
values[0] = ObjectIdGetDatum(MyDatabaseId);
- /* get global_dsm_allocated_bytes */
- values[1] = Int64GetDatum(pg_atomic_read_u64(&procglobal->global_dsm_allocation));
+ /* Get max_total_bkend_mem_bytes - return -1 if disabled */
+ if (max_total_bkend_mem == 0)
+ values[1] = Int64GetDatum(-1);
+ else
+ values[1] = Int64GetDatum(pg_atomic_read_u64(&procglobal->max_total_bkend_mem_bytes));
+
+ /* Get global_dsm_allocated_bytes */
+ values[2] = Int64GetDatum(pg_atomic_read_u64(&procglobal->global_dsm_allocation));
/* Returns the record as Datum */
PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
diff --git a/src/backend/utils/hash/dynahash.c b/src/backend/utils/hash/dynahash.c
index 012d4a0b1f..cd68e5265a 100644
--- a/src/backend/utils/hash/dynahash.c
+++ b/src/backend/utils/hash/dynahash.c
@@ -104,7 +104,6 @@
#include "utils/dynahash.h"
#include "utils/memutils.h"
-
/*
* Constants
*
@@ -359,7 +358,6 @@ hash_create(const char *tabname, long nelem, const HASHCTL *info, int flags)
Assert(flags & HASH_ELEM);
Assert(info->keysize > 0);
Assert(info->entrysize >= info->keysize);
-
/*
* For shared hash tables, we have a local hash header (HTAB struct) that
* we allocate in TopMemoryContext; all else is in shared memory.
@@ -377,6 +375,7 @@ hash_create(const char *tabname, long nelem, const HASHCTL *info, int flags)
}
else
{
+ /* Set up to allocate the hash header */
/* Create the hash table's private memory context */
if (flags & HASH_CONTEXT)
CurrentDynaHashCxt = info->hcxt;
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 1c0583fe26..639b63138b 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -3468,6 +3468,17 @@ struct config_int ConfigureNamesInt[] =
NULL, NULL, NULL
},
+ {
+ {"max_total_backend_memory", PGC_SU_BACKEND, RESOURCES_MEM,
+ gettext_noop("Restrict total backend memory allocations to this max."),
+ gettext_noop("0 turns this feature off."),
+ GUC_UNIT_MB
+ },
+ &max_total_bkend_mem,
+ 0, 0, INT_MAX,
+ NULL, NULL, NULL
+ },
+
/* End-of-list marker */
{
{NULL, 0, 0, NULL, NULL}, NULL, 0, 0, 0, NULL, NULL, NULL
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index d06074b86f..bc2d449c87 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -156,6 +156,9 @@
# mmap
# (change requires restart)
#min_dynamic_shared_memory = 0MB # (change requires restart)
+#max_total_backend_memory = 0MB # Restrict total backend memory allocations
+ # to this max (in MB). 0 turns this feature
+ # off.
# - Disk -
diff --git a/src/backend/utils/mmgr/aset.c b/src/backend/utils/mmgr/aset.c
index f3f5945fdf..4a83a2f60f 100644
--- a/src/backend/utils/mmgr/aset.c
+++ b/src/backend/utils/mmgr/aset.c
@@ -440,6 +440,18 @@ AllocSetContextCreateInternal(MemoryContext parent,
else
firstBlockSize = Max(firstBlockSize, initBlockSize);
+ /* Do not exceed maximum allowed memory allocation */
+ if (exceeds_max_total_bkend_mem(firstBlockSize))
+ {
+ if (TopMemoryContext)
+ MemoryContextStats(TopMemoryContext);
+ ereport(ERROR,
+ (errcode(ERRCODE_OUT_OF_MEMORY),
+ errmsg("out of memory - exceeds max_total_backend_memory"),
+ errdetail("Failed while creating memory context \"%s\".",
+ name)));
+ }
+
/*
* Allocate the initial block. Unlike other aset.c blocks, it starts with
* the context header and its block header follows that.
@@ -741,6 +753,11 @@ AllocSetAlloc(MemoryContext context, Size size)
#endif
blksize = chunk_size + ALLOC_BLOCKHDRSZ + ALLOC_CHUNKHDRSZ;
+
+ /* Do not exceed maximum allowed memory allocation */
+ if (exceeds_max_total_bkend_mem(blksize))
+ return NULL;
+
block = (AllocBlock) malloc(blksize);
if (block == NULL)
return NULL;
@@ -938,6 +955,10 @@ AllocSetAlloc(MemoryContext context, Size size)
while (blksize < required_size)
blksize <<= 1;
+ /* Do not exceed maximum allowed memory allocation */
+ if (exceeds_max_total_bkend_mem(blksize))
+ return NULL;
+
/* Try to allocate it */
block = (AllocBlock) malloc(blksize);
@@ -1176,6 +1197,18 @@ AllocSetRealloc(void *pointer, Size size)
blksize = chksize + ALLOC_BLOCKHDRSZ + ALLOC_CHUNKHDRSZ;
oldblksize = block->endptr - ((char *) block);
+ /*
+ * Do not exceed maximum allowed memory allocation. NOTE: checking for
+ * the full size here rather than just the amount of increased
+ * allocation to prevent a potential underflow of *my_allocation
+ * allowance in cases where blksize - oldblksize does not trigger a
+ * refill but blksize is greater than *my_allocation_allowance.
+ * Underflow would occur with the call below to
+ * pgstat_report_allocated_bytes_increase()
+ */
+ if (blksize > oldblksize && exceeds_max_total_bkend_mem(blksize))
+ return NULL;
+
block = (AllocBlock) realloc(block, blksize);
if (block == NULL)
{
diff --git a/src/backend/utils/mmgr/generation.c b/src/backend/utils/mmgr/generation.c
index 5708e8da7a..584b2ec8ef 100644
--- a/src/backend/utils/mmgr/generation.c
+++ b/src/backend/utils/mmgr/generation.c
@@ -201,6 +201,16 @@ GenerationContextCreate(MemoryContext parent,
else
allocSize = Max(allocSize, initBlockSize);
+ if (exceeds_max_total_bkend_mem(allocSize))
+ {
+ MemoryContextStats(TopMemoryContext);
+ ereport(ERROR,
+ (errcode(ERRCODE_OUT_OF_MEMORY),
+ errmsg("out of memory - exceeds max_total_backend_memory"),
+ errdetail("Failed while creating memory context \"%s\".",
+ name)));
+ }
+
/*
* Allocate the initial block. Unlike other generation.c blocks, it
* starts with the context header and its block header follows that.
@@ -380,6 +390,9 @@ GenerationAlloc(MemoryContext context, Size size)
{
Size blksize = required_size + Generation_BLOCKHDRSZ;
+ if (exceeds_max_total_bkend_mem(blksize))
+ return NULL;
+
block = (GenerationBlock *) malloc(blksize);
if (block == NULL)
return NULL;
@@ -483,6 +496,9 @@ GenerationAlloc(MemoryContext context, Size size)
if (blksize < required_size)
blksize = pg_nextpower2_size_t(required_size);
+ if (exceeds_max_total_bkend_mem(blksize))
+ return NULL;
+
block = (GenerationBlock *) malloc(blksize);
if (block == NULL)
diff --git a/src/backend/utils/mmgr/slab.c b/src/backend/utils/mmgr/slab.c
index 31814901f3..80e8b95071 100644
--- a/src/backend/utils/mmgr/slab.c
+++ b/src/backend/utils/mmgr/slab.c
@@ -356,9 +356,19 @@ SlabContextCreate(MemoryContext parent,
elog(ERROR, "block size %zu for slab is too small for %zu-byte chunks",
blockSize, chunkSize);
-
+ /* Do not exceed maximum allowed memory allocation */
+ if (exceeds_max_total_bkend_mem(Slab_CONTEXT_HDRSZ(chunksPerBlock)))
+ {
+ MemoryContextStats(TopMemoryContext);
+ ereport(ERROR,
+ (errcode(ERRCODE_OUT_OF_MEMORY),
+ errmsg("out of memory - exceeds max_total_backend_memory"),
+ errdetail("Failed while creating memory context \"%s\".",
+ name)));
+ }
slab = (SlabContext *) malloc(Slab_CONTEXT_HDRSZ(chunksPerBlock));
+
if (slab == NULL)
{
MemoryContextStats(TopMemoryContext);
@@ -560,6 +570,10 @@ SlabAlloc(MemoryContext context, Size size)
}
else
{
+ /* Do not exceed maximum allowed memory allocation */
+ if (exceeds_max_total_bkend_mem(slab->blockSize))
+ return NULL;
+
block = (SlabBlock *) malloc(slab->blockSize);
if (unlikely(block == NULL))
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index d6fbca4a1e..8937764a46 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5440,9 +5440,9 @@
descr => 'statistics: global memory allocation information',
proname => 'pg_stat_get_global_memory_allocation', proisstrict => 'f',
provolatile => 's', proparallel => 'r', prorettype => 'record',
- proargtypes => '', proallargtypes => '{oid,int8}',
- proargmodes => '{o,o}',
- proargnames => '{datid,global_dsm_allocated_bytes}',
+ proargtypes => '', proallargtypes => '{oid,int8,int8}',
+ proargmodes => '{o,o,o}',
+ proargnames => '{datid,max_total_bkend_mem_bytes_available,global_dsm_allocated_bytes}',
prosrc =>'pg_stat_get_global_memory_allocation' },
{ oid => '2022',
descr => 'statistics: information about currently active backends',
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index c2c878219d..a2a5364a85 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -406,6 +406,13 @@ typedef struct PROC_HDR
int startupBufferPinWaitBufId;
/* Global dsm allocations */
pg_atomic_uint64 global_dsm_allocation;
+
+ /*
+ * Max backend memory allocation tracker. Used/Initialized when
+ * max_total_bkend_mem > 0 as max_total_bkend_mem (MB) converted to bytes.
+ * Decreases/increases with free/malloc of backend memory.
+ */
+ pg_atomic_uint64 max_total_bkend_mem_bytes;
} PROC_HDR;
extern PGDLLIMPORT PROC_HDR *ProcGlobal;
diff --git a/src/include/utils/backend_status.h b/src/include/utils/backend_status.h
index 6434ece1ef..bca6fe10f3 100644
--- a/src/include/utils/backend_status.h
+++ b/src/include/utils/backend_status.h
@@ -15,6 +15,7 @@
#include "libpq/pqcomm.h"
#include "miscadmin.h" /* for BackendType */
#include "storage/backendid.h"
+#include "storage/proc.h"
#include "utils/backend_progress.h"
@@ -304,6 +305,7 @@ typedef struct LocalPgBackendStatus
*/
extern PGDLLIMPORT bool pgstat_track_activities;
extern PGDLLIMPORT int pgstat_track_activity_query_size;
+extern PGDLLIMPORT int max_total_bkend_mem;
/* ----------
@@ -316,6 +318,10 @@ extern PGDLLIMPORT uint64 *my_aset_allocated_bytes;
extern PGDLLIMPORT uint64 *my_dsm_allocated_bytes;
extern PGDLLIMPORT uint64 *my_generation_allocated_bytes;
extern PGDLLIMPORT uint64 *my_slab_allocated_bytes;
+extern PGDLLIMPORT uint64 allocation_allowance;
+extern PGDLLIMPORT uint64 initial_allocation_allowance;
+extern PGDLLIMPORT uint64 allocation_return;
+extern PGDLLIMPORT uint64 allocation_return_threshold;
/* ----------
@@ -363,6 +369,7 @@ extern int pgstat_fetch_stat_numbackends(void);
extern PgBackendStatus *pgstat_fetch_stat_beentry(BackendId beid);
extern LocalPgBackendStatus *pgstat_fetch_stat_local_beentry(int beid);
extern char *pgstat_clip_activity(const char *raw_activity);
+extern bool exceeds_max_total_bkend_mem(uint64 allocation_request);
/* ----------
* pgstat_report_allocated_bytes_decrease() -
@@ -384,6 +391,10 @@ pgstat_report_allocated_bytes_decrease(int64 proc_allocated_bytes,
/* On overflow, set pgstat count of allocated bytes to zero */
*my_allocated_bytes = 0;
+ /* Add freed memory to allocation return counter. */
+ allocation_return += proc_allocated_bytes;
+
+ /* On overflow, set allocator type bytes to zero */
switch (pg_allocator_type)
{
case PG_ALLOC_ASET:
@@ -399,13 +410,35 @@ pgstat_report_allocated_bytes_decrease(int64 proc_allocated_bytes,
*my_slab_allocated_bytes = 0;
break;
}
+
+ /*
+ * Return freed memory to the global counter if return threshold is
+ * met.
+ */
+ if (max_total_bkend_mem && allocation_return >= allocation_return_threshold)
+ {
+ if (ProcGlobal)
+ {
+ volatile PROC_HDR *procglobal = ProcGlobal;
+
+ /* Add to global tracker */
+ pg_atomic_add_fetch_u64(&procglobal->max_total_bkend_mem_bytes,
+ allocation_return);
+
+ /* Restart the count */
+ allocation_return = 0;
+ }
+ }
}
else
{
/* decrease allocation */
*my_allocated_bytes -= proc_allocated_bytes;
- /* Decrease allocator type allocated bytes. */
+ /* Add freed memory to allocation return counter */
+ allocation_return += proc_allocated_bytes;
+
+ /* Decrease allocator type allocated bytes */
switch (pg_allocator_type)
{
case PG_ALLOC_ASET:
@@ -427,6 +460,25 @@ pgstat_report_allocated_bytes_decrease(int64 proc_allocated_bytes,
*my_slab_allocated_bytes -= proc_allocated_bytes;
break;
}
+
+ /*
+ * Return freed memory to the global counter if return threshold is
+ * met.
+ */
+ if (max_total_bkend_mem && allocation_return >= allocation_return_threshold)
+ {
+ if (ProcGlobal)
+ {
+ volatile PROC_HDR *procglobal = ProcGlobal;
+
+ /* Add to global tracker */
+ pg_atomic_add_fetch_u64(&procglobal->max_total_bkend_mem_bytes,
+ allocation_return);
+
+ /* Restart the count */
+ allocation_return = 0;
+ }
+ }
}
return;
@@ -444,6 +496,9 @@ static inline void
pgstat_report_allocated_bytes_increase(int64 proc_allocated_bytes,
int pg_allocator_type)
{
+ /* Remove allocated memory from local allocation allowance */
+ allocation_allowance -= proc_allocated_bytes;
+
*my_allocated_bytes += proc_allocated_bytes;
/* Increase allocator type allocated bytes */
@@ -488,6 +543,36 @@ pgstat_init_allocated_bytes(void)
*my_generation_allocated_bytes = 0;
*my_slab_allocated_bytes = 0;
+ /* If we're limiting backend memory */
+ if (max_total_bkend_mem)
+ {
+ volatile PROC_HDR *procglobal = ProcGlobal;
+ uint64 available_max_total_bkend_mem = 0;
+
+ allocation_return = 0;
+ allocation_allowance = 0;
+
+ /* Account for the initial allocation allowance */
+ while ((available_max_total_bkend_mem = pg_atomic_read_u64(&procglobal->max_total_bkend_mem_bytes)) >= initial_allocation_allowance)
+ {
+ /*
+ * On success populate allocation_allowance. Failure here will
+ * result in the backend's first invocation of
+ * exceeds_max_total_bkend_mem allocating requested, default, or
+ * available memory or result in an out of memory error.
+ */
+ if (pg_atomic_compare_exchange_u64(&procglobal->max_total_bkend_mem_bytes,
+ &available_max_total_bkend_mem,
+ available_max_total_bkend_mem -
+ initial_allocation_allowance))
+ {
+ allocation_allowance = initial_allocation_allowance;
+
+ break;
+ }
+ }
+ }
+
return;
}
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 9cf035a74a..0edd7d387c 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1874,8 +1874,10 @@ pg_stat_database_conflicts| SELECT oid AS datid,
pg_stat_global_memory_allocation| SELECT s.datid,
current_setting('shared_memory_size'::text, true) AS shared_memory_size,
(current_setting('shared_memory_size_in_huge_pages'::text, true))::integer AS shared_memory_size_in_huge_pages,
+ pg_size_bytes(current_setting('max_total_backend_memory'::text, true)) AS max_total_backend_memory_bytes,
+ s.max_total_bkend_mem_bytes_available,
s.global_dsm_allocated_bytes
- FROM (pg_stat_get_global_memory_allocation() s(datid, global_dsm_allocated_bytes)
+ FROM (pg_stat_get_global_memory_allocation() s(datid, max_total_bkend_mem_bytes_available, global_dsm_allocated_bytes)
LEFT JOIN pg_database d ON ((s.datid = d.oid)));
pg_stat_gssapi| SELECT pid,
gss_auth AS gss_authenticated,
--
2.25.1
0001-Add-tracking-of-backend-memory-allocated.patchtext/x-patch; charset=UTF-8; name=0001-Add-tracking-of-backend-memory-allocated.patchDownload
From 752d40bcefa66afc8c73976990d3d5943c35bf0d Mon Sep 17 00:00:00 2001
From: Reid Thompson <jreidthompson@nc.rr.com>
Date: Thu, 11 Aug 2022 12:01:25 -0400
Subject: [PATCH 1/2] Add tracking of backend memory allocated
Add tracking of backend memory allocated in total and by allocation
type (aset, dsm, generation, slab) by process.
allocated_bytes tracks the current bytes of memory allocated to the
backend process. aset_allocated_bytes, dsm_allocated_bytes,
generation_allocated_bytes and slab_allocated_bytes track the
allocation by type for the backend process. They are updated for the
process as memory is malloc'd/freed. Memory allocated to items on
the freelist is included. Dynamic shared memory allocations are
included only in the value displayed for the backend that created
them, they are not included in the value for backends that are
attached to them to avoid double counting. DSM allocations that are
not destroyed by the creating process prior to it's exit are
considered long lived and are tracked in a global counter
global_dsm_allocated_bytes. We limit the floor of allocation
counters to zero. Created views pg_stat_global_memory_allocation and
pg_stat_memory_allocation for access to these trackers.
---
doc/src/sgml/monitoring.sgml | 188 ++++++++++++++++++++
src/backend/catalog/system_views.sql | 21 +++
src/backend/storage/ipc/dsm.c | 11 +-
src/backend/storage/ipc/dsm_impl.c | 78 ++++++++
src/backend/storage/lmgr/proc.c | 1 +
src/backend/utils/activity/backend_status.c | 114 ++++++++++++
src/backend/utils/adt/pgstatfuncs.c | 84 +++++++++
src/backend/utils/init/miscinit.c | 3 +
src/backend/utils/mmgr/aset.c | 17 ++
src/backend/utils/mmgr/generation.c | 15 ++
src/backend/utils/mmgr/slab.c | 23 +++
src/include/catalog/pg_proc.dat | 17 ++
src/include/storage/proc.h | 2 +
src/include/utils/backend_status.h | 156 +++++++++++++++-
src/test/regress/expected/rules.out | 15 ++
src/test/regress/expected/stats.out | 36 ++++
src/test/regress/sql/stats.sql | 20 +++
17 files changed, 799 insertions(+), 2 deletions(-)
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 21e6ce2841..d943821071 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -5633,6 +5633,194 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
</sect2>
+ <sect2 id="monitoring-pg-stat-memory-allocation-view">
+ <title><structname>pg_stat_memory_allocation</structname></title>
+
+ <indexterm>
+ <primary>pg_stat_memory_allocation</primary>
+ </indexterm>
+
+ <para>
+ The <structname>pg_stat_memory_allocation</structname> view will have one
+ row per server process, showing information related to the current memory
+ allocation of that process. Use <function>pg_size_pretty</function>
+ described in <xref linkend="functions-admin-dbsize"/> to make these values
+ more easily readable.
+ </para>
+
+ <table id="pg-stat-memory-allocation-view" xreflabel="pg_stat_memory_allocation">
+ <title><structname>pg_stat_memory_allocation</structname> View</title>
+ <tgroup cols="1">
+ <thead>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ Column Type
+ </para>
+ <para>
+ Description
+ </para></entry>
+ </row>
+ </thead>
+
+ <tbody>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>datid</structfield> <type>oid</type>
+ </para>
+ <para>
+ OID of the database this backend is connected to
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>pid</structfield> <type>integer</type>
+ </para>
+ <para>
+ Process ID of this backend
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>allocated_bytes</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Memory currently allocated to this backend in bytes. This is the balance
+ of bytes allocated and freed by this backend. Dynamic shared memory
+ allocations are included only in the value displayed for the backend that
+ created them, they are not included in the value for backends that are
+ attached to them to avoid double counting.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>aset_allocated_bytes</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Memory currently allocated to this backend in bytes via the allocation
+ set allocator.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>dsm_allocated_bytes</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Memory currently allocated to this backend in bytes via the dynamic
+ shared memory allocator. Upon process exit, dsm allocations that have
+ not been freed are considered long lived and added to
+ <structfield>global_dsm_allocated_bytes</structfield> found in the
+ pg_stat_global_memory_allocation view. See
+ <xref linkend="monitoring-pg-stat-global-memory-allocation-view"/>.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>generation_allocated_bytes</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Memory currently allocated to this backend in bytes via the generation
+ allocator.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>slab_allocated_bytes</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Memory currently allocated to this backend in bytes via the slab
+ allocator.
+ </para></entry>
+ </row>
+
+ </tbody>
+ </tgroup>
+ </table>
+
+ </sect2>
+
+ <sect2 id="monitoring-pg-stat-global-memory-allocation-view">
+ <title><structname>pg_stat_global_memory_allocation</structname></title>
+
+ <indexterm>
+ <primary>pg_stat_global-memory_allocation</primary>
+ </indexterm>
+
+ <para>
+ The <structname>pg_stat_global_memory_allocation</structname> view will
+ have one row showing information related to current shared memory
+ allocations.
+ </para>
+
+ <table id="pg-stat-global-memory-allocation-view" xreflabel="pg_stat_global_memory_allocation">
+ <title><structname>pg_stat_global_memory_allocation</structname> View</title>
+ <tgroup cols="1">
+ <thead>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ Column Type
+ </para>
+ <para>
+ Description
+ </para></entry>
+ </row>
+ </thead>
+
+ <tbody>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>datid</structfield> <type>oid</type>
+ </para>
+ <para>
+ OID of the database this backend is connected to
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>shared_memory_size_mb</structfield> <type>integer</type>
+ </para>
+ <para>
+ Reports the size of the main shared memory area, rounded up to the
+ nearest megabyte. See <xref linkend="guc-shared-memory-size"/>.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>shared_memory_size_in_huge_pages</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Reports the number of huge pages that are needed for the main shared
+ memory area based on the specified huge_page_size. If huge pages are not
+ supported, this will be -1. See
+ <xref linkend="guc-shared-memory-size-in-huge-pages"/>.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>global_dsm_allocated_bytes</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Long lived dynamically allocated memory currently allocated to the
+ database. Use <function>pg_size_pretty</function> described in
+ <xref linkend="functions-admin-dbsize"/> to make this value more easily
+ readable.
+ </para></entry>
+ </row>
+
+ </tbody>
+ </tgroup>
+ </table>
+
+ </sect2>
+
<sect2 id="monitoring-stats-functions">
<title>Statistics Functions</title>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 8ea159dbde..4bbd992311 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1330,3 +1330,24 @@ CREATE VIEW pg_stat_subscription_stats AS
ss.stats_reset
FROM pg_subscription as s,
pg_stat_get_subscription_stats(s.oid) as ss;
+
+CREATE VIEW pg_stat_memory_allocation AS
+ SELECT
+ S.datid AS datid,
+ S.pid,
+ S.allocated_bytes,
+ S.aset_allocated_bytes,
+ S.dsm_allocated_bytes,
+ S.generation_allocated_bytes,
+ S.slab_allocated_bytes
+ FROM pg_stat_get_memory_allocation(NULL) AS S
+ LEFT JOIN pg_database AS D ON (S.datid = D.oid);
+
+CREATE VIEW pg_stat_global_memory_allocation AS
+ SELECT
+ S.datid AS datid,
+ current_setting('shared_memory_size'::text, true) AS shared_memory_size,
+ (current_setting('shared_memory_size_in_huge_pages'::text, true))::integer AS shared_memory_size_in_huge_pages,
+ S.global_dsm_allocated_bytes
+ FROM pg_stat_get_global_memory_allocation() AS S
+ LEFT JOIN pg_database AS D ON (S.datid = D.oid);
diff --git a/src/backend/storage/ipc/dsm.c b/src/backend/storage/ipc/dsm.c
index 10b029bb16..64b1fecd1c 100644
--- a/src/backend/storage/ipc/dsm.c
+++ b/src/backend/storage/ipc/dsm.c
@@ -775,6 +775,15 @@ dsm_detach_all(void)
void
dsm_detach(dsm_segment *seg)
{
+ /*
+ * Retain mapped_size to pass into destroy call in cases where the detach
+ * is the last reference. mapped_size is zeroed as part of the detach
+ * process, but is needed later in these cases for dsm_allocated_bytes
+ * accounting.
+ */
+ Size local_seg_mapped_size = seg->mapped_size;
+ Size *ptr_local_seg_mapped_size = &local_seg_mapped_size;
+
/*
* Invoke registered callbacks. Just in case one of those callbacks
* throws a further error that brings us back here, pop the callback
@@ -855,7 +864,7 @@ dsm_detach(dsm_segment *seg)
*/
if (is_main_region_dsm_handle(seg->handle) ||
dsm_impl_op(DSM_OP_DESTROY, seg->handle, 0, &seg->impl_private,
- &seg->mapped_address, &seg->mapped_size, WARNING))
+ &seg->mapped_address, ptr_local_seg_mapped_size, WARNING))
{
LWLockAcquire(DynamicSharedMemoryControlLock, LW_EXCLUSIVE);
if (is_main_region_dsm_handle(seg->handle))
diff --git a/src/backend/storage/ipc/dsm_impl.c b/src/backend/storage/ipc/dsm_impl.c
index f0965c3481..16e2bded59 100644
--- a/src/backend/storage/ipc/dsm_impl.c
+++ b/src/backend/storage/ipc/dsm_impl.c
@@ -66,6 +66,7 @@
#include "postmaster/postmaster.h"
#include "storage/dsm_impl.h"
#include "storage/fd.h"
+#include "utils/backend_status.h"
#include "utils/guc.h"
#include "utils/memutils.h"
@@ -232,6 +233,14 @@ dsm_impl_posix(dsm_op op, dsm_handle handle, Size request_size,
name)));
return false;
}
+
+ /*
+ * Detach and destroy pass through here, only decrease the memory
+ * shown allocated in pg_stat_activity when the creator destroys the
+ * allocation.
+ */
+ if (op == DSM_OP_DESTROY)
+ pgstat_report_allocated_bytes_decrease(*mapped_size, PG_ALLOC_DSM);
*mapped_address = NULL;
*mapped_size = 0;
if (op == DSM_OP_DESTROY && shm_unlink(name) != 0)
@@ -332,6 +341,33 @@ dsm_impl_posix(dsm_op op, dsm_handle handle, Size request_size,
name)));
return false;
}
+
+ /*
+ * Attach and create pass through here, only update backend memory
+ * allocated in pg_stat_activity for the creator process.
+ */
+ if (op == DSM_OP_CREATE)
+ {
+ /*
+ * Posix creation calls dsm_impl_posix_resize implying that resizing
+ * occurs or may be added in the future. As implemented
+ * dsm_impl_posix_resize utilizes fallocate or truncate, passing the
+ * whole new size as input, growing the allocation as needed (only
+ * truncate supports shrinking). We update by replacing the old
+ * allocation with the new.
+ */
+#if defined(HAVE_POSIX_FALLOCATE) && defined(__linux__)
+ /*
+ * posix_fallocate does not shrink allocations, adjust only on
+ * allocation increase.
+ */
+ if (request_size > *mapped_size)
+ pgstat_report_allocated_bytes_increase(request_size - *mapped_size, PG_ALLOC_DSM);
+#else
+ pgstat_report_allocated_bytes_decrease(*mapped_size, PG_ALLOC_DSM);
+ pgstat_report_allocated_bytes_increase(request_size, PG_ALLOC_DSM);
+#endif
+ }
*mapped_address = address;
*mapped_size = request_size;
close(fd);
@@ -537,6 +573,14 @@ dsm_impl_sysv(dsm_op op, dsm_handle handle, Size request_size,
name)));
return false;
}
+
+ /*
+ * Detach and destroy pass through here, only decrease the memory
+ * shown allocated in pg_stat_activity when the creator destroys the
+ * allocation.
+ */
+ if (op == DSM_OP_DESTROY)
+ pgstat_report_allocated_bytes_decrease(*mapped_size, PG_ALLOC_DSM);
*mapped_address = NULL;
*mapped_size = 0;
if (op == DSM_OP_DESTROY && shmctl(ident, IPC_RMID, NULL) < 0)
@@ -584,6 +628,13 @@ dsm_impl_sysv(dsm_op op, dsm_handle handle, Size request_size,
name)));
return false;
}
+
+ /*
+ * Attach and create pass through here, only update backend memory
+ * allocated in pg_stat_activity for the creator process.
+ */
+ if (op == DSM_OP_CREATE)
+ pgstat_report_allocated_bytes_increase(request_size, PG_ALLOC_DSM);
*mapped_address = address;
*mapped_size = request_size;
@@ -652,6 +703,13 @@ dsm_impl_windows(dsm_op op, dsm_handle handle, Size request_size,
return false;
}
+ /*
+ * Detach and destroy pass through here, only decrease the memory
+ * shown allocated in pg_stat_activity when the creator destroys the
+ * allocation.
+ */
+ if (op == DSM_OP_DESTROY)
+ pgstat_report_allocated_bytes_decrease(*mapped_size, PG_ALLOC_DSM);
*impl_private = NULL;
*mapped_address = NULL;
*mapped_size = 0;
@@ -768,6 +826,12 @@ dsm_impl_windows(dsm_op op, dsm_handle handle, Size request_size,
return false;
}
+ /*
+ * Attach and create pass through here, only update backend memory
+ * allocated in pg_stat_activity for the creator process.
+ */
+ if (op == DSM_OP_CREATE)
+ pgstat_report_allocated_bytes_increase(info.RegionSize, PG_ALLOC_DSM);
*mapped_address = address;
*mapped_size = info.RegionSize;
*impl_private = hmap;
@@ -812,6 +876,13 @@ dsm_impl_mmap(dsm_op op, dsm_handle handle, Size request_size,
name)));
return false;
}
+
+ /*
+ * Detach and destroy pass through here, only decrease the memory
+ * shown allocated in pg_stat_activity when the creator destroys the
+ * allocation.
+ */
+ pgstat_report_allocated_bytes_decrease(*mapped_size, PG_ALLOC_DSM);
*mapped_address = NULL;
*mapped_size = 0;
if (op == DSM_OP_DESTROY && unlink(name) != 0)
@@ -933,6 +1004,13 @@ dsm_impl_mmap(dsm_op op, dsm_handle handle, Size request_size,
name)));
return false;
}
+
+ /*
+ * Attach and create pass through here, only update backend memory
+ * allocated in pg_stat_activity for the creator process.
+ */
+ if (op == DSM_OP_CREATE)
+ pgstat_report_allocated_bytes_increase(request_size, PG_ALLOC_DSM);
*mapped_address = address;
*mapped_size = request_size;
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 22b4278610..d86fbdfd9b 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -180,6 +180,7 @@ InitProcGlobal(void)
ProcGlobal->checkpointerLatch = NULL;
pg_atomic_init_u32(&ProcGlobal->procArrayGroupFirst, INVALID_PGPROCNO);
pg_atomic_init_u32(&ProcGlobal->clogGroupFirst, INVALID_PGPROCNO);
+ pg_atomic_init_u64(&ProcGlobal->global_dsm_allocation, 0);
/*
* Create and initialize all the PGPROC structures we'll need. There are
diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c
index 608d01ea0d..f921c4bbde 100644
--- a/src/backend/utils/activity/backend_status.c
+++ b/src/backend/utils/activity/backend_status.c
@@ -49,6 +49,24 @@ int pgstat_track_activity_query_size = 1024;
/* exposed so that backend_progress.c can access it */
PgBackendStatus *MyBEEntry = NULL;
+/*
+ * Memory allocated to this backend prior to pgstats initialization. Migrated to
+ * shared memory on pgstats initialization.
+ */
+uint64 local_my_allocated_bytes = 0;
+uint64 *my_allocated_bytes = &local_my_allocated_bytes;
+
+/* Memory allocated to this backend by type prior to pgstats initialization.
+ * Migrated to shared memory on pgstats initialization
+ */
+uint64 local_my_aset_allocated_bytes = 0;
+uint64 *my_aset_allocated_bytes = &local_my_aset_allocated_bytes;
+uint64 local_my_dsm_allocated_bytes = 0;
+uint64 *my_dsm_allocated_bytes = &local_my_dsm_allocated_bytes;
+uint64 local_my_generation_allocated_bytes = 0;
+uint64 *my_generation_allocated_bytes = &local_my_generation_allocated_bytes;
+uint64 local_my_slab_allocated_bytes = 0;
+uint64 *my_slab_allocated_bytes = &local_my_slab_allocated_bytes;
static PgBackendStatus *BackendStatusArray = NULL;
static char *BackendAppnameBuffer = NULL;
@@ -400,6 +418,32 @@ pgstat_bestart(void)
lbeentry.st_progress_command_target = InvalidOid;
lbeentry.st_query_id = UINT64CONST(0);
+ /* Alter allocation reporting from local storage to shared memory */
+ pgstat_set_allocated_bytes_storage(&MyBEEntry->allocated_bytes,
+ &MyBEEntry->aset_allocated_bytes,
+ &MyBEEntry->dsm_allocated_bytes,
+ &MyBEEntry->generation_allocated_bytes,
+ &MyBEEntry->slab_allocated_bytes);
+
+ /*
+ * Populate sum of memory allocated prior to pgstats initialization to
+ * pgstats and zero the local variable. This is a += assignment because
+ * InitPostgres allocates memory after pgstat_beinit but prior to
+ * pgstat_bestart so we have allocations to both local and shared memory
+ * to combine.
+ */
+ lbeentry.allocated_bytes += local_my_allocated_bytes;
+ local_my_allocated_bytes = 0;
+ lbeentry.aset_allocated_bytes += local_my_aset_allocated_bytes;
+ local_my_aset_allocated_bytes = 0;
+
+ lbeentry.dsm_allocated_bytes += local_my_dsm_allocated_bytes;
+ local_my_dsm_allocated_bytes = 0;
+ lbeentry.generation_allocated_bytes += local_my_generation_allocated_bytes;
+ local_my_generation_allocated_bytes = 0;
+ lbeentry.slab_allocated_bytes += local_my_slab_allocated_bytes;
+ local_my_slab_allocated_bytes = 0;
+
/*
* we don't zero st_progress_param here to save cycles; nobody should
* examine it until st_progress_command has been set to something other
@@ -459,6 +503,9 @@ pgstat_beshutdown_hook(int code, Datum arg)
{
volatile PgBackendStatus *beentry = MyBEEntry;
+ /* Stop reporting memory allocation changes to shared memory */
+ pgstat_reset_allocated_bytes_storage();
+
/*
* Clear my status entry, following the protocol of bumping st_changecount
* before and after. We use a volatile pointer here to ensure the
@@ -1194,3 +1241,70 @@ pgstat_clip_activity(const char *raw_activity)
return activity;
}
+
+/*
+ * Configure bytes allocated reporting to report allocated bytes to
+ * shared memory.
+ *
+ * Expected to be called during backend startup (in pgstat_bestart), to point
+ * allocated bytes accounting into shared memory.
+ */
+void
+pgstat_set_allocated_bytes_storage(uint64 *allocated_bytes,
+ uint64 *aset_allocated_bytes,
+ uint64 *dsm_allocated_bytes,
+ uint64 *generation_allocated_bytes,
+ uint64 *slab_allocated_bytes)
+{
+ /* Map allocations to shared memory */
+ my_allocated_bytes = allocated_bytes;
+ *allocated_bytes = local_my_allocated_bytes;
+
+ my_aset_allocated_bytes = aset_allocated_bytes;
+ *aset_allocated_bytes = local_my_aset_allocated_bytes;
+
+ my_dsm_allocated_bytes = dsm_allocated_bytes;
+ *dsm_allocated_bytes = local_my_dsm_allocated_bytes;
+
+ my_generation_allocated_bytes = generation_allocated_bytes;
+ *generation_allocated_bytes = local_my_generation_allocated_bytes;
+
+ my_slab_allocated_bytes = slab_allocated_bytes;
+ *slab_allocated_bytes = local_my_slab_allocated_bytes;
+}
+
+/*
+ * Reset allocated bytes storage location.
+ *
+ * Expected to be called during backend shutdown, before the locations set up
+ * by pgstat_set_allocated_bytes_storage become invalid.
+ */
+void
+pgstat_reset_allocated_bytes_storage(void)
+{
+ if (ProcGlobal)
+ {
+ volatile PROC_HDR *procglobal = ProcGlobal;
+
+ /*
+ * Add dsm allocations that have not been freed to global dsm
+ * accounting
+ */
+ pg_atomic_add_fetch_u64(&procglobal->global_dsm_allocation,
+ *my_dsm_allocated_bytes);
+ }
+
+ /* Reset memory allocation variables */
+ *my_allocated_bytes = local_my_allocated_bytes = 0;
+ *my_aset_allocated_bytes = local_my_aset_allocated_bytes = 0;
+ *my_dsm_allocated_bytes = local_my_dsm_allocated_bytes = 0;
+ *my_generation_allocated_bytes = local_my_generation_allocated_bytes = 0;
+ *my_slab_allocated_bytes = local_my_slab_allocated_bytes = 0;
+
+ /* Point my_{*_}allocated_bytes from shared memory back to local */
+ my_allocated_bytes = &local_my_allocated_bytes;
+ my_aset_allocated_bytes = &local_my_aset_allocated_bytes;
+ my_dsm_allocated_bytes = &local_my_dsm_allocated_bytes;
+ my_generation_allocated_bytes = &local_my_generation_allocated_bytes;
+ my_slab_allocated_bytes = &local_my_slab_allocated_bytes;
+}
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 56119737c8..be973b1bdb 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -2067,3 +2067,87 @@ pg_stat_have_stats(PG_FUNCTION_ARGS)
PG_RETURN_BOOL(pgstat_have_entry(kind, dboid, objoid));
}
+
+/*
+ * Get the memory allocation of PG backends.
+ */
+Datum
+pg_stat_get_memory_allocation(PG_FUNCTION_ARGS)
+{
+#define PG_STAT_GET_MEMORY_ALLOCATION_COLS 7
+ int num_backends = pgstat_fetch_stat_numbackends();
+ int curr_backend;
+ int pid = PG_ARGISNULL(0) ? -1 : PG_GETARG_INT32(0);
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+
+ InitMaterializedSRF(fcinfo, 0);
+
+ /* 1-based index */
+ for (curr_backend = 1; curr_backend <= num_backends; curr_backend++)
+ {
+ /* for each row */
+ Datum values[PG_STAT_GET_MEMORY_ALLOCATION_COLS] = {0};
+ bool nulls[PG_STAT_GET_MEMORY_ALLOCATION_COLS] = {0};
+ LocalPgBackendStatus *local_beentry;
+ PgBackendStatus *beentry;
+
+ /* Get the next one in the list */
+ local_beentry = pgstat_fetch_stat_local_beentry(curr_backend);
+ beentry = &local_beentry->backendStatus;
+
+ /* If looking for specific PID, ignore all the others */
+ if (pid != -1 && beentry->st_procpid != pid)
+ continue;
+
+ /* Values available to all callers */
+ if (beentry->st_databaseid != InvalidOid)
+ values[0] = ObjectIdGetDatum(beentry->st_databaseid);
+ else
+ nulls[0] = true;
+
+ values[1] = Int32GetDatum(beentry->st_procpid);
+ values[2] = UInt64GetDatum(beentry->allocated_bytes);
+ values[3] = UInt64GetDatum(beentry->aset_allocated_bytes);
+ values[4] = UInt64GetDatum(beentry->dsm_allocated_bytes);
+ values[5] = UInt64GetDatum(beentry->generation_allocated_bytes);
+ values[6] = UInt64GetDatum(beentry->slab_allocated_bytes);
+
+ tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc, values, nulls);
+
+ /* If only a single backend was requested, and we found it, break. */
+ if (pid != -1)
+ break;
+ }
+
+ return (Datum) 0;
+}
+
+/*
+ * Get the global memory allocation statistics.
+ */
+Datum
+pg_stat_get_global_memory_allocation(PG_FUNCTION_ARGS)
+{
+#define PG_STAT_GET_GLOBAL_MEMORY_ALLOCATION_COLS 2
+ TupleDesc tupdesc;
+ Datum values[PG_STAT_GET_GLOBAL_MEMORY_ALLOCATION_COLS] = {0};
+ bool nulls[PG_STAT_GET_GLOBAL_MEMORY_ALLOCATION_COLS] = {0};
+ volatile PROC_HDR *procglobal = ProcGlobal;
+
+ /* Initialise attributes information in the tuple descriptor */
+ tupdesc = CreateTemplateTupleDesc(PG_STAT_GET_GLOBAL_MEMORY_ALLOCATION_COLS);
+ TupleDescInitEntry(tupdesc, (AttrNumber) 1, "datid",
+ OIDOID, -1, 0);
+ TupleDescInitEntry(tupdesc, (AttrNumber) 2, "global_dsm_allocated_bytes",
+ INT8OID, -1, 0);
+ BlessTupleDesc(tupdesc);
+
+ /* datid */
+ values[0] = ObjectIdGetDatum(MyDatabaseId);
+
+ /* get global_dsm_allocated_bytes */
+ values[1] = Int64GetDatum(pg_atomic_read_u64(&procglobal->global_dsm_allocation));
+
+ /* Returns the record as Datum */
+ PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
+}
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
index a604432126..7b8eeb7dbb 100644
--- a/src/backend/utils/init/miscinit.c
+++ b/src/backend/utils/init/miscinit.c
@@ -171,6 +171,9 @@ InitPostmasterChild(void)
(errcode_for_socket_access(),
errmsg_internal("could not set postmaster death monitoring pipe to FD_CLOEXEC mode: %m")));
#endif
+
+ /* Init allocated bytes to avoid double counting parent allocation */
+ pgstat_init_allocated_bytes();
}
/*
diff --git a/src/backend/utils/mmgr/aset.c b/src/backend/utils/mmgr/aset.c
index 2589941ec4..f3f5945fdf 100644
--- a/src/backend/utils/mmgr/aset.c
+++ b/src/backend/utils/mmgr/aset.c
@@ -47,6 +47,7 @@
#include "postgres.h"
#include "port/pg_bitutils.h"
+#include "utils/backend_status.h"
#include "utils/memdebug.h"
#include "utils/memutils.h"
#include "utils/memutils_memorychunk.h"
@@ -521,6 +522,7 @@ AllocSetContextCreateInternal(MemoryContext parent,
name);
((MemoryContext) set)->mem_allocated = firstBlockSize;
+ pgstat_report_allocated_bytes_increase(firstBlockSize, PG_ALLOC_ASET);
return (MemoryContext) set;
}
@@ -543,6 +545,7 @@ AllocSetReset(MemoryContext context)
AllocSet set = (AllocSet) context;
AllocBlock block;
Size keepersize PG_USED_FOR_ASSERTS_ONLY;
+ uint64 deallocation = 0;
Assert(AllocSetIsValid(set));
@@ -585,6 +588,7 @@ AllocSetReset(MemoryContext context)
{
/* Normal case, release the block */
context->mem_allocated -= block->endptr - ((char *) block);
+ deallocation += block->endptr - ((char *) block);
#ifdef CLOBBER_FREED_MEMORY
wipe_mem(block, block->freeptr - ((char *) block));
@@ -595,6 +599,7 @@ AllocSetReset(MemoryContext context)
}
Assert(context->mem_allocated == keepersize);
+ pgstat_report_allocated_bytes_decrease(deallocation, PG_ALLOC_ASET);
/* Reset block size allocation sequence, too */
set->nextBlockSize = set->initBlockSize;
@@ -613,6 +618,7 @@ AllocSetDelete(MemoryContext context)
AllocSet set = (AllocSet) context;
AllocBlock block = set->blocks;
Size keepersize PG_USED_FOR_ASSERTS_ONLY;
+ uint64 deallocation = 0;
Assert(AllocSetIsValid(set));
@@ -651,11 +657,13 @@ AllocSetDelete(MemoryContext context)
freelist->first_free = (AllocSetContext *) oldset->header.nextchild;
freelist->num_free--;
+ deallocation += oldset->header.mem_allocated;
/* All that remains is to free the header/initial block */
free(oldset);
}
Assert(freelist->num_free == 0);
+ pgstat_report_allocated_bytes_decrease(deallocation, PG_ALLOC_ASET);
}
/* Now add the just-deleted context to the freelist. */
@@ -672,7 +680,10 @@ AllocSetDelete(MemoryContext context)
AllocBlock next = block->next;
if (block != set->keeper)
+ {
context->mem_allocated -= block->endptr - ((char *) block);
+ deallocation += block->endptr - ((char *) block);
+ }
#ifdef CLOBBER_FREED_MEMORY
wipe_mem(block, block->freeptr - ((char *) block));
@@ -685,6 +696,7 @@ AllocSetDelete(MemoryContext context)
}
Assert(context->mem_allocated == keepersize);
+ pgstat_report_allocated_bytes_decrease(deallocation + context->mem_allocated, PG_ALLOC_ASET);
/* Finally, free the context header, including the keeper block */
free(set);
@@ -734,6 +746,7 @@ AllocSetAlloc(MemoryContext context, Size size)
return NULL;
context->mem_allocated += blksize;
+ pgstat_report_allocated_bytes_increase(blksize, PG_ALLOC_ASET);
block->aset = set;
block->freeptr = block->endptr = ((char *) block) + blksize;
@@ -944,6 +957,7 @@ AllocSetAlloc(MemoryContext context, Size size)
return NULL;
context->mem_allocated += blksize;
+ pgstat_report_allocated_bytes_increase(blksize, PG_ALLOC_ASET);
block->aset = set;
block->freeptr = ((char *) block) + ALLOC_BLOCKHDRSZ;
@@ -1041,6 +1055,7 @@ AllocSetFree(void *pointer)
block->next->prev = block->prev;
set->header.mem_allocated -= block->endptr - ((char *) block);
+ pgstat_report_allocated_bytes_decrease(block->endptr - ((char *) block), PG_ALLOC_ASET);
#ifdef CLOBBER_FREED_MEMORY
wipe_mem(block, block->freeptr - ((char *) block));
@@ -1171,7 +1186,9 @@ AllocSetRealloc(void *pointer, Size size)
/* updated separately, not to underflow when (oldblksize > blksize) */
set->header.mem_allocated -= oldblksize;
+ pgstat_report_allocated_bytes_decrease(oldblksize, PG_ALLOC_ASET);
set->header.mem_allocated += blksize;
+ pgstat_report_allocated_bytes_increase(blksize, PG_ALLOC_ASET);
block->freeptr = block->endptr = ((char *) block) + blksize;
diff --git a/src/backend/utils/mmgr/generation.c b/src/backend/utils/mmgr/generation.c
index ebcb61e9b6..5708e8da7a 100644
--- a/src/backend/utils/mmgr/generation.c
+++ b/src/backend/utils/mmgr/generation.c
@@ -37,6 +37,7 @@
#include "lib/ilist.h"
#include "port/pg_bitutils.h"
+#include "utils/backend_status.h"
#include "utils/memdebug.h"
#include "utils/memutils.h"
#include "utils/memutils_memorychunk.h"
@@ -267,6 +268,7 @@ GenerationContextCreate(MemoryContext parent,
name);
((MemoryContext) set)->mem_allocated = firstBlockSize;
+ pgstat_report_allocated_bytes_increase(firstBlockSize, PG_ALLOC_GENERATION);
return (MemoryContext) set;
}
@@ -283,6 +285,7 @@ GenerationReset(MemoryContext context)
{
GenerationContext *set = (GenerationContext *) context;
dlist_mutable_iter miter;
+ uint64 deallocation = 0;
Assert(GenerationIsValid(set));
@@ -305,9 +308,14 @@ GenerationReset(MemoryContext context)
if (block == set->keeper)
GenerationBlockMarkEmpty(block);
else
+ {
+ deallocation += block->blksize;
GenerationBlockFree(set, block);
+ }
}
+ pgstat_report_allocated_bytes_decrease(deallocation, PG_ALLOC_GENERATION);
+
/* set it so new allocations to make use of the keeper block */
set->block = set->keeper;
@@ -328,6 +336,9 @@ GenerationDelete(MemoryContext context)
{
/* Reset to release all releasable GenerationBlocks */
GenerationReset(context);
+
+ pgstat_report_allocated_bytes_decrease(context->mem_allocated, PG_ALLOC_GENERATION);
+
/* And free the context header and keeper block */
free(context);
}
@@ -374,6 +385,7 @@ GenerationAlloc(MemoryContext context, Size size)
return NULL;
context->mem_allocated += blksize;
+ pgstat_report_allocated_bytes_increase(blksize, PG_ALLOC_GENERATION);
/* block with a single (used) chunk */
block->context = set;
@@ -477,6 +489,7 @@ GenerationAlloc(MemoryContext context, Size size)
return NULL;
context->mem_allocated += blksize;
+ pgstat_report_allocated_bytes_increase(blksize, PG_ALLOC_GENERATION);
/* initialize the new block */
GenerationBlockInit(set, block, blksize);
@@ -729,6 +742,8 @@ GenerationFree(void *pointer)
dlist_delete(&block->node);
set->header.mem_allocated -= block->blksize;
+ pgstat_report_allocated_bytes_decrease(block->blksize, PG_ALLOC_GENERATION);
+
free(block);
}
diff --git a/src/backend/utils/mmgr/slab.c b/src/backend/utils/mmgr/slab.c
index 33dca0f37c..31814901f3 100644
--- a/src/backend/utils/mmgr/slab.c
+++ b/src/backend/utils/mmgr/slab.c
@@ -69,6 +69,7 @@
#include "postgres.h"
#include "lib/ilist.h"
+#include "utils/backend_status.h"
#include "utils/memdebug.h"
#include "utils/memutils.h"
#include "utils/memutils_memorychunk.h"
@@ -413,6 +414,13 @@ SlabContextCreate(MemoryContext parent,
parent,
name);
+ /*
+ * If SlabContextCreate is updated to add context header size to
+ * context->mem_allocated, then update here and SlabDelete appropriately
+ */
+ pgstat_report_allocated_bytes_increase(Slab_CONTEXT_HDRSZ(slab->chunksPerBlock),
+ PG_ALLOC_SLAB);
+
return (MemoryContext) slab;
}
@@ -429,6 +437,7 @@ SlabReset(MemoryContext context)
SlabContext *slab = (SlabContext *) context;
dlist_mutable_iter miter;
int i;
+ uint64 deallocation = 0;
Assert(SlabIsValid(slab));
@@ -449,6 +458,7 @@ SlabReset(MemoryContext context)
#endif
free(block);
context->mem_allocated -= slab->blockSize;
+ deallocation += slab->blockSize;
}
/* walk over blocklist and free the blocks */
@@ -465,9 +475,11 @@ SlabReset(MemoryContext context)
#endif
free(block);
context->mem_allocated -= slab->blockSize;
+ deallocation += slab->blockSize;
}
}
+ pgstat_report_allocated_bytes_decrease(deallocation, PG_ALLOC_SLAB);
slab->curBlocklistIndex = 0;
Assert(context->mem_allocated == 0);
@@ -480,8 +492,17 @@ SlabReset(MemoryContext context)
void
SlabDelete(MemoryContext context)
{
+
/* Reset to release all the SlabBlocks */
SlabReset(context);
+
+ /*
+ * Until context header allocation is included in context->mem_allocated,
+ * cast to slab and decrement the header allocation
+ */
+ pgstat_report_allocated_bytes_decrease(Slab_CONTEXT_HDRSZ(((SlabContext *) context)->chunksPerBlock),
+ PG_ALLOC_SLAB);
+
/* And free the context header */
free(context);
}
@@ -546,6 +567,7 @@ SlabAlloc(MemoryContext context, Size size)
block->slab = slab;
context->mem_allocated += slab->blockSize;
+ pgstat_report_allocated_bytes_increase(slab->blockSize, PG_ALLOC_SLAB);
/* use the first chunk in the new block */
chunk = SlabBlockGetChunk(slab, block, 0);
@@ -732,6 +754,7 @@ SlabFree(void *pointer)
#endif
free(block);
slab->header.mem_allocated -= slab->blockSize;
+ pgstat_report_allocated_bytes_decrease(slab->blockSize, PG_ALLOC_SLAB);
}
/*
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 7c358cff16..d6fbca4a1e 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5427,6 +5427,23 @@
proname => 'pg_stat_get_backend_idset', prorows => '100', proretset => 't',
provolatile => 's', proparallel => 'r', prorettype => 'int4',
proargtypes => '', prosrc => 'pg_stat_get_backend_idset' },
+{ oid => '9890',
+ descr => 'statistics: memory allocation information for backends',
+ proname => 'pg_stat_get_memory_allocation', prorows => '100', proisstrict => 'f',
+ proretset => 't', provolatile => 's', proparallel => 'r',
+ prorettype => 'record', proargtypes => 'int4',
+ proallargtypes => '{int4,oid,int4,int8,int8,int8,int8,int8}',
+ proargmodes => '{i,o,o,o,o,o,o,o}',
+ proargnames => '{pid,datid,pid,allocated_bytes,aset_allocated_bytes,dsm_allocated_bytes,generation_allocated_bytes,slab_allocated_bytes}',
+ prosrc => 'pg_stat_get_memory_allocation' },
+{ oid => '9891',
+ descr => 'statistics: global memory allocation information',
+ proname => 'pg_stat_get_global_memory_allocation', proisstrict => 'f',
+ provolatile => 's', proparallel => 'r', prorettype => 'record',
+ proargtypes => '', proallargtypes => '{oid,int8}',
+ proargmodes => '{o,o}',
+ proargnames => '{datid,global_dsm_allocated_bytes}',
+ prosrc =>'pg_stat_get_global_memory_allocation' },
{ oid => '2022',
descr => 'statistics: information about currently active backends',
proname => 'pg_stat_get_activity', prorows => '100', proisstrict => 'f',
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index 4258cd92c9..c2c878219d 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -404,6 +404,8 @@ typedef struct PROC_HDR
int spins_per_delay;
/* Buffer id of the buffer that Startup process waits for pin on, or -1 */
int startupBufferPinWaitBufId;
+ /* Global dsm allocations */
+ pg_atomic_uint64 global_dsm_allocation;
} PROC_HDR;
extern PGDLLIMPORT PROC_HDR *ProcGlobal;
diff --git a/src/include/utils/backend_status.h b/src/include/utils/backend_status.h
index f7bd83113a..6434ece1ef 100644
--- a/src/include/utils/backend_status.h
+++ b/src/include/utils/backend_status.h
@@ -10,6 +10,7 @@
#ifndef BACKEND_STATUS_H
#define BACKEND_STATUS_H
+#include "common/int.h"
#include "datatype/timestamp.h"
#include "libpq/pqcomm.h"
#include "miscadmin.h" /* for BackendType */
@@ -32,6 +33,14 @@ typedef enum BackendState
STATE_DISABLED
} BackendState;
+/* Enum helper for reporting memory allocator type */
+enum pg_allocator_type
+{
+ PG_ALLOC_ASET = 1,
+ PG_ALLOC_DSM,
+ PG_ALLOC_GENERATION,
+ PG_ALLOC_SLAB
+};
/* ----------
* Shared-memory data structures
@@ -169,6 +178,15 @@ typedef struct PgBackendStatus
/* query identifier, optionally computed using post_parse_analyze_hook */
uint64 st_query_id;
+
+ /* Current memory allocated to this backend */
+ uint64 allocated_bytes;
+
+ /* Current memory allocated to this backend by type */
+ uint64 aset_allocated_bytes;
+ uint64 dsm_allocated_bytes;
+ uint64 generation_allocated_bytes;
+ uint64 slab_allocated_bytes;
} PgBackendStatus;
@@ -293,6 +311,11 @@ extern PGDLLIMPORT int pgstat_track_activity_query_size;
* ----------
*/
extern PGDLLIMPORT PgBackendStatus *MyBEEntry;
+extern PGDLLIMPORT uint64 *my_allocated_bytes;
+extern PGDLLIMPORT uint64 *my_aset_allocated_bytes;
+extern PGDLLIMPORT uint64 *my_dsm_allocated_bytes;
+extern PGDLLIMPORT uint64 *my_generation_allocated_bytes;
+extern PGDLLIMPORT uint64 *my_slab_allocated_bytes;
/* ----------
@@ -324,7 +347,12 @@ extern const char *pgstat_get_backend_current_activity(int pid, bool checkUser);
extern const char *pgstat_get_crashed_backend_activity(int pid, char *buffer,
int buflen);
extern uint64 pgstat_get_my_query_id(void);
-
+extern void pgstat_set_allocated_bytes_storage(uint64 *allocated_bytes,
+ uint64 *aset_allocated_bytes,
+ uint64 *dsm_allocated_bytes,
+ uint64 *generation_allocated_bytes,
+ uint64 *slab_allocated_bytes);
+extern void pgstat_reset_allocated_bytes_storage(void);
/* ----------
* Support functions for the SQL-callable functions to
@@ -336,5 +364,131 @@ extern PgBackendStatus *pgstat_fetch_stat_beentry(BackendId beid);
extern LocalPgBackendStatus *pgstat_fetch_stat_local_beentry(int beid);
extern char *pgstat_clip_activity(const char *raw_activity);
+/* ----------
+ * pgstat_report_allocated_bytes_decrease() -
+ * Called to report decrease in memory allocated for this backend.
+ *
+ * my_{*_}allocated_bytes initially points to local memory, making it safe to
+ * call this before pgstats has been initialized.
+ * ----------
+ */
+static inline void
+pgstat_report_allocated_bytes_decrease(int64 proc_allocated_bytes,
+ int pg_allocator_type)
+{
+ uint64 temp;
+
+ /* Avoid allocated_bytes unsigned integer overflow on decrease */
+ if (pg_sub_u64_overflow(*my_allocated_bytes, proc_allocated_bytes, &temp))
+ {
+ /* On overflow, set pgstat count of allocated bytes to zero */
+ *my_allocated_bytes = 0;
+
+ switch (pg_allocator_type)
+ {
+ case PG_ALLOC_ASET:
+ *my_aset_allocated_bytes = 0;
+ break;
+ case PG_ALLOC_DSM:
+ *my_dsm_allocated_bytes = 0;
+ break;
+ case PG_ALLOC_GENERATION:
+ *my_generation_allocated_bytes = 0;
+ break;
+ case PG_ALLOC_SLAB:
+ *my_slab_allocated_bytes = 0;
+ break;
+ }
+ }
+ else
+ {
+ /* decrease allocation */
+ *my_allocated_bytes -= proc_allocated_bytes;
+
+ /* Decrease allocator type allocated bytes. */
+ switch (pg_allocator_type)
+ {
+ case PG_ALLOC_ASET:
+ *my_aset_allocated_bytes -= proc_allocated_bytes;
+ break;
+ case PG_ALLOC_DSM:
+
+ /*
+ * Some dsm allocations live beyond process exit. These are
+ * accounted for in a global counter in
+ * pgstat_reset_allocated_bytes_storage at process exit.
+ */
+ *my_dsm_allocated_bytes -= proc_allocated_bytes;
+ break;
+ case PG_ALLOC_GENERATION:
+ *my_generation_allocated_bytes -= proc_allocated_bytes;
+ break;
+ case PG_ALLOC_SLAB:
+ *my_slab_allocated_bytes -= proc_allocated_bytes;
+ break;
+ }
+ }
+
+ return;
+}
+
+/* ----------
+ * pgstat_report_allocated_bytes_increase() -
+ * Called to report increase in memory allocated for this backend.
+ *
+ * my_allocated_bytes initially points to local memory, making it safe to call
+ * this before pgstats has been initialized.
+ * ----------
+ */
+static inline void
+pgstat_report_allocated_bytes_increase(int64 proc_allocated_bytes,
+ int pg_allocator_type)
+{
+ *my_allocated_bytes += proc_allocated_bytes;
+
+ /* Increase allocator type allocated bytes */
+ switch (pg_allocator_type)
+ {
+ case PG_ALLOC_ASET:
+ *my_aset_allocated_bytes += proc_allocated_bytes;
+ break;
+ case PG_ALLOC_DSM:
+
+ /*
+ * Some dsm allocations live beyond process exit. These are
+ * accounted for in a global counter in
+ * pgstat_reset_allocated_bytes_storage at process exit.
+ */
+ *my_dsm_allocated_bytes += proc_allocated_bytes;
+ break;
+ case PG_ALLOC_GENERATION:
+ *my_generation_allocated_bytes += proc_allocated_bytes;
+ break;
+ case PG_ALLOC_SLAB:
+ *my_slab_allocated_bytes += proc_allocated_bytes;
+ break;
+ }
+
+ return;
+}
+
+/* ---------
+ * pgstat_init_allocated_bytes() -
+ *
+ * Called to initialize allocated bytes variables after fork and to
+ * avoid double counting allocations.
+ * ---------
+ */
+static inline void
+pgstat_init_allocated_bytes(void)
+{
+ *my_allocated_bytes = 0;
+ *my_aset_allocated_bytes = 0;
+ *my_dsm_allocated_bytes = 0;
+ *my_generation_allocated_bytes = 0;
+ *my_slab_allocated_bytes = 0;
+
+ return;
+}
#endif /* BACKEND_STATUS_H */
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 996d22b7dd..9cf035a74a 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1871,6 +1871,12 @@ pg_stat_database_conflicts| SELECT oid AS datid,
pg_stat_get_db_conflict_bufferpin(oid) AS confl_bufferpin,
pg_stat_get_db_conflict_startup_deadlock(oid) AS confl_deadlock
FROM pg_database d;
+pg_stat_global_memory_allocation| SELECT s.datid,
+ current_setting('shared_memory_size'::text, true) AS shared_memory_size,
+ (current_setting('shared_memory_size_in_huge_pages'::text, true))::integer AS shared_memory_size_in_huge_pages,
+ s.global_dsm_allocated_bytes
+ FROM (pg_stat_get_global_memory_allocation() s(datid, global_dsm_allocated_bytes)
+ LEFT JOIN pg_database d ON ((s.datid = d.oid)));
pg_stat_gssapi| SELECT pid,
gss_auth AS gss_authenticated,
gss_princ AS principal,
@@ -1889,6 +1895,15 @@ pg_stat_io| SELECT backend_type,
fsyncs,
stats_reset
FROM pg_stat_get_io() b(backend_type, io_object, io_context, reads, writes, extends, op_bytes, evictions, reuses, fsyncs, stats_reset);
+pg_stat_memory_allocation| SELECT s.datid,
+ s.pid,
+ s.allocated_bytes,
+ s.aset_allocated_bytes,
+ s.dsm_allocated_bytes,
+ s.generation_allocated_bytes,
+ s.slab_allocated_bytes
+ FROM (pg_stat_get_memory_allocation(NULL::integer) s(datid, pid, allocated_bytes, aset_allocated_bytes, dsm_allocated_bytes, generation_allocated_bytes, slab_allocated_bytes)
+ LEFT JOIN pg_database d ON ((s.datid = d.oid)));
pg_stat_progress_analyze| SELECT s.pid,
s.datid,
d.datname,
diff --git a/src/test/regress/expected/stats.out b/src/test/regress/expected/stats.out
index 55b4c6df01..5fad38d49d 100644
--- a/src/test/regress/expected/stats.out
+++ b/src/test/regress/expected/stats.out
@@ -1469,4 +1469,40 @@ SELECT COUNT(*) FROM brin_hot_3 WHERE a = 2;
DROP TABLE brin_hot_3;
SET enable_seqscan = on;
+-- ensure that allocated_bytes exist for backends
+SELECT
+ allocated_bytes > 0 AS result
+FROM
+ pg_stat_activity ps
+ JOIN pg_stat_memory_allocation pa ON (pa.pid = ps.pid)
+WHERE
+ backend_type IN ('checkpointer', 'background writer', 'walwriter', 'autovacuum launcher');
+ result
+--------
+ t
+ t
+ t
+ t
+(4 rows)
+
+-- ensure that pg_stat_global_memory_allocation view exists
+SELECT
+ datid > 0, pg_size_bytes(shared_memory_size) >= 0, shared_memory_size_in_huge_pages >= -1, global_dsm_allocated_bytes >= 0
+FROM
+ pg_stat_global_memory_allocation;
+ ?column? | ?column? | ?column? | ?column?
+----------+----------+----------+----------
+ t | t | t | t
+(1 row)
+
+-- ensure that pg_stat_memory_allocation view exists
+SELECT
+ pid > 0, allocated_bytes >= 0, aset_allocated_bytes >= 0, dsm_allocated_bytes >= 0, generation_allocated_bytes >= 0, slab_allocated_bytes >= 0
+FROM
+ pg_stat_memory_allocation limit 1;
+ ?column? | ?column? | ?column? | ?column? | ?column? | ?column?
+----------+----------+----------+----------+----------+----------
+ t | t | t | t | t | t
+(1 row)
+
-- End of Stats Test
diff --git a/src/test/regress/sql/stats.sql b/src/test/regress/sql/stats.sql
index d958e70a86..e768f3df84 100644
--- a/src/test/regress/sql/stats.sql
+++ b/src/test/regress/sql/stats.sql
@@ -763,4 +763,24 @@ DROP TABLE brin_hot_3;
SET enable_seqscan = on;
+-- ensure that allocated_bytes exist for backends
+SELECT
+ allocated_bytes > 0 AS result
+FROM
+ pg_stat_activity ps
+ JOIN pg_stat_memory_allocation pa ON (pa.pid = ps.pid)
+WHERE
+ backend_type IN ('checkpointer', 'background writer', 'walwriter', 'autovacuum launcher');
+
+-- ensure that pg_stat_global_memory_allocation view exists
+SELECT
+ datid > 0, pg_size_bytes(shared_memory_size) >= 0, shared_memory_size_in_huge_pages >= -1, global_dsm_allocated_bytes >= 0
+FROM
+ pg_stat_global_memory_allocation;
+
+-- ensure that pg_stat_memory_allocation view exists
+SELECT
+ pid > 0, allocated_bytes >= 0, aset_allocated_bytes >= 0, dsm_allocated_bytes >= 0, generation_allocated_bytes >= 0, slab_allocated_bytes >= 0
+FROM
+ pg_stat_memory_allocation limit 1;
-- End of Stats Test
--
2.25.1
Updated patches attached.
Rebased to current master.
Added additional columns to pg_stat_global_memory_allocation to summarize backend allocations by type.
Updated documentation.
Corrected some issues noted in review by John Morris.
Added code re EXEC_BACKEND for dev-max-memory branch.
Attachments:
0002-Add-the-ability-to-limit-the-amount-of-memory-that-c.patchtext/x-patch; charset=UTF-8; name=0002-Add-the-ability-to-limit-the-amount-of-memory-that-c.patchDownload
From 34514ae2bebe5e3ab2a0b5b680d3932b5e7706ee Mon Sep 17 00:00:00 2001
From: Reid Thompson <jreidthompson@nc.rr.com>
Date: Sat, 4 Jun 2022 22:23:59 -0400
Subject: [PATCH 2/2] Add the ability to limit the amount of memory that can be
allocated to backends.
This builds on the work that adds backend memory allocated tracking.
Add GUC variable max_total_backend_memory.
Specifies a limit to the amount of memory (in MB) that may be allocated to
backends in total (i.e. this is not a per user or per backend limit). If unset,
or set to 0 it is disabled. It is intended as a resource to help avoid the OOM
killer on LINUX and manage resources in general. A backend request that would
exhaust max_total_backend_memory memory will be denied with an out of memory
error causing that backend's current query/transaction to fail. Further
requests will not be allocated until dropping below the limit. Keep this in
mind when setting this value. Due to the dynamic nature of memory allocations,
this limit is not exact. This limit does not affect auxiliary backend
processes. Backend memory allocations are displayed in the
pg_stat_memory_allocation and pg_stat_global_memory_allocation views.
---
doc/src/sgml/config.sgml | 30 +++
doc/src/sgml/monitoring.sgml | 38 +++-
src/backend/catalog/system_views.sql | 2 +
src/backend/port/sysv_shmem.c | 9 +
src/backend/postmaster/postmaster.c | 5 +
src/backend/storage/ipc/dsm_impl.c | 18 ++
src/backend/storage/lmgr/proc.c | 45 +++++
src/backend/utils/activity/backend_status.c | 183 ++++++++++++++++++
src/backend/utils/adt/pgstatfuncs.c | 16 +-
src/backend/utils/hash/dynahash.c | 3 +-
src/backend/utils/init/miscinit.c | 8 +
src/backend/utils/misc/guc_tables.c | 11 ++
src/backend/utils/misc/postgresql.conf.sample | 3 +
src/backend/utils/mmgr/aset.c | 33 ++++
src/backend/utils/mmgr/generation.c | 16 ++
src/backend/utils/mmgr/slab.c | 15 +-
src/include/catalog/pg_proc.dat | 6 +-
src/include/storage/proc.h | 7 +
src/include/utils/backend_status.h | 120 ++++++++++--
src/test/regress/expected/rules.out | 4 +-
20 files changed, 537 insertions(+), 35 deletions(-)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index bcc49aec45..4c735e180f 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2133,6 +2133,36 @@ include_dir 'conf.d'
</listitem>
</varlistentry>
+ <varlistentry id="guc-max-total-backend-memory" xreflabel="max_total_backend_memory">
+ <term><varname>max_total_backend_memory</varname> (<type>integer</type>)
+ <indexterm>
+ <primary><varname>max_total_backend_memory</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Specifies a limit to the amount of memory (MB) that may be allocated to
+ backends in total (i.e. this is not a per user or per backend limit).
+ If unset, or set to 0 it is disabled. At databse startup
+ max_total_backend_memory is reduced by shared_memory_size_mb
+ (includes shared buffers and other memory required for initialization).
+ Each backend process is intialized with a 1MB local allowance which
+ also reduces max_total_bkend_mem_bytes_available. Keep this in mind
+ when setting this value. A backend request that would exhaust the limit
+ will be denied with an out of memory error causing that backend's
+ current query/transaction to fail. Further requests will not be
+ allocated until dropping below the limit. This limit does not affect
+ auxiliary backend processes
+ <xref linkend="glossary-auxiliary-proc"/> or the postmaster process.
+ Backend memory allocations (<varname>allocated_bytes</varname>) are
+ displayed in the
+ <link linkend="monitoring-pg-stat-memory-allocation-view"><structname>pg_stat_memory_allocation</structname></link>
+ view. Due to the dynamic nature of memory allocations, this limit is
+ not exact.
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
</sect2>
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 70b3441412..704a75bd6e 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -5704,10 +5704,7 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
</para>
<para>
Memory currently allocated to this backend in bytes. This is the balance
- of bytes allocated and freed by this backend. Dynamic shared memory
- allocations are included only in the value displayed for the backend that
- created them, they are not included in the value for backends that are
- attached to them to avoid double counting.
+ of bytes allocated and freed by this backend.
</para></entry>
</row>
@@ -5824,6 +5821,39 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
</para></entry>
</row>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>max_total_backend_memory_bytes</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Reports the user defined backend maximum allowed shared memory in bytes.
+ 0 if disabled or not set. See
+ <xref linkend="guc-max-total-backend-memory"/>.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>max_total_bkend_mem_bytes_available</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Tracks max_total_backend_memory (in bytes) available for allocation. At
+ database startup, max_total_bkend_mem_bytes_available is reduced by the
+ byte equivalent of shared_memory_size_mb. Each backend process is
+ intialized with a 1MB local allowance which also reduces
+ max_total_bkend_mem_bytes_available. A process's allocation requests
+ reduce it's local allowance. If a process's allocation request exceeds
+ it's remaining allowance, an attempt is made to refill the local
+ allowance from max_total_bkend_mem_bytes_available. If the refill request
+ fails, then the requesting process will fail with an out of memory error
+ resulting in the cancellation of that process's active query/transaction.
+ The default refill allocation quantity is 1MB. If a request is greater
+ than 1MB, an attempt will be made to allocate the full amount. If
+ max_total_backend_memory is disabled, this will be -1.
+ <xref linkend="guc-max-total-backend-memory"/>.
+ </para></entry>
+ </row>
+
<row>
<entry role="catalog_table_entry"><para role="column_definition">
<structfield>global_dsm_allocated_bytes</structfield> <type>bigint</type>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 6876564904..8108d3467f 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1359,6 +1359,8 @@ SELECT
S.datid AS datid,
current_setting('shared_memory_size', true) as shared_memory_size,
(current_setting('shared_memory_size_in_huge_pages', true))::integer as shared_memory_size_in_huge_pages,
+ pg_size_bytes(current_setting('max_total_backend_memory', true)) as max_total_backend_memory_bytes,
+ S.max_total_bkend_mem_bytes_available,
S.global_dsm_allocated_bytes,
sums.total_aset_allocated_bytes,
sums.total_dsm_allocated_bytes,
diff --git a/src/backend/port/sysv_shmem.c b/src/backend/port/sysv_shmem.c
index eaba244bc9..463bf2e90f 100644
--- a/src/backend/port/sysv_shmem.c
+++ b/src/backend/port/sysv_shmem.c
@@ -34,6 +34,7 @@
#include "storage/fd.h"
#include "storage/ipc.h"
#include "storage/pg_shmem.h"
+#include "utils/backend_status.h"
#include "utils/guc_hooks.h"
#include "utils/pidfile.h"
@@ -903,6 +904,14 @@ PGSharedMemoryReAttach(void)
dsm_set_control_handle(hdr->dsm_control);
UsedShmemSegAddr = hdr; /* probably redundant */
+
+ /*
+ * Init allocated bytes to avoid double counting parent allocation for
+ * fork/exec processes. Forked processes perform this action in
+ * InitPostmasterChild. For EXEC_BACKEND processes we have to wait for
+ * shared memory to be reattached.
+ */
+ pgstat_init_allocated_bytes();
}
/*
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 4c49393fc5..06a773c8bb 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -540,6 +540,7 @@ typedef struct
#endif
char my_exec_path[MAXPGPATH];
char pkglib_path[MAXPGPATH];
+ int max_total_bkend_mem;
} BackendParameters;
static void read_backend_variables(char *id, Port *port);
@@ -6122,6 +6123,8 @@ save_backend_variables(BackendParameters *param, Port *port,
strlcpy(param->pkglib_path, pkglib_path, MAXPGPATH);
+ param->max_total_bkend_mem = max_total_bkend_mem;
+
return true;
}
@@ -6352,6 +6355,8 @@ restore_backend_variables(BackendParameters *param, Port *port)
strlcpy(pkglib_path, param->pkglib_path, MAXPGPATH);
+ max_total_bkend_mem = param->max_total_bkend_mem;
+
/*
* We need to restore fd.c's counts of externally-opened FDs; to avoid
* confusion, be sure to do this after restoring max_safe_fds. (Note:
diff --git a/src/backend/storage/ipc/dsm_impl.c b/src/backend/storage/ipc/dsm_impl.c
index 16e2bded59..68780de717 100644
--- a/src/backend/storage/ipc/dsm_impl.c
+++ b/src/backend/storage/ipc/dsm_impl.c
@@ -254,6 +254,16 @@ dsm_impl_posix(dsm_op op, dsm_handle handle, Size request_size,
return true;
}
+ /* Do not exceed maximum allowed memory allocation */
+ if (op == DSM_OP_CREATE && exceeds_max_total_bkend_mem(request_size))
+ {
+ ereport(elevel,
+ (errcode_for_dynamic_shared_memory(),
+ errmsg("out of memory for segment \"%s\" - exceeds max_total_backend_memory: %m",
+ name)));
+ return false;
+ }
+
/*
* Create new segment or open an existing one for attach.
*
@@ -522,6 +532,10 @@ dsm_impl_sysv(dsm_op op, dsm_handle handle, Size request_size,
int flags = IPCProtection;
size_t segsize;
+ /* Do not exceed maximum allowed memory allocation */
+ if (op == DSM_OP_CREATE && exceeds_max_total_bkend_mem(request_size))
+ return false;
+
/*
* Allocate the memory BEFORE acquiring the resource, so that we don't
* leak the resource if memory allocation fails.
@@ -716,6 +730,10 @@ dsm_impl_windows(dsm_op op, dsm_handle handle, Size request_size,
return true;
}
+ /* Do not exceed maximum allowed memory allocation */
+ if (op == DSM_OP_CREATE && exceeds_max_total_bkend_mem(request_size))
+ return false;
+
/* Create new segment or open an existing one for attach. */
if (op == DSM_OP_CREATE)
{
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index d86fbdfd9b..cee66af8f0 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -51,6 +51,7 @@
#include "storage/procsignal.h"
#include "storage/spin.h"
#include "storage/standby.h"
+#include "utils/guc.h"
#include "utils/timeout.h"
#include "utils/timestamp.h"
@@ -182,6 +183,50 @@ InitProcGlobal(void)
pg_atomic_init_u32(&ProcGlobal->clogGroupFirst, INVALID_PGPROCNO);
pg_atomic_init_u64(&ProcGlobal->global_dsm_allocation, 0);
+ /* Setup backend memory limiting if configured */
+ if (max_total_bkend_mem > 0)
+ {
+ /*
+ * Convert max_total_bkend_mem to bytes, account for
+ * shared_memory_size, and initialize max_total_bkend_mem_bytes.
+ */
+ int result = 0;
+
+ /* Get integer value of shared_memory_size */
+ if (parse_int(GetConfigOption("shared_memory_size", true, false), &result, 0, NULL))
+ {
+ /*
+ * Error on startup if backend memory limit is less than shared
+ * memory size. Warn on startup if backend memory available is
+ * less than arbitrarily picked value of 100MB.
+ */
+
+ if (max_total_bkend_mem - result <= 0)
+ {
+ ereport(ERROR,
+ errmsg("configured max_total_backend_memory %dMB is <= shared_memory_size %dMB",
+ max_total_bkend_mem, result),
+ errhint("Disable or increase the configuration parameter \"max_total_backend_memory\"."));
+ }
+ else if (max_total_bkend_mem - result <= 100)
+ {
+ ereport(WARNING,
+ errmsg("max_total_backend_memory %dMB - shared_memory_size %dMB is <= 100MB",
+ max_total_bkend_mem, result),
+ errhint("Consider increasing the configuration parameter \"max_total_backend_memory\"."));
+ }
+
+ /*
+ * Account for shared memory size and initialize
+ * max_total_bkend_mem_bytes.
+ */
+ pg_atomic_init_u64(&ProcGlobal->max_total_bkend_mem_bytes,
+ (uint64) max_total_bkend_mem * 1024 * 1024 - (uint64) result * 1024 * 1024);
+ }
+ else
+ ereport(ERROR, errmsg("max_total_backend_memory initialization is unable to parse shared_memory_size"));
+ }
+
/*
* Create and initialize all the PGPROC structures we'll need. There are
* five separate consumers: (1) normal backends, (2) autovacuum workers
diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c
index f921c4bbde..4103cbedda 100644
--- a/src/backend/utils/activity/backend_status.c
+++ b/src/backend/utils/activity/backend_status.c
@@ -45,6 +45,12 @@
bool pgstat_track_activities = false;
int pgstat_track_activity_query_size = 1024;
+/*
+ * Max backend memory allocation allowed (MB). 0 = disabled.
+ * Centralized bucket ProcGlobal->max_total_bkend_mem is initialized
+ * as a byte representation of this value in InitProcGlobal().
+ */
+int max_total_bkend_mem = 0;
/* exposed so that backend_progress.c can access it */
PgBackendStatus *MyBEEntry = NULL;
@@ -68,6 +74,31 @@ uint64 *my_generation_allocated_bytes = &local_my_generation_allocated_bytes;
uint64 local_my_slab_allocated_bytes = 0;
uint64 *my_slab_allocated_bytes = &local_my_slab_allocated_bytes;
+/*
+ * Define initial allocation allowance for a backend.
+ *
+ * NOTE: initial_allocation_allowance && allocation_allowance_refill_qty
+ * may be candidates for future GUC variables. Arbitrary 1MB selected initially.
+ */
+uint64 initial_allocation_allowance = 1024 * 1024;
+uint64 allocation_allowance_refill_qty = 1024 * 1024;
+
+/*
+ * Local counter to manage shared memory allocations. At backend startup, set to
+ * initial_allocation_allowance via pgstat_init_allocated_bytes(). Decrease as
+ * memory is malloc'd. When exhausted, atomically refill if available from
+ * ProcGlobal->max_total_bkend_mem via exceeds_max_total_bkend_mem().
+ */
+uint64 allocation_allowance = 0;
+
+/*
+ * Local counter of free'd shared memory. Return to global
+ * max_total_bkend_mem when return threshold is met. Arbitrary 1MB bytes
+ * selected initially.
+ */
+uint64 allocation_return = 0;
+uint64 allocation_return_threshold = 1024 * 1024;
+
static PgBackendStatus *BackendStatusArray = NULL;
static char *BackendAppnameBuffer = NULL;
static char *BackendClientHostnameBuffer = NULL;
@@ -1271,6 +1302,8 @@ pgstat_set_allocated_bytes_storage(uint64 *allocated_bytes,
my_slab_allocated_bytes = slab_allocated_bytes;
*slab_allocated_bytes = local_my_slab_allocated_bytes;
+
+ return;
}
/*
@@ -1294,6 +1327,23 @@ pgstat_reset_allocated_bytes_storage(void)
*my_dsm_allocated_bytes);
}
+ /*
+ * When limiting maximum backend memory, return this backend's memory
+ * allocations to global.
+ */
+ if (max_total_bkend_mem)
+ {
+ volatile PROC_HDR *procglobal = ProcGlobal;
+
+ pg_atomic_add_fetch_u64(&procglobal->max_total_bkend_mem_bytes,
+ *my_allocated_bytes + allocation_allowance +
+ allocation_return);
+
+ /* Reset memory allocation variables */
+ allocation_allowance = 0;
+ allocation_return = 0;
+ }
+
/* Reset memory allocation variables */
*my_allocated_bytes = local_my_allocated_bytes = 0;
*my_aset_allocated_bytes = local_my_aset_allocated_bytes = 0;
@@ -1307,4 +1357,137 @@ pgstat_reset_allocated_bytes_storage(void)
my_dsm_allocated_bytes = &local_my_dsm_allocated_bytes;
my_generation_allocated_bytes = &local_my_generation_allocated_bytes;
my_slab_allocated_bytes = &local_my_slab_allocated_bytes;
+
+ return;
+}
+
+/*
+ * Determine if allocation request will exceed max backend memory allowed.
+ * Do not apply to auxiliary processes.
+ * Refill allocation request bucket when needed/possible.
+ */
+bool
+exceeds_max_total_bkend_mem(uint64 allocation_request)
+{
+ bool result = false;
+
+ /*
+ * When limiting maximum backend memory, attempt to refill allocation
+ * request bucket if needed.
+ */
+ if (max_total_bkend_mem && allocation_request > allocation_allowance &&
+ ProcGlobal != NULL)
+ {
+ volatile PROC_HDR *procglobal = ProcGlobal;
+ uint64 available_max_total_bkend_mem = 0;
+ bool sts = false;
+
+ /*
+ * If allocation request is larger than memory refill quantity then
+ * attempt to increase allocation allowance with requested amount,
+ * otherwise fall through. If this refill fails we do not have enough
+ * memory to meet the request.
+ */
+ if (allocation_request >= allocation_allowance_refill_qty)
+ {
+ while ((available_max_total_bkend_mem = pg_atomic_read_u64(&procglobal->max_total_bkend_mem_bytes)) >= allocation_request)
+ {
+ if ((result = pg_atomic_compare_exchange_u64(&procglobal->max_total_bkend_mem_bytes,
+ &available_max_total_bkend_mem,
+ available_max_total_bkend_mem - allocation_request)))
+ {
+ allocation_allowance = allocation_allowance + allocation_request;
+ break;
+ }
+ }
+
+ /*
+ * Exclude auxiliary and Postmaster processes from the check.
+ * Return false. While we want to exclude them from the check, we
+ * do not want to exclude them from the above allocation handling.
+ */
+ if (MyAuxProcType != NotAnAuxProcess || MyProcPid == PostmasterPid)
+ return false;
+
+ /*
+ * If the atomic exchange fails (result == false), we do not have
+ * enough reserve memory to meet the request. Negate result to
+ * return the proper value.
+ */
+
+ return !result;
+ }
+
+ /*
+ * Attempt to increase allocation allowance by memory refill quantity.
+ * If available memory is/becomes less than memory refill quantity,
+ * fall through to attempt to allocate remaining available memory.
+ */
+ while ((available_max_total_bkend_mem = pg_atomic_read_u64(&procglobal->max_total_bkend_mem_bytes)) >= allocation_allowance_refill_qty)
+ {
+ if ((sts = pg_atomic_compare_exchange_u64(&procglobal->max_total_bkend_mem_bytes,
+ &available_max_total_bkend_mem,
+ available_max_total_bkend_mem - allocation_allowance_refill_qty)))
+ {
+ allocation_allowance = allocation_allowance + allocation_allowance_refill_qty;
+ break;
+ }
+ }
+
+ if (!sts)
+ {
+ /*
+ * If available_max_total_bkend_mem is 0, no memory is currently
+ * available to refill with, otherwise attempt to allocate
+ * remaining memory available if it exceeds the requested amount
+ * or the requested amount if more than requested amount gets
+ * returned while looping.
+ */
+ while ((available_max_total_bkend_mem = (int64) pg_atomic_read_u64(&procglobal->max_total_bkend_mem_bytes)) > 0)
+ {
+ uint64 newval = 0;
+
+ /*
+ * If available memory is less than requested allocation we
+ * cannot fulfil request.
+ */
+ if (available_max_total_bkend_mem < allocation_request)
+ break;
+
+ /*
+ * If we happen to loop and a large chunk of memory has been
+ * returned to global, allocate request amount only.
+ */
+ if (available_max_total_bkend_mem > allocation_request)
+ newval = available_max_total_bkend_mem - allocation_request;
+
+ /* Allocate memory */
+ if ((sts = pg_atomic_compare_exchange_u64(&procglobal->max_total_bkend_mem_bytes,
+ &available_max_total_bkend_mem,
+ newval)))
+ {
+ allocation_allowance = allocation_allowance +
+ newval == 0 ? available_max_total_bkend_mem : allocation_request;
+
+ break;
+ }
+ }
+ }
+
+ /*
+ * If refill is not successful, we return true, memory limit exceeded
+ */
+ if (!sts)
+ result = true;
+ }
+
+ /*
+ * Exclude auxiliary and postmaster processes from the check. Return false.
+ * While we want to exclude them from the check, we do not want to exclude
+ * them from the above allocation handling.
+ */
+ if (MyAuxProcType != NotAnAuxProcess || MyProcPid == PostmasterPid)
+ result = false;
+
+ return result;
}
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index a5fd5e6964..70c4a0b2bd 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -2056,7 +2056,7 @@ pg_stat_get_memory_allocation(PG_FUNCTION_ARGS)
Datum
pg_stat_get_global_memory_allocation(PG_FUNCTION_ARGS)
{
-#define PG_STAT_GET_GLOBAL_MEMORY_ALLOCATION_COLS 2
+#define PG_STAT_GET_GLOBAL_MEMORY_ALLOCATION_COLS 3
TupleDesc tupdesc;
Datum values[PG_STAT_GET_GLOBAL_MEMORY_ALLOCATION_COLS] = {0};
bool nulls[PG_STAT_GET_GLOBAL_MEMORY_ALLOCATION_COLS] = {0};
@@ -2066,15 +2066,23 @@ pg_stat_get_global_memory_allocation(PG_FUNCTION_ARGS)
tupdesc = CreateTemplateTupleDesc(PG_STAT_GET_GLOBAL_MEMORY_ALLOCATION_COLS);
TupleDescInitEntry(tupdesc, (AttrNumber) 1, "datid",
OIDOID, -1, 0);
- TupleDescInitEntry(tupdesc, (AttrNumber) 2, "global_dsm_allocated_bytes",
+ TupleDescInitEntry(tupdesc, (AttrNumber) 2, "max_total_bkend_mem_bytes_available",
+ INT8OID, -1, 0);
+ TupleDescInitEntry(tupdesc, (AttrNumber) 3, "global_dsm_allocated_bytes",
INT8OID, -1, 0);
BlessTupleDesc(tupdesc);
/* datid */
values[0] = ObjectIdGetDatum(MyDatabaseId);
- /* get global_dsm_allocated_bytes */
- values[1] = Int64GetDatum(pg_atomic_read_u64(&procglobal->global_dsm_allocation));
+ /* Get max_total_bkend_mem_bytes - return -1 if disabled */
+ if (max_total_bkend_mem == 0)
+ values[1] = Int64GetDatum(-1);
+ else
+ values[1] = Int64GetDatum(pg_atomic_read_u64(&procglobal->max_total_bkend_mem_bytes));
+
+ /* Get global_dsm_allocated_bytes */
+ values[2] = Int64GetDatum(pg_atomic_read_u64(&procglobal->global_dsm_allocation));
/* Returns the record as Datum */
PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
diff --git a/src/backend/utils/hash/dynahash.c b/src/backend/utils/hash/dynahash.c
index 012d4a0b1f..cd68e5265a 100644
--- a/src/backend/utils/hash/dynahash.c
+++ b/src/backend/utils/hash/dynahash.c
@@ -104,7 +104,6 @@
#include "utils/dynahash.h"
#include "utils/memutils.h"
-
/*
* Constants
*
@@ -359,7 +358,6 @@ hash_create(const char *tabname, long nelem, const HASHCTL *info, int flags)
Assert(flags & HASH_ELEM);
Assert(info->keysize > 0);
Assert(info->entrysize >= info->keysize);
-
/*
* For shared hash tables, we have a local hash header (HTAB struct) that
* we allocate in TopMemoryContext; all else is in shared memory.
@@ -377,6 +375,7 @@ hash_create(const char *tabname, long nelem, const HASHCTL *info, int flags)
}
else
{
+ /* Set up to allocate the hash header */
/* Create the hash table's private memory context */
if (flags & HASH_CONTEXT)
CurrentDynaHashCxt = info->hcxt;
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
index 7b8eeb7dbb..a7df801f77 100644
--- a/src/backend/utils/init/miscinit.c
+++ b/src/backend/utils/init/miscinit.c
@@ -172,8 +172,16 @@ InitPostmasterChild(void)
errmsg_internal("could not set postmaster death monitoring pipe to FD_CLOEXEC mode: %m")));
#endif
+ /*
+ * Init pgstat allocated bytes counters here for forked backends.
+ * Fork/exec backends have not yet reattached to shared memory at this
+ * point. They will init pgstat allocated bytes counters in
+ * PGSharedMemoryReAttach.
+ */
+#ifndef EXEC_BACKEND
/* Init allocated bytes to avoid double counting parent allocation */
pgstat_init_allocated_bytes();
+#endif
}
/*
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 8062589efd..bde8e28365 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -3497,6 +3497,17 @@ struct config_int ConfigureNamesInt[] =
NULL, NULL, NULL
},
+ {
+ {"max_total_backend_memory", PGC_SU_BACKEND, RESOURCES_MEM,
+ gettext_noop("Restrict total backend memory allocations to this max."),
+ gettext_noop("0 turns this feature off."),
+ GUC_UNIT_MB
+ },
+ &max_total_bkend_mem,
+ 0, 0, INT_MAX,
+ NULL, NULL, NULL
+ },
+
/* End-of-list marker */
{
{NULL, 0, 0, NULL, NULL}, NULL, 0, 0, 0, NULL, NULL, NULL
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index ee49ca3937..697a619266 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -157,6 +157,9 @@
# mmap
# (change requires restart)
#min_dynamic_shared_memory = 0MB # (change requires restart)
+#max_total_backend_memory = 0MB # Restrict total backend memory allocations
+ # to this max (in MB). 0 turns this feature
+ # off.
# - Disk -
diff --git a/src/backend/utils/mmgr/aset.c b/src/backend/utils/mmgr/aset.c
index f3f5945fdf..4a83a2f60f 100644
--- a/src/backend/utils/mmgr/aset.c
+++ b/src/backend/utils/mmgr/aset.c
@@ -440,6 +440,18 @@ AllocSetContextCreateInternal(MemoryContext parent,
else
firstBlockSize = Max(firstBlockSize, initBlockSize);
+ /* Do not exceed maximum allowed memory allocation */
+ if (exceeds_max_total_bkend_mem(firstBlockSize))
+ {
+ if (TopMemoryContext)
+ MemoryContextStats(TopMemoryContext);
+ ereport(ERROR,
+ (errcode(ERRCODE_OUT_OF_MEMORY),
+ errmsg("out of memory - exceeds max_total_backend_memory"),
+ errdetail("Failed while creating memory context \"%s\".",
+ name)));
+ }
+
/*
* Allocate the initial block. Unlike other aset.c blocks, it starts with
* the context header and its block header follows that.
@@ -741,6 +753,11 @@ AllocSetAlloc(MemoryContext context, Size size)
#endif
blksize = chunk_size + ALLOC_BLOCKHDRSZ + ALLOC_CHUNKHDRSZ;
+
+ /* Do not exceed maximum allowed memory allocation */
+ if (exceeds_max_total_bkend_mem(blksize))
+ return NULL;
+
block = (AllocBlock) malloc(blksize);
if (block == NULL)
return NULL;
@@ -938,6 +955,10 @@ AllocSetAlloc(MemoryContext context, Size size)
while (blksize < required_size)
blksize <<= 1;
+ /* Do not exceed maximum allowed memory allocation */
+ if (exceeds_max_total_bkend_mem(blksize))
+ return NULL;
+
/* Try to allocate it */
block = (AllocBlock) malloc(blksize);
@@ -1176,6 +1197,18 @@ AllocSetRealloc(void *pointer, Size size)
blksize = chksize + ALLOC_BLOCKHDRSZ + ALLOC_CHUNKHDRSZ;
oldblksize = block->endptr - ((char *) block);
+ /*
+ * Do not exceed maximum allowed memory allocation. NOTE: checking for
+ * the full size here rather than just the amount of increased
+ * allocation to prevent a potential underflow of *my_allocation
+ * allowance in cases where blksize - oldblksize does not trigger a
+ * refill but blksize is greater than *my_allocation_allowance.
+ * Underflow would occur with the call below to
+ * pgstat_report_allocated_bytes_increase()
+ */
+ if (blksize > oldblksize && exceeds_max_total_bkend_mem(blksize))
+ return NULL;
+
block = (AllocBlock) realloc(block, blksize);
if (block == NULL)
{
diff --git a/src/backend/utils/mmgr/generation.c b/src/backend/utils/mmgr/generation.c
index 5708e8da7a..584b2ec8ef 100644
--- a/src/backend/utils/mmgr/generation.c
+++ b/src/backend/utils/mmgr/generation.c
@@ -201,6 +201,16 @@ GenerationContextCreate(MemoryContext parent,
else
allocSize = Max(allocSize, initBlockSize);
+ if (exceeds_max_total_bkend_mem(allocSize))
+ {
+ MemoryContextStats(TopMemoryContext);
+ ereport(ERROR,
+ (errcode(ERRCODE_OUT_OF_MEMORY),
+ errmsg("out of memory - exceeds max_total_backend_memory"),
+ errdetail("Failed while creating memory context \"%s\".",
+ name)));
+ }
+
/*
* Allocate the initial block. Unlike other generation.c blocks, it
* starts with the context header and its block header follows that.
@@ -380,6 +390,9 @@ GenerationAlloc(MemoryContext context, Size size)
{
Size blksize = required_size + Generation_BLOCKHDRSZ;
+ if (exceeds_max_total_bkend_mem(blksize))
+ return NULL;
+
block = (GenerationBlock *) malloc(blksize);
if (block == NULL)
return NULL;
@@ -483,6 +496,9 @@ GenerationAlloc(MemoryContext context, Size size)
if (blksize < required_size)
blksize = pg_nextpower2_size_t(required_size);
+ if (exceeds_max_total_bkend_mem(blksize))
+ return NULL;
+
block = (GenerationBlock *) malloc(blksize);
if (block == NULL)
diff --git a/src/backend/utils/mmgr/slab.c b/src/backend/utils/mmgr/slab.c
index b436587bdd..9754c6d2f4 100644
--- a/src/backend/utils/mmgr/slab.c
+++ b/src/backend/utils/mmgr/slab.c
@@ -356,7 +356,16 @@ SlabContextCreate(MemoryContext parent,
elog(ERROR, "block size %zu for slab is too small for %zu-byte chunks",
blockSize, chunkSize);
-
+ /* Do not exceed maximum allowed memory allocation */
+ if (exceeds_max_total_bkend_mem(Slab_CONTEXT_HDRSZ(chunksPerBlock)))
+ {
+ MemoryContextStats(TopMemoryContext);
+ ereport(ERROR,
+ (errcode(ERRCODE_OUT_OF_MEMORY),
+ errmsg("out of memory - exceeds max_total_backend_memory"),
+ errdetail("Failed while creating memory context \"%s\".",
+ name)));
+ }
slab = (SlabContext *) malloc(Slab_CONTEXT_HDRSZ(chunksPerBlock));
if (slab == NULL)
@@ -559,6 +568,10 @@ SlabAlloc(MemoryContext context, Size size)
}
else
{
+ /* Do not exceed maximum allowed memory allocation */
+ if (exceeds_max_total_bkend_mem(slab->blockSize))
+ return NULL;
+
block = (SlabBlock *) malloc(slab->blockSize);
if (unlikely(block == NULL))
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index a6f52a4db4..97196b7eb1 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5440,9 +5440,9 @@
descr => 'statistics: global memory allocation information',
proname => 'pg_stat_get_global_memory_allocation', proisstrict => 'f',
provolatile => 's', proparallel => 'r', prorettype => 'record',
- proargtypes => '', proallargtypes => '{oid,int8}',
- proargmodes => '{o,o}',
- proargnames => '{datid,global_dsm_allocated_bytes}',
+ proargtypes => '', proallargtypes => '{oid,int8,int8}',
+ proargmodes => '{o,o,o}',
+ proargnames => '{datid,max_total_bkend_mem_bytes_available,global_dsm_allocated_bytes}',
prosrc =>'pg_stat_get_global_memory_allocation' },
{ oid => '2022',
descr => 'statistics: information about currently active backends',
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index c2c878219d..a2a5364a85 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -406,6 +406,13 @@ typedef struct PROC_HDR
int startupBufferPinWaitBufId;
/* Global dsm allocations */
pg_atomic_uint64 global_dsm_allocation;
+
+ /*
+ * Max backend memory allocation tracker. Used/Initialized when
+ * max_total_bkend_mem > 0 as max_total_bkend_mem (MB) converted to bytes.
+ * Decreases/increases with free/malloc of backend memory.
+ */
+ pg_atomic_uint64 max_total_bkend_mem_bytes;
} PROC_HDR;
extern PGDLLIMPORT PROC_HDR *ProcGlobal;
diff --git a/src/include/utils/backend_status.h b/src/include/utils/backend_status.h
index 6434ece1ef..4eef3470a5 100644
--- a/src/include/utils/backend_status.h
+++ b/src/include/utils/backend_status.h
@@ -15,6 +15,7 @@
#include "libpq/pqcomm.h"
#include "miscadmin.h" /* for BackendType */
#include "storage/backendid.h"
+#include "storage/proc.h"
#include "utils/backend_progress.h"
@@ -304,6 +305,7 @@ typedef struct LocalPgBackendStatus
*/
extern PGDLLIMPORT bool pgstat_track_activities;
extern PGDLLIMPORT int pgstat_track_activity_query_size;
+extern PGDLLIMPORT int max_total_bkend_mem;
/* ----------
@@ -316,6 +318,10 @@ extern PGDLLIMPORT uint64 *my_aset_allocated_bytes;
extern PGDLLIMPORT uint64 *my_dsm_allocated_bytes;
extern PGDLLIMPORT uint64 *my_generation_allocated_bytes;
extern PGDLLIMPORT uint64 *my_slab_allocated_bytes;
+extern PGDLLIMPORT uint64 allocation_allowance;
+extern PGDLLIMPORT uint64 initial_allocation_allowance;
+extern PGDLLIMPORT uint64 allocation_return;
+extern PGDLLIMPORT uint64 allocation_return_threshold;
/* ----------
@@ -363,6 +369,7 @@ extern int pgstat_fetch_stat_numbackends(void);
extern PgBackendStatus *pgstat_fetch_stat_beentry(BackendId beid);
extern LocalPgBackendStatus *pgstat_fetch_stat_local_beentry(int beid);
extern char *pgstat_clip_activity(const char *raw_activity);
+extern bool exceeds_max_total_bkend_mem(uint64 allocation_request);
/* ----------
* pgstat_report_allocated_bytes_decrease() -
@@ -378,34 +385,44 @@ pgstat_report_allocated_bytes_decrease(int64 proc_allocated_bytes,
{
uint64 temp;
- /* Avoid allocated_bytes unsigned integer overflow on decrease */
+ /* Sanity check: my allocated bytes should never drop below zero */
if (pg_sub_u64_overflow(*my_allocated_bytes, proc_allocated_bytes, &temp))
{
- /* On overflow, set pgstat count of allocated bytes to zero */
+ /* On overflow, set allocated bytes and allocator type bytes to zero */
*my_allocated_bytes = 0;
-
- switch (pg_allocator_type)
+ *my_aset_allocated_bytes = 0;
+ *my_dsm_allocated_bytes = 0;
+ *my_generation_allocated_bytes = 0;
+ *my_slab_allocated_bytes = 0;
+
+ /* Add freed memory to allocation return counter. */
+ allocation_return += proc_allocated_bytes;
+
+ /*
+ * Return freed memory to the global counter if return threshold is
+ * met.
+ */
+ if (max_total_bkend_mem && allocation_return >= allocation_return_threshold)
{
- case PG_ALLOC_ASET:
- *my_aset_allocated_bytes = 0;
- break;
- case PG_ALLOC_DSM:
- *my_dsm_allocated_bytes = 0;
- break;
- case PG_ALLOC_GENERATION:
- *my_generation_allocated_bytes = 0;
- break;
- case PG_ALLOC_SLAB:
- *my_slab_allocated_bytes = 0;
- break;
+ if (ProcGlobal)
+ {
+ volatile PROC_HDR *procglobal = ProcGlobal;
+
+ /* Add to global tracker */
+ pg_atomic_add_fetch_u64(&procglobal->max_total_bkend_mem_bytes,
+ allocation_return);
+
+ /* Restart the count */
+ allocation_return = 0;
+ }
}
}
else
{
- /* decrease allocation */
- *my_allocated_bytes -= proc_allocated_bytes;
+ /* Add freed memory to allocation return counter */
+ allocation_return += proc_allocated_bytes;
- /* Decrease allocator type allocated bytes. */
+ /* Decrease allocator type allocated bytes */
switch (pg_allocator_type)
{
case PG_ALLOC_ASET:
@@ -427,6 +444,30 @@ pgstat_report_allocated_bytes_decrease(int64 proc_allocated_bytes,
*my_slab_allocated_bytes -= proc_allocated_bytes;
break;
}
+
+ /* decrease allocation */
+ *my_allocated_bytes = *my_aset_allocated_bytes +
+ *my_dsm_allocated_bytes + *my_generation_allocated_bytes +
+ *my_slab_allocated_bytes;
+
+ /*
+ * Return freed memory to the global counter if return threshold is
+ * met.
+ */
+ if (max_total_bkend_mem && allocation_return >= allocation_return_threshold)
+ {
+ if (ProcGlobal)
+ {
+ volatile PROC_HDR *procglobal = ProcGlobal;
+
+ /* Add to global tracker */
+ pg_atomic_add_fetch_u64(&procglobal->max_total_bkend_mem_bytes,
+ allocation_return);
+
+ /* Restart the count */
+ allocation_return = 0;
+ }
+ }
}
return;
@@ -444,7 +485,13 @@ static inline void
pgstat_report_allocated_bytes_increase(int64 proc_allocated_bytes,
int pg_allocator_type)
{
- *my_allocated_bytes += proc_allocated_bytes;
+ uint64 temp;
+
+ /* Sanity check: my allocated bytes should never drop below zero */
+ if (pg_sub_u64_overflow(allocation_allowance, proc_allocated_bytes, &temp))
+ allocation_allowance = 0;
+ else
+ allocation_allowance -= proc_allocated_bytes;
/* Increase allocator type allocated bytes */
switch (pg_allocator_type)
@@ -469,6 +516,9 @@ pgstat_report_allocated_bytes_increase(int64 proc_allocated_bytes,
break;
}
+ *my_allocated_bytes = *my_aset_allocated_bytes + *my_dsm_allocated_bytes +
+ *my_generation_allocated_bytes + *my_slab_allocated_bytes;
+
return;
}
@@ -488,6 +538,36 @@ pgstat_init_allocated_bytes(void)
*my_generation_allocated_bytes = 0;
*my_slab_allocated_bytes = 0;
+ /* If we're limiting backend memory */
+ if (max_total_bkend_mem)
+ {
+ volatile PROC_HDR *procglobal = ProcGlobal;
+ uint64 available_max_total_bkend_mem = 0;
+
+ allocation_return = 0;
+ allocation_allowance = 0;
+
+ /* Account for the initial allocation allowance */
+ while ((available_max_total_bkend_mem = pg_atomic_read_u64(&procglobal->max_total_bkend_mem_bytes)) >= initial_allocation_allowance)
+ {
+ /*
+ * On success populate allocation_allowance. Failure here will
+ * result in the backend's first invocation of
+ * exceeds_max_total_bkend_mem allocating requested, default, or
+ * available memory or result in an out of memory error.
+ */
+ if (pg_atomic_compare_exchange_u64(&procglobal->max_total_bkend_mem_bytes,
+ &available_max_total_bkend_mem,
+ available_max_total_bkend_mem -
+ initial_allocation_allowance))
+ {
+ allocation_allowance = initial_allocation_allowance;
+
+ break;
+ }
+ }
+ }
+
return;
}
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 77c4a18e26..403715a3d5 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1881,13 +1881,15 @@ pg_stat_global_memory_allocation| WITH sums AS (
SELECT s.datid,
current_setting('shared_memory_size'::text, true) AS shared_memory_size,
(current_setting('shared_memory_size_in_huge_pages'::text, true))::integer AS shared_memory_size_in_huge_pages,
+ pg_size_bytes(current_setting('max_total_backend_memory'::text, true)) AS max_total_backend_memory_bytes,
+ s.max_total_bkend_mem_bytes_available,
s.global_dsm_allocated_bytes,
sums.total_aset_allocated_bytes,
sums.total_dsm_allocated_bytes,
sums.total_generation_allocated_bytes,
sums.total_slab_allocated_bytes
FROM sums,
- (pg_stat_get_global_memory_allocation() s(datid, global_dsm_allocated_bytes)
+ (pg_stat_get_global_memory_allocation() s(datid, max_total_bkend_mem_bytes_available, global_dsm_allocated_bytes)
LEFT JOIN pg_database d ON ((s.datid = d.oid)));
pg_stat_gssapi| SELECT pid,
gss_auth AS gss_authenticated,
--
2.25.1
0001-Add-tracking-of-backend-memory-allocated.patchtext/x-patch; charset=UTF-8; name=0001-Add-tracking-of-backend-memory-allocated.patchDownload
From 7da189612aab9cf957efa179c52f4b4578ee2944 Mon Sep 17 00:00:00 2001
From: Reid Thompson <jreidthompson@nc.rr.com>
Date: Thu, 11 Aug 2022 12:01:25 -0400
Subject: [PATCH 1/2] Add tracking of backend memory allocated
Add tracking of backend memory allocated in total and by allocation
type (aset, dsm, generation, slab) by process.
allocated_bytes tracks the current bytes of memory allocated to the
backend process. aset_allocated_bytes, dsm_allocated_bytes,
generation_allocated_bytes and slab_allocated_bytes track the
allocation by type for the backend process. They are updated for the
process as memory is malloc'd/freed. Memory allocated to items on
the freelist is included. Dynamic shared memory allocations are
included only in the value displayed for the backend that created
them, they are not included in the value for backends that are
attached to them to avoid double counting. DSM allocations that are
not destroyed by the creating process prior to it's exit are
considered long lived and are tracked in a global counter
global_dsm_allocated_bytes. We limit the floor of allocation
counters to zero. Created views pg_stat_global_memory_allocation and
pg_stat_memory_allocation for access to these trackers.
---
doc/src/sgml/monitoring.sgml | 246 ++++++++++++++++++++
src/backend/catalog/system_views.sql | 34 +++
src/backend/storage/ipc/dsm.c | 11 +-
src/backend/storage/ipc/dsm_impl.c | 78 +++++++
src/backend/storage/lmgr/proc.c | 1 +
src/backend/utils/activity/backend_status.c | 114 +++++++++
src/backend/utils/adt/pgstatfuncs.c | 84 +++++++
src/backend/utils/init/miscinit.c | 3 +
src/backend/utils/mmgr/aset.c | 17 ++
src/backend/utils/mmgr/generation.c | 15 ++
src/backend/utils/mmgr/slab.c | 22 ++
src/include/catalog/pg_proc.dat | 17 ++
src/include/storage/proc.h | 2 +
src/include/utils/backend_status.h | 156 ++++++++++++-
src/test/regress/expected/rules.out | 27 +++
src/test/regress/expected/stats.out | 36 +++
src/test/regress/sql/stats.sql | 20 ++
17 files changed, 881 insertions(+), 2 deletions(-)
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index bce9ae4661..70b3441412 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -5644,6 +5644,252 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
</sect2>
+ <sect2 id="monitoring-pg-stat-memory-allocation-view">
+ <title><structname>pg_stat_memory_allocation</structname></title>
+
+ <indexterm>
+ <primary>pg_stat_memory_allocation</primary>
+ </indexterm>
+
+ <para>
+ The <structname>pg_stat_memory_allocation</structname> view will have one
+ row per server process, showing information related to the current memory
+ allocation of that process in total and by allocator type. Due to the
+ dynamic nature of memory allocations the allocated bytes values may not be
+ exact but should be sufficient for the intended purposes. Dynamic shared
+ memory allocations are included only in the value displayed for the backend
+ that created them, they are not included in the value for backends that are
+ attached to them to avoid double counting. Use
+ <function>pg_size_pretty</function> described in
+ <xref linkend="functions-admin-dbsize"/> to make these values more easily
+ readable.
+ </para>
+
+ <table id="pg-stat-memory-allocation-view" xreflabel="pg_stat_memory_allocation">
+ <title><structname>pg_stat_memory_allocation</structname> View</title>
+ <tgroup cols="1">
+ <thead>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ Column Type
+ </para>
+ <para>
+ Description
+ </para></entry>
+ </row>
+ </thead>
+
+ <tbody>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>datid</structfield> <type>oid</type>
+ </para>
+ <para>
+ OID of the database this backend is connected to
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>pid</structfield> <type>integer</type>
+ </para>
+ <para>
+ Process ID of this backend
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>allocated_bytes</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Memory currently allocated to this backend in bytes. This is the balance
+ of bytes allocated and freed by this backend. Dynamic shared memory
+ allocations are included only in the value displayed for the backend that
+ created them, they are not included in the value for backends that are
+ attached to them to avoid double counting.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>aset_allocated_bytes</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Memory currently allocated to this backend in bytes via the allocation
+ set allocator.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>dsm_allocated_bytes</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Memory currently allocated to this backend in bytes via the dynamic
+ shared memory allocator. Upon process exit, dsm allocations that have
+ not been freed are considered long lived and added to
+ <structfield>global_dsm_allocated_bytes</structfield> found in the
+ <link linkend="monitoring-pg-stat-global-memory-allocation-view">
+ <structname>pg_stat_global_memory_allocation</structname></link> view.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>generation_allocated_bytes</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Memory currently allocated to this backend in bytes via the generation
+ allocator.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>slab_allocated_bytes</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Memory currently allocated to this backend in bytes via the slab
+ allocator.
+ </para></entry>
+ </row>
+
+ </tbody>
+ </tgroup>
+ </table>
+
+ </sect2>
+
+ <sect2 id="monitoring-pg-stat-global-memory-allocation-view">
+ <title><structname>pg_stat_global_memory_allocation</structname></title>
+
+ <indexterm>
+ <primary>pg_stat_global-memory_allocation</primary>
+ </indexterm>
+
+ <para>
+ The <structname>pg_stat_global_memory_allocation</structname> view will
+ have one row showing information related to current shared memory
+ allocations. Due to the dynamic nature of memory allocations the allocated
+ bytes values may not be exact but should be sufficient for the intended
+ purposes. Use <function>pg_size_pretty</function> described in
+ <xref linkend="functions-admin-dbsize"/> to make the byte populated values
+ more easily readable.
+ </para>
+
+ <table id="pg-stat-global-memory-allocation-view" xreflabel="pg_stat_global_memory_allocation">
+ <title><structname>pg_stat_global_memory_allocation</structname> View</title>
+ <tgroup cols="1">
+ <thead>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ Column Type
+ </para>
+ <para>
+ Description
+ </para></entry>
+ </row>
+ </thead>
+
+ <tbody>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>datid</structfield> <type>oid</type>
+ </para>
+ <para>
+ OID of the database this backend is connected to
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>shared_memory_size_mb</structfield> <type>integer</type>
+ </para>
+ <para>
+ Reports the size of the main shared memory area, rounded up to the
+ nearest megabyte. See <xref linkend="guc-shared-memory-size"/>.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>shared_memory_size_in_huge_pages</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Reports the number of huge pages that are needed for the main shared
+ memory area based on the specified huge_page_size. If huge pages are not
+ supported, this will be -1. See
+ <xref linkend="guc-shared-memory-size-in-huge-pages"/>.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>global_dsm_allocated_bytes</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Long lived dynamically allocated memory currently allocated to the
+ database. Upon process exit, dsm allocations that have not been freed
+ are considered long lived and added to
+ <structfield>global_dsm_allocated_bytes</structfield>.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>total_aset_allocated_bytes</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Sum total of <structfield>aset_allocated_bytes</structfield> for all
+ backend processes from
+ <link linkend="monitoring-pg-stat-memory-allocation-view">
+ <structname>pg_stat_memory_allocation</structname></link> view.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>total_dsm_allocated_bytes</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Sum total of <structfield>dsm_allocated_bytes</structfield> for all
+ backend processes from
+ <link linkend="monitoring-pg-stat-memory-allocation-view">
+ <structname>pg_stat_memory_allocation</structname></link> view.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>total_generation_allocated_bytes</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Sum total of <structfield>generation_allocated_bytes</structfield> for
+ all backend processes from
+ <link linkend="monitoring-pg-stat-memory-allocation-view">
+ <structname>pg_stat_memory_allocation</structname></link> view.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>total_slab_allocated_bytes</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Sum total of <structfield>slab_allocated_bytes</structfield> for all
+ backend processes from
+ <link linkend="monitoring-pg-stat-memory-allocation-view">
+ <structname>pg_stat_memory_allocation</structname></link> view.
+ </para></entry>
+ </row>
+
+ </tbody>
+ </tgroup>
+ </table>
+
+ </sect2>
+
<sect2 id="monitoring-stats-functions">
<title>Statistics Functions</title>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 6b098234f8..6876564904 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1332,3 +1332,37 @@ CREATE VIEW pg_stat_subscription_stats AS
ss.stats_reset
FROM pg_subscription as s,
pg_stat_get_subscription_stats(s.oid) as ss;
+
+CREATE VIEW pg_stat_memory_allocation AS
+ SELECT
+ S.datid AS datid,
+ S.pid,
+ S.allocated_bytes,
+ S.aset_allocated_bytes,
+ S.dsm_allocated_bytes,
+ S.generation_allocated_bytes,
+ S.slab_allocated_bytes
+ FROM pg_stat_get_memory_allocation(NULL) AS S
+ LEFT JOIN pg_database AS D ON (S.datid = D.oid);
+
+CREATE VIEW pg_stat_global_memory_allocation AS
+WITH sums AS (
+ SELECT
+ SUM(aset_allocated_bytes) AS total_aset_allocated_bytes,
+ SUM(dsm_allocated_bytes) AS total_dsm_allocated_bytes,
+ SUM(generation_allocated_bytes) AS total_generation_allocated_bytes,
+ SUM(slab_allocated_bytes) AS total_slab_allocated_bytes
+ FROM
+ pg_stat_memory_allocation
+)
+SELECT
+ S.datid AS datid,
+ current_setting('shared_memory_size', true) as shared_memory_size,
+ (current_setting('shared_memory_size_in_huge_pages', true))::integer as shared_memory_size_in_huge_pages,
+ S.global_dsm_allocated_bytes,
+ sums.total_aset_allocated_bytes,
+ sums.total_dsm_allocated_bytes,
+ sums.total_generation_allocated_bytes,
+ sums.total_slab_allocated_bytes
+ FROM sums, pg_stat_get_global_memory_allocation() AS S
+ LEFT JOIN pg_database AS D ON (S.datid = D.oid);
diff --git a/src/backend/storage/ipc/dsm.c b/src/backend/storage/ipc/dsm.c
index 10b029bb16..64b1fecd1c 100644
--- a/src/backend/storage/ipc/dsm.c
+++ b/src/backend/storage/ipc/dsm.c
@@ -775,6 +775,15 @@ dsm_detach_all(void)
void
dsm_detach(dsm_segment *seg)
{
+ /*
+ * Retain mapped_size to pass into destroy call in cases where the detach
+ * is the last reference. mapped_size is zeroed as part of the detach
+ * process, but is needed later in these cases for dsm_allocated_bytes
+ * accounting.
+ */
+ Size local_seg_mapped_size = seg->mapped_size;
+ Size *ptr_local_seg_mapped_size = &local_seg_mapped_size;
+
/*
* Invoke registered callbacks. Just in case one of those callbacks
* throws a further error that brings us back here, pop the callback
@@ -855,7 +864,7 @@ dsm_detach(dsm_segment *seg)
*/
if (is_main_region_dsm_handle(seg->handle) ||
dsm_impl_op(DSM_OP_DESTROY, seg->handle, 0, &seg->impl_private,
- &seg->mapped_address, &seg->mapped_size, WARNING))
+ &seg->mapped_address, ptr_local_seg_mapped_size, WARNING))
{
LWLockAcquire(DynamicSharedMemoryControlLock, LW_EXCLUSIVE);
if (is_main_region_dsm_handle(seg->handle))
diff --git a/src/backend/storage/ipc/dsm_impl.c b/src/backend/storage/ipc/dsm_impl.c
index f0965c3481..16e2bded59 100644
--- a/src/backend/storage/ipc/dsm_impl.c
+++ b/src/backend/storage/ipc/dsm_impl.c
@@ -66,6 +66,7 @@
#include "postmaster/postmaster.h"
#include "storage/dsm_impl.h"
#include "storage/fd.h"
+#include "utils/backend_status.h"
#include "utils/guc.h"
#include "utils/memutils.h"
@@ -232,6 +233,14 @@ dsm_impl_posix(dsm_op op, dsm_handle handle, Size request_size,
name)));
return false;
}
+
+ /*
+ * Detach and destroy pass through here, only decrease the memory
+ * shown allocated in pg_stat_activity when the creator destroys the
+ * allocation.
+ */
+ if (op == DSM_OP_DESTROY)
+ pgstat_report_allocated_bytes_decrease(*mapped_size, PG_ALLOC_DSM);
*mapped_address = NULL;
*mapped_size = 0;
if (op == DSM_OP_DESTROY && shm_unlink(name) != 0)
@@ -332,6 +341,33 @@ dsm_impl_posix(dsm_op op, dsm_handle handle, Size request_size,
name)));
return false;
}
+
+ /*
+ * Attach and create pass through here, only update backend memory
+ * allocated in pg_stat_activity for the creator process.
+ */
+ if (op == DSM_OP_CREATE)
+ {
+ /*
+ * Posix creation calls dsm_impl_posix_resize implying that resizing
+ * occurs or may be added in the future. As implemented
+ * dsm_impl_posix_resize utilizes fallocate or truncate, passing the
+ * whole new size as input, growing the allocation as needed (only
+ * truncate supports shrinking). We update by replacing the old
+ * allocation with the new.
+ */
+#if defined(HAVE_POSIX_FALLOCATE) && defined(__linux__)
+ /*
+ * posix_fallocate does not shrink allocations, adjust only on
+ * allocation increase.
+ */
+ if (request_size > *mapped_size)
+ pgstat_report_allocated_bytes_increase(request_size - *mapped_size, PG_ALLOC_DSM);
+#else
+ pgstat_report_allocated_bytes_decrease(*mapped_size, PG_ALLOC_DSM);
+ pgstat_report_allocated_bytes_increase(request_size, PG_ALLOC_DSM);
+#endif
+ }
*mapped_address = address;
*mapped_size = request_size;
close(fd);
@@ -537,6 +573,14 @@ dsm_impl_sysv(dsm_op op, dsm_handle handle, Size request_size,
name)));
return false;
}
+
+ /*
+ * Detach and destroy pass through here, only decrease the memory
+ * shown allocated in pg_stat_activity when the creator destroys the
+ * allocation.
+ */
+ if (op == DSM_OP_DESTROY)
+ pgstat_report_allocated_bytes_decrease(*mapped_size, PG_ALLOC_DSM);
*mapped_address = NULL;
*mapped_size = 0;
if (op == DSM_OP_DESTROY && shmctl(ident, IPC_RMID, NULL) < 0)
@@ -584,6 +628,13 @@ dsm_impl_sysv(dsm_op op, dsm_handle handle, Size request_size,
name)));
return false;
}
+
+ /*
+ * Attach and create pass through here, only update backend memory
+ * allocated in pg_stat_activity for the creator process.
+ */
+ if (op == DSM_OP_CREATE)
+ pgstat_report_allocated_bytes_increase(request_size, PG_ALLOC_DSM);
*mapped_address = address;
*mapped_size = request_size;
@@ -652,6 +703,13 @@ dsm_impl_windows(dsm_op op, dsm_handle handle, Size request_size,
return false;
}
+ /*
+ * Detach and destroy pass through here, only decrease the memory
+ * shown allocated in pg_stat_activity when the creator destroys the
+ * allocation.
+ */
+ if (op == DSM_OP_DESTROY)
+ pgstat_report_allocated_bytes_decrease(*mapped_size, PG_ALLOC_DSM);
*impl_private = NULL;
*mapped_address = NULL;
*mapped_size = 0;
@@ -768,6 +826,12 @@ dsm_impl_windows(dsm_op op, dsm_handle handle, Size request_size,
return false;
}
+ /*
+ * Attach and create pass through here, only update backend memory
+ * allocated in pg_stat_activity for the creator process.
+ */
+ if (op == DSM_OP_CREATE)
+ pgstat_report_allocated_bytes_increase(info.RegionSize, PG_ALLOC_DSM);
*mapped_address = address;
*mapped_size = info.RegionSize;
*impl_private = hmap;
@@ -812,6 +876,13 @@ dsm_impl_mmap(dsm_op op, dsm_handle handle, Size request_size,
name)));
return false;
}
+
+ /*
+ * Detach and destroy pass through here, only decrease the memory
+ * shown allocated in pg_stat_activity when the creator destroys the
+ * allocation.
+ */
+ pgstat_report_allocated_bytes_decrease(*mapped_size, PG_ALLOC_DSM);
*mapped_address = NULL;
*mapped_size = 0;
if (op == DSM_OP_DESTROY && unlink(name) != 0)
@@ -933,6 +1004,13 @@ dsm_impl_mmap(dsm_op op, dsm_handle handle, Size request_size,
name)));
return false;
}
+
+ /*
+ * Attach and create pass through here, only update backend memory
+ * allocated in pg_stat_activity for the creator process.
+ */
+ if (op == DSM_OP_CREATE)
+ pgstat_report_allocated_bytes_increase(request_size, PG_ALLOC_DSM);
*mapped_address = address;
*mapped_size = request_size;
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 22b4278610..d86fbdfd9b 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -180,6 +180,7 @@ InitProcGlobal(void)
ProcGlobal->checkpointerLatch = NULL;
pg_atomic_init_u32(&ProcGlobal->procArrayGroupFirst, INVALID_PGPROCNO);
pg_atomic_init_u32(&ProcGlobal->clogGroupFirst, INVALID_PGPROCNO);
+ pg_atomic_init_u64(&ProcGlobal->global_dsm_allocation, 0);
/*
* Create and initialize all the PGPROC structures we'll need. There are
diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c
index 608d01ea0d..f921c4bbde 100644
--- a/src/backend/utils/activity/backend_status.c
+++ b/src/backend/utils/activity/backend_status.c
@@ -49,6 +49,24 @@ int pgstat_track_activity_query_size = 1024;
/* exposed so that backend_progress.c can access it */
PgBackendStatus *MyBEEntry = NULL;
+/*
+ * Memory allocated to this backend prior to pgstats initialization. Migrated to
+ * shared memory on pgstats initialization.
+ */
+uint64 local_my_allocated_bytes = 0;
+uint64 *my_allocated_bytes = &local_my_allocated_bytes;
+
+/* Memory allocated to this backend by type prior to pgstats initialization.
+ * Migrated to shared memory on pgstats initialization
+ */
+uint64 local_my_aset_allocated_bytes = 0;
+uint64 *my_aset_allocated_bytes = &local_my_aset_allocated_bytes;
+uint64 local_my_dsm_allocated_bytes = 0;
+uint64 *my_dsm_allocated_bytes = &local_my_dsm_allocated_bytes;
+uint64 local_my_generation_allocated_bytes = 0;
+uint64 *my_generation_allocated_bytes = &local_my_generation_allocated_bytes;
+uint64 local_my_slab_allocated_bytes = 0;
+uint64 *my_slab_allocated_bytes = &local_my_slab_allocated_bytes;
static PgBackendStatus *BackendStatusArray = NULL;
static char *BackendAppnameBuffer = NULL;
@@ -400,6 +418,32 @@ pgstat_bestart(void)
lbeentry.st_progress_command_target = InvalidOid;
lbeentry.st_query_id = UINT64CONST(0);
+ /* Alter allocation reporting from local storage to shared memory */
+ pgstat_set_allocated_bytes_storage(&MyBEEntry->allocated_bytes,
+ &MyBEEntry->aset_allocated_bytes,
+ &MyBEEntry->dsm_allocated_bytes,
+ &MyBEEntry->generation_allocated_bytes,
+ &MyBEEntry->slab_allocated_bytes);
+
+ /*
+ * Populate sum of memory allocated prior to pgstats initialization to
+ * pgstats and zero the local variable. This is a += assignment because
+ * InitPostgres allocates memory after pgstat_beinit but prior to
+ * pgstat_bestart so we have allocations to both local and shared memory
+ * to combine.
+ */
+ lbeentry.allocated_bytes += local_my_allocated_bytes;
+ local_my_allocated_bytes = 0;
+ lbeentry.aset_allocated_bytes += local_my_aset_allocated_bytes;
+ local_my_aset_allocated_bytes = 0;
+
+ lbeentry.dsm_allocated_bytes += local_my_dsm_allocated_bytes;
+ local_my_dsm_allocated_bytes = 0;
+ lbeentry.generation_allocated_bytes += local_my_generation_allocated_bytes;
+ local_my_generation_allocated_bytes = 0;
+ lbeentry.slab_allocated_bytes += local_my_slab_allocated_bytes;
+ local_my_slab_allocated_bytes = 0;
+
/*
* we don't zero st_progress_param here to save cycles; nobody should
* examine it until st_progress_command has been set to something other
@@ -459,6 +503,9 @@ pgstat_beshutdown_hook(int code, Datum arg)
{
volatile PgBackendStatus *beentry = MyBEEntry;
+ /* Stop reporting memory allocation changes to shared memory */
+ pgstat_reset_allocated_bytes_storage();
+
/*
* Clear my status entry, following the protocol of bumping st_changecount
* before and after. We use a volatile pointer here to ensure the
@@ -1194,3 +1241,70 @@ pgstat_clip_activity(const char *raw_activity)
return activity;
}
+
+/*
+ * Configure bytes allocated reporting to report allocated bytes to
+ * shared memory.
+ *
+ * Expected to be called during backend startup (in pgstat_bestart), to point
+ * allocated bytes accounting into shared memory.
+ */
+void
+pgstat_set_allocated_bytes_storage(uint64 *allocated_bytes,
+ uint64 *aset_allocated_bytes,
+ uint64 *dsm_allocated_bytes,
+ uint64 *generation_allocated_bytes,
+ uint64 *slab_allocated_bytes)
+{
+ /* Map allocations to shared memory */
+ my_allocated_bytes = allocated_bytes;
+ *allocated_bytes = local_my_allocated_bytes;
+
+ my_aset_allocated_bytes = aset_allocated_bytes;
+ *aset_allocated_bytes = local_my_aset_allocated_bytes;
+
+ my_dsm_allocated_bytes = dsm_allocated_bytes;
+ *dsm_allocated_bytes = local_my_dsm_allocated_bytes;
+
+ my_generation_allocated_bytes = generation_allocated_bytes;
+ *generation_allocated_bytes = local_my_generation_allocated_bytes;
+
+ my_slab_allocated_bytes = slab_allocated_bytes;
+ *slab_allocated_bytes = local_my_slab_allocated_bytes;
+}
+
+/*
+ * Reset allocated bytes storage location.
+ *
+ * Expected to be called during backend shutdown, before the locations set up
+ * by pgstat_set_allocated_bytes_storage become invalid.
+ */
+void
+pgstat_reset_allocated_bytes_storage(void)
+{
+ if (ProcGlobal)
+ {
+ volatile PROC_HDR *procglobal = ProcGlobal;
+
+ /*
+ * Add dsm allocations that have not been freed to global dsm
+ * accounting
+ */
+ pg_atomic_add_fetch_u64(&procglobal->global_dsm_allocation,
+ *my_dsm_allocated_bytes);
+ }
+
+ /* Reset memory allocation variables */
+ *my_allocated_bytes = local_my_allocated_bytes = 0;
+ *my_aset_allocated_bytes = local_my_aset_allocated_bytes = 0;
+ *my_dsm_allocated_bytes = local_my_dsm_allocated_bytes = 0;
+ *my_generation_allocated_bytes = local_my_generation_allocated_bytes = 0;
+ *my_slab_allocated_bytes = local_my_slab_allocated_bytes = 0;
+
+ /* Point my_{*_}allocated_bytes from shared memory back to local */
+ my_allocated_bytes = &local_my_allocated_bytes;
+ my_aset_allocated_bytes = &local_my_aset_allocated_bytes;
+ my_dsm_allocated_bytes = &local_my_dsm_allocated_bytes;
+ my_generation_allocated_bytes = &local_my_generation_allocated_bytes;
+ my_slab_allocated_bytes = &local_my_slab_allocated_bytes;
+}
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index eec9f3cf9b..a5fd5e6964 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1995,3 +1995,87 @@ pg_stat_have_stats(PG_FUNCTION_ARGS)
PG_RETURN_BOOL(pgstat_have_entry(kind, dboid, objoid));
}
+
+/*
+ * Get the memory allocation of PG backends.
+ */
+Datum
+pg_stat_get_memory_allocation(PG_FUNCTION_ARGS)
+{
+#define PG_STAT_GET_MEMORY_ALLOCATION_COLS 7
+ int num_backends = pgstat_fetch_stat_numbackends();
+ int curr_backend;
+ int pid = PG_ARGISNULL(0) ? -1 : PG_GETARG_INT32(0);
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+
+ InitMaterializedSRF(fcinfo, 0);
+
+ /* 1-based index */
+ for (curr_backend = 1; curr_backend <= num_backends; curr_backend++)
+ {
+ /* for each row */
+ Datum values[PG_STAT_GET_MEMORY_ALLOCATION_COLS] = {0};
+ bool nulls[PG_STAT_GET_MEMORY_ALLOCATION_COLS] = {0};
+ LocalPgBackendStatus *local_beentry;
+ PgBackendStatus *beentry;
+
+ /* Get the next one in the list */
+ local_beentry = pgstat_fetch_stat_local_beentry(curr_backend);
+ beentry = &local_beentry->backendStatus;
+
+ /* If looking for specific PID, ignore all the others */
+ if (pid != -1 && beentry->st_procpid != pid)
+ continue;
+
+ /* Values available to all callers */
+ if (beentry->st_databaseid != InvalidOid)
+ values[0] = ObjectIdGetDatum(beentry->st_databaseid);
+ else
+ nulls[0] = true;
+
+ values[1] = Int32GetDatum(beentry->st_procpid);
+ values[2] = UInt64GetDatum(beentry->allocated_bytes);
+ values[3] = UInt64GetDatum(beentry->aset_allocated_bytes);
+ values[4] = UInt64GetDatum(beentry->dsm_allocated_bytes);
+ values[5] = UInt64GetDatum(beentry->generation_allocated_bytes);
+ values[6] = UInt64GetDatum(beentry->slab_allocated_bytes);
+
+ tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc, values, nulls);
+
+ /* If only a single backend was requested, and we found it, break. */
+ if (pid != -1)
+ break;
+ }
+
+ return (Datum) 0;
+}
+
+/*
+ * Get the global memory allocation statistics.
+ */
+Datum
+pg_stat_get_global_memory_allocation(PG_FUNCTION_ARGS)
+{
+#define PG_STAT_GET_GLOBAL_MEMORY_ALLOCATION_COLS 2
+ TupleDesc tupdesc;
+ Datum values[PG_STAT_GET_GLOBAL_MEMORY_ALLOCATION_COLS] = {0};
+ bool nulls[PG_STAT_GET_GLOBAL_MEMORY_ALLOCATION_COLS] = {0};
+ volatile PROC_HDR *procglobal = ProcGlobal;
+
+ /* Initialise attributes information in the tuple descriptor */
+ tupdesc = CreateTemplateTupleDesc(PG_STAT_GET_GLOBAL_MEMORY_ALLOCATION_COLS);
+ TupleDescInitEntry(tupdesc, (AttrNumber) 1, "datid",
+ OIDOID, -1, 0);
+ TupleDescInitEntry(tupdesc, (AttrNumber) 2, "global_dsm_allocated_bytes",
+ INT8OID, -1, 0);
+ BlessTupleDesc(tupdesc);
+
+ /* datid */
+ values[0] = ObjectIdGetDatum(MyDatabaseId);
+
+ /* get global_dsm_allocated_bytes */
+ values[1] = Int64GetDatum(pg_atomic_read_u64(&procglobal->global_dsm_allocation));
+
+ /* Returns the record as Datum */
+ PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
+}
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
index a604432126..7b8eeb7dbb 100644
--- a/src/backend/utils/init/miscinit.c
+++ b/src/backend/utils/init/miscinit.c
@@ -171,6 +171,9 @@ InitPostmasterChild(void)
(errcode_for_socket_access(),
errmsg_internal("could not set postmaster death monitoring pipe to FD_CLOEXEC mode: %m")));
#endif
+
+ /* Init allocated bytes to avoid double counting parent allocation */
+ pgstat_init_allocated_bytes();
}
/*
diff --git a/src/backend/utils/mmgr/aset.c b/src/backend/utils/mmgr/aset.c
index 2589941ec4..f3f5945fdf 100644
--- a/src/backend/utils/mmgr/aset.c
+++ b/src/backend/utils/mmgr/aset.c
@@ -47,6 +47,7 @@
#include "postgres.h"
#include "port/pg_bitutils.h"
+#include "utils/backend_status.h"
#include "utils/memdebug.h"
#include "utils/memutils.h"
#include "utils/memutils_memorychunk.h"
@@ -521,6 +522,7 @@ AllocSetContextCreateInternal(MemoryContext parent,
name);
((MemoryContext) set)->mem_allocated = firstBlockSize;
+ pgstat_report_allocated_bytes_increase(firstBlockSize, PG_ALLOC_ASET);
return (MemoryContext) set;
}
@@ -543,6 +545,7 @@ AllocSetReset(MemoryContext context)
AllocSet set = (AllocSet) context;
AllocBlock block;
Size keepersize PG_USED_FOR_ASSERTS_ONLY;
+ uint64 deallocation = 0;
Assert(AllocSetIsValid(set));
@@ -585,6 +588,7 @@ AllocSetReset(MemoryContext context)
{
/* Normal case, release the block */
context->mem_allocated -= block->endptr - ((char *) block);
+ deallocation += block->endptr - ((char *) block);
#ifdef CLOBBER_FREED_MEMORY
wipe_mem(block, block->freeptr - ((char *) block));
@@ -595,6 +599,7 @@ AllocSetReset(MemoryContext context)
}
Assert(context->mem_allocated == keepersize);
+ pgstat_report_allocated_bytes_decrease(deallocation, PG_ALLOC_ASET);
/* Reset block size allocation sequence, too */
set->nextBlockSize = set->initBlockSize;
@@ -613,6 +618,7 @@ AllocSetDelete(MemoryContext context)
AllocSet set = (AllocSet) context;
AllocBlock block = set->blocks;
Size keepersize PG_USED_FOR_ASSERTS_ONLY;
+ uint64 deallocation = 0;
Assert(AllocSetIsValid(set));
@@ -651,11 +657,13 @@ AllocSetDelete(MemoryContext context)
freelist->first_free = (AllocSetContext *) oldset->header.nextchild;
freelist->num_free--;
+ deallocation += oldset->header.mem_allocated;
/* All that remains is to free the header/initial block */
free(oldset);
}
Assert(freelist->num_free == 0);
+ pgstat_report_allocated_bytes_decrease(deallocation, PG_ALLOC_ASET);
}
/* Now add the just-deleted context to the freelist. */
@@ -672,7 +680,10 @@ AllocSetDelete(MemoryContext context)
AllocBlock next = block->next;
if (block != set->keeper)
+ {
context->mem_allocated -= block->endptr - ((char *) block);
+ deallocation += block->endptr - ((char *) block);
+ }
#ifdef CLOBBER_FREED_MEMORY
wipe_mem(block, block->freeptr - ((char *) block));
@@ -685,6 +696,7 @@ AllocSetDelete(MemoryContext context)
}
Assert(context->mem_allocated == keepersize);
+ pgstat_report_allocated_bytes_decrease(deallocation + context->mem_allocated, PG_ALLOC_ASET);
/* Finally, free the context header, including the keeper block */
free(set);
@@ -734,6 +746,7 @@ AllocSetAlloc(MemoryContext context, Size size)
return NULL;
context->mem_allocated += blksize;
+ pgstat_report_allocated_bytes_increase(blksize, PG_ALLOC_ASET);
block->aset = set;
block->freeptr = block->endptr = ((char *) block) + blksize;
@@ -944,6 +957,7 @@ AllocSetAlloc(MemoryContext context, Size size)
return NULL;
context->mem_allocated += blksize;
+ pgstat_report_allocated_bytes_increase(blksize, PG_ALLOC_ASET);
block->aset = set;
block->freeptr = ((char *) block) + ALLOC_BLOCKHDRSZ;
@@ -1041,6 +1055,7 @@ AllocSetFree(void *pointer)
block->next->prev = block->prev;
set->header.mem_allocated -= block->endptr - ((char *) block);
+ pgstat_report_allocated_bytes_decrease(block->endptr - ((char *) block), PG_ALLOC_ASET);
#ifdef CLOBBER_FREED_MEMORY
wipe_mem(block, block->freeptr - ((char *) block));
@@ -1171,7 +1186,9 @@ AllocSetRealloc(void *pointer, Size size)
/* updated separately, not to underflow when (oldblksize > blksize) */
set->header.mem_allocated -= oldblksize;
+ pgstat_report_allocated_bytes_decrease(oldblksize, PG_ALLOC_ASET);
set->header.mem_allocated += blksize;
+ pgstat_report_allocated_bytes_increase(blksize, PG_ALLOC_ASET);
block->freeptr = block->endptr = ((char *) block) + blksize;
diff --git a/src/backend/utils/mmgr/generation.c b/src/backend/utils/mmgr/generation.c
index ebcb61e9b6..5708e8da7a 100644
--- a/src/backend/utils/mmgr/generation.c
+++ b/src/backend/utils/mmgr/generation.c
@@ -37,6 +37,7 @@
#include "lib/ilist.h"
#include "port/pg_bitutils.h"
+#include "utils/backend_status.h"
#include "utils/memdebug.h"
#include "utils/memutils.h"
#include "utils/memutils_memorychunk.h"
@@ -267,6 +268,7 @@ GenerationContextCreate(MemoryContext parent,
name);
((MemoryContext) set)->mem_allocated = firstBlockSize;
+ pgstat_report_allocated_bytes_increase(firstBlockSize, PG_ALLOC_GENERATION);
return (MemoryContext) set;
}
@@ -283,6 +285,7 @@ GenerationReset(MemoryContext context)
{
GenerationContext *set = (GenerationContext *) context;
dlist_mutable_iter miter;
+ uint64 deallocation = 0;
Assert(GenerationIsValid(set));
@@ -305,9 +308,14 @@ GenerationReset(MemoryContext context)
if (block == set->keeper)
GenerationBlockMarkEmpty(block);
else
+ {
+ deallocation += block->blksize;
GenerationBlockFree(set, block);
+ }
}
+ pgstat_report_allocated_bytes_decrease(deallocation, PG_ALLOC_GENERATION);
+
/* set it so new allocations to make use of the keeper block */
set->block = set->keeper;
@@ -328,6 +336,9 @@ GenerationDelete(MemoryContext context)
{
/* Reset to release all releasable GenerationBlocks */
GenerationReset(context);
+
+ pgstat_report_allocated_bytes_decrease(context->mem_allocated, PG_ALLOC_GENERATION);
+
/* And free the context header and keeper block */
free(context);
}
@@ -374,6 +385,7 @@ GenerationAlloc(MemoryContext context, Size size)
return NULL;
context->mem_allocated += blksize;
+ pgstat_report_allocated_bytes_increase(blksize, PG_ALLOC_GENERATION);
/* block with a single (used) chunk */
block->context = set;
@@ -477,6 +489,7 @@ GenerationAlloc(MemoryContext context, Size size)
return NULL;
context->mem_allocated += blksize;
+ pgstat_report_allocated_bytes_increase(blksize, PG_ALLOC_GENERATION);
/* initialize the new block */
GenerationBlockInit(set, block, blksize);
@@ -729,6 +742,8 @@ GenerationFree(void *pointer)
dlist_delete(&block->node);
set->header.mem_allocated -= block->blksize;
+ pgstat_report_allocated_bytes_decrease(block->blksize, PG_ALLOC_GENERATION);
+
free(block);
}
diff --git a/src/backend/utils/mmgr/slab.c b/src/backend/utils/mmgr/slab.c
index 33dca0f37c..b436587bdd 100644
--- a/src/backend/utils/mmgr/slab.c
+++ b/src/backend/utils/mmgr/slab.c
@@ -69,6 +69,7 @@
#include "postgres.h"
#include "lib/ilist.h"
+#include "utils/backend_status.h"
#include "utils/memdebug.h"
#include "utils/memutils.h"
#include "utils/memutils_memorychunk.h"
@@ -413,6 +414,13 @@ SlabContextCreate(MemoryContext parent,
parent,
name);
+ /*
+ * If SlabContextCreate is updated to add context header size to
+ * context->mem_allocated, then update here and SlabDelete appropriately
+ */
+ pgstat_report_allocated_bytes_increase(Slab_CONTEXT_HDRSZ(slab->chunksPerBlock),
+ PG_ALLOC_SLAB);
+
return (MemoryContext) slab;
}
@@ -429,6 +437,7 @@ SlabReset(MemoryContext context)
SlabContext *slab = (SlabContext *) context;
dlist_mutable_iter miter;
int i;
+ uint64 deallocation = 0;
Assert(SlabIsValid(slab));
@@ -449,6 +458,7 @@ SlabReset(MemoryContext context)
#endif
free(block);
context->mem_allocated -= slab->blockSize;
+ deallocation += slab->blockSize;
}
/* walk over blocklist and free the blocks */
@@ -465,9 +475,11 @@ SlabReset(MemoryContext context)
#endif
free(block);
context->mem_allocated -= slab->blockSize;
+ deallocation += slab->blockSize;
}
}
+ pgstat_report_allocated_bytes_decrease(deallocation, PG_ALLOC_SLAB);
slab->curBlocklistIndex = 0;
Assert(context->mem_allocated == 0);
@@ -482,6 +494,14 @@ SlabDelete(MemoryContext context)
{
/* Reset to release all the SlabBlocks */
SlabReset(context);
+
+ /*
+ * Until context header allocation is included in context->mem_allocated,
+ * cast to slab and decrement the header allocation
+ */
+ pgstat_report_allocated_bytes_decrease(Slab_CONTEXT_HDRSZ(((SlabContext *) context)->chunksPerBlock),
+ PG_ALLOC_SLAB);
+
/* And free the context header */
free(context);
}
@@ -546,6 +566,7 @@ SlabAlloc(MemoryContext context, Size size)
block->slab = slab;
context->mem_allocated += slab->blockSize;
+ pgstat_report_allocated_bytes_increase(slab->blockSize, PG_ALLOC_SLAB);
/* use the first chunk in the new block */
chunk = SlabBlockGetChunk(slab, block, 0);
@@ -732,6 +753,7 @@ SlabFree(void *pointer)
#endif
free(block);
slab->header.mem_allocated -= slab->blockSize;
+ pgstat_report_allocated_bytes_decrease(slab->blockSize, PG_ALLOC_SLAB);
}
/*
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index f9f2642201..a6f52a4db4 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5427,6 +5427,23 @@
proname => 'pg_stat_get_backend_idset', prorows => '100', proretset => 't',
provolatile => 's', proparallel => 'r', prorettype => 'int4',
proargtypes => '', prosrc => 'pg_stat_get_backend_idset' },
+{ oid => '9890',
+ descr => 'statistics: memory allocation information for backends',
+ proname => 'pg_stat_get_memory_allocation', prorows => '100', proisstrict => 'f',
+ proretset => 't', provolatile => 's', proparallel => 'r',
+ prorettype => 'record', proargtypes => 'int4',
+ proallargtypes => '{int4,oid,int4,int8,int8,int8,int8,int8}',
+ proargmodes => '{i,o,o,o,o,o,o,o}',
+ proargnames => '{pid,datid,pid,allocated_bytes,aset_allocated_bytes,dsm_allocated_bytes,generation_allocated_bytes,slab_allocated_bytes}',
+ prosrc => 'pg_stat_get_memory_allocation' },
+{ oid => '9891',
+ descr => 'statistics: global memory allocation information',
+ proname => 'pg_stat_get_global_memory_allocation', proisstrict => 'f',
+ provolatile => 's', proparallel => 'r', prorettype => 'record',
+ proargtypes => '', proallargtypes => '{oid,int8}',
+ proargmodes => '{o,o}',
+ proargnames => '{datid,global_dsm_allocated_bytes}',
+ prosrc =>'pg_stat_get_global_memory_allocation' },
{ oid => '2022',
descr => 'statistics: information about currently active backends',
proname => 'pg_stat_get_activity', prorows => '100', proisstrict => 'f',
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index 4258cd92c9..c2c878219d 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -404,6 +404,8 @@ typedef struct PROC_HDR
int spins_per_delay;
/* Buffer id of the buffer that Startup process waits for pin on, or -1 */
int startupBufferPinWaitBufId;
+ /* Global dsm allocations */
+ pg_atomic_uint64 global_dsm_allocation;
} PROC_HDR;
extern PGDLLIMPORT PROC_HDR *ProcGlobal;
diff --git a/src/include/utils/backend_status.h b/src/include/utils/backend_status.h
index f7bd83113a..6434ece1ef 100644
--- a/src/include/utils/backend_status.h
+++ b/src/include/utils/backend_status.h
@@ -10,6 +10,7 @@
#ifndef BACKEND_STATUS_H
#define BACKEND_STATUS_H
+#include "common/int.h"
#include "datatype/timestamp.h"
#include "libpq/pqcomm.h"
#include "miscadmin.h" /* for BackendType */
@@ -32,6 +33,14 @@ typedef enum BackendState
STATE_DISABLED
} BackendState;
+/* Enum helper for reporting memory allocator type */
+enum pg_allocator_type
+{
+ PG_ALLOC_ASET = 1,
+ PG_ALLOC_DSM,
+ PG_ALLOC_GENERATION,
+ PG_ALLOC_SLAB
+};
/* ----------
* Shared-memory data structures
@@ -169,6 +178,15 @@ typedef struct PgBackendStatus
/* query identifier, optionally computed using post_parse_analyze_hook */
uint64 st_query_id;
+
+ /* Current memory allocated to this backend */
+ uint64 allocated_bytes;
+
+ /* Current memory allocated to this backend by type */
+ uint64 aset_allocated_bytes;
+ uint64 dsm_allocated_bytes;
+ uint64 generation_allocated_bytes;
+ uint64 slab_allocated_bytes;
} PgBackendStatus;
@@ -293,6 +311,11 @@ extern PGDLLIMPORT int pgstat_track_activity_query_size;
* ----------
*/
extern PGDLLIMPORT PgBackendStatus *MyBEEntry;
+extern PGDLLIMPORT uint64 *my_allocated_bytes;
+extern PGDLLIMPORT uint64 *my_aset_allocated_bytes;
+extern PGDLLIMPORT uint64 *my_dsm_allocated_bytes;
+extern PGDLLIMPORT uint64 *my_generation_allocated_bytes;
+extern PGDLLIMPORT uint64 *my_slab_allocated_bytes;
/* ----------
@@ -324,7 +347,12 @@ extern const char *pgstat_get_backend_current_activity(int pid, bool checkUser);
extern const char *pgstat_get_crashed_backend_activity(int pid, char *buffer,
int buflen);
extern uint64 pgstat_get_my_query_id(void);
-
+extern void pgstat_set_allocated_bytes_storage(uint64 *allocated_bytes,
+ uint64 *aset_allocated_bytes,
+ uint64 *dsm_allocated_bytes,
+ uint64 *generation_allocated_bytes,
+ uint64 *slab_allocated_bytes);
+extern void pgstat_reset_allocated_bytes_storage(void);
/* ----------
* Support functions for the SQL-callable functions to
@@ -336,5 +364,131 @@ extern PgBackendStatus *pgstat_fetch_stat_beentry(BackendId beid);
extern LocalPgBackendStatus *pgstat_fetch_stat_local_beentry(int beid);
extern char *pgstat_clip_activity(const char *raw_activity);
+/* ----------
+ * pgstat_report_allocated_bytes_decrease() -
+ * Called to report decrease in memory allocated for this backend.
+ *
+ * my_{*_}allocated_bytes initially points to local memory, making it safe to
+ * call this before pgstats has been initialized.
+ * ----------
+ */
+static inline void
+pgstat_report_allocated_bytes_decrease(int64 proc_allocated_bytes,
+ int pg_allocator_type)
+{
+ uint64 temp;
+
+ /* Avoid allocated_bytes unsigned integer overflow on decrease */
+ if (pg_sub_u64_overflow(*my_allocated_bytes, proc_allocated_bytes, &temp))
+ {
+ /* On overflow, set pgstat count of allocated bytes to zero */
+ *my_allocated_bytes = 0;
+
+ switch (pg_allocator_type)
+ {
+ case PG_ALLOC_ASET:
+ *my_aset_allocated_bytes = 0;
+ break;
+ case PG_ALLOC_DSM:
+ *my_dsm_allocated_bytes = 0;
+ break;
+ case PG_ALLOC_GENERATION:
+ *my_generation_allocated_bytes = 0;
+ break;
+ case PG_ALLOC_SLAB:
+ *my_slab_allocated_bytes = 0;
+ break;
+ }
+ }
+ else
+ {
+ /* decrease allocation */
+ *my_allocated_bytes -= proc_allocated_bytes;
+
+ /* Decrease allocator type allocated bytes. */
+ switch (pg_allocator_type)
+ {
+ case PG_ALLOC_ASET:
+ *my_aset_allocated_bytes -= proc_allocated_bytes;
+ break;
+ case PG_ALLOC_DSM:
+
+ /*
+ * Some dsm allocations live beyond process exit. These are
+ * accounted for in a global counter in
+ * pgstat_reset_allocated_bytes_storage at process exit.
+ */
+ *my_dsm_allocated_bytes -= proc_allocated_bytes;
+ break;
+ case PG_ALLOC_GENERATION:
+ *my_generation_allocated_bytes -= proc_allocated_bytes;
+ break;
+ case PG_ALLOC_SLAB:
+ *my_slab_allocated_bytes -= proc_allocated_bytes;
+ break;
+ }
+ }
+
+ return;
+}
+
+/* ----------
+ * pgstat_report_allocated_bytes_increase() -
+ * Called to report increase in memory allocated for this backend.
+ *
+ * my_allocated_bytes initially points to local memory, making it safe to call
+ * this before pgstats has been initialized.
+ * ----------
+ */
+static inline void
+pgstat_report_allocated_bytes_increase(int64 proc_allocated_bytes,
+ int pg_allocator_type)
+{
+ *my_allocated_bytes += proc_allocated_bytes;
+
+ /* Increase allocator type allocated bytes */
+ switch (pg_allocator_type)
+ {
+ case PG_ALLOC_ASET:
+ *my_aset_allocated_bytes += proc_allocated_bytes;
+ break;
+ case PG_ALLOC_DSM:
+
+ /*
+ * Some dsm allocations live beyond process exit. These are
+ * accounted for in a global counter in
+ * pgstat_reset_allocated_bytes_storage at process exit.
+ */
+ *my_dsm_allocated_bytes += proc_allocated_bytes;
+ break;
+ case PG_ALLOC_GENERATION:
+ *my_generation_allocated_bytes += proc_allocated_bytes;
+ break;
+ case PG_ALLOC_SLAB:
+ *my_slab_allocated_bytes += proc_allocated_bytes;
+ break;
+ }
+
+ return;
+}
+
+/* ---------
+ * pgstat_init_allocated_bytes() -
+ *
+ * Called to initialize allocated bytes variables after fork and to
+ * avoid double counting allocations.
+ * ---------
+ */
+static inline void
+pgstat_init_allocated_bytes(void)
+{
+ *my_allocated_bytes = 0;
+ *my_aset_allocated_bytes = 0;
+ *my_dsm_allocated_bytes = 0;
+ *my_generation_allocated_bytes = 0;
+ *my_slab_allocated_bytes = 0;
+
+ return;
+}
#endif /* BACKEND_STATUS_H */
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index ab1aebfde4..77c4a18e26 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1871,6 +1871,24 @@ pg_stat_database_conflicts| SELECT oid AS datid,
pg_stat_get_db_conflict_bufferpin(oid) AS confl_bufferpin,
pg_stat_get_db_conflict_startup_deadlock(oid) AS confl_deadlock
FROM pg_database d;
+pg_stat_global_memory_allocation| WITH sums AS (
+ SELECT sum(pg_stat_memory_allocation.aset_allocated_bytes) AS total_aset_allocated_bytes,
+ sum(pg_stat_memory_allocation.dsm_allocated_bytes) AS total_dsm_allocated_bytes,
+ sum(pg_stat_memory_allocation.generation_allocated_bytes) AS total_generation_allocated_bytes,
+ sum(pg_stat_memory_allocation.slab_allocated_bytes) AS total_slab_allocated_bytes
+ FROM pg_stat_memory_allocation
+ )
+ SELECT s.datid,
+ current_setting('shared_memory_size'::text, true) AS shared_memory_size,
+ (current_setting('shared_memory_size_in_huge_pages'::text, true))::integer AS shared_memory_size_in_huge_pages,
+ s.global_dsm_allocated_bytes,
+ sums.total_aset_allocated_bytes,
+ sums.total_dsm_allocated_bytes,
+ sums.total_generation_allocated_bytes,
+ sums.total_slab_allocated_bytes
+ FROM sums,
+ (pg_stat_get_global_memory_allocation() s(datid, global_dsm_allocated_bytes)
+ LEFT JOIN pg_database d ON ((s.datid = d.oid)));
pg_stat_gssapi| SELECT pid,
gss_auth AS gss_authenticated,
gss_princ AS principal,
@@ -1890,6 +1908,15 @@ pg_stat_io| SELECT backend_type,
fsyncs,
stats_reset
FROM pg_stat_get_io() b(backend_type, io_object, io_context, reads, writes, extends, op_bytes, hits, evictions, reuses, fsyncs, stats_reset);
+pg_stat_memory_allocation| SELECT s.datid,
+ s.pid,
+ s.allocated_bytes,
+ s.aset_allocated_bytes,
+ s.dsm_allocated_bytes,
+ s.generation_allocated_bytes,
+ s.slab_allocated_bytes
+ FROM (pg_stat_get_memory_allocation(NULL::integer) s(datid, pid, allocated_bytes, aset_allocated_bytes, dsm_allocated_bytes, generation_allocated_bytes, slab_allocated_bytes)
+ LEFT JOIN pg_database d ON ((s.datid = d.oid)));
pg_stat_progress_analyze| SELECT s.pid,
s.datid,
d.datname,
diff --git a/src/test/regress/expected/stats.out b/src/test/regress/expected/stats.out
index 5f1821938d..f507a6710c 100644
--- a/src/test/regress/expected/stats.out
+++ b/src/test/regress/expected/stats.out
@@ -1511,4 +1511,40 @@ SELECT COUNT(*) FROM brin_hot_3 WHERE a = 2;
DROP TABLE brin_hot_3;
SET enable_seqscan = on;
+-- ensure that allocated_bytes exist for backends
+SELECT
+ allocated_bytes > 0 AS result
+FROM
+ pg_stat_activity ps
+ JOIN pg_stat_memory_allocation pa ON (pa.pid = ps.pid)
+WHERE
+ backend_type IN ('checkpointer', 'background writer', 'walwriter', 'autovacuum launcher');
+ result
+--------
+ t
+ t
+ t
+ t
+(4 rows)
+
+-- ensure that pg_stat_global_memory_allocation view exists
+SELECT
+ datid > 0, pg_size_bytes(shared_memory_size) >= 0, shared_memory_size_in_huge_pages >= -1, global_dsm_allocated_bytes >= 0
+FROM
+ pg_stat_global_memory_allocation;
+ ?column? | ?column? | ?column? | ?column?
+----------+----------+----------+----------
+ t | t | t | t
+(1 row)
+
+-- ensure that pg_stat_memory_allocation view exists
+SELECT
+ pid > 0, allocated_bytes >= 0, aset_allocated_bytes >= 0, dsm_allocated_bytes >= 0, generation_allocated_bytes >= 0, slab_allocated_bytes >= 0
+FROM
+ pg_stat_memory_allocation limit 1;
+ ?column? | ?column? | ?column? | ?column? | ?column? | ?column?
+----------+----------+----------+----------+----------+----------
+ t | t | t | t | t | t
+(1 row)
+
-- End of Stats Test
diff --git a/src/test/regress/sql/stats.sql b/src/test/regress/sql/stats.sql
index 58db803ed6..a195776c9d 100644
--- a/src/test/regress/sql/stats.sql
+++ b/src/test/regress/sql/stats.sql
@@ -783,4 +783,24 @@ DROP TABLE brin_hot_3;
SET enable_seqscan = on;
+-- ensure that allocated_bytes exist for backends
+SELECT
+ allocated_bytes > 0 AS result
+FROM
+ pg_stat_activity ps
+ JOIN pg_stat_memory_allocation pa ON (pa.pid = ps.pid)
+WHERE
+ backend_type IN ('checkpointer', 'background writer', 'walwriter', 'autovacuum launcher');
+
+-- ensure that pg_stat_global_memory_allocation view exists
+SELECT
+ datid > 0, pg_size_bytes(shared_memory_size) >= 0, shared_memory_size_in_huge_pages >= -1, global_dsm_allocated_bytes >= 0
+FROM
+ pg_stat_global_memory_allocation;
+
+-- ensure that pg_stat_memory_allocation view exists
+SELECT
+ pid > 0, allocated_bytes >= 0, aset_allocated_bytes >= 0, dsm_allocated_bytes >= 0, generation_allocated_bytes >= 0, slab_allocated_bytes >= 0
+FROM
+ pg_stat_memory_allocation limit 1;
-- End of Stats Test
--
2.25.1
Thank you! I just tried our benchmark and got a performance degration of around 28 %, which is way better than the last patch.
The simple query select * from generate_series(0, 10000000) shows roughly 18.9 % degradation on my test server.
By raising initial_allocation_allowance and allocation_allowance_refill_qty I can get it to 16 % degradation. So most of the degradation seems to be independent from raising the allowance.
I think we probably should investigate this further.
Regards
Arne
On Wed, 2023-04-19 at 23:28 +0000, Arne Roland wrote:
Thank you! I just tried our benchmark and got a performance
degration > of around 28 %, which is way better than the last
patch.The simple query select * from generate_series(0, 10000000) shows >
roughly 18.9 % degradation on my test server.By raising initial_allocation_allowance and >
allocation_allowance_refill_qty I can get it to 16 % degradation.
So > most of the degradation seems to be independent from raising
the > allowance.I think we probably should investigate this further.
Regards
Arne
Hi Arne,
Thanks for the feedback.
I'm plannning to look at this.
Is your benchmark something that I could utilize? I.E. is it a set of
scripts or a standard test from somewhere that I can duplicate?
Thanks,
Reid
On Wed, 2023-05-17 at 23:07 -0400, reid.thompson@crunchydata.com wrote:
Thanks for the feedback.
I'm plannning to look at this.
Is your benchmark something that I could utilize? I.E. is it a set of
scripts or a standard test from somewhere that I can duplicate?Thanks,
Reid
Hi Arne,
Followup to the above.
I experimented on my system regarding
"The simple query select * from generate_series(0, 10000000) shows roughly 18.9 % degradation on my test server."
My laptop:
32GB ram
11th Gen Intel(R) Core(TM) i7-11850H 8 cores/16 threads @ 2.50GHz (Max Turbo Frequency. 4.80 GHz ; Cache. 24 MB)
SSD -> Model: KXG60ZNV1T02 NVMe KIOXIA 1024GB (nvme)
I updated to latest master and rebased my patch branches.
I wrote a script to check out, build, install, init, and startup
master, patch 1, patch 1+2, patch 1+2 as master, pg-stats-memory,
dev-max-memory, and dev-max-memory-unset configured with
../../configure --silent --prefix=/home/rthompso/src/git/postgres/install/${dir} --with-openssl --with-tcl --with-tclconfig=/usr/lib/tcl8.6 --with-perl --with-libxml --with-libxslt --with-python --with-gssapi --with-systemd --with-ldap --enable-nls
where $dir in master, pg-stats-memory, and dev-max-memory,
dev-max-memory-unset.
The only change made to the default postgresql.conf was to have the
script add to the dev-max-memory instance the line
"max_total_backend_memory = 2048" before startup.
I did find one change in patch 2 that I pushed back into patch 1, this
should only impact the pg-stats-memory instance.
my .psqlrc turns timing on
I created a script where I can pass two instances to be compared.
It invokes
psql -At -d postgres $connstr -P pager=off -c 'select * from generate_series(0, 10000000)'
100 times on each of the 2 instances and calculates the AVG time and SD
for the 100 runs. It then uses the AVG from each instance to calculate
the percentage difference.
Depending on the instance, my results differ from master from
negligible to ~5.5%. Comparing master to itself had up to a ~2%
variation. See below.
------------------------
12 runs comparing dev-max-memory 2048 VS master
Shows ~3% to 5.5% variation
Calculate average runtime percentage difference between VER dev-max-memory 2048 and VER master
1307.14 -> VER dev-max-memory 2048
1240.74 -> VER master
5.21218% difference
--
Calculate average runtime percentage difference between VER dev-max-memory 2048 and VER master
1315.99 -> VER dev-max-memory 2048
1245.64 -> VER master
5.4926% difference
--
Calculate average runtime percentage difference between VER dev-max-memory 2048 and VER master
1317.39 -> VER dev-max-memory 2048
1265.33 -> VER master
4.03141% difference
--
Calculate average runtime percentage difference between VER dev-max-memory 2048 and VER master
1313.52 -> VER dev-max-memory 2048
1256.69 -> VER master
4.42221% difference
--
Calculate average runtime percentage difference between VER dev-max-memory 2048 and VER master
1329.98 -> VER dev-max-memory 2048
1253.75 -> VER master
5.90077% difference
--
Calculate average runtime percentage difference between VER dev-max-memory 2048 and VER master
1314.47 -> VER dev-max-memory 2048
1245.6 -> VER master
5.38032% difference
--
Calculate average runtime percentage difference between VER dev-max-memory 2048 and VER master
1309.7 -> VER dev-max-memory 2048
1258.55 -> VER master
3.98326% difference
--
Calculate average runtime percentage difference between VER dev-max-memory 2048 and VER master
1322.16 -> VER dev-max-memory 2048
1248.94 -> VER master
5.69562% difference
--
Calculate average runtime percentage difference between VER dev-max-memory 2048 and VER master
1320.15 -> VER dev-max-memory 2048
1261.41 -> VER master
4.55074% difference
--
Calculate average runtime percentage difference between VER dev-max-memory 2048 and VER master
1345.22 -> VER dev-max-memory 2048
1280.96 -> VER master
4.8938% difference
--
Calculate average runtime percentage difference between VER dev-max-memory 2048 and VER master
1296.03 -> VER dev-max-memory 2048
1257.06 -> VER master
3.05277% difference
--
Calculate average runtime percentage difference between VER dev-max-memory 2048 and VER master
1319.5 -> VER dev-max-memory 2048
1252.34 -> VER master
5.22272% difference
----------------------------
12 showing dev-max-memory-unset VS master
Shows ~2.5% to 5% variation
Calculate average runtime percentage difference between VER dev-max-memory unset and VER master
1300.93 -> VER dev-max-memory unset
1235.12 -> VER master
5.18996% difference
--
Calculate average runtime percentage difference between VER dev-max-memory unset and VER master
1293.57 -> VER dev-max-memory unset
1263.93 -> VER master
2.31789% difference
--
Calculate average runtime percentage difference between VER dev-max-memory unset and VER master
1303.05 -> VER dev-max-memory unset
1258.11 -> VER master
3.50935% difference
--
Calculate average runtime percentage difference between VER dev-max-memory unset and VER master
1302.14 -> VER dev-max-memory unset
1256.51 -> VER master
3.56672% difference
--
Calculate average runtime percentage difference between VER dev-max-memory unset and VER master
1299.22 -> VER dev-max-memory unset
1282.74 -> VER master
1.27655% difference
--
Calculate average runtime percentage difference between VER dev-max-memory unset and VER master
1334.06 -> VER dev-max-memory unset
1263.77 -> VER master
5.41144% difference
--
Calculate average runtime percentage difference between VER dev-max-memory unset and VER master
1319.92 -> VER dev-max-memory unset
1262.35 -> VER master
4.45887% difference
--
Calculate average runtime percentage difference between VER dev-max-memory unset and VER master
1318.01 -> VER dev-max-memory unset
1257.16 -> VER master
4.7259% difference
--
Calculate average runtime percentage difference between VER dev-max-memory unset and VER master
1316.88 -> VER dev-max-memory unset
1257.63 -> VER master
4.60282% difference
--
Calculate average runtime percentage difference between VER dev-max-memory unset and VER master
1320.33 -> VER dev-max-memory unset
1282.12 -> VER master
2.93646% difference
--
Calculate average runtime percentage difference between VER dev-max-memory unset and VER master
1306.91 -> VER dev-max-memory unset
1246.12 -> VER master
4.76218% difference
--
Calculate average runtime percentage difference between VER dev-max-memory unset and VER master
1320.65 -> VER dev-max-memory unset
1258.78 -> VER master
4.79718% difference
-------------------------------
12 showing pg-stat-activity-only VS master
Shows ~<1% to 2.5% variation
Calculate average runtime percentage difference between VER pg-stat-activity-backend-memory-allocated and VER master
1252.65 -> VER pg-stat-activity-backend-memory-allocated
1245.36 -> VER master
0.583665% difference
--
Calculate average runtime percentage difference between VER pg-stat-activity-backend-memory-allocated and VER master
1294.75 -> VER pg-stat-activity-backend-memory-allocated
1277.55 -> VER master
1.33732% difference
--
Calculate average runtime percentage difference between VER pg-stat-activity-backend-memory-allocated and VER master
1264.11 -> VER pg-stat-activity-backend-memory-allocated
1257.57 -> VER master
0.518702% difference
--
Calculate average runtime percentage difference between VER pg-stat-activity-backend-memory-allocated and VER master
1267.44 -> VER pg-stat-activity-backend-memory-allocated
1251.31 -> VER master
1.28079% difference
--
Calculate average runtime percentage difference between VER pg-stat-activity-backend-memory-allocated and VER master
1270.05 -> VER pg-stat-activity-backend-memory-allocated
1250.1 -> VER master
1.58324% difference
--
Calculate average runtime percentage difference between VER pg-stat-activity-backend-memory-allocated and VER master
1298.92 -> VER pg-stat-activity-backend-memory-allocated
1265.04 -> VER master
2.64279% difference
--
Calculate average runtime percentage difference between VER pg-stat-activity-backend-memory-allocated and VER master
1280.99 -> VER pg-stat-activity-backend-memory-allocated
1263.51 -> VER master
1.37394% difference
--
Calculate average runtime percentage difference between VER pg-stat-activity-backend-memory-allocated and VER master
1273.23 -> VER pg-stat-activity-backend-memory-allocated
1275.53 -> VER master
-0.18048% difference
--
Calculate average runtime percentage difference between VER pg-stat-activity-backend-memory-allocated and VER master
1261.2 -> VER pg-stat-activity-backend-memory-allocated
1263.04 -> VER master
-0.145786% difference
--
Calculate average runtime percentage difference between VER pg-stat-activity-backend-memory-allocated and VER master
1289.73 -> VER pg-stat-activity-backend-memory-allocated
1289.02 -> VER master
0.0550654% difference
--
Calculate average runtime percentage difference between VER pg-stat-activity-backend-memory-allocated and VER master
1287.57 -> VER pg-stat-activity-backend-memory-allocated
1279.42 -> VER master
0.634985% difference
--
Calculate average runtime percentage difference between VER pg-stat-activity-backend-memory-allocated and VER master
1272.01 -> VER pg-stat-activity-backend-memory-allocated
1259.22 -> VER master
1.01058% difference
----------------------------------
I also did 12 runs master VS master
Shows, ~1% to 2% variation
Calculate average runtime percentage difference between VER master and VER master
1239.6 -> VER master
1263.73 -> VER master
-1.92783% difference
--
Calculate average runtime percentage difference between VER master and VER master
1253.82 -> VER master
1252.5 -> VER master
0.105334% difference
--
Calculate average runtime percentage difference between VER master and VER master
1256.05 -> VER master
1258.97 -> VER master
-0.232205% difference
--
Calculate average runtime percentage difference between VER master and VER master
1264.8 -> VER master
1248.94 -> VER master
1.26186% difference
--
Calculate average runtime percentage difference between VER master and VER master
1265.08 -> VER master
1275.43 -> VER master
-0.814797% difference
--
Calculate average runtime percentage difference between VER master and VER master
1260.95 -> VER master
1288.81 -> VER master
-2.1853% difference
--
Calculate average runtime percentage difference between VER master and VER master
1260.46 -> VER master
1252.86 -> VER master
0.604778% difference
--
Calculate average runtime percentage difference between VER master and VER master
1253.49 -> VER master
1255.25 -> VER master
-0.140309% difference
--
Calculate average runtime percentage difference between VER master and VER master
1277.5 -> VER master
1267.42 -> VER master
0.792166% difference
--
Calculate average runtime percentage difference between VER master and VER master
1266.2 -> VER master
1283.12 -> VER master
-1.32741% difference
--
Calculate average runtime percentage difference between VER master and VER master
1245.78 -> VER master
1246.78 -> VER master
-0.0802388% difference
--
Calculate average runtime percentage difference between VER master and VER master
1255.15 -> VER master
1276.73 -> VER master
-1.70466% difference
On Mon, 2023-05-22 at 08:42 -0400, reid.thompson@crunchydata.com wrote:
On Wed, 2023-05-17 at 23:07 -0400, reid.thompson@crunchydata.com wrote:
Thanks for the feedback.
I'm plannning to look at this.
Is your benchmark something that I could utilize? I.E. is it a set of
scripts or a standard test from somewhere that I can duplicate?Thanks,
Reid
Attach patches updated to master.
Pulled from patch 2 back to patch 1 a change that was also pertinent to patch 1.
Attachments:
0001-Add-tracking-of-backend-memory-allocated.patchtext/x-patch; charset=UTF-8; name=0001-Add-tracking-of-backend-memory-allocated.patchDownload
From e6f8499e0270f2291494260bc341e8ad1411c2ae Mon Sep 17 00:00:00 2001
From: Reid Thompson <jreidthompson@nc.rr.com>
Date: Thu, 11 Aug 2022 12:01:25 -0400
Subject: [PATCH 1/2] Add tracking of backend memory allocated
Add tracking of backend memory allocated in total and by allocation
type (aset, dsm, generation, slab) by process.
allocated_bytes tracks the current bytes of memory allocated to the
backend process. aset_allocated_bytes, dsm_allocated_bytes,
generation_allocated_bytes and slab_allocated_bytes track the
allocation by type for the backend process. They are updated for the
process as memory is malloc'd/freed. Memory allocated to items on
the freelist is included. Dynamic shared memory allocations are
included only in the value displayed for the backend that created
them, they are not included in the value for backends that are
attached to them to avoid double counting. DSM allocations that are
not destroyed by the creating process prior to it's exit are
considered long lived and are tracked in a global counter
global_dsm_allocated_bytes. We limit the floor of allocation
counters to zero. Created views pg_stat_global_memory_allocation and
pg_stat_memory_allocation for access to these trackers.
---
doc/src/sgml/monitoring.sgml | 246 ++++++++++++++++++++
src/backend/catalog/system_views.sql | 34 +++
src/backend/storage/ipc/dsm.c | 11 +-
src/backend/storage/ipc/dsm_impl.c | 78 +++++++
src/backend/storage/lmgr/proc.c | 1 +
src/backend/utils/activity/backend_status.c | 114 +++++++++
src/backend/utils/adt/pgstatfuncs.c | 84 +++++++
src/backend/utils/init/miscinit.c | 3 +
src/backend/utils/mmgr/aset.c | 17 ++
src/backend/utils/mmgr/generation.c | 15 ++
src/backend/utils/mmgr/slab.c | 22 ++
src/include/catalog/pg_proc.dat | 17 ++
src/include/storage/proc.h | 2 +
src/include/utils/backend_status.h | 144 +++++++++++-
src/test/regress/expected/rules.out | 27 +++
src/test/regress/expected/stats.out | 36 +++
src/test/regress/sql/stats.sql | 20 ++
17 files changed, 869 insertions(+), 2 deletions(-)
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index df5242fa80..cfc221fb2e 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -5757,6 +5757,252 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
</sect2>
+ <sect2 id="monitoring-pg-stat-memory-allocation-view">
+ <title><structname>pg_stat_memory_allocation</structname></title>
+
+ <indexterm>
+ <primary>pg_stat_memory_allocation</primary>
+ </indexterm>
+
+ <para>
+ The <structname>pg_stat_memory_allocation</structname> view will have one
+ row per server process, showing information related to the current memory
+ allocation of that process in total and by allocator type. Due to the
+ dynamic nature of memory allocations the allocated bytes values may not be
+ exact but should be sufficient for the intended purposes. Dynamic shared
+ memory allocations are included only in the value displayed for the backend
+ that created them, they are not included in the value for backends that are
+ attached to them to avoid double counting. Use
+ <function>pg_size_pretty</function> described in
+ <xref linkend="functions-admin-dbsize"/> to make these values more easily
+ readable.
+ </para>
+
+ <table id="pg-stat-memory-allocation-view" xreflabel="pg_stat_memory_allocation">
+ <title><structname>pg_stat_memory_allocation</structname> View</title>
+ <tgroup cols="1">
+ <thead>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ Column Type
+ </para>
+ <para>
+ Description
+ </para></entry>
+ </row>
+ </thead>
+
+ <tbody>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>datid</structfield> <type>oid</type>
+ </para>
+ <para>
+ OID of the database this backend is connected to
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>pid</structfield> <type>integer</type>
+ </para>
+ <para>
+ Process ID of this backend
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>allocated_bytes</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Memory currently allocated to this backend in bytes. This is the balance
+ of bytes allocated and freed by this backend. Dynamic shared memory
+ allocations are included only in the value displayed for the backend that
+ created them, they are not included in the value for backends that are
+ attached to them to avoid double counting.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>aset_allocated_bytes</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Memory currently allocated to this backend in bytes via the allocation
+ set allocator.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>dsm_allocated_bytes</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Memory currently allocated to this backend in bytes via the dynamic
+ shared memory allocator. Upon process exit, dsm allocations that have
+ not been freed are considered long lived and added to
+ <structfield>global_dsm_allocated_bytes</structfield> found in the
+ <link linkend="monitoring-pg-stat-global-memory-allocation-view">
+ <structname>pg_stat_global_memory_allocation</structname></link> view.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>generation_allocated_bytes</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Memory currently allocated to this backend in bytes via the generation
+ allocator.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>slab_allocated_bytes</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Memory currently allocated to this backend in bytes via the slab
+ allocator.
+ </para></entry>
+ </row>
+
+ </tbody>
+ </tgroup>
+ </table>
+
+ </sect2>
+
+ <sect2 id="monitoring-pg-stat-global-memory-allocation-view">
+ <title><structname>pg_stat_global_memory_allocation</structname></title>
+
+ <indexterm>
+ <primary>pg_stat_global-memory_allocation</primary>
+ </indexterm>
+
+ <para>
+ The <structname>pg_stat_global_memory_allocation</structname> view will
+ have one row showing information related to current shared memory
+ allocations. Due to the dynamic nature of memory allocations the allocated
+ bytes values may not be exact but should be sufficient for the intended
+ purposes. Use <function>pg_size_pretty</function> described in
+ <xref linkend="functions-admin-dbsize"/> to make the byte populated values
+ more easily readable.
+ </para>
+
+ <table id="pg-stat-global-memory-allocation-view" xreflabel="pg_stat_global_memory_allocation">
+ <title><structname>pg_stat_global_memory_allocation</structname> View</title>
+ <tgroup cols="1">
+ <thead>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ Column Type
+ </para>
+ <para>
+ Description
+ </para></entry>
+ </row>
+ </thead>
+
+ <tbody>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>datid</structfield> <type>oid</type>
+ </para>
+ <para>
+ OID of the database this backend is connected to
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>shared_memory_size_mb</structfield> <type>integer</type>
+ </para>
+ <para>
+ Reports the size of the main shared memory area, rounded up to the
+ nearest megabyte. See <xref linkend="guc-shared-memory-size"/>.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>shared_memory_size_in_huge_pages</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Reports the number of huge pages that are needed for the main shared
+ memory area based on the specified huge_page_size. If huge pages are not
+ supported, this will be -1. See
+ <xref linkend="guc-shared-memory-size-in-huge-pages"/>.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>global_dsm_allocated_bytes</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Long lived dynamically allocated memory currently allocated to the
+ database. Upon process exit, dsm allocations that have not been freed
+ are considered long lived and added to
+ <structfield>global_dsm_allocated_bytes</structfield>.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>total_aset_allocated_bytes</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Sum total of <structfield>aset_allocated_bytes</structfield> for all
+ backend processes from
+ <link linkend="monitoring-pg-stat-memory-allocation-view">
+ <structname>pg_stat_memory_allocation</structname></link> view.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>total_dsm_allocated_bytes</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Sum total of <structfield>dsm_allocated_bytes</structfield> for all
+ backend processes from
+ <link linkend="monitoring-pg-stat-memory-allocation-view">
+ <structname>pg_stat_memory_allocation</structname></link> view.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>total_generation_allocated_bytes</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Sum total of <structfield>generation_allocated_bytes</structfield> for
+ all backend processes from
+ <link linkend="monitoring-pg-stat-memory-allocation-view">
+ <structname>pg_stat_memory_allocation</structname></link> view.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>total_slab_allocated_bytes</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Sum total of <structfield>slab_allocated_bytes</structfield> for all
+ backend processes from
+ <link linkend="monitoring-pg-stat-memory-allocation-view">
+ <structname>pg_stat_memory_allocation</structname></link> view.
+ </para></entry>
+ </row>
+
+ </tbody>
+ </tgroup>
+ </table>
+
+ </sect2>
+
<sect2 id="monitoring-stats-functions">
<title>Statistics Functions</title>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index c18fea8362..cc8219c665 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1341,3 +1341,37 @@ CREATE VIEW pg_stat_subscription_stats AS
ss.stats_reset
FROM pg_subscription as s,
pg_stat_get_subscription_stats(s.oid) as ss;
+
+CREATE VIEW pg_stat_memory_allocation AS
+ SELECT
+ S.datid AS datid,
+ S.pid,
+ S.allocated_bytes,
+ S.aset_allocated_bytes,
+ S.dsm_allocated_bytes,
+ S.generation_allocated_bytes,
+ S.slab_allocated_bytes
+ FROM pg_stat_get_memory_allocation(NULL) AS S
+ LEFT JOIN pg_database AS D ON (S.datid = D.oid);
+
+CREATE VIEW pg_stat_global_memory_allocation AS
+WITH sums AS (
+ SELECT
+ SUM(aset_allocated_bytes) AS total_aset_allocated_bytes,
+ SUM(dsm_allocated_bytes) AS total_dsm_allocated_bytes,
+ SUM(generation_allocated_bytes) AS total_generation_allocated_bytes,
+ SUM(slab_allocated_bytes) AS total_slab_allocated_bytes
+ FROM
+ pg_stat_memory_allocation
+)
+SELECT
+ S.datid AS datid,
+ current_setting('shared_memory_size', true) as shared_memory_size,
+ (current_setting('shared_memory_size_in_huge_pages', true))::integer as shared_memory_size_in_huge_pages,
+ S.global_dsm_allocated_bytes,
+ sums.total_aset_allocated_bytes,
+ sums.total_dsm_allocated_bytes,
+ sums.total_generation_allocated_bytes,
+ sums.total_slab_allocated_bytes
+ FROM sums, pg_stat_get_global_memory_allocation() AS S
+ LEFT JOIN pg_database AS D ON (S.datid = D.oid);
diff --git a/src/backend/storage/ipc/dsm.c b/src/backend/storage/ipc/dsm.c
index 10b029bb16..64b1fecd1c 100644
--- a/src/backend/storage/ipc/dsm.c
+++ b/src/backend/storage/ipc/dsm.c
@@ -775,6 +775,15 @@ dsm_detach_all(void)
void
dsm_detach(dsm_segment *seg)
{
+ /*
+ * Retain mapped_size to pass into destroy call in cases where the detach
+ * is the last reference. mapped_size is zeroed as part of the detach
+ * process, but is needed later in these cases for dsm_allocated_bytes
+ * accounting.
+ */
+ Size local_seg_mapped_size = seg->mapped_size;
+ Size *ptr_local_seg_mapped_size = &local_seg_mapped_size;
+
/*
* Invoke registered callbacks. Just in case one of those callbacks
* throws a further error that brings us back here, pop the callback
@@ -855,7 +864,7 @@ dsm_detach(dsm_segment *seg)
*/
if (is_main_region_dsm_handle(seg->handle) ||
dsm_impl_op(DSM_OP_DESTROY, seg->handle, 0, &seg->impl_private,
- &seg->mapped_address, &seg->mapped_size, WARNING))
+ &seg->mapped_address, ptr_local_seg_mapped_size, WARNING))
{
LWLockAcquire(DynamicSharedMemoryControlLock, LW_EXCLUSIVE);
if (is_main_region_dsm_handle(seg->handle))
diff --git a/src/backend/storage/ipc/dsm_impl.c b/src/backend/storage/ipc/dsm_impl.c
index 6399fa2ad5..f43bad4439 100644
--- a/src/backend/storage/ipc/dsm_impl.c
+++ b/src/backend/storage/ipc/dsm_impl.c
@@ -66,6 +66,7 @@
#include "postmaster/postmaster.h"
#include "storage/dsm_impl.h"
#include "storage/fd.h"
+#include "utils/backend_status.h"
#include "utils/guc.h"
#include "utils/memutils.h"
@@ -232,6 +233,14 @@ dsm_impl_posix(dsm_op op, dsm_handle handle, Size request_size,
name)));
return false;
}
+
+ /*
+ * Detach and destroy pass through here, only decrease the memory
+ * shown allocated in pg_stat_activity when the creator destroys the
+ * allocation.
+ */
+ if (op == DSM_OP_DESTROY)
+ pgstat_report_allocated_bytes_decrease(*mapped_size, PG_ALLOC_DSM);
*mapped_address = NULL;
*mapped_size = 0;
if (op == DSM_OP_DESTROY && shm_unlink(name) != 0)
@@ -332,6 +341,33 @@ dsm_impl_posix(dsm_op op, dsm_handle handle, Size request_size,
name)));
return false;
}
+
+ /*
+ * Attach and create pass through here, only update backend memory
+ * allocated in pg_stat_activity for the creator process.
+ */
+ if (op == DSM_OP_CREATE)
+ {
+ /*
+ * Posix creation calls dsm_impl_posix_resize implying that resizing
+ * occurs or may be added in the future. As implemented
+ * dsm_impl_posix_resize utilizes fallocate or truncate, passing the
+ * whole new size as input, growing the allocation as needed (only
+ * truncate supports shrinking). We update by replacing the old
+ * allocation with the new.
+ */
+#if defined(HAVE_POSIX_FALLOCATE) && defined(__linux__)
+ /*
+ * posix_fallocate does not shrink allocations, adjust only on
+ * allocation increase.
+ */
+ if (request_size > *mapped_size)
+ pgstat_report_allocated_bytes_increase(request_size - *mapped_size, PG_ALLOC_DSM);
+#else
+ pgstat_report_allocated_bytes_decrease(*mapped_size, PG_ALLOC_DSM);
+ pgstat_report_allocated_bytes_increase(request_size, PG_ALLOC_DSM);
+#endif
+ }
*mapped_address = address;
*mapped_size = request_size;
close(fd);
@@ -538,6 +574,14 @@ dsm_impl_sysv(dsm_op op, dsm_handle handle, Size request_size,
name)));
return false;
}
+
+ /*
+ * Detach and destroy pass through here, only decrease the memory
+ * shown allocated in pg_stat_activity when the creator destroys the
+ * allocation.
+ */
+ if (op == DSM_OP_DESTROY)
+ pgstat_report_allocated_bytes_decrease(*mapped_size, PG_ALLOC_DSM);
*mapped_address = NULL;
*mapped_size = 0;
if (op == DSM_OP_DESTROY && shmctl(ident, IPC_RMID, NULL) < 0)
@@ -585,6 +629,13 @@ dsm_impl_sysv(dsm_op op, dsm_handle handle, Size request_size,
name)));
return false;
}
+
+ /*
+ * Attach and create pass through here, only update backend memory
+ * allocated in pg_stat_activity for the creator process.
+ */
+ if (op == DSM_OP_CREATE)
+ pgstat_report_allocated_bytes_increase(request_size, PG_ALLOC_DSM);
*mapped_address = address;
*mapped_size = request_size;
@@ -653,6 +704,13 @@ dsm_impl_windows(dsm_op op, dsm_handle handle, Size request_size,
return false;
}
+ /*
+ * Detach and destroy pass through here, only decrease the memory
+ * shown allocated in pg_stat_activity when the creator destroys the
+ * allocation.
+ */
+ if (op == DSM_OP_DESTROY)
+ pgstat_report_allocated_bytes_decrease(*mapped_size, PG_ALLOC_DSM);
*impl_private = NULL;
*mapped_address = NULL;
*mapped_size = 0;
@@ -769,6 +827,12 @@ dsm_impl_windows(dsm_op op, dsm_handle handle, Size request_size,
return false;
}
+ /*
+ * Attach and create pass through here, only update backend memory
+ * allocated in pg_stat_activity for the creator process.
+ */
+ if (op == DSM_OP_CREATE)
+ pgstat_report_allocated_bytes_increase(info.RegionSize, PG_ALLOC_DSM);
*mapped_address = address;
*mapped_size = info.RegionSize;
*impl_private = hmap;
@@ -813,6 +877,13 @@ dsm_impl_mmap(dsm_op op, dsm_handle handle, Size request_size,
name)));
return false;
}
+
+ /*
+ * Detach and destroy pass through here, only decrease the memory
+ * shown allocated in pg_stat_activity when the creator destroys the
+ * allocation.
+ */
+ pgstat_report_allocated_bytes_decrease(*mapped_size, PG_ALLOC_DSM);
*mapped_address = NULL;
*mapped_size = 0;
if (op == DSM_OP_DESTROY && unlink(name) != 0)
@@ -934,6 +1005,13 @@ dsm_impl_mmap(dsm_op op, dsm_handle handle, Size request_size,
name)));
return false;
}
+
+ /*
+ * Attach and create pass through here, only update backend memory
+ * allocated in pg_stat_activity for the creator process.
+ */
+ if (op == DSM_OP_CREATE)
+ pgstat_report_allocated_bytes_increase(request_size, PG_ALLOC_DSM);
*mapped_address = address;
*mapped_size = request_size;
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index dac921219f..d798c05180 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -180,6 +180,7 @@ InitProcGlobal(void)
ProcGlobal->checkpointerLatch = NULL;
pg_atomic_init_u32(&ProcGlobal->procArrayGroupFirst, INVALID_PGPROCNO);
pg_atomic_init_u32(&ProcGlobal->clogGroupFirst, INVALID_PGPROCNO);
+ pg_atomic_init_u64(&ProcGlobal->global_dsm_allocation, 0);
/*
* Create and initialize all the PGPROC structures we'll need. There are
diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c
index 38f91a495b..50b36ba5f7 100644
--- a/src/backend/utils/activity/backend_status.c
+++ b/src/backend/utils/activity/backend_status.c
@@ -49,6 +49,24 @@ int pgstat_track_activity_query_size = 1024;
/* exposed so that backend_progress.c can access it */
PgBackendStatus *MyBEEntry = NULL;
+/*
+ * Memory allocated to this backend prior to pgstats initialization. Migrated to
+ * shared memory on pgstats initialization.
+ */
+uint64 local_my_allocated_bytes = 0;
+uint64 *my_allocated_bytes = &local_my_allocated_bytes;
+
+/* Memory allocated to this backend by type prior to pgstats initialization.
+ * Migrated to shared memory on pgstats initialization
+ */
+uint64 local_my_aset_allocated_bytes = 0;
+uint64 *my_aset_allocated_bytes = &local_my_aset_allocated_bytes;
+uint64 local_my_dsm_allocated_bytes = 0;
+uint64 *my_dsm_allocated_bytes = &local_my_dsm_allocated_bytes;
+uint64 local_my_generation_allocated_bytes = 0;
+uint64 *my_generation_allocated_bytes = &local_my_generation_allocated_bytes;
+uint64 local_my_slab_allocated_bytes = 0;
+uint64 *my_slab_allocated_bytes = &local_my_slab_allocated_bytes;
static PgBackendStatus *BackendStatusArray = NULL;
static char *BackendAppnameBuffer = NULL;
@@ -401,6 +419,32 @@ pgstat_bestart(void)
lbeentry.st_progress_command_target = InvalidOid;
lbeentry.st_query_id = UINT64CONST(0);
+ /* Alter allocation reporting from local storage to shared memory */
+ pgstat_set_allocated_bytes_storage(&MyBEEntry->allocated_bytes,
+ &MyBEEntry->aset_allocated_bytes,
+ &MyBEEntry->dsm_allocated_bytes,
+ &MyBEEntry->generation_allocated_bytes,
+ &MyBEEntry->slab_allocated_bytes);
+
+ /*
+ * Populate sum of memory allocated prior to pgstats initialization to
+ * pgstats and zero the local variable. This is a += assignment because
+ * InitPostgres allocates memory after pgstat_beinit but prior to
+ * pgstat_bestart so we have allocations to both local and shared memory
+ * to combine.
+ */
+ lbeentry.allocated_bytes += local_my_allocated_bytes;
+ local_my_allocated_bytes = 0;
+ lbeentry.aset_allocated_bytes += local_my_aset_allocated_bytes;
+ local_my_aset_allocated_bytes = 0;
+
+ lbeentry.dsm_allocated_bytes += local_my_dsm_allocated_bytes;
+ local_my_dsm_allocated_bytes = 0;
+ lbeentry.generation_allocated_bytes += local_my_generation_allocated_bytes;
+ local_my_generation_allocated_bytes = 0;
+ lbeentry.slab_allocated_bytes += local_my_slab_allocated_bytes;
+ local_my_slab_allocated_bytes = 0;
+
/*
* we don't zero st_progress_param here to save cycles; nobody should
* examine it until st_progress_command has been set to something other
@@ -460,6 +504,9 @@ pgstat_beshutdown_hook(int code, Datum arg)
{
volatile PgBackendStatus *beentry = MyBEEntry;
+ /* Stop reporting memory allocation changes to shared memory */
+ pgstat_reset_allocated_bytes_storage();
+
/*
* Clear my status entry, following the protocol of bumping st_changecount
* before and after. We use a volatile pointer here to ensure the
@@ -1195,3 +1242,70 @@ pgstat_clip_activity(const char *raw_activity)
return activity;
}
+
+/*
+ * Configure bytes allocated reporting to report allocated bytes to
+ * shared memory.
+ *
+ * Expected to be called during backend startup (in pgstat_bestart), to point
+ * allocated bytes accounting into shared memory.
+ */
+void
+pgstat_set_allocated_bytes_storage(uint64 *allocated_bytes,
+ uint64 *aset_allocated_bytes,
+ uint64 *dsm_allocated_bytes,
+ uint64 *generation_allocated_bytes,
+ uint64 *slab_allocated_bytes)
+{
+ /* Map allocations to shared memory */
+ my_allocated_bytes = allocated_bytes;
+ *allocated_bytes = local_my_allocated_bytes;
+
+ my_aset_allocated_bytes = aset_allocated_bytes;
+ *aset_allocated_bytes = local_my_aset_allocated_bytes;
+
+ my_dsm_allocated_bytes = dsm_allocated_bytes;
+ *dsm_allocated_bytes = local_my_dsm_allocated_bytes;
+
+ my_generation_allocated_bytes = generation_allocated_bytes;
+ *generation_allocated_bytes = local_my_generation_allocated_bytes;
+
+ my_slab_allocated_bytes = slab_allocated_bytes;
+ *slab_allocated_bytes = local_my_slab_allocated_bytes;
+}
+
+/*
+ * Reset allocated bytes storage location.
+ *
+ * Expected to be called during backend shutdown, before the locations set up
+ * by pgstat_set_allocated_bytes_storage become invalid.
+ */
+void
+pgstat_reset_allocated_bytes_storage(void)
+{
+ if (ProcGlobal)
+ {
+ volatile PROC_HDR *procglobal = ProcGlobal;
+
+ /*
+ * Add dsm allocations that have not been freed to global dsm
+ * accounting
+ */
+ pg_atomic_add_fetch_u64(&procglobal->global_dsm_allocation,
+ *my_dsm_allocated_bytes);
+ }
+
+ /* Reset memory allocation variables */
+ *my_allocated_bytes = local_my_allocated_bytes = 0;
+ *my_aset_allocated_bytes = local_my_aset_allocated_bytes = 0;
+ *my_dsm_allocated_bytes = local_my_dsm_allocated_bytes = 0;
+ *my_generation_allocated_bytes = local_my_generation_allocated_bytes = 0;
+ *my_slab_allocated_bytes = local_my_slab_allocated_bytes = 0;
+
+ /* Point my_{*_}allocated_bytes from shared memory back to local */
+ my_allocated_bytes = &local_my_allocated_bytes;
+ my_aset_allocated_bytes = &local_my_aset_allocated_bytes;
+ my_dsm_allocated_bytes = &local_my_dsm_allocated_bytes;
+ my_generation_allocated_bytes = &local_my_generation_allocated_bytes;
+ my_slab_allocated_bytes = &local_my_slab_allocated_bytes;
+}
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 49adc319fc..35f2d2bffe 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -2062,3 +2062,87 @@ pg_stat_have_stats(PG_FUNCTION_ARGS)
PG_RETURN_BOOL(pgstat_have_entry(kind, dboid, objoid));
}
+
+/*
+ * Get the memory allocation of PG backends.
+ */
+Datum
+pg_stat_get_memory_allocation(PG_FUNCTION_ARGS)
+{
+#define PG_STAT_GET_MEMORY_ALLOCATION_COLS 7
+ int num_backends = pgstat_fetch_stat_numbackends();
+ int curr_backend;
+ int pid = PG_ARGISNULL(0) ? -1 : PG_GETARG_INT32(0);
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+
+ InitMaterializedSRF(fcinfo, 0);
+
+ /* 1-based index */
+ for (curr_backend = 1; curr_backend <= num_backends; curr_backend++)
+ {
+ /* for each row */
+ Datum values[PG_STAT_GET_MEMORY_ALLOCATION_COLS] = {0};
+ bool nulls[PG_STAT_GET_MEMORY_ALLOCATION_COLS] = {0};
+ LocalPgBackendStatus *local_beentry;
+ PgBackendStatus *beentry;
+
+ /* Get the next one in the list */
+ local_beentry = pgstat_fetch_stat_local_beentry(curr_backend);
+ beentry = &local_beentry->backendStatus;
+
+ /* If looking for specific PID, ignore all the others */
+ if (pid != -1 && beentry->st_procpid != pid)
+ continue;
+
+ /* Values available to all callers */
+ if (beentry->st_databaseid != InvalidOid)
+ values[0] = ObjectIdGetDatum(beentry->st_databaseid);
+ else
+ nulls[0] = true;
+
+ values[1] = Int32GetDatum(beentry->st_procpid);
+ values[2] = UInt64GetDatum(beentry->allocated_bytes);
+ values[3] = UInt64GetDatum(beentry->aset_allocated_bytes);
+ values[4] = UInt64GetDatum(beentry->dsm_allocated_bytes);
+ values[5] = UInt64GetDatum(beentry->generation_allocated_bytes);
+ values[6] = UInt64GetDatum(beentry->slab_allocated_bytes);
+
+ tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc, values, nulls);
+
+ /* If only a single backend was requested, and we found it, break. */
+ if (pid != -1)
+ break;
+ }
+
+ return (Datum) 0;
+}
+
+/*
+ * Get the global memory allocation statistics.
+ */
+Datum
+pg_stat_get_global_memory_allocation(PG_FUNCTION_ARGS)
+{
+#define PG_STAT_GET_GLOBAL_MEMORY_ALLOCATION_COLS 2
+ TupleDesc tupdesc;
+ Datum values[PG_STAT_GET_GLOBAL_MEMORY_ALLOCATION_COLS] = {0};
+ bool nulls[PG_STAT_GET_GLOBAL_MEMORY_ALLOCATION_COLS] = {0};
+ volatile PROC_HDR *procglobal = ProcGlobal;
+
+ /* Initialise attributes information in the tuple descriptor */
+ tupdesc = CreateTemplateTupleDesc(PG_STAT_GET_GLOBAL_MEMORY_ALLOCATION_COLS);
+ TupleDescInitEntry(tupdesc, (AttrNumber) 1, "datid",
+ OIDOID, -1, 0);
+ TupleDescInitEntry(tupdesc, (AttrNumber) 2, "global_dsm_allocated_bytes",
+ INT8OID, -1, 0);
+ BlessTupleDesc(tupdesc);
+
+ /* datid */
+ values[0] = ObjectIdGetDatum(MyDatabaseId);
+
+ /* get global_dsm_allocated_bytes */
+ values[1] = Int64GetDatum(pg_atomic_read_u64(&procglobal->global_dsm_allocation));
+
+ /* Returns the record as Datum */
+ PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
+}
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
index a604432126..7b8eeb7dbb 100644
--- a/src/backend/utils/init/miscinit.c
+++ b/src/backend/utils/init/miscinit.c
@@ -171,6 +171,9 @@ InitPostmasterChild(void)
(errcode_for_socket_access(),
errmsg_internal("could not set postmaster death monitoring pipe to FD_CLOEXEC mode: %m")));
#endif
+
+ /* Init allocated bytes to avoid double counting parent allocation */
+ pgstat_init_allocated_bytes();
}
/*
diff --git a/src/backend/utils/mmgr/aset.c b/src/backend/utils/mmgr/aset.c
index 0bbbf93672..4146831d75 100644
--- a/src/backend/utils/mmgr/aset.c
+++ b/src/backend/utils/mmgr/aset.c
@@ -47,6 +47,7 @@
#include "postgres.h"
#include "port/pg_bitutils.h"
+#include "utils/backend_status.h"
#include "utils/memdebug.h"
#include "utils/memutils.h"
#include "utils/memutils_memorychunk.h"
@@ -513,6 +514,7 @@ AllocSetContextCreateInternal(MemoryContext parent,
name);
((MemoryContext) set)->mem_allocated = firstBlockSize;
+ pgstat_report_allocated_bytes_increase(firstBlockSize, PG_ALLOC_ASET);
return (MemoryContext) set;
}
@@ -535,6 +537,7 @@ AllocSetReset(MemoryContext context)
AllocSet set = (AllocSet) context;
AllocBlock block;
Size keepersize PG_USED_FOR_ASSERTS_ONLY;
+ uint64 deallocation = 0;
Assert(AllocSetIsValid(set));
@@ -577,6 +580,7 @@ AllocSetReset(MemoryContext context)
{
/* Normal case, release the block */
context->mem_allocated -= block->endptr - ((char *) block);
+ deallocation += block->endptr - ((char *) block);
#ifdef CLOBBER_FREED_MEMORY
wipe_mem(block, block->freeptr - ((char *) block));
@@ -587,6 +591,7 @@ AllocSetReset(MemoryContext context)
}
Assert(context->mem_allocated == keepersize);
+ pgstat_report_allocated_bytes_decrease(deallocation, PG_ALLOC_ASET);
/* Reset block size allocation sequence, too */
set->nextBlockSize = set->initBlockSize;
@@ -605,6 +610,7 @@ AllocSetDelete(MemoryContext context)
AllocSet set = (AllocSet) context;
AllocBlock block = set->blocks;
Size keepersize PG_USED_FOR_ASSERTS_ONLY;
+ uint64 deallocation = 0;
Assert(AllocSetIsValid(set));
@@ -643,11 +649,13 @@ AllocSetDelete(MemoryContext context)
freelist->first_free = (AllocSetContext *) oldset->header.nextchild;
freelist->num_free--;
+ deallocation += oldset->header.mem_allocated;
/* All that remains is to free the header/initial block */
free(oldset);
}
Assert(freelist->num_free == 0);
+ pgstat_report_allocated_bytes_decrease(deallocation, PG_ALLOC_ASET);
}
/* Now add the just-deleted context to the freelist. */
@@ -664,7 +672,10 @@ AllocSetDelete(MemoryContext context)
AllocBlock next = block->next;
if (block != set->keeper)
+ {
context->mem_allocated -= block->endptr - ((char *) block);
+ deallocation += block->endptr - ((char *) block);
+ }
#ifdef CLOBBER_FREED_MEMORY
wipe_mem(block, block->freeptr - ((char *) block));
@@ -677,6 +688,7 @@ AllocSetDelete(MemoryContext context)
}
Assert(context->mem_allocated == keepersize);
+ pgstat_report_allocated_bytes_decrease(deallocation + context->mem_allocated, PG_ALLOC_ASET);
/* Finally, free the context header, including the keeper block */
free(set);
@@ -726,6 +738,7 @@ AllocSetAlloc(MemoryContext context, Size size)
return NULL;
context->mem_allocated += blksize;
+ pgstat_report_allocated_bytes_increase(blksize, PG_ALLOC_ASET);
block->aset = set;
block->freeptr = block->endptr = ((char *) block) + blksize;
@@ -939,6 +952,7 @@ AllocSetAlloc(MemoryContext context, Size size)
return NULL;
context->mem_allocated += blksize;
+ pgstat_report_allocated_bytes_increase(blksize, PG_ALLOC_ASET);
block->aset = set;
block->freeptr = ((char *) block) + ALLOC_BLOCKHDRSZ;
@@ -1036,6 +1050,7 @@ AllocSetFree(void *pointer)
block->next->prev = block->prev;
set->header.mem_allocated -= block->endptr - ((char *) block);
+ pgstat_report_allocated_bytes_decrease(block->endptr - ((char *) block), PG_ALLOC_ASET);
#ifdef CLOBBER_FREED_MEMORY
wipe_mem(block, block->freeptr - ((char *) block));
@@ -1166,7 +1181,9 @@ AllocSetRealloc(void *pointer, Size size)
/* updated separately, not to underflow when (oldblksize > blksize) */
set->header.mem_allocated -= oldblksize;
+ pgstat_report_allocated_bytes_decrease(oldblksize, PG_ALLOC_ASET);
set->header.mem_allocated += blksize;
+ pgstat_report_allocated_bytes_increase(blksize, PG_ALLOC_ASET);
block->freeptr = block->endptr = ((char *) block) + blksize;
diff --git a/src/backend/utils/mmgr/generation.c b/src/backend/utils/mmgr/generation.c
index 4fb8663cd6..502f877855 100644
--- a/src/backend/utils/mmgr/generation.c
+++ b/src/backend/utils/mmgr/generation.c
@@ -37,6 +37,7 @@
#include "lib/ilist.h"
#include "port/pg_bitutils.h"
+#include "utils/backend_status.h"
#include "utils/memdebug.h"
#include "utils/memutils.h"
#include "utils/memutils_memorychunk.h"
@@ -259,6 +260,7 @@ GenerationContextCreate(MemoryContext parent,
name);
((MemoryContext) set)->mem_allocated = firstBlockSize;
+ pgstat_report_allocated_bytes_increase(firstBlockSize, PG_ALLOC_GENERATION);
return (MemoryContext) set;
}
@@ -275,6 +277,7 @@ GenerationReset(MemoryContext context)
{
GenerationContext *set = (GenerationContext *) context;
dlist_mutable_iter miter;
+ uint64 deallocation = 0;
Assert(GenerationIsValid(set));
@@ -297,9 +300,14 @@ GenerationReset(MemoryContext context)
if (block == set->keeper)
GenerationBlockMarkEmpty(block);
else
+ {
+ deallocation += block->blksize;
GenerationBlockFree(set, block);
+ }
}
+ pgstat_report_allocated_bytes_decrease(deallocation, PG_ALLOC_GENERATION);
+
/* set it so new allocations to make use of the keeper block */
set->block = set->keeper;
@@ -320,6 +328,9 @@ GenerationDelete(MemoryContext context)
{
/* Reset to release all releasable GenerationBlocks */
GenerationReset(context);
+
+ pgstat_report_allocated_bytes_decrease(context->mem_allocated, PG_ALLOC_GENERATION);
+
/* And free the context header and keeper block */
free(context);
}
@@ -366,6 +377,7 @@ GenerationAlloc(MemoryContext context, Size size)
return NULL;
context->mem_allocated += blksize;
+ pgstat_report_allocated_bytes_increase(blksize, PG_ALLOC_GENERATION);
/* block with a single (used) chunk */
block->context = set;
@@ -469,6 +481,7 @@ GenerationAlloc(MemoryContext context, Size size)
return NULL;
context->mem_allocated += blksize;
+ pgstat_report_allocated_bytes_increase(blksize, PG_ALLOC_GENERATION);
/* initialize the new block */
GenerationBlockInit(set, block, blksize);
@@ -721,6 +734,8 @@ GenerationFree(void *pointer)
dlist_delete(&block->node);
set->header.mem_allocated -= block->blksize;
+ pgstat_report_allocated_bytes_decrease(block->blksize, PG_ALLOC_GENERATION);
+
free(block);
}
diff --git a/src/backend/utils/mmgr/slab.c b/src/backend/utils/mmgr/slab.c
index 718dd2ba03..913787dba8 100644
--- a/src/backend/utils/mmgr/slab.c
+++ b/src/backend/utils/mmgr/slab.c
@@ -69,6 +69,7 @@
#include "postgres.h"
#include "lib/ilist.h"
+#include "utils/backend_status.h"
#include "utils/memdebug.h"
#include "utils/memutils.h"
#include "utils/memutils_memorychunk.h"
@@ -413,6 +414,13 @@ SlabContextCreate(MemoryContext parent,
parent,
name);
+ /*
+ * If SlabContextCreate is updated to add context header size to
+ * context->mem_allocated, then update here and SlabDelete appropriately
+ */
+ pgstat_report_allocated_bytes_increase(Slab_CONTEXT_HDRSZ(slab->chunksPerBlock),
+ PG_ALLOC_SLAB);
+
return (MemoryContext) slab;
}
@@ -429,6 +437,7 @@ SlabReset(MemoryContext context)
SlabContext *slab = (SlabContext *) context;
dlist_mutable_iter miter;
int i;
+ uint64 deallocation = 0;
Assert(SlabIsValid(slab));
@@ -449,6 +458,7 @@ SlabReset(MemoryContext context)
#endif
free(block);
context->mem_allocated -= slab->blockSize;
+ deallocation += slab->blockSize;
}
/* walk over blocklist and free the blocks */
@@ -465,9 +475,11 @@ SlabReset(MemoryContext context)
#endif
free(block);
context->mem_allocated -= slab->blockSize;
+ deallocation += slab->blockSize;
}
}
+ pgstat_report_allocated_bytes_decrease(deallocation, PG_ALLOC_SLAB);
slab->curBlocklistIndex = 0;
Assert(context->mem_allocated == 0);
@@ -482,6 +494,14 @@ SlabDelete(MemoryContext context)
{
/* Reset to release all the SlabBlocks */
SlabReset(context);
+
+ /*
+ * Until context header allocation is included in context->mem_allocated,
+ * cast to slab and decrement the header allocation
+ */
+ pgstat_report_allocated_bytes_decrease(Slab_CONTEXT_HDRSZ(((SlabContext *) context)->chunksPerBlock),
+ PG_ALLOC_SLAB);
+
/* And free the context header */
free(context);
}
@@ -546,6 +566,7 @@ SlabAlloc(MemoryContext context, Size size)
block->slab = slab;
context->mem_allocated += slab->blockSize;
+ pgstat_report_allocated_bytes_increase(slab->blockSize, PG_ALLOC_SLAB);
/* use the first chunk in the new block */
chunk = SlabBlockGetChunk(slab, block, 0);
@@ -740,6 +761,7 @@ SlabFree(void *pointer)
#endif
free(block);
slab->header.mem_allocated -= slab->blockSize;
+ pgstat_report_allocated_bytes_decrease(slab->blockSize, PG_ALLOC_SLAB);
}
/*
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 6996073989..b095740f8a 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5408,6 +5408,23 @@
proname => 'pg_stat_get_backend_idset', prorows => '100', proretset => 't',
provolatile => 's', proparallel => 'r', prorettype => 'int4',
proargtypes => '', prosrc => 'pg_stat_get_backend_idset' },
+{ oid => '9890',
+ descr => 'statistics: memory allocation information for backends',
+ proname => 'pg_stat_get_memory_allocation', prorows => '100', proisstrict => 'f',
+ proretset => 't', provolatile => 's', proparallel => 'r',
+ prorettype => 'record', proargtypes => 'int4',
+ proallargtypes => '{int4,oid,int4,int8,int8,int8,int8,int8}',
+ proargmodes => '{i,o,o,o,o,o,o,o}',
+ proargnames => '{pid,datid,pid,allocated_bytes,aset_allocated_bytes,dsm_allocated_bytes,generation_allocated_bytes,slab_allocated_bytes}',
+ prosrc => 'pg_stat_get_memory_allocation' },
+{ oid => '9891',
+ descr => 'statistics: global memory allocation information',
+ proname => 'pg_stat_get_global_memory_allocation', proisstrict => 'f',
+ provolatile => 's', proparallel => 'r', prorettype => 'record',
+ proargtypes => '', proallargtypes => '{oid,int8}',
+ proargmodes => '{o,o}',
+ proargnames => '{datid,global_dsm_allocated_bytes}',
+ prosrc =>'pg_stat_get_global_memory_allocation' },
{ oid => '2022',
descr => 'statistics: information about currently active backends',
proname => 'pg_stat_get_activity', prorows => '100', proisstrict => 'f',
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index ef74f32693..e6be67de2a 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -404,6 +404,8 @@ typedef struct PROC_HDR
int spins_per_delay;
/* Buffer id of the buffer that Startup process waits for pin on, or -1 */
int startupBufferPinWaitBufId;
+ /* Global dsm allocations */
+ pg_atomic_uint64 global_dsm_allocation;
} PROC_HDR;
extern PGDLLIMPORT PROC_HDR *ProcGlobal;
diff --git a/src/include/utils/backend_status.h b/src/include/utils/backend_status.h
index 16500d53b2..9b75fc5223 100644
--- a/src/include/utils/backend_status.h
+++ b/src/include/utils/backend_status.h
@@ -10,6 +10,7 @@
#ifndef BACKEND_STATUS_H
#define BACKEND_STATUS_H
+#include "common/int.h"
#include "datatype/timestamp.h"
#include "libpq/pqcomm.h"
#include "miscadmin.h" /* for BackendType */
@@ -32,6 +33,14 @@ typedef enum BackendState
STATE_DISABLED
} BackendState;
+/* Enum helper for reporting memory allocator type */
+enum pg_allocator_type
+{
+ PG_ALLOC_ASET = 1,
+ PG_ALLOC_DSM,
+ PG_ALLOC_GENERATION,
+ PG_ALLOC_SLAB
+};
/* ----------
* Shared-memory data structures
@@ -170,6 +179,15 @@ typedef struct PgBackendStatus
/* query identifier, optionally computed using post_parse_analyze_hook */
uint64 st_query_id;
+
+ /* Current memory allocated to this backend */
+ uint64 allocated_bytes;
+
+ /* Current memory allocated to this backend by type */
+ uint64 aset_allocated_bytes;
+ uint64 dsm_allocated_bytes;
+ uint64 generation_allocated_bytes;
+ uint64 slab_allocated_bytes;
} PgBackendStatus;
@@ -294,6 +312,11 @@ extern PGDLLIMPORT int pgstat_track_activity_query_size;
* ----------
*/
extern PGDLLIMPORT PgBackendStatus *MyBEEntry;
+extern PGDLLIMPORT uint64 *my_allocated_bytes;
+extern PGDLLIMPORT uint64 *my_aset_allocated_bytes;
+extern PGDLLIMPORT uint64 *my_dsm_allocated_bytes;
+extern PGDLLIMPORT uint64 *my_generation_allocated_bytes;
+extern PGDLLIMPORT uint64 *my_slab_allocated_bytes;
/* ----------
@@ -325,7 +348,12 @@ extern const char *pgstat_get_backend_current_activity(int pid, bool checkUser);
extern const char *pgstat_get_crashed_backend_activity(int pid, char *buffer,
int buflen);
extern uint64 pgstat_get_my_query_id(void);
-
+extern void pgstat_set_allocated_bytes_storage(uint64 *allocated_bytes,
+ uint64 *aset_allocated_bytes,
+ uint64 *dsm_allocated_bytes,
+ uint64 *generation_allocated_bytes,
+ uint64 *slab_allocated_bytes);
+extern void pgstat_reset_allocated_bytes_storage(void);
/* ----------
* Support functions for the SQL-callable functions to
@@ -337,5 +365,119 @@ extern PgBackendStatus *pgstat_fetch_stat_beentry(BackendId beid);
extern LocalPgBackendStatus *pgstat_fetch_stat_local_beentry(int beid);
extern char *pgstat_clip_activity(const char *raw_activity);
+/* ----------
+ * pgstat_report_allocated_bytes_decrease() -
+ * Called to report decrease in memory allocated for this backend.
+ *
+ * my_{*_}allocated_bytes initially points to local memory, making it safe to
+ * call this before pgstats has been initialized.
+ * ----------
+ */
+static inline void
+pgstat_report_allocated_bytes_decrease(int64 proc_allocated_bytes,
+ int pg_allocator_type)
+{
+ uint64 temp;
+
+ /* Avoid allocated_bytes unsigned integer overflow on decrease */
+ if (pg_sub_u64_overflow(*my_allocated_bytes, proc_allocated_bytes, &temp))
+ {
+ /* On overflow, set allocated bytes and allocator type bytes to zero */
+ *my_allocated_bytes = 0;
+ *my_aset_allocated_bytes = 0;
+ *my_dsm_allocated_bytes = 0;
+ *my_generation_allocated_bytes = 0;
+ *my_slab_allocated_bytes = 0;
+ }
+ else
+ {
+ /* decrease allocation */
+ *my_allocated_bytes -= proc_allocated_bytes;
+
+ /* Decrease allocator type allocated bytes. */
+ switch (pg_allocator_type)
+ {
+ case PG_ALLOC_ASET:
+ *my_aset_allocated_bytes -= proc_allocated_bytes;
+ break;
+ case PG_ALLOC_DSM:
+
+ /*
+ * Some dsm allocations live beyond process exit. These are
+ * accounted for in a global counter in
+ * pgstat_reset_allocated_bytes_storage at process exit.
+ */
+ *my_dsm_allocated_bytes -= proc_allocated_bytes;
+ break;
+ case PG_ALLOC_GENERATION:
+ *my_generation_allocated_bytes -= proc_allocated_bytes;
+ break;
+ case PG_ALLOC_SLAB:
+ *my_slab_allocated_bytes -= proc_allocated_bytes;
+ break;
+ }
+ }
+
+ return;
+}
+
+/* ----------
+ * pgstat_report_allocated_bytes_increase() -
+ * Called to report increase in memory allocated for this backend.
+ *
+ * my_allocated_bytes initially points to local memory, making it safe to call
+ * this before pgstats has been initialized.
+ * ----------
+ */
+static inline void
+pgstat_report_allocated_bytes_increase(int64 proc_allocated_bytes,
+ int pg_allocator_type)
+{
+ *my_allocated_bytes += proc_allocated_bytes;
+
+ /* Increase allocator type allocated bytes */
+ switch (pg_allocator_type)
+ {
+ case PG_ALLOC_ASET:
+ *my_aset_allocated_bytes += proc_allocated_bytes;
+ break;
+ case PG_ALLOC_DSM:
+
+ /*
+ * Some dsm allocations live beyond process exit. These are
+ * accounted for in a global counter in
+ * pgstat_reset_allocated_bytes_storage at process exit.
+ */
+ *my_dsm_allocated_bytes += proc_allocated_bytes;
+ break;
+ case PG_ALLOC_GENERATION:
+ *my_generation_allocated_bytes += proc_allocated_bytes;
+ break;
+ case PG_ALLOC_SLAB:
+ *my_slab_allocated_bytes += proc_allocated_bytes;
+ break;
+ }
+
+ return;
+}
+
+/* ---------
+ * pgstat_init_allocated_bytes() -
+ *
+ * Called to initialize allocated bytes variables after fork and to
+ * avoid double counting allocations.
+ * ---------
+ */
+static inline void
+pgstat_init_allocated_bytes(void)
+{
+ *my_allocated_bytes = 0;
+ *my_aset_allocated_bytes = 0;
+ *my_dsm_allocated_bytes = 0;
+ *my_generation_allocated_bytes = 0;
+ *my_slab_allocated_bytes = 0;
+
+ return;
+}
#endif /* BACKEND_STATUS_H */
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 7fd81e6a7d..ff76aa99a2 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1873,6 +1873,24 @@ pg_stat_database_conflicts| SELECT oid AS datid,
pg_stat_get_db_conflict_startup_deadlock(oid) AS confl_deadlock,
pg_stat_get_db_conflict_logicalslot(oid) AS confl_active_logicalslot
FROM pg_database d;
+pg_stat_global_memory_allocation| WITH sums AS (
+ SELECT sum(pg_stat_memory_allocation.aset_allocated_bytes) AS total_aset_allocated_bytes,
+ sum(pg_stat_memory_allocation.dsm_allocated_bytes) AS total_dsm_allocated_bytes,
+ sum(pg_stat_memory_allocation.generation_allocated_bytes) AS total_generation_allocated_bytes,
+ sum(pg_stat_memory_allocation.slab_allocated_bytes) AS total_slab_allocated_bytes
+ FROM pg_stat_memory_allocation
+ )
+ SELECT s.datid,
+ current_setting('shared_memory_size'::text, true) AS shared_memory_size,
+ (current_setting('shared_memory_size_in_huge_pages'::text, true))::integer AS shared_memory_size_in_huge_pages,
+ s.global_dsm_allocated_bytes,
+ sums.total_aset_allocated_bytes,
+ sums.total_dsm_allocated_bytes,
+ sums.total_generation_allocated_bytes,
+ sums.total_slab_allocated_bytes
+ FROM sums,
+ (pg_stat_get_global_memory_allocation() s(datid, global_dsm_allocated_bytes)
+ LEFT JOIN pg_database d ON ((s.datid = d.oid)));
pg_stat_gssapi| SELECT pid,
gss_auth AS gss_authenticated,
gss_princ AS principal,
@@ -1899,6 +1917,15 @@ pg_stat_io| SELECT backend_type,
fsync_time,
stats_reset
FROM pg_stat_get_io() b(backend_type, object, context, reads, read_time, writes, write_time, writebacks, writeback_time, extends, extend_time, op_bytes, hits, evictions, reuses, fsyncs, fsync_time, stats_reset);
+pg_stat_memory_allocation| SELECT s.datid,
+ s.pid,
+ s.allocated_bytes,
+ s.aset_allocated_bytes,
+ s.dsm_allocated_bytes,
+ s.generation_allocated_bytes,
+ s.slab_allocated_bytes
+ FROM (pg_stat_get_memory_allocation(NULL::integer) s(datid, pid, allocated_bytes, aset_allocated_bytes, dsm_allocated_bytes, generation_allocated_bytes, slab_allocated_bytes)
+ LEFT JOIN pg_database d ON ((s.datid = d.oid)));
pg_stat_progress_analyze| SELECT s.pid,
s.datid,
d.datname,
diff --git a/src/test/regress/expected/stats.out b/src/test/regress/expected/stats.out
index 8e63340782..748c337ee7 100644
--- a/src/test/regress/expected/stats.out
+++ b/src/test/regress/expected/stats.out
@@ -1572,4 +1572,40 @@ SELECT COUNT(*) FROM brin_hot_3 WHERE a = 2;
DROP TABLE brin_hot_3;
SET enable_seqscan = on;
+-- ensure that allocated_bytes exist for backends
+SELECT
+ allocated_bytes > 0 AS result
+FROM
+ pg_stat_activity ps
+ JOIN pg_stat_memory_allocation pa ON (pa.pid = ps.pid)
+WHERE
+ backend_type IN ('checkpointer', 'background writer', 'walwriter', 'autovacuum launcher');
+ result
+--------
+ t
+ t
+ t
+ t
+(4 rows)
+
+-- ensure that pg_stat_global_memory_allocation view exists
+SELECT
+ datid > 0, pg_size_bytes(shared_memory_size) >= 0, shared_memory_size_in_huge_pages >= -1, global_dsm_allocated_bytes >= 0
+FROM
+ pg_stat_global_memory_allocation;
+ ?column? | ?column? | ?column? | ?column?
+----------+----------+----------+----------
+ t | t | t | t
+(1 row)
+
+-- ensure that pg_stat_memory_allocation view exists
+SELECT
+ pid > 0, allocated_bytes >= 0, aset_allocated_bytes >= 0, dsm_allocated_bytes >= 0, generation_allocated_bytes >= 0, slab_allocated_bytes >= 0
+FROM
+ pg_stat_memory_allocation limit 1;
+ ?column? | ?column? | ?column? | ?column? | ?column? | ?column?
+----------+----------+----------+----------+----------+----------
+ t | t | t | t | t | t
+(1 row)
+
-- End of Stats Test
diff --git a/src/test/regress/sql/stats.sql b/src/test/regress/sql/stats.sql
index fddf5a8277..a01f2545ba 100644
--- a/src/test/regress/sql/stats.sql
+++ b/src/test/regress/sql/stats.sql
@@ -805,4 +805,24 @@ DROP TABLE brin_hot_3;
SET enable_seqscan = on;
+-- ensure that allocated_bytes exist for backends
+SELECT
+ allocated_bytes > 0 AS result
+FROM
+ pg_stat_activity ps
+ JOIN pg_stat_memory_allocation pa ON (pa.pid = ps.pid)
+WHERE
+ backend_type IN ('checkpointer', 'background writer', 'walwriter', 'autovacuum launcher');
+
+-- ensure that pg_stat_global_memory_allocation view exists
+SELECT
+ datid > 0, pg_size_bytes(shared_memory_size) >= 0, shared_memory_size_in_huge_pages >= -1, global_dsm_allocated_bytes >= 0
+FROM
+ pg_stat_global_memory_allocation;
+
+-- ensure that pg_stat_memory_allocation view exists
+SELECT
+ pid > 0, allocated_bytes >= 0, aset_allocated_bytes >= 0, dsm_allocated_bytes >= 0, generation_allocated_bytes >= 0, slab_allocated_bytes >= 0
+FROM
+ pg_stat_memory_allocation limit 1;
-- End of Stats Test
--
2.25.1
0002-Add-the-ability-to-limit-the-amount-of-memory-that-c.patchtext/x-patch; charset=UTF-8; name=0002-Add-the-ability-to-limit-the-amount-of-memory-that-c.patchDownload
From f5a7a2ccfd975cb29b47af4fff563947085e1748 Mon Sep 17 00:00:00 2001
From: Reid Thompson <jreidthompson@nc.rr.com>
Date: Sat, 4 Jun 2022 22:23:59 -0400
Subject: [PATCH 2/2] Add the ability to limit the amount of memory that can be
allocated to backends.
This builds on the work that adds backend memory allocated tracking.
Add GUC variable max_total_backend_memory.
Specifies a limit to the amount of memory (in MB) that may be allocated to
backends in total (i.e. this is not a per user or per backend limit). If unset,
or set to 0 it is disabled. It is intended as a resource to help avoid the OOM
killer on LINUX and manage resources in general. A backend request that would
exhaust max_total_backend_memory memory will be denied with an out of memory
error causing that backend's current query/transaction to fail. Further
requests will not be allocated until dropping below the limit. Keep this in
mind when setting this value. Due to the dynamic nature of memory allocations,
this limit is not exact. This limit does not affect auxiliary backend
processes. Backend memory allocations are displayed in the
pg_stat_memory_allocation and pg_stat_global_memory_allocation views.
---
doc/src/sgml/config.sgml | 30 ++++
doc/src/sgml/monitoring.sgml | 38 ++++-
src/backend/catalog/system_views.sql | 2 +
src/backend/port/sysv_shmem.c | 9 ++
src/backend/postmaster/postmaster.c | 5 +
src/backend/storage/ipc/dsm_impl.c | 18 +++
src/backend/storage/lmgr/proc.c | 45 ++++++
src/backend/utils/activity/backend_status.c | 147 ++++++++++++++++++
src/backend/utils/adt/pgstatfuncs.c | 16 +-
src/backend/utils/hash/dynahash.c | 3 +-
src/backend/utils/init/miscinit.c | 8 +
src/backend/utils/misc/guc_tables.c | 11 ++
src/backend/utils/misc/postgresql.conf.sample | 3 +
src/backend/utils/mmgr/aset.c | 33 ++++
src/backend/utils/mmgr/generation.c | 16 ++
src/backend/utils/mmgr/slab.c | 15 +-
src/include/catalog/pg_proc.dat | 6 +-
src/include/storage/proc.h | 7 +
src/include/utils/backend_status.h | 102 +++++++++++-
src/test/regress/expected/rules.out | 4 +-
20 files changed, 498 insertions(+), 20 deletions(-)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 5da74b3c40..397661d4b2 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2180,6 +2180,36 @@ include_dir 'conf.d'
</listitem>
</varlistentry>
+ <varlistentry id="guc-max-total-backend-memory" xreflabel="max_total_backend_memory">
+ <term><varname>max_total_backend_memory</varname> (<type>integer</type>)
+ <indexterm>
+ <primary><varname>max_total_backend_memory</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Specifies a limit to the amount of memory (MB) that may be allocated to
+ backends in total (i.e. this is not a per user or per backend limit).
+ If unset, or set to 0 it is disabled. At databse startup
+ max_total_backend_memory is reduced by shared_memory_size_mb
+ (includes shared buffers and other memory required for initialization).
+ Each backend process is intialized with a 1MB local allowance which
+ also reduces total_bkend_mem_bytes_available. Keep this in mind when
+ setting this value. A backend request that would exhaust the limit will
+ be denied with an out of memory error causing that backend's current
+ query/transaction to fail. Further requests will not be allocated until
+ dropping below the limit. This limit does not affect auxiliary backend
+ processes
+ <xref linkend="glossary-auxiliary-proc"/> or the postmaster process.
+ Backend memory allocations (<varname>allocated_bytes</varname>) are
+ displayed in the
+ <link linkend="monitoring-pg-stat-memory-allocation-view"><structname>pg_stat_memory_allocation</structname></link>
+ view. Due to the dynamic nature of memory allocations, this limit is
+ not exact.
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
</sect2>
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index cfc221fb2e..aa53e0be3e 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -5817,10 +5817,7 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
</para>
<para>
Memory currently allocated to this backend in bytes. This is the balance
- of bytes allocated and freed by this backend. Dynamic shared memory
- allocations are included only in the value displayed for the backend that
- created them, they are not included in the value for backends that are
- attached to them to avoid double counting.
+ of bytes allocated and freed by this backend.
</para></entry>
</row>
@@ -5937,6 +5934,39 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
</para></entry>
</row>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>max_total_backend_memory_bytes</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Reports the user defined backend maximum allowed shared memory in bytes.
+ 0 if disabled or not set. See
+ <xref linkend="guc-max-total-backend-memory"/>.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>total_bkend_mem_bytes_available</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Tracks max_total_backend_memory (in bytes) available for allocation. At
+ database startup, total_bkend_mem_bytes_available is reduced by the
+ byte equivalent of shared_memory_size_mb. Each backend process is
+ intialized with a 1MB local allowance which also reduces
+ total_bkend_mem_bytes_available. A process's allocation requests reduce
+ it's local allowance. If a process's allocation request exceeds it's
+ remaining allowance, an attempt is made to refill the local allowance
+ from total_bkend_mem_bytes_available. If the refill request fails, then
+ the requesting process will fail with an out of memory error resulting
+ in the cancellation of that process's active query/transaction. The
+ default refill allocation quantity is 1MB. If a request is greater than
+ 1MB, an attempt will be made to allocate the full amount. If
+ max_total_backend_memory is disabled, this will be -1.
+ <xref linkend="guc-max-total-backend-memory"/>.
+ </para></entry>
+ </row>
+
<row>
<entry role="catalog_table_entry"><para role="column_definition">
<structfield>global_dsm_allocated_bytes</structfield> <type>bigint</type>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index cc8219c665..0832027727 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1368,6 +1368,8 @@ SELECT
S.datid AS datid,
current_setting('shared_memory_size', true) as shared_memory_size,
(current_setting('shared_memory_size_in_huge_pages', true))::integer as shared_memory_size_in_huge_pages,
+ pg_size_bytes(current_setting('max_total_backend_memory', true)) as max_total_backend_memory_bytes,
+ S.total_bkend_mem_bytes_available,
S.global_dsm_allocated_bytes,
sums.total_aset_allocated_bytes,
sums.total_dsm_allocated_bytes,
diff --git a/src/backend/port/sysv_shmem.c b/src/backend/port/sysv_shmem.c
index eaba244bc9..463bf2e90f 100644
--- a/src/backend/port/sysv_shmem.c
+++ b/src/backend/port/sysv_shmem.c
@@ -34,6 +34,7 @@
#include "storage/fd.h"
#include "storage/ipc.h"
#include "storage/pg_shmem.h"
+#include "utils/backend_status.h"
#include "utils/guc_hooks.h"
#include "utils/pidfile.h"
@@ -903,6 +904,14 @@ PGSharedMemoryReAttach(void)
dsm_set_control_handle(hdr->dsm_control);
UsedShmemSegAddr = hdr; /* probably redundant */
+
+ /*
+ * Init allocated bytes to avoid double counting parent allocation for
+ * fork/exec processes. Forked processes perform this action in
+ * InitPostmasterChild. For EXEC_BACKEND processes we have to wait for
+ * shared memory to be reattached.
+ */
+ pgstat_init_allocated_bytes();
}
/*
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 4c49393fc5..06a773c8bb 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -540,6 +540,7 @@ typedef struct
#endif
char my_exec_path[MAXPGPATH];
char pkglib_path[MAXPGPATH];
+ int max_total_bkend_mem;
} BackendParameters;
static void read_backend_variables(char *id, Port *port);
@@ -6122,6 +6123,8 @@ save_backend_variables(BackendParameters *param, Port *port,
strlcpy(param->pkglib_path, pkglib_path, MAXPGPATH);
+ param->max_total_bkend_mem = max_total_bkend_mem;
+
return true;
}
@@ -6352,6 +6355,8 @@ restore_backend_variables(BackendParameters *param, Port *port)
strlcpy(pkglib_path, param->pkglib_path, MAXPGPATH);
+ max_total_bkend_mem = param->max_total_bkend_mem;
+
/*
* We need to restore fd.c's counts of externally-opened FDs; to avoid
* confusion, be sure to do this after restoring max_safe_fds. (Note:
diff --git a/src/backend/storage/ipc/dsm_impl.c b/src/backend/storage/ipc/dsm_impl.c
index f43bad4439..41ffe48aa3 100644
--- a/src/backend/storage/ipc/dsm_impl.c
+++ b/src/backend/storage/ipc/dsm_impl.c
@@ -254,6 +254,16 @@ dsm_impl_posix(dsm_op op, dsm_handle handle, Size request_size,
return true;
}
+ /* Do not exceed maximum allowed memory allocation */
+ if (op == DSM_OP_CREATE && exceeds_max_total_bkend_mem(request_size))
+ {
+ ereport(elevel,
+ (errcode_for_dynamic_shared_memory(),
+ errmsg("out of memory for segment \"%s\" - exceeds max_total_backend_memory: %m",
+ name)));
+ return false;
+ }
+
/*
* Create new segment or open an existing one for attach.
*
@@ -523,6 +533,10 @@ dsm_impl_sysv(dsm_op op, dsm_handle handle, Size request_size,
int flags = IPCProtection;
size_t segsize;
+ /* Do not exceed maximum allowed memory allocation */
+ if (op == DSM_OP_CREATE && exceeds_max_total_bkend_mem(request_size))
+ return false;
+
/*
* Allocate the memory BEFORE acquiring the resource, so that we don't
* leak the resource if memory allocation fails.
@@ -717,6 +731,10 @@ dsm_impl_windows(dsm_op op, dsm_handle handle, Size request_size,
return true;
}
+ /* Do not exceed maximum allowed memory allocation */
+ if (op == DSM_OP_CREATE && exceeds_max_total_bkend_mem(request_size))
+ return false;
+
/* Create new segment or open an existing one for attach. */
if (op == DSM_OP_CREATE)
{
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index d798c05180..8493ca1dbf 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -51,6 +51,7 @@
#include "storage/procsignal.h"
#include "storage/spin.h"
#include "storage/standby.h"
+#include "utils/guc.h"
#include "utils/timeout.h"
#include "utils/timestamp.h"
@@ -182,6 +183,50 @@ InitProcGlobal(void)
pg_atomic_init_u32(&ProcGlobal->clogGroupFirst, INVALID_PGPROCNO);
pg_atomic_init_u64(&ProcGlobal->global_dsm_allocation, 0);
+ /* Setup backend memory limiting if configured */
+ if (max_total_bkend_mem > 0)
+ {
+ /*
+ * Convert max_total_bkend_mem to bytes, account for
+ * shared_memory_size, and initialize total_bkend_mem_bytes.
+ */
+ int result = 0;
+
+ /* Get integer value of shared_memory_size */
+ if (parse_int(GetConfigOption("shared_memory_size", true, false), &result, 0, NULL))
+ {
+ /*
+ * Error on startup if backend memory limit is less than shared
+ * memory size. Warn on startup if backend memory available is
+ * less than arbitrarily picked value of 100MB.
+ */
+
+ if (max_total_bkend_mem - result <= 0)
+ {
+ ereport(ERROR,
+ errmsg("configured max_total_backend_memory %dMB is <= shared_memory_size %dMB",
+ max_total_bkend_mem, result),
+ errhint("Disable or increase the configuration parameter \"max_total_backend_memory\"."));
+ }
+ else if (max_total_bkend_mem - result <= 100)
+ {
+ ereport(WARNING,
+ errmsg("max_total_backend_memory %dMB - shared_memory_size %dMB is <= 100MB",
+ max_total_bkend_mem, result),
+ errhint("Consider increasing the configuration parameter \"max_total_backend_memory\"."));
+ }
+
+ /*
+ * Account for shared memory size and initialize
+ * total_bkend_mem_bytes.
+ */
+ pg_atomic_init_u64(&ProcGlobal->total_bkend_mem_bytes,
+ (uint64) max_total_bkend_mem * 1024 * 1024 - (uint64) result * 1024 * 1024);
+ }
+ else
+ ereport(ERROR, errmsg("max_total_backend_memory initialization is unable to parse shared_memory_size"));
+ }
+
/*
* Create and initialize all the PGPROC structures we'll need. There are
* five separate consumers: (1) normal backends, (2) autovacuum workers
diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c
index 50b36ba5f7..cad50b2a06 100644
--- a/src/backend/utils/activity/backend_status.c
+++ b/src/backend/utils/activity/backend_status.c
@@ -45,6 +45,12 @@
bool pgstat_track_activities = false;
int pgstat_track_activity_query_size = 1024;
+/*
+ * Max backend memory allocation allowed (MB). 0 = disabled.
+ * Centralized bucket ProcGlobal->max_total_bkend_mem is initialized
+ * as a byte representation of this value in InitProcGlobal().
+ */
+int max_total_bkend_mem = 0;
/* exposed so that backend_progress.c can access it */
PgBackendStatus *MyBEEntry = NULL;
@@ -68,6 +74,31 @@ uint64 *my_generation_allocated_bytes = &local_my_generation_allocated_bytes;
uint64 local_my_slab_allocated_bytes = 0;
uint64 *my_slab_allocated_bytes = &local_my_slab_allocated_bytes;
+/*
+ * Define initial allocation allowance for a backend.
+ *
+ * NOTE: initial_allocation_allowance && allocation_allowance_refill_qty
+ * may be candidates for future GUC variables. Arbitrary 1MB selected initially.
+ */
+uint64 initial_allocation_allowance = 1024 * 1024;
+uint64 allocation_allowance_refill_qty = 1024 * 1024;
+
+/*
+ * Local counter to manage shared memory allocations. At backend startup, set to
+ * initial_allocation_allowance via pgstat_init_allocated_bytes(). Decrease as
+ * memory is malloc'd. When exhausted, atomically refill if available from
+ * ProcGlobal->max_total_bkend_mem via exceeds_max_total_bkend_mem().
+ */
+uint64 allocation_allowance = 0;
+
+/*
+ * Local counter of free'd shared memory. Return to global
+ * max_total_bkend_mem when return threshold is met. Arbitrary 1MB bytes
+ * selected initially.
+ */
+uint64 allocation_return = 0;
+uint64 allocation_return_threshold = 1024 * 1024;
+
static PgBackendStatus *BackendStatusArray = NULL;
static char *BackendAppnameBuffer = NULL;
static char *BackendClientHostnameBuffer = NULL;
@@ -1272,6 +1303,8 @@ pgstat_set_allocated_bytes_storage(uint64 *allocated_bytes,
my_slab_allocated_bytes = slab_allocated_bytes;
*slab_allocated_bytes = local_my_slab_allocated_bytes;
+
+ return;
}
/*
@@ -1295,6 +1328,23 @@ pgstat_reset_allocated_bytes_storage(void)
*my_dsm_allocated_bytes);
}
+ /*
+ * When limiting maximum backend memory, return this backend's memory
+ * allocations to global.
+ */
+ if (max_total_bkend_mem)
+ {
+ volatile PROC_HDR *procglobal = ProcGlobal;
+
+ pg_atomic_add_fetch_u64(&procglobal->total_bkend_mem_bytes,
+ *my_allocated_bytes + allocation_allowance +
+ allocation_return);
+
+ /* Reset memory allocation variables */
+ allocation_allowance = 0;
+ allocation_return = 0;
+ }
+
/* Reset memory allocation variables */
*my_allocated_bytes = local_my_allocated_bytes = 0;
*my_aset_allocated_bytes = local_my_aset_allocated_bytes = 0;
@@ -1308,4 +1358,101 @@ pgstat_reset_allocated_bytes_storage(void)
my_dsm_allocated_bytes = &local_my_dsm_allocated_bytes;
my_generation_allocated_bytes = &local_my_generation_allocated_bytes;
my_slab_allocated_bytes = &local_my_slab_allocated_bytes;
+
+ return;
+}
+
+/*
+ * Determine if allocation request will exceed max backend memory allowed.
+ * Do not apply to auxiliary processes.
+ * Refill allocation request bucket when needed/possible.
+ */
+bool
+exceeds_max_total_bkend_mem(uint64 allocation_request)
+{
+ bool result = false;
+
+ /*
+ * When limiting maximum backend memory, attempt to refill allocation
+ * request bucket if needed.
+ */
+ if (max_total_bkend_mem && allocation_request > allocation_allowance &&
+ ProcGlobal != NULL)
+ {
+ volatile PROC_HDR *procglobal = ProcGlobal;
+ uint64 available_max_total_bkend_mem = 0;
+ bool sts = false;
+
+ /*
+ * If allocation request is larger than memory refill quantity then
+ * attempt to increase allocation allowance with requested amount,
+ * otherwise fall through. If this refill fails we do not have enough
+ * memory to meet the request.
+ */
+ if (allocation_request >= allocation_allowance_refill_qty)
+ {
+ while ((available_max_total_bkend_mem = pg_atomic_read_u64(&procglobal->total_bkend_mem_bytes)) >= allocation_request)
+ {
+ if ((result = pg_atomic_compare_exchange_u64(&procglobal->total_bkend_mem_bytes,
+ &available_max_total_bkend_mem,
+ available_max_total_bkend_mem - allocation_request)))
+ {
+ allocation_allowance = allocation_allowance + allocation_request;
+ break;
+ }
+ }
+
+ /*
+ * Exclude auxiliary and Postmaster processes from the check.
+ * Return false. While we want to exclude them from the check, we
+ * do not want to exclude them from the above allocation handling.
+ */
+ if (MyAuxProcType != NotAnAuxProcess || MyProcPid == PostmasterPid)
+ return false;
+
+ /*
+ * If the atomic exchange fails (result == false), we do not have
+ * enough reserve memory to meet the request. Negate result to
+ * return the proper value.
+ */
+
+ return !result;
+ }
+
+ /*
+ * Attempt to increase allocation allowance by memory refill quantity.
+ * If available memory is/becomes less than memory refill quantity,
+ * fall through to attempt to allocate remaining available memory.
+ */
+ while ((available_max_total_bkend_mem = pg_atomic_read_u64(&procglobal->total_bkend_mem_bytes)) >= allocation_allowance_refill_qty)
+ {
+ if ((sts = pg_atomic_compare_exchange_u64(&procglobal->total_bkend_mem_bytes,
+ &available_max_total_bkend_mem,
+ available_max_total_bkend_mem - allocation_allowance_refill_qty)))
+ {
+ allocation_allowance = allocation_allowance + allocation_allowance_refill_qty;
+ break;
+ }
+ }
+
+ /* Do not attempt to increase allocation if available memory is below
+ * allocation_allowance_refill_qty .
+ */
+
+ /*
+ * If refill is not successful, we return true, memory limit exceeded
+ */
+ if (!sts)
+ result = true;
+ }
+
+ /*
+ * Exclude auxiliary and postmaster processes from the check. Return false.
+ * While we want to exclude them from the check, we do not want to exclude
+ * them from the above allocation handling.
+ */
+ if (MyAuxProcType != NotAnAuxProcess || MyProcPid == PostmasterPid)
+ result = false;
+
+ return result;
}
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 35f2d2bffe..4bcdfc91bf 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -2123,7 +2123,7 @@ pg_stat_get_memory_allocation(PG_FUNCTION_ARGS)
Datum
pg_stat_get_global_memory_allocation(PG_FUNCTION_ARGS)
{
-#define PG_STAT_GET_GLOBAL_MEMORY_ALLOCATION_COLS 2
+#define PG_STAT_GET_GLOBAL_MEMORY_ALLOCATION_COLS 3
TupleDesc tupdesc;
Datum values[PG_STAT_GET_GLOBAL_MEMORY_ALLOCATION_COLS] = {0};
bool nulls[PG_STAT_GET_GLOBAL_MEMORY_ALLOCATION_COLS] = {0};
@@ -2133,15 +2133,23 @@ pg_stat_get_global_memory_allocation(PG_FUNCTION_ARGS)
tupdesc = CreateTemplateTupleDesc(PG_STAT_GET_GLOBAL_MEMORY_ALLOCATION_COLS);
TupleDescInitEntry(tupdesc, (AttrNumber) 1, "datid",
OIDOID, -1, 0);
- TupleDescInitEntry(tupdesc, (AttrNumber) 2, "global_dsm_allocated_bytes",
+ TupleDescInitEntry(tupdesc, (AttrNumber) 2, "total_bkend_mem_bytes_available",
+ INT8OID, -1, 0);
+ TupleDescInitEntry(tupdesc, (AttrNumber) 3, "global_dsm_allocated_bytes",
INT8OID, -1, 0);
BlessTupleDesc(tupdesc);
/* datid */
values[0] = ObjectIdGetDatum(MyDatabaseId);
- /* get global_dsm_allocated_bytes */
- values[1] = Int64GetDatum(pg_atomic_read_u64(&procglobal->global_dsm_allocation));
+ /* Get total_bkend_mem_bytes - return -1 if disabled */
+ if (max_total_bkend_mem == 0)
+ values[1] = Int64GetDatum(-1);
+ else
+ values[1] = Int64GetDatum(pg_atomic_read_u64(&procglobal->total_bkend_mem_bytes));
+
+ /* Get global_dsm_allocated_bytes */
+ values[2] = Int64GetDatum(pg_atomic_read_u64(&procglobal->global_dsm_allocation));
/* Returns the record as Datum */
PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
diff --git a/src/backend/utils/hash/dynahash.c b/src/backend/utils/hash/dynahash.c
index 012d4a0b1f..cd68e5265a 100644
--- a/src/backend/utils/hash/dynahash.c
+++ b/src/backend/utils/hash/dynahash.c
@@ -104,7 +104,6 @@
#include "utils/dynahash.h"
#include "utils/memutils.h"
-
/*
* Constants
*
@@ -359,7 +358,6 @@ hash_create(const char *tabname, long nelem, const HASHCTL *info, int flags)
Assert(flags & HASH_ELEM);
Assert(info->keysize > 0);
Assert(info->entrysize >= info->keysize);
-
/*
* For shared hash tables, we have a local hash header (HTAB struct) that
* we allocate in TopMemoryContext; all else is in shared memory.
@@ -377,6 +375,7 @@ hash_create(const char *tabname, long nelem, const HASHCTL *info, int flags)
}
else
{
+ /* Set up to allocate the hash header */
/* Create the hash table's private memory context */
if (flags & HASH_CONTEXT)
CurrentDynaHashCxt = info->hcxt;
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
index 7b8eeb7dbb..a7df801f77 100644
--- a/src/backend/utils/init/miscinit.c
+++ b/src/backend/utils/init/miscinit.c
@@ -172,8 +172,16 @@ InitPostmasterChild(void)
errmsg_internal("could not set postmaster death monitoring pipe to FD_CLOEXEC mode: %m")));
#endif
+ /*
+ * Init pgstat allocated bytes counters here for forked backends.
+ * Fork/exec backends have not yet reattached to shared memory at this
+ * point. They will init pgstat allocated bytes counters in
+ * PGSharedMemoryReAttach.
+ */
+#ifndef EXEC_BACKEND
/* Init allocated bytes to avoid double counting parent allocation */
pgstat_init_allocated_bytes();
+#endif
}
/*
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 68aecad66f..eacd1a6043 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -3510,6 +3510,17 @@ struct config_int ConfigureNamesInt[] =
NULL, NULL, NULL
},
+ {
+ {"max_total_backend_memory", PGC_SU_BACKEND, RESOURCES_MEM,
+ gettext_noop("Restrict total backend memory allocations to this max."),
+ gettext_noop("0 turns this feature off."),
+ GUC_UNIT_MB
+ },
+ &max_total_bkend_mem,
+ 0, 0, INT_MAX,
+ NULL, NULL, NULL
+ },
+
/* End-of-list marker */
{
{NULL, 0, 0, NULL, NULL}, NULL, 0, 0, 0, NULL, NULL, NULL
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index b090ec5245..5466234d64 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -161,6 +161,9 @@
#vacuum_buffer_usage_limit = 256kB # size of vacuum and analyze buffer access strategy ring.
# 0 to disable vacuum buffer access strategy
# range 128kB to 16GB
+#max_total_backend_memory = 0MB # Restrict total backend memory allocations
+ # to this max (in MB). 0 turns this feature
+ # off.
# - Disk -
diff --git a/src/backend/utils/mmgr/aset.c b/src/backend/utils/mmgr/aset.c
index 4146831d75..a3891a607b 100644
--- a/src/backend/utils/mmgr/aset.c
+++ b/src/backend/utils/mmgr/aset.c
@@ -432,6 +432,18 @@ AllocSetContextCreateInternal(MemoryContext parent,
else
firstBlockSize = Max(firstBlockSize, initBlockSize);
+ /* Do not exceed maximum allowed memory allocation */
+ if (exceeds_max_total_bkend_mem(firstBlockSize))
+ {
+ if (TopMemoryContext)
+ MemoryContextStats(TopMemoryContext);
+ ereport(ERROR,
+ (errcode(ERRCODE_OUT_OF_MEMORY),
+ errmsg("out of memory - exceeds max_total_backend_memory"),
+ errdetail("Failed while creating memory context \"%s\".",
+ name)));
+ }
+
/*
* Allocate the initial block. Unlike other aset.c blocks, it starts with
* the context header and its block header follows that.
@@ -733,6 +745,11 @@ AllocSetAlloc(MemoryContext context, Size size)
#endif
blksize = chunk_size + ALLOC_BLOCKHDRSZ + ALLOC_CHUNKHDRSZ;
+
+ /* Do not exceed maximum allowed memory allocation */
+ if (exceeds_max_total_bkend_mem(blksize))
+ return NULL;
+
block = (AllocBlock) malloc(blksize);
if (block == NULL)
return NULL;
@@ -933,6 +950,10 @@ AllocSetAlloc(MemoryContext context, Size size)
while (blksize < required_size)
blksize <<= 1;
+ /* Do not exceed maximum allowed memory allocation */
+ if (exceeds_max_total_bkend_mem(blksize))
+ return NULL;
+
/* Try to allocate it */
block = (AllocBlock) malloc(blksize);
@@ -1171,6 +1192,18 @@ AllocSetRealloc(void *pointer, Size size)
blksize = chksize + ALLOC_BLOCKHDRSZ + ALLOC_CHUNKHDRSZ;
oldblksize = block->endptr - ((char *) block);
+ /*
+ * Do not exceed maximum allowed memory allocation. NOTE: checking for
+ * the full size here rather than just the amount of increased
+ * allocation to prevent a potential underflow of *my_allocation
+ * allowance in cases where blksize - oldblksize does not trigger a
+ * refill but blksize is greater than *my_allocation_allowance.
+ * Underflow would occur with the call below to
+ * pgstat_report_allocated_bytes_increase()
+ */
+ if (blksize > oldblksize && exceeds_max_total_bkend_mem(blksize))
+ return NULL;
+
block = (AllocBlock) realloc(block, blksize);
if (block == NULL)
{
diff --git a/src/backend/utils/mmgr/generation.c b/src/backend/utils/mmgr/generation.c
index 502f877855..b4c9b40766 100644
--- a/src/backend/utils/mmgr/generation.c
+++ b/src/backend/utils/mmgr/generation.c
@@ -193,6 +193,16 @@ GenerationContextCreate(MemoryContext parent,
else
allocSize = Max(allocSize, initBlockSize);
+ if (exceeds_max_total_bkend_mem(allocSize))
+ {
+ MemoryContextStats(TopMemoryContext);
+ ereport(ERROR,
+ (errcode(ERRCODE_OUT_OF_MEMORY),
+ errmsg("out of memory - exceeds max_total_backend_memory"),
+ errdetail("Failed while creating memory context \"%s\".",
+ name)));
+ }
+
/*
* Allocate the initial block. Unlike other generation.c blocks, it
* starts with the context header and its block header follows that.
@@ -372,6 +382,9 @@ GenerationAlloc(MemoryContext context, Size size)
{
Size blksize = required_size + Generation_BLOCKHDRSZ;
+ if (exceeds_max_total_bkend_mem(blksize))
+ return NULL;
+
block = (GenerationBlock *) malloc(blksize);
if (block == NULL)
return NULL;
@@ -475,6 +488,9 @@ GenerationAlloc(MemoryContext context, Size size)
if (blksize < required_size)
blksize = pg_nextpower2_size_t(required_size);
+ if (exceeds_max_total_bkend_mem(blksize))
+ return NULL;
+
block = (GenerationBlock *) malloc(blksize);
if (block == NULL)
diff --git a/src/backend/utils/mmgr/slab.c b/src/backend/utils/mmgr/slab.c
index 913787dba8..00a10f3c11 100644
--- a/src/backend/utils/mmgr/slab.c
+++ b/src/backend/utils/mmgr/slab.c
@@ -356,7 +356,16 @@ SlabContextCreate(MemoryContext parent,
elog(ERROR, "block size %zu for slab is too small for %zu-byte chunks",
blockSize, chunkSize);
-
+ /* Do not exceed maximum allowed memory allocation */
+ if (exceeds_max_total_bkend_mem(Slab_CONTEXT_HDRSZ(chunksPerBlock)))
+ {
+ MemoryContextStats(TopMemoryContext);
+ ereport(ERROR,
+ (errcode(ERRCODE_OUT_OF_MEMORY),
+ errmsg("out of memory - exceeds max_total_backend_memory"),
+ errdetail("Failed while creating memory context \"%s\".",
+ name)));
+ }
slab = (SlabContext *) malloc(Slab_CONTEXT_HDRSZ(chunksPerBlock));
if (slab == NULL)
@@ -559,6 +568,10 @@ SlabAlloc(MemoryContext context, Size size)
}
else
{
+ /* Do not exceed maximum allowed memory allocation */
+ if (exceeds_max_total_bkend_mem(slab->blockSize))
+ return NULL;
+
block = (SlabBlock *) malloc(slab->blockSize);
if (unlikely(block == NULL))
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index b095740f8a..4bef1ac428 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5421,9 +5421,9 @@
descr => 'statistics: global memory allocation information',
proname => 'pg_stat_get_global_memory_allocation', proisstrict => 'f',
provolatile => 's', proparallel => 'r', prorettype => 'record',
- proargtypes => '', proallargtypes => '{oid,int8}',
- proargmodes => '{o,o}',
- proargnames => '{datid,global_dsm_allocated_bytes}',
+ proargtypes => '', proallargtypes => '{oid,int8,int8}',
+ proargmodes => '{o,o,o}',
+ proargnames => '{datid,total_bkend_mem_bytes_available,global_dsm_allocated_bytes}',
prosrc =>'pg_stat_get_global_memory_allocation' },
{ oid => '2022',
descr => 'statistics: information about currently active backends',
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index e6be67de2a..f62ea132ff 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -406,6 +406,13 @@ typedef struct PROC_HDR
int startupBufferPinWaitBufId;
/* Global dsm allocations */
pg_atomic_uint64 global_dsm_allocation;
+
+ /*
+ * Max backend memory allocation tracker. Used/Initialized when
+ * max_total_bkend_mem > 0 as max_total_bkend_mem (MB) converted to bytes.
+ * Decreases/increases with free/malloc of backend memory.
+ */
+ pg_atomic_uint64 total_bkend_mem_bytes;
} PROC_HDR;
extern PGDLLIMPORT PROC_HDR *ProcGlobal;
diff --git a/src/include/utils/backend_status.h b/src/include/utils/backend_status.h
index 9b75fc5223..068b96dd09 100644
--- a/src/include/utils/backend_status.h
+++ b/src/include/utils/backend_status.h
@@ -15,6 +15,7 @@
#include "libpq/pqcomm.h"
#include "miscadmin.h" /* for BackendType */
#include "storage/backendid.h"
+#include "storage/proc.h"
#include "utils/backend_progress.h"
@@ -305,6 +306,7 @@ typedef struct LocalPgBackendStatus
*/
extern PGDLLIMPORT bool pgstat_track_activities;
extern PGDLLIMPORT int pgstat_track_activity_query_size;
+extern PGDLLIMPORT int max_total_bkend_mem;
/* ----------
@@ -317,6 +319,10 @@ extern PGDLLIMPORT uint64 *my_aset_allocated_bytes;
extern PGDLLIMPORT uint64 *my_dsm_allocated_bytes;
extern PGDLLIMPORT uint64 *my_generation_allocated_bytes;
extern PGDLLIMPORT uint64 *my_slab_allocated_bytes;
+extern PGDLLIMPORT uint64 allocation_allowance;
+extern PGDLLIMPORT uint64 initial_allocation_allowance;
+extern PGDLLIMPORT uint64 allocation_return;
+extern PGDLLIMPORT uint64 allocation_return_threshold;
/* ----------
@@ -364,6 +370,7 @@ extern int pgstat_fetch_stat_numbackends(void);
extern PgBackendStatus *pgstat_fetch_stat_beentry(BackendId beid);
extern LocalPgBackendStatus *pgstat_fetch_stat_local_beentry(int beid);
extern char *pgstat_clip_activity(const char *raw_activity);
+extern bool exceeds_max_total_bkend_mem(uint64 allocation_request);
/* ----------
* pgstat_report_allocated_bytes_decrease() -
@@ -379,7 +386,7 @@ pgstat_report_allocated_bytes_decrease(int64 proc_allocated_bytes,
{
uint64 temp;
- /* Avoid allocated_bytes unsigned integer overflow on decrease */
+ /* Sanity check: my allocated bytes should never drop below zero */
if (pg_sub_u64_overflow(*my_allocated_bytes, proc_allocated_bytes, &temp))
{
/* On overflow, set allocated bytes and allocator type bytes to zero */
@@ -388,13 +395,35 @@ pgstat_report_allocated_bytes_decrease(int64 proc_allocated_bytes,
*my_dsm_allocated_bytes = 0;
*my_generation_allocated_bytes = 0;
*my_slab_allocated_bytes = 0;
+
+ /* Add freed memory to allocation return counter. */
+ allocation_return += proc_allocated_bytes;
+
+ /*
+ * Return freed memory to the global counter if return threshold is
+ * met.
+ */
+ if (max_total_bkend_mem && allocation_return >= allocation_return_threshold)
+ {
+ if (ProcGlobal)
+ {
+ volatile PROC_HDR *procglobal = ProcGlobal;
+
+ /* Add to global tracker */
+ pg_atomic_add_fetch_u64(&procglobal->total_bkend_mem_bytes,
+ allocation_return);
+
+ /* Restart the count */
+ allocation_return = 0;
+ }
+ }
}
else
{
- /* decrease allocation */
- *my_allocated_bytes -= proc_allocated_bytes;
+ /* Add freed memory to allocation return counter */
+ allocation_return += proc_allocated_bytes;
- /* Decrease allocator type allocated bytes. */
+ /* Decrease allocator type allocated bytes */
switch (pg_allocator_type)
{
case PG_ALLOC_ASET:
@@ -416,6 +445,30 @@ pgstat_report_allocated_bytes_decrease(int64 proc_allocated_bytes,
*my_slab_allocated_bytes -= proc_allocated_bytes;
break;
}
+
+ /* decrease allocation */
+ *my_allocated_bytes = *my_aset_allocated_bytes +
+ *my_dsm_allocated_bytes + *my_generation_allocated_bytes +
+ *my_slab_allocated_bytes;
+
+ /*
+ * Return freed memory to the global counter if return threshold is
+ * met.
+ */
+ if (max_total_bkend_mem && allocation_return >= allocation_return_threshold)
+ {
+ if (ProcGlobal)
+ {
+ volatile PROC_HDR *procglobal = ProcGlobal;
+
+ /* Add to global tracker */
+ pg_atomic_add_fetch_u64(&procglobal->total_bkend_mem_bytes,
+ allocation_return);
+
+ /* Restart the count */
+ allocation_return = 0;
+ }
+ }
}
return;
@@ -433,7 +486,13 @@ static inline void
pgstat_report_allocated_bytes_increase(int64 proc_allocated_bytes,
int pg_allocator_type)
{
- *my_allocated_bytes += proc_allocated_bytes;
+ uint64 temp;
+
+ /* Sanity check: my allocated bytes should never drop below zero */
+ if (pg_sub_u64_overflow(allocation_allowance, proc_allocated_bytes, &temp))
+ allocation_allowance = 0;
+ else
+ allocation_allowance -= proc_allocated_bytes;
/* Increase allocator type allocated bytes */
switch (pg_allocator_type)
@@ -458,6 +517,9 @@ pgstat_report_allocated_bytes_increase(int64 proc_allocated_bytes,
break;
}
+ *my_allocated_bytes = *my_aset_allocated_bytes + *my_dsm_allocated_bytes +
+ *my_generation_allocated_bytes + *my_slab_allocated_bytes;
+
return;
}
@@ -477,6 +539,36 @@ pgstat_init_allocated_bytes(void)
*my_generation_allocated_bytes = 0;
*my_slab_allocated_bytes = 0;
+ /* If we're limiting backend memory */
+ if (max_total_bkend_mem)
+ {
+ volatile PROC_HDR *procglobal = ProcGlobal;
+ uint64 available_max_total_bkend_mem = 0;
+
+ allocation_return = 0;
+ allocation_allowance = 0;
+
+ /* Account for the initial allocation allowance */
+ while ((available_max_total_bkend_mem = pg_atomic_read_u64(&procglobal->total_bkend_mem_bytes)) >= initial_allocation_allowance)
+ {
+ /*
+ * On success populate allocation_allowance. Failure here will
+ * result in the backend's first invocation of
+ * exceeds_max_total_bkend_mem allocating requested, default, or
+ * available memory or result in an out of memory error.
+ */
+ if (pg_atomic_compare_exchange_u64(&procglobal->total_bkend_mem_bytes,
+ &available_max_total_bkend_mem,
+ available_max_total_bkend_mem -
+ initial_allocation_allowance))
+ {
+ allocation_allowance = initial_allocation_allowance;
+
+ break;
+ }
+ }
+ }
+
return;
}
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index ff76aa99a2..4f96cf4436 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1883,13 +1883,15 @@ pg_stat_global_memory_allocation| WITH sums AS (
SELECT s.datid,
current_setting('shared_memory_size'::text, true) AS shared_memory_size,
(current_setting('shared_memory_size_in_huge_pages'::text, true))::integer AS shared_memory_size_in_huge_pages,
+ pg_size_bytes(current_setting('max_total_backend_memory'::text, true)) AS max_total_backend_memory_bytes,
+ s.total_bkend_mem_bytes_available,
s.global_dsm_allocated_bytes,
sums.total_aset_allocated_bytes,
sums.total_dsm_allocated_bytes,
sums.total_generation_allocated_bytes,
sums.total_slab_allocated_bytes
FROM sums,
- (pg_stat_get_global_memory_allocation() s(datid, global_dsm_allocated_bytes)
+ (pg_stat_get_global_memory_allocation() s(datid, total_bkend_mem_bytes_available, global_dsm_allocated_bytes)
LEFT JOIN pg_database d ON ((s.datid = d.oid)));
pg_stat_gssapi| SELECT pid,
gss_auth AS gss_authenticated,
--
2.25.1
On Mon, 2023-05-22 at 08:42 -0400, reid.thompson@crunchydata.com wrote:
More followup to the above.
I experimented on my system regarding
"The simple query select * from generate_series(0, 10000000) shows roughly 18.9 % degradation on my test server."My laptop:
32GB ram
11th Gen Intel(R) Core(TM) i7-11850H 8 cores/16 threads @ 2.50GHz (Max Turbo Frequency. 4.80 GHz ; Cache. 24 MB)
SSD -> Model: KXG60ZNV1T02 NVMe KIOXIA 1024GB (nvme)
Hi
Ran through a few more tests on my system varying the
initial_allocation_allowance and allocation_allowance_refill_qty from the
current 1MB to 2, 4, 6, 8, 10 mb. Also realized that in my last tests/email I
had posted percent difference rather than percent change. Turns out for the
numbers that were being compared they're essentially the same, but I'm
providing both for this set of tests. Ten runs for each comparison. Compared
dev-max-memory set, dev-max-memory unset, master, and pg-stat-activity-backend-memory-allocated
against master at each allocation value;
Again, the test invokes
psql -At -d postgres $connstr -P pager=off -c 'select * from generate_series(0, 10000000)'
100 times on each of the 2 instances and calculates the AVG time and SD
for the 100 runs. It then uses the AVG from each instance to calculate
the percentage difference/change.
These tests contain one code change not yet pushed to pgsql-hackers. In
AllocSetReset() do not enter pgstat_report_allocated_bytes_decrease if no
memory has been freed.
Will format and post some pgbench test result in a separate email.
Percent difference:
───────┬───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
│ Results: difference-dev-max-memory-set VS master
───────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
1 │ 1MB allocation 2MB allocation 4MB allocation 6MB allocation 8MB allocation 10MB allocation
2 │ 4.2263% difference 3.03961% difference 0.0585808% difference 2.92451% difference 3.34694% difference 2.67771% difference
3 │ 3.55709% difference 3.92339% difference 2.29144% difference 3.2156% difference 2.06153% difference 2.86217% difference
4 │ 2.04389% difference 2.91866% difference 3.73463% difference 2.86161% difference 3.60992% difference 3.07293% difference
5 │ 3.1306% difference 3.64773% difference 2.38063% difference 1.84845% difference 4.87375% difference 4.16953% difference
6 │ 3.12556% difference 3.34537% difference 2.99052% difference 2.60538% difference 2.14825% difference 1.95454% difference
7 │ 2.20615% difference 2.12861% difference 2.85282% difference 2.43336% difference 2.31389% difference 3.21563% difference
8 │ 1.9954% difference 3.61371% difference 3.35543% difference 3.49821% difference 3.41526% difference 8.25753% difference
9 │ 2.46845% difference 2.57784% difference 3.13067% difference 3.67681% difference 2.89139% difference 3.6067% difference
10 │ 3.60092% difference 2.16164% difference 3.9976% difference 2.6144% difference 4.27892% difference 2.68998% difference
11 │ 2.55454% difference 2.39073% difference 3.09631% difference 3.24292% difference 1.9107% difference 1.76182% difference
12 │
13 │ 28.9089/10 29.74729/10 27.888631/10 28.92125/10 30.85055/10 34.26854/10
14 │ 2.89089 2.974729 2.7888631 2.892125 3.085055 3.426854
───────┴───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
───────┬───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
│ Results: difference-dev-max-memory-unset VS master
───────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
1 │ 1MB allocation 2MB allocation 4MB allocation 6MB allocation 8MB allocation 10MB allocation
2 │ 3.96616% difference 3.05528% difference 0.563267% difference 1.12075% difference 3.52398% difference 3.25641% difference
3 │ 3.11387% difference 3.12499% difference 1.1133% difference 4.86997% difference 2.11481% difference 1.11668% difference
4 │ 3.14506% difference 2.06193% difference 3.36034% difference 2.80644% difference 2.37822% difference 3.07669% difference
5 │ 2.81052% difference 3.18499% difference 2.70705% difference 2.27847% difference 2.78506% difference 3.02919% difference
6 │ 2.9765% difference 3.44165% difference 2.62039% difference 4.61596% difference 2.27937% difference 3.89676% difference
7 │ 3.201% difference 1.35838% difference 2.40578% difference 3.95695% difference 2.25983% difference 4.17585% difference
8 │ 5.35191% difference 3.96434% difference 4.32891% difference 3.62715% difference 2.17503% difference 0.620856% difference
9 │ 3.44241% difference 2.9754% difference 3.03765% difference 1.48104% difference 1.53958% difference 3.14598% difference
10 │ 10.1155% difference 4.21062% difference 1.64416% difference 1.51458% difference 2.92131% difference 2.95603% difference
11 │ 3.11011% difference 4.31318% difference 2.01991% difference 4.71192% difference 2.37039% difference 4.25241% difference
12 │
13 │ 41.23304/10 31.69076/10 23.800757/10 30.98323/10 24.34758/10 29.526856/10
14 │ 4.123304 3.169076 2.3800757 3.098323 2.434758 2.9526856
───────┴───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
───────┬───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
│ Results: difference-master VS master
───────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
1 │ 1MB allocation 2MB allocation 4MB allocation 6MB allocation 8MB allocation 10MB allocation
2 │ 0.0734782% difference 0.0955457% difference 0.0521627% difference 2.32643% difference 0.286493% difference 1.26977% difference
3 │ 0.547862% difference 1.19087% difference 0.276915% difference 0.334332% difference 0.260545% difference 0.108956% difference
4 │ 0.0714666% difference 0.931605% difference 0.753996% difference 0.457174% difference 0.215904% difference 1.43979% difference
5 │ 0.269737% difference 0.848613% difference 0.222909% difference 0.315927% difference 0.290408% difference 0.248591% difference
6 │ 1.04231% difference 0.367444% difference 0.699571% difference 0.29266% difference 0.844548% difference 0.273776% difference
7 │ 0.0584984% difference 0.15094% difference 0.0721539% difference 0.594991% difference 1.80223% difference 0.500557% difference
8 │ 0.355129% difference 1.19517% difference 0.201835% difference 1.2351% difference 0.266004% difference 0.80893% difference
9 │ 0.0811794% difference 1.16184% difference 1.01913% difference 0.149087% difference 0.402931% difference 0.125788% difference
10 │ 0.950973% difference 0.154471% difference 0.42623% difference 0.874816% difference 0.157934% difference 0.225433% difference
11 │ 0.501783% difference 0.308357% difference 0.279147% difference 0.122458% difference 0.538141% difference 0.865846% difference
12 │
13 │ 3.952417/10 6.404856/10 4.00405/10 6.702975/10 5.065138/10 5.867437/10
14 │ 0.3952417 0.6404856 0.400405 0.6702975 0.5065138 0.5867437
───────┴───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
───────┬───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
│ Results: difference-pg-stat-activity-backend-memory-allocated VS master
───────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
1 │ 1MB allocation 2MB allocation 4MB allocation 6MB allocation 8MB allocation 10MB allocation
2 │ 2.04788% difference 0.50705% difference 0.504772% difference 0.136316% difference 0.590087% difference 1.33931% difference
3 │ 1.21173% difference 0.3309% difference 0.482685% difference 1.67956% difference 0.175478% difference 0.969286% difference
4 │ 0.0680972% difference 0.295211% difference 0.867547% difference 1.12959% difference 0.193756% difference 0.714178% difference
5 │ 0.91525% difference 1.42408% difference 1.49059% difference 0.641652% difference 1.34265% difference 0.378394% difference
6 │ 2.46448% difference 2.67081% difference 0.63824% difference 0.650301% difference 0.481858% difference 1.65711% difference
7 │ 1.31021% difference 0.0548831% difference 1.23217% difference 2.11691% difference 0.31629% difference 3.85858% difference
8 │ 1.61458% difference 0.46042% difference 0.724742% difference 0.172952% difference 1.33157% difference 0.556898% difference
9 │ 1.65063% difference 0.59815% difference 1.42473% difference 0.725576% difference 0.229639% difference 0.875489% difference
10 │ 1.78567% difference 1.45652% difference 0.6317% difference 1.99146% difference 0.999521% difference 1.85291% difference
11 │ 0.391318% difference 1.13216% difference 0.138291% difference 0.531084% difference 0.680197% difference 1.63162% difference
12 │
13 │ 13.459845/10 8.930184/10 8.135467/10 9.775401/10 6.341046/10 13.83377/10
14 │ 1.3459845 0.8930184 0.8135467 0.9775401 0.6341046 1.3833775
───────┴───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Percent change:
───────┬───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
│ Results: change-dev-max-memory-set VS master
───────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
1 │ 1MB allocation 2MB allocation 4MB allocation 6MB allocation 8MB allocation 10MB allocation
2 │ 4.13884% change 2.99411% change 0.0585636% change 2.88237% change 3.29185% change 2.64233% change
3 │ 3.49493% change 3.84791% change 2.26549% change 3.16472% change 2.0405% change 2.82179% change
4 │ 2.02322% change 2.87668% change 3.66617% change 2.82124% change 3.54592% change 3.02643% change
5 │ 3.08235% change 3.5824% change 2.35263% change 1.83153% change 4.75781% change 4.08438% change
6 │ 3.07746% change 3.29033% change 2.94646% change 2.57188% change 2.12542% change 1.93562% change
7 │ 2.18208% change 2.10619% change 2.8127% change 2.40411% change 2.28743% change 3.16474% change
8 │ 1.97569% change 3.54957% change 3.30007% change 3.43808% change 3.35792% change 7.93011% change
9 │ 2.43836% change 2.54504% change 3.08242% change 3.61044% change 2.85019% change 3.54281% change
10 │ 3.53724% change 2.13852% change 3.91926% change 2.58067% change 4.18929% change 2.65428% change
11 │ 2.52233% change 2.36249% change 3.0491% change 3.19118% change 1.89262% change 1.74644% change
12 │
13 │ 28.4725/10 29.29324/10 27.452864/10 28.49622/10 30.33895/10 33.54893/10
14 │ 2.84725 2.929324 2.7452864 2.849622 3.033895 3.354893
───────┴───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
───────┬───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
│ Results: change-dev-max-memory-unset VS master
───────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
1 │ 1MB allocation 2MB allocation 4MB allocation 6MB allocation 8MB allocation 10MB allocation
2 │ 3.88903% change 3.00931% change 0.564858% change 1.11451% change 3.46296% change 3.20424% change
3 │ 3.06613% change 3.07691% change 1.10714% change 4.75421% change 2.09268% change 1.11048% change
4 │ 3.09637% change 2.04089% change 3.30482% change 2.7676% change 2.35028% change 3.03008% change
5 │ 2.77157% change 3.13506% change 2.6709% change 2.2528% change 2.74681% change 2.984% change
6 │ 2.93285% change 3.38343% change 2.5865% change 4.51183% change 2.25368% change 3.82229% change
7 │ 3.15057% change 1.34921% change 2.37719% change 3.88018% change 2.23458% change 4.09044% change
8 │ 5.21243% change 3.88728% change 4.23719% change 3.56254% change 2.15163% change 0.62279% change
9 │ 3.38416% change 2.93178% change 2.99221% change 1.47015% change 1.52782% change 3.09726% change
10 │ 10.6543% change 4.1238% change 1.63075% change 1.5032% change 2.87926% change 2.91298% change
11 │ 3.06248% change 4.22213% change 1.99972% change 4.60347% change 2.34263% change 4.16388% change
12 │
13 │ 41.21989/10 31.1598/10 23.471278/10 30.42049/10 24.04233/10 29.03844/10
14 │ 4.121989 3.11598 2.3471278 3.042049 2.404233 2.903844
───────┴───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
───────┬───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
│ Results: change-master VS master
───────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
1 │ 1MB allocation 2MB allocation 4MB allocation 6MB allocation 8MB allocation 10MB allocation
2 │ 0.0734512% change 0.0955% change 0.0521763% change 2.35381% change 0.286904% change 1.27789% change
3 │ 0.549367% change 1.18382% change 0.276532% change 0.333774% change 0.260206% change 0.108897% change
4 │ 0.0714411% change 0.927286% change 0.751164% change 0.456132% change 0.216137% change 1.4295% change
5 │ 0.269374% change 0.845028% change 0.222661% change 0.315429% change 0.29083% change 0.2489% change
6 │ 1.0369% change 0.368121% change 0.702026% change 0.292232% change 0.840997% change 0.273402% change
7 │ 0.0584813% change 0.151054% change 0.07218% change 0.596766% change 1.78613% change 0.499307% change
8 │ 0.355761% change 1.18807% change 0.201631% change 1.22752% change 0.265651% change 0.805671% change
9 │ 0.0812124% change 1.16863% change 1.02435% change 0.149198% change 0.402121% change 0.125709% change
10 │ 0.955516% change 0.154351% change 0.425324% change 0.871006% change 0.158059% change 0.225179% change
11 │ 0.500527% change 0.307882% change 0.278758% change 0.122533% change 0.539593% change 0.862113% change
12 │
13 │ 3.952031/10 6.389742/10 4.006802/10 6.7184/10 5.046628/10 5.856568/10
14 │ 0.3952031 0.6389742 0.4006802 0.67184 0.5046628 0.5856568
───────┴───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
───────┬───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
│ Results: change-pg-stat-activity-backend-memory-allocated VS master
───────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
1 │ 1MB allocation 2MB allocation 4MB allocation 6MB allocation 8MB allocation 10MB allocation
2 │ 2.02713% change 0.505768% change 0.506049% change 0.136223% change 0.591833% change 1.3304% change
3 │ 1.20444% change 0.331448% change 0.481523% change 1.66557% change 0.175325% change 0.974006% change
4 │ 0.068074% change 0.294776% change 0.8638% change 1.12325% change 0.193568% change 0.711637% change
5 │ 0.91108% change 1.41401% change 1.47956% change 0.6396% change 1.33369% change 0.377679% change
6 │ 2.43448% change 2.63562% change 0.636209% change 0.648194% change 0.4807% change 1.64349% change
7 │ 1.30168% change 0.054868% change 1.22463% change 2.09474% change 0.316791% change 3.93449% change
8 │ 1.60165% change 0.461483% change 0.722126% change 0.173102% change 1.32277% change 0.555352% change
9 │ 1.63712% change 0.599944% change 1.41466% change 0.722953% change 0.229375% change 0.871673% change
10 │ 1.76986% change 1.44599% change 0.629711% change 1.97183% change 0.99455% change 1.8359% change
11 │ 0.392085% change 1.12579% change 0.138195% change 0.532498% change 0.677892% change 1.61841% change
12 │
13 │ 13.347599/10 8.869697/10 8.096463/10 9.70796/10 6.316494/10 13.853037/10
14 │ 1.3347599 0.8869697 0.8096463 0.970796 0.6316494 1.385303
───────┴───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
On 22/5/2023 22:59, reid.thompson@crunchydata.com wrote:
Attach patches updated to master.
Pulled from patch 2 back to patch 1 a change that was also pertinent to patch 1.
+1 to the idea, have doubts on the implementation.
I have a question. I see the feature triggers ERROR on the exceeding of
the memory limit. The superior PG_CATCH() section will handle the error.
As I see, many such sections use memory allocations. What if some
routine, like the CopyErrorData(), exceeds the limit, too? In this case,
we could repeat the error until the top PG_CATCH(). Is this correct
behaviour? Maybe to check in the exceeds_max_total_bkend_mem() for
recursion and allow error handlers to slightly exceed this hard limit?
Also, the patch needs to be rebased.
--
regards,
Andrey Lepikhov
Postgres Professional
On 29/9/2023 09:52, Andrei Lepikhov wrote:
On 22/5/2023 22:59, reid.thompson@crunchydata.com wrote:
Attach patches updated to master.
Pulled from patch 2 back to patch 1 a change that was also pertinent
to patch 1.+1 to the idea, have doubts on the implementation.
I have a question. I see the feature triggers ERROR on the exceeding of
the memory limit. The superior PG_CATCH() section will handle the error.
As I see, many such sections use memory allocations. What if some
routine, like the CopyErrorData(), exceeds the limit, too? In this case,
we could repeat the error until the top PG_CATCH(). Is this correct
behaviour? Maybe to check in the exceeds_max_total_bkend_mem() for
recursion and allow error handlers to slightly exceed this hard limit?
By the patch in attachment I try to show which sort of problems I'm
worrying about. In some PП_CATCH() sections we do CopyErrorData
(allocate some memory) before aborting the transaction. So, the
allocation error can move us out of this section before aborting. We
await for soft ERROR message but will face more hard consequences.
--
regards,
Andrey Lepikhov
Postgres Professional
Attachments:
reorder_operators.difftext/plain; charset=UTF-8; name=reorder_operators.diffDownload
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 33975687b3..3f992b8d92 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -291,10 +291,7 @@ _SPI_commit(bool chain)
{
ErrorData *edata;
- /* Save error info in caller's context */
MemoryContextSwitchTo(oldcontext);
- edata = CopyErrorData();
- FlushErrorState();
/*
* Abort the failed transaction. If this fails too, we'll just
@@ -302,6 +299,10 @@ _SPI_commit(bool chain)
*/
AbortCurrentTransaction();
+ /* Save error info in caller's context */
+ edata = CopyErrorData();
+ FlushErrorState();
+
/* ... and start a new one */
StartTransactionCommand();
if (chain)
@@ -383,10 +384,7 @@ _SPI_rollback(bool chain)
{
ErrorData *edata;
- /* Save error info in caller's context */
MemoryContextSwitchTo(oldcontext);
- edata = CopyErrorData();
- FlushErrorState();
/*
* Try again to abort the failed transaction. If this fails too,
@@ -395,6 +393,10 @@ _SPI_rollback(bool chain)
*/
AbortCurrentTransaction();
+ /* Save error info in caller's context */
+ edata = CopyErrorData();
+ FlushErrorState();
+
/* ... and start a new one */
StartTransactionCommand();
if (chain)
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 12edc5772a..f9cf599026 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -2565,7 +2565,7 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
PG_CATCH();
{
MemoryContext ecxt = MemoryContextSwitchTo(ccxt);
- ErrorData *errdata = CopyErrorData();
+ ErrorData *errdata;
/* TODO: Encapsulate cleanup from the PG_TRY and PG_CATCH blocks */
if (iterstate)
@@ -2579,6 +2579,8 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
*/
AbortCurrentTransaction();
+ errdata = CopyErrorData();
+
/* make sure there's no cache pollution */
ReorderBufferExecuteInvalidations(txn->ninvalidations,
txn->invalidations);
Greetings,
* Andrei Lepikhov (a.lepikhov@postgrespro.ru) wrote:
On 29/9/2023 09:52, Andrei Lepikhov wrote:
On 22/5/2023 22:59, reid.thompson@crunchydata.com wrote:
Attach patches updated to master.
Pulled from patch 2 back to patch 1 a change that was also pertinent
to patch 1.+1 to the idea, have doubts on the implementation.
I have a question. I see the feature triggers ERROR on the exceeding of
the memory limit. The superior PG_CATCH() section will handle the error.
As I see, many such sections use memory allocations. What if some
routine, like the CopyErrorData(), exceeds the limit, too? In this case,
we could repeat the error until the top PG_CATCH(). Is this correct
behaviour? Maybe to check in the exceeds_max_total_bkend_mem() for
recursion and allow error handlers to slightly exceed this hard limit?
By the patch in attachment I try to show which sort of problems I'm worrying
about. In some PП_CATCH() sections we do CopyErrorData (allocate some
memory) before aborting the transaction. So, the allocation error can move
us out of this section before aborting. We await for soft ERROR message but
will face more hard consequences.
While it's an interesting idea to consider making exceptions to the
limit, and perhaps we'll do that (or have some kind of 'reserve' for
such cases), this isn't really any different than today, is it? We
might have a malloc() failure in the main path, end up in PG_CATCH() and
then try to do a CopyErrorData() and have another malloc() failure.
If we can rearrange the code to make this less likely to happen, by
doing a bit more work to free() resources used in the main path before
trying to do new allocations, then, sure, let's go ahead and do that,
but that's independent from this effort.
Thanks!
Stephen
On 19/10/2023 02:00, Stephen Frost wrote:
Greetings,
* Andrei Lepikhov (a.lepikhov@postgrespro.ru) wrote:
On 29/9/2023 09:52, Andrei Lepikhov wrote:
On 22/5/2023 22:59, reid.thompson@crunchydata.com wrote:
Attach patches updated to master.
Pulled from patch 2 back to patch 1 a change that was also pertinent
to patch 1.+1 to the idea, have doubts on the implementation.
I have a question. I see the feature triggers ERROR on the exceeding of
the memory limit. The superior PG_CATCH() section will handle the error.
As I see, many such sections use memory allocations. What if some
routine, like the CopyErrorData(), exceeds the limit, too? In this case,
we could repeat the error until the top PG_CATCH(). Is this correct
behaviour? Maybe to check in the exceeds_max_total_bkend_mem() for
recursion and allow error handlers to slightly exceed this hard limit?By the patch in attachment I try to show which sort of problems I'm worrying
about. In some PП_CATCH() sections we do CopyErrorData (allocate some
memory) before aborting the transaction. So, the allocation error can move
us out of this section before aborting. We await for soft ERROR message but
will face more hard consequences.While it's an interesting idea to consider making exceptions to the
limit, and perhaps we'll do that (or have some kind of 'reserve' for
such cases), this isn't really any different than today, is it? We
might have a malloc() failure in the main path, end up in PG_CATCH() and
then try to do a CopyErrorData() and have another malloc() failure.If we can rearrange the code to make this less likely to happen, by
doing a bit more work to free() resources used in the main path before
trying to do new allocations, then, sure, let's go ahead and do that,
but that's independent from this effort.
I agree that rearranging efforts can be made independently. The code in
the letter above was shown just as a demo of the case I'm worried about.
IMO, the thing that should be implemented here is a recursion level for
the memory limit. If processing the error, we fall into recursion with
this limit - we should ignore it.
I imagine custom extensions that use PG_CATCH() and allocate some data
there. At least we can raise the level of error to FATAL.
--
regards,
Andrey Lepikhov
Postgres Professional
Greetings,
* Andrei Lepikhov (a.lepikhov@postgrespro.ru) wrote:
On 19/10/2023 02:00, Stephen Frost wrote:
* Andrei Lepikhov (a.lepikhov@postgrespro.ru) wrote:
On 29/9/2023 09:52, Andrei Lepikhov wrote:
On 22/5/2023 22:59, reid.thompson@crunchydata.com wrote:
Attach patches updated to master.
Pulled from patch 2 back to patch 1 a change that was also pertinent
to patch 1.+1 to the idea, have doubts on the implementation.
I have a question. I see the feature triggers ERROR on the exceeding of
the memory limit. The superior PG_CATCH() section will handle the error.
As I see, many such sections use memory allocations. What if some
routine, like the CopyErrorData(), exceeds the limit, too? In this case,
we could repeat the error until the top PG_CATCH(). Is this correct
behaviour? Maybe to check in the exceeds_max_total_bkend_mem() for
recursion and allow error handlers to slightly exceed this hard limit?By the patch in attachment I try to show which sort of problems I'm worrying
about. In some PП_CATCH() sections we do CopyErrorData (allocate some
memory) before aborting the transaction. So, the allocation error can move
us out of this section before aborting. We await for soft ERROR message but
will face more hard consequences.While it's an interesting idea to consider making exceptions to the
limit, and perhaps we'll do that (or have some kind of 'reserve' for
such cases), this isn't really any different than today, is it? We
might have a malloc() failure in the main path, end up in PG_CATCH() and
then try to do a CopyErrorData() and have another malloc() failure.If we can rearrange the code to make this less likely to happen, by
doing a bit more work to free() resources used in the main path before
trying to do new allocations, then, sure, let's go ahead and do that,
but that's independent from this effort.I agree that rearranging efforts can be made independently. The code in the
letter above was shown just as a demo of the case I'm worried about.
IMO, the thing that should be implemented here is a recursion level for the
memory limit. If processing the error, we fall into recursion with this
limit - we should ignore it.
I imagine custom extensions that use PG_CATCH() and allocate some data
there. At least we can raise the level of error to FATAL.
Ignoring such would defeat much of the point of this effort- which is to
get to a position where we can say with some confidence that we're not
going to go over some limit that the user has set and therefore not
allow ourselves to end up getting OOM killed. These are all the same
issues that already exist today on systems which don't allow overcommit
too, there isn't anything new here in regards to these risks, so I'm not
really keen to complicate this to deal with issues that are already
there.
Perhaps once we've got the basics in place then we could consider
reserving some space for handling such cases.. but I don't think it'll
actually be very clean and what if we have an allocation that goes
beyond what that reserved space is anyway? Then we're in the same spot
again where we have the choice of either failing the allocation in a
less elegant way than we might like to handle that error, or risk
getting outright kill'd by the kernel. Of those choices, sure seems
like failing the allocation is the better way to go.
Thanks,
Stephen
Hi,
On 2023-10-19 18:06:10 -0400, Stephen Frost wrote:
Ignoring such would defeat much of the point of this effort- which is to
get to a position where we can say with some confidence that we're not
going to go over some limit that the user has set and therefore not
allow ourselves to end up getting OOM killed.
I think that is a good medium to long term goal. I do however think that we'd
be better off merging the visibility of memory allocations soon-ish and
implement the limiting later. There's a lot of hairy details to get right for
the latter, and even just having visibility will be a huge improvement.
I think even patch 1 is doing too much at once. I doubt the DSM stuff is
quite right.
I'm unconvinced it's a good idea to split the different types of memory
contexts out. That just exposes too much implementation detail stuff without a
good reason.
I think the overhead even just the tracking implies right now is likely too
high and needs to be optimized. It should be a single math operation, not
tracking things in multiple fields. I don't think pg_sub_u64_overflow() should
be in the path either, that suddenly adds conditional branches. You really
ought to look at the difference in assembly for the hot functions.
Greetings,
Andres Freund
Greetings,
* Andres Freund (andres@anarazel.de) wrote:
On 2023-10-19 18:06:10 -0400, Stephen Frost wrote:
Ignoring such would defeat much of the point of this effort- which is to
get to a position where we can say with some confidence that we're not
going to go over some limit that the user has set and therefore not
allow ourselves to end up getting OOM killed.I think that is a good medium to long term goal. I do however think that we'd
be better off merging the visibility of memory allocations soon-ish and
implement the limiting later. There's a lot of hairy details to get right for
the latter, and even just having visibility will be a huge improvement.
I agree that having the visibility will be a great improvement and
perhaps could go in separately, but I don't know that I agree that the
limits are going to be that much of an issue. In any case, there's been
work ongoing on this and that'll be posted soon. I was just trying to
address the general comment raised in this sub-thread here.
I think even patch 1 is doing too much at once. I doubt the DSM stuff is
quite right.
Getting DSM right has certainly been tricky, along with other things,
but we've been working towards, and continue to work towards, getting
everything to line up nicely between memory context allocations of
various types and the amounts which are being seen as malloc'd/free'd.
There's been parts of this also reworked to allow us to see per-backend
reservations as well as total reserved and to get those numbers able to
be matched up inside of a given transaction using the statistics system.
I'm unconvinced it's a good idea to split the different types of memory
contexts out. That just exposes too much implementation detail stuff without a
good reason.
DSM needs to be independent anyway ... as for the others, perhaps we
could combine them, though that's pretty easily done later and for now
it's been useful to see them split out as we've been working on the
patch.
I think the overhead even just the tracking implies right now is likely too
high and needs to be optimized. It should be a single math operation, not
tracking things in multiple fields. I don't think pg_sub_u64_overflow() should
be in the path either, that suddenly adds conditional branches. You really
ought to look at the difference in assembly for the hot functions.
This has been improved in the most recent work and we'll have that
posted soon, probably best to hold off from larger review of this right
now- as mentioned, I was just trying to address the specific question in
this sub-thread since a new patch is coming soon.
Thanks,
Stephen
On 20/10/2023 05:06, Stephen Frost wrote:
Greetings,
* Andrei Lepikhov (a.lepikhov@postgrespro.ru) wrote:
On 19/10/2023 02:00, Stephen Frost wrote:
* Andrei Lepikhov (a.lepikhov@postgrespro.ru) wrote:
On 29/9/2023 09:52, Andrei Lepikhov wrote:
On 22/5/2023 22:59, reid.thompson@crunchydata.com wrote:
Attach patches updated to master.
Pulled from patch 2 back to patch 1 a change that was also pertinent
to patch 1.+1 to the idea, have doubts on the implementation.
I have a question. I see the feature triggers ERROR on the exceeding of
the memory limit. The superior PG_CATCH() section will handle the error.
As I see, many such sections use memory allocations. What if some
routine, like the CopyErrorData(), exceeds the limit, too? In this case,
we could repeat the error until the top PG_CATCH(). Is this correct
behaviour? Maybe to check in the exceeds_max_total_bkend_mem() for
recursion and allow error handlers to slightly exceed this hard limit?By the patch in attachment I try to show which sort of problems I'm worrying
about. In some PП_CATCH() sections we do CopyErrorData (allocate some
memory) before aborting the transaction. So, the allocation error can move
us out of this section before aborting. We await for soft ERROR message but
will face more hard consequences.While it's an interesting idea to consider making exceptions to the
limit, and perhaps we'll do that (or have some kind of 'reserve' for
such cases), this isn't really any different than today, is it? We
might have a malloc() failure in the main path, end up in PG_CATCH() and
then try to do a CopyErrorData() and have another malloc() failure.If we can rearrange the code to make this less likely to happen, by
doing a bit more work to free() resources used in the main path before
trying to do new allocations, then, sure, let's go ahead and do that,
but that's independent from this effort.I agree that rearranging efforts can be made independently. The code in the
letter above was shown just as a demo of the case I'm worried about.
IMO, the thing that should be implemented here is a recursion level for the
memory limit. If processing the error, we fall into recursion with this
limit - we should ignore it.
I imagine custom extensions that use PG_CATCH() and allocate some data
there. At least we can raise the level of error to FATAL.Ignoring such would defeat much of the point of this effort- which is to
get to a position where we can say with some confidence that we're not
going to go over some limit that the user has set and therefore not
allow ourselves to end up getting OOM killed. These are all the same
issues that already exist today on systems which don't allow overcommit
too, there isn't anything new here in regards to these risks, so I'm not
really keen to complicate this to deal with issues that are already
there.Perhaps once we've got the basics in place then we could consider
reserving some space for handling such cases.. but I don't think it'll
actually be very clean and what if we have an allocation that goes
beyond what that reserved space is anyway? Then we're in the same spot
again where we have the choice of either failing the allocation in a
less elegant way than we might like to handle that error, or risk
getting outright kill'd by the kernel. Of those choices, sure seems
like failing the allocation is the better way to go.
I've got your point.
The only issue I worry about is the uncertainty and clutter that can be
created by this feature. In the worst case, when we have a complex error
stack (including the extension's CATCH sections, exceptions in stored
procedures, etc.), the backend will throw the memory limit error
repeatedly. Of course, one failed backend looks better than a
surprisingly killed postmaster, but the mix of different error reports
and details looks terrible and challenging to debug in the case of
trouble. So, may we throw a FATAL error if we reach this limit while
handling an exception?
--
regards,
Andrey Lepikhov
Postgres Professional
Greetings,
* Andrei Lepikhov (a.lepikhov@postgrespro.ru) wrote:
The only issue I worry about is the uncertainty and clutter that can be
created by this feature. In the worst case, when we have a complex error
stack (including the extension's CATCH sections, exceptions in stored
procedures, etc.), the backend will throw the memory limit error repeatedly.
I'm not seeing what additional uncertainty or clutter there is- this is,
again, exactly the same as what happens today on a system with
overcommit disabled and I don't feel like we get a lot of complaints
about this today.
Of course, one failed backend looks better than a surprisingly killed
postmaster, but the mix of different error reports and details looks
terrible and challenging to debug in the case of trouble. So, may we throw a
FATAL error if we reach this limit while handling an exception?
I don't see why we'd do that when we can do better- we just fail
whatever the ongoing query or transaction is and allow further requests
on the same connection. We already support exactly that and it works
really rather well and I don't see why we'd throw that away because
there's a different way to get an OOM error.
If you want to make the argument that we should throw FATAL on OOM when
handling an exception, that's something you could argue independently of
this effort already today, but I don't think you'll get agreement that
it's an improvement.
Thanks,
Stephen
On 20/10/2023 19:39, Stephen Frost wrote:
Greetings,
* Andrei Lepikhov (a.lepikhov@postgrespro.ru) wrote:
The only issue I worry about is the uncertainty and clutter that can be
created by this feature. In the worst case, when we have a complex error
stack (including the extension's CATCH sections, exceptions in stored
procedures, etc.), the backend will throw the memory limit error repeatedly.I'm not seeing what additional uncertainty or clutter there is- this is,
again, exactly the same as what happens today on a system with
overcommit disabled and I don't feel like we get a lot of complaints
about this today.
Maybe I missed something or see this feature from an alternate point of
view (as an extension developer), but overcommit is more useful so far:
it kills a process.
It means that after restart, the backend/background worker will have an
initial internal state. With this limit enabled, we need to remember
that each function call can cause an error, and we have to remember it
using static PG_CATCH sections where we must rearrange local variables
to the initial (?) state. So, it complicates development.
Of course, this limit is a good feature, but from my point of view, it
would be better to kill a memory-consuming backend instead of throwing
an error. At least for now, we don't have a technique to repeat query
planning with chances to build a more effective plan.
--
regards,
Andrei Lepikhov
Postgres Professional
Hi,
On 2023-10-24 09:39:42 +0700, Andrei Lepikhov wrote:
On 20/10/2023 19:39, Stephen Frost wrote:
Greetings,* Andrei Lepikhov (a.lepikhov@postgrespro.ru) wrote:
The only issue I worry about is the uncertainty and clutter that can be
created by this feature. In the worst case, when we have a complex error
stack (including the extension's CATCH sections, exceptions in stored
procedures, etc.), the backend will throw the memory limit error repeatedly.I'm not seeing what additional uncertainty or clutter there is- this is,
again, exactly the same as what happens today on a system with
overcommit disabled and I don't feel like we get a lot of complaints
about this today.Maybe I missed something or see this feature from an alternate point of view
(as an extension developer), but overcommit is more useful so far: it kills
a process.
In case of postgres it doesn't just kill one postgres, it leads to *all*
connections being terminated.
It means that after restart, the backend/background worker will have an
initial internal state. With this limit enabled, we need to remember that
each function call can cause an error, and we have to remember it using
static PG_CATCH sections where we must rearrange local variables to the
initial (?) state. So, it complicates development.
You need to be aware of errors being thrown regardless this feature, as
out-of-memory errors can be encountered today already. There also are many
other kinds of errors that can be thrown.
Greetings,
Andres Freund
Hello!
Earlier in this thread, the pgbench results were published, where with a strong memory limit of 100MB
a significant, about 10%, decrease in TPS was observed [1]/messages/by-id/3178e9a1b7acbcf023fafed68ca48d76afc07907.camel@crunchydata.com.
Using dedicated server with 12GB RAM and methodology described in [3]/messages/by-id/1d3a7d8f-cb7c-4468-a578-d8a1194ea2de@postgrespro.ru, i performed five series
of measurements for the patches from the [2]/messages/by-id/4edafedc0f8acb12a2979088ac1317bd7dd42145.camel@crunchydata.com.
The series were like this:
1) unpatched 16th version at the REL_16_BETA1 (e0b82fc8e83) as close to [2]/messages/by-id/4edafedc0f8acb12a2979088ac1317bd7dd42145.camel@crunchydata.com in time.
2) patched REL_16_BETA1 at e0b82fc8e83 with undefined max_total_backend_memory GUC (with default value = 0).
3) patched REL_16_BETA1 with max_total_backend_memory = 16GB
4) the same with max_total_backend_memory = 8GB
5) and again with max_total_backend_memory = 200MB
Measurements with max_total_backend_memory = 100MB were not be carried out,
with limit 100MB the server gave an error on startup:
FATAL: configured max_total_backend_memory 100MB is <= shared_memory_size 143MB
So i used 200MB to retain all other GUCs the same.
Pgbench gave the following results:
1) and 2) almost the same: ~6350 TPS. See orange and green
distributions on the attached graph.png respectively.
3) and 4) identical to each other (~6315 TPS) and a bit slower than 1) and 2) by ~0,6%.
See blue and yellow distributions respectively.
5) is slightly slower (~6285 TPS) than 3) and 4) by another 0,5%. (grey distribution)
The standard error in all series was ~0.2%. There is a raw data in the raw_data.txt.
With the best wishes,
--
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
[1]: /messages/by-id/3178e9a1b7acbcf023fafed68ca48d76afc07907.camel@crunchydata.com
[2]: /messages/by-id/4edafedc0f8acb12a2979088ac1317bd7dd42145.camel@crunchydata.com
[3]: /messages/by-id/1d3a7d8f-cb7c-4468-a578-d8a1194ea2de@postgrespro.ru
Attachments:
graph.pngimage/png; name=graph.pngDownload
�PNG
IHDR � ���Z pHYs g��R IDATx^��y\Te�?��0 �.BJ��+���+���i���i�}��o��ugi��eee���i��w�������� ��
� ���������330,���x�xp������s����R����@!�B!�B!�B1�m��
�!�B!�B!�B�����6m�H�B!�B!�B!�B�qqq a=!�B!�B!�B!�
�B!�B!�B!�b!
�B!�B!�B!�b!
�B!�B!�B!�b!
�B!�B!�B!�b!
�B!�B!�B!�b!
�B!�B!�B!�b!
�B!�B!�B!�b!
�B!�B!�B!�b!
�B!�B!�B!�b!
�B!�B!�B!�b!
�B!�B!�B!�b!
�B!�B!�B!�b!
�B!�B!�B!�b!
�B!�B!�B!�b!
�B!�B!�B!�b!
�B!�B!�B!�b!
�B!�B!�B!�b!
�B!�B!�B!�b!
�B!�B!�B!�b![q!�B!��6�N����lkk�B!��B!�B�\#� ������<A�VC�VC�����666ptt���+|}}���%��I���(++W[����B���������Q#������G||<����M�����'O�k��h�����Z#%%��#j�Z�Z����������#7n___��Zv����RZZ�������O\�\�v
<x����n�� www4n�X����6��Qe�5�j�*&)) �~�)[~��w��{w����q>D�N�`oo/n��Z-��������� qs�*--�����@�����
6D��- �R)��E���p��Edee���nnn���A�.]���"�n�V�����������l���������h��������"��zZ����pqq���������GFFk�4
k������� '''4i��7��}[*!!999 �6m����U�C.;;�/_Frr2
akkwww�j�
AAA����B�����7o"''���(..���3<==��m[4o�����(Wc����R���ukqS�Y���V�q��u���#//>D�z�����f���]�v�����CNN�������jQRR�N�������~~~h�����Uv��%���@�P 88��� ??���CFF������///t���������TRR���/�k�Z����+�����U+�m����(*�
�.]BRRrss NNNh��1��i�-Z�A!���7���[�j�����F���#44#F�0�� �}�4���bs��A���������i�����mu�Z����K���������-��h�e�>|eee8v�,XP��w���g�}&�6�����u���c�}�������]�V�Te�6mW1[�lAjj*BCC� KUEDD`��x���������8�FU/j�������Fm�� {�� �����/�D�z���jWii)���K��� ����7�xC��v�>}��'����M���+�y��966��2����U�V�����&