debug_palloc_context_threshold - dump code paths leading to memory leaks
Hi -hackers,
From time to time we hit some memory leaks in PostgreSQL and on one
occasion Tomas wrote: "I certainly agree it's annoying that when OOM
hits, we end up with no information about what used the memory. Maybe
we could have a threshold triggering a call to MemoryContextStats?" -
see [1]/messages/by-id/9574e4b6-3334-777b-4287-29d81151963a@enterprisedb.com. I had this in my memory/TODO list for quite some time now,
and attached is simplistic patch to provide new
"debug_palloc_context_threshold" GUC with aim to help investigating
such memory leaks in contexts (think: mostly on customer side of
things) and start discussion maybe if there are better alternatives.
The aim is to see which codepath contributes to the case of using e.g.
more than 5GB in a single context (it will start then dumping
stats/stack traces every 100ms). Here the patch assumes that most
memory leaks are present in the same context, so it dumps memory into
a log, which can be later analyzed to build a stack trace profile.
This patch is just to get discussion as I found that:
- pg_backend_memory_contexts view: we often cannot modify an
application to issue it from time to time.
- Scripting gdb to call MemoryContextStats() or using 14+
pg_log_backend_memory_contexts() often seems impossible too as we do
not know the PID which is going to cause it (the usual pattern is like
this: some app starts some SQL, sometimes OOMs, rinse and repeat)
My doubts are following:
1. Is dumping on per-context threshold enough? That means it is not
per-backend-RSS threshold. My problem is that palloc() would have to
recursively collect context->mem_allocated for every context out there
starting from TopMemoryContext on every run, right? (or just some
global form of memory accounting would have to be introduced to avoid
such slowdown of palloc(). What's worse there's seems to be no
standardized OS-agnostic way of getting RSS in C/POSIX)
2. Is such stack trace reporting - see example in [2]sample use and log: - enough? Right
now it is multi-line, but maybe we could have our own version of
certain routines so that it would report all in one line? Like, e.g.:
2024-08-19 09:11:04.664 CEST [9317] LOG: dumping memory context stats
for context "Caller tuples": Grand total: 25165712 bytes in 13 blocks;
1705672 free (0 chunks); 23460040 used; errbacktrace+0x46 <- +0x66c7e5
<- palloc+0x4a <- datumCopy+0x30 <- tuplesort_putdatum+0x8f <-
+0x37bf26 <- [..long list..]
The advantage here would be that - with that sampling every XXX ms, we
could extract and statistically investigate what's the code path
requesting most of the memory , even by using simple grep(1) / sort(1)
/ uniq(1)
3. In offlist discussion Tomas, pointed out to me that "it's not very
clear the place where we exceed the limit is necessarily the place
where we leak memory" (that palloc(), might be later pfree'ed())
4. Another idea (by Tomas) is related to the static threshold. Maybe
we want it to be dynamic, so add e.g. +1 .. 50% on every such call, so
to catch only growing mem usage, not just >
debug_palloc_context_threshold.
Anything else? What other experiences may be relevant here?
-J.
[1]: /messages/by-id/9574e4b6-3334-777b-4287-29d81151963a@enterprisedb.com
/messages/by-id/9574e4b6-3334-777b-4287-29d81151963a@enterprisedb.com
[2]: sample use and log:
create table zz as select 'AAAAAAAAAAA' || (i % 100) as id from
generate_series(1000000, 3000000) i;
set work_mem to '1GB';
set max_parallel_workers_per_gather to 0;
set enable_hashagg to off;
set debug_palloc_context_threshold to '1 MB';
select id, count(*) from zz group by id having count(*) > 1;
2024-08-19 09:11:04.664 CEST [9317] LOG: dumping memory context stats
2024-08-19 09:11:04.664 CEST [9317] DETAIL: Context "Caller tuples".
2024-08-19 09:11:04.664 CEST [9317] BACKTRACE:
postgres: postgres postgres [local] SELECT(errbacktrace+0x46)
[0x55ac9ab06e56]
postgres: postgres postgres [local] SELECT(+0x66c7e5) [0x55ac9ab2f7e5]
postgres: postgres postgres [local] SELECT(palloc+0x4a) [0x55ac9ab2fd3a]
postgres: postgres postgres [local] SELECT(datumCopy+0x30)
[0x55ac9aa10cf0]
postgres: postgres postgres [local]
SELECT(tuplesort_putdatum+0x8f) [0x55ac9ab4032f]
postgres: postgres postgres [local] SELECT(+0x37bf26) [0x55ac9a83ef26]
[..]
2024-08-19 09:11:04.664 CEST [9317] STATEMENT: select id, count(*)
from zz group by id having count(*) > 1;
2024-08-19 09:11:04.664 CEST [9317] LOG: level: 0; Caller tuples:
25165712 total in 13 blocks; 1705672 free; 23460040 used
2024-08-19 09:11:04.664 CEST [9317] LOG: Grand total: 25165712 bytes
in 13 blocks; 1705672 free (0 chunks); 23460040 used
//100ms later:
2024-08-19 09:11:04.764 CEST [9317] LOG: dumping memory context stats
2024-08-19 09:11:04.764 CEST [9317] DETAIL: Context "Caller tuples".
2024-08-19 09:11:04.764 CEST [9317] BACKTRACE:
postgres: postgres postgres [local] SELECT(errbacktrace+0x46)
[0x55ac9ab06e56]
postgres: postgres postgres [local] SELECT(+0x66c7e5) [0x55ac9ab2f7e5]
postgres: postgres postgres [local] SELECT(palloc+0x4a) [0x55ac9ab2fd3a]
postgres: postgres postgres [local] SELECT(datumCopy+0x30)
[0x55ac9aa10cf0]
postgres: postgres postgres [local]
SELECT(tuplesort_putdatum+0x8f) [0x55ac9ab4032f]
postgres: postgres postgres [local] SELECT(+0x37bf26) [0x55ac9a83ef26]
[..]
2024-08-19 09:11:04.764 CEST [9317] STATEMENT: select id, count(*)
from zz group by id having count(*) > 1;
2024-08-19 09:11:04.765 CEST [9317] LOG: level: 0; Caller tuples:
41942928 total in 15 blocks; 7015448 free; 34927480 used
2024-08-19 09:11:04.765 CEST [9317] LOG: Grand total: 41942928 bytes
in 15 blocks; 7015448 free (0 chunks); 34927480 used
Attachments:
debug_palloc_context_threshold_v1.patchapplication/octet-stream; name=debug_palloc_context_threshold_v1.patchDownload
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index 03a54451ac..e1b4a4f56b 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -128,6 +128,7 @@ int IntervalStyle = INTSTYLE_POSTGRES;
bool enableFsync = true;
bool allowSystemTableMods = false;
int work_mem = 4096;
+int debug_palloc_context_threshold = 0;
double hash_mem_multiplier = 2.0;
int maintenance_work_mem = 65536;
int max_parallel_maintenance_workers = 2;
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 79ecaa4c4c..53ab643d3a 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -2508,6 +2508,18 @@ struct config_int ConfigureNamesInt[] =
NULL, NULL, NULL
},
+ {
+ {"debug_palloc_context_threshold", PGC_USERSET, RESOURCES_MEM,
+ gettext_noop("Dump memory context usage stats when using more than this memory in context."),
+ gettext_noop("Dump MemoryContextStats() when palloc() ends up "
+ "with more than context->mem_used than defined here"),
+ GUC_UNIT_KB
+ },
+ &debug_palloc_context_threshold,
+ 0, 0, MAX_KILOBYTES,
+ NULL, NULL, NULL
+ },
+
/*
* Dynamic shared memory has a higher overhead than local memory contexts,
* so when testing low-memory scenarios that could use shared memory, the
diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index bde54326c6..66959ab98c 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -27,6 +27,7 @@
#include "utils/memutils.h"
#include "utils/memutils_internal.h"
#include "utils/memutils_memorychunk.h"
+#include "portability/instr_time.h"
static void BogusFree(void *pointer);
@@ -156,6 +157,9 @@ MemoryContext CurTransactionContext = NULL;
/* This is a transient link to the active portal's memory context: */
MemoryContext PortalContext = NULL;
+/* Last timestamp when there was MemoryContextStats() dump by debug_palloc_context_threshold */
+static instr_time debug_palloc_context_last_ts;
+static int recursion_level = 0;
static void MemoryContextDeleteOnly(MemoryContext context);
static void MemoryContextCallResetCallbacks(MemoryContext context);
@@ -1313,6 +1317,43 @@ ProcessLogMemoryContextInterrupt(void)
MemoryContextStatsDetail(TopMemoryContext, 100, 100, false);
}
+/*
+ * dump_memory_debug_if_necessary
+ * Log using MemoryContextStats(), but not too often
+ *
+ */
+static void
+dump_memory_debug_if_necessary(MemoryContext context)
+{
+ /* We need to never recurse infinitley as some routines used later on
+ * may also call palloc() internally.
+ */
+ if(recursion_level >= 1)
+ return;
+ recursion_level++;
+
+ if (debug_palloc_context_threshold != 0 &&
+ context->mem_allocated * 1024 >= debug_palloc_context_threshold) {
+ instr_time now, ts_diff;
+ INSTR_TIME_SET_CURRENT(now);
+ ts_diff = now;
+ INSTR_TIME_SUBTRACT(ts_diff, debug_palloc_context_last_ts);
+
+#define MEMSTAT_DUMP_INTERVAL_MS 100
+ /* Rate limit the messages to avoid log clutter */
+ if ((double) INSTR_TIME_GET_MILLISEC(ts_diff) >= MEMSTAT_DUMP_INTERVAL_MS) {
+ ereport(LOG,
+ (errmsg("dumping memory context stats"),
+ errdetail("Context \"%s\".", context->name),
+ errbacktrace()));
+ MemoryContextStatsDetail(context, 100, 100, false);
+ debug_palloc_context_last_ts = now;
+ }
+ }
+
+ recursion_level--;
+}
+
void *
palloc(Size size)
{
@@ -1340,6 +1381,8 @@ palloc(Size size)
Assert(ret != NULL);
VALGRIND_MEMPOOL_ALLOC(context, ret, size);
+ dump_memory_debug_if_necessary(context);
+
return ret;
}
@@ -1361,6 +1404,8 @@ palloc0(Size size)
MemSetAligned(ret, 0, size);
+ dump_memory_debug_if_necessary(context);
+
return ret;
}
@@ -1387,6 +1432,8 @@ palloc_extended(Size size, int flags)
if ((flags & MCXT_ALLOC_ZERO) != 0)
MemSetAligned(ret, 0, size);
+ dump_memory_debug_if_necessary(context);
+
return ret;
}
@@ -1489,6 +1536,8 @@ MemoryContextAllocAligned(MemoryContext context,
/* Disallow access to the redirection chunk header. */
VALGRIND_MAKE_MEM_NOACCESS(alignedchunk, sizeof(MemoryChunk));
+ dump_memory_debug_if_necessary(context);
+
return aligned;
}
@@ -1660,6 +1709,8 @@ MemoryContextAllocHuge(MemoryContext context, Size size)
VALGRIND_MEMPOOL_ALLOC(context, ret, size);
+ dump_memory_debug_if_necessary(context);
+
return ret;
}
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 25348e71eb..cf17fbd22f 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -265,6 +265,7 @@ extern PGDLLIMPORT int IntervalStyle;
extern PGDLLIMPORT bool enableFsync;
extern PGDLLIMPORT bool allowSystemTableMods;
extern PGDLLIMPORT int work_mem;
+extern PGDLLIMPORT int debug_palloc_context_threshold;
extern PGDLLIMPORT double hash_mem_multiplier;
extern PGDLLIMPORT int maintenance_work_mem;
extern PGDLLIMPORT int max_parallel_maintenance_workers;