Trim the heap free memory
Hi hackers,
Currently, all processes in PostgreSQL actually use malloc to allocate and
free memory. In the case of long connections where business queries are
executed over extended periods, the distribution of memory can become
extremely complex.
Under certain circumstances, a common issue in memory usage due to the
caching strategy of malloc may arise: even if memory is released through
the free function, it may not be returned to the OS in a timely manner.
This can lead to high system memory usage, affecting performance and the
operation of other applications, and may even result in Out-Of-Memory (OOM)
errors.
To address this issue, I have developed a new function called
pg_trim_backend_heap_free_memory, based on the existing
pg_log_backend_memory_contexts function. This function triggers the
specified process to execute the malloc_trim operation by sending signals,
thereby releasing as much unreturned memory to the operating system as
possible. This not only helps to optimize memory usage but can also
significantly enhance system performance under memory pressure.
Here is an example of using the pg_trim_backend_heap_free_memory function
to demonstrate its effect:
CREATE OR REPLACE FUNCTION public.partition_create(schemaname character
varying, numberofpartition integer)
RETURNS integer
LANGUAGE plpgsql
AS $function$
declare
currentTableId integer;
currentSchemaName varchar(100);
currentTableName varchar(100);
begin
execute 'create schema ' || schemaname;
execute 'create table ' || schemaname || '.' || schemaname || 'hashtable
(p1 text, p2 text, p3 text, p4 int, p5 int, p6 int, p7 int, p8 text, p9
name, p10 varchar, p11 text, p12 text, p13 text) PARTITION BY HASH(p1);';
currentTableId := 1;
loop
currentTableName := schemaname || '.' || schemaname || 'hashtable' ||
ltrim(currentTableId::varchar(10));
execute 'create table ' || currentTableName || ' PARTITION OF ' ||
schemaname || '.' || schemaname || 'hashtable' || ' FOR VALUES WITH(MODULUS
' || numberofpartition || ', REMAINDER ' || currentTableId - 1 || ')';
currentTableId := currentTableId + 1;
if (currentTableId > numberofpartition) then exit; end if;
end loop;
return currentTableId - 1;
END $function$;select public.partition_create('test3', 5000);
select public.partition_create('test4', 5000);
select count(*) from test4.test4hashtable a, test3.test3hashtable b where
a.p1=b.p1;
You are now about to see the memory size of the process executing the query.
postgres 68673 1.2 0.0 610456 124768 ? Ss 08:25 0:01
postgres: postgres postgres [local] idle
Size: 89600 kB
KernelPageSize: 4 kB
MMUPageSize: 4 kB
Rss: 51332 kB
Pss: 51332 kB
02b65000-082e5000 rw-p 00000000 00:00 0
[heap]
After use pg_trim_backend_heap_free_memory, you will see:
postgres=# select pg_trim_backend_heap_free_memory(pg_backend_pid());
2024-08-23 08:27:53.958 UTC [68673] LOG: trimming heap free memory of PID
68673
pg_trim_backend_heap_free_memory
----------------------------------
t
(1 row)
02b65000-082e5000 rw-p 00000000 00:00 0
[heap]
Size: 89600 kB
KernelPageSize: 4 kB
MMUPageSize: 4 kB
Rss: 4888 kB
Pss: 4888 kB
postgres 68673 1.2 0.0 610456 75244 ? Ss 08:26 0:01
postgres: postgres postgres [local] idle
Looking forward to your feedback,
Regards,
--
Shawn Wang
Now
Attachments:
trimheapfreemeory.patchapplication/octet-stream; name=trimheapfreemeory.patchDownload
diff --git a/src/backend/catalog/system_functions.sql b/src/backend/catalog/system_functions.sql
index 623b9539b1..3aa3b60288 100644
--- a/src/backend/catalog/system_functions.sql
+++ b/src/backend/catalog/system_functions.sql
@@ -754,6 +754,8 @@ REVOKE EXECUTE ON FUNCTION pg_ls_dir(text,boolean,boolean) FROM public;
REVOKE EXECUTE ON FUNCTION pg_log_backend_memory_contexts(integer) FROM PUBLIC;
+REVOKE EXECUTE ON FUNCTION pg_trim_backend_heap_free_memory(integer) FROM PUBLIC;
+
REVOKE EXECUTE ON FUNCTION pg_ls_logicalsnapdir() FROM PUBLIC;
REVOKE EXECUTE ON FUNCTION pg_ls_logicalmapdir() FROM PUBLIC;
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 7d0877c95e..d90d1649a1 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -765,6 +765,10 @@ HandleAutoVacLauncherInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ /* Perform trimming heap free memory of this process */
+ if (TrimHeapFreeMemoryPending)
+ ProcessTrimHeapFreeMemoryInterrupt();
+
/* Process sinval catchup interrupts that happened while sleeping */
ProcessCatchupInterrupt();
}
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index 199f008bcd..2109f9e822 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -605,6 +605,10 @@ HandleCheckpointerInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Perform trimming heap free memory of this process */
+ if (TrimHeapFreeMemoryPending)
+ ProcessTrimHeapFreeMemoryInterrupt();
}
/*
diff --git a/src/backend/postmaster/interrupt.c b/src/backend/postmaster/interrupt.c
index eedc0980cf..4389ff3d48 100644
--- a/src/backend/postmaster/interrupt.c
+++ b/src/backend/postmaster/interrupt.c
@@ -48,6 +48,10 @@ HandleMainLoopInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Perform trimming heap free memory of this process */
+ if (TrimHeapFreeMemoryPending)
+ ProcessTrimHeapFreeMemoryInterrupt();
}
/*
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 02f91431f5..cd72412614 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -865,6 +865,10 @@ HandlePgArchInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ /* Perform trimming heap free memory of this process */
+ if (TrimHeapFreeMemoryPending)
+ ProcessTrimHeapFreeMemoryInterrupt();
+
if (ConfigReloadPending)
{
char *archiveLib = pstrdup(XLogArchiveLibrary);
diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c
index ef6f98ebcd..5eb8d168e7 100644
--- a/src/backend/postmaster/startup.c
+++ b/src/backend/postmaster/startup.c
@@ -192,6 +192,10 @@ HandleStartupProcInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Perform trimming heap free memory of this process */
+ if (TrimHeapFreeMemoryPending)
+ ProcessTrimHeapFreeMemoryInterrupt();
}
diff --git a/src/backend/postmaster/walsummarizer.c b/src/backend/postmaster/walsummarizer.c
index daa7909382..d167e41291 100644
--- a/src/backend/postmaster/walsummarizer.c
+++ b/src/backend/postmaster/walsummarizer.c
@@ -874,6 +874,10 @@ HandleWalSummarizerInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Perform trimming heap free memory of this process */
+ if (TrimHeapFreeMemoryPending)
+ ProcessTrimHeapFreeMemoryInterrupt();
}
/*
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index 87027f27eb..3251d2823e 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -712,6 +712,9 @@ procsignal_sigusr1_handler(SIGNAL_ARGS)
if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN))
HandleRecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN);
+ if (CheckProcSignal(PROCSIG_TRIM_HEAP_FREE_MEMORY))
+ HandleTrimHeapFreeMemoryInterrupt();
+
SetLatch(MyLatch);
}
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 8bc6bea113..f90a557bbe 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3479,6 +3479,9 @@ ProcessInterrupts(void)
if (ParallelApplyMessagePending)
HandleParallelApplyMessages();
+
+ if (TrimHeapFreeMemoryPending)
+ ProcessTrimHeapFreeMemoryInterrupt();
}
/*
diff --git a/src/backend/utils/adt/mcxtfuncs.c b/src/backend/utils/adt/mcxtfuncs.c
index 6a6634e1cd..c213c4163a 100644
--- a/src/backend/utils/adt/mcxtfuncs.c
+++ b/src/backend/utils/adt/mcxtfuncs.c
@@ -305,3 +305,57 @@ pg_log_backend_memory_contexts(PG_FUNCTION_ARGS)
PG_RETURN_BOOL(true);
}
+
+/*
+ * pg_trim_backend_heap_free_memory
+ * Signal a backend or an auxiliary process to trim heap free memory.
+ *
+ * On receipt of this signal, a backend or an auxiliary process sets the flag
+ * in the signal handler, which causes the next CHECK_FOR_INTERRUPTS()
+ * or process-specific interrupt handler to log the memory contexts.
+ */
+Datum
+pg_trim_backend_heap_free_memory(PG_FUNCTION_ARGS)
+{
+ int pid = PG_GETARG_INT32(0);
+ PGPROC *proc;
+ ProcNumber procNumber = INVALID_PROC_NUMBER;
+
+ /*
+ * See if the process with given pid is a backend or an auxiliary process.
+ */
+ proc = BackendPidGetProc(pid);
+ if (proc == NULL)
+ proc = AuxiliaryPidGetProc(pid);
+
+ /*
+ * BackendPidGetProc() and AuxiliaryPidGetProc() return NULL if the pid
+ * isn't valid; but by the time we reach kill(), a process for which we
+ * get a valid proc here might have terminated on its own. There's no way
+ * to acquire a lock on an arbitrary process to prevent that. But since
+ * this mechanism is usually used to debug a backend or an auxiliary
+ * process running and consuming lots of memory, that it might end on its
+ * own first and its memory contexts are not logged is not a problem.
+ */
+ if (proc == NULL)
+ {
+ /*
+ * This is just a warning so a loop-through-resultset will not abort
+ * if one backend terminated on its own during the run.
+ */
+ ereport(WARNING,
+ (errmsg("PID %d is not a PostgreSQL server process", pid)));
+ PG_RETURN_BOOL(false);
+ }
+
+ procNumber = GetNumberFromPGProc(proc);
+ if (SendProcSignal(pid, PROCSIG_TRIM_HEAP_FREE_MEMORY, procNumber) < 0)
+ {
+ /* Again, just a warning to allow loops */
+ ereport(WARNING,
+ (errmsg("could not send signal to process %d: %m", pid)));
+ PG_RETURN_BOOL(false);
+ }
+
+ PG_RETURN_BOOL(true);
+}
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index 03a54451ac..e90a61affe 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -39,6 +39,7 @@ volatile sig_atomic_t IdleSessionTimeoutPending = false;
volatile sig_atomic_t ProcSignalBarrierPending = false;
volatile sig_atomic_t LogMemoryContextPending = false;
volatile sig_atomic_t IdleStatsUpdateTimeoutPending = false;
+volatile sig_atomic_t TrimHeapFreeMemoryPending = false;
volatile uint32 InterruptHoldoffCount = 0;
volatile uint32 QueryCancelHoldoffCount = 0;
volatile uint32 CritSectionCount = 0;
diff --git a/src/backend/utils/mmgr/Makefile b/src/backend/utils/mmgr/Makefile
index 01a1fb8527..395f119d77 100644
--- a/src/backend/utils/mmgr/Makefile
+++ b/src/backend/utils/mmgr/Makefile
@@ -21,6 +21,7 @@ OBJS = \
generation.o \
mcxt.o \
memdebug.o \
+ memtrim.o \
portalmem.o \
slab.o
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 4abc6d9526..6186d9247d 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8348,6 +8348,12 @@
prorettype => 'bool', proargtypes => 'int4',
prosrc => 'pg_log_backend_memory_contexts' },
+# logging memory contexts of the specified backend
+{ oid => '4551', descr => 'Trim Heap free memory of the specified backend',
+ proname => 'pg_trim_backend_heap_free_memory', provolatile => 'v',
+ prorettype => 'bool', proargtypes => 'int4',
+ prosrc => 'pg_trim_backend_heap_free_memory' },
+
# non-persistent series generator
{ oid => '1066', descr => 'non-persistent series generator',
proname => 'generate_series', prorows => '1000',
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 25348e71eb..5a676eeb9e 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -100,6 +100,8 @@ extern PGDLLIMPORT volatile sig_atomic_t IdleStatsUpdateTimeoutPending;
extern PGDLLIMPORT volatile sig_atomic_t CheckClientConnectionPending;
extern PGDLLIMPORT volatile sig_atomic_t ClientConnectionLost;
+extern PGDLLIMPORT volatile sig_atomic_t TrimHeapFreeMemoryPending;
+
/* these are marked volatile because they are examined by signal handlers: */
extern PGDLLIMPORT volatile uint32 InterruptHoldoffCount;
extern PGDLLIMPORT volatile uint32 QueryCancelHoldoffCount;
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index f94c11a9a8..0d4a3a42a2 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -48,6 +48,8 @@ typedef enum
PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK,
PROCSIG_RECOVERY_CONFLICT_LAST = PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK,
+ PROCSIG_TRIM_HEAP_FREE_MEMORY, /* ask backend to release free memory from the heap */
+
NUM_PROCSIGNALS /* Must be last! */
} ProcSignalReason;
diff --git a/src/include/utils/memutils.h b/src/include/utils/memutils.h
index cd9596ff21..61d4d6d252 100644
--- a/src/include/utils/memutils.h
+++ b/src/include/utils/memutils.h
@@ -104,6 +104,9 @@ extern void MemoryContextCheck(MemoryContext context);
extern void HandleLogMemoryContextInterrupt(void);
extern void ProcessLogMemoryContextInterrupt(void);
+extern void HandleTrimHeapFreeMemoryInterrupt(void);
+extern void ProcessTrimHeapFreeMemoryInterrupt(void);
+
/*
* Memory-context-type-specific functions
*/
diff --git a/src/test/regress/expected/misc_functions.out b/src/test/regress/expected/misc_functions.out
index 35fb72f302..9bde10b2b1 100644
--- a/src/test/regress/expected/misc_functions.out
+++ b/src/test/regress/expected/misc_functions.out
@@ -365,6 +365,17 @@ RESET ROLE;
REVOKE EXECUTE ON FUNCTION pg_log_backend_memory_contexts(integer)
FROM regress_log_memory;
DROP ROLE regress_log_memory;
+--
+-- pg_trim_backend_heap_free_memory()
+--
+-- Trim the heap free memory.
+--
+SELECT pg_trim_backend_heap_free_memory(pg_backend_pid());
+ pg_trim_backend_heap_free_memory
+----------------------------------
+ t
+(1 row)
+
--
-- Test some built-in SRFs
--
diff --git a/src/test/regress/sql/misc_functions.sql b/src/test/regress/sql/misc_functions.sql
index e570783453..d1cd2440ee 100644
--- a/src/test/regress/sql/misc_functions.sql
+++ b/src/test/regress/sql/misc_functions.sql
@@ -143,6 +143,14 @@ REVOKE EXECUTE ON FUNCTION pg_log_backend_memory_contexts(integer)
DROP ROLE regress_log_memory;
+--
+-- pg_trim_backend_heap_free_memory()
+--
+-- Trim the heap free memory.
+--
+
+SELECT pg_trim_backend_heap_free_memory(pg_backend_pid());
+
--
-- Test some built-in SRFs
--
On Fri, 23 Aug 2024 at 10:54, shawn wang <shawn.wang.pg@gmail.com> wrote:
Hi hackers,
Currently, all processes in PostgreSQL actually use malloc to allocate
and free memory. In the case of long connections where business queries are
executed over extended periods, the distribution of memory can become
extremely complex.Under certain circumstances, a common issue in memory usage due to the
caching strategy of malloc may arise: even if memory is released through
the free function, it may not be returned to the OS in a timely manner.
This can lead to high system memory usage, affecting performance and the
operation of other applications, and may even result in Out-Of-Memory (OOM)
errors.To address this issue, I have developed a new function called
pg_trim_backend_heap_free_memory, based on the existing
pg_log_backend_memory_contexts function. This function triggers the
specified process to execute the malloc_trim operation by sending
signals, thereby releasing as much unreturned memory to the operating
system as possible. This not only helps to optimize memory usage but can
also significantly enhance system performance under memory pressure.Here is an example of using the pg_trim_backend_heap_free_memory function
to demonstrate its effect:CREATE OR REPLACE FUNCTION public.partition_create(schemaname character
varying, numberofpartition integer)
RETURNS integer
LANGUAGE plpgsql
AS $function$
declare
currentTableId integer;
currentSchemaName varchar(100);
currentTableName varchar(100);
begin
execute 'create schema ' || schemaname;
execute 'create table ' || schemaname || '.' || schemaname || 'hashtable
(p1 text, p2 text, p3 text, p4 int, p5 int, p6 int, p7 int, p8 text, p9
name, p10 varchar, p11 text, p12 text, p13 text) PARTITION BY HASH(p1);';
currentTableId := 1;
loop
currentTableName := schemaname || '.' || schemaname || 'hashtable' ||
ltrim(currentTableId::varchar(10));
execute 'create table ' || currentTableName || ' PARTITION OF ' ||
schemaname || '.' || schemaname || 'hashtable' || ' FOR VALUES WITH(MODULUS
' || numberofpartition || ', REMAINDER ' || currentTableId - 1 || ')';
currentTableId := currentTableId + 1;
if (currentTableId > numberofpartition) then exit; end if;
end loop;
return currentTableId - 1;
END $function$;select public.partition_create('test3', 5000);
select public.partition_create('test4', 5000);
select count(*) from test4.test4hashtable a, test3.test3hashtable b where
a.p1=b.p1;You are now about to see the memory size of the process executing the
query.postgres 68673 1.2 0.0 610456 124768 ? Ss 08:25 0:01
postgres: postgres postgres [local] idle
Size: 89600 kB
KernelPageSize: 4 kB
MMUPageSize: 4 kB
Rss: 51332 kB
Pss: 51332 kB02b65000-082e5000 rw-p 00000000 00:00 0
[heap]
After use pg_trim_backend_heap_free_memory, you will see:
postgres=# select pg_trim_backend_heap_free_memory(pg_backend_pid());
2024-08-23 08:27:53.958 UTC [68673] LOG: trimming heap free memory of
PID 68673
pg_trim_backend_heap_free_memory
----------------------------------
t
(1 row)
02b65000-082e5000 rw-p 00000000 00:00 0
[heap]
Size: 89600 kB
KernelPageSize: 4 kB
MMUPageSize: 4 kB
Rss: 4888 kB
Pss: 4888 kBpostgres 68673 1.2 0.0 610456 75244 ? Ss 08:26 0:01
postgres: postgres postgres [local] idle
Looking forward to your feedback,
Regards,
--
Shawn WangNow
Liked the idea. Unfortunately, at the moment it is giving compilation error
--
make[4]: *** No rule to make target `memtrim.o', needed by `objfiles.txt'.
Stop.
--
Regards,
Rafia Sabih
Hi Shawn,
On Fri, Aug 23, 2024 at 2:24 PM shawn wang <shawn.wang.pg@gmail.com> wrote:
Hi hackers,
Currently, all processes in PostgreSQL actually use malloc to allocate and free memory. In the case of long connections where business queries are executed over extended periods, the distribution of memory can become extremely complex.
Under certain circumstances, a common issue in memory usage due to the caching strategy of malloc may arise: even if memory is released through the free function, it may not be returned to the OS in a timely manner. This can lead to high system memory usage, affecting performance and the operation of other applications, and may even result in Out-Of-Memory (OOM) errors.
To address this issue, I have developed a new function called pg_trim_backend_heap_free_memory, based on the existing pg_log_backend_memory_contexts function. This function triggers the specified process to execute the malloc_trim operation by sending signals, thereby releasing as much unreturned memory to the operating system as possible. This not only helps to optimize memory usage but can also significantly enhance system performance under memory pressure.
Here is an example of using the pg_trim_backend_heap_free_memory function to demonstrate its effect:
CREATE OR REPLACE FUNCTION public.partition_create(schemaname character varying, numberofpartition integer)
RETURNS integer
LANGUAGE plpgsql
AS $function$
declare
currentTableId integer;
currentSchemaName varchar(100);
currentTableName varchar(100);
begin
execute 'create schema ' || schemaname;
execute 'create table ' || schemaname || '.' || schemaname || 'hashtable (p1 text, p2 text, p3 text, p4 int, p5 int, p6 int, p7 int, p8 text, p9 name, p10 varchar, p11 text, p12 text, p13 text) PARTITION BY HASH(p1);';
currentTableId := 1;
loop
currentTableName := schemaname || '.' || schemaname || 'hashtable' || ltrim(currentTableId::varchar(10));
execute 'create table ' || currentTableName || ' PARTITION OF ' || schemaname || '.' || schemaname || 'hashtable' || ' FOR VALUES WITH(MODULUS ' || numberofpartition || ', REMAINDER ' || currentTableId - 1 || ')';
currentTableId := currentTableId + 1;
if (currentTableId > numberofpartition) then exit; end if;
end loop;
return currentTableId - 1;
END $function$;select public.partition_create('test3', 5000);
select public.partition_create('test4', 5000);
select count(*) from test4.test4hashtable a, test3.test3hashtable b where a.p1=b.p1;You are now about to see the memory size of the process executing the query.
postgres 68673 1.2 0.0 610456 124768 ? Ss 08:25 0:01 postgres: postgres postgres [local] idle
Size: 89600 kB
KernelPageSize: 4 kB
MMUPageSize: 4 kB
Rss: 51332 kB
Pss: 51332 kB02b65000-082e5000 rw-p 00000000 00:00 0 [heap]
After use pg_trim_backend_heap_free_memory, you will see:
postgres=# select pg_trim_backend_heap_free_memory(pg_backend_pid());
2024-08-23 08:27:53.958 UTC [68673] LOG: trimming heap free memory of PID 68673
pg_trim_backend_heap_free_memory
----------------------------------
t
(1 row)
02b65000-082e5000 rw-p 00000000 00:00 0 [heap]
Size: 89600 kB
KernelPageSize: 4 kB
MMUPageSize: 4 kB
Rss: 4888 kB
Pss: 4888 kBpostgres 68673 1.2 0.0 610456 75244 ? Ss 08:26 0:01 postgres: postgres postgres [local] idle
Looking forward to your feedback,
Looks useful.
How much time does malloc_trim() take to finish? Does it affect the
current database activity in that backend? It may be good to see
effect of this function by firing the function on random backends
while the query is running through pgbench.
In the patch I don't see definitions of
ProcessTrimHeapFreeMemoryInterrupt() and
HandleTrimHeapFreeMemoryInterrupt(). Am I missing something?
--
Best Wishes,
Ashutosh Bapat
Thank you Rafia. Here is a v2 patch.
Rafia Sabih <rafia.pghackers@gmail.com> 于2024年8月23日周五 18:30写道:
Show quoted text
On Fri, 23 Aug 2024 at 10:54, shawn wang <shawn.wang.pg@gmail.com> wrote:
Hi hackers,
Currently, all processes in PostgreSQL actually use malloc to allocate
and free memory. In the case of long connections where business queries are
executed over extended periods, the distribution of memory can become
extremely complex.Under certain circumstances, a common issue in memory usage due to the
caching strategy of malloc may arise: even if memory is released through
the free function, it may not be returned to the OS in a timely manner.
This can lead to high system memory usage, affecting performance and the
operation of other applications, and may even result in Out-Of-Memory (OOM)
errors.To address this issue, I have developed a new function called
pg_trim_backend_heap_free_memory, based on the existing
pg_log_backend_memory_contexts function. This function triggers the
specified process to execute the malloc_trim operation by sending
signals, thereby releasing as much unreturned memory to the operating
system as possible. This not only helps to optimize memory usage but can
also significantly enhance system performance under memory pressure.Here is an example of using the pg_trim_backend_heap_free_memory
function to demonstrate its effect:CREATE OR REPLACE FUNCTION public.partition_create(schemaname character
varying, numberofpartition integer)
RETURNS integer
LANGUAGE plpgsql
AS $function$
declare
currentTableId integer;
currentSchemaName varchar(100);
currentTableName varchar(100);
begin
execute 'create schema ' || schemaname;
execute 'create table ' || schemaname || '.' || schemaname || 'hashtable
(p1 text, p2 text, p3 text, p4 int, p5 int, p6 int, p7 int, p8 text, p9
name, p10 varchar, p11 text, p12 text, p13 text) PARTITION BY HASH(p1);';
currentTableId := 1;
loop
currentTableName := schemaname || '.' || schemaname || 'hashtable' ||
ltrim(currentTableId::varchar(10));
execute 'create table ' || currentTableName || ' PARTITION OF ' ||
schemaname || '.' || schemaname || 'hashtable' || ' FOR VALUES WITH(MODULUS
' || numberofpartition || ', REMAINDER ' || currentTableId - 1 || ')';
currentTableId := currentTableId + 1;
if (currentTableId > numberofpartition) then exit; end if;
end loop;
return currentTableId - 1;
END $function$;select public.partition_create('test3', 5000);
select public.partition_create('test4', 5000);
select count(*) from test4.test4hashtable a, test3.test3hashtable b
where a.p1=b.p1;You are now about to see the memory size of the process executing the
query.postgres 68673 1.2 0.0 610456 124768 ? Ss 08:25 0:01
postgres: postgres postgres [local] idle
Size: 89600 kB
KernelPageSize: 4 kB
MMUPageSize: 4 kB
Rss: 51332 kB
Pss: 51332 kB02b65000-082e5000 rw-p 00000000 00:00 0
[heap]
After use pg_trim_backend_heap_free_memory, you will see:
postgres=# select pg_trim_backend_heap_free_memory(pg_backend_pid());
2024-08-23 08:27:53.958 UTC [68673] LOG: trimming heap free memory of
PID 68673
pg_trim_backend_heap_free_memory
----------------------------------
t
(1 row)
02b65000-082e5000 rw-p 00000000 00:00 0
[heap]
Size: 89600 kB
KernelPageSize: 4 kB
MMUPageSize: 4 kB
Rss: 4888 kB
Pss: 4888 kBpostgres 68673 1.2 0.0 610456 75244 ? Ss 08:26 0:01
postgres: postgres postgres [local] idle
Looking forward to your feedback,
Regards,
--
Shawn WangNow
Liked the idea. Unfortunately, at the moment it is giving compilation
error --make[4]: *** No rule to make target `memtrim.o', needed by
`objfiles.txt'. Stop.
--
Regards,
Rafia Sabih
Attachments:
v2-0001-Trim-Heap-Free-Memory.patchapplication/octet-stream; name=v2-0001-Trim-Heap-Free-Memory.patchDownload
commit e20de9d029dae43b5af4469c0345d91c6b05cca3
Author: xueyou.ws <xueyou.ws@alibaba-inc.com>
Date: Sat Aug 24 10:02:28 2024 +0800
2
diff --git a/src/backend/catalog/system_functions.sql b/src/backend/catalog/system_functions.sql
index 623b9539b1..3aa3b60288 100644
--- a/src/backend/catalog/system_functions.sql
+++ b/src/backend/catalog/system_functions.sql
@@ -754,6 +754,8 @@ REVOKE EXECUTE ON FUNCTION pg_ls_dir(text,boolean,boolean) FROM public;
REVOKE EXECUTE ON FUNCTION pg_log_backend_memory_contexts(integer) FROM PUBLIC;
+REVOKE EXECUTE ON FUNCTION pg_trim_backend_heap_free_memory(integer) FROM PUBLIC;
+
REVOKE EXECUTE ON FUNCTION pg_ls_logicalsnapdir() FROM PUBLIC;
REVOKE EXECUTE ON FUNCTION pg_ls_logicalmapdir() FROM PUBLIC;
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 7d0877c95e..d90d1649a1 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -765,6 +765,10 @@ HandleAutoVacLauncherInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ /* Perform trimming heap free memory of this process */
+ if (TrimHeapFreeMemoryPending)
+ ProcessTrimHeapFreeMemoryInterrupt();
+
/* Process sinval catchup interrupts that happened while sleeping */
ProcessCatchupInterrupt();
}
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index 199f008bcd..2109f9e822 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -605,6 +605,10 @@ HandleCheckpointerInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Perform trimming heap free memory of this process */
+ if (TrimHeapFreeMemoryPending)
+ ProcessTrimHeapFreeMemoryInterrupt();
}
/*
diff --git a/src/backend/postmaster/interrupt.c b/src/backend/postmaster/interrupt.c
index eedc0980cf..4389ff3d48 100644
--- a/src/backend/postmaster/interrupt.c
+++ b/src/backend/postmaster/interrupt.c
@@ -48,6 +48,10 @@ HandleMainLoopInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Perform trimming heap free memory of this process */
+ if (TrimHeapFreeMemoryPending)
+ ProcessTrimHeapFreeMemoryInterrupt();
}
/*
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 02f91431f5..cd72412614 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -865,6 +865,10 @@ HandlePgArchInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ /* Perform trimming heap free memory of this process */
+ if (TrimHeapFreeMemoryPending)
+ ProcessTrimHeapFreeMemoryInterrupt();
+
if (ConfigReloadPending)
{
char *archiveLib = pstrdup(XLogArchiveLibrary);
diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c
index ef6f98ebcd..5eb8d168e7 100644
--- a/src/backend/postmaster/startup.c
+++ b/src/backend/postmaster/startup.c
@@ -192,6 +192,10 @@ HandleStartupProcInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Perform trimming heap free memory of this process */
+ if (TrimHeapFreeMemoryPending)
+ ProcessTrimHeapFreeMemoryInterrupt();
}
diff --git a/src/backend/postmaster/walsummarizer.c b/src/backend/postmaster/walsummarizer.c
index daa7909382..d167e41291 100644
--- a/src/backend/postmaster/walsummarizer.c
+++ b/src/backend/postmaster/walsummarizer.c
@@ -874,6 +874,10 @@ HandleWalSummarizerInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Perform trimming heap free memory of this process */
+ if (TrimHeapFreeMemoryPending)
+ ProcessTrimHeapFreeMemoryInterrupt();
}
/*
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index 87027f27eb..3251d2823e 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -712,6 +712,9 @@ procsignal_sigusr1_handler(SIGNAL_ARGS)
if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN))
HandleRecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN);
+ if (CheckProcSignal(PROCSIG_TRIM_HEAP_FREE_MEMORY))
+ HandleTrimHeapFreeMemoryInterrupt();
+
SetLatch(MyLatch);
}
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 8bc6bea113..f90a557bbe 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3479,6 +3479,9 @@ ProcessInterrupts(void)
if (ParallelApplyMessagePending)
HandleParallelApplyMessages();
+
+ if (TrimHeapFreeMemoryPending)
+ ProcessTrimHeapFreeMemoryInterrupt();
}
/*
diff --git a/src/backend/utils/adt/mcxtfuncs.c b/src/backend/utils/adt/mcxtfuncs.c
index 6a6634e1cd..c213c4163a 100644
--- a/src/backend/utils/adt/mcxtfuncs.c
+++ b/src/backend/utils/adt/mcxtfuncs.c
@@ -305,3 +305,57 @@ pg_log_backend_memory_contexts(PG_FUNCTION_ARGS)
PG_RETURN_BOOL(true);
}
+
+/*
+ * pg_trim_backend_heap_free_memory
+ * Signal a backend or an auxiliary process to trim heap free memory.
+ *
+ * On receipt of this signal, a backend or an auxiliary process sets the flag
+ * in the signal handler, which causes the next CHECK_FOR_INTERRUPTS()
+ * or process-specific interrupt handler to log the memory contexts.
+ */
+Datum
+pg_trim_backend_heap_free_memory(PG_FUNCTION_ARGS)
+{
+ int pid = PG_GETARG_INT32(0);
+ PGPROC *proc;
+ ProcNumber procNumber = INVALID_PROC_NUMBER;
+
+ /*
+ * See if the process with given pid is a backend or an auxiliary process.
+ */
+ proc = BackendPidGetProc(pid);
+ if (proc == NULL)
+ proc = AuxiliaryPidGetProc(pid);
+
+ /*
+ * BackendPidGetProc() and AuxiliaryPidGetProc() return NULL if the pid
+ * isn't valid; but by the time we reach kill(), a process for which we
+ * get a valid proc here might have terminated on its own. There's no way
+ * to acquire a lock on an arbitrary process to prevent that. But since
+ * this mechanism is usually used to debug a backend or an auxiliary
+ * process running and consuming lots of memory, that it might end on its
+ * own first and its memory contexts are not logged is not a problem.
+ */
+ if (proc == NULL)
+ {
+ /*
+ * This is just a warning so a loop-through-resultset will not abort
+ * if one backend terminated on its own during the run.
+ */
+ ereport(WARNING,
+ (errmsg("PID %d is not a PostgreSQL server process", pid)));
+ PG_RETURN_BOOL(false);
+ }
+
+ procNumber = GetNumberFromPGProc(proc);
+ if (SendProcSignal(pid, PROCSIG_TRIM_HEAP_FREE_MEMORY, procNumber) < 0)
+ {
+ /* Again, just a warning to allow loops */
+ ereport(WARNING,
+ (errmsg("could not send signal to process %d: %m", pid)));
+ PG_RETURN_BOOL(false);
+ }
+
+ PG_RETURN_BOOL(true);
+}
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index 03a54451ac..e90a61affe 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -39,6 +39,7 @@ volatile sig_atomic_t IdleSessionTimeoutPending = false;
volatile sig_atomic_t ProcSignalBarrierPending = false;
volatile sig_atomic_t LogMemoryContextPending = false;
volatile sig_atomic_t IdleStatsUpdateTimeoutPending = false;
+volatile sig_atomic_t TrimHeapFreeMemoryPending = false;
volatile uint32 InterruptHoldoffCount = 0;
volatile uint32 QueryCancelHoldoffCount = 0;
volatile uint32 CritSectionCount = 0;
diff --git a/src/backend/utils/mmgr/Makefile b/src/backend/utils/mmgr/Makefile
index 01a1fb8527..395f119d77 100644
--- a/src/backend/utils/mmgr/Makefile
+++ b/src/backend/utils/mmgr/Makefile
@@ -21,6 +21,7 @@ OBJS = \
generation.o \
mcxt.o \
memdebug.o \
+ memtrim.o \
portalmem.o \
slab.o
diff --git a/src/backend/utils/mmgr/memtrim.c b/src/backend/utils/mmgr/memtrim.c
new file mode 100644
index 0000000000..5664eb9f80
--- /dev/null
+++ b/src/backend/utils/mmgr/memtrim.c
@@ -0,0 +1,69 @@
+/*-------------------------------------------------------------------------
+ *
+ * memtrim.c
+ * Declarations used in memory implementations.
+ *
+ *
+ * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/backend/utils/mmgr/memtrim.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include <malloc.h>
+
+#include "postgres.h"
+
+#include "miscadmin.h"
+#include "utils/memutils.h"
+
+/*
+ * HandleTrimHeapFreeMemoryInterrupt
+ * Handle receipt of an interrupt indicating trimming of heap free
+ * memory.
+ *
+ * All the actual work is deferred to ProcessTrimHeapFreeMemoryInterrupt(),
+ * because we cannot safely precess trim inside the signal handler.
+ */
+void
+HandleTrimHeapFreeMemoryInterrupt(void)
+{
+ InterruptPending = true;
+ TrimHeapFreeMemoryPending = true;
+ /* latch will be set by procsignal_sigusr1_handler */
+}
+
+/*
+ * ProcessTrimHeapFreeMemoryInterrupt
+ * Perform trimming of heap free memory of this backend process.
+ *
+ * Any backend that participates in ProcSignal signaling must arrange
+ * to call this function if we see TrimHeapFreeMemoryPending set.
+ * It is called from CHECK_FOR_INTERRUPTS(), which is enough because
+ * the target process for trimming of heap free memory is a backend.
+ */
+void
+ProcessTrimHeapFreeMemoryInterrupt(void)
+{
+ TrimHeapFreeMemoryPending = false;
+
+ /*
+ * Use LOG_SERVER_ONLY to prevent this message from being sent to the
+ * connected client.
+ */
+ ereport(LOG_SERVER_ONLY,
+ (errhidestmt(true),
+ errhidecontext(true),
+ errmsg("trimming heap free memory of PID %d", MyProcPid)));
+
+ /*
+ * The malloc_trim() function attempts to release free memory from
+ * the heap (by calling sbrk(2) or madvise(2) with suitable
+ * arguments).
+ * The argument is 0, only the minimum amount of memory is maintained
+ * at the top of the heap (i.e., one page or less).
+ */
+ malloc_trim(0);
+}
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 4abc6d9526..6186d9247d 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8348,6 +8348,12 @@
prorettype => 'bool', proargtypes => 'int4',
prosrc => 'pg_log_backend_memory_contexts' },
+# logging memory contexts of the specified backend
+{ oid => '4551', descr => 'Trim Heap free memory of the specified backend',
+ proname => 'pg_trim_backend_heap_free_memory', provolatile => 'v',
+ prorettype => 'bool', proargtypes => 'int4',
+ prosrc => 'pg_trim_backend_heap_free_memory' },
+
# non-persistent series generator
{ oid => '1066', descr => 'non-persistent series generator',
proname => 'generate_series', prorows => '1000',
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 25348e71eb..5a676eeb9e 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -100,6 +100,8 @@ extern PGDLLIMPORT volatile sig_atomic_t IdleStatsUpdateTimeoutPending;
extern PGDLLIMPORT volatile sig_atomic_t CheckClientConnectionPending;
extern PGDLLIMPORT volatile sig_atomic_t ClientConnectionLost;
+extern PGDLLIMPORT volatile sig_atomic_t TrimHeapFreeMemoryPending;
+
/* these are marked volatile because they are examined by signal handlers: */
extern PGDLLIMPORT volatile uint32 InterruptHoldoffCount;
extern PGDLLIMPORT volatile uint32 QueryCancelHoldoffCount;
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index f94c11a9a8..0d4a3a42a2 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -48,6 +48,8 @@ typedef enum
PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK,
PROCSIG_RECOVERY_CONFLICT_LAST = PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK,
+ PROCSIG_TRIM_HEAP_FREE_MEMORY, /* ask backend to release free memory from the heap */
+
NUM_PROCSIGNALS /* Must be last! */
} ProcSignalReason;
diff --git a/src/include/utils/memutils.h b/src/include/utils/memutils.h
index cd9596ff21..61d4d6d252 100644
--- a/src/include/utils/memutils.h
+++ b/src/include/utils/memutils.h
@@ -104,6 +104,9 @@ extern void MemoryContextCheck(MemoryContext context);
extern void HandleLogMemoryContextInterrupt(void);
extern void ProcessLogMemoryContextInterrupt(void);
+extern void HandleTrimHeapFreeMemoryInterrupt(void);
+extern void ProcessTrimHeapFreeMemoryInterrupt(void);
+
/*
* Memory-context-type-specific functions
*/
diff --git a/src/test/regress/expected/misc_functions.out b/src/test/regress/expected/misc_functions.out
index 35fb72f302..9bde10b2b1 100644
--- a/src/test/regress/expected/misc_functions.out
+++ b/src/test/regress/expected/misc_functions.out
@@ -365,6 +365,17 @@ RESET ROLE;
REVOKE EXECUTE ON FUNCTION pg_log_backend_memory_contexts(integer)
FROM regress_log_memory;
DROP ROLE regress_log_memory;
+--
+-- pg_trim_backend_heap_free_memory()
+--
+-- Trim the heap free memory.
+--
+SELECT pg_trim_backend_heap_free_memory(pg_backend_pid());
+ pg_trim_backend_heap_free_memory
+----------------------------------
+ t
+(1 row)
+
--
-- Test some built-in SRFs
--
diff --git a/src/test/regress/sql/misc_functions.sql b/src/test/regress/sql/misc_functions.sql
index e570783453..d1cd2440ee 100644
--- a/src/test/regress/sql/misc_functions.sql
+++ b/src/test/regress/sql/misc_functions.sql
@@ -143,6 +143,14 @@ REVOKE EXECUTE ON FUNCTION pg_log_backend_memory_contexts(integer)
DROP ROLE regress_log_memory;
+--
+-- pg_trim_backend_heap_free_memory()
+--
+-- Trim the heap free memory.
+--
+
+SELECT pg_trim_backend_heap_free_memory(pg_backend_pid());
+
--
-- Test some built-in SRFs
--
Hi Ashutosh, thank you for your response.
Firstly, the purpose of caching memory in malloc is for performance, so
when we execute malloc_trim(), it will affect the efficiency of memory
usage in the subsequent operation. Secondly, the function of malloc_trim()
is to lock and traverse the bins, then execute madvise on the memory that
can be released. When there is a lot of memory in the bins, the traversal
time will also increase. I once placed malloc_trim() to execute at the end
of each query, which resulted in a 20% performance drop. Therefore, I use
it as such a function. The new v2 patch has included the omitted code.
Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> 于2024年8月23日周五 20:02写道:
Show quoted text
Hi Shawn,
On Fri, Aug 23, 2024 at 2:24 PM shawn wang <shawn.wang.pg@gmail.com>
wrote:Hi hackers,
Currently, all processes in PostgreSQL actually use malloc to allocate
and free memory. In the case of long connections where business queries are
executed over extended periods, the distribution of memory can become
extremely complex.Under certain circumstances, a common issue in memory usage due to the
caching strategy of malloc may arise: even if memory is released through
the free function, it may not be returned to the OS in a timely manner.
This can lead to high system memory usage, affecting performance and the
operation of other applications, and may even result in Out-Of-Memory (OOM)
errors.To address this issue, I have developed a new function called
pg_trim_backend_heap_free_memory, based on the existing
pg_log_backend_memory_contexts function. This function triggers the
specified process to execute the malloc_trim operation by sending signals,
thereby releasing as much unreturned memory to the operating system as
possible. This not only helps to optimize memory usage but can also
significantly enhance system performance under memory pressure.Here is an example of using the pg_trim_backend_heap_free_memory
function to demonstrate its effect:
CREATE OR REPLACE FUNCTION public.partition_create(schemaname character
varying, numberofpartition integer)
RETURNS integer
LANGUAGE plpgsql
AS $function$
declare
currentTableId integer;
currentSchemaName varchar(100);
currentTableName varchar(100);
begin
execute 'create schema ' || schemaname;
execute 'create table ' || schemaname || '.' || schemaname ||'hashtable (p1 text, p2 text, p3 text, p4 int, p5 int, p6 int, p7 int, p8
text, p9 name, p10 varchar, p11 text, p12 text, p13 text) PARTITION BY
HASH(p1);';currentTableId := 1;
loop
currentTableName := schemaname || '.' || schemaname || 'hashtable' ||ltrim(currentTableId::varchar(10));
execute 'create table ' || currentTableName || ' PARTITION OF ' ||
schemaname || '.' || schemaname || 'hashtable' || ' FOR VALUES WITH(MODULUS
' || numberofpartition || ', REMAINDER ' || currentTableId - 1 || ')';currentTableId := currentTableId + 1;
if (currentTableId > numberofpartition) then exit; end if;
end loop;
return currentTableId - 1;
END $function$;select public.partition_create('test3', 5000);
select public.partition_create('test4', 5000);
select count(*) from test4.test4hashtable a, test3.test3hashtable bwhere a.p1=b.p1;
You are now about to see the memory size of the process executing the
query.
postgres 68673 1.2 0.0 610456 124768 ? Ss 08:25 0:01
postgres: postgres postgres [local] idle
Size: 89600 kB
KernelPageSize: 4 kB
MMUPageSize: 4 kB
Rss: 51332 kB
Pss: 51332 kB02b65000-082e5000 rw-p 00000000 00:00 0
[heap]
After use pg_trim_backend_heap_free_memory, you will see:
postgres=# select pg_trim_backend_heap_free_memory(pg_backend_pid());
2024-08-23 08:27:53.958 UTC [68673] LOG: trimming heap free memory ofPID 68673
pg_trim_backend_heap_free_memory
----------------------------------
t
(1 row)
02b65000-082e5000 rw-p 00000000 00:00 0[heap]
Size: 89600 kB
KernelPageSize: 4 kB
MMUPageSize: 4 kB
Rss: 4888 kB
Pss: 4888 kBpostgres 68673 1.2 0.0 610456 75244 ? Ss 08:26 0:01
postgres: postgres postgres [local] idle
Looking forward to your feedback,
Looks useful.
How much time does malloc_trim() take to finish? Does it affect the
current database activity in that backend? It may be good to see
effect of this function by firing the function on random backends
while the query is running through pgbench.In the patch I don't see definitions of
ProcessTrimHeapFreeMemoryInterrupt() and
HandleTrimHeapFreeMemoryInterrupt(). Am I missing something?--
Best Wishes,
Ashutosh Bapat
Hi Shawn,
It will be good to document usage of this function. Please add
document changes in your patch. We need to document the impact of this
function so that users can judiciously decide whether or not to use
this function and under what conditions. Also they would know what to
expect when they use this function.
Running it after a query finishes is one thing but that can't be
guaranteed because of the asynchronous nature of signal handlers.
malloc_trim() may be called while a query is being executed. We need
to assess that impact as well.
Can you please share some numbers - TPS, latency etc. with and without
this function invoked during a benchmark run?
--
Best Wishes,
Ashutosh Bapat
Show quoted text
On Sat, Aug 24, 2024 at 8:12 AM shawn wang <shawn.wang.pg@gmail.com> wrote:
Hi Ashutosh, thank you for your response.
Firstly, the purpose of caching memory in malloc is for performance, so when we execute malloc_trim(), it will affect the efficiency of memory usage in the subsequent operation. Secondly, the function of malloc_trim() is to lock and traverse the bins, then execute madvise on the memory that can be released. When there is a lot of memory in the bins, the traversal time will also increase. I once placed malloc_trim() to execute at the end of each query, which resulted in a 20% performance drop. Therefore, I use it as such a function. The new v2 patch has included the omitted code.Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> 于2024年8月23日周五 20:02写道:
Hi Shawn,
On Fri, Aug 23, 2024 at 2:24 PM shawn wang <shawn.wang.pg@gmail.com> wrote:
Hi hackers,
Currently, all processes in PostgreSQL actually use malloc to allocate and free memory. In the case of long connections where business queries are executed over extended periods, the distribution of memory can become extremely complex.
Under certain circumstances, a common issue in memory usage due to the caching strategy of malloc may arise: even if memory is released through the free function, it may not be returned to the OS in a timely manner. This can lead to high system memory usage, affecting performance and the operation of other applications, and may even result in Out-Of-Memory (OOM) errors.
To address this issue, I have developed a new function called pg_trim_backend_heap_free_memory, based on the existing pg_log_backend_memory_contexts function. This function triggers the specified process to execute the malloc_trim operation by sending signals, thereby releasing as much unreturned memory to the operating system as possible. This not only helps to optimize memory usage but can also significantly enhance system performance under memory pressure.
Here is an example of using the pg_trim_backend_heap_free_memory function to demonstrate its effect:
CREATE OR REPLACE FUNCTION public.partition_create(schemaname character varying, numberofpartition integer)
RETURNS integer
LANGUAGE plpgsql
AS $function$
declare
currentTableId integer;
currentSchemaName varchar(100);
currentTableName varchar(100);
begin
execute 'create schema ' || schemaname;
execute 'create table ' || schemaname || '.' || schemaname || 'hashtable (p1 text, p2 text, p3 text, p4 int, p5 int, p6 int, p7 int, p8 text, p9 name, p10 varchar, p11 text, p12 text, p13 text) PARTITION BY HASH(p1);';
currentTableId := 1;
loop
currentTableName := schemaname || '.' || schemaname || 'hashtable' || ltrim(currentTableId::varchar(10));
execute 'create table ' || currentTableName || ' PARTITION OF ' || schemaname || '.' || schemaname || 'hashtable' || ' FOR VALUES WITH(MODULUS ' || numberofpartition || ', REMAINDER ' || currentTableId - 1 || ')';
currentTableId := currentTableId + 1;
if (currentTableId > numberofpartition) then exit; end if;
end loop;
return currentTableId - 1;
END $function$;select public.partition_create('test3', 5000);
select public.partition_create('test4', 5000);
select count(*) from test4.test4hashtable a, test3.test3hashtable b where a.p1=b.p1;You are now about to see the memory size of the process executing the query.
postgres 68673 1.2 0.0 610456 124768 ? Ss 08:25 0:01 postgres: postgres postgres [local] idle
Size: 89600 kB
KernelPageSize: 4 kB
MMUPageSize: 4 kB
Rss: 51332 kB
Pss: 51332 kB02b65000-082e5000 rw-p 00000000 00:00 0 [heap]
After use pg_trim_backend_heap_free_memory, you will see:
postgres=# select pg_trim_backend_heap_free_memory(pg_backend_pid());
2024-08-23 08:27:53.958 UTC [68673] LOG: trimming heap free memory of PID 68673
pg_trim_backend_heap_free_memory
----------------------------------
t
(1 row)
02b65000-082e5000 rw-p 00000000 00:00 0 [heap]
Size: 89600 kB
KernelPageSize: 4 kB
MMUPageSize: 4 kB
Rss: 4888 kB
Pss: 4888 kBpostgres 68673 1.2 0.0 610456 75244 ? Ss 08:26 0:01 postgres: postgres postgres [local] idle
Looking forward to your feedback,
Looks useful.
How much time does malloc_trim() take to finish? Does it affect the
current database activity in that backend? It may be good to see
effect of this function by firing the function on random backends
while the query is running through pgbench.In the patch I don't see definitions of
ProcessTrimHeapFreeMemoryInterrupt() and
HandleTrimHeapFreeMemoryInterrupt(). Am I missing something?--
Best Wishes,
Ashutosh Bapat
Hi Ashutosh,
Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> 于2024年8月26日周一 19:05写道:
Hi Shawn,
It will be good to document usage of this function. Please add
document changes in your patch. We need to document the impact of this
function so that users can judiciously decide whether or not to use
this function and under what conditions. Also they would know what to
expect when they use this function.
I have already incorporated the usage of this function into the new patch.
Currently, there is no memory information that can be extremely accurate to
reflect whether a trim operation should be performed. Here are two
conditions
that can be used as references:
1. Check the difference between the process's memory usage (for example,
the top command, due to the relationship with shared memory, it is necessary
to subtract SHR from RES) and the statistics of the memory context. If the
difference is very large, this function should be used to release memory;
2. Execute malloc_stats(). If the system bytes are greater than the
in-use bytes, this indicates that this function can be used to release
memory.
Running it after a query finishes is one thing but that can't be
guaranteed because of the asynchronous nature of signal handlers.
malloc_trim() may be called while a query is being executed. We need
to assess that impact as well.Can you please share some numbers - TPS, latency etc. with and without
this function invoked during a benchmark run?
I have placed malloc_trim() at the end of the exec_simple_query function,
so that malloc_trim() is executed once for each SQL statement executed. I
used pgbench to reproduce the performance impact,
and the results are as follows.
*Database preparation:*
create database testc;
create user t1;
alter database testc owner to t1;
./pgbench testc -U t1 -i -s 100
./pgbench testc -U t1 -S -c 100 -j 100 -T 600
*Without Trim*:
$./pgbench testc -U t1 -S -c 100 -j 100 -T 600
pgbench (18devel)
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: simple
number of clients: 100
number of threads: 100
maximum number of tries: 1
duration: 600 s
number of transactions actually processed: 551984376
number of failed transactions: 0 (0.000%)
latency average = 0.109 ms
initial connection time = 23.569 ms
tps = 920001.842189 (without initial connection time)
*With Trim :*
Show quoted text
$./pgbench testc -U t1 -S -c 100 -j 100 -T 600
pgbench (18devel)
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: simple
number of clients: 100
number of threads: 100
maximum number of tries: 1
duration: 600 s
number of transactions actually processed: 470690787
number of failed transactions: 0 (0.000%)
latency average = 0.127 ms
initial connection time = 23.632 ms
tps = 784511.901558 (without initial connection time)
Attachments:
v3-0001-Trim-the-free-heap-Memory.patchapplication/x-patch; name=v3-0001-Trim-the-free-heap-Memory.patchDownload
From 6d286d506ba5ed2dff012537766e7874952413ac Mon Sep 17 00:00:00 2001
From: Shawn Wang <shawn.wang.pg@gmail.com>
Date: Wed, 28 Aug 2024 18:20:57 +0800
Subject: [PATCH] Trim the free heap Memory.
ALL processes in PostgreSQL actually use malloc to allocate and free memory.
In the case of long connections where business queries are executed over extended
periods, the distribution of memory can become extremely complex.
Under certain circumstances, a common issue in memory usage due to the caching
strategy of malloc may arise: even if memory is released through the free function,
it may not be returned to the OS in a timely manner. This can lead to high system
memory usage, affecting performance and the operation of other applications, and
may even result in Out-Of-Memory (OOM) errors.
Examine the difference between the memory usage of a process (for example, using
the top command, where due to shared memory, it is necessary to subtract SHR from RES)
and the statistics of the memory context. If the difference is very large, or after
executing malloc_stats(), if the system bytes are greater than the in-use bytes,
this indicates that this process needs to release the free heap memory.
---
doc/src/sgml/func.sgml | 22 +++++++
src/backend/catalog/system_functions.sql | 2 +
src/backend/postmaster/autovacuum.c | 4 ++
src/backend/postmaster/checkpointer.c | 4 ++
src/backend/postmaster/interrupt.c | 4 ++
src/backend/postmaster/pgarch.c | 4 ++
src/backend/postmaster/startup.c | 4 ++
src/backend/postmaster/walsummarizer.c | 4 ++
src/backend/storage/ipc/procsignal.c | 3 +
src/backend/tcop/postgres.c | 3 +
src/backend/utils/adt/mcxtfuncs.c | 54 +++++++++++++++
src/backend/utils/init/globals.c | 1 +
src/backend/utils/mmgr/Makefile | 1 +
src/backend/utils/mmgr/memtrim.c | 69 ++++++++++++++++++++
src/include/catalog/pg_proc.dat | 6 ++
src/include/miscadmin.h | 2 +
src/include/storage/procsignal.h | 2 +
src/include/utils/memutils.h | 3 +
src/test/regress/expected/misc_functions.out | 11 ++++
src/test/regress/sql/misc_functions.sql | 8 +++
20 files changed, 211 insertions(+)
create mode 100644 src/backend/utils/mmgr/memtrim.c
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 461fc3f437..8b396371c8 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -28321,6 +28321,28 @@ acl | {postgres=arwdDxtm/postgres,foo=r/postgres}
<literal>false</literal> is returned.
</para></entry>
</row>
+
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>pg_trim_backend_heap_free_memory</primary>
+ </indexterm>
+ <function>pg_trim_backend_heap_free_memory</function> ( <parameter>pid</parameter> <type>integer</type> )
+ <returnvalue>boolean</returnvalue>
+ </para>
+ <para>
+ Release the free heap memory in a specified process that has not been reclaimed by the Operating System.
+ When a process's memory usage significantly exceeds its Memory Context size, there may be memory that has
+ been freed but not yet returned to the operating system. In such cases, this function can be called to
+ forcibly release this memory. This function will send requests to backend and auxiliary processes,
+ excluding the logging process. The relevant processes will execute the malloc_trim() function during
+ interrupt handling to forcibly release free memory. If the signal is successfully sent, the function
+ will return <literal>true</literal>; if the sending fails, it will return <literal>false</literal>.
+ Additionally, the usage of this function will be logged to facilitate tracking of the process's memory
+ trimming operations.
+ </para></entry>
+ </row>
+
</tbody>
</tgroup>
</table>
diff --git a/src/backend/catalog/system_functions.sql b/src/backend/catalog/system_functions.sql
index 623b9539b1..3aa3b60288 100644
--- a/src/backend/catalog/system_functions.sql
+++ b/src/backend/catalog/system_functions.sql
@@ -754,6 +754,8 @@ REVOKE EXECUTE ON FUNCTION pg_ls_dir(text,boolean,boolean) FROM public;
REVOKE EXECUTE ON FUNCTION pg_log_backend_memory_contexts(integer) FROM PUBLIC;
+REVOKE EXECUTE ON FUNCTION pg_trim_backend_heap_free_memory(integer) FROM PUBLIC;
+
REVOKE EXECUTE ON FUNCTION pg_ls_logicalsnapdir() FROM PUBLIC;
REVOKE EXECUTE ON FUNCTION pg_ls_logicalmapdir() FROM PUBLIC;
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 7d0877c95e..d90d1649a1 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -765,6 +765,10 @@ HandleAutoVacLauncherInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ /* Perform trimming heap free memory of this process */
+ if (TrimHeapFreeMemoryPending)
+ ProcessTrimHeapFreeMemoryInterrupt();
+
/* Process sinval catchup interrupts that happened while sleeping */
ProcessCatchupInterrupt();
}
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index 199f008bcd..2109f9e822 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -605,6 +605,10 @@ HandleCheckpointerInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Perform trimming heap free memory of this process */
+ if (TrimHeapFreeMemoryPending)
+ ProcessTrimHeapFreeMemoryInterrupt();
}
/*
diff --git a/src/backend/postmaster/interrupt.c b/src/backend/postmaster/interrupt.c
index eedc0980cf..4389ff3d48 100644
--- a/src/backend/postmaster/interrupt.c
+++ b/src/backend/postmaster/interrupt.c
@@ -48,6 +48,10 @@ HandleMainLoopInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Perform trimming heap free memory of this process */
+ if (TrimHeapFreeMemoryPending)
+ ProcessTrimHeapFreeMemoryInterrupt();
}
/*
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 02f91431f5..cd72412614 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -865,6 +865,10 @@ HandlePgArchInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ /* Perform trimming heap free memory of this process */
+ if (TrimHeapFreeMemoryPending)
+ ProcessTrimHeapFreeMemoryInterrupt();
+
if (ConfigReloadPending)
{
char *archiveLib = pstrdup(XLogArchiveLibrary);
diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c
index ef6f98ebcd..5eb8d168e7 100644
--- a/src/backend/postmaster/startup.c
+++ b/src/backend/postmaster/startup.c
@@ -192,6 +192,10 @@ HandleStartupProcInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Perform trimming heap free memory of this process */
+ if (TrimHeapFreeMemoryPending)
+ ProcessTrimHeapFreeMemoryInterrupt();
}
diff --git a/src/backend/postmaster/walsummarizer.c b/src/backend/postmaster/walsummarizer.c
index daa7909382..d167e41291 100644
--- a/src/backend/postmaster/walsummarizer.c
+++ b/src/backend/postmaster/walsummarizer.c
@@ -874,6 +874,10 @@ HandleWalSummarizerInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Perform trimming heap free memory of this process */
+ if (TrimHeapFreeMemoryPending)
+ ProcessTrimHeapFreeMemoryInterrupt();
}
/*
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index 87027f27eb..3251d2823e 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -712,6 +712,9 @@ procsignal_sigusr1_handler(SIGNAL_ARGS)
if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN))
HandleRecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN);
+ if (CheckProcSignal(PROCSIG_TRIM_HEAP_FREE_MEMORY))
+ HandleTrimHeapFreeMemoryInterrupt();
+
SetLatch(MyLatch);
}
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 8bc6bea113..f90a557bbe 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3479,6 +3479,9 @@ ProcessInterrupts(void)
if (ParallelApplyMessagePending)
HandleParallelApplyMessages();
+
+ if (TrimHeapFreeMemoryPending)
+ ProcessTrimHeapFreeMemoryInterrupt();
}
/*
diff --git a/src/backend/utils/adt/mcxtfuncs.c b/src/backend/utils/adt/mcxtfuncs.c
index 6a6634e1cd..c213c4163a 100644
--- a/src/backend/utils/adt/mcxtfuncs.c
+++ b/src/backend/utils/adt/mcxtfuncs.c
@@ -305,3 +305,57 @@ pg_log_backend_memory_contexts(PG_FUNCTION_ARGS)
PG_RETURN_BOOL(true);
}
+
+/*
+ * pg_trim_backend_heap_free_memory
+ * Signal a backend or an auxiliary process to trim heap free memory.
+ *
+ * On receipt of this signal, a backend or an auxiliary process sets the flag
+ * in the signal handler, which causes the next CHECK_FOR_INTERRUPTS()
+ * or process-specific interrupt handler to log the memory contexts.
+ */
+Datum
+pg_trim_backend_heap_free_memory(PG_FUNCTION_ARGS)
+{
+ int pid = PG_GETARG_INT32(0);
+ PGPROC *proc;
+ ProcNumber procNumber = INVALID_PROC_NUMBER;
+
+ /*
+ * See if the process with given pid is a backend or an auxiliary process.
+ */
+ proc = BackendPidGetProc(pid);
+ if (proc == NULL)
+ proc = AuxiliaryPidGetProc(pid);
+
+ /*
+ * BackendPidGetProc() and AuxiliaryPidGetProc() return NULL if the pid
+ * isn't valid; but by the time we reach kill(), a process for which we
+ * get a valid proc here might have terminated on its own. There's no way
+ * to acquire a lock on an arbitrary process to prevent that. But since
+ * this mechanism is usually used to debug a backend or an auxiliary
+ * process running and consuming lots of memory, that it might end on its
+ * own first and its memory contexts are not logged is not a problem.
+ */
+ if (proc == NULL)
+ {
+ /*
+ * This is just a warning so a loop-through-resultset will not abort
+ * if one backend terminated on its own during the run.
+ */
+ ereport(WARNING,
+ (errmsg("PID %d is not a PostgreSQL server process", pid)));
+ PG_RETURN_BOOL(false);
+ }
+
+ procNumber = GetNumberFromPGProc(proc);
+ if (SendProcSignal(pid, PROCSIG_TRIM_HEAP_FREE_MEMORY, procNumber) < 0)
+ {
+ /* Again, just a warning to allow loops */
+ ereport(WARNING,
+ (errmsg("could not send signal to process %d: %m", pid)));
+ PG_RETURN_BOOL(false);
+ }
+
+ PG_RETURN_BOOL(true);
+}
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index 03a54451ac..e90a61affe 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -39,6 +39,7 @@ volatile sig_atomic_t IdleSessionTimeoutPending = false;
volatile sig_atomic_t ProcSignalBarrierPending = false;
volatile sig_atomic_t LogMemoryContextPending = false;
volatile sig_atomic_t IdleStatsUpdateTimeoutPending = false;
+volatile sig_atomic_t TrimHeapFreeMemoryPending = false;
volatile uint32 InterruptHoldoffCount = 0;
volatile uint32 QueryCancelHoldoffCount = 0;
volatile uint32 CritSectionCount = 0;
diff --git a/src/backend/utils/mmgr/Makefile b/src/backend/utils/mmgr/Makefile
index 01a1fb8527..395f119d77 100644
--- a/src/backend/utils/mmgr/Makefile
+++ b/src/backend/utils/mmgr/Makefile
@@ -21,6 +21,7 @@ OBJS = \
generation.o \
mcxt.o \
memdebug.o \
+ memtrim.o \
portalmem.o \
slab.o
diff --git a/src/backend/utils/mmgr/memtrim.c b/src/backend/utils/mmgr/memtrim.c
new file mode 100644
index 0000000000..5664eb9f80
--- /dev/null
+++ b/src/backend/utils/mmgr/memtrim.c
@@ -0,0 +1,69 @@
+/*-------------------------------------------------------------------------
+ *
+ * memtrim.c
+ * Declarations used in memory implementations.
+ *
+ *
+ * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/backend/utils/mmgr/memtrim.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include <malloc.h>
+
+#include "postgres.h"
+
+#include "miscadmin.h"
+#include "utils/memutils.h"
+
+/*
+ * HandleTrimHeapFreeMemoryInterrupt
+ * Handle receipt of an interrupt indicating trimming of heap free
+ * memory.
+ *
+ * All the actual work is deferred to ProcessTrimHeapFreeMemoryInterrupt(),
+ * because we cannot safely precess trim inside the signal handler.
+ */
+void
+HandleTrimHeapFreeMemoryInterrupt(void)
+{
+ InterruptPending = true;
+ TrimHeapFreeMemoryPending = true;
+ /* latch will be set by procsignal_sigusr1_handler */
+}
+
+/*
+ * ProcessTrimHeapFreeMemoryInterrupt
+ * Perform trimming of heap free memory of this backend process.
+ *
+ * Any backend that participates in ProcSignal signaling must arrange
+ * to call this function if we see TrimHeapFreeMemoryPending set.
+ * It is called from CHECK_FOR_INTERRUPTS(), which is enough because
+ * the target process for trimming of heap free memory is a backend.
+ */
+void
+ProcessTrimHeapFreeMemoryInterrupt(void)
+{
+ TrimHeapFreeMemoryPending = false;
+
+ /*
+ * Use LOG_SERVER_ONLY to prevent this message from being sent to the
+ * connected client.
+ */
+ ereport(LOG_SERVER_ONLY,
+ (errhidestmt(true),
+ errhidecontext(true),
+ errmsg("trimming heap free memory of PID %d", MyProcPid)));
+
+ /*
+ * The malloc_trim() function attempts to release free memory from
+ * the heap (by calling sbrk(2) or madvise(2) with suitable
+ * arguments).
+ * The argument is 0, only the minimum amount of memory is maintained
+ * at the top of the heap (i.e., one page or less).
+ */
+ malloc_trim(0);
+}
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 4abc6d9526..6186d9247d 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8348,6 +8348,12 @@
prorettype => 'bool', proargtypes => 'int4',
prosrc => 'pg_log_backend_memory_contexts' },
+# logging memory contexts of the specified backend
+{ oid => '4551', descr => 'Trim Heap free memory of the specified backend',
+ proname => 'pg_trim_backend_heap_free_memory', provolatile => 'v',
+ prorettype => 'bool', proargtypes => 'int4',
+ prosrc => 'pg_trim_backend_heap_free_memory' },
+
# non-persistent series generator
{ oid => '1066', descr => 'non-persistent series generator',
proname => 'generate_series', prorows => '1000',
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 25348e71eb..5a676eeb9e 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -100,6 +100,8 @@ extern PGDLLIMPORT volatile sig_atomic_t IdleStatsUpdateTimeoutPending;
extern PGDLLIMPORT volatile sig_atomic_t CheckClientConnectionPending;
extern PGDLLIMPORT volatile sig_atomic_t ClientConnectionLost;
+extern PGDLLIMPORT volatile sig_atomic_t TrimHeapFreeMemoryPending;
+
/* these are marked volatile because they are examined by signal handlers: */
extern PGDLLIMPORT volatile uint32 InterruptHoldoffCount;
extern PGDLLIMPORT volatile uint32 QueryCancelHoldoffCount;
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index f94c11a9a8..0d4a3a42a2 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -48,6 +48,8 @@ typedef enum
PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK,
PROCSIG_RECOVERY_CONFLICT_LAST = PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK,
+ PROCSIG_TRIM_HEAP_FREE_MEMORY, /* ask backend to release free memory from the heap */
+
NUM_PROCSIGNALS /* Must be last! */
} ProcSignalReason;
diff --git a/src/include/utils/memutils.h b/src/include/utils/memutils.h
index cd9596ff21..61d4d6d252 100644
--- a/src/include/utils/memutils.h
+++ b/src/include/utils/memutils.h
@@ -104,6 +104,9 @@ extern void MemoryContextCheck(MemoryContext context);
extern void HandleLogMemoryContextInterrupt(void);
extern void ProcessLogMemoryContextInterrupt(void);
+extern void HandleTrimHeapFreeMemoryInterrupt(void);
+extern void ProcessTrimHeapFreeMemoryInterrupt(void);
+
/*
* Memory-context-type-specific functions
*/
diff --git a/src/test/regress/expected/misc_functions.out b/src/test/regress/expected/misc_functions.out
index 35fb72f302..9bde10b2b1 100644
--- a/src/test/regress/expected/misc_functions.out
+++ b/src/test/regress/expected/misc_functions.out
@@ -365,6 +365,17 @@ RESET ROLE;
REVOKE EXECUTE ON FUNCTION pg_log_backend_memory_contexts(integer)
FROM regress_log_memory;
DROP ROLE regress_log_memory;
+--
+-- pg_trim_backend_heap_free_memory()
+--
+-- Trim the heap free memory.
+--
+SELECT pg_trim_backend_heap_free_memory(pg_backend_pid());
+ pg_trim_backend_heap_free_memory
+----------------------------------
+ t
+(1 row)
+
--
-- Test some built-in SRFs
--
diff --git a/src/test/regress/sql/misc_functions.sql b/src/test/regress/sql/misc_functions.sql
index e570783453..d1cd2440ee 100644
--- a/src/test/regress/sql/misc_functions.sql
+++ b/src/test/regress/sql/misc_functions.sql
@@ -143,6 +143,14 @@ REVOKE EXECUTE ON FUNCTION pg_log_backend_memory_contexts(integer)
DROP ROLE regress_log_memory;
+--
+-- pg_trim_backend_heap_free_memory()
+--
+-- Trim the heap free memory.
+--
+
+SELECT pg_trim_backend_heap_free_memory(pg_backend_pid());
+
--
-- Test some built-in SRFs
--
--
2.39.3
Unfortunately, I still see a compiling issue with this patch,
memtrim.c:15:10: fatal error: 'malloc.h' file not found
#include <malloc.h>
^~~~~~~~~~
1 error generated.
On Wed, 28 Aug 2024 at 12:54, shawn wang <shawn.wang.pg@gmail.com> wrote:
Hi Ashutosh,
Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> 于2024年8月26日周一 19:05写道:
Hi Shawn,
It will be good to document usage of this function. Please add
document changes in your patch. We need to document the impact of this
function so that users can judiciously decide whether or not to use
this function and under what conditions. Also they would know what to
expect when they use this function.I have already incorporated the usage of this function into the new patch.
Currently, there is no memory information that can be extremely accurate to
reflect whether a trim operation should be performed. Here are two
conditions
that can be used as references:
1. Check the difference between the process's memory usage (for example,
the top command, due to the relationship with shared memory, it is
necessary
to subtract SHR from RES) and the statistics of the memory context. If the
difference is very large, this function should be used to release memory;
2. Execute malloc_stats(). If the system bytes are greater than the
in-use bytes, this indicates that this function can be used to release
memory.Running it after a query finishes is one thing but that can't be
guaranteed because of the asynchronous nature of signal handlers.
malloc_trim() may be called while a query is being executed. We need
to assess that impact as well.Can you please share some numbers - TPS, latency etc. with and without
this function invoked during a benchmark run?I have placed malloc_trim() at the end of the exec_simple_query function,
so that malloc_trim() is executed once for each SQL statement executed. I
used pgbench to reproduce the performance impact,
and the results are as follows.
*Database preparation:*create database testc;
create user t1;
alter database testc owner to t1;
./pgbench testc -U t1 -i -s 100
./pgbench testc -U t1 -S -c 100 -j 100 -T 600*Without Trim*:
$./pgbench testc -U t1 -S -c 100 -j 100 -T 600
pgbench (18devel)
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: simple
number of clients: 100
number of threads: 100
maximum number of tries: 1
duration: 600 s
number of transactions actually processed: 551984376
number of failed transactions: 0 (0.000%)
latency average = 0.109 ms
initial connection time = 23.569 ms
tps = 920001.842189 (without initial connection time)*With Trim :*
$./pgbench testc -U t1 -S -c 100 -j 100 -T 600
pgbench (18devel)
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: simple
number of clients: 100
number of threads: 100
maximum number of tries: 1
duration: 600 s
number of transactions actually processed: 470690787
number of failed transactions: 0 (0.000%)
latency average = 0.127 ms
initial connection time = 23.632 ms
tps = 784511.901558 (without initial connection time)
--
Regards,
Rafia Sabih
Hi Rafia,
I have made the necessary adjustment by replacing the inclusion of malloc.h
with stdlib.h in the relevant codebase. This change should address the
previous concerns regarding memory allocation functions.
Could you please perform another round of testing to ensure that everything
is functioning as expected with this modification?
Thank you for your assistance.
Best regards, Shawn
Rafia Sabih <rafia.pghackers@gmail.com> 于2024年9月11日周三 18:25写道:
Show quoted text
Unfortunately, I still see a compiling issue with this patch,
memtrim.c:15:10: fatal error: 'malloc.h' file not found
#include <malloc.h>
^~~~~~~~~~
1 error generated.On Wed, 28 Aug 2024 at 12:54, shawn wang <shawn.wang.pg@gmail.com> wrote:
Hi Ashutosh,
Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> 于2024年8月26日周一 19:05写道:
Hi Shawn,
It will be good to document usage of this function. Please add
document changes in your patch. We need to document the impact of this
function so that users can judiciously decide whether or not to use
this function and under what conditions. Also they would know what to
expect when they use this function.I have already incorporated the usage of this function into the new patch.
Currently, there is no memory information that can be extremely accurate
to
reflect whether a trim operation should be performed. Here are two
conditions
that can be used as references:
1. Check the difference between the process's memory usage (for example,
the top command, due to the relationship with shared memory, it is
necessary
to subtract SHR from RES) and the statistics of the memory context. If the
difference is very large, this function should be used to release memory;
2. Execute malloc_stats(). If the system bytes are greater than the
in-use bytes, this indicates that this function can be used to release
memory.Running it after a query finishes is one thing but that can't be
guaranteed because of the asynchronous nature of signal handlers.
malloc_trim() may be called while a query is being executed. We need
to assess that impact as well.Can you please share some numbers - TPS, latency etc. with and without
this function invoked during a benchmark run?I have placed malloc_trim() at the end of the exec_simple_query function,
so that malloc_trim() is executed once for each SQL statement executed. I
used pgbench to reproduce the performance impact,
and the results are as follows.
*Database preparation:*create database testc;
create user t1;
alter database testc owner to t1;
./pgbench testc -U t1 -i -s 100
./pgbench testc -U t1 -S -c 100 -j 100 -T 600*Without Trim*:
$./pgbench testc -U t1 -S -c 100 -j 100 -T 600
pgbench (18devel)
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: simple
number of clients: 100
number of threads: 100
maximum number of tries: 1
duration: 600 s
number of transactions actually processed: 551984376
number of failed transactions: 0 (0.000%)
latency average = 0.109 ms
initial connection time = 23.569 ms
tps = 920001.842189 (without initial connection time)*With Trim :*
$./pgbench testc -U t1 -S -c 100 -j 100 -T 600
pgbench (18devel)
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: simple
number of clients: 100
number of threads: 100
maximum number of tries: 1
duration: 600 s
number of transactions actually processed: 470690787
number of failed transactions: 0 (0.000%)
latency average = 0.127 ms
initial connection time = 23.632 ms
tps = 784511.901558 (without initial connection time)--
Regards,
Rafia Sabih
Attachments:
v4-0001-Trim-the-free-heap-Memory.patchapplication/octet-stream; name=v4-0001-Trim-the-free-heap-Memory.patchDownload
From 872c7c6925a99125fecf87e1598647ce1f76077b Mon Sep 17 00:00:00 2001
From: Shawn Wang <shawn.wang.pg@gmail.com>
Date: Wed, 28 Aug 2024 18:20:57 +0800
Subject: [PATCH] Trim the free heap Memory.
ALL processes in PostgreSQL actually use malloc to allocate and free memory.
In the case of long connections where business queries are executed over extended
periods, the distribution of memory can become extremely complex.
Under certain circumstances, a common issue in memory usage due to the caching
strategy of malloc may arise: even if memory is released through the free function,
it may not be returned to the OS in a timely manner. This can lead to high system
memory usage, affecting performance and the operation of other applications, and
may even result in Out-Of-Memory (OOM) errors.
Examine the difference between the memory usage of a process (for example, using
the top command, where due to shared memory, it is necessary to subtract SHR from RES)
and the statistics of the memory context. If the difference is very large, or after
executing malloc_stats(), if the system bytes are greater than the in-use bytes,
this indicates that this process needs to release the free heap memory.
---
doc/src/sgml/func.sgml | 22 +++++++
src/backend/catalog/system_functions.sql | 2 +
src/backend/postmaster/autovacuum.c | 4 ++
src/backend/postmaster/checkpointer.c | 4 ++
src/backend/postmaster/interrupt.c | 4 ++
src/backend/postmaster/pgarch.c | 4 ++
src/backend/postmaster/startup.c | 4 ++
src/backend/postmaster/walsummarizer.c | 4 ++
src/backend/storage/ipc/procsignal.c | 3 +
src/backend/tcop/postgres.c | 3 +
src/backend/utils/adt/mcxtfuncs.c | 54 +++++++++++++++
src/backend/utils/init/globals.c | 1 +
src/backend/utils/mmgr/Makefile | 1 +
src/backend/utils/mmgr/memtrim.c | 69 ++++++++++++++++++++
src/include/catalog/pg_proc.dat | 6 ++
src/include/miscadmin.h | 2 +
src/include/storage/procsignal.h | 2 +
src/include/utils/memutils.h | 3 +
src/test/regress/expected/misc_functions.out | 11 ++++
src/test/regress/sql/misc_functions.sql | 8 +++
20 files changed, 211 insertions(+)
create mode 100644 src/backend/utils/mmgr/memtrim.c
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 461fc3f437..8b396371c8 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -28321,6 +28321,28 @@ acl | {postgres=arwdDxtm/postgres,foo=r/postgres}
<literal>false</literal> is returned.
</para></entry>
</row>
+
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>pg_trim_backend_heap_free_memory</primary>
+ </indexterm>
+ <function>pg_trim_backend_heap_free_memory</function> ( <parameter>pid</parameter> <type>integer</type> )
+ <returnvalue>boolean</returnvalue>
+ </para>
+ <para>
+ Release the free heap memory in a specified process that has not been reclaimed by the Operating System.
+ When a process's memory usage significantly exceeds its Memory Context size, there may be memory that has
+ been freed but not yet returned to the operating system. In such cases, this function can be called to
+ forcibly release this memory. This function will send requests to backend and auxiliary processes,
+ excluding the logging process. The relevant processes will execute the malloc_trim() function during
+ interrupt handling to forcibly release free memory. If the signal is successfully sent, the function
+ will return <literal>true</literal>; if the sending fails, it will return <literal>false</literal>.
+ Additionally, the usage of this function will be logged to facilitate tracking of the process's memory
+ trimming operations.
+ </para></entry>
+ </row>
+
</tbody>
</tgroup>
</table>
diff --git a/src/backend/catalog/system_functions.sql b/src/backend/catalog/system_functions.sql
index 623b9539b1..3aa3b60288 100644
--- a/src/backend/catalog/system_functions.sql
+++ b/src/backend/catalog/system_functions.sql
@@ -754,6 +754,8 @@ REVOKE EXECUTE ON FUNCTION pg_ls_dir(text,boolean,boolean) FROM public;
REVOKE EXECUTE ON FUNCTION pg_log_backend_memory_contexts(integer) FROM PUBLIC;
+REVOKE EXECUTE ON FUNCTION pg_trim_backend_heap_free_memory(integer) FROM PUBLIC;
+
REVOKE EXECUTE ON FUNCTION pg_ls_logicalsnapdir() FROM PUBLIC;
REVOKE EXECUTE ON FUNCTION pg_ls_logicalmapdir() FROM PUBLIC;
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 7d0877c95e..d90d1649a1 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -765,6 +765,10 @@ HandleAutoVacLauncherInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ /* Perform trimming heap free memory of this process */
+ if (TrimHeapFreeMemoryPending)
+ ProcessTrimHeapFreeMemoryInterrupt();
+
/* Process sinval catchup interrupts that happened while sleeping */
ProcessCatchupInterrupt();
}
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index 199f008bcd..2109f9e822 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -605,6 +605,10 @@ HandleCheckpointerInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Perform trimming heap free memory of this process */
+ if (TrimHeapFreeMemoryPending)
+ ProcessTrimHeapFreeMemoryInterrupt();
}
/*
diff --git a/src/backend/postmaster/interrupt.c b/src/backend/postmaster/interrupt.c
index eedc0980cf..4389ff3d48 100644
--- a/src/backend/postmaster/interrupt.c
+++ b/src/backend/postmaster/interrupt.c
@@ -48,6 +48,10 @@ HandleMainLoopInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Perform trimming heap free memory of this process */
+ if (TrimHeapFreeMemoryPending)
+ ProcessTrimHeapFreeMemoryInterrupt();
}
/*
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 02f91431f5..cd72412614 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -865,6 +865,10 @@ HandlePgArchInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ /* Perform trimming heap free memory of this process */
+ if (TrimHeapFreeMemoryPending)
+ ProcessTrimHeapFreeMemoryInterrupt();
+
if (ConfigReloadPending)
{
char *archiveLib = pstrdup(XLogArchiveLibrary);
diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c
index ef6f98ebcd..5eb8d168e7 100644
--- a/src/backend/postmaster/startup.c
+++ b/src/backend/postmaster/startup.c
@@ -192,6 +192,10 @@ HandleStartupProcInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Perform trimming heap free memory of this process */
+ if (TrimHeapFreeMemoryPending)
+ ProcessTrimHeapFreeMemoryInterrupt();
}
diff --git a/src/backend/postmaster/walsummarizer.c b/src/backend/postmaster/walsummarizer.c
index daa7909382..d167e41291 100644
--- a/src/backend/postmaster/walsummarizer.c
+++ b/src/backend/postmaster/walsummarizer.c
@@ -874,6 +874,10 @@ HandleWalSummarizerInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Perform trimming heap free memory of this process */
+ if (TrimHeapFreeMemoryPending)
+ ProcessTrimHeapFreeMemoryInterrupt();
}
/*
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index 87027f27eb..3251d2823e 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -712,6 +712,9 @@ procsignal_sigusr1_handler(SIGNAL_ARGS)
if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN))
HandleRecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN);
+ if (CheckProcSignal(PROCSIG_TRIM_HEAP_FREE_MEMORY))
+ HandleTrimHeapFreeMemoryInterrupt();
+
SetLatch(MyLatch);
}
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 8bc6bea113..f90a557bbe 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3479,6 +3479,9 @@ ProcessInterrupts(void)
if (ParallelApplyMessagePending)
HandleParallelApplyMessages();
+
+ if (TrimHeapFreeMemoryPending)
+ ProcessTrimHeapFreeMemoryInterrupt();
}
/*
diff --git a/src/backend/utils/adt/mcxtfuncs.c b/src/backend/utils/adt/mcxtfuncs.c
index 6a6634e1cd..c213c4163a 100644
--- a/src/backend/utils/adt/mcxtfuncs.c
+++ b/src/backend/utils/adt/mcxtfuncs.c
@@ -305,3 +305,57 @@ pg_log_backend_memory_contexts(PG_FUNCTION_ARGS)
PG_RETURN_BOOL(true);
}
+
+/*
+ * pg_trim_backend_heap_free_memory
+ * Signal a backend or an auxiliary process to trim heap free memory.
+ *
+ * On receipt of this signal, a backend or an auxiliary process sets the flag
+ * in the signal handler, which causes the next CHECK_FOR_INTERRUPTS()
+ * or process-specific interrupt handler to log the memory contexts.
+ */
+Datum
+pg_trim_backend_heap_free_memory(PG_FUNCTION_ARGS)
+{
+ int pid = PG_GETARG_INT32(0);
+ PGPROC *proc;
+ ProcNumber procNumber = INVALID_PROC_NUMBER;
+
+ /*
+ * See if the process with given pid is a backend or an auxiliary process.
+ */
+ proc = BackendPidGetProc(pid);
+ if (proc == NULL)
+ proc = AuxiliaryPidGetProc(pid);
+
+ /*
+ * BackendPidGetProc() and AuxiliaryPidGetProc() return NULL if the pid
+ * isn't valid; but by the time we reach kill(), a process for which we
+ * get a valid proc here might have terminated on its own. There's no way
+ * to acquire a lock on an arbitrary process to prevent that. But since
+ * this mechanism is usually used to debug a backend or an auxiliary
+ * process running and consuming lots of memory, that it might end on its
+ * own first and its memory contexts are not logged is not a problem.
+ */
+ if (proc == NULL)
+ {
+ /*
+ * This is just a warning so a loop-through-resultset will not abort
+ * if one backend terminated on its own during the run.
+ */
+ ereport(WARNING,
+ (errmsg("PID %d is not a PostgreSQL server process", pid)));
+ PG_RETURN_BOOL(false);
+ }
+
+ procNumber = GetNumberFromPGProc(proc);
+ if (SendProcSignal(pid, PROCSIG_TRIM_HEAP_FREE_MEMORY, procNumber) < 0)
+ {
+ /* Again, just a warning to allow loops */
+ ereport(WARNING,
+ (errmsg("could not send signal to process %d: %m", pid)));
+ PG_RETURN_BOOL(false);
+ }
+
+ PG_RETURN_BOOL(true);
+}
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index 03a54451ac..e90a61affe 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -39,6 +39,7 @@ volatile sig_atomic_t IdleSessionTimeoutPending = false;
volatile sig_atomic_t ProcSignalBarrierPending = false;
volatile sig_atomic_t LogMemoryContextPending = false;
volatile sig_atomic_t IdleStatsUpdateTimeoutPending = false;
+volatile sig_atomic_t TrimHeapFreeMemoryPending = false;
volatile uint32 InterruptHoldoffCount = 0;
volatile uint32 QueryCancelHoldoffCount = 0;
volatile uint32 CritSectionCount = 0;
diff --git a/src/backend/utils/mmgr/Makefile b/src/backend/utils/mmgr/Makefile
index 01a1fb8527..395f119d77 100644
--- a/src/backend/utils/mmgr/Makefile
+++ b/src/backend/utils/mmgr/Makefile
@@ -21,6 +21,7 @@ OBJS = \
generation.o \
mcxt.o \
memdebug.o \
+ memtrim.o \
portalmem.o \
slab.o
diff --git a/src/backend/utils/mmgr/memtrim.c b/src/backend/utils/mmgr/memtrim.c
new file mode 100644
index 0000000000..093b493fe0
--- /dev/null
+++ b/src/backend/utils/mmgr/memtrim.c
@@ -0,0 +1,69 @@
+/*-------------------------------------------------------------------------
+ *
+ * memtrim.c
+ * Declarations used in memory implementations.
+ *
+ *
+ * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/backend/utils/mmgr/memtrim.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include <stdlib.h>
+
+#include "postgres.h"
+
+#include "miscadmin.h"
+#include "utils/memutils.h"
+
+/*
+ * HandleTrimHeapFreeMemoryInterrupt
+ * Handle receipt of an interrupt indicating trimming of heap free
+ * memory.
+ *
+ * All the actual work is deferred to ProcessTrimHeapFreeMemoryInterrupt(),
+ * because we cannot safely precess trim inside the signal handler.
+ */
+void
+HandleTrimHeapFreeMemoryInterrupt(void)
+{
+ InterruptPending = true;
+ TrimHeapFreeMemoryPending = true;
+ /* latch will be set by procsignal_sigusr1_handler */
+}
+
+/*
+ * ProcessTrimHeapFreeMemoryInterrupt
+ * Perform trimming of heap free memory of this backend process.
+ *
+ * Any backend that participates in ProcSignal signaling must arrange
+ * to call this function if we see TrimHeapFreeMemoryPending set.
+ * It is called from CHECK_FOR_INTERRUPTS(), which is enough because
+ * the target process for trimming of heap free memory is a backend.
+ */
+void
+ProcessTrimHeapFreeMemoryInterrupt(void)
+{
+ TrimHeapFreeMemoryPending = false;
+
+ /*
+ * Use LOG_SERVER_ONLY to prevent this message from being sent to the
+ * connected client.
+ */
+ ereport(LOG_SERVER_ONLY,
+ (errhidestmt(true),
+ errhidecontext(true),
+ errmsg("trimming heap free memory of PID %d", MyProcPid)));
+
+ /*
+ * The malloc_trim() function attempts to release free memory from
+ * the heap (by calling sbrk(2) or madvise(2) with suitable
+ * arguments).
+ * The argument is 0, only the minimum amount of memory is maintained
+ * at the top of the heap (i.e., one page or less).
+ */
+ malloc_trim(0);
+}
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 4abc6d9526..6186d9247d 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8348,6 +8348,12 @@
prorettype => 'bool', proargtypes => 'int4',
prosrc => 'pg_log_backend_memory_contexts' },
+# logging memory contexts of the specified backend
+{ oid => '4551', descr => 'Trim Heap free memory of the specified backend',
+ proname => 'pg_trim_backend_heap_free_memory', provolatile => 'v',
+ prorettype => 'bool', proargtypes => 'int4',
+ prosrc => 'pg_trim_backend_heap_free_memory' },
+
# non-persistent series generator
{ oid => '1066', descr => 'non-persistent series generator',
proname => 'generate_series', prorows => '1000',
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 25348e71eb..5a676eeb9e 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -100,6 +100,8 @@ extern PGDLLIMPORT volatile sig_atomic_t IdleStatsUpdateTimeoutPending;
extern PGDLLIMPORT volatile sig_atomic_t CheckClientConnectionPending;
extern PGDLLIMPORT volatile sig_atomic_t ClientConnectionLost;
+extern PGDLLIMPORT volatile sig_atomic_t TrimHeapFreeMemoryPending;
+
/* these are marked volatile because they are examined by signal handlers: */
extern PGDLLIMPORT volatile uint32 InterruptHoldoffCount;
extern PGDLLIMPORT volatile uint32 QueryCancelHoldoffCount;
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index f94c11a9a8..0d4a3a42a2 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -48,6 +48,8 @@ typedef enum
PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK,
PROCSIG_RECOVERY_CONFLICT_LAST = PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK,
+ PROCSIG_TRIM_HEAP_FREE_MEMORY, /* ask backend to release free memory from the heap */
+
NUM_PROCSIGNALS /* Must be last! */
} ProcSignalReason;
diff --git a/src/include/utils/memutils.h b/src/include/utils/memutils.h
index cd9596ff21..61d4d6d252 100644
--- a/src/include/utils/memutils.h
+++ b/src/include/utils/memutils.h
@@ -104,6 +104,9 @@ extern void MemoryContextCheck(MemoryContext context);
extern void HandleLogMemoryContextInterrupt(void);
extern void ProcessLogMemoryContextInterrupt(void);
+extern void HandleTrimHeapFreeMemoryInterrupt(void);
+extern void ProcessTrimHeapFreeMemoryInterrupt(void);
+
/*
* Memory-context-type-specific functions
*/
diff --git a/src/test/regress/expected/misc_functions.out b/src/test/regress/expected/misc_functions.out
index 35fb72f302..9bde10b2b1 100644
--- a/src/test/regress/expected/misc_functions.out
+++ b/src/test/regress/expected/misc_functions.out
@@ -365,6 +365,17 @@ RESET ROLE;
REVOKE EXECUTE ON FUNCTION pg_log_backend_memory_contexts(integer)
FROM regress_log_memory;
DROP ROLE regress_log_memory;
+--
+-- pg_trim_backend_heap_free_memory()
+--
+-- Trim the heap free memory.
+--
+SELECT pg_trim_backend_heap_free_memory(pg_backend_pid());
+ pg_trim_backend_heap_free_memory
+----------------------------------
+ t
+(1 row)
+
--
-- Test some built-in SRFs
--
diff --git a/src/test/regress/sql/misc_functions.sql b/src/test/regress/sql/misc_functions.sql
index e570783453..d1cd2440ee 100644
--- a/src/test/regress/sql/misc_functions.sql
+++ b/src/test/regress/sql/misc_functions.sql
@@ -143,6 +143,14 @@ REVOKE EXECUTE ON FUNCTION pg_log_backend_memory_contexts(integer)
DROP ROLE regress_log_memory;
+--
+-- pg_trim_backend_heap_free_memory()
+--
+-- Trim the heap free memory.
+--
+
+SELECT pg_trim_backend_heap_free_memory(pg_backend_pid());
+
--
-- Test some built-in SRFs
--
--
2.39.3
On Thu, 12 Sept 2024 at 14:40, shawn wang <shawn.wang.pg@gmail.com> wrote:
Could you please perform another round of testing to ensure that everything is functioning as expected with this modification?
One way to get a few machines with various build systems testing this
is to register the patch on the commitfest app in [1]https://commitfest.postgresql.org/50/. You can then
see if the patch is passing the continuous integration tests in [2]http://cfbot.cputube.org/.
One day soon the features of [2]http://cfbot.cputube.org/ should be combined with [1]https://commitfest.postgresql.org/50/.
David
[1]: https://commitfest.postgresql.org/50/
[2]: http://cfbot.cputube.org/
Thank you for your valuable suggestion.
I have successfully registered my patch for the commitfest. However, upon
integration, I encountered several errors during the testing phase. I am
currently investigating the root causes of these issues and will work on
providing the necessary fixes. If you have any further insights or
recommendations, I would greatly appreciate your guidance.
Thank you once again for your support.
Best regards, Shawn
David Rowley <dgrowleyml@gmail.com> 于2024年9月12日周四 16:42写道:
Show quoted text
On Thu, 12 Sept 2024 at 14:40, shawn wang <shawn.wang.pg@gmail.com> wrote:
Could you please perform another round of testing to ensure that
everything is functioning as expected with this modification?
One way to get a few machines with various build systems testing this
is to register the patch on the commitfest app in [1]. You can then
see if the patch is passing the continuous integration tests in [2].
One day soon the features of [2] should be combined with [1].David
[1] https://commitfest.postgresql.org/50/
[2] http://cfbot.cputube.org/
Attachments:
v5-0001-PATCH-Trim-the-free-heap-Memory.patchapplication/octet-stream; name=v5-0001-PATCH-Trim-the-free-heap-Memory.patchDownload
From 1e18ede185b3edd55dbe9329dad9465a4800de03 Mon Sep 17 00:00:00 2001
From: Shawn Wang <shawn.wang.pg@gmail.com>
Date: Mon, 16 Sep 2024 01:34:01 +0800
Subject: [PATCH] [PATCH] Trim the free heap Memory.
ALL processes in PostgreSQL actually use malloc to allocate and free memory.
In the case of long connections where business queries are executed over extended
periods, the distribution of memory can become extremely complex.
Under certain circumstances, a common issue in memory usage due to the caching
strategy of malloc may arise: even if memory is released through the free function,
it may not be returned to the OS in a timely manner. This can lead to high system
memory usage, affecting performance and the operation of other applications, and
may even result in Out-Of-Memory (OOM) errors.
Examine the difference between the memory usage of a process (for example, using
the top command, where due to shared memory, it is necessary to subtract SHR from RES)
and the statistics of the memory context. If the difference is very large, or after
executing malloc_stats(), if the system bytes are greater than the in-use bytes,
this indicates that this process needs to release the free heap memory.
---
doc/src/sgml/func.sgml | 22 +++++++
src/backend/catalog/system_functions.sql | 2 +
src/backend/postmaster/autovacuum.c | 4 ++
src/backend/postmaster/checkpointer.c | 4 ++
src/backend/postmaster/interrupt.c | 4 ++
src/backend/postmaster/pgarch.c | 4 ++
src/backend/postmaster/startup.c | 4 ++
src/backend/postmaster/walsummarizer.c | 4 ++
src/backend/storage/ipc/procsignal.c | 3 +
src/backend/tcop/postgres.c | 3 +
src/backend/utils/adt/mcxtfuncs.c | 54 +++++++++++++++
src/backend/utils/init/globals.c | 1 +
src/backend/utils/mmgr/Makefile | 1 +
src/backend/utils/mmgr/memtrim.c | 69 ++++++++++++++++++++
src/backend/utils/mmgr/meson.build | 1 +
src/include/catalog/pg_proc.dat | 6 ++
src/include/miscadmin.h | 2 +
src/include/storage/procsignal.h | 2 +
src/include/utils/memutils.h | 3 +
src/test/regress/expected/misc_functions.out | 11 ++++
src/test/regress/sql/misc_functions.sql | 8 +++
21 files changed, 212 insertions(+)
create mode 100644 src/backend/utils/mmgr/memtrim.c
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 84eb3a45ee..20160f8163 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -28342,6 +28342,28 @@ acl | {postgres=arwdDxtm/postgres,foo=r/postgres}
<literal>false</literal> is returned.
</para></entry>
</row>
+
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>pg_trim_backend_heap_free_memory</primary>
+ </indexterm>
+ <function>pg_trim_backend_heap_free_memory</function> ( <parameter>pid</parameter> <type>integer</type> )
+ <returnvalue>boolean</returnvalue>
+ </para>
+ <para>
+ Release the free heap memory in a specified process that has not been reclaimed by the Operating System.
+ When a process's memory usage significantly exceeds its Memory Context size, there may be memory that has
+ been freed but not yet returned to the operating system. In such cases, this function can be called to
+ forcibly release this memory. This function will send requests to backend and auxiliary processes,
+ excluding the logging process. The relevant processes will execute the malloc_trim() function during
+ interrupt handling to forcibly release free memory. If the signal is successfully sent, the function
+ will return <literal>true</literal>; if the sending fails, it will return <literal>false</literal>.
+ Additionally, the usage of this function will be logged to facilitate tracking of the process's memory
+ trimming operations.
+ </para></entry>
+ </row>
+
</tbody>
</tgroup>
</table>
diff --git a/src/backend/catalog/system_functions.sql b/src/backend/catalog/system_functions.sql
index 623b9539b1..3aa3b60288 100644
--- a/src/backend/catalog/system_functions.sql
+++ b/src/backend/catalog/system_functions.sql
@@ -754,6 +754,8 @@ REVOKE EXECUTE ON FUNCTION pg_ls_dir(text,boolean,boolean) FROM public;
REVOKE EXECUTE ON FUNCTION pg_log_backend_memory_contexts(integer) FROM PUBLIC;
+REVOKE EXECUTE ON FUNCTION pg_trim_backend_heap_free_memory(integer) FROM PUBLIC;
+
REVOKE EXECUTE ON FUNCTION pg_ls_logicalsnapdir() FROM PUBLIC;
REVOKE EXECUTE ON FUNCTION pg_ls_logicalmapdir() FROM PUBLIC;
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 7d0877c95e..d90d1649a1 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -765,6 +765,10 @@ HandleAutoVacLauncherInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ /* Perform trimming heap free memory of this process */
+ if (TrimHeapFreeMemoryPending)
+ ProcessTrimHeapFreeMemoryInterrupt();
+
/* Process sinval catchup interrupts that happened while sleeping */
ProcessCatchupInterrupt();
}
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index eeb73c8572..92e1b462e2 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -605,6 +605,10 @@ HandleCheckpointerInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Perform trimming heap free memory of this process */
+ if (TrimHeapFreeMemoryPending)
+ ProcessTrimHeapFreeMemoryInterrupt();
}
/*
diff --git a/src/backend/postmaster/interrupt.c b/src/backend/postmaster/interrupt.c
index eedc0980cf..4389ff3d48 100644
--- a/src/backend/postmaster/interrupt.c
+++ b/src/backend/postmaster/interrupt.c
@@ -48,6 +48,10 @@ HandleMainLoopInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Perform trimming heap free memory of this process */
+ if (TrimHeapFreeMemoryPending)
+ ProcessTrimHeapFreeMemoryInterrupt();
}
/*
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 02f91431f5..cd72412614 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -865,6 +865,10 @@ HandlePgArchInterrupts(void)
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+ /* Perform trimming heap free memory of this process */
+ if (TrimHeapFreeMemoryPending)
+ ProcessTrimHeapFreeMemoryInterrupt();
+
if (ConfigReloadPending)
{
char *archiveLib = pstrdup(XLogArchiveLibrary);
diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c
index ef6f98ebcd..5eb8d168e7 100644
--- a/src/backend/postmaster/startup.c
+++ b/src/backend/postmaster/startup.c
@@ -192,6 +192,10 @@ HandleStartupProcInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Perform trimming heap free memory of this process */
+ if (TrimHeapFreeMemoryPending)
+ ProcessTrimHeapFreeMemoryInterrupt();
}
diff --git a/src/backend/postmaster/walsummarizer.c b/src/backend/postmaster/walsummarizer.c
index c1bf4a70dd..294b74aa20 100644
--- a/src/backend/postmaster/walsummarizer.c
+++ b/src/backend/postmaster/walsummarizer.c
@@ -876,6 +876,10 @@ HandleWalSummarizerInterrupts(void)
/* Perform logging of memory contexts of this process */
if (LogMemoryContextPending)
ProcessLogMemoryContextInterrupt();
+
+ /* Perform trimming heap free memory of this process */
+ if (TrimHeapFreeMemoryPending)
+ ProcessTrimHeapFreeMemoryInterrupt();
}
/*
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index 87027f27eb..3251d2823e 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -712,6 +712,9 @@ procsignal_sigusr1_handler(SIGNAL_ARGS)
if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN))
HandleRecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN);
+ if (CheckProcSignal(PROCSIG_TRIM_HEAP_FREE_MEMORY))
+ HandleTrimHeapFreeMemoryInterrupt();
+
SetLatch(MyLatch);
}
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 8bc6bea113..f90a557bbe 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3479,6 +3479,9 @@ ProcessInterrupts(void)
if (ParallelApplyMessagePending)
HandleParallelApplyMessages();
+
+ if (TrimHeapFreeMemoryPending)
+ ProcessTrimHeapFreeMemoryInterrupt();
}
/*
diff --git a/src/backend/utils/adt/mcxtfuncs.c b/src/backend/utils/adt/mcxtfuncs.c
index 6a6634e1cd..c213c4163a 100644
--- a/src/backend/utils/adt/mcxtfuncs.c
+++ b/src/backend/utils/adt/mcxtfuncs.c
@@ -305,3 +305,57 @@ pg_log_backend_memory_contexts(PG_FUNCTION_ARGS)
PG_RETURN_BOOL(true);
}
+
+/*
+ * pg_trim_backend_heap_free_memory
+ * Signal a backend or an auxiliary process to trim heap free memory.
+ *
+ * On receipt of this signal, a backend or an auxiliary process sets the flag
+ * in the signal handler, which causes the next CHECK_FOR_INTERRUPTS()
+ * or process-specific interrupt handler to log the memory contexts.
+ */
+Datum
+pg_trim_backend_heap_free_memory(PG_FUNCTION_ARGS)
+{
+ int pid = PG_GETARG_INT32(0);
+ PGPROC *proc;
+ ProcNumber procNumber = INVALID_PROC_NUMBER;
+
+ /*
+ * See if the process with given pid is a backend or an auxiliary process.
+ */
+ proc = BackendPidGetProc(pid);
+ if (proc == NULL)
+ proc = AuxiliaryPidGetProc(pid);
+
+ /*
+ * BackendPidGetProc() and AuxiliaryPidGetProc() return NULL if the pid
+ * isn't valid; but by the time we reach kill(), a process for which we
+ * get a valid proc here might have terminated on its own. There's no way
+ * to acquire a lock on an arbitrary process to prevent that. But since
+ * this mechanism is usually used to debug a backend or an auxiliary
+ * process running and consuming lots of memory, that it might end on its
+ * own first and its memory contexts are not logged is not a problem.
+ */
+ if (proc == NULL)
+ {
+ /*
+ * This is just a warning so a loop-through-resultset will not abort
+ * if one backend terminated on its own during the run.
+ */
+ ereport(WARNING,
+ (errmsg("PID %d is not a PostgreSQL server process", pid)));
+ PG_RETURN_BOOL(false);
+ }
+
+ procNumber = GetNumberFromPGProc(proc);
+ if (SendProcSignal(pid, PROCSIG_TRIM_HEAP_FREE_MEMORY, procNumber) < 0)
+ {
+ /* Again, just a warning to allow loops */
+ ereport(WARNING,
+ (errmsg("could not send signal to process %d: %m", pid)));
+ PG_RETURN_BOOL(false);
+ }
+
+ PG_RETURN_BOOL(true);
+}
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index 03a54451ac..e90a61affe 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -39,6 +39,7 @@ volatile sig_atomic_t IdleSessionTimeoutPending = false;
volatile sig_atomic_t ProcSignalBarrierPending = false;
volatile sig_atomic_t LogMemoryContextPending = false;
volatile sig_atomic_t IdleStatsUpdateTimeoutPending = false;
+volatile sig_atomic_t TrimHeapFreeMemoryPending = false;
volatile uint32 InterruptHoldoffCount = 0;
volatile uint32 QueryCancelHoldoffCount = 0;
volatile uint32 CritSectionCount = 0;
diff --git a/src/backend/utils/mmgr/Makefile b/src/backend/utils/mmgr/Makefile
index 01a1fb8527..395f119d77 100644
--- a/src/backend/utils/mmgr/Makefile
+++ b/src/backend/utils/mmgr/Makefile
@@ -21,6 +21,7 @@ OBJS = \
generation.o \
mcxt.o \
memdebug.o \
+ memtrim.o \
portalmem.o \
slab.o
diff --git a/src/backend/utils/mmgr/memtrim.c b/src/backend/utils/mmgr/memtrim.c
new file mode 100644
index 0000000000..5664eb9f80
--- /dev/null
+++ b/src/backend/utils/mmgr/memtrim.c
@@ -0,0 +1,69 @@
+/*-------------------------------------------------------------------------
+ *
+ * memtrim.c
+ * Declarations used in memory implementations.
+ *
+ *
+ * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/backend/utils/mmgr/memtrim.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include <malloc.h>
+
+#include "postgres.h"
+
+#include "miscadmin.h"
+#include "utils/memutils.h"
+
+/*
+ * HandleTrimHeapFreeMemoryInterrupt
+ * Handle receipt of an interrupt indicating trimming of heap free
+ * memory.
+ *
+ * All the actual work is deferred to ProcessTrimHeapFreeMemoryInterrupt(),
+ * because we cannot safely precess trim inside the signal handler.
+ */
+void
+HandleTrimHeapFreeMemoryInterrupt(void)
+{
+ InterruptPending = true;
+ TrimHeapFreeMemoryPending = true;
+ /* latch will be set by procsignal_sigusr1_handler */
+}
+
+/*
+ * ProcessTrimHeapFreeMemoryInterrupt
+ * Perform trimming of heap free memory of this backend process.
+ *
+ * Any backend that participates in ProcSignal signaling must arrange
+ * to call this function if we see TrimHeapFreeMemoryPending set.
+ * It is called from CHECK_FOR_INTERRUPTS(), which is enough because
+ * the target process for trimming of heap free memory is a backend.
+ */
+void
+ProcessTrimHeapFreeMemoryInterrupt(void)
+{
+ TrimHeapFreeMemoryPending = false;
+
+ /*
+ * Use LOG_SERVER_ONLY to prevent this message from being sent to the
+ * connected client.
+ */
+ ereport(LOG_SERVER_ONLY,
+ (errhidestmt(true),
+ errhidecontext(true),
+ errmsg("trimming heap free memory of PID %d", MyProcPid)));
+
+ /*
+ * The malloc_trim() function attempts to release free memory from
+ * the heap (by calling sbrk(2) or madvise(2) with suitable
+ * arguments).
+ * The argument is 0, only the minimum amount of memory is maintained
+ * at the top of the heap (i.e., one page or less).
+ */
+ malloc_trim(0);
+}
diff --git a/src/backend/utils/mmgr/meson.build b/src/backend/utils/mmgr/meson.build
index dd43a6844c..64633e0054 100644
--- a/src/backend/utils/mmgr/meson.build
+++ b/src/backend/utils/mmgr/meson.build
@@ -9,6 +9,7 @@ backend_sources += files(
'generation.c',
'mcxt.c',
'memdebug.c',
+ 'memtrim.c',
'portalmem.c',
'slab.c',
)
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 53a081ed88..a8c3983c40 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8403,6 +8403,12 @@
prorettype => 'bool', proargtypes => 'int4',
prosrc => 'pg_log_backend_memory_contexts' },
+# logging memory contexts of the specified backend
+{ oid => '4555', descr => 'Trim Heap free memory of the specified backend',
+ proname => 'pg_trim_backend_heap_free_memory', provolatile => 'v',
+ prorettype => 'bool', proargtypes => 'int4',
+ prosrc => 'pg_trim_backend_heap_free_memory' },
+
# non-persistent series generator
{ oid => '1066', descr => 'non-persistent series generator',
proname => 'generate_series', prorows => '1000',
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 25348e71eb..5a676eeb9e 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -100,6 +100,8 @@ extern PGDLLIMPORT volatile sig_atomic_t IdleStatsUpdateTimeoutPending;
extern PGDLLIMPORT volatile sig_atomic_t CheckClientConnectionPending;
extern PGDLLIMPORT volatile sig_atomic_t ClientConnectionLost;
+extern PGDLLIMPORT volatile sig_atomic_t TrimHeapFreeMemoryPending;
+
/* these are marked volatile because they are examined by signal handlers: */
extern PGDLLIMPORT volatile uint32 InterruptHoldoffCount;
extern PGDLLIMPORT volatile uint32 QueryCancelHoldoffCount;
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index f94c11a9a8..0d4a3a42a2 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -48,6 +48,8 @@ typedef enum
PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK,
PROCSIG_RECOVERY_CONFLICT_LAST = PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK,
+ PROCSIG_TRIM_HEAP_FREE_MEMORY, /* ask backend to release free memory from the heap */
+
NUM_PROCSIGNALS /* Must be last! */
} ProcSignalReason;
diff --git a/src/include/utils/memutils.h b/src/include/utils/memutils.h
index cd9596ff21..61d4d6d252 100644
--- a/src/include/utils/memutils.h
+++ b/src/include/utils/memutils.h
@@ -104,6 +104,9 @@ extern void MemoryContextCheck(MemoryContext context);
extern void HandleLogMemoryContextInterrupt(void);
extern void ProcessLogMemoryContextInterrupt(void);
+extern void HandleTrimHeapFreeMemoryInterrupt(void);
+extern void ProcessTrimHeapFreeMemoryInterrupt(void);
+
/*
* Memory-context-type-specific functions
*/
diff --git a/src/test/regress/expected/misc_functions.out b/src/test/regress/expected/misc_functions.out
index 35fb72f302..9bde10b2b1 100644
--- a/src/test/regress/expected/misc_functions.out
+++ b/src/test/regress/expected/misc_functions.out
@@ -365,6 +365,17 @@ RESET ROLE;
REVOKE EXECUTE ON FUNCTION pg_log_backend_memory_contexts(integer)
FROM regress_log_memory;
DROP ROLE regress_log_memory;
+--
+-- pg_trim_backend_heap_free_memory()
+--
+-- Trim the heap free memory.
+--
+SELECT pg_trim_backend_heap_free_memory(pg_backend_pid());
+ pg_trim_backend_heap_free_memory
+----------------------------------
+ t
+(1 row)
+
--
-- Test some built-in SRFs
--
diff --git a/src/test/regress/sql/misc_functions.sql b/src/test/regress/sql/misc_functions.sql
index e570783453..d1cd2440ee 100644
--- a/src/test/regress/sql/misc_functions.sql
+++ b/src/test/regress/sql/misc_functions.sql
@@ -143,6 +143,14 @@ REVOKE EXECUTE ON FUNCTION pg_log_backend_memory_contexts(integer)
DROP ROLE regress_log_memory;
+--
+-- pg_trim_backend_heap_free_memory()
+--
+-- Trim the heap free memory.
+--
+
+SELECT pg_trim_backend_heap_free_memory(pg_backend_pid());
+
--
-- Test some built-in SRFs
--
--
2.39.5
shawn wang <shawn.wang.pg@gmail.com> writes:
I have successfully registered my patch for the commitfest. However, upon
integration, I encountered several errors during the testing phase. I am
currently investigating the root causes of these issues and will work on
providing the necessary fixes.
I should think the root cause is pretty obvious: malloc_trim() is
a glibc-ism.
I'm fairly doubtful that this is something we should spend time on.
It can never work on any non-glibc platform. Even granting that
a Linux-only feature could be worth having, I'm really doubtful
that our memory allocation patterns are such that malloc_trim()
could be expected to free a useful amount of memory mid-query.
The single test case you showed suggested that maybe we could
usefully prod glibc to free memory at query completion, but we
don't need all this interrupt infrastructure to do that. I think
we could likely get 95% of the benefit with about a five-line
patch.
regards, tom lane
I wrote:
The single test case you showed suggested that maybe we could
usefully prod glibc to free memory at query completion, but we
don't need all this interrupt infrastructure to do that. I think
we could likely get 95% of the benefit with about a five-line
patch.
To try to quantify that a little, I wrote a very quick-n-dirty
patch to apply malloc_trim during finish_xact_command and log
the effects. (I am not asserting this is the best place to
call malloc_trim; it's just one plausible possibility.) Patch
attached, as well as statistics collected from a run of the
core regression tests followed by
grep malloc_trim postmaster.log | sed 's/.*LOG:/LOG:/' | sort -k4n | uniq -c >trim_savings.txt
We can see that out of about 43K test queries, 32K saved nothing
whatever, and in only four was more than a couple of meg saved.
That's pretty discouraging IMO. It might be useful to look closer
at the behavior of those top four though. I see them as
2024-09-15 14:58:06.146 EDT [960138] LOG: malloc_trim saved 7228 kB
2024-09-15 14:58:06.146 EDT [960138] STATEMENT: ALTER TABLE delete_test_table ADD PRIMARY KEY (a,b,c,d);
2024-09-15 14:58:09.861 EDT [960949] LOG: malloc_trim saved 12488 kB
2024-09-15 14:58:09.861 EDT [960949] STATEMENT: with recursive search_graph(f, t, label, is_cycle, path) as (
select *, false, array[row(g.f, g.t)] from graph g
union distinct
select g.*, row(g.f, g.t) = any(path), path || row(g.f, g.t)
from graph g, search_graph sg
where g.f = sg.t and not is_cycle
)
select * from search_graph;
2024-09-15 14:58:09.866 EDT [960949] LOG: malloc_trim saved 12488 kB
2024-09-15 14:58:09.866 EDT [960949] STATEMENT: with recursive search_graph(f, t, label) as (
select * from graph g
union distinct
select g.*
from graph g, search_graph sg
where g.f = sg.t
) cycle f, t set is_cycle to 'Y' default 'N' using path
select * from search_graph;
2024-09-15 14:58:09.853 EDT [960949] LOG: malloc_trim saved 12616 kB
2024-09-15 14:58:09.853 EDT [960949] STATEMENT: with recursive search_graph(f, t, label) as (
select * from graph0 g
union distinct
select g.*
from graph0 g, search_graph sg
where g.f = sg.t
) search breadth first by f, t set seq
select * from search_graph order by seq;
I don't understand why WITH RECURSIVE queries might be more prone
to leave non-garbage-collected memory behind than other queries,
but maybe that is worth looking into.
regards, tom lane
Attachments:
quick-hack-for-malloc_trim.patchtext/x-diff; charset=us-ascii; name=quick-hack-for-malloc_trim.patchDownload
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 8bc6bea113..9efb4f7636 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -21,6 +21,7 @@
#include <fcntl.h>
#include <limits.h>
+#include <malloc.h>
#include <signal.h>
#include <unistd.h>
#include <sys/resource.h>
@@ -2797,6 +2798,16 @@ finish_xact_command(void)
MemoryContextStats(TopMemoryContext);
#endif
+ {
+ char *old_brk = sbrk(0);
+ char *new_brk;
+
+ malloc_trim(0);
+ new_brk = sbrk(0);
+ elog(LOG, "malloc_trim saved %zu kB",
+ (old_brk - new_brk + 1023) / 1024);
+ }
+
xact_started = false;
}
}
Thank you very much for your response and suggestions.
As you mentioned, the patch here is actually designed for glibc's ptmalloc2
andis not applicable to other platforms. I will consider supporting it only
on the Linux platform in the future. In the memory management strategy of
ptmalloc2, there is a certain amount of non-garbage-collected memory, which
is closely related to the order and method of memory allocation and
release. To reduce the performance overhead caused by frequent allocation
and release of small blocks of memory, ptmalloc2 intentionally retains this
part of the memory. The malloc_trim function locks, traverses memory
blocks, and uses madvise to release this part of the memory, but this
process may also have a negative impact on performance. In the process of
exploring solutions, I also considered a variety of strategies, including
scheduling malloc_trim to be executed at regular intervals or triggering
malloc_trim after a specific number of free operations. However, we found
that these methods are not optimal solutions.
We can see that out of about 43K test queries, 32K saved nothing
whatever, and in only four was more than a couple of meg saved.
That's pretty discouraging IMO. It might be useful to look closer
at the behavior of those top four though. I see them as
I have previously encountered situations where the non-garbage-collected
memory of wal_sender was approximately hundreds of megabytes or even
exceeded 1GB, but I was unable to reproduce this situation using simple
SQL. Therefore, I introduced an asynchronous processing function, hoping to
manage memory more efficiently without affecting performance.
In addition, I have considered the following optimization strategies:
1.
Adjust the configuration of ptmalloc2 through the mallopt function to
use mmap rather than sbrk for memory allocation. This can immediately
return the memory to the operating system when it is released, but it may
affect performance due to the higher overhead of mmap.
2.
Use other memory allocators such as jemalloc or tcmalloc, and adjust
relevant parameters to reduce the generation of non-garbage-collected
memory. However, these allocators are designed for multi-threaded and may
lead to increased memory usage per process.
3.
Build a set of memory context (memory context) allocation functions
based on mmap, delegating the responsibility of memory management entirely
to the database level. Although this solution can effectively control
memory allocation, it requires a large-scale engineering implementation.
I look forward to further discussing these solutions with you and exploring
the best memory management practices together.
Best regards, Shawn
Tom Lane <tgl@sss.pgh.pa.us> 于2024年9月16日周一 03:16写道:
Show quoted text
I wrote:
The single test case you showed suggested that maybe we could
usefully prod glibc to free memory at query completion, but we
don't need all this interrupt infrastructure to do that. I think
we could likely get 95% of the benefit with about a five-line
patch.To try to quantify that a little, I wrote a very quick-n-dirty
patch to apply malloc_trim during finish_xact_command and log
the effects. (I am not asserting this is the best place to
call malloc_trim; it's just one plausible possibility.) Patch
attached, as well as statistics collected from a run of the
core regression tests followed bygrep malloc_trim postmaster.log | sed 's/.*LOG:/LOG:/' | sort -k4n | uniq
-c >trim_savings.txtWe can see that out of about 43K test queries, 32K saved nothing
whatever, and in only four was more than a couple of meg saved.
That's pretty discouraging IMO. It might be useful to look closer
at the behavior of those top four though. I see them as2024-09-15 14:58:06.146 EDT [960138] LOG: malloc_trim saved 7228 kB
2024-09-15 14:58:06.146 EDT [960138] STATEMENT: ALTER TABLE
delete_test_table ADD PRIMARY KEY (a,b,c,d);2024-09-15 14:58:09.861 EDT [960949] LOG: malloc_trim saved 12488 kB
2024-09-15 14:58:09.861 EDT [960949] STATEMENT: with recursive
search_graph(f, t, label, is_cycle, path) as (
select *, false, array[row(g.f, g.t)] from graph g
union distinct
select g.*, row(g.f, g.t) = any(path), path || row(g.f,
g.t)
from graph g, search_graph sg
where g.f = sg.t and not is_cycle
)
select * from search_graph;2024-09-15 14:58:09.866 EDT [960949] LOG: malloc_trim saved 12488 kB
2024-09-15 14:58:09.866 EDT [960949] STATEMENT: with recursive
search_graph(f, t, label) as (
select * from graph g
union distinct
select g.*
from graph g, search_graph sg
where g.f = sg.t
) cycle f, t set is_cycle to 'Y' default 'N' using path
select * from search_graph;2024-09-15 14:58:09.853 EDT [960949] LOG: malloc_trim saved 12616 kB
2024-09-15 14:58:09.853 EDT [960949] STATEMENT: with recursive
search_graph(f, t, label) as (
select * from graph0 g
union distinct
select g.*
from graph0 g, search_graph sg
where g.f = sg.t
) search breadth first by f, t set seq
select * from search_graph order by seq;I don't understand why WITH RECURSIVE queries might be more prone
to leave non-garbage-collected memory behind than other queries,
but maybe that is worth looking into.regards, tom lane
On 9/18/24 04:56, shawn wang wrote:
Thank you very much for your response and suggestions.
As you mentioned, the patch here is actually designed for glibc's
ptmalloc2 andis not applicable to other platforms. I will consider
supporting it only on the Linux platform in the future. In the memory
management strategy of ptmalloc2, there is a certain amount of non-
garbage-collected memory, which is closely related to the order and
method of memory allocation and release. To reduce the performance
overhead caused by frequent allocation and release of small blocks of
memory, ptmalloc2 intentionally retains this part of the memory. The
malloc_trim function locks, traverses memory blocks, and uses madvise to
release this part of the memory, but this process may also have a
negative impact on performance. In the process of exploring solutions, I
also considered a variety of strategies, including scheduling
malloc_trim to be executed at regular intervals or triggering
malloc_trim after a specific number of free operations. However, we
found that these methods are not optimal solutions.We can see that out of about 43K test queries, 32K saved nothing
whatever, and in only four was more than a couple of meg saved.
That's pretty discouraging IMO. It might be useful to look closer
at the behavior of those top four though. I see them asI have previously encountered situations where the non-garbage-collected
memory of wal_sender was approximately hundreds of megabytes or even
exceeded 1GB, but I was unable to reproduce this situation using simple
SQL. Therefore, I introduced an asynchronous processing function, hoping
to manage memory more efficiently without affecting performance.
I doubt a system function is the right approach to deal with these
memory allocation issues. The function has to be called by the user,
which means the user is expected to monitor the system and decide when
to invoke the function. That seems far from trivial - it would require
collecting OS-level information about memory usage, and I suppose it'd
need to happen fairly often to actually help with OOM reliably.
In addition, I have considered the following optimization strategies:
1.
Adjust the configuration of ptmalloc2 through the mallopt function
to use mmap rather than sbrk for memory allocation. This can
immediately return the memory to the operating system when it is
released, but it may affect performance due to the higher overhead
of mmap.
Sure, forcing the system to release memory more aggressively may affect
performance - that's the tradeoff done by glibc. But calling the new
pg_trim_backend_heap_free_memory() function is not free either.
But why would it force returning the memory to be returned immediately?
The decision whether to trim memory is driven by M_TRIM_THRESHOLD, and
that does not need to be 0. In fact, it's 128kB by default, i.e. glibc
trims memory automatically, if it can trim at least 128kB.
Yes, by default the thresholds are adjusted dynamically, which I guess
is one way to get excessive memory usage that could have been solved by
calling malloc_trim(). But setting the option to any value disabled the
dynamic behavior, it doesn't need to be set to 0.
2.
Use other memory allocators such as jemalloc or tcmalloc, and adjust
relevant parameters to reduce the generation of non-garbage-
collected memory. However, these allocators are designed for multi-
threaded and may lead to increased memory usage per process.
Right, that's kinda the opposite of trying to not waste memory.
But it also suggests syscalls (done by malloc) may be a problem under
high concurrency - not just with multi-threading, but even with regular
processes. And for glibc that matters too, of course - in fact, it may
be pretty important to allow glibc to cache more memory (by increasing
M_TOP_PAD) to get good throughput in certain workloads ...
3.
Build a set of memory context (memory context) allocation functions
based on mmap, delegating the responsibility of memory management
entirely to the database level. Although this solution can
effectively control memory allocation, it requires a large-scale
engineering implementation.
Why would it be complex? You could just as well set M_MMAP_THRESHOLD to
some low value, so that all malloc() calls are handled by mmap()
internally. Not sure it's a good idea, though.
I look forward to further discussing these solutions with you and
exploring the best memory management practices together.
Adjusting the glibc malloc() behavior may be important, but I don't
think a system function is a good approach. It's possible to change the
behavior by setting environment variables, which is pretty easy, but
maybe we could have some thing that does the same thing using mallopt().
That's what Ronan Dunklau proposed in thread [1]/messages/by-id/3424675.QJadu78ljV@aivenlaptop a year ago ... I like
that approach much more, it's much simpler for the user.
[1]: /messages/by-id/3424675.QJadu78ljV@aivenlaptop
/messages/by-id/3424675.QJadu78ljV@aivenlaptop
regards
--
Tomas Vondra
On 12/8/24 05:23, Tomas Vondra wrote:
On 9/18/24 04:56, shawn wang wrote:
Thank you very much for your response and suggestions.
As you mentioned, the patch here is actually designed for glibc's
ptmalloc2 andis not applicable to other platforms. I will consider
supporting it only on the Linux platform in the future. In the memory
management strategy of ptmalloc2, there is a certain amount of non-
garbage-collected memory, which is closely related to the order and
method of memory allocation and release. To reduce the performance
overhead caused by frequent allocation and release of small blocks of
memory, ptmalloc2 intentionally retains this part of the memory. The
malloc_trim function locks, traverses memory blocks, and uses madvise to
release this part of the memory, but this process may also have a
negative impact on performance. In the process of exploring solutions, I
also considered a variety of strategies, including scheduling
malloc_trim to be executed at regular intervals or triggering
malloc_trim after a specific number of free operations. However, we
found that these methods are not optimal solutions.We can see that out of about 43K test queries, 32K saved nothing
whatever, and in only four was more than a couple of meg saved.
That's pretty discouraging IMO. It might be useful to look closer
at the behavior of those top four though. I see them asI have previously encountered situations where the non-garbage-collected
memory of wal_sender was approximately hundreds of megabytes or even
exceeded 1GB, but I was unable to reproduce this situation using simple
SQL. Therefore, I introduced an asynchronous processing function, hoping
to manage memory more efficiently without affecting performance.
I doubt a system function is the right approach to deal with these
memory allocation issues. The function has to be called by the user,
which means the user is expected to monitor the system and decide when
to invoke the function. That seems far from trivial - it would require
collecting OS-level information about memory usage, and I suppose it'd
need to happen fairly often to actually help with OOM reliably.In addition, I have considered the following optimization strategies:
1.
Adjust the configuration of ptmalloc2 through the mallopt function
to use mmap rather than sbrk for memory allocation. This can
immediately return the memory to the operating system when it is
released, but it may affect performance due to the higher overhead
of mmap.Sure, forcing the system to release memory more aggressively may affect
performance - that's the tradeoff done by glibc. But calling the new
pg_trim_backend_heap_free_memory() function is not free either.But why would it force returning the memory to be returned immediately?
The decision whether to trim memory is driven by M_TRIM_THRESHOLD, and
that does not need to be 0. In fact, it's 128kB by default, i.e. glibc
trims memory automatically, if it can trim at least 128kB.Yes, by default the thresholds are adjusted dynamically, which I guess
is one way to get excessive memory usage that could have been solved by
calling malloc_trim(). But setting the option to any value disabled the
dynamic behavior, it doesn't need to be set to 0.2.
Use other memory allocators such as jemalloc or tcmalloc, and adjust
relevant parameters to reduce the generation of non-garbage-
collected memory. However, these allocators are designed for multi-
threaded and may lead to increased memory usage per process.Right, that's kinda the opposite of trying to not waste memory.
But it also suggests syscalls (done by malloc) may be a problem under
high concurrency - not just with multi-threading, but even with regular
processes. And for glibc that matters too, of course - in fact, it may
be pretty important to allow glibc to cache more memory (by increasing
M_TOP_PAD) to get good throughput in certain workloads ...3.
Build a set of memory context (memory context) allocation functions
based on mmap, delegating the responsibility of memory management
entirely to the database level. Although this solution can
effectively control memory allocation, it requires a large-scale
engineering implementation.Why would it be complex? You could just as well set M_MMAP_THRESHOLD to
some low value, so that all malloc() calls are handled by mmap()
internally. Not sure it's a good idea, though.I look forward to further discussing these solutions with you and
exploring the best memory management practices together.Adjusting the glibc malloc() behavior may be important, but I don't
think a system function is a good approach. It's possible to change the
behavior by setting environment variables, which is pretty easy, but
maybe we could have some thing that does the same thing using mallopt().That's what Ronan Dunklau proposed in thread [1] a year ago ... I like
that approach much more, it's much simpler for the user.
To propose something less abstract / more tangible, I think we should do
something like this:
1) add a bit of code for glibc-based systems, that adjusts selected
malloc parameters using mallopt() during startup
2) add a GUC that enables this, with the default being the regular glibc
behavior (with dynamic adjustment of various thresholds)
Which exact parameters would this set is an open question, but based on
my earlier experiments, Ronan's earlier patches, etc. I think it should
adjust at least
M_TRIM_THRESHOLD - to make sure we trim heap regularly
M_TOP_PAD - to make sure we cache some allocated memory
I wonder if maybe we should tune M_MMAP_THRESHOLD, which on 64-bit
systems defaults to 32MB, so we don't really mmap() very often for
regular memory contexts. But I don't know if that's a good idea, that
would need some experiments.
I believe that's essentially what Ronan Dunklau proposed, but it
stalled. Not because of some inherent complexity, but because of
concerns about introducing glibc-specific code.
Based on my recent experiments I think it's clearly worth it (esp. with
high concurrency workloads). If glibc was a niche, it'd be a different
situation, but I'd guess vast majority of databases runs on glibc. Yes,
it's possible to do these changes without new code (e.g. by setting the
environment variables), but that's rather inconvenient.
Perhaps it'd be possible to make it a bit smarter by looking at malloc
stats, and adjust the trim/pad thresholds, but I'd leave that for the
future. It might even lead to similar issues with excessive memory usage
just like the logic built into glibc.
But maybe we could at least print / provide some debugging information?
That would help with adjusting the GUC ...
regards
--
Tomas Vondra
On Sun, Dec 8, 2024 at 7:48 PM Tomas Vondra <tomas@vondra.me> wrote:
[..]
I have previously encountered situations where the non-garbage-collected
memory of wal_sender was approximately hundreds of megabytes or even
exceeded 1GB, but I was unable to reproduce this situation using simple
SQL. Therefore, I introduced an asynchronous processing function, hoping
to manage memory more efficiently without affecting performance.I doubt a system function is the right approach to deal with these
memory allocation issues. The function has to be called by the user,
which means the user is expected to monitor the system and decide when
to invoke the function. That seems far from trivial - it would require
collecting OS-level information about memory usage, and I suppose it'd
need to happen fairly often to actually help with OOM reliably.
[..]
Sure, forcing the system to release memory more aggressively may affect
performance - that's the tradeoff done by glibc. But calling the new
pg_trim_backend_heap_free_memory() function is not free either.But why would it force returning the memory to be returned immediately?
The decision whether to trim memory is driven by M_TRIM_THRESHOLD, and
that does not need to be 0. In fact, it's 128kB by default, i.e. glibc
trims memory automatically, if it can trim at least 128kB.
[..]
To propose something less abstract / more tangible, I think we should do
something like this:1) add a bit of code for glibc-based systems, that adjusts selected
malloc parameters using mallopt() during startup2) add a GUC that enables this, with the default being the regular glibc
behavior (with dynamic adjustment of various thresholds)Which exact parameters would this set is an open question, but based on
my earlier experiments, Ronan's earlier patches, etc. I think it should
adjust at leastM_TRIM_THRESHOLD - to make sure we trim heap regularly
M_TOP_PAD - to make sure we cache some allocated memoryI wonder if maybe we should tune M_MMAP_THRESHOLD, which on 64-bit
systems defaults to 32MB, so we don't really mmap() very often for
regular memory contexts. But I don't know if that's a good idea, that
would need some experiments.I believe that's essentially what Ronan Dunklau proposed, but it
stalled. Not because of some inherent complexity, but because of
concerns about introducing glibc-specific code.Based on my recent experiments I think it's clearly worth it (esp. with
high concurrency workloads). If glibc was a niche, it'd be a different
situation, but I'd guess vast majority of databases runs on glibc. Yes,
it's possible to do these changes without new code (e.g. by setting the
environment variables), but that's rather inconvenient.Perhaps it'd be possible to make it a bit smarter by looking at malloc
stats, and adjust the trim/pad thresholds, but I'd leave that for the
future. It might even lead to similar issues with excessive memory usage
just like the logic built into glibc.But maybe we could at least print / provide some debugging information?
That would help with adjusting the GUC ...
Hi all,
Thread bump. Just to add one single data point to this discussion, we
have been chasing some ghost memory leaks that apparently were not
memory leaks after all (they stop at certain threshold like 1.2GB)
but there were still OOMs present, and after some experimentation it
seemed that memory ended up being used in MemoryContexts, but
afterwards it was released (so outside of TopMemoryContext) when
session went idle/idle in transaction, but the processes was *still*
having it allocated. Injecting a call to `malloc_trim()` released
backend memory for sessions that were idle for some time.
E.g. with PG 13.x I've got more or less sample reproducer (thanks to
my colleague Matthew Gwillam-Kelly who was working on initial
identification of the problem):
DROP TABLE p;
CREATE TABLE p (
id int not null,
sensor_id bigint not null,
val bigint
) PARTITION BY HASH (sensor_id);
CREATE INDEX p_idx ON P (val);
SELECT 'CREATE TABLE p_'||g||' PARTITION OF p FOR VALUES WITH (MODULUS
1000, REMAINDER ' || g || ');' FROM generate_series(0, 999) g;
\gexec
INSERT INTO p SELECT g, g, g FROM generate_series(1, 1000000) g;
ANALYZE p;
Run `UPDATE p SET val = val;` minium 3 or 4 times in new session, the
backend will use in my case like ~400MB and stay (!) like that for
infinite time:
$ grep ^Pss /proc/27421/smaps_rollup
Pss: 399291 kB
Pss_Dirty: 397351 kB
Pss_Anon: 353859 kB
Pss_File: 1939 kB
Pss_Shmem: 43492 kB
After injecting call to malloc_trim(0) it shows much lower Pss_Anon:
$ grep ^Pss /proc/27421/smaps_rollup
Pss: 65904 kB
Pss_Dirty: 64189 kB
Pss_Anon: 23231 kB
Pss_File: 1715 kB
Pss_Shmem: 40957 kB
NOTE: it is not depending on (maintenance_)work_mem variables, more
like PG version involved, extensions, encoding probably, partitions
count, triggers maybe. That's like ~353MB wasted above (but our
customer was hitting it in ~1.2 GB range but they were having
additional extensions loaded which could further amplify the effect)
with fully allocated memory without usage in memory contexts (pfree()
were successful, free() done nothing, it's just it's not returned back
to the OS), so before the trim it is like that this:
TopMemoryContext: 801664 total in 29 blocks; 498048 free (2033
chunks); 303616 used
[..]
Grand total: 22213784 bytes in 3129 blocks; 9674384 free (3393
chunks); 12539400 used
Such single UPDATE causes the following malloc frequency histogram of
sizes in malloc():
@:
[1]: 1 | | [2, 4) 43 | | [4, 8) 81 | | [8, 16) 261 |@ | [16, 32) 10049 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@| [32, 64) 8951 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | [64, 128) 446 |@@ | [128, 256) 133 | | [256, 512) 118 | | [512, 1K) 11 | | [1K, 2K) 134 | | [2K, 4K) 5 | | [4K, 8K) 94 | | [8K, 16K) 1020 |@@@@@ | [16K, 32K) 4122 |@@@@@@@@@@@@@@@@@@@@@ | [32K, 64K) 29 | | [64K, 128K) 14 | | [128K, 256K) 2196 |@@@@@@@@@@@ | [256K, 512K) 2 | | [..]
[2, 4) 43 | |
[4, 8) 81 | |
[8, 16) 261 |@ |
[16, 32) 10049 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[32, 64) 8951 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
[64, 128) 446 |@@ |
[128, 256) 133 | |
[256, 512) 118 | |
[512, 1K) 11 | |
[1K, 2K) 134 | |
[2K, 4K) 5 | |
[4K, 8K) 94 | |
[8K, 16K) 1020 |@@@@@ |
[16K, 32K) 4122 |@@@@@@@@@@@@@@@@@@@@@ |
[32K, 64K) 29 | |
[64K, 128K) 14 | |
[128K, 256K) 2196 |@@@@@@@@@@@ |
[256K, 512K) 2 | |
[..]
E.g one of the hot paths for this there is (remember it's still PG13)
heap_update->RelationGetBufferForTuple->GetPageWithFreeSpace->fsm_search->fsm_readbuf->mdopenfork->mdopenfork->PathNameOpenFile->PathNameOpenFilePerm->__GI___strdup
. Here's it's strdup() but it could be anything and that's the point.
This effect in libc is completley reproducible, please see attached,
any use of allocating small (<= 120 bytes) ends up not releasing
memory for the program.
$ gcc mwr.c -o mwr -DMALLOC_SIZE=120 && ./mwr
done
Rss: 1251460 kB
Pss: 1250136 kB
Pss_Dirty: 1250112 kB
Pss_Anon: 1250100 kB
Pss_File: 36 kB
Pss_Shmem: 0 kB
after malloc_trim:
Rss: 1460 kB
Pss: 136 kB
Pss_Dirty: 100 kB
Pss_Anon: 100 kB
Pss_File: 36 kB
Pss_Shmem: 0 kB
$ gcc mwr.c -o mwr -DMALLOC_SIZE=121 && ./mwr # 120+8 >= 128
done
Rss: 1676 kB
Pss: 259 kB
Pss_Dirty: 224 kB
Pss_Anon: 224 kB
Pss_File: 35 kB
Pss_Shmem: 0 kB
after malloc_trim:
Rss: 1548 kB
Pss: 131 kB
Pss_Dirty: 96 kB
Pss_Anon: 96 kB
Pss_File: 35 kB
Pss_Shmem: 0 kB
Now, the current PG18 behaved much better in that regard without that
many small mallocs during runtime (strdup() is still there, it's just
that hotpath not exercised that often):
@:
[8, 16) 2697 |@@@@@@@ |
[16, 32) 2203 |@@@@@@ |
[32, 64) 0 | |
[64, 128) 0 | |
[128, 256) 0 | |
[256, 512) 0 | |
[512, 1K) 0 | |
[1K, 2K) 3014 |@@@@@@@@ |
[2K, 4K) 5 | |
[4K, 8K) 2 | |
[8K, 16K) 1107 |@@@ |
[16K, 32K) 18112 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[32K, 64K) 12 | |
[..]
Yet I still could drop Pss_Anon by using malloc_trim(0) from ~44MB to
~13MB. Assume we have 1k idle connections like this and you end up
wasting 30GB RAM theoretically.
So we basically we have two generic solutions to this class of
problems to avoid OOMs due to GNU libc's malloc() not releasing
memory:
0. Disconnecting the backend (I'm not counting it as it doesn't seem
to be a solid long term solution, but it explains why people push for
poolers with refreshable connection pools).
1. Call malloc_trim(0), but Tom stated it might be not portable. So
maybe there is a chance for extension or #ifdefs . I do think that
calling it after every query might be not ideal due to overheads, but
perhaps after query is done we could schedule interrupt aimed at
now()+X seconds (where X>= 5?), so execute it only when the backend
went really inactive (to avoid re-allocating the memory again), but
abort launching this it if we have started next query. I haven't
looked at the code so i don't know if that can be done cheaply.
2. Or use GLIBC_TUNABLES e.g. disable mxfast bin allocations shows
some promise even still with many small allocations
$ gcc mwr.c -o mwr -DMALLOC_SIZE=120 &&
GLIBC_TUNABLES=glibc.malloc.mxfast=0 ./mwr
done
Rss: 1680 kB
Pss: 257 kB
Pss_Dirty: 236 kB # no need for malloc_trim()
Pss_Anon: 224 kB
Pss_File: 33 kB
Pss_Shmem: 0 kB
[..]
From my side also -1 to the idea of pg_trim_backend_heap_free_memory()
exposed function as per original patch proposal, as how is the user
supposed to embed this within his application?
I have not quantified the overhead for #1 and #2.
-J.
Attachments:
Thank you very much for your response.
To propose something less abstract / more tangible, I think we should do
something like this:1) add a bit of code for glibc-based systems, that adjusts selected
malloc parameters using mallopt() during startup2) add a GUC that enables this, with the default being the regular glibc
behavior (with dynamic adjustment of various thresholds)
I believe that the issue here arises from design incompatibilities between
the complex engineering code and ptmalloc.
Modifying malloc parameters through mallopt is not user-friendly for
database users and can be overly complex.
Moreover, setting certain parameters may lead to performance issues.
Monitoring memory usage should be a common practice for all database users.
With my signal-based approach, we can trigger a trim operation
when high memory usage is detected or by setting up a scheduled task.
This reduces the complexity for users and also helps in lowering memory
consumption.
Of course, this solution is not perfect and does not address the problem
elegantly from a fundamental perspective.
However, it has proven effective in the user environment.
I have set up a scheduled task to execute a function every 10 minutes for
processes exceeding 50MB.
This has reduced memory usage from 87% to 30% on a 64GB system.
Best regards
Shawn