[PATCH] Fix memory leak in memoize for numeric key

Started by Orlov Aleksejover 2 years ago4 messages

al.orlov@cft.ru

over 2 years ago

1 attachment(s)

Hello, all!

I found a query which consumes a lot of memory and triggers OOM killer.
Memory leak occurs in memoize node for numeric key.
Version postgresql is 14.9. The problem is very
similar /messages/by-id/17844-d2f6f9e75a622bed@postgresql.org

I attached to the backend with a debugger and set a breakpoint in AllocSetAlloc

(gdb) bt 10
#0 AllocSetAlloc (context=0x5c55086dc2f0, size=12) at aset.c:722
#1 0x00005c5507d886e0 in palloc (size=size@entry=12) at mcxt.c:1082
#2 0x00005c5507890bba in detoast_attr (attr=0x715d5daa04c9) at detoast.c:184
#3 0x00005c5507d62375 in pg_detoast_datum (datum=<optimized out>) at fmgr.c:1725
#4 0x00005c5507cc94ea in hash_numeric (fcinfo=<optimized out>) at numeric.c:2554
#5 0x00005c5507d61570 in FunctionCall1Coll (flinfo=flinfo@entry=0x5c5508b93d00,
collation=<optimized out>, arg1=<optimized out>) at fmgr.c:1138
#6 0x00005c5507aadc16 in MemoizeHash_hash (key=0x0, tb=<optimized out>) at nodeMemoize.c:199
#7 0x00005c5507aadf22 in memoize_insert (key=0x0, found=<synthetic pointer>,
tb=0x5c5508bb4760) at ../../../src/include/lib/simplehash.h:762
#8 cache_lookup (found=<synthetic pointer>, mstate=0x5c5508b91418) at nodeMemoize.c:519
#9 ExecMemoize (pstate=0x5c5508b91418) at nodeMemoize.c:705

I was able to create reproducible test case on machine with default config
and postgresql 14.9:

CREATE TABLE table1 (
id numeric(38) NOT NULL,
col1 text,
CONSTRAINT id2 PRIMARY KEY (id)
);
CREATE TABLE table2 (
id numeric(38) NOT NULL,
id_table1 numeric(38) NULL,
CONSTRAINT id1 PRIMARY KEY (id)
);
ALTER TABLE table2 ADD CONSTRAINT constr1 FOREIGN KEY (id_table1) REFERENCES table1(id);

INSERT INTO table1 (id, col1)
SELECT id::numeric, id::text
FROM generate_series(3000000000, 3000000000 + 600000) gs(id);

INSERT INTO table2 (id, id_table1)
SELECT id::numeric , (select floor(random() * 600000)::numeric + 3000000000)::numeric
FROM generate_series(1,600000) gs(id);

set max_parallel_workers_per_gather=0;
set enable_hashjoin = off;

EXPLAIN analyze
select sum(q.id_table1)
from (
SELECT t2.*
FROM table1 t1
JOIN table2 t2
ON t2.id_table1 = t1.id) q;

Plan:
Aggregate (cost=25744.90..25744.91 rows=1 width=32) (actual time=380.140..380.142 rows=1 loops=1)
-> Nested Loop (cost=0.43..24244.90 rows=600000 width=9) (actual time=0.063..310.915 rows=600000 loops=1)
-> Seq Scan on table2 t2 (cost=0.00..9244.00 rows=600000 width=9) (actual time=0.009..38.629 rows=600000 loops=1)
-> Memoize (cost=0.43..0.47 rows=1 width=8) (actual time=0.000..0.000 rows=1 loops=600000)
Cache Key: t2.id_table1
Cache Mode: logical
Hits: 599999 Misses: 1 Evictions: 0 Overflows: 0 Memory Usage: 1kB
-> Index Only Scan using id2 on table1 t1 (cost=0.42..0.46 rows=1 width=8) (actual time=0.039..0.040 rows=1 loops=1)
Index Cond: (id = t2.id_table1)
Heap Fetches: 0
Planning Time: 0.445 ms
Execution Time: 380.750 ms

I've attached memoize_memory_leak_numeric_key.patch to address this.

Using test case, here are the memory stats before and after the
fix (taken during ExecEndMemoize by using MemoryContextStatsDetail(TopMemoryContext, 100, 1)).
Before:
ExecutorState: 25209672 total in 15 blocks; 1134432 free (7 chunks); 24075240 used
MemoizeHashTable: 8192 total in 1 blocks; 7480 free (1 chunks); 712 used

After:
ExecutorState: 76616 total in 5 blocks; 1776 free (8 chunks); 74840 used
MemoizeHashTable: 8192 total in 1 blocks; 7480 free (1 chunks); 712 used

Thanks,
Alexei Orlov
al.orlov@cft.ru,
aporlov@gmail.com

Attachments:

memoize_memory_leak_numeric_key.patchapplication/octet-stream; name=memoize_memory_leak_numeric_key.patchDownload

diff --git a/src/backend/executor/nodeMemoize.c b/src/backend/executor/nodeMemoize.c
index 8f79a23284..93ca5057e1 100644
--- a/src/backend/executor/nodeMemoize.c
+++ b/src/backend/executor/nodeMemoize.c
@@ -161,6 +161,10 @@ MemoizeHash_hash(struct memoize_hash *tb, const MemoizeKey *key)
 	TupleTableSlot *pslot = mstate->probeslot;
 	uint32		hashkey = 0;
 	int			numkeys = mstate->nkeys;
+	ExprContext *econtext = mstate->ss.ps.ps_ExprContext;
+	MemoryContext oldcontext;
+
+	oldcontext = MemoryContextSwitchTo(econtext->ecxt_per_tuple_memory);
 
 	if (mstate->binary_mode)
 	{
@@ -203,6 +207,7 @@ MemoizeHash_hash(struct memoize_hash *tb, const MemoizeKey *key)
 		}
 	}
 
+	MemoryContextSwitchTo(oldcontext);
 	return murmurhash32(hashkey);
 }

David Rowley

dgrowleyml@gmail.com

over 2 years ago

In reply to: Orlov Aleksej (#1)

1 attachment(s)

Re: [PATCH] Fix memory leak in memoize for numeric key

On Tue, 3 Oct 2023 at 19:38, Orlov Aleksej <al.orlov@cft.ru> wrote:

I found a query which consumes a lot of memory and triggers OOM killer.
Memory leak occurs in memoize node for numeric key.

Thanks for the analysis and the patch.

I've attached memoize_memory_leak_numeric_key.patch to address this.

Yeah, this is a bug for sure.

Looking at ExecHashGetHashValue() for example purposes, I see it's
quite careful to call ResetExprContext(econtext) at the top of the
function to reset the tuple context.

I think the patch might need to go a bit further and also adjust
MemoizeHash_equal(). In non-binary mode, we just call
ExecQualAndReset() which evaluates the join condition and resets the
context. The binary mode code does not do this, so I think we should
expand on what you've done and adjust that code too.

I've done that in the attached patch.

David

Attachments:

fix_memoize_memory_leak_v2.patchapplication/octet-stream; name=fix_memoize_memory_leak_v2.patchDownload

diff --git a/src/backend/executor/nodeMemoize.c b/src/backend/executor/nodeMemoize.c
index 94bf479287..1085b3c79d 100644
--- a/src/backend/executor/nodeMemoize.c
+++ b/src/backend/executor/nodeMemoize.c
@@ -158,10 +158,14 @@ static uint32
 MemoizeHash_hash(struct memoize_hash *tb, const MemoizeKey *key)
 {
 	MemoizeState *mstate = (MemoizeState *) tb->private_data;
+	ExprContext *econtext = mstate->ss.ps.ps_ExprContext;
+	MemoryContext oldcontext;
 	TupleTableSlot *pslot = mstate->probeslot;
 	uint32		hashkey = 0;
 	int			numkeys = mstate->nkeys;
 
+	oldcontext = MemoryContextSwitchTo(econtext->ecxt_per_tuple_memory);
+
 	if (mstate->binary_mode)
 	{
 		for (int i = 0; i < numkeys; i++)
@@ -203,6 +207,8 @@ MemoizeHash_hash(struct memoize_hash *tb, const MemoizeKey *key)
 		}
 	}
 
+	ResetExprContext(econtext);
+	MemoryContextSwitchTo(oldcontext);
 	return murmurhash32(hashkey);
 }
 
@@ -226,7 +232,11 @@ MemoizeHash_equal(struct memoize_hash *tb, const MemoizeKey *key1,
 
 	if (mstate->binary_mode)
 	{
+		MemoryContext oldcontext;
 		int			numkeys = mstate->nkeys;
+		bool		match = true;
+
+		oldcontext = MemoryContextSwitchTo(econtext->ecxt_per_tuple_memory);
 
 		slot_getallattrs(tslot);
 		slot_getallattrs(pslot);
@@ -236,7 +246,10 @@ MemoizeHash_equal(struct memoize_hash *tb, const MemoizeKey *key1,
 			FormData_pg_attribute *attr;
 
 			if (tslot->tts_isnull[i] != pslot->tts_isnull[i])
-				return false;
+			{
+				match = false;
+				break;
+			}
 
 			/* both NULL? they're equal */
 			if (tslot->tts_isnull[i])
@@ -246,9 +259,15 @@ MemoizeHash_equal(struct memoize_hash *tb, const MemoizeKey *key1,
 			attr = &tslot->tts_tupleDescriptor->attrs[i];
 			if (!datum_image_eq(tslot->tts_values[i], pslot->tts_values[i],
 								attr->attbyval, attr->attlen))
-				return false;
+			{
+				match = false;
+				break;
+			}
 		}
-		return true;
+
+		ResetExprContext(econtext);
+		MemoryContextSwitchTo(oldcontext);
+		return match;
 	}
 	else
 	{

Orlov Aleksej

al.orlov@cft.ru

over 2 years ago

In reply to: David Rowley (#2)

RE: [PATCH] Fix memory leak in memoize for numeric key

I've finished testing the patch.
I confirm that the patch solves the problem and works just as fast.

Thanks,
Alexey Orlov.

David Rowley

dgrowleyml@gmail.com

over 2 years ago

In reply to: Orlov Aleksej (#3)

Re: [PATCH] Fix memory leak in memoize for numeric key

On Wed, 4 Oct 2023 at 21:08, Orlov Aleksej <al.orlov@cft.ru> wrote:

I've finished testing the patch.
I confirm that the patch solves the problem and works just as fast.

Thanks for checking that.

I've pushed the patch now.

David