Memory leak in FDW

Started by Heikki Linnakangasover 14 years ago6 messages

heikki.linnakangas@enterprisedb.com

over 14 years ago

1 attachment(s)

Foreign data wrapper's IterateForeignScan() function is supposed to be
called in a short-lived memory context, but the memory context is
actually not reset during query execution. That's a pretty bad memory
leak. I've been testing this with file_fdw and a large file, and "SELECT
COUNT(*) FROM foreign_table"

Interestingly, if you add any WHERE clause to it, the memory context is
reset in ExecScan and the leak goes away. This is only a problem with
the fastpath in ExecScan for the case of no quals and no projections.

The trivial fix is to reset the per-tuple memory context between
iterations. I tried to look around for other executor nodes that might
have the same problem. I didn't see any obvious leaks, although index
scan node seems to call AM's getnext without resetting the memory
context in between. That's a pretty well-tested codepath, however, and
there hasn't been any complains of leaks with index scans, so there must
be something that mitigates it.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachments:

plug-foreign-scan-memory-leak-1.patchtext/x-diff; name=plug-foreign-scan-memory-leak-1.patchDownload

diff --git a/src/backend/executor/nodeForeignscan.c b/src/backend/executor/nodeForeignscan.c
index d50489c..2e36da0 100644
--- a/src/backend/executor/nodeForeignscan.c
+++ b/src/backend/executor/nodeForeignscan.c
@@ -44,7 +44,11 @@ ForeignNext(ForeignScanState *node)
 	ExprContext *econtext = node->ss.ps.ps_ExprContext;
 	MemoryContext oldcontext;
 
-	/* Call the Iterate function in short-lived context */
+	/*
+	 * Call the Iterate function in short-lived context. Reset it first
+	 * to free any leftovers from previous iteration.
+	 */
+	ResetExprContext(econtext);
 	oldcontext = MemoryContextSwitchTo(econtext->ecxt_per_tuple_memory);
 	slot = node->fdwroutine->IterateForeignScan(node);
 	MemoryContextSwitchTo(oldcontext);

Alvaro Herrera

alvherre@commandprompt.com

over 14 years ago

In reply to: Heikki Linnakangas (#1)

Re: Memory leak in FDW

Excerpts from Heikki Linnakangas's message of mar abr 26 15:06:51 -0300 2011:

I tried to look around for other executor nodes that might
have the same problem. I didn't see any obvious leaks, although index
scan node seems to call AM's getnext without resetting the memory
context in between. That's a pretty well-tested codepath, however, and
there hasn't been any complains of leaks with index scans, so there must
be something that mitigates it.

Don't we have some rule that functions used in index AMs are supposed to
be leak-free?

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Tom Lane

tgl@sss.pgh.pa.us

over 14 years ago

In reply to: Heikki Linnakangas (#1)

Re: Memory leak in FDW

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:

The trivial fix is to reset the per-tuple memory context between
iterations.

Have you tested this with SRFs?

ForeignNext seems like quite the wrong place for resetting
exprcontext in any case ...

regards, tom lane

Greg Stark

gsstark@mit.edu

over 14 years ago

In reply to: Alvaro Herrera (#2)

Re: Memory leak in FDW

On Tue, Apr 26, 2011 at 7:15 PM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:

Excerpts from Heikki Linnakangas's message of mar abr 26 15:06:51 -0300 2011:

I tried to look around for other executor nodes that might
have the same problem. I didn't see any obvious leaks, although index
scan node seems to call AM's getnext without resetting the memory
context in between. That's a pretty well-tested codepath, however, and
there hasn't been any complains of leaks with index scans, so there must
be something that mitigates it.

Don't we have some rule that functions used in index AMs are supposed to
be leak-free?

btree operators and opclass functions are supposed to be leak-free. I
think other AMs don't try to have the same strictness.

--
greg

Heikki Linnakangas

heikki.linnakangas@enterprisedb.com

over 14 years ago

In reply to: Tom Lane (#3)

2 attachment(s)

Re: Memory leak in FDW

On 26.04.2011 21:30, Tom Lane wrote:

Heikki Linnakangas<heikki.linnakangas@enterprisedb.com> writes:

The trivial fix is to reset the per-tuple memory context between
iterations.

Have you tested this with SRFs?

ForeignNext seems like quite the wrong place for resetting
exprcontext in any case ...

ExecScan would be more appropriate I guess (attached).

This means the context will be reset between each tuple even for nodes
like seqscan that don't use the per-tuple context for anything.
AllocSetReset exits quickly if there's nothing to do, but it takes a
couple of function calls to get there. I wouldn't normally worry about
that, but this is a very hot path for simple queries.

I tested this with:

CREATE TABLE foo AS SELECT generate_series(1,10000000);

I ran "SELECT COUNT(*) FROM foo" many times with \timing on, and took
the smallest time with and without the patch. I got:

1230 ms with the patch
1186 ms without the patch

This was quite repeatable, it's consistently faster without the patch.
That's a bigger difference than I expected. Any random code change can
swing results on micro-benchmarks like this by 1-2%, but that's over 3%.
Do we care?

I might be getting a bit carried away with this, but we could buy that
back by moving the isReset flag from AllocSetContext to
MemoryContextData. That way MemoryContextReset can exit more quickly if
there's nothing to do, patch attached.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachments:

plug-foreign-scan-memory-leak-2.patchtext/x-diff; name=plug-foreign-scan-memory-leak-2.patchDownload

diff --git a/src/backend/executor/execScan.c b/src/backend/executor/execScan.c
index 5089616..e900588 100644
--- a/src/backend/executor/execScan.c
+++ b/src/backend/executor/execScan.c
@@ -120,13 +120,17 @@ ExecScan(ScanState *node,
 	 */
 	qual = node->ps.qual;
 	projInfo = node->ps.ps_ProjInfo;
+	econtext = node->ps.ps_ExprContext;
 
 	/*
 	 * If we have neither a qual to check nor a projection to do, just skip
 	 * all the overhead and return the raw scan tuple.
 	 */
 	if (!qual && !projInfo)
+	{
+		ResetExprContext(econtext);
 		return ExecScanFetch(node, accessMtd, recheckMtd);
+	}
 
 	/*
 	 * Check to see if we're still projecting out tuples from a previous scan
@@ -148,7 +152,6 @@ ExecScan(ScanState *node,
 	 * storage allocated in the previous tuple cycle.  Note this can't happen
 	 * until we're done projecting out tuples from a scan tuple.
 	 */
-	econtext = node->ps.ps_ExprContext;
 	ResetExprContext(econtext);
 
 	/*

optimize-memorycontextreset-1.patchtext/x-diff; name=optimize-memorycontextreset-1.patchDownload

diff --git a/src/backend/utils/mmgr/aset.c b/src/backend/utils/mmgr/aset.c
index e95dcb6..42f76b7 100644
--- a/src/backend/utils/mmgr/aset.c
+++ b/src/backend/utils/mmgr/aset.c
@@ -128,7 +128,7 @@ typedef void *AllocPointer;
  * different from the aset being physically empty (empty blocks list) because
  * we may still have a keeper block.  It's also different from the set being
  * logically empty, because we don't attempt to detect pfree'ing the last
- * active chunk.
+ * active chunk. XXX isReset is not here anymore, move comment
  */
 typedef struct AllocSetContext
 {
@@ -136,7 +136,6 @@ typedef struct AllocSetContext
 	/* Info about storage allocated in this context: */
 	AllocBlock	blocks;			/* head of list of blocks in this set */
 	AllocChunk	freelist[ALLOCSET_NUM_FREELISTS];		/* free chunk lists */
-	bool		isReset;		/* T = no space alloced since last reset */
 	/* Allocation parameters for this context: */
 	Size		initBlockSize;	/* initial block size */
 	Size		maxBlockSize;	/* maximum block size */
@@ -418,8 +417,6 @@ AllocSetContextCreate(MemoryContext parent,
 		context->keeper = block;
 	}
 
-	context->isReset = true;
-
 	return (MemoryContext) context;
 }
 
@@ -463,10 +460,6 @@ AllocSetReset(MemoryContext context)
 
 	AssertArg(AllocSetIsValid(set));
 
-	/* Nothing to do if no pallocs since startup or last reset */
-	if (set->isReset)
-		return;
-
 #ifdef MEMORY_CONTEXT_CHECKING
 	/* Check for corruption and leaks before freeing */
 	AllocSetCheck(context);
@@ -510,8 +503,6 @@ AllocSetReset(MemoryContext context)
 
 	/* Reset block size allocation sequence, too */
 	set->nextBlockSize = set->initBlockSize;
-
-	set->isReset = true;
 }
 
 /*
@@ -620,8 +611,6 @@ AllocSetAlloc(MemoryContext context, Size size)
 			set->blocks = block;
 		}
 
-		set->isReset = false;
-
 		AllocAllocInfo(set, chunk);
 		return AllocChunkGetPointer(chunk);
 	}
@@ -653,9 +642,6 @@ AllocSetAlloc(MemoryContext context, Size size)
 		randomize_mem((char *) AllocChunkGetPointer(chunk), size);
 #endif
 
-		/* isReset must be false already */
-		Assert(!set->isReset);
-
 		AllocAllocInfo(set, chunk);
 		return AllocChunkGetPointer(chunk);
 	}
@@ -813,8 +799,6 @@ AllocSetAlloc(MemoryContext context, Size size)
 	randomize_mem((char *) AllocChunkGetPointer(chunk), size);
 #endif
 
-	set->isReset = false;
-
 	AllocAllocInfo(set, chunk);
 	return AllocChunkGetPointer(chunk);
 }
@@ -913,9 +897,6 @@ AllocSetRealloc(MemoryContext context, void *pointer, Size size)
 				 set->header.name, chunk);
 #endif
 
-	/* isReset must be false already */
-	Assert(!set->isReset);
-
 	/*
 	 * Chunk sizes are aligned to power of 2 in AllocSetAlloc(). Maybe the
 	 * allocated area already is >= the new size.  (In particular, we always
@@ -1050,15 +1031,13 @@ AllocSetGetChunkSpace(MemoryContext context, void *pointer)
 static bool
 AllocSetIsEmpty(MemoryContext context)
 {
-	AllocSet	set = (AllocSet) context;
-
 	/*
 	 * For now, we say "empty" only if the context is new or just reset. We
 	 * could examine the freelists to determine if all space has been freed,
 	 * but it's not really worth the trouble for present uses of this
 	 * functionality.
 	 */
-	if (set->isReset)
+	if (context->isReset)
 		return true;
 	return false;
 }
diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index 8783edf..980cce5 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -127,7 +127,12 @@ MemoryContextReset(MemoryContext context)
 	if (context->firstchild != NULL)
 		MemoryContextResetChildren(context);
 
-	(*context->methods->reset) (context);
+	/* Nothing to do if no pallocs since startup or last reset */
+	if (!context->isReset)
+	{
+		(*context->methods->reset) (context);
+		context->isReset = true;
+	}
 }
 
 /*
@@ -476,6 +481,7 @@ MemoryContextCreate(NodeTag tag, Size size,
 	node->parent = NULL;		/* for the moment */
 	node->firstchild = NULL;
 	node->nextchild = NULL;
+	node->isReset = true;
 	node->name = ((char *) node) + size;
 	strcpy(node->name, name);
 
@@ -504,13 +510,16 @@ MemoryContextCreate(NodeTag tag, Size size,
 void *
 MemoryContextAlloc(MemoryContext context, Size size)
 {
+	void	   *ret;
 	AssertArg(MemoryContextIsValid(context));
 
 	if (!AllocSizeIsValid(size))
 		elog(ERROR, "invalid memory alloc request size %lu",
 			 (unsigned long) size);
 
-	return (*context->methods->alloc) (context, size);
+	ret = (*context->methods->alloc) (context, size);
+	context->isReset = false;
+	return ret;
 }
 
 /*
@@ -535,6 +544,7 @@ MemoryContextAllocZero(MemoryContext context, Size size)
 
 	MemSetAligned(ret, 0, size);
 
+	context->isReset = false;
 	return ret;
 }
 
@@ -560,6 +570,7 @@ MemoryContextAllocZeroAligned(MemoryContext context, Size size)
 
 	MemSetLoop(ret, 0, size);
 
+	context->isReset = false;
 	return ret;
 }
 
@@ -620,6 +631,9 @@ repalloc(void *pointer, Size size)
 		elog(ERROR, "invalid memory alloc request size %lu",
 			 (unsigned long) size);
 
+	/* isReset must be false already */
+	Assert(!header->context->isReset);
+
 	return (*header->context->methods->realloc) (header->context,
 												 pointer, size);
 }

Heikki Linnakangas

heikki.linnakangas@enterprisedb.com

over 14 years ago

In reply to: Heikki Linnakangas (#5)

Re: Memory leak in FDW

On 27.04.2011 04:19, Heikki Linnakangas wrote:

On 26.04.2011 21:30, Tom Lane wrote:

Heikki Linnakangas<heikki.linnakangas@enterprisedb.com> writes:

The trivial fix is to reset the per-tuple memory context between
iterations.

Have you tested this with SRFs?

ForeignNext seems like quite the wrong place for resetting
exprcontext in any case ...

ExecScan would be more appropriate I guess (attached).

This means the context will be reset between each tuple even for nodes
like seqscan that don't use the per-tuple context for anything.
AllocSetReset exits quickly if there's nothing to do, but it takes a
couple of function calls to get there. I wouldn't normally worry about
that, but this is a very hot path for simple queries.

I tested this with:

CREATE TABLE foo AS SELECT generate_series(1,10000000);

I ran "SELECT COUNT(*) FROM foo" many times with \timing on, and took
the smallest time with and without the patch. I got:

1230 ms with the patch
1186 ms without the patch

This was quite repeatable, it's consistently faster without the patch.
That's a bigger difference than I expected. Any random code change can
swing results on micro-benchmarks like this by 1-2%, but that's over 3%.
Do we care?

I might be getting a bit carried away with this, but we could buy that
back by moving the isReset flag from AllocSetContext to
MemoryContextData. That way MemoryContextReset can exit more quickly if
there's nothing to do, patch attached.

I hear no objections, so committed.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com